Cross-Sectional Dependence in Panel Data Analysis
February 2010
Abstract
This paper provides an overview of the existing literature on panel data models
with error cross-sectional dependence. We distinguish between spatial dependence
and factor structure dependence and we analyse the implications of weak and strong
cross-sectional dependence on the properties of the estimators. We consider estimation
under strong and weak exogeneity of the regressors for both T fixed and T large
cases. Available tests for error cross-sectional dependence and methods for determining
the number of factors are discussed in detail. The finite-sample properties of
some estimators and statistics are investigated using Monte Carlo experiments.
Key words: Panel data, Cross-sectional dependence, Spatial dependence, Factor
structure, Strong/Weak exogeneity.
JEL Classification: C33; C50.
1 Introduction
The analysis of longitudinal data is common across many fields of research. In econometrics,
the topic is invariably called panel data analysis. Over the last forty years, it
has grown into a major subfield of econometrics. Traditionally, the focus has been on
panels involving a large number of individual units $i = 1, \ldots, N$, with a few observations
over time, $t = 1, \ldots, T$.1 Often, the data come from surveys where a large group of
people or households has been followed over a few years. The National Longitudinal
Surveys of Labor Market Experience and the University of Michigan’s Panel Study of
Income Dynamics are prominent examples. One of the primary reasons for collecting
these data has been to overcome aggregation problems that arise with time series data
in modelling the behaviour of heterogeneous agents on the basis of the “representative
agent” assumption. More recently, considerable interest has also been directed to panels
where the cross-sectional and time series dimensions are of similar magnitude. For
Corresponding author. Faculty of Economics and Business, University of Sydney, NSW 2006, Australia. Tel: +61-2-9036 9120; E-mail: vasilis.sara…[email protected].
† University of Groningen, P.O.Box 800, 9700 AV Groningen, The Netherlands. Tel: +31-50-363-8339; E-mail: [email protected].
1 An exception to this is the seemingly unrelated regression (SUR) approach due to Zellner (1962); see Section 4.1.
instance, the Penn-World tables cover several countries over relatively long periods and
the main focus of study lies in cross-country economic, social and political comparisons.
One major issue that inherently arises in every panel data study, with potential
implications for parameter estimation and inference, is the possibility that the individual
units are interdependent. In fact, this notion of ‘between group’ dependence has been familiar
in the social sciences since the 1930s, i.e. well before the emergence of panel data
econometrics. Specifically, Stephan (1934, pg. 165) argues that “in dealing with social
data, we know that by virtue of their very social character, persons, groups and their
characteristics are interrelated and not independent”. Neprash (1934, pg. 168) asserts
that “the correlation of spatially distributed variables must be accepted with severe
limitations of interpretation. The data involved violate two important conditions of
sound application of correlation and sample techniques namely, the independence of
the units of which the traits are measured, and the homogeneity of distribution of the
traits within a given area”. Fisher in his “Design of Experiments” book (1935, pg. 66)
claims that “patches in close proximity are commonly more alike ... than those further
apart”. Later on, in the field of economic geography Tobler (1970, pg. 236) invoked his
‘first law of geography’: “everything is related to everything else; but near things are
more related than distant things”.
Naturally, the issue of how to characterise cross-sectional dependence has attracted
considerable attention among researchers over the years. Perhaps the earliest methodology
put forward to deal with this issue was the spatial approach. Spatial models were
developed primarily for cross-sectional data using a concept of a distance metric, which
allowed formulating models with a structure similar to that provided by the time index
in time series. The concept of ‘economic distance’ eventually allowed the use of spatial
models in certain economic applications as well, mainly drawn from regional science and
urban economics. The increasing availability of panel data during the last decades gave
rise to new possibilities in characterising error cross-sectional dependence. A prominent
alternative to the spatial approach is the factor structure approach, which assumes that
the disturbance term contains a finite number of unobserved factors that influence each
individual separately. Initially, the inferential theory for factor models was developed
for cases where one dimension was fixed and the other went to infinity. Recently, this
theory has been extended for large panels, where both dimensions can go to infinity; see
Bai (2003) and Bai and Ng (2002).
In this paper we attempt to provide an overview of some of the recent developments
that have been made in the field and link them to earlier related work. Realistically,
it is impossible to do justice to the voluminous and still rapidly growing literature on
panel data models with error cross-sectional dependence. In what follows, we shall focus
on stationary models with a static error structure. Throughout, we try to employ a
unified notation. Some of the issues discussed in the paper are also briefly mentioned
by Baltagi and Pesaran (2007), which is an introduction to a special issue of the Journal
of Applied Econometrics, and Hsiao (2007) in a paper reviewing the state of the art and
the current issues in panel data analysis. There is a growing literature on dynamic
factor models (see e.g. Forni and Lippi 2001, and Forni, Hallin, Lippi and Reichlin,
2000), which is mainly concerned with extraction of common components of economic
variables rather than with estimation of structural (regression) parameters; therefore,
we do not review this in the present paper.2 There are also important developments
in non-stationary panel data models with error cross-sectional dependence (see e.g. Bai
and Ng, 2004, Moon and Perron, 2004, and Phillips and Sul, 2003) for which a succinct
overview has already been provided by Hurlin and Mignon (2004) and Breitung and
Pesaran (2008). The main theoretical results for large dimensional factor analysis using
principal components are reviewed in an excellent survey by Bai and Ng (2008).
The set-up of the paper is as follows. The next section describes the spatial and
factor structure approaches. Section 3 links these approaches with the concepts of
weak and strong cross-sectional dependence. Sections 4 and 5 analyse estimation under
strong and weak exogeneity respectively. Section 6 discusses available tests of error
cross-sectional dependence and methods for determining the number of factors. We
conclude by indicating a number of topics for future research.
In what follows we adopt the conventional mathematics notation where capital letters
denote matrices and small letters in bold denote vectors.
where $y_{it}$ is the observation on the dependent variable for individual $i$ at time $t$, $x_{it}$ is a
column vector of regressors with dimension $K$, $\beta$ is the corresponding parameter vector
of fixed coefficients, $\eta_i$ is an individual-specific time-invariant unobserved effect, and $\nu_{it}$
is the error component that may be cross-sectionally correlated. The latter would imply
that the following is true:
On the other hand, the vector $x_{it}$ is weakly exogenous if $\nu^0_{it}$ is mean-independent of its
past and present, so
$$E\left(\nu^0_{it} \mid x_{i1}, \ldots, x_{it}\right) = 0. \qquad (4)$$
Ignoring cross-sectional dependence may affect the first-order properties (unbiasedness,
consistency) of standard panel estimators because even if $x_{it}$ is strongly, or weakly,
exogenous with respect to $\nu^0_{it}$, it may not be so with respect to $\nu_{it}$. In addition, even
if the first-order properties of these estimators remain unaffected, the presence of error
cross-sectional dependence may largely reduce the extent to which they can provide
efficiency gains over estimating (1) using, say, OLS for each individual $i$. In a sense, if all
individuals behave similarly there is little gain to be obtained by looking at more than
one of them.
Unfortunately, modelling general forms of cross-sectional dependence is not a straightforward
task. Specifically, contrary to a time series model, where it is natural to specify
the correlations between the disturbances to be functions of distance measured by time,
in a cross-section there is no such natural ordering of the observations. To deal with this
issue, the panel data literature has mainly adopted two different approaches to modelling
error cross-sectional dependence, the spatial approach and the factor structure approach.
The former assumes that the structure of cross-sectional dependence is a function of
an immutable distance measure, defined according to a pre-specified metric. In economic
applications, spatial techniques are adapted using alternative measures of “economic
distance” (Conley, 1999), or “policy and social distance” (Conley and Topa, 2002). A
number of different spatial processes have been proposed in the literature to model
cross-sectional dependence, the most popular of which have been the Spatial Moving Average
(SMA), Spatial Auto-Regressive (SAR) and Spatial Error Components (SEC) processes.
These can be defined as follows:
$$\text{SMA:}\quad \nu_{it} = \sum_{j=1}^{N} w_{ij}\,\varepsilon_{jt} + \varepsilon_{it},$$
$$\text{SAR:}\quad \nu_{it} = \sum_{j=1}^{N} w_{ij}\,\nu_{jt} + \varepsilon_{it},$$
$$\text{SEC:}\quad \nu_{it} = \sum_{j=1}^{N} w_{ij}\,\xi_{jt} + \varepsilon_{it}, \qquad (5)$$
where $w_{ij}$ is the $i$-specific spatial weight attached to individual $j$, typically determined
before estimation, $\varepsilon_{it}$ is white noise, and for SEC $\xi_{jt}$ denotes a zero mean random
component, uncorrelated with $\varepsilon_{it}$ and $x_{it}$. These spatial models can be estimated using
a generalized method of moments (GMM) approach (see e.g. Kapoor, Kelejian and
Prucha, 2007; Kelejian and Prucha, 2009), or a method based on maximum likelihood
(e.g. Lee, 2004).
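As a purely illustrative sketch of how the SMA and SAR processes in (5) generate cross-sectionally correlated errors, the following Python code builds them from white noise. The ring-shaped weight matrix, the scalar `delta` that scales the weights, and all function names are assumptions made for this example only; they are not taken from the papers cited above.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical row-normalised weights: each unit has two neighbours on a ring."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def spatial_errors(W, eps, delta, kind):
    """Build spatially dependent errors from white noise eps (N x T).

    delta is an assumed scalar that scales the weights W.
    """
    n = W.shape[0]
    if kind == "SMA":
        # nu_t = delta * W eps_t + eps_t
        return (np.eye(n) + delta * W) @ eps
    if kind == "SAR":
        # nu_t = delta * W nu_t + eps_t, solved for nu_t
        return np.linalg.solve(np.eye(n) - delta * W, eps)
    raise ValueError("kind must be 'SMA' or 'SAR'")

rng = np.random.default_rng(0)
N, T, delta = 6, 4, 0.4
W = ring_weights(N)
eps = rng.standard_normal((N, T))
nu_sma = spatial_errors(W, eps, delta, "SMA")
nu_sar = spatial_errors(W, eps, delta, "SAR")
```

The SAR case requires solving a linear system because the dependent error appears on both sides of its defining equation, whereas the SMA case is a direct linear transformation of the white noise.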
The factor structure approach assumes the presence of an unobserved common component
in the disturbance which is a linear combination of a fixed number of factors (e.g.
Lawley and Maxwell, 1971; Goldberger, 1972, and Jöreskog and Goldberger, 1975). In
this case the error can be written as
$$\nu_{it} = \lambda_i'\phi_t + \varepsilon_{it}, \qquad (6)$$
where $\phi_t = (\phi_{1t}, \ldots, \phi_{M_0 t})'$ denotes an $M_0 \times 1$ vector of unobserved factors,
$\lambda_i = (\lambda_{1i}, \ldots, \lambda_{M_0 i})'$ is an $M_0 \times 1$ vector of factor loadings3 and $\varepsilon_{it}$ is a purely idiosyncratic
component such that $E(\varepsilon_{it}) = 0$ and
$$E(\varepsilon_{it}\varepsilon_{js}) = \begin{cases} \sigma^2_{\varepsilon} & \text{for } t = s \text{ and } i = j, \\ 0 & \text{otherwise.} \end{cases}$$
This formulation generates a taxonomy of models depending on whether the $\lambda_i$ and/or
$\phi_t$ are correlated with $x_{it}$ or not. The relative size of $N$ and $T$ is also important. As an
example, suppose that (1) combined with (6) is used to model the returns to education,
where, as is typical in micro-econometric panels, $N$ is large and $T$ small; in this case the
vector of covariates, $x_{it}$, may include variables like education, experience, and tenure of
individual $i$ with the same employer, $\eta_i$ may capture innate ability (which is constant
by definition) and $\lambda_i$ may reflect time-varying productivity of individual $i$. Both $\eta_i$ and
$\lambda_i$ are likely to be correlated with $x_{it}$.4 For small $T$, the factors can be treated as
time-specific parameters that reflect how productivity varies over time. Another example can
be drawn from the estimation of production and cost functions. For a cost function the
vector $x_{it}$ represents input prices and output, $\eta_i$ may capture cost efficiency of firm $i$, $\phi_t$
may reflect changes in the regulatory regime over time, with $\lambda_i$, the impact on firm $i$,
depending on the size of the firm in the market, on financial constraints, technology and
other considerations. In this case both $\phi_t$ and $\lambda_i$ are likely to be correlated with input
prices and output.5 Depending upon the size of $N$ and $T$, as well as on the properties
of $x_{it}$, different methods can be used to estimate these models, as we shall see in the
following sections.
Sarafidis (2009) shows that all spatial processes can be expressed in the following
form:
$$\nu_{it} = (\lambda_i \odot w_i)'\phi_t + \varepsilon_{it},$$
by setting $M_0 = N$ and imposing appropriate zero restrictions on $w_i$ and homogeneity
restrictions on $\lambda_i$.6 This may be useful because spatial dependence can be viewed in
this case as a special form of factor structure dependence, in which one may think of the
unobserved components, $\phi_t$, as shocks, the impact of which is either ‘global’ (factors) or
‘local’ (spatially correlated components).
3 There is large variation in the literature regarding the notation used for factor models. Following
Kiviet and Sarafidis (2009), our choice is based on the following reasoning: we use Greek symbols for
unobserved variables/parameters and Latin symbols for observed ones. Consequently, we use $\varepsilon$ (epsilon)
to denote the purely idiosyncratic error component, $\phi$ (phi) to denote the factors and $\lambda$ (lambda) to denote
their loadings. Similarly, $\eta$ (eta) is used to denote the individual-specific effect.
4 The argument here would be that it is the most able and productive individuals who embark on
higher education, all other things being equal, such as equal opportunities and so on.
5 Many other examples are provided by Ahn, Lee and Schmidt (2001) and Bai (2009).
6 For the factor structure $w_i = \iota_N$, where $\iota_N$ is a $T \times 1$ column vector of ones.
3 Weak and Strong Cross-sectional Dependence
The spatial approach and the factor structure approach imply different degrees of error
cross-sectional dependence. However, there is no unique definition of what is ‘weak’
and what is ‘strong’ dependence in the literature. In particular, let $\{\nu_{it}\}$, $i \geq 1$, be the
scalar sequence $\nu_{1t}, \nu_{2t}, \nu_{3t}, \ldots$, and notice that there are $T$ such scalar sequences, for
$t = 1, \ldots, T$. Weak dependence can be defined in the following ways:
where $\sigma^{t,s}_{i,j} = \mathrm{Cov}(\nu_{it}, \nu_{js} \mid \Omega_{i,j})$, and $\Omega_{i,j}$ denotes the conditioning set of all time-invariant
characteristics of individuals $i$ and $j$.
It is important to emphasise that spatial dependence in the residuals does not affect the
first-order properties (consistency) of standard panel data estimators. In particular,
it is straightforward to show that the mean-independence conditions (3) and (4) are
preserved when the error term of the misspecified model, $\nu_{it}$, follows any one of the
three processes in (5). Therefore, the potential gains from modelling spatial dependence
arise with respect to estimation efficiency and the validity of inference.
7 These conditions ensure that the weights are not dominated by a few individuals.
8 For alternative definitions of weak cross-sectional dependence that require covariance stationarity
see Forni and Lippi (2001).
9 Notice, however, that uniform boundedness is not actually necessary for weak dependence; see
Sarafidis (2009).
10 See e.g. Kapoor, Kelejian and Prucha (2007, pg. 106) and Lee (2007, pg. 491).
The difference between the two definitions provided above lies mainly in factor structures
and it can be illustrated through an example. Consider a single-factor error process
On the other hand, according to Definition 2 the error process (9) is not weakly dependent
because $\sigma^{t,t}_{i,j} = \mathrm{Cov}(\nu_{it}, \nu_{jt} \mid \lambda_i, \lambda_j) = \lambda_i\lambda_j\sigma^2_{\phi} \neq 0$ and therefore
$\sum_{j \neq i} |\sigma^{t,t}_{i,j}|$ is
unbounded. Intuitively, since all individuals are subject to the same shock, $\phi_t$, the
sum of the absolute conditional covariances between individual disturbances grows with
$N$ regardless of whether $N^{-1}\sum_{i=1}^{N}\lambda_i \to 0$ or not. As a result, all factor structures,
provided they are non-degenerate, imply strong dependence under Definition 2 but not
under Definition 1.
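The contrast between a factor structure and a spatial process can be checked numerically. The sketch below is only an illustration under stated assumptions: the loadings are a deterministic, hypothetical sequence with non-zero mean, the factor and idiosyncratic variances are set to one, and the spatial process is an SMA on a two-neighbour ring with an assumed scaling parameter `delta`. For each $N$ it computes the largest row sum of absolute off-diagonal covariances.

```python
import numpy as np

def max_abs_offdiag_row_sum(C):
    """Largest row sum of absolute off-diagonal covariances."""
    off = np.abs(C - np.diag(np.diag(C)))
    return off.sum(axis=1).max()

def factor_cov(lam, var_phi=1.0, var_eps=1.0):
    """Covariance of the errors given the loadings for nu_it = lam_i * phi_t + eps_it."""
    return var_phi * np.outer(lam, lam) + var_eps * np.eye(len(lam))

def sma_ring_cov(n, delta=0.4, var_eps=1.0):
    """Covariance of nu_t = (I + delta * W) eps_t with two-neighbour ring weights W."""
    idx = np.arange(n)
    W = np.zeros((n, n))
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    B = np.eye(n) + delta * W
    return var_eps * B @ B.T

def loadings(n):
    """Hypothetical deterministic loadings, all between 0.5 and 1.5."""
    return 1.0 + 0.5 * np.sin(np.arange(n))

for n in (50, 100, 200):
    factor_sum = max_abs_offdiag_row_sum(factor_cov(loadings(n)))
    spatial_sum = max_abs_offdiag_row_sum(sma_ring_cov(n))
    print(n, round(factor_sum, 2), round(spatial_sum, 2))
```

Under these assumptions the factor-based sum keeps growing with $N$, while the spatial sum stays fixed because each unit is linked only to a bounded number of neighbours.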
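Definition 1 has an encompassing property in the sense that any factor structure
with $E(\lambda_i'\phi_t) = 0$ reduces to a weakly dependent process when the observations are
expressed in terms of deviations from time-specific averages. Specifically, suppose that
the error term follows process (9). Averaging $\nu_{it}$ over all $i$ for each $t$ and subtracting
yields
$$\underline{\nu}_{it} = \underline{\lambda}_i\phi_t + \underline{\varepsilon}_{it}, \qquad (11)$$
where $\underline{\nu}_{it} = \nu_{it} - \bar{\nu}_{\cdot t}$ with $\bar{\nu}_{\cdot t} = N^{-1}\sum_{i=1}^{N}\nu_{it}$, and so on. Therefore, even if
$N^{-1}\sum_{i=1}^{N}\lambda_i \nrightarrow 0$, thus violating (10), we have
$$N^{-1}\sum_{i=1}^{N}\underline{\lambda}_i \to 0 \quad \text{as } N \to \infty. \qquad (12)$$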
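The effect of expressing a single-factor error in deviations from time-specific averages can be verified with a few lines of Python. This is a minimal sketch with simulated, hypothetical values (loadings with mean one, standard normal factor and idiosyncratic noise); the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 5
lam = rng.normal(1.0, 0.5, size=N)   # loadings with non-zero mean
phi = rng.standard_normal(T)         # single common factor
eps = rng.standard_normal((N, T))    # idiosyncratic component
nu = np.outer(lam, phi) + eps        # nu_it = lam_i * phi_t + eps_it

# Deviations from the cross-sectional average at each t
nu_dev = nu - nu.mean(axis=0)
lam_dev = lam - lam.mean()
eps_dev = eps - eps.mean(axis=0)

# The demeaned error keeps a factor structure, now with zero-average loadings
same_structure = np.allclose(nu_dev, np.outer(lam_dev, phi) + eps_dev)
print(same_structure, lam.mean(), lam_dev.mean())
```

The demeaned loadings average to zero by construction, whatever the average of the original loadings.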
weakly exogenous regressors. Under Definition 2, weak dependence is not preserved under
products. For instance, consider the product between $\nu_{it}$ and $\nu_{is}$, as defined in (9). This
product then involves $N^{-1}\sum_{i=1}^{N}\lambda_i^2$, which does not converge to zero.
One final remark. Definition 1 does not imply that weak error cross-sectional dependence
cannot affect the first-order properties of standard panel data estimators, nor does
Definition 2 imply that strong dependence always has an adverse effect on the first-order
properties of these estimators. For the former case, one can think of a regression
model with a single covariate, $x_{it}$, and a one-factor error structure in which $\lambda_i$ and $\phi_t$
are both random with zero mean and $x_{it}$ contains the same factor $\phi_t$, with a loading that
is correlated with $\lambda_i$. For the
latter case one can think of a single-factor error process in which neither $\lambda_i$ nor $\phi_t$ is
correlated with $x_{it}$.
The literature on spatial dependence is rich and is developing rapidly. Nevertheless,
the remainder of this paper focuses mainly on residual factor structures. This
is partly because of the generality of this approach relative to spatial dependence, in
that it does not require the a priori specification of a distance metric, which may or
may not be appropriate in certain economic applications. Furthermore, modelling a
factor structure is likely to sweep out the spatial correlations as well (see Pesaran and
Tosetti, 2009). Finally, notice that in the spatial dependence case standard panel data
estimators can still be used to make robust inferences on the parameters. In particular,
one may employ spectral density matrix estimation techniques of the sort popularised
in econometrics by Newey and West (1987), valid for large T and fixed N (see Arellano,
pg. 19, for details) or the methods of Driscoll and Kraay (1998) and Pesaran and Tosetti
(2009), valid for large N and large T .
where both $\eta_i$ and $\beta_i$ are treated as fixed. There are two underlying assumptions
behind this approach. Firstly, $E(\nu_{it} \mid x_{it}) = 0$, that is, all regressors remain strongly
exogenous in the misspecified model. Therefore, neglecting cross-sectional dependence
does not affect the first-order properties of standard panel estimators. Secondly, the
asymptotics are fixed $N$ and $T \to \infty$. These assumptions combined imply that the
error covariance matrix, $\Sigma = [\sigma_{ij}]$, can be left unrestricted, i.e. there is no need to
impose a factor structure in the residuals. The SUR approach leads to a feasible
GLS estimator, in which OLS is used at the first stage for each individual-specific equation
to obtain consistent estimates of the parameters, including the $N(N+1)/2$ distinct
entries in the error covariance matrix. The resulting estimator of $\beta_i$ is consistent and
asymptotically efficient. When $T$ is only slightly greater than $N$, the estimate of $\Sigma$ may
be ill-conditioned. Kontoghiorghes and Clarke (1995) propose a numerical procedure
for estimating a SUR model that avoids the difficulty in directly computing the inverse
of the estimated covariance matrix.
12 A thorough review of the large literature accumulated on SUR can be found in Srivastava and
Dwivedi (1979) and Srivastava and Giles (1987). A survey of more recent developments is provided by
Fiebig (2001) and Moon and Perron (2006).
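The two-step feasible GLS logic of the SUR approach can be sketched as follows. This is a textbook-style illustration under our own assumptions about data layout (`y` is N x T, `X` is N x T x K, with T larger than N), not the numerical procedure of Kontoghiorghes and Clarke (1995); it inverts the estimated covariance matrix directly, which is exactly what becomes fragile when $T$ is close to $N$.

```python
import numpy as np

def sur_fgls(y, X):
    """Two-step SUR sketch. y: (N, T); X: (N, T, K).

    Returns per-unit OLS slopes (N, K), FGLS slopes (N, K) and the
    estimated N x N error covariance matrix.
    """
    N, T, K = X.shape
    # Step 1: OLS equation by equation
    b_ols = np.stack([np.linalg.lstsq(X[i], y[i], rcond=None)[0] for i in range(N)])
    resid = y - np.einsum("itk,ik->it", X, b_ols)
    sigma_hat = resid @ resid.T / T
    # Step 2: GLS on the stacked system with block-diagonal regressors
    Xb = np.zeros((N * T, N * K))
    for i in range(N):
        Xb[i * T:(i + 1) * T, i * K:(i + 1) * K] = X[i]
    omega_inv = np.kron(np.linalg.inv(sigma_hat), np.eye(T))
    XtOi = Xb.T @ omega_inv
    b_fgls = np.linalg.solve(XtOi @ Xb, XtOi @ y.reshape(-1))
    return b_ols, b_fgls.reshape(N, K), sigma_hat

# Hypothetical example: 3 units, 60 periods, 2 regressors, correlated errors
rng = np.random.default_rng(0)
N, T, K = 3, 60, 2
X = rng.standard_normal((N, T, K))
common = rng.standard_normal(T)
errors = 0.8 * common + 0.6 * rng.standard_normal((N, T))
y = np.einsum("itk,k->it", X, np.array([1.0, -0.5])) + errors
b_ols, b_fgls, sigma_hat = sur_fgls(y, X)
```

A useful sanity check on any such implementation is the classical result that FGLS coincides with equation-by-equation OLS when every unit has the same regressor matrix.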
When $N > T$, the least-squares estimate of the general, unstructured error covariance
matrix, $\hat{\Sigma}$, with typical entry $T^{-1}\sum_{t=1}^{T}\hat{\nu}_{it}\hat{\nu}_{jt}$, is singular. This implies that the standard
SUR estimator is not feasible. Robertson and Symons (2007) propose imposing a
factor structure in the residuals, according to (6), and then they estimate the residual
covariance matrix using maximum likelihood. Therefore, their method allows SUR
estimation of panel models by providing a full-rank estimator of the error covariance
matrix when the usual estimate is rank-deficient.
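The rank deficiency is easy to see numerically. In the sketch below, random numbers stand in for first-stage residuals (an assumption made purely for illustration); the point is only that an N x N matrix built from T < N time periods cannot have rank above T.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 8, 5
resid = rng.standard_normal((N, T))    # stand-in for first-stage residuals
sigma_hat = resid @ resid.T / T        # N x N, typical entry T^{-1} sum_t r_it r_jt
rank = np.linalg.matrix_rank(sigma_hat)
print(rank, N)
```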
An alternative approach, valid for fixed $T$, is to impose a factor structure, as in (6),
and use a GMM estimator that makes use of the second-order moment restrictions imposed
on the covariance matrix of $\Phi\lambda_i + \varepsilon_i$, where $\Phi = (\phi_1, \ldots, \phi_T)'$ and $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})'$;
see e.g. Wansbeek and Meijer (2000). This method requires that $\lambda_i$ is random with zero
mean, and is uncorrelated with the covariates. Under these assumptions, the model
becomes a particular case of a so-called structural equation model (SEM), which can
be handled routinely by software such as LISREL, EQS, AMOS, MX, Mplus, MECOSA,
RAMONA, LINCS and PROC CALIS. As Wansbeek and Meijer (2007) indicate, the
availability of these programs is not generally known among econometricians, leading
sometimes to papers dealing with special cases of a SEM, which in fact are not needed.
of ones, $X_i = (x_{i1}, \ldots, x_{iT})'$, $\tilde{y}_i = Q_T y_i$, $y_i = (y_{i1}, \ldots, y_{iT})'$, $M_{\hat{\Phi}} = I_T - \hat{\Phi}(\hat{\Phi}'\hat{\Phi})^{-1}\hat{\Phi}'$
and $\hat{\Phi} = (\hat{\phi}_1, \ldots, \hat{\phi}_T)'$, a $T \times M_0$ matrix, which is computed as the vector of principal
components extracted from the covariates, $x_{it}$. Intuitively, the idea is to sweep out the
factors that are common between the $y_{it}$ and $x_{it}$ processes by orthogonalising the data
prior to estimation. A similar estimator is proposed by Kapetanios and Pesaran (2007)
except that $\hat{\Phi}$ is computed from the vector of principal components extracted from $z_{it} =
(y_{it}, x_{it}')'$. This can be useful when different factors hit the $y$ and $x$ processes. The rate of
convergence of $\hat{\Phi}$ is $\min\{\sqrt{N}, T\}$. Therefore for fixed $T$, $\hat{\Phi}$ is not consistent, in general,
unless the purely idiosyncratic component is serially uncorrelated and homoskedastic (see
Bai, 2003).
Bai (2009) proposes an iterative principal components (IPC) estimator such that
$(\hat{\beta}_{IPC}, \hat{\Phi})$ is the solution to (14) and the following non-linear equation:
$$\left[\frac{1}{NT}\sum_{i=1}^{N}\left(\tilde{y}_i - \tilde{X}_i\hat{\beta}_{IPC}\right)\left(\tilde{y}_i - \tilde{X}_i\hat{\beta}_{IPC}\right)'\right]\hat{\Phi} = \hat{\Phi}\hat{V}_{NT}, \qquad (15)$$
where $\hat{V}_{NT}$ is a diagonal matrix that consists of the $M_0$ largest eigenvalues of the matrix
$\frac{1}{NT}\sum_{i=1}^{N}(\tilde{y}_i - \tilde{X}_i\hat{\beta}_{IPC})(\tilde{y}_i - \tilde{X}_i\hat{\beta}_{IPC})'$, arranged in decreasing order. Therefore,
given $\hat{\Phi}$ one can estimate $\beta$ and given $\beta$ one can estimate $\hat{\Phi}$. The solution can simply be
obtained by iteration. The resulting estimator is consistent and asymptotically normal
as $(N, T) \to \infty$ jointly, i.e. $\sqrt{NT}(\hat{\beta}_{IPC} - \beta) \xrightarrow{d} N(0, \Sigma_{IPC})$, where $\Sigma_{IPC}$ is the asymptotic
variance of $\sqrt{NT}(\hat{\beta}_{IPC} - \beta)$. For fixed $T$ the estimator is inconsistent under
serial correlation or heteroskedasticity.13
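The iteration between the slope coefficients and the factors can be sketched in Python as below. This is our own minimal illustration of the idea, not Bai's (2009) code: the data are assumed to be already within-transformed, the number of factors `m` is assumed known, the factors are normalised so that their cross-product divided by T is the identity, and the simulated design (one factor that enters both the regressor and the error) is hypothetical.

```python
import numpy as np

def simulate(N=200, T=20, beta=1.0, seed=0):
    """Hypothetical design: one factor drives both x and the error."""
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal(T)
    lam = rng.normal(1.0, 1.0, size=N)
    common = np.outer(lam, phi)
    x = common + rng.standard_normal((N, T))
    y = beta * x + common + 0.3 * rng.standard_normal((N, T))
    return y, x[:, :, None]

def pooled_ols(y, X):
    XtX = np.einsum("itk,itl->kl", X, X)
    Xty = np.einsum("itk,it->k", X, y)
    return np.linalg.solve(XtX, Xty)

def projected_ls(y, X, M):
    """Pooled least squares after pre-multiplying each unit's data by M."""
    A = np.einsum("itk,ts,isl->kl", X, M, X)
    b = np.einsum("itk,ts,is->k", X, M, y)
    return np.linalg.solve(A, b)

def ipc(y, X, m, n_iter=500, tol=1e-10):
    """Alternate between factors (given beta) and beta (given factors)."""
    N, T, K = X.shape
    beta = pooled_ols(y, X)                      # starting value
    for _ in range(n_iter):
        U = y - X @ beta                         # N x T residuals
        S = U.T @ U / (N * T)                    # T x T second-moment matrix
        _, vecs = np.linalg.eigh(S)              # eigenvalues in ascending order
        Phi = np.sqrt(T) * vecs[:, -m:]          # eigenvectors of the m largest
        M = np.eye(T) - Phi @ Phi.T / T          # projects off the estimated factors
        beta_new = projected_ls(y, X, M)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta, Phi

y, X = simulate()
beta_ols = pooled_ols(y, X)
beta_ipc, Phi_hat = ipc(y, X, m=1)
print(beta_ols, beta_ipc)
```

Each pass lowers the pooled sum of squared residuals, so the loop is a block coordinate descent; like any such scheme it may stop at a local solution, which is why the starting value matters in practice.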
intuition of this method lies in that even if $\phi_t$ is unobserved, it is in the space spanned
by the cross-sectional weighted averages of the observed variables. As a result, the
projection as in (16) eliminates the factors and hence the inconsistency due to possible
correlations that exist between the factors and the regressors.16 To see this, assume the
following general model for the correlation between $\phi_t$ and $x_{it}$:
0
xit = i t + i + it , (17)
Assuming that
Rank = M0 K + 1 for all N (19)
p
and using the result that t ! 0 as N ! 1 for each t, we have
0 1 p
t (zt ) ! 0 as N ! 1. (20)
Thus employing the Frisch-Waugh theorem, (20) suggests using $\bar{y}_t$, $\bar{x}_t$ and $\iota_T$ as observable
proxies for $\phi_t$.17
Efficiency gains from pooling the observations over the cross-sectional dimension can
be achieved when the individual slope coefficients are the same, i.e. $\beta_i = \beta$. Setting
$w_{ij} = N^{-1}$ for all $i$ yields the following pooled CCE estimator:
$$\hat{\beta}_{PCCE} = \left[\sum_{i=1}^{N} X_i'\bar{M}X_i\right]^{-1}\sum_{i=1}^{N} X_i'\bar{M}y_i, \qquad (21)$$
where $\bar{M} = I_T - \bar{Z}(\bar{Z}'\bar{Z})^{-1}\bar{Z}'$ with $\bar{Z} = [(\bar{z}_1, \ldots, \bar{z}_T)', \iota_T]$ and $\bar{z}_t = N^{-1}\sum_{j=1}^{N} z_{jt}$. $\hat{\beta}_{PCCE}$
is asymptotically (large $N$) unbiased for $\beta$, and as $(N, T) \to \infty$ jointly,
$$\sqrt{NT}\left(\hat{\beta}_{PCCE} - \beta\right) \xrightarrow{d} N(0, \Sigma_{PCCE}), \qquad (22)$$
16 A similar projection is proposed by Mundlak (1978) with the difference being that Pesaran’s approach
includes the cross-sectional mean of the dependent variable as well. Mundlak’s projection will not work
if the regressors are correlated with the factors.
17
The scaling does not a¤ect t .
where $\Sigma_{PCCE}$ is the asymptotic variance of $\sqrt{NT}(\hat{\beta}_{PCCE} - \beta)$. For fixed $T$ the
distribution of $\hat{\beta}_{PCCE}$ is non-standard because it depends on nuisance parameters.
Bootstrapping could be used to obtain standard errors for $\hat{\beta}_{PCCE}$ in this case,
although this is still a matter of research. Kapetanios, Pesaran, and Yamagata (2009)
have extended the results of Pesaran (2006) by allowing unobserved common factors to
follow unit root processes.
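A compact sketch of the pooled CCE computation in (21) is given below. It assumes equal weights, a balanced panel stored as `y` (N x T) and `X` (N x T x K), and a hypothetical one-factor design in which the rank condition holds; the projection uses a pseudo-inverse instead of an explicit inverse, which is a numerical convenience of ours and gives the same projection matrix whenever the matrix of cross-sectional averages has full column rank.

```python
import numpy as np

def simulate(N=200, T=20, beta=1.0, seed=0):
    """Hypothetical design: one factor with non-zero average loading in y and x."""
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal(T)
    lam = rng.normal(1.0, 1.0, size=N)
    common = np.outer(lam, phi)
    x = common + rng.standard_normal((N, T))
    y = beta * x + common + 0.3 * rng.standard_normal((N, T))
    return y, x[:, :, None]

def pooled_ols(y, X):
    XtX = np.einsum("itk,itl->kl", X, X)
    Xty = np.einsum("itk,it->k", X, y)
    return np.linalg.solve(XtX, Xty)

def pooled_cce(y, X):
    """Pooled CCE with equal weights: project out cross-sectional averages."""
    N, T, K = X.shape
    # Columns: average of y, averages of each regressor, and a vector of ones
    Zbar = np.column_stack([y.mean(axis=0), X.mean(axis=0), np.ones(T)])
    Mbar = np.eye(T) - Zbar @ np.linalg.pinv(Zbar)
    A = np.einsum("itk,ts,isl->kl", X, Mbar, X)
    b = np.einsum("itk,ts,is->k", X, Mbar, y)
    return np.linalg.solve(A, b)

y, X = simulate()
beta_ols = pooled_ols(y, X)
beta_pcce = pooled_cce(y, X)
print(beta_ols, beta_pcce)
```

Note that no factor is estimated and the number of factors is never specified; the cross-sectional averages do all the work.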
The CCE estimator is attractive because it is computationally very simple. Furthermore,
the estimator has the additional advantage that it does not require specifying the
number of factors, which is necessary if the latent factors are estimated using maximum
likelihood or an approach based on principal components analysis. On the other hand,
it is clear from (20) that the rank condition (19) might be crucial for the estimator.
This will be violated if the number of unobserved factors is larger than $K + 1$ or if,
for example, the average of the factor loadings in the $y_{it}$ and $x_{it}$ equations tends to a
zero vector, in which case the rank of the matrix in (19) is smaller than $M_0$.18 When the rank condition is violated,
the CCE estimator requires, for consistency, that the factor loadings satisfy a random
coefficients type assumption, specifically that the factor loadings of the $y_{it}$ and $x_{it}$ equations
are mutually independent and also independent of $\phi_t$. Notice that under such an assumption, the fixed effects
estimator remains unbiased and consistent even if $\phi_t$ is correlated with $x_{it}$, provided that the
observations are expressed in terms of deviations from time-specific averages.19 This is
because
$$E(\nu_{it} \mid x_{i1}, \ldots, x_{iT}) = E(\lambda_i'\phi_t + \varepsilon_{it} \mid x_{i1}, \ldots, x_{iT}) = E(\lambda_i'\phi_t \mid x_{i1}, \ldots, x_{iT})$$
$$= E(\lambda_i' \mid x_{i1}, \ldots, x_{iT})\,E(\phi_t \mid x_{i1}, \ldots, x_{iT}) = E(\lambda_i')\,E(\phi_t \mid x_{i1}, \ldots, x_{iT}) = 0, \qquad (23)$$
where the second equality holds under strong exogeneity of the covariates with respect
to the idiosyncratic error, the third equality holds because $\lambda_i$ and $\phi_t$ are mutually
independent and the fourth equality because $\lambda_i$ and the factor loadings of $x_{it}$ are mutually independent. This
result implies that even if $\phi_t$ is correlated with $x_{it}$ it is still possible to obtain consistent
estimates of the parameters using the SUR approach of Robertson and Symons (2007)
and the residual principal components estimator of Coakley, Fuertes and Smith (2002),
provided that the first-stage estimated error covariance matrix, $\hat{\Sigma}$, is based on the two-way
fixed effects regression.
where i i:i:d:N 0; 2 ; "it i:i:d:N 0; 2" and "it i:i:d:N (0; 1).20 Furthemore,
m
t i:i:d:N (0; 1), m
i i:i:d:N m
2
; m for m = 1; 2, and
m m 2
i = m
[ m i = m + (1 m
)1=2 i
m
]= m
m
i + m i
m
, (25)
where m = E[( m i m
)( i m m
)]=( m m ), m = m = m , m = m (1
m
2 )1=2 and
m i i:i:d:N ( m ; 1). Hence, i m is a weighted sum of the two mu-
tually independent random components, m i and i
m
, which are weighted such that
m 2 m m
V ar( i ) = and E[( i m
)( i m
)] = m m m
, as required. We set
m
= and 2 2
= m for m = 1; 2. We also set = 2, N = 100, T = 50. 2,000
m m
replications are performed.
Following Kiviet and Sara…dis (2009) we choose values for the simulation parameters
on the basis of (i) 1 , the fraction of the ‘structured’noise, it , over the total noise, ! it ,
(hence just excluding the idiosyncratic disturbance noise), i.e.
2 +
P2 2 +
P2 2
m=1 m m=1 m
1 P 2 2 +
P 2 2
; (26)
2+ 2 +
" m=1 m m=1 m
(ii) 2, which is the fraction of the factor noise over all structured noise, i.e.
P2 2 +
P2 2
m=1 m m=1 m
2 P 2 2 +
P 2 2
; (27)
2 +
m=1 m m=1 m
and (iii) 3 , which re‡ects the closeness of the factor structure to an ordinary time e¤ect
(which it is when 2 m = 0 for m = 1; 2), i.e.
P2 2
m=1
3 P2 2
Pm
2 2
. (28)
m=1 m
+ m=1 m
1
P2 2 ,
P2 2
1=2
2 m=1 m 1
= m=1 m
1 and 2 = 1 We consider Case I in which
20
We also performed the experiments with it = uit + uit 1 , = 0:5, uit i:i:d:N (0; 1). However,
the results were very similar and therefore they are not reported in the paper. They are available from
the authors upon request.
21 An alternative design, which is common practice in the literature, would be to choose the mean and
variance of the error components such that the average error cross-sectional correlation equals a
specific value. However, as noted in Kiviet and Sarafidis (2009), this is problematic because a particular
value of the average error cross-sectional correlation can be obtained at a multitude of combinations of
parameter values. On the contrary, reporting the values of these ratios enhances the transparency of the
design.
the rank condition (19) is satisfied and Case II in which the rank condition is violated.
1 1 1 1
The former sets = 1 2 1=2
and = 2 1=2 .
1
(1 ) 1
2
(1 2 ) 2
1 2
This implies that E i = 1
but E i = 2
6= 2
. The latter sets m
=
1
m 1=2
1
, which implies that E ( i m ) = m for m = 1; 2. These two
(1 2 ) m
cases yield the following expectation for the matrix of factor loadings:
1 1
i i 1:347 1:347
Case I : E 2 2 ,
i i 1 1
and
1 1
i i 1:347 1:347
Case II : E 2 2 .
i i 1 1
We also consider two sub-cases for the correlation between the factor loadings of $y$ and $x$, specifically
$\rho \in \{0, 0.5\}$, which generates Case I(I)a and Case I(I)b respectively.
4.4.2 Results
Table 1 reports bias, expressed as a percentage, and root mean square error (RMSE)
for all estimators.22 FE and TWFE denote the one-way and two-way error component
fixed effects estimators respectively, FE-PC and TWFE-PC denote the principal
components estimator proposed by Coakley, Fuertes and Smith (2002), based on FE
and TWFE residuals; and PCCE and PC denote (21) and the iterative version of (14)
respectively. Firstly, we can see that FE exhibits a large bias in all cases. This is
because the within transformation does not eliminate the factor structure and the $\phi_t$s
are correlated with $x_{it}$ given (24). TWFE performs, perhaps surprisingly, very well in
terms of both bias and RMSE so long as the factor loadings of $y$ and $x$ are uncorrelated,
namely $\rho = 0$. As expected, the estimator is not affected by whether the rank
condition is satisfied or not. However, when $\rho = 0.5$ both bias and RMSE of TWFE
increase substantially because (23) no longer holds. The performance of FE-PC
and TWFE-PC is naturally affected by the properties of the residuals they use at
the first stage. Hence, FE-PC is biased in all circumstances, while TWFE-PC appears
to perform very well when TWFE also does well. Of course in this case TWFE-PC
outperforms TWFE in terms of variance and RMSE since TWFE-PC augments the
TWFE model by including estimates of the $M_0$ principal components in the set of
regressors. PCCE performs best when the rank condition is satisfied. In fact, in this case
PCCE outperforms PC, which is remarkable given that $M_0$ is assumed to be known.
When the rank condition is violated the estimator seems to do well for $\rho = 0$, although
it is outperformed by TWFE-PC and PC in this case. When $\rho = 0.5$ the performance
22 Specifically, we report bias in terms of $100\,(\bar{\beta}_c - \beta)/\beta$, where $\bar{\beta}_c$ is the average estimate over all
replications of $\beta$, obtained using method $c$. Since $\beta = 1$, the entries represent essentially bias multiplied
by one hundred.
of PCCE deteriorates substantially. On the other hand, while PC is not affected by the
rank condition, it is affected by the value of $\rho$. Therefore, for $\rho = 0.5$ PC outperforms
PCCE only when the rank condition is violated. In this case TWFE-PC does best.
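For reference, the two summary statistics used to compare estimators across replications can be computed as in the following sketch. The helper names and the three example draws are hypothetical; they are not values from Table 1.

```python
import numpy as np

def percent_bias(estimates, beta):
    """100 * (average estimate - true value) / true value."""
    return 100.0 * (np.mean(estimates) - beta) / beta

def rmse(estimates, beta):
    """Root mean square error of the estimates around the true value."""
    return float(np.sqrt(np.mean((np.asarray(estimates) - beta) ** 2)))

draws = np.array([1.1, 0.9, 1.3])   # hypothetical estimates from three replications
print(percent_bias(draws, 1.0), rmse(draws, 1.0))
```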
5.1 Asymptotic Properties of Least Squares Estimators
Often the weakly exogenous regressor takes the form of a lagged dependent variable.
Since this variable is by construction correlated with the individual e¤ect, i , estimation
of the dynamic panel data model is not straightforward and indeed it has spawned a
vast literature, which is still growing. In its simplest form, where the lagged dependent
variable is the only regressor, the model is
yit = yit 1 + i + it , j j < 1, i = 1; 2; : : : ; N , t = 1; 2; : : : ; T . (29)
Since strong exogeneity is violated in (29), standard least-squares-based estimators that
rely on the elimination of the individual effect yield inconsistent parameter estimates
even if there is no error cross-sectional dependence. Two such estimators are the fixed
effects and first-differenced estimators, which converge to the following limiting values23:
$$\operatorname{plim}_{N\to\infty}\left(\hat{\alpha}_{FE} - \alpha\right) = A(\alpha, T)\, B(\alpha, T)^{-1}, \quad \text{and} \tag{30}$$
$$\operatorname{plim}_{N\to\infty}\left(\hat{\alpha}_{FD} - \alpha\right) = -\frac{1+\alpha}{2}, \tag{31}$$
where $A(\alpha, T) = -[T(1-\alpha)]^{-1}\left[1 - T^{-1}(1-\alpha^T)(1-\alpha)^{-1}\right]$ and
$B(\alpha, T) = (1 - T^{-1})(1-\alpha^2)^{-1}\left\{1 - 2\alpha[(1-\alpha)(T-1)]^{-1}\left[1 - (1-\alpha^T)[T(1-\alpha)]^{-1}\right]\right\}$. It follows that both $\hat{\alpha}_{FE}$ and
$\hat{\alpha}_{FD}$ are inconsistent for fixed T as N → ∞. For T → ∞, $\hat{\alpha}_{FE}$ is consistent but $\hat{\alpha}_{FD}$ is
not, unless Var($\eta_i$) = 0. One way to obtain consistent parameter estimates is to start
from (30) or (31); since both estimators converge to functions of $\alpha$ (and T) alone, it is
possible to solve for $\alpha$ and obtain $\sqrt{N}$-consistent estimates of the autoregressive
parameter. In the case of (30) the solution requires a numerical approach because
A and B are highly nonlinear. However, (31) involves a linear function,
making the construction of a consistent, or "bias-corrected", estimator trivial24:
$$\hat{\alpha}_{BCFD} = 2\hat{\alpha}_{FD} + 1. \tag{32}$$
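The limits in (30)-(32) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a stationary AR(1) panel with standard normal effects and errors, no cross-sectional dependence, and an autoregressive parameter called `alpha`; all function names are hypothetical. The function `nickell_bias` encodes the textbook fixed-T expression for the within-estimator inconsistency, which is assumed here to correspond to the ratio in (30).

```python
import numpy as np

def simulate_ar1_panel(n, t, alpha, rng, burn=50):
    """Simulate y_it = alpha*y_it-1 + eta_i + eps_it and return an (n, t + 1)
    array holding y_i0, ..., y_iT. A burn-in makes the start roughly stationary."""
    eta = rng.standard_normal(n)
    y = eta / (1.0 - alpha)  # start each unit at its long-run mean
    cols = []
    for s in range(burn + t + 1):
        y = alpha * y + eta + rng.standard_normal(n)
        if s >= burn:
            cols.append(y)
    return np.column_stack(cols)

def fd_estimator(y):
    """Pooled OLS of the first difference on its own lag."""
    dy = np.diff(y, axis=1)
    x, z = dy[:, :-1].ravel(), dy[:, 1:].ravel()
    return float(x @ z / (x @ x))

def fe_estimator(y):
    """Within (fixed effects) OLS of y_it on y_it-1."""
    x, z = y[:, :-1], y[:, 1:]
    xd = x - x.mean(axis=1, keepdims=True)
    zd = z - z.mean(axis=1, keepdims=True)
    return float((xd * zd).sum() / (xd * xd).sum())

def nickell_bias(alpha, t):
    """Fixed-T inconsistency of the within estimator, written as A / B."""
    a = -(1.0 / (t * (1 - alpha))) * (1 - (1 - alpha**t) / (t * (1 - alpha)))
    inner = 1 - (1 - alpha**t) / (t * (1 - alpha))
    b = (1 - 1.0 / t) / (1 - alpha**2) * (1 - 2 * alpha / ((1 - alpha) * (t - 1)) * inner)
    return a / b

rng = np.random.default_rng(0)
alpha, n, t = 0.5, 20000, 5
y = simulate_ar1_panel(n, t, alpha, rng)
a_fd = fd_estimator(y)
a_fe = fe_estimator(y)
a_bcfd = 2.0 * a_fd + 1.0  # bias-corrected FD estimator, as in (32)
print(f"FE: {a_fe:.3f}  FD: {a_fd:.3f}  BCFD: {a_bcfd:.3f}")
```

With large N and small T, the FE and FD estimates should sit well below the true value while the corrected estimate should be close to it, which is the pattern the text describes for the case without cross-sectional dependence.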
Phillips and Sul (2007) analyse the properties of $\hat{\alpha}_{FE}$ under error cross-sectional
dependence. They show that the estimator converges, for fixed T, to
$$\operatorname{plim}_{N\to\infty}\left(\hat{\alpha}_{FE} - \alpha\right) = \left[\sigma_{\varepsilon}^{2} A(\alpha, T) + A_T^{(1)}\right]\left[\sigma_{\varepsilon}^{2} B(\alpha, T) + B_T^{(1)}\right]^{-1}, \tag{33}$$
where $w_{t-1} = \sum_{\tau=0}^{\infty} \alpha^{\tau}\phi_{t-1-\tau}$, $\bar{w}_{\cdot,-1} = T^{-1}\sum_{t=1}^{T} w_{t-1}$, $E(\lambda_i) = \mu_{\lambda}$ and $E[(\lambda_i - \mu_{\lambda})(\lambda_i - \mu_{\lambda})'] = \Sigma_{\lambda}$.
Therefore, cross-sectional dependence adds an extra source of bias; for $\lambda_i = 0$ (33)
reduces to (30). It is worth noting that, contrary to (30), the probability limit in (33)
depends on nuisance parameters, in particular $\phi_t$. According to Phillips and Sul (2007),
this may explain the substantial variability observed in dynamic panel estimates when
there is cross-sectional dependence, even in situations where N is large.
It is worth mentioning that time-specific demeaning of the observations will not
remove the source of bias that is due to the factor structure from (33), even if the factor
loadings satisfy a random coefficients type assumption. Instead, it is straightforward to
show that the only difference in the plim of $\hat{\alpha}_{FE}$ is that (34) changes to
$$\tilde{A}_T^{(1)} = \sum_{t=1}^{T} \phi_t' \Sigma_{\lambda} \left(w_{t-1} - \bar{w}_{\cdot,-1}\right), \quad \text{and} \quad \tilde{B}_T^{(1)} = \sum_{t=1}^{T} \left(w_{t-1} - \bar{w}_{\cdot,-1}\right)' \Sigma_{\lambda} \left(w_{t-1} - \bar{w}_{\cdot,-1}\right). \tag{35}$$
This is contrary to the case of strong exogeneity, in which the two-way error component
fixed effects estimator is consistent under a random coefficients type assumption for the
factor loadings, even if the latent factors are correlated with the regressors.
Unfortunately, under weak exogeneity the methods discussed in the previous section,
based on maximum likelihood and principal components analysis, will not generally yield
consistent parameter estimates either. For instance, the PC estimator may be thought
of as a two-stage process, whereby $\Phi$ is purged from the model by multiplying through
by the projection $M_{\Phi}$ and then $\hat{\beta}$ is obtained by (non-)linear regression. However, this
procedure transforms the residuals of the model to $M_{\Phi}\tilde{\varepsilon}_i$. Each entry of this vector
is a linear combination of the elements of the whole time series $\tilde{\varepsilon}_i$ and thus it is not
orthogonal to the corresponding entry in $\tilde{X}_i$. This means that, writing (14) as
$$\hat{\beta}_{PC} = \beta + \left[N^{-1}\sum_{i=1}^{N} \tilde{X}_i' M_{\hat{\Phi}} \tilde{X}_i\right]^{-1} N^{-1}\sum_{i=1}^{N} \tilde{X}_i' M_{\hat{\Phi}} \tilde{\varepsilon}_i, \tag{36}$$
the second term on the right-hand side will not have zero probability limit as N → ∞
for fixed T, or even as (N, T) → ∞ jointly. The same issue arises with the CCE estimator,
which is equivalent to a (stacked) linear regression of $y_i$ on $M X_i$, where in this case $M$
is the projection that removes $\bar{y}_t$ and $\bar{x}_t$, and as such the transformed residuals are not
orthogonal to the regressor unless the latter is strongly exogenous.
The properties of the bias-corrected FD estimator (32) under error cross-sectional
dependence are investigated by Hayakawa (2007). The author shows that
1 (2)
2T AT 2 2"
plimN !1 ^ BCF D = 2 + 1 + (2)
, (37)
1 2 2
T BT + 1+ "
where
T
X T
X
(2) 0 0 (2)
AT = t + wt 1 and BT = wt0 1 + 0
wt 1, (38)
t=1 t=1
where $w_{t-1} = \sum_{\tau=0}^{\infty} \alpha^{\tau}\phi_{t-1-\tau}$. Therefore, in this case cross-sectional dependence
turns an otherwise consistent estimator inconsistent, for fixed T. On the other hand, for
T → ∞ we have that $\operatorname{plim}_{N\to\infty}\hat{\alpha}_{BCFD} = \alpha$, regardless of whether N is fixed or N → ∞.
Replacing the expectation by the sample average and minimising a weighted quadratic
distance function with respect to $\alpha$ produces a consistent estimator. Subsequent
extensions augment the standard first-differenced GMM estimator with additional moment
conditions implied either by the same basic assumptions (Ahn and Schmidt, 1995) or
by additional assumptions regarding the initial conditions (Arellano and Bover, 1995,
and Blundell and Bond, 1998). The latter allows one to combine the equations in
first-differences with the equations in levels, constructing a 'system' GMM estimator.
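Sarafidis and Robertson (2009) analyse the behavior of the standard IV estimator
under error cross-sectional dependence. They show that the estimator has the following
probability limit:
$$\operatorname{plim}_{N\to\infty}\left(\hat{\alpha}_{IV} - \alpha\right) = \frac{\operatorname{plim}_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T} y_{it-2}\,\Delta\nu_{it}}{\operatorname{plim}_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T} y_{it-2}\,\Delta y_{it-1}} = A_T^{(3)}\left[B_T^{(3)} - (T-1)\frac{\sigma_{\varepsilon}^{2}}{1+\alpha}\right]^{-1}, \tag{40}$$
where
$$A_T^{(3)} = \sum_{t=2}^{T} \Delta\phi_t' \left(\Sigma_{\lambda} + \mu_{\lambda}\mu_{\lambda}'\right) w_{t-2} \quad \text{and} \quad B_T^{(3)} = \sum_{t=2}^{T} \Delta w_{t-1}' \left(\Sigma_{\lambda} + \mu_{\lambda}\mu_{\lambda}'\right) w_{t-2}. \tag{41}$$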
Therefore, similarly to the result for BCFD, cross-sectional dependence renders an
otherwise consistent estimator inconsistent for fixed T.25 Essentially, this is because
the numerator in (40) converges to the population moment condition, conditional on
$\{\phi_{\tau}\}_{-\infty}^{t}$, which is $E(y_{it-2}\Delta\nu_{it} \mid \{\phi_{\tau}\}_{-\infty}^{t}) \neq 0$, even if the unconditional expectation
$E(y_{it-2}\Delta\nu_{it}) = 0$. A direct by-product of this result is that all standard GMM
estimators that make use of lagged values of the dependent variable as instruments for
the endogenous regressor are inconsistent.
25
Notice, however, that for large T the bias diminishes because $T^{-1}A_T^{(3)} = o_p(1)$.
This holds true for any lag length of the
instruments used. For instance, (39) becomes
E yit s it j f gt 1 = 0
t + 0
wt s 6= 0; for t = 2; :::; T and 2
1. s t
(42)
A similar result applies for system GMM. In general, the asymptotic bias of these
estimators will depend on the particular transformation employed, the number of instruments
used and the choice of the weighting matrix. It is worth emphasizing that not all forms
of cross-sectional dependence are detrimental to GMM. Sarafidis (2009) focuses on the
conditions required on the cross-sectional dimension of the error process for the standard
dynamic panel GMM estimator to remain consistent. He demonstrates that, if there is
cross-sectional dependence in the errors, it suffices that this is weak (under Definition
2).
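Notice that for the single-factor case, the asymptotic bias of $\hat{\alpha}_{IV}$ reduces to
$$\operatorname{plim}_{N\to\infty}\left(\hat{\alpha}_{IV} - \alpha\right) = \frac{\left(\sigma_{\lambda}^{2} + \mu_{\lambda}^{2}\right)\kappa_1}{\left(\sigma_{\lambda}^{2} + \mu_{\lambda}^{2}\right)\kappa_2 - (T-1)\dfrac{\sigma_{\varepsilon}^{2}}{1+\alpha}}, \tag{43}$$
where $\kappa_1 = \sum_{t=2}^{T} w_{t-2}\Delta\phi_t$ and $\kappa_2 = \sum_{t=2}^{T} \Delta w_{t-1} w_{t-2} = \sum_{t=2}^{T} w_{t-1} w_{t-2} - \sum_{t=2}^{T} (w_{t-2})^2$.
Sarafidis and Robertson (2009) demonstrate that $\hat{\alpha}_{IV}$ is biased downwards in this case.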
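When the observations are expressed in terms of deviations from time-specific
averages, the asymptotic bias of the IV estimator is
$$\operatorname{plim}_{N\to\infty}\left(\tilde{\alpha}_{IV} - \alpha\right) = \frac{\operatorname{plim}_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T} \underline{y}_{it-2}\,\Delta\underline{\nu}_{it}}{\operatorname{plim}_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T} \underline{y}_{it-2}\,\Delta\underline{y}_{it-1}} = A_T^{(3)}\left[B_T^{(3)} - (T-1)\frac{\sigma_{\varepsilon}^{2}}{1+\alpha}\right]^{-1}, \tag{44}$$
where $\underline{y}_{it} = y_{it} - \bar{y}_t$, $A_T^{(3)} = \sum_{t=2}^{T} \Delta\phi_t' \Sigma_{\lambda} w_{t-2}$ and $B_T^{(3)} = \sum_{t=2}^{T} \Delta w_{t-1}' \Sigma_{\lambda} w_{t-2}$, while for
$M_0 = 1$ (44) reduces to
$$\operatorname{plim}_{N\to\infty}\left(\tilde{\alpha}_{IV} - \alpha\right) = \frac{\sigma_{\lambda}^{2}\kappa_1}{\sigma_{\lambda}^{2}\kappa_2 - (T-1)\dfrac{\sigma_{\varepsilon}^{2}}{1+\alpha}}. \tag{45}$$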
Notice that for $\sigma_{\lambda}^{2} = 0$ (i.e. factor loadings have zero variance) the bias in $\tilde{\alpha}_{IV}$ disappears.
Sarafidis and Robertson demonstrate that unless $\mu_{\lambda} = 0$ the asymptotic bias of $\tilde{\alpha}_{IV}$ is,
in general, going to be smaller than that of $\hat{\alpha}_{IV}$. Intuitively, this is because time-specific
demeaning reduces the impact of the factor structure (by removing the mean value of
$\lambda_i$), which is the reason for the asymptotic bias of the IV estimator. Simulation results
confirm these findings and provide a formal justification for the practice of including
common time effects in the context of a short dynamic panel data model with large N
and fixed T.
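The role of time-specific demeaning can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a replication of Sarafidis and Robertson (2009): it uses a single common factor, standard normal effects and errors, the simple IV estimator that instruments the lagged first difference with the second lag of the level, and hypothetical function names.

```python
import numpy as np

def simulate(n, t, alpha, loadings, rng, burn=30):
    """y_it = alpha*y_it-1 + eta_i + loadings_i*phi_t + eps_it.
    Returns an (n, t + 1) array with columns y_i0, ..., y_iT."""
    eta = rng.standard_normal(n)
    phi = rng.standard_normal(burn + t + 1)  # one common factor
    y = eta / (1.0 - alpha)
    cols = []
    for s in range(burn + t + 1):
        y = alpha * y + eta + loadings * phi[s] + rng.standard_normal(n)
        if s >= burn:
            cols.append(y)
    return np.column_stack(cols)

def iv_estimator(y, demean=False):
    """Ratio of sum(y_it-2 * dy_it) to sum(y_it-2 * dy_it-1) over t = 2..T.
    With demean=True, cross-sectional averages are removed period by period."""
    if demean:
        y = y - y.mean(axis=0, keepdims=True)
    dy = np.diff(y, axis=1)       # dy[:, k] is the difference y_k+1 - y_k
    z = y[:, :-2]                 # instrument: level lagged twice
    num = (z * dy[:, 1:]).sum()
    den = (z * dy[:, :-1]).sum()
    return float(num / den)

rng = np.random.default_rng(1)
alpha, n, t = 0.5, 50000, 8

# No factor structure: the IV estimator is consistent.
iv_none = iv_estimator(simulate(n, t, alpha, 0.0, rng))

# Common loading (zero loading variance): demeaning removes the factor entirely.
y_common = simulate(n, t, alpha, 1.0, rng)
iv_common_raw = iv_estimator(y_common)
iv_common_demeaned = iv_estimator(y_common, demean=True)

# Heterogeneous loadings with non-zero mean: demeaning helps but does not remove the bias.
y_het = simulate(n, t, alpha, rng.normal(1.0, 1.0, n), rng)
iv_het_raw = iv_estimator(y_het)
iv_het_demeaned = iv_estimator(y_het, demean=True)

print(iv_none, iv_common_raw, iv_common_demeaned, iv_het_raw, iv_het_demeaned)
```

The heterogeneous-loading estimates depend on the particular factor path drawn, so a single run only illustrates the direction of the effect; averaging over many factor draws would be needed to say anything about its typical size.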
Although time-specific demeaning may reduce the impact of cross-sectional dependence
(provided that $\mu_{\lambda} \neq 0$) it will not eliminate it, unless $\Sigma_{\lambda} = 0$. Sarafidis, Yamagata
and Robertson (2009) show that one can still obtain fixed-T, $\sqrt{N}$-consistent estimates
of the parameters within the GMM framework by using instruments with respect to the
subset of regressors that are strongly exogenous (if any), provided that they remain so
in the misspecified model. Strong exogeneity of a subset of $x_{it}$ will be maintained in the
misspecified model if their factor loadings are either zero, or mutually uncorrelated with
the factor loadings involved in the y process. Empirically, this can be determined using
Sargan's (1958) or Hansen's (1982) overidentifying restrictions test statistic.
Sarafidis (2009) demonstrates that under weakly correlated errors (using Definition
2), an additional, non-redundant set of moment conditions arises for each individual i;
specifically, instruments with respect to the individual(s) with which unit i is weakly
correlated. This set of instruments can be used to obtain consistent estimates of the
parameters in situations where the error structure is subject to both weak and strong
correlations.26 For instance, consider (29) and let
0
it = i t + "it + "jt : (46)
Hence the composite error, $\nu_{it}$, is subject to a multi-factor structure and the purely
idiosyncratic component, $\varepsilon_{it}$, is spatially correlated and follows an MA(1) process.27
In other words, misspecification of the model results in both global (factors) and local
(spatial) correlations. Transforming in terms of deviations from time-specific averages
yields
y it = y it 1 + i + 0i t + "i;t + "j;t . (47)
As mentioned above, the moment conditions with respect to lagged values of y are
invalidated under the multi-factor structure (see e.g. (42)). However, it turns out that
The required assumption for the above result is that the factor loadings are cross-
sectionally uncorrelated, i.e. E i 0j j . A similar expression to (48) (mutatis mutandis)
applies for system GMM. The resulting GMM approach is attractive because it does
not require strongly exogenous regressors under the misspecified model, although it does
require the specification of a weighting matrix, which may or may not be appropriate in
certain economic applications.
the within transformation cannot eliminate the common factors in this case. To this
end, a number of different transformations, all based on quasi-differencing, have been
proposed to eliminate the factors from the model and estimate the structural parameters
using the generalised method of moments.
An early application of this approach has been considered by Wansbeek and Knaap
(1999), who imposed $M_0 = 1$ and $\phi_t = t$. So instead of an arbitrary sequence of time
fixed effects $\phi_1, \ldots, \phi_T$ entering the model multiplicatively, there is a linear trend with
individual-specific coefficients. After taking first-differences $\eta_i$ drops out of the model.
The linear trend becomes a constant, which disappears after taking first-differences
again. Double-differencing may eliminate much of the variation in the data and the
issue of weak instruments might arise, cf. Bekker (1994), also discussed by Wansbeek
and Knaap (1999).
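The effect of double-differencing on an individual effect plus an individual-specific linear trend can be verified directly. The snippet below is only an illustrative sketch; the names `eta` and `lam` for the individual effect and the trend coefficient are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 4, 6
eta = rng.standard_normal((n, 1))         # individual effects
lam = rng.standard_normal((n, 1))         # individual-specific trend coefficients
trend = np.arange(1, t + 1)[None, :]      # the factor is a linear trend: 1, 2, ..., T

component = eta + lam * trend             # part of y_it that must be eliminated
first_diff = np.diff(component, axis=1)   # individual effect drops out, leaving lam_i
second_diff = np.diff(first_diff, axis=1) # the remaining constant drops out as well

print(first_diff[0], second_diff[0])
```

Each differencing step costs one time period per unit, which is one way to see why little variation may be left for estimation when T is small.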
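A generalization of the model above is given by Nauges and Thomas (2003), employing
a transformation proposed by Holtz-Eakin, Newey and Rosen (1988). In particular,
they consider (29) with a single-factor error structure, i.e. $\nu_{it} = \lambda_i\phi_t + \varepsilon_{it}$, and T fixed.
They use first-differencing to eliminate the individual effects, which yields
$$\Delta y_{it} = \alpha\Delta y_{it-1} + \Delta\nu_{it}, \quad \Delta\nu_{it} = \lambda_i\Delta\phi_t + \Delta\varepsilon_{it}. \tag{49}$$
Define $\varrho_t = \Delta\phi_t / \Delta\phi_{t-1}$; lagging (49), multiplying by $\varrho_t$ and subtracting yields
$$\begin{aligned} \Delta y_{it} - \varrho_t\Delta y_{it-1} &= \alpha\left(\Delta y_{it-1} - \varrho_t\Delta y_{it-2}\right) + \lambda_i\left(\Delta\phi_t - \varrho_t\Delta\phi_{t-1}\right) + \left(\Delta\varepsilon_{it} - \varrho_t\Delta\varepsilon_{it-1}\right) \\ &= \alpha\left(\Delta y_{it-1} - \varrho_t\Delta y_{it-2}\right) + \left(\Delta\varepsilon_{it} - \varrho_t\Delta\varepsilon_{it-1}\right). \end{aligned} \tag{50}$$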
Notice that appropriate lagged values of the dependent variable will be uncorrelated
with the transformed error term, leading to a GMM estimator based on Arellano-Bond
type moment conditions. Assuming $\varepsilon_{it}$ is serially uncorrelated, this set of moment
conditions is
$$E\left[y_{it-s}\left(\Delta\varepsilon_{it} - \varrho_t\Delta\varepsilon_{it-1}\right)\right] = 0, \quad \text{for } t = 3, \ldots, T \text{ and } 3 \leq s \leq t. \tag{51}$$
The main difference from (39) is that the moment conditions above are non-linear,
because the time-specific nuisance parameters, $\varrho_t$, need to be estimated jointly with the
structural parameter, $\alpha$. The results from their Monte Carlo study are mixed; while
the proposed GMM estimator exhibits, in general, smaller bias compared to the standard
first-differenced GMM estimator, it also has larger variance, to the extent that it is
outperformed in terms of RMSE.28
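The algebra behind the quasi-differencing step can be checked on simulated data. The sketch below assumes a single factor, treats the factor path as known (in practice the ratios would be estimated jointly with the autoregressive parameter), and uses illustrative variable names that do not come from the original papers.

```python
import numpy as np

rng = np.random.default_rng(3)
n, t, alpha = 5, 8, 0.4
eta = rng.standard_normal(n)            # individual effects
lam = rng.standard_normal(n)            # factor loadings
phi = rng.standard_normal(t + 1)        # factor values for periods 0..T
eps = rng.standard_normal((n, t + 1))   # idiosyncratic errors

y = np.zeros((n, t + 1))
y[:, 0] = eta + lam * phi[0] + eps[:, 0]
for s in range(1, t + 1):
    y[:, s] = alpha * y[:, s - 1] + eta + lam * phi[s] + eps[:, s]

dy = np.diff(y, axis=1)                 # dy[:, k] is the first difference at period k + 1
dphi = np.diff(phi)
deps = np.diff(eps, axis=1)

# Ratio of consecutive factor differences for periods t = 3, ..., T.
rho = dphi[2:] / dphi[1:-1]

# Left- and right-hand sides of the quasi-differenced equation for t = 3, ..., T.
lhs = dy[:, 2:] - rho * dy[:, 1:-1]
rhs = alpha * (dy[:, 1:-1] - rho * dy[:, :-2]) + (deps[:, 2:] - rho * deps[:, 1:-1])

print(np.max(np.abs(lhs - rhs)))
```

The two sides agree up to floating-point error, confirming that neither the individual effect nor the loading survives the transformation when the ratios are the true ones.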
Ahn, Lee and Schmidt (2006) consider a model with a multi-factor error structure
and weakly/strongly exogenous regressors. They use a different transformation, based
on multi-quasi-differencing, and propose a GMM estimator applied to the
multi-quasi-differenced model. To see how this method works, assume, without loss of generality,
that $M_0 = 2$ and consider the following model:
$$y_{it} = \alpha y_{it-1} + \nu_{it}, \quad \nu_{it} = \lambda_i^{1}\phi_t^{1} + \lambda_i^{2}\phi_t^{2} + \varepsilon_{it}, \tag{52}$$
28
An alternative transformation for the single-factor model, also based on quasi-differencing, is
provided by Ahn, Lee and Schmidt (2001).
where $\varepsilon_{it}$ is serially uncorrelated. Identification of this factor model requires $M_0^2$ [= 4]
restrictions. Typically, $M_0(M_0+1)/2$ restrictions arise by normalising
$$\sum_{t=1}^{T} \phi_t^{m}\phi_t^{n} = \begin{cases} 1 & \text{for } m = n, \\ 0 & \text{otherwise.} \end{cases} \tag{53}$$
An additional $M_0(M_0-1)/2$ restrictions are usually obtained by requiring that the factor
loadings are mutually uncorrelated. Since $M_0 = 2$, this would yield one extra restriction
in the present case. Alternatively, one can impose $M_0^2$ restrictions solely on the factors,
which are treated as parameters. This is the approach followed by Ahn, Lee and
Schmidt. In particular, they normalise $\phi_T^{1} = 1$, $\phi_{T-1}^{1} = 0$, $\phi_T^{2} = 0$, $\phi_{T-1}^{2} = 1$. In this
case model (52) becomes, for periods $T-1$ and $T$, respectively,
$$y_{iT-1} = \alpha y_{iT-2} + \lambda_i^{2} + \varepsilon_{iT-1}, \tag{54}$$
and
$$y_{iT} = \alpha y_{iT-1} + \lambda_i^{1} + \varepsilon_{iT}. \tag{55}$$
Multiplying (55) and (54) by $\phi_t^{1}$ and $\phi_t^{2}$ respectively and subtracting from (52) yields
$$y_{it} - \phi_t^{1} y_{iT} - \phi_t^{2} y_{iT-1} = \alpha\left(y_{it-1} - \phi_t^{1} y_{iT-1} - \phi_t^{2} y_{iT-2}\right) + \left(\varepsilon_{it} - \phi_t^{1}\varepsilon_{iT} - \phi_t^{2}\varepsilon_{iT-1}\right). \tag{56}$$
which lead to joint estimation of the structural parameter, $\alpha$, and the $(T-2) \times 2$ nuisance
parameters. In compact form, for any fixed number of factors one may write the model
as
$$y_i = \alpha y_{i,-1} + \Phi\lambda_i + \varepsilon_i, \tag{58}$$
where $\Phi = (\Phi_u', I_M)'$ and $\Phi_u$ is the $(T - M_0 - 1) \times M_0$ matrix of unrestricted parameters
with typical entry $\phi_t^{m}$ for $t = 2, \ldots, T - M_0$ and $m = 1, \ldots, M_0$. The transformation that
makes the error orthogonal to the factors amounts to pre-multiplying (58) by $\Phi_{\perp}'$, where
$\Phi_{\perp} = (I_{T-M}, -\Phi_u)'$, since $\Phi_{\perp}'\Phi = 0$ by construction.
correlation in "it , we have E ("it yis ) = 0 for any s t 1. Therefore, the following
T (T + 1) =2 centered moment conditions exist:
0
E yis it s t = 0, for t = 1; :::; T and s t 1. (59)
The above expression is similar to a moment condition like E (Xi ) = 0, except that
the former is non-linear because some of the parameters enter multiplicatively. Writing
the model in vector form, we have
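where $y_i = (y_{i1}, \ldots, y_{iT})'$, $y_{i,-1} = (y_{i0}, \ldots, y_{iT-1})'$, $\Phi = (\phi_1, \ldots, \phi_T)'$ is a $T \times M_0$ matrix,
and $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})'$. Define the matrix of instruments as follows:
$$\underset{T \times T(T+1)/2}{Z_i} = \begin{bmatrix} y_{i0} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & y_{i0} & y_{i1} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & y_{i0} & \cdots & y_{iT-1} \end{bmatrix}. \tag{61}$$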
We have
E Zi0 ui S (IT ) T
= 0, (62)
where S is a selector matrix of order T (T + 1) =2 T 2 that consists of 0s and 1s, with a
0 0
single 1 in each row29 , = 0 ; ::: T 1 is a T M0 matrix, and T = 01 ; 02 ; :::; 0T
is a T M0 1 vector.
Replacing the population moments with their sample averages yields
N
X
1
N Zi0 ui S (IT ) T
= 0. (63)
i=1
T
De…ning = ; ; the GMM estimator is
N
!0 N
!
X X
b = arg min N 1
Zi0 ui S (IT ) T
AN N 1
Zi0 ui S (IT ) T
,
i=1 i=1
(64)
where AN is a non-negative de…nite weight matrix.
Robertson, Sara…dis and Symons (2010) call estimators in this class Factor Instru-
mental Variable (FIV) estimators. They note that in most practical circumstances a
set of linear restrictions can be demonstrated to hold among the parameters, namely the
matrix . These can be obtained by writing the model as
z0it (1 )= 0
i t + "it , (65)
29
The number of rows of S corresponds to the number of moment conditions available and the number
of columns corresponds to the number of regressors (1 at present) times the number of time periods
available squared.
0
where zit = (yit ; yit 1) , and then multiplying through by i and taking expectations:
0
E i zit (1 )= t, (66)
where = E i 0i . The key point here is that the elements in E ( i z0it ) include terms
in various of the s because the instrument set includes elements of zit , so the left-hand
side of (66) is a linear function of the entries in . For example, for the single-factor
model the linear restrictions take the form
2
s+1 = s + s+1 , s = 0; :::; T 1,
Ayi+ = +
+ +
i + "+
i , (69)
and 2 3
1 0 0
6 1 0 7
6 7
A=6 . . .. 7.
4 .. .. ... . 5
0 1
Let + = + +0 +
"+ , the covariance of
+
i + "+
i , and
+
i = Ayi+ +
. The
log-likelihood function for yi+ is
N
N + 1X +0 + 1 +
ln i i . (70)
2 2
i=1
Notice that since A is a lower-triangular matrix, det (A) = 1 and therefore the Jacobian
term does not enter into the likelihood. Bai suggests estimating (70) using a quasi-
maximum likelihood (QML) procedure based on the ECM (expectation and conditional
maximization) algorithm.
When the model includes covariates, xit , the reduced form of yi0 can be written as
0
yi0 = 0 + i 0 + wi0 0 + "0 , (71)
where wi = vec (x0i ) and 0 is loosely speaking the linear projection of xi on i . Hence
0 0
the residual is + i = Ayi
+ +
wi , with = 0
0 IT , and the likelihood
function is identical to (70).
The attractive feature of the FIVR and QML estimators is that they can both allow
a fixed effects specification as a special case, in which one of the factors is constant over
time. Furthermore, they are valid under strongly and weakly exogenous regressors and
permit unit roots. In this way, these estimators generalise the classical error components
formulation for a wide range of panel data models. Moreover, FIV estimators share
the traditional advantage of method of moments estimators in that they exploit only
a set of orthogonality conditions and make no use of subsidiary assumptions such as
homoskedasticity or other assumed distributional properties of the error process. One
difference between FIV and QML is that in the former approach, the factors whose
loadings are uncorrelated with the regressors will enter into the residuals of the model,
thus resulting in fewer parameters to be estimated. QML will estimate all factors, which
can also be desirable if these factors have a structural significance.
When T is large, treating $\phi_t$ as fixed leads to an incidental parameters problem, so the
methods described above are not appropriate. One way to proceed is to treat $\phi_t$ as
random and use the panel feasible generalised median unbiased (PFGMU) estimator
proposed by Phillips and Sul (2003). This involves using the residuals obtained from
a first-stage panel median unbiased estimator to construct an invertible estimate of the
error covariance matrix by means of a method of moments procedure, estimating the
regression model using a feasible generalised FE (FGFE) estimator and subsequently
calculating PFGMU using the median function of FGFE. Alternative methods for
projecting out estimates of the factor loadings have been proposed by Moon and Perron
(2004) and Bai and Ng (2004). All these methods are valid for large T only, and it
is not straightforward to generalise them to models that include weakly exogenous
regressors other than the lagged dependent variable.
vs alternative hypothesis (2). Several tests for error cross-sectional dependence have
been proposed in the literature. Perhaps the most widely known test is the LM statistic
by Breusch and Pagan (1980). The basic idea of this kind of test is to substitute into the
score vector the parameter estimates obtained from the restricted model under the null
hypothesis and check whether the resulting vector is sufficiently close to zero. It turns out
that under the null the test statistic can be based on the residuals from individual-specific
OLS regressions. Let
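$$\hat{\rho}_{ij} = \hat{\rho}_{ji} = \frac{\sum_{t=1}^{T} \hat{\nu}_{it}\hat{\nu}_{jt}}{\left(\sum_{t=1}^{T} \hat{\nu}_{it}^{2}\right)^{1/2}\left(\sum_{t=1}^{T} \hat{\nu}_{jt}^{2}\right)^{1/2}}. \tag{73}$$
The LM statistic is
$$LM = T\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} \hat{\rho}_{ij}^{2}, \tag{74}$$
which is asymptotically $\chi^2$ distributed under the null for fixed N as T → ∞,
where the number of degrees of freedom, $N(N-1)/2$, equals the number of distinct
off-diagonal elements of the error covariance matrix.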
As noted by Pesaran (2004) and Pesaran, Ullah and Yamagata (2008), the LM statistic
(74) is likely to have poor size properties when N is large, which is clearly an empirically
relevant situation. Pesaran (2004) shows that when both N and T are large, (74) can
be modified in a straightforward way. In particular, under $H_0$, for any given pair $i \neq j$,
we have
$$T\hat{\rho}_{ij}^{2} \xrightarrow{d} \chi_1^2, \tag{75}$$
for T → ∞. Therefore, since the $\hat{\rho}_{ij}^{2}$ are asymptotically uncorrelated, the following
scaled version of the LM statistic can be considered:
$$LM_2 = \sqrt{\frac{1}{N(N-1)}}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\left(T\hat{\rho}_{ij}^{2} - 1\right) \xrightarrow{d} N(0, 1), \tag{76}$$
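As a rough illustration of how the pairwise-correlation statistics are computed, the sketch below takes an N × T array of residuals (here simulated directly rather than obtained from unit-specific OLS regressions, which is an assumption made to keep the example short) and returns the unscaled LM statistic together with its scaled version. The function names are hypothetical.

```python
import numpy as np

def pairwise_correlations(resid):
    """Pairwise correlations of the rows of an (N, T) residual array, using
    uncentred cross-products as in (73). Returns the N(N-1)/2 distinct values."""
    norm = resid / np.sqrt((resid ** 2).sum(axis=1, keepdims=True))
    rho = norm @ norm.T
    upper = np.triu_indices(resid.shape[0], k=1)
    return rho[upper]

def lm_statistics(resid):
    """Unscaled LM statistic and its scaled version (76)."""
    n, t = resid.shape
    rho = pairwise_correlations(resid)
    lm = t * (rho ** 2).sum()
    lm2 = np.sqrt(1.0 / (n * (n - 1))) * (t * rho ** 2 - 1.0).sum()
    return float(lm), float(lm2)

rng = np.random.default_rng(4)
n, t = 10, 200

# Cross-sectionally independent residuals.
resid_indep = rng.standard_normal((n, t))
lm_indep, lm2_indep = lm_statistics(resid_indep)

# Residuals driven by one common factor with unit loadings.
factor = rng.standard_normal(t)
resid_dep = factor[None, :] + rng.standard_normal((n, t))
lm_dep, lm2_dep = lm_statistics(resid_dep)

print(lm_indep, lm2_indep, lm_dep, lm2_dep)
```

The scaled statistic is just the unscaled one recentred by its degrees of freedom and divided by the square root of N(N-1), so the two always carry the same information for a given sample; what changes is the reference distribution used for inference.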
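where
$$R_{AVE} = \frac{1}{N(N-1)/2}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} \hat{r}_{ij} \tag{77}$$
and $\hat{r}_{ij}$ denotes Spearman's rank correlation coefficient given by
$$\hat{r}_{ij} = \hat{r}_{ji} = \frac{\sum_{t=1}^{T}\left(r_{i,t} - (T+1)/2\right)\left(r_{j,t} - (T+1)/2\right)}{\sum_{t=1}^{T}\left(r_{i,t} - (T+1)/2\right)^{2}}. \tag{78}$$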
Pesaran shows that the above statistic is valid under a wide class of panel data models,
including heterogeneous models, dynamic models and regression models with multiple
structural breaks, provided that the unconditional means of yit and xit are time-invariant
and their innovations are symmetrically distributed. Chen, Gao and Li (2009) extend
this method by developing a nonparametric counterpart of the CD statistic for testing
error cross-sectional dependence in nonparametric models.
Both the CD and FR statistics share a common weakness in that they may lack
power to detect the alternative hypothesis under which the sign of the elements of the
error covariance matrix is alternating, that is, there are positive and negative
correlations in the residuals. This can arise if, for example, cross-sectional dependence is
characterised by a factor model with zero mean factor loadings. Notice that the same
problem might arise even if the factor loadings have mean different from zero; one such
instance is when time-specific dummies are included in the regression model to capture
possible common variations in the dependent variable. In fact, this practice is not
uncommon for fixed T and amounts to transforming the observations in terms of deviations
from time-specific averages. Thus, suppose that the disturbance follows a single-factor process, as
in (9), in which case time-specific demeaning yields process (11). Observe that
$$\operatorname{Cov}\left(\tilde{\nu}_{it}, \tilde{\nu}_{jt}\right) = E(\tilde{\lambda}_i)\, E(\tilde{\lambda}_j) = 0. \tag{80}$$
Therefore the CD and $R_{AVE}$ statistics will be centered around zero, which implies that
the power of the tests will not increase with N and therefore they may be inconsistent.
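Frees (1995) proposes a test statistic that is not subject to this problem and is valid
for fixed T, large N. Specifically, define
$$Q = b_1(T)\left[\chi_{1,T-1}^{2} - (T-1)\right] + b_2(T)\left[\chi_{2,T(T-3)/2}^{2} - T(T-3)/2\right], \tag{81}$$
where $\chi_{1,T-1}^{2}$ and $\chi_{2,T(T-3)/2}^{2}$ are independent chi-squared random variables with $T-1$ and
$T(T-3)/2$ degrees of freedom respectively, and
$$b_1(T) = \frac{4(T+2)}{5(T-1)^{2}(T+1)}, \quad b_2(T) = \frac{2(5T+6)}{5T(T-1)(T+1)}. \tag{82}$$
Also, let
$$R_{AVE}^{2} = \frac{1}{N(N-1)/2}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} \hat{r}_{ij}^{2}. \tag{83}$$
Frees (1995) shows that $FRE = N\left[R_{AVE}^{2} - (T-1)^{-1}\right]$ follows asymptotically a Q
distribution for N → ∞, T fixed. Therefore, the null is rejected if $R_{AVE}^{2}$ is larger than
$(T-1)^{-1} + Q_q/N$, where $Q_q$ is an appropriate quantile from the Q distribution.30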
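The contrast between statistics based on average correlations and statistics based on average squared correlations can be seen in a small experiment. The sketch below is illustrative only: it works with simulated residuals, assumes continuous data so that ranks have no ties, uses hypothetical function names, and does not compute critical values from the Q distribution.

```python
import numpy as np

def rank_rows(x):
    """Ranks 1..T within each row; assumes no ties (continuous data)."""
    return x.argsort(axis=1).argsort(axis=1) + 1.0

def rank_statistics(resid):
    """Average Spearman correlation, average squared Spearman correlation,
    and N times the excess of the latter over 1/(T-1)."""
    n, t = resid.shape
    centred = rank_rows(resid) - (t + 1) / 2.0
    denom = t * (t ** 2 - 1) / 12.0          # sum of squared centred ranks, same for every row
    r = (centred @ centred.T) / denom
    upper = np.triu_indices(n, k=1)
    r_ave = r[upper].mean()
    r2_ave = (r[upper] ** 2).mean()
    fre = n * (r2_ave - 1.0 / (t - 1))
    return float(r_ave), float(r2_ave), float(fre)

rng = np.random.default_rng(5)
n, t = 100, 10

# Null: cross-sectionally independent residuals.
r_ave_null, r2_ave_null, fre_null = rank_statistics(rng.standard_normal((n, t)))

# Alternative: one factor whose loadings have zero mean, so that positive and
# negative pairwise correlations roughly cancel on average.
loadings = rng.normal(0.0, 2.0, size=(n, 1))
factor = rng.standard_normal((1, t))
resid_alt = loadings * factor + rng.standard_normal((n, t))
r_ave_alt, r2_ave_alt, fre_alt = rank_statistics(resid_alt)

print(r_ave_null, fre_null, r_ave_alt, fre_alt)
```

Under the zero-mean-loading alternative the average correlation stays close to zero while the average squared correlation moves well above 1/(T-1), which is the behaviour the squared-correlation statistic is designed to pick up.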
Pesaran, Ullah and Yamagata (2008) argue that the FRE statistic tends to behave
similarly to the uncorrected version of the LM statistic for large N when the model
involves more than one explanatory variable (intercept). They propose a bias-adjusted
version of the LM test that makes use of the exact mean and variance of the LM statistic
and is valid under strongly exogenous regressors and normal errors. This is defined as
$$LM_3 = \sqrt{\frac{2}{N(N-1)}}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\frac{(T-K)\hat{\rho}_{ij}^{2} - \mu_{Tij}}{\upsilon_{Tij}} \xrightarrow{d} N(0, 1), \tag{84}$$
where
$$\mu_{Tij} = E\left[(T-K)\hat{\rho}_{ij}^{2}\right] = \frac{1}{T-K}\operatorname{tr}\left(M_i M_j\right), \tag{85}$$
with $M_i = I_T - X_i(X_i'X_i)^{-1}X_i'$, $M_j = I_T - X_j(X_j'X_j)^{-1}X_j'$ and
$$\upsilon_{Tij}^{2} = \operatorname{var}\left[(T-K)\hat{\rho}_{ij}^{2}\right] = \left[\operatorname{tr}\left(M_i M_j\right)\right]^{2} a_{1T} + 2\operatorname{tr}\left[\left(M_i M_j\right)^{2}\right] a_{2T}, \tag{86}$$
with $a_{1T} = a_{2T} - (T-K)^{-2}$ and
(87)
Notice that the test statistic is feasible only when T > K + 8, and it has exactly mean
zero regardless of the value of T, unlike the LM statistic. On the other hand, unless T
is large the covariance between $\sqrt{T-K}\,\hat{\rho}_{ij}^{2}$ and $\sqrt{T-K}\,\hat{\rho}_{ij'}^{2}$, for any $j \neq j'$, is different
from zero even under the normality assumption. Therefore (84) is valid under the
sequential asymptotics T → ∞ first and then N → ∞. Simulation evidence provided by
Pesaran, Ullah and Yamagata (2008) indicates that the test has good size and power for
T ≥ 20.
Sarafidis, Robertson and Yamagata (2009) propose a testing procedure that does not
require normality and is valid for fixed T, large N panel data models with weakly
exogenous regressors. Their testing procedure is based on Sargan's difference-test statistic
for overidentifying restrictions. In particular, consider the following model
$$y_i = X_{w,i}\beta_w + X_{s,i}\beta_s + \eta_i\iota_T + \nu_i, \quad \nu_i = \Phi\lambda_i + \varepsilon_i, \tag{88}$$
30
de Hoyos and Sarafidis (2006) show how to perform all these tests in Stata using the command xtcsd;
see http://ideas.repec.org/c/boc/bocode/s456736.html.
where $y_i$ is a $T \times 1$ vector of stacked time series observations expressed in terms of
deviations from time-specific averages, and similarly for the remaining variables, while
$X_{w,i}$ and $X_{s,i}$ are $T \times K_w$ and $T \times K_s$ matrices of weakly and strongly exogenous
regressors respectively. The null hypothesis of interest is
$$H_0: \operatorname{var}(\lambda_i) = \Sigma_{\lambda} = 0 \tag{89}$$
where b i is the residual vector obtained from the following two-stage linear GMM es-
0
timator of = 0w ; s with the general form
N N
! 1 N N
b X 1X X 1X
• F = W i Z i b_ F
0
Z 0i W i W i Z i b_ F
0
Z 0i yi , (92)
i=1 i=1 i=1 i=1
.
where yi and W i denote some transformation33 of yi and W i = X w;i .. X s;i respect-
ively and b_ F is the estimated weight matrix obtained from a …rst-stage GMM estimator.
Similarly, Sargan’s/Hansen’s test of overidentifying restrictions based on the subset of
moment conditions with respect to X se;i is given by
N
! N
!
X e_ 1 X
0
SR = N 1
e i Z se;i Z 0se;i e i 0 , (93)
i=1 i=1
31
The authors phrase H0 and H1 as ‘homogeneous’ and ‘heterogeneous’ cross-sectional dependence
respectively.
32
Membership in the subset X se;i is testable using Sargan’s/Hansen’s test for overidentifying restric-
tions.
33
For example, first-differences, orthogonal deviations and so on.
where e i is the residual obtained from the two-stage linear GMM estimator of the
following general form
N N
! 1 N N
b X b 1X X 1X
•R = W i Z se;i _ R
0 0
Z se;i W i W i 0 Z se;i b_ F Z 0se;i yi , (94)
i=1 i=1 i=1 i=1
with similar definitions applying as before (mutatis mutandis), and we assume that the
number of columns of $Z_{se,i}$ is larger than that of $W_i$. Under the null hypothesis, as N → ∞ for
fixed T,
$$D_{SYR} = \left(S_F - S_R\right) \xrightarrow{d} \chi_{h_d}^{2}, \tag{95}$$
where $h_d$ is the difference between the number of columns of $Z_i$ and $Z_{se,i}$.
The $D_{SYR}$ statistic is very general since it can be performed using alternative GMM
estimators which are not necessarily asymptotically efficient. On the other hand, it
requires $h_d > K_w + K_s$ valid moment restrictions under the alternative. This will
be violated if, for example, the (non-zero) factor loadings of the covariates included
in $X_{se,i}$ are correlated with $\lambda_i$, or if $\beta_s = 0$ (all regressors are weakly exogenous in
the correctly specified model).34 Yamagata (2008) proposes testing for error
cross-sectional dependence using a joint serial correlation test applied after estimating the
model using the first-differenced GMM estimator (Arellano and Bond, 1991). Essentially
the procedure involves an examination of the joint significance of estimates of second
and up to pth-order (first-differenced) error serial correlations. The intuition of the test
lies in that error cross-sectional dependence is also likely to show up as serial correlation
in the residuals. To see this, consider the single-factor error process (9) and let
$\lambda_i \sim i.i.d.\,(0, \sigma_{\lambda}^{2})$. Applying time-specific demeaning and taking expectations, conditional
upon $\phi_t$, yields
2
E bit bt+s = E ( i t + "it ) i t+s + "it+s = t t+s 6= 0. (96)
Notice that the magnitude of E bit bt+s does not necessarily decrease as s increases
for a given t. Therefore, the null hypothesis of interest becomes
H0 : E bit bt+s = 0 jointly for s = 2; 3; :::; p [ T 2] , (97)
against the alternative
H1 : E bit bt+s 6= 0 for some s, (98)
and t = 2; 3; :::; T s. Under the null hypothesis, as N ! 1 for …xed T , the joint
statistic for second up to pth-order serial correlation is
1 d
m2(2;p) = 0
NH G0 G H0 N ! 2
(p 1) , (99)
0 0 PT s
where H = ( 1 ; :::; N) , i =( i2 ; :::; ip ) , is = t=2 bit bt+s , G = (g1 ; :::; gN ) 0 ,
1 0 b
1 P PT s
gi = (gi2 ; :::; gip )0 , gis = is
0
N s QN AN
• Z0
i bi, Ns =N 1 N i=1 t=2 bit wit+s ,
34
In this case, the null hypothesis could be addressed using a simple overidentifying restrictions test.
QN = A0N b• AN , AN = N 1 PN Z 0 W i b • N 1 PN W 0 Z i and b • is the estim-
i i i i
ated weighting matrix obtained from the two-stage …rst-di¤erenced GMM estimator.
It would be interesting to extend this approach to alternative models and estimation
methods but we do not have any results as yet.
where $\Phi_{(M_0)}^{T}$ contains T observations for each of the $M_0$ largest principal components
of the covariance matrix of $W_i$. The main task is to estimate $M_0$. Define
$$V\left(M, \hat{\Phi}_{(M)}^{T}\right) \equiv \frac{1}{NT}\sum_{i=1}^{N}\left(W_i'W_i - W_i' P_{\Phi_{(M)}^{T}} W_i\right), \tag{101}$$
where $P_{\Phi_{(M)}^{T}}$ is the projection of $W_i$ onto the column space defined by $\Phi_{(M)}^{T}$, for any
$M \leq M_{max}$, where $M_{max}$ is the maximum possible value of $M_0$. Bai and Ng (2002)
estimate $M_0$ as the solution to one of the following minimisation problems:
$$\hat{M}_1 = \arg\min_{M \leq M_{max}}\left\{\ln V\left(M, \hat{\Phi}_{(M)}^{T}\right) + M\,\frac{N+T}{NT}\ln\left(\frac{NT}{N+T}\right)\right\}, \tag{102}$$
$$\hat{M}_2 = \arg\min_{M \leq M_{max}}\left\{\ln V\left(M, \hat{\Phi}_{(M)}^{T}\right) + M\,\frac{N+T}{NT}\ln C_{NT}^{2}\right\}, \tag{103}$$
and
$$\hat{M}_3 = \arg\min_{M \leq M_{max}}\left\{\ln V\left(M, \hat{\Phi}_{(M)}^{T}\right) + M\,\frac{\ln C_{NT}^{2}}{C_{NT}^{2}}\right\}, \tag{104}$$
where $C_{NT}^{2} = \min(N, T)$. The authors demonstrate that (102)-(104) are asymptotically
equivalent and that they estimate the true number of factors consistently as min(N, T) → ∞,
i.e. $\hat{M}_j \xrightarrow{p} M_0$ for j = 1, 2, 3. In finite samples the performance of the above information
criteria will be different. Using simulated data, Bai and Ng show that $\hat{M}_1$ and $\hat{M}_2$ are
more robust than $\hat{M}_3$ when either N or T is fairly small, and they perform well so long
as min(N, T) ≥ 40. Otherwise, these criteria may not work well, leading to too many
factors being estimated.
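The three criteria can be computed from the eigenvalues of the sample second-moment matrix, because the average residual variance after removing M principal components equals the sum of the remaining eigenvalues. The following sketch assumes observed (not estimated) data arranged as a T × N array, demeans each series first (an assumption of this example), and also allows M = 0; the function name is hypothetical.

```python
import numpy as np

def information_criteria(x, m_max):
    """Number of factors selected by the three penalised criteria.
    x is a (T, N) data array; candidates are M = 0, 1, ..., m_max."""
    t, n = x.shape
    x = x - x.mean(axis=0)
    # Eigenvalues of X'X/(NT), largest first.
    eig = np.linalg.eigvalsh(x.T @ x / (n * t))[::-1]
    c2 = min(n, t)
    penalties = (
        (n + t) / (n * t) * np.log(n * t / (n + t)),
        (n + t) / (n * t) * np.log(c2),
        np.log(c2) / c2,
    )
    ms = np.arange(0, m_max + 1)
    # V(M): sum of the eigenvalues left after removing the M largest.
    v = np.array([eig[m:].sum() for m in ms])
    return tuple(int(ms[np.argmin(np.log(v) + ms * p)]) for p in penalties)

rng = np.random.default_rng(6)
t, n, m0 = 120, 100, 3
factors = rng.standard_normal((t, m0))
loadings = rng.standard_normal((n, m0))
data = factors @ loadings.T + rng.standard_normal((t, n))

m1_hat, m2_hat, m3_hat = information_criteria(data, m_max=8)
print(m1_hat, m2_hat, m3_hat)
```

In this design the factors are strong and the idiosyncratic errors are i.i.d., which is favourable to all three criteria; weaker factors or correlated errors would be a harder test.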
Kapetanios (2009) proposes a different method to determine the appropriate number
of factors. This is based on the result that the largest eigenvalue of the sample covariance
matrix of the data converges almost surely to $(1 + \sqrt{c})^2$, where $c = \lim_{N,T\to\infty} N/T$, which
implies that if there is no factor structure in the data, the maximum eigenvalue of the
sample covariance matrix should not exceed $(1 + \sqrt{c})^2$ almost surely, in large samples.
Therefore, the method starts essentially by checking whether the factor structure is
supported by the data at all, using as threshold the value $(1 + \sqrt{c})^2 + d$, where d > 0 is
chosen a priori. Kapetanios suggests choosing for d the mean eigenvalue of the covariance
matrix (for standardised data this equals 1). Hence, if the maximum eigenvalue
of the covariance matrix exceeds this threshold, the first principal component is
obtained and the data are orthogonalised by a regression on the first principal
component. Next, the maximum eigenvalue of the resulting covariance matrix is compared
against $(1 + \sqrt{c})^2 + d$ and the process is repeated until the maximum eigenvalue of the
resulting covariance matrix does not exceed the threshold value. Using simulated data,
Kapetanios shows that in a majority of circumstances of empirical interest this method
outperforms the information criteria (102)-(104).
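The sequential procedure described above can be sketched in a few lines. This is a simplified illustration and not Kapetanios' exact algorithm: it standardises the data once at the outset, does not re-standardise the residuals between steps, replaces the limit c by the sample ratio N/T, and uses a hypothetical function name.

```python
import numpy as np

def sequential_max_eigenvalue(x, m_max, d=1.0):
    """Count factors by repeatedly comparing the largest eigenvalue of the
    sample covariance matrix with (1 + sqrt(N/T))**2 + d and, when it is
    exceeded, regressing the data on the first principal component."""
    t, n = x.shape
    threshold = (1.0 + np.sqrt(n / t)) ** 2 + d
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    m = 0
    while m < m_max:
        eigval, eigvec = np.linalg.eigh(z.T @ z / t)   # ascending order
        if eigval[-1] <= threshold:
            break
        pc = z @ eigvec[:, -1]                          # first principal component
        z = z - np.outer(pc, pc @ z) / (pc @ pc)        # residuals from regressing on it
        m += 1
    return m

rng = np.random.default_rng(7)
t, n = 100, 50

# Two strong factors.
factors = rng.standard_normal((t, 2))
loadings = rng.standard_normal((n, 2))
m_two = sequential_max_eigenvalue(factors @ loadings.T + rng.standard_normal((t, n)), m_max=8)

# No factor structure at all.
m_none = sequential_max_eigenvalue(rng.standard_normal((t, n)), m_max=8)

print(m_two, m_none)
```

Unlike the information criteria, this procedure can return zero factors without any special treatment, since the first comparison is made before any component is extracted.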
The method proposed by Kapetanios requires that the idiosyncratic errors of the
approximate factor model are i.i.d. Onatski (2007) develops a similar estimator that
makes less stringent assumptions on the serial correlation and heteroskedasticity pattern
of the idiosyncratic errors. His method is based on the mirror image of Kapetanios’
argument, i.e. for data characterised by M0 latent common factors, the largest M0
eigenvalues of the covariance matrix of the data grow with N , while the rest of the
eigenvalues are bounded. Hence, the Onatski estimator equals the number of eigenvalues
greater than a threshold value:
c4 = arg max [M j
M M > (1 + ) c1 ] , (105)
M Mmax
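rate. In particular, we have
$$\widehat{M}_5 = \arg\max_{M \leq M_{\max}} \frac{\lambda_M}{\lambda_{M+1}}, \qquad (106)$$
and
$$\widehat{M}_6 = \arg\max_{M \leq M_{\max}} \left[ \frac{\ln(\mu_M)}{\ln(\mu_{M+1})} \right], \qquad (107)$$
where $\mu_M = \sum_{j=M}^{T} \lambda_j / \sum_{j=M+1}^{T} \lambda_j$, $\lambda_j$ denotes the $j$-th largest eigenvalue and
$M \leq M_{\max}$. Using simulated data they
show that the proposed estimators outperform the existing ones even in samples with
small N and T unless the signal-to-noise ratio of the model is too small.35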
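To make the two ratio-type estimators in (106) and (107) concrete, the following Python sketch computes both from a T x N data matrix. It is an illustration under assumptions rather than the authors' code: the function name is hypothetical, the eigenvalues are taken from the T x T matrix XX'/(NT) so that the sums run up to T, and the data are assumed to be already demeaned.

```python
import numpy as np


def ratio_estimators(X, m_max):
    """Return (eigenvalue-ratio estimate, growth-ratio estimate) of the number
    of factors, searching over M = 1, ..., m_max."""
    X = np.asarray(X, dtype=float)
    T, N = X.shape
    if m_max + 2 > min(N, T):
        raise ValueError("m_max is too large for the sample dimensions")

    # Eigenvalues of XX'/(NT), sorted from largest to smallest.
    eig = np.linalg.eigvalsh(X @ X.T / (N * T))[::-1]
    eig = np.clip(eig, 0.0, None)  # guard against tiny negative rounding errors

    M = np.arange(1, m_max + 1)
    # (106): ratio of the M-th to the (M+1)-th largest eigenvalue.
    er = eig[M - 1] / eig[M]

    # tail[k] holds the sum of the eigenvalues from the (k+1)-th largest onwards.
    tail = np.cumsum(eig[::-1])[::-1]
    # mu[i] is the ratio of adjacent tail sums for M = i + 1, i = 0, ..., m_max.
    mu = tail[: m_max + 1] / tail[1 : m_max + 2]
    # (107): ratio of the logs of adjacent tail-sum ratios.
    gr = np.log(mu[:-1]) / np.log(mu[1:])

    return int(M[np.argmax(er)]), int(M[np.argmax(gr)])
```

Because both criteria are maximised over M from 1 to m_max, neither can return zero factors in this form; handling the no-factor case would require an additional convention that is not shown here.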
Notice that the set-up in all the above methods is such that the factors are extracted
from observed variables. Therefore, it is not clear what the properties of these methods
are when the factors are extracted from estimated residuals, which is precisely what is
of main interest in this paper. We explore this issue via Monte Carlo experiments.
which is similar to Section 4.4 except that we add an extra regressor, x2it . To examine
the impact of strict/weak exogeneity on the properties of the tests we set (i) x2it =
i + $ it , $ it i:i:d:N x2 ; 2x2 and (ii) x2it = yit 1 . In the former case we specify the
parameters such that the signal-to-noise ratio depends solely on the slope coefficients,
$\beta_1$ and $\beta_2$. In particular, define yit = yit i such that
We consider each of the terms in (110) sequentially. We have
2 2
var (yit ) = 1 var (x1it ) + 2 var (x2it ) + var ( it ) +2 1 cov (x1it ; it ) =
" M M
#
X X
2 2 2
= 1 m
+ m
+ 2" + 2 2
2 x2
m=1 m=1
" M M
# M
X X X
2 2 2
+ m
+ m
+ " +2 1 m m m
, (111)
m=1 m=1 m=1
M
X M
X
2 2 2
var ( it ) = m
+ m
+ ", (112)
m=1 m=1
and
2
2cov (yit ; it ) = 2cov 1 x1it + 2 x2it + it ; it = 2[ 1 cov (x1it ; it ) + var ( it )]
M
" M M
#
X X X
2 2 2
= 2 1 m m m
+ m
+ m
+ " . (113)
m=1 m=1 m=1
Setting
2 2
" = " ;
" M M
#
X X
2 2 2 2
x2 = m
+ m
+ " ; and
m=1 m=1
2 2
m
= m
; m
= m
, m
= for m = 1; ::; M , (114)
Therefore,
2 2 2 2
= s= = 1 + 2. (115)
In the case of weak exogeneity (so $x_{2it} = y_{it-1}$) we define yit = yit 1
i
such that
2
1 1
yit = x1it + it . (116)
1 2 1 2
Using (114) and imposing = 0 it is straightforward to show that
2 2 2 2 2
= s= = 1 + 2 = (1 2) . (118)
6.3.2 Results
Table 2 reports the frequency with which each statistic selects the true
number of factors, $M_0$. If a statistic selects an incorrect number of factors with higher
frequency than $M_0$, then we report both frequencies, as well as the value of $\widehat{M} \neq M_0$
in brackets. For example, ‘.000 (0, 1.00)’ means that the frequency of selecting $M_0$ is
zero and the statistic has selected $M = 0$ with frequency 1. The $\widehat{M}_j$ refer to the
corresponding statistics defined in (102)-(107). Following Onatski (2007) we choose
$\delta \in \{0, \max(N^{-1/2}, T^{-1/2}), \max(N^{-2/3}, T^{-2/3})\}$. Therefore, $\widehat{M}_4$ contains three cases,
$\widehat{M}_{4(1)}$, $\widehat{M}_{4(2)}$ and $\widehat{M}_{4(3)}$, which correspond to each of these values of $\delta$ respectively.
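The reporting convention used in Table 2 can be illustrated with a short Python sketch. The function name is hypothetical, and it is assumed here that the incorrect value shown in brackets is the most frequently selected one; note also that the function prints a leading zero, whereas the table omits it.

```python
from collections import Counter


def table_entry(selected, m0):
    """Format one table cell from the numbers of factors selected across
    Monte Carlo replications, given the true number of factors m0."""
    counts = Counter(selected)
    n = len(selected)
    freq_true = counts.get(m0, 0) / n
    # (count, value) pairs for every incorrectly selected number of factors.
    wrong = [(c, m) for m, c in counts.items() if m != m0]
    if wrong:
        c, m = max(wrong)  # most frequent incorrect choice
        if c / n > freq_true:
            # Report both frequencies and the incorrect value in brackets.
            return f"{freq_true:.3f} ({m}, {c / n:.2f})"
    return f"{freq_true:.3f}"


# A statistic that always selects zero factors when the true number is one.
print(table_entry([0] * 1000, m0=1))  # prints 0.000 (0, 1.00)
```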
As we can see, the performance of the statistics varies across experiments
and depends crucially upon the size of $M_0$ (the smaller the better), $T$ (the larger the
better) and the values of $\beta_1$ and $\beta_2$ (the smaller the better). For $T = 100$, most
statistics perform well even if the factors are extracted from residuals rather than observed
variables, unless $\beta_1$ and $\beta_2$ are both close to unity and $M_0 = 1$. In this case all statistics
heavily underestimate $M_0$, although $\widehat{M}_{4(1)}$ and $\widehat{M}_{4(3)}$ do less so than the others. This
finding is not surprising because most of the noise is idiosyncratic in this case. $\widehat{M}_3$
outperforms $\widehat{M}_1$ and $\widehat{M}_2$, unless $\beta_1 = \beta_2 = 0.5$, in which case it compares less favorably.
$\widehat{M}_{4(1)}$ does relatively poorly in most circumstances but $\widehat{M}_{4(2)}$ and $\widehat{M}_{4(3)}$ perform quite
well even for large values of either $\beta_1$ or $\beta_2$, although they are both sensitive to small
values of $T$. $\widehat{M}_5$ appears to perform well even for $T = 10$ and it underestimates $M_0$
mostly when $\beta_1$ is large. Similar results have been obtained for = 5 and = 0.5
and therefore it appears that these two parameters are not crucial for the performance
of the statistics. Furthermore, we have reached similar conclusions for the case where
$x_{2it} = y_{it-1}$ but to save space we do not report these here.36
In summary, we may argue that some of the statistics considered here, based on
residuals rather than observed variables, perform reasonably well (especially $\widehat{M}_{4(2)}$,
$\widehat{M}_{4(3)}$ and $\widehat{M}_5$) under both strong and weak exogeneity, unless a large proportion of the
variation in total noise is due to the purely idiosyncratic component, or there is little
variation in the factor loadings, or $T$ is small. The first case might be less of a problem in
practice because the impact of the factor structure can be small, while the second case can be
accounted for quite effectively using time dummies. Determining the number of factors
under small $T$ is certainly an issue that requires further research.
To this end, Ahn, Lee and Schmidt (2006) propose determining the number of factors
using a sequential method, based on GMM and Sargan’s (1958) or Hansen’s (1982) test
statistic. Their method is appropriate for fixed T. The intuition of this approach is
that if $\widehat{M} < M_0$, this is likely to show up as a significant overidentifying restrictions test
statistic. Therefore, one may start by testing the null $M_0 = 0$ against the alternative
$M_0 > 0$. Then, if the null is rejected, one can move on to test the null $M_0 = 1$ against
the alternative $M_0 > 1$, and so on until the null hypothesis is not rejected. Naturally,
the significance level used for this sequential method needs to be appropriately adjusted.
This approach is valid under strongly and weakly exogenous regressors. In the
former case, it will identify only the factors whose factor loadings are correlated with
the regressors. An alternative approach can be based on the joint serial correlation test
(Yamagata, 2008) combined with a GMM estimator that allows for factor residuals in
the same sequential manner, but we have no results as yet. A further possibility under
fixed T is to construct a criterion based on a likelihood ratio test statistic. Lawley
and Maxwell (1971, section 2.6) provide details for the case of extracting latent factors
from observed variables, although the case of extracting factors from regression residuals
remains unexplored in the literature. We do expect the issue of determining the number
of factors in fixed T cases to attract more attention in the near future.
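The logic of the sequential testing approach can be sketched as follows. This is a schematic illustration only: the callable `overid_pvalue` is a hypothetical placeholder for whatever routine estimates the model by GMM with a given number of factors and returns the p-value of the overidentifying restrictions test, and no adjustment of the significance level across steps is implemented.

```python
from typing import Callable, Optional


def sequential_factor_count(overid_pvalue: Callable[[int], float],
                            m_max: int,
                            alpha: float = 0.05) -> Optional[int]:
    """Return the smallest M in 0, ..., m_max whose overidentifying
    restrictions test is not rejected at level alpha, or None if the null
    is rejected for every M considered."""
    for m in range(m_max + 1):
        # Null hypothesis: M0 = m, against the alternative M0 > m.
        if overid_pvalue(m) >= alpha:
            return m
    return None


# Made-up p-values, purely for illustration (not results from the paper).
illustrative_pvalues = {0: 0.001, 1: 0.012, 2: 0.410, 3: 0.650}
print(sequential_factor_count(illustrative_pvalues.get, m_max=3))  # prints 2
```

In practice the value passed as `alpha` would need to reflect the adjustment of the significance level mentioned above.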
36 The results are available from the authors upon request.
Table 2 Performance of statistics for selecting the number of factors, = 1.
| $T$ | $M_0$ | $\beta_1$ | $\beta_2$ | $\widehat{M}_1$ | $\widehat{M}_2$ | $\widehat{M}_3$ | $\widehat{M}_{4(1)}$ | $\widehat{M}_{4(2)}$ | $\widehat{M}_{4(3)}$ | $\widehat{M}_5$ | $\widehat{M}_6$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 1 | 0.5 | 0.5 | 1.00 | 1.00 | .995 | .314 (2, .45) | .885 | .613 | 1.00 | 1.00 |
| 50 | 1 | 0.5 | 0.5 | 1.00 | 1.00 | 1.00 | .174 (2, .43) | .835 | .562 | 1.00 | 1.00 |
| 10 | 1 | 0.5 | 0.5 | .003 (6, .97) | .054 (6, .70) | .000 (0, 1.0) | .156 (0, .84) | .002 (0, .99) | .006 (0, .99) | .904 | .000 (7, 1.0) |
| 100 | 1 | 0.9 | 0.5 | 1.00 | .994 | 1.00 | .351 (2, .45) | .913 | .672 | 1.00 | 1.00 |
| 50 | 1 | 0.9 | 0.5 | .850 | .609 | .999 | .208 (2, .45) | .880 | .602 | 1.00 | 1.00 |
| 10 | 1 | 0.9 | 0.5 | .000 (6, .70) | .001 (0, .71) | .000 (6, .98) | .001 (0, .99) | .000 (0, 1.0) | .000 (0, 1.0) | .218 (7, .25) | .000 (7, 1.0) |
| 100 | 1 | 0.5 | 0.8 | 1.00 | 1.00 | .999 | .301 (2, .48) | .902 | .628 | 1.00 | 1.00 |
| 50 | 1 | 0.5 | 0.8 | 1.00 | 1.00 | 1.00 | .184 (2, .43) | .850 | .554 | 1.00 | 1.00 |
| 10 | 1 | 0.5 | 0.8 | .007 (6, .93) | .026 (6, .63) | .000 (6, 1.0) | .074 (0, .93) | .000 (0, 1.0) | .001 (0, 1.0) | .849 | .000 (7, 1.0) |
| 100 | 1 | 0.9 | 0.8 | .448 (0, .55) | .055 (0, .95) | 1.00 | .368 (2, .42) | .907 | .682 | 1.00 | 1.00 |
| 50 | 1 | 0.9 | 0.8 | .023 (0, .97) | .002 (0, .99) | .573 | .201 (2, .45) | .856 | .590 | .979 | .992 |
| 10 | 1 | 0.9 | 0.8 | .000 (6, .88) | .000 (6, .61) | .000 (6, .99) | .000 (0, 1.0) | .000 (0, 1.0) | .000 (0, 1.0) | .147 (7, .30) | .000 (7, 1.0) |
| 100 | 3 | 0.5 | 0.5 | 1.00 | 1.00 | .907 | .726 | .980 | .907 | 1.00 | .119 (1, .72) |
| 50 | 3 | 0.5 | 0.5 | .998 | .998 | 1.00 | .649 | .998 | .914 | .999 | .118 (1, .67) |
| 10 | 3 | 0.5 | 0.5 | .000 (6, .99) | .000 (6, .98) | .000 (6, 1.0) | .002 (0, .53) | .000 (0, .97) | .000 (0, .90) | .340 | .000 (7, 1.0) |
| 100 | 3 | 0.9 | 0.5 | .238 (0, .55) | .012 (0, .75) | .999 | .813 | .995 | .995 | .785 | .000 (0, .99) |
| 50 | 3 | 0.9 | 0.5 | .017 (1, .68) | .000 (1, .60) | .619 | .720 | .938 | .926 | .449 | .002 (1, .96) |
| 10 | 3 | 0.9 | 0.5 | .000 (6, 1.0) | .000 (6, .50) | .000 (6, .99) | .000 (0, .60) | .000 (0, 1.0) | .000 (0, 1.0) | .074 (1, .34) | .000 (7, 1.0) |
| 100 | 3 | 0.5 | 0.8 | .014 (1, .72) | .000 (1, .98) | .997 | .796 | .988 | .944 | .746 | .000 (1, 1.0) |
| 50 | 3 | 0.5 | 0.8 | .001 (1, .96) | .000 (1, 1.0) | .237 (2, .54) | .701 | .770 | .841 | .449 | .000 (1, 1.0) |
| 10 | 3 | 0.5 | 0.8 | .000 (6, .97) | .000 (6, .71) | .000 (6, 1.0) | .000 (0, .95) | .000 (0, 1.0) | .000 (0, 1.0) | .094 (1, .90) | .002 (7, 1.0) |
| 100 | 3 | 0.9 | 0.8 | .000 (0, .55) | .000 (0, .96) | .000 (1, .98) | .296 (2, .52) | .017 (1, .68) | .108 (2, .51) | .000 (1, 1.0) | .000 (1, 1.0) |
| 50 | 3 | 0.9 | 0.8 | .000 (0, .96) | .000 (0, .99) | .000 (1, .59) | .359 (2, .37) | .018 (1, .78) | .096 (1, .50) | .005 (1, .97) | .002 (1, .99) |
| 10 | 3 | 0.9 | 0.8 | .000 (6, .89) | .000 (6, .62) | .000 (6, .99) | .000 (0, 1.0) | .000 (0, 1.0) | .000 (0, 1.0) | .085 (7, .31) | .000 (7, 1.0) |
7 Current Challenges and Future Directions
There have been several major advances in the theoretical literature on panel data
analysis with error cross-sectional dependence over the last ten years. Methods developed
for dealing with fixed- and large-T cases, strongly and weakly exogenous regressors,
non-stationary panels and testing for non-zero correlations across individuals have all
helped to (re)address more effectively the issue of cross-sectional dependence and
ultimately that of unobserved heterogeneity. Nevertheless, there is still an abundance
of non-trivial problems that require research attention. For instance, the literature is
largely silent on dealing with cross-sectional dependence in non-linear panel data models, in
which case it is typically assumed, for identification purposes rather than descriptive
accuracy, that all observations are independent across individuals. Testing for
cross-sectional dependence in non-linear models is not straightforward either, although some
progress has been made by Hsiao, Pesaran and Pick (2009). There is a large range of
other models that await possible extensions of the existing methods, such as panel VARs
with a multi-factor error structure, systems of simultaneous equations and models with
heterogeneous coefficients.
Finally, there is as yet only a relatively small empirical literature that deals with
cross-sectional dependence in practice. It will be useful, as well as interesting, to see the
extent to which economic applications can benefit from theoretical advances in the field.
References
[1] Ahn, S.C. and Schmidt, P. 1995. Efficient Estimation of Models for Dynamic Panel
Data. Journal of Econometrics, 68, 5-28.
[2] Ahn, S. C. and Horenstein, A. 2008. Eigenvalue ratio test for the number of factors.
Mimeo.
[3] Ahn, S. C., Y. H. Lee and P. Schmidt. 2001. GMM estimation of linear panel data
models with time-varying individual effects. Journal of Econometrics, 101, 219-255.
[4] Ahn, S. C., Y. H. Lee and P. Schmidt. 2006. Panel Data Models with Multiple
Time-Varying Individual Effects. Mimeo.
[5] Amengual, D. and Watson, M. W. 2007. Consistent estimation of the number of
dynamic factors in a large N and T panel. Journal of Business & Economic Statistics,
25(1), 91-96.
[6] Anderson, T.W. and Hsiao, C. 1981. Estimation of Dynamic Models with Error
Components. Journal of the American Statistical Association, 76, 598-606.
[7] Arellano, M. 2003. Panel Data Econometrics. Oxford University Press, Oxford.
[8] Arellano, M. and Bond, S. 1991. Some Tests of Specification for Panel Data: Monte
Carlo Evidence and an Application to Employment Equations. Review of Economic
Studies, 58, 277-297.
[9] Arellano, M. and Bover, O. 1995. Another Look at the Instrumental Variable Es-
timation of Error-Component Models. Journal of Econometrics, 68, 29-51.
[10] Bai, J. 2009. Panel Data Models with Interactive Fixed Effects. Econometrica, 77,
1229-1279.
[11] Bai, J. 2010. Likelihood approach to small T dynamic panel models with interactive
effects. Mimeo.
[12] Bai, J. and Ng, S. 2002. Determining the Number of Factors in Approximate Factor
Models. Econometrica, 70, 191-221.
[13] Bai, J. and Ng, S. 2004. A PANIC Attack on Unit Roots and Cointegration.
Econometrica, 72(4), 1127-1177.
[14] Baltagi, B. 2008. Econometric Analysis of Panel Data, 4th ed. John Wiley & Sons,
West Sussex.
[15] Baltagi B. H. and Pesaran M. H. 2007. Heterogeneity and cross section dependence
in panel data models: theory and applications - Introduction. Journal of Applied
Econometrics, 22(2), 229-232.
[17] Blundell, R. and Bond, S. 1998. Initial Conditions and Moment Restrictions in
Dynamic Panel Data Models. Journal of Econometrics, 87, 115-143.
[18] Breitung, J. and Pesaran, M.H. 2008. Unit Roots and Cointegration in Panels, in
L. Matyas and Sevestre P. (eds.) The Econometrics of Panel Data: Fundamentals
and Recent Developments in Theory and Practice, Kluwer Academic Publishers.
[19] Breusch, T. and A. Pagan. 1980. The Lagrange multiplier test and its application
to model specification in econometrics. Review of Economic Studies 47, 239-253.
[20] Chen, J., Gao, J. and Li, D. 2009. A New Diagnostic Test for Cross–Section Inde-
pendence in Nonparametric Panel Data Models. Mimeo.
[21] Chudik, A. Pesaran, M. H. and Tosetti, E. 2009. Weak and Strong Cross Section
Dependence and Estimation of Large Panels. Mimeo.
[22] Coakley, J., A. Fuertes and R. Smith (2002). A Principal Components Approach to
Cross-Section Dependence in Panels. Working paper, Birkbeck College, University
of London.
[23] Conley, T.G. 1999. GMM Estimation with Cross Sectional Dependence. Journal of
Econometrics, 92, 1-45.
[24] Conley, T.G., and Topa, G. 2002. Socio-economic Distance and Spatial Patterns in
Unemployment. Journal of Applied Econometrics, 17, 303-327.
[26] Driscoll, J.C., and Kraay, A.C. 1998. Consistent Covariance Matrix Estimation with
Spatially Dependent Data. The Review of Economics and Statistics, 80, 549-560.
[28] Fisher, R.A. 1935. The Design of Experiments. Oliver and Boyd, Edinburgh.
[29] Forni, M. and Lippi, M. 2001. The Generalized Dynamic Factor Model: Represent-
ation Theory. Econometric Theory 17, 1113-1141.
[30] Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2000). The generalized factor model:
identification and estimation. The Review of Economics and Statistics, 82, 540-554.
[32] Friedman, M. 1937. The use of ranks to avoid the assumption of normality implicit
in the analysis of variance. Journal of the American Statistical Association 32, 675–
701.
[33] Goldberger, A. 1972. Structural equation methods in the social sciences. Economet-
rica 40 (6), 979–1001.
[34] Hallin, M. and Liška, R. 2007. Determining the number of factors in the general
dynamic factor model. Journal of the American Statistical Association, 102(478),
603-617.
[36] Hayakawa, K. 2009. Bias Corrected Estimation of Dynamic Panel Data Models with
Interactive Fixed Effects. Mimeo.
[38] Hurlin, C., Mignon, V. 2004. Second generation panel unit root tests. Mimeo.
[39] Hsiao, C. Analysis of Panel Data. 2nd ed. Cambridge University Press, Cambridge.
[40] Hsiao, C. 2007. Panel Data Analysis - Advantages and Challenges. TEST. Vol. 16,
pp. 1-22.
[41] Hsiao, C., Pesaran, M. H. and Pick, A. 2009. Diagnostic Tests of Cross Section
Independence for Nonlinear Panel Data Models. Mimeo.
[44] Kapetanios, G., and Pesaran, M. H. 2007. Small Sample Properties of Cross Section
Augmented Estimators for Panel Data Models with Residual Multi-factor Structures.
In The Refinement of Econometric Estimation and Test Procedures: Finite Sample
and Asymptotic Analysis, Garry Phillips and Elias Tzavalis (eds.), Cambridge
University Press, Cambridge.
[45] Kapetanios, G., Pesaran, M. H. and Yamagata, T. 2009. Panels with Nonstationary
Multifactor Error Structures. Mimeo.
[46] Kapoor, M., Kelejian, H. and Prucha, I. 2007. Panel Data Models with Spatially
Correlated Error Components. Journal of Econometrics, 140, 97–130.
[47] Kelejian, H. and Prucha, I. 2010. Specification and Estimation of Spatial Autoregressive
Models with Autoregressive and Heteroskedastic Disturbances. Journal of
Econometrics, forthcoming.
[48] Kiviet, J. and Sarafidis, V. 2000. Cross-sectional Correlation in Panel Data
Relationships. Mimeo.
[51] Lee, L. F. 2007. GMM and 2SLS estimation of mixed regressive, spatial autoregress-
ive models. Journal of Econometrics, 137, 489–514.
[52] Lawley, D.N. and Maxwell A.E. 1971. Factor Analysis as a Statistical Method.
Butterworth, London.
[53] Moon, R. G. and Perron, B. 2004. Efficient Estimation of the SUR Cointegrating
Regression Model and Testing for Purchasing Power Parity. Econometric Reviews,
23, 293-323.
[55] Mundlak, Y. 1978. On the pooling of time series and cross section data. Economet-
rica, 46, 69-85.
[56] Nauges, C. and Thomas, A. 2003. Consistent estimation of dynamic panel data
models with time-varying individual e¤ects. Annales d’Economie et de Statistique,
70, 53-74.
[57] Neprash, J.A. 1934. Some Problems in the Correlation of Spatially Distributed
Variables. Journal of the American Statistical Association, 29, 167-168.
[59] Nickell, S. 1981. Biases in Dynamic Models with Fixed Effects. Econometrica, 49,
1417-1426.
[60] Onatski, A. 2007. A formal statistical test for the number of factors in the
approximate factor models. Mimeo.
[61] Pesaran, M. H. 2004. General diagnostic tests for cross section dependence in pan-
els. University of Cambridge, Faculty of Economics, Cambridge Working Papers in
Economics No. 0435.
[62] Pesaran, M. H. and Tosetti, E. 2009. Large panels with common factors and spatial
correlations. Mimeo.
[63] Pesaran, M. H., A. Ullah, and Yamagata, T. 2008. A bias-adjusted test of error
cross section dependence. The Econometrics Journal, 11, 105-127.
[64] Phillips, P. and Sul, D. 2003. Dynamic Panel Estimation and Homogeneity Testing
under cross-sectional Dependence. Econometrics Journal 6, 217-259.
[65] Phillips, P. and Sul, D. 2007. Bias in Dynamic Panel Estimation with Fixed Effects,
Incidental Trends and cross-sectional Dependence. Journal of Econometrics 137,
162-188.
[66] Robertson, D. and Symons, J. 2007. Maximum Likelihood Factor Analysis with
Rank Deficient Sample Covariance Matrices. Journal of Multivariate Analysis,
98(4), 813-828.
[67] Robertson, D., V. Sarafidis, and J. Symons (2010). IV Estimation of Panels with
Factor Residuals. Mimeo.
[68] Sarafidis, V. 2009. GMM Estimation of Short Dynamic Panel Data Models with
Error Cross-sectional Dependence. Mimeo.
[69] Sarafidis, V. and Robertson, D. 2009. On the Impact of Error Cross-sectional
Dependence in Short Dynamic Panel Estimation. The Econometrics Journal, 12(1),
62-81.
[70] Sarafidis, V., Yamagata, T. and Robertson, D. 2009. A Test of Cross Section
Dependence for a Linear Dynamic Panel Model with Regressors. Journal of
Econometrics, 148(2), 149-161.
[71] Sargan, J.D. 1958. The Estimation of Economic Relationships Using Instrumental
Variables. Econometrica, 26, 393-415.
[72] Stephan, F.F. 1934. Sampling Errors and Interpretations of Social Data Ordered in
Time and Space. Journal of the American Statistical Association, 29, 165-166.
[74] Srivastava, V. K. and Giles, D. 1987. Seemingly Unrelated Regression Equations
Models. Marcel Dekker, New York.
[75] Tobler, W. 1970. A Computer Movie Simulating Urban Growth in the Detroit Re-
gion. Economic Geography, 46, 234-240.
[76] Yamagata, T. 2008. A Joint Serial Correlation Test for Linear Panel Data Models.
Journal of Econometrics 146, 135-145.
[77] Wansbeek, T., and Knaap, T. 1999. Estimating a Dynamic Panel Data Model with
Heterogenous Trends. Annales d’Economie et de Statistique, 55-56, 331-349.
[78] Wansbeek, T., and E. Meijer. 2000. Measurement Error and Latent Variables in
Econometrics. Amsterdam, Elsevier.
[79] Wansbeek, T., and E. Meijer. 2007. Comments on: Panel Data Analysis - Advantages
and Challenges. TEST. Vol. 16, pp. 33-36.