Bai and Ng 2002
In this paper we develop some econometric theory for factor models of large dimensions. The focus is the determination of the number of factors r, which is an unresolved issue in the rapidly growing literature on multifactor models. We first establish the convergence rate for the factor estimates that will allow for consistent estimation of r. We then propose some panel criteria and show that the number of factors can be consistently estimated using the criteria. The theory is developed under the framework of large cross-sections (N) and large time dimensions (T). No restriction is imposed on the relation between N and T. Simulations show that the proposed criteria have good finite sample properties in many configurations of the panel data encountered in practice.
1. Introduction
The idea that variations in a large number of economic variables can be modeled by a small number of reference variables is appealing and is used in many economic analyses. For example, asset returns are often modeled as a function of a small number of factors. Stock and Watson (1989) used one reference variable to model the comovements of four main macroeconomic aggregates. Cross-country variations are also found to have common components; see Gregory and Head (1999) and Forni, Hallin, Lippi, and Reichlin (2000b). More recently, Stock and Watson (1999) showed that the forecast error of a large number of macroeconomic variables can be reduced by including diffusion indexes, or factors, in structural as well as nonstructural forecasting models. In demand analysis, Engel curves can be expressed in terms of a finite number of factors. Lewbel (1991) showed that if a demand system has one common factor, budget shares should be independent of the level of income. In such a case, the number of factors is an object of economic interest since if more than one factor is found, homothetic preferences can be rejected. Factor analysis also provides a convenient way to study the aggregate implications of microeconomic behavior, as shown in Forni and Lippi (1997).
Central to both the theoretical and the empirical validity of factor models is the correct specification of the number of factors. To date, this crucial parameter is often assumed rather than determined by the data.² This paper develops a formal statistical procedure that can consistently estimate the number of factors from observed data. We demonstrate that the penalty for overfitting must be a function of both N and T (the cross-section dimension and the time dimension, respectively) in order to consistently estimate the number of factors. Consequently, the usual AIC and BIC, which are functions of N or T alone, do not work when both dimensions of the panel are large. Our theory is developed under the assumption that both N and T converge to infinity. This flexibility is of empirical relevance because the time dimension of datasets relevant to factor analysis, although small relative to the cross-section dimension, is too large to justify the assumption of a fixed T.

1 We thank three anonymous referees for their very constructive comments, which led to a much improved presentation. The first author acknowledges financial support from the National Science Foundation under Grant SBR-9709508. We would like to thank participants in the econometrics seminars at Harvard-MIT, Cornell University, the University of Rochester, and the University of Pennsylvania for helpful suggestions and comments. Remaining errors are our own.

192 jushan bai and serena ng
A small number of papers in the literature have also considered the problem of determining the number of factors, but the present analysis differs from these works in important ways. Lewbel (1991) and Donald (1997) used the rank of a matrix to test for the number of factors, but these theories assume either N or T is fixed. Cragg and Donald (1997) considered the use of information criteria when the factors are functions of a set of observable explanatory variables, but the data still have a fixed dimension. For large dimensional panels, Connor and Korajczyk (1993) developed a test for the number of factors in asset returns, but their test is derived under sequential limit asymptotics, i.e., N converges to infinity with a fixed T and then T converges to infinity. Furthermore, because their test is based on a comparison of variances over different time periods, covariance stationarity and homoskedasticity are not only technical assumptions, but are crucial for the validity of their test. Under the assumption that $N \to \infty$ for fixed T, Forni and Reichlin (1998) suggested a graphical approach to identify the number of factors, but no theory is available. Assuming $N, T \to \infty$ with $\sqrt{N}/T \to \infty$, Stock and Watson (1998) showed that a modification to the BIC can be used to select the number of factors optimal for forecasting a single series. Their criterion is restrictive not only because it requires $N \gg T$, but also because there can be factors that are pervasive for a set of data and yet have no predictive ability for an individual data series. Thus, their rule may not be appropriate outside of the forecasting framework. Forni, Hallin, Lippi, and Reichlin (2000a) suggested a multivariate variant of the AIC, but neither the theoretical nor the empirical properties of the criterion are known.
We set up the determination of factors as a model selection problem. In consequence, the proposed criteria depend on the usual trade-off between good fit and parsimony. However, the problem is nonstandard not only because account needs to be taken of the sample size in both the cross-section and the time-series dimensions, but also because the factors are not observed. The theory we develop does not rely on sequential limits, nor does it impose any restrictions between N and T. The results hold under heteroskedasticity in both the time and the cross-section dimensions. The results also hold under weak serial and cross-section dependence. Simulations show that the criteria have good finite sample properties.

2 Lehmann and Modest (1988), for example, tested the APT for 5, 10, and 15 factors. Stock and Watson (1989) assumed there is one factor underlying the coincident index. Ghysels and Ng (1998) tested the affine term structure model assuming two factors.

approximate factor models 193
The rest of the paper is organized as follows. Section 2 sets up the preliminaries
and introduces notation and assumptions. Estimation of the factors is considered
in Section 3 and the estimation of the number of factors is studied in Section 4.
Specific criteria are considered in Section 5 and their finite sample properties
are considered in Section 6, along with an empirical application to asset returns.
Concluding remarks are provided in Section 7. All the proofs are given in the
Appendix.
2. Factor Models
Let $X_{it}$ be the observed data for the ith cross-section unit at time t, for $i = 1, \ldots, N$, and $t = 1, \ldots, T$. Consider the following model:
$$ X_{it} = \lambda_i' F_t + e_{it}. \tag{1} $$
rank of a demand system holding prices fixed is the smallest integer r such that $w_j(e) = \lambda_{j1} G_1(e) + \cdots + \lambda_{jr} G_r(e)$. Demand systems are of the form (1), where the r factors, common across goods, are $F_h = [G_1(e_h), \ldots, G_r(e_h)]'$. When the number of households, H, converges to infinity with a fixed J, $G_1(e), \ldots, G_r(e)$ can be estimated simultaneously, such as by the nonparametric methods developed in Donald (1997). This approach will not work when the number of goods, J, also converges to infinity. However, the theory to be developed in this paper will still provide a consistent estimate of r without the need for nonparametric estimation of the $G(\cdot)$ functions. Once the rank of the demand system is determined, the nonparametric functions evaluated at $e_h$ allow $F_h$ to be consistently estimated (up to a transformation). Then the functions $G_1(e), \ldots, G_r(e)$ may be recovered (also up to a matrix transformation) from $F_h$, $h = 1, \ldots, H$, via nonparametric estimation.
3. Forecasting with diffusion indices. Stock and Watson (1998, 1999) considered
forecasting inflation with diffusion indices (“factors”) constructed from a large
number of macroeconomic series. The underlying premise is that these series may
be driven by a small number of unobservable factors. Consider the forecasting
equation for a scalar series
$$ y_{t+1} = \alpha' F_t + \beta' W_t + \epsilon_t $$
$$ \hat y_{T+1|T} = \hat\alpha' \hat F_T + \hat\beta' W_T $$
can be formed. Stock and Watson (1998, 1999) showed that this approach of
forecasting outperforms many competing forecasting methods. But as pointed
out earlier, the dimension of F in Stock and Watson (1998, 1999) was determined
using a criterion that minimizes the mean squared forecast errors of y. This may
not be the same as the number of factors underlying Xit , which is the focus of
this paper.
factor analysis (e.g., Anderson (1984)), N is assumed fixed, the factors are independent of the errors $e_t$, and the covariance of $e_t$ is diagonal. Normalizing the covariance matrix of $F_t$ to be an identity matrix, we have $\Sigma = \Lambda^0 \Lambda^{0\prime} + \Omega$, where $\Sigma$ and $\Omega$ are the covariance matrices of $X_t$ and $e_t$, respectively. Under these assumptions, a root-T consistent and asymptotically normal estimator of $\Sigma$, say, the sample covariance matrix $\hat\Sigma = (1/T) \sum_{t=1}^{T} (X_t - \bar X)(X_t - \bar X)'$, can be obtained.
The essentials of classical factor analysis carry over to the case of large N but
fixed T since the N × N problem can be turned into a T × T problem, as noted
by Connor and Korajczyk (1993) and others.
Inference on r under classical assumptions can, in theory, be based on the eigenvalues of $\hat\Sigma$, since a characteristic of a panel of data that has an r-factor representation is that the first r largest population eigenvalues of the N × N covariance of $X_t$ diverge as N increases to infinity, but the (r + 1)th eigenvalue is bounded; see Chamberlain and Rothschild (1983). But it can be shown that all nonzero sample eigenvalues (not just the first r) of the matrix $\hat\Sigma$ increase with N, and a test based on the sample eigenvalues is thus not feasible. A likelihood ratio test can also, in theory, be used to select the number of factors if, in addition, normality of $e_t$ is assumed. But as found by Dhrymes, Friend, and Gultekin (1984), the number of statistically significant factors determined by the likelihood ratio test increases with N even if the true number of factors is fixed. Other methods have also been developed to estimate the number of factors assuming the size of one dimension is fixed. But Monte Carlo simulations in Cragg and Donald (1997) show that these methods tend to perform poorly for moderately large N and T. The fundamental problem is that the theory developed for classical factor models does not apply when both $N, T \to \infty$. This is because consistent estimation of $\Sigma$ (whether it is an N × N or a T × T matrix) is not a well-defined problem. For example, when N > T, the rank of $\hat\Sigma$ is no more than T, whereas the rank of $\Sigma$ can always be N. New theories are thus required to analyze large dimensional factor models.
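The rank argument can be checked numerically. The following Python/NumPy sketch is only an illustration under assumed settings (N = 200, T = 50, r = 3, and N(0, 1) factors, loadings, and errors are our own choices, not the paper's): it forms the N × N sample covariance matrix $\hat\Sigma$ of a panel with N > T and compares its rank with that of the population covariance $\Lambda\Lambda' + I_N$ implied by this design.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, r = 200, 50, 3          # illustrative sizes with N > T

# A T x N panel from an r-factor model with N(0,1) factors, loadings and errors
F = rng.standard_normal((T, r))
Lam = rng.standard_normal((N, r))
X = F @ Lam.T + rng.standard_normal((T, N))

# Sample covariance (1/T) * sum_t (X_t - Xbar)(X_t - Xbar)', an N x N matrix
Xc = X - X.mean(axis=0)
Sigma_hat = Xc.T @ Xc / T

# Population covariance of X_t in this design: Lam Lam' + I_N (positive definite)
Sigma = Lam @ Lam.T + np.eye(N)

rank_hat = np.linalg.matrix_rank(Sigma_hat)
rank_pop = np.linalg.matrix_rank(Sigma)
print(rank_hat, rank_pop)     # rank_hat cannot exceed T; rank_pop equals N
```

Because the data are demeaned, the sample matrix has rank at most T − 1 in this sketch, while the population matrix has full rank N.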
In this paper, we develop asymptotic results for consistent estimation of the
number of factors when $N, T \to \infty$. Our results complement the sparse but
growing literature on large dimensional factor analysis. Forni and Lippi (2000)
and Forni et al. (2000a) obtained general results for dynamic factor models, while
Stock and Watson (1998) provided some asymptotic results in the context of
forecasting. As in these papers, we allow for cross-section and serial dependence.
In addition, we also allow for heteroskedasticity in et and some weak dependence
between the factors and the errors. These latter generalizations are new in our
analysis. Evidently, our assumptions are more general than those used when the
sample size is fixed in one dimension.
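Let $X_i$ be a T × 1 vector of time-series observations for the ith cross-section unit. For a given i, we have
$$ \underset{T\times 1}{X_i} = \underset{T\times r}{F^0}\,\underset{r\times 1}{\lambda_i^0} + \underset{T\times 1}{e_i}, \tag{3} $$
where $X_i = (X_{i1}, X_{i2}, \ldots, X_{iT})'$, $F^0 = (F_1^0, F_2^0, \ldots, F_T^0)'$, and $e_i = (e_{i1}, e_{i2}, \ldots, e_{iT})'$. For the panel of data $X = (X_1, \ldots, X_N)$, we have
$$ \underset{T\times N}{X} = \underset{T\times r}{F^0}\,\underset{r\times N}{\Lambda^{0\prime}} + \underset{T\times N}{e}, \tag{4} $$
with $e = (e_1, \ldots, e_N)$.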
Let tr(A) denote the trace of A. The norm of the matrix A is then $\|A\| = [\operatorname{tr}(A'A)]^{1/2}$. The following assumptions are made:

Assumption A—Factors: $E\|F_t^0\|^4 < \infty$ and $T^{-1} \sum_{t=1}^{T} F_t^0 F_t^{0\prime} \to \Sigma_F$ as $T \to \infty$ for some positive definite matrix $\Sigma_F$.
4 Gregory, Head, and Raynauld (1997) estimated a world factor and seven country-specific factors from output, consumption, and investment for each of the G7 countries. The exercise involved estimation of 92 parameters and perhaps stretched the state-space model to its limit.

5 The method of asymptotic principal components was studied by Connor and Korajczyk (1986) and Connor and Korajczyk (1988) for fixed T. Forni et al. (2000a) and Stock and Watson (1998) considered the method for large T.
subject to the normalization of either $\Lambda^{k\prime}\Lambda^k/N = I_k$ or $F^{k\prime}F^k/T = I_k$. If we concentrate out $\Lambda^k$ and use the normalization that $F^{k\prime}F^k/T = I_k$, the optimization problem is identical to maximizing $\operatorname{tr}(F^{k\prime}(XX')F^k)$. The estimated factor matrix, denoted by $\tilde F^k$, is $\sqrt{T}$ times the eigenvectors corresponding to the k largest eigenvalues of the T × T matrix XX′. Given $\tilde F^k$, $\tilde\Lambda^{k\prime} = (\tilde F^{k\prime}\tilde F^k)^{-1}\tilde F^{k\prime}X = \tilde F^{k\prime}X/T$ is the corresponding matrix of factor loadings.
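As a rough illustration of the estimator just described, the sketch below (Python with NumPy; the function name `pc_factors_T` and the simulated panel are our own, hypothetical choices) takes √T times the eigenvectors of XX′ for the k largest eigenvalues and forms the loadings as $\tilde F^{k\prime}X/T$. `numpy.linalg.eigh` returns eigenvalues in ascending order, so the eigenvector columns are reversed before the top k are kept.

```python
import numpy as np

def pc_factors_T(X, k):
    """Principal-components factors and loadings from the T x T matrix X X'.

    F_tilde is sqrt(T) times the eigenvectors for the k largest eigenvalues,
    so F_tilde' F_tilde / T = I_k; Lam_tilde_t = F_tilde' X / T is k x N.
    """
    T = X.shape[0]
    _, vecs = np.linalg.eigh(X @ X.T)               # eigenvalues in ascending order
    F_tilde = np.sqrt(T) * vecs[:, ::-1][:, :k]     # keep the k largest
    Lam_tilde_t = F_tilde.T @ X / T                 # transpose of the loading matrix
    return F_tilde, Lam_tilde_t

rng = np.random.default_rng(1)
T, N, r = 60, 100, 2
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) + rng.standard_normal((T, N))

F_tilde, Lam_tilde_t = pc_factors_T(X, k=2)
print(F_tilde.shape, Lam_tilde_t.shape)
print(np.allclose(F_tilde.T @ F_tilde / T, np.eye(2)))
```

Because the normalization makes $\tilde F^{k\prime}\tilde F^k = T I_k$, the least-squares loadings reduce to the simple product $\tilde F^{k\prime}X/T$, with no matrix inversion.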
The solution to the above minimization problem is not unique, even though the sum of squared residuals $V(k)$ is unique. Another solution is given by $(\bar F^k, \bar\Lambda^k)$, where $\bar\Lambda^k$ is constructed as $\sqrt{N}$ times the eigenvectors corresponding to the k largest eigenvalues of the N × N matrix X′X. The normalization that $\bar\Lambda^{k\prime}\bar\Lambda^k/N = I_k$ implies $\bar F^k = X\bar\Lambda^k/N$. The second set of calculations is computationally less costly when T > N, while the first is less intensive when T < N.⁶
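The claim that both routes lead to the same fit can be checked with a short sketch. The Python/NumPy code below is an illustration under stated assumptions (the helper names are hypothetical, and V is computed by a least-squares regression of X on the estimated factors): it builds $\tilde F^k$ from the T × T matrix XX′, $\bar F^k = X\bar\Lambda^k/N$ from the N × N matrix X′X, and the rescaled $\hat F^k = \bar F^k(\bar F^{k\prime}\bar F^k/T)^{1/2}$, and compares the resulting sums of squared residuals divided by NT.

```python
import numpy as np

def factors_from_T_side(X, k):
    """F_tilde: sqrt(T) times the top-k eigenvectors of the T x T matrix X X'."""
    T = X.shape[0]
    _, vecs = np.linalg.eigh(X @ X.T)               # ascending eigenvalues
    return np.sqrt(T) * vecs[:, ::-1][:, :k]

def factors_from_N_side(X, k):
    """F_bar = X Lambda_bar / N, Lambda_bar = sqrt(N) times top-k eigenvectors of X' X."""
    N = X.shape[1]
    _, vecs = np.linalg.eigh(X.T @ X)
    lam_bar = np.sqrt(N) * vecs[:, ::-1][:, :k]
    return X @ lam_bar / N

def sym_sqrt(M):
    """Symmetric square root of a positive semidefinite matrix."""
    w, v = np.linalg.eigh(M)
    return v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.T

def V(X, F):
    """Residual sum of squares / (N T) after regressing X on the factors F."""
    T, N = X.shape
    loadings, *_ = np.linalg.lstsq(F, X, rcond=None)   # k x N least-squares loadings
    return np.sum((X - F @ loadings) ** 2) / (N * T)

rng = np.random.default_rng(2)
T, N, k = 80, 50, 3       # T > N, the case where the N x N route is cheaper
X = rng.standard_normal((T, 3)) @ rng.standard_normal((3, N)) + rng.standard_normal((T, N))

F_tilde = factors_from_T_side(X, k)
F_bar = factors_from_N_side(X, k)
F_hat = F_bar @ sym_sqrt(F_bar.T @ F_bar / T)      # rescaled estimator

print(V(X, F_tilde), V(X, F_bar), V(X, F_hat))
```

All three factor estimates span the same column space, so the three values agree up to floating-point error; only the size of the eigenproblem (T × T versus N × N) differs.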
Define
$$ \hat F^k = \bar F^k (\bar F^{k\prime}\bar F^k/T)^{1/2}, $$
a rescaled estimator of the factors. The following theorem summarizes the asymptotic properties of the estimated factors.
Because the true factors ($F^0$) can only be identified up to scale, what is being considered is a rotation of $F^0$. The theorem establishes that the time average of the squared deviations between the estimated factors and those that lie in the true factor space vanishes as $N, T \to \infty$. The rate of convergence is determined by the smaller of N or T, and thus depends on the panel structure.
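Under the additional assumption that $\sum_{s=1}^{T} \gamma_N(s,t)^2 \le M$ for all t and T, the result⁷
$$ C_{NT}^2 \,\|\hat F_t^k - H^{k\prime} F_t^0\|^2 = O_p(1) \quad \text{for each } t \tag{6} $$
$$ V(k, \hat F^k) = \min_{\Lambda} \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \big(X_{it} - \lambda_i^{k\prime} \hat F_t^k\big)^2 \tag{7} $$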
denote the sum of squared residuals (divided by NT) when k factors are estimated. This sum of squared residuals does not depend on which estimate of F is used because they span the same vector space. That is, $V(k, \tilde F^k) = V(k, \bar F^k) = V(k, \hat F^k)$. We want to find penalty functions, $g(N,T)$, such that criteria of the form
$$ PC(k) = V(k, \hat F^k) + k\, g(N,T) $$
can consistently estimate r. Let kmax be a bounded integer such that r ≤ kmax.
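A convenient way to compute $V(k, \hat F^k)$ for every k up to kmax, assuming the factors are principal-components estimates, uses a standard linear-algebra fact (the Eckart–Young theorem): the rank-k principal-components fit leaves as its residual sum of squares the sum of the eigenvalues of XX′ beyond the kth. The Python/NumPy sketch below (the helper name `V_path` is hypothetical and the data are simulated) computes the whole path from one eigendecomposition and checks one entry against a direct regression.

```python
import numpy as np

def V_path(X, kmax):
    """V(k, F_hat^k) for k = 0..kmax from the eigenvalues of the T x T matrix X X'.

    The rank-k principal-components fit leaves the sum of the remaining
    eigenvalues as its residual sum of squares.
    """
    T, N = X.shape
    eigval = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]   # descending
    total = eigval.sum()                                  # equals the sum of X_it^2
    return np.array([(total - eigval[:k].sum()) / (N * T) for k in range(kmax + 1)])

rng = np.random.default_rng(3)
T, N, r, kmax = 100, 150, 2, 8
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) + rng.standard_normal((T, N))

V = V_path(X, kmax)

# Direct check for k = r: regress X on the top-r principal-components factors
_, vecs = np.linalg.eigh(X @ X.T)
F = np.sqrt(T) * vecs[:, ::-1][:, :r]
resid = X - F @ (F.T @ X / T)
print(V[r], np.sum(resid ** 2) / (N * T))
```

Since V(k) is nonincreasing in k, fit alone would always favor kmax; the penalty $k\,g(N,T)$ is what makes the minimizer informative about r.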
Theorem 2: Suppose that Assumptions A–D hold and that the k factors are estimated by principal components. Let $\hat k = \arg\min_{0 \le k \le \text{kmax}} PC(k)$. Then $\lim_{N,T\to\infty} \text{Prob}(\hat k = r) = 1$ if (i) $g(N,T) \to 0$ and (ii) $C_{NT}^2 \cdot g(N,T) \to \infty$ as $N, T \to \infty$, where $C_{NT} = \min\{\sqrt{N}, \sqrt{T}\}$.
Conditions (i) and (ii) are necessary in the sense that if one of the conditions
is violated, then there will exist a factor model satisfying Assumptions A–D, and
Note that $V(k, \hat F^k)$ is simply the average residual variance when k factors are
assumed for each cross-section unit. The IC criteria thus resemble information
criteria frequently used in time-series analysis, with the important difference that
the penalty here depends on both N and T .
Thus far, it has been assumed that the common factors are estimated by the method of principal components. Forni and Reichlin (1998) and Forni et al. (2000a) studied alternative estimation methods. However, the proof of Theorem 2 mainly uses the fact that $\hat F_t$ satisfies Theorem 1, and does not rely on principal components per se. We have the following corollary:
$$ \tilde C_{NT}^2 \left( \frac{1}{T} \sum_{t=1}^{T} \|G_t^k - H^{k\prime} F_t^0\|^2 \right) = O_p(1). \tag{8} $$
Then Theorem 2 still holds with $\hat F^k$ replaced by $G^k$ and $C_{NT}$ replaced by $\tilde C_{NT}$.
8 We are grateful to a referee whose question led to the results reported here.
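practice. Let $\hat\sigma^2$ be a consistent estimate of $(NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} E(e_{it})^2$. Consider the following criteria:
$$ PC_{p1}(k) = V(k, \hat F^k) + k\hat\sigma^2 \left(\frac{N+T}{NT}\right) \ln\left(\frac{NT}{N+T}\right); $$
$$ PC_{p2}(k) = V(k, \hat F^k) + k\hat\sigma^2 \left(\frac{N+T}{NT}\right) \ln C_{NT}^2; $$
$$ PC_{p3}(k) = V(k, \hat F^k) + k\hat\sigma^2 \left(\frac{\ln C_{NT}^2}{C_{NT}^2}\right). $$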
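The three criteria can be coded directly. The sketch below is a minimal Python/NumPy illustration, not the authors' Matlab code; it assumes principal-components factors, computes $V(k, \hat F^k)$ from the eigenvalues of XX′, and sets $\hat\sigma^2 = V(\text{kmax}, \hat F^{\text{kmax}})$, the replacement the text suggests for applications. The function name `pcp_select` and the simulated design (N = 200, T = 100, r = 3, error variance θ = r, kmax = 8) are illustrative assumptions.

```python
import numpy as np

def pcp_select(X, kmax):
    """k minimising PCp1, PCp2 and PCp3 for a T x N panel X (principal-components factors)."""
    T, N = X.shape
    eig = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]      # descending eigenvalues
    V = np.array([(eig.sum() - eig[:k].sum()) / (N * T) for k in range(kmax + 1)])
    sigma2_hat = V[kmax]                                   # sigma^2 replaced by V(kmax)
    C2 = min(N, T)                                         # C_NT^2 = min{N, T}
    ks = np.arange(kmax + 1)
    g = {
        "PCp1": (N + T) / (N * T) * np.log(N * T / (N + T)),
        "PCp2": (N + T) / (N * T) * np.log(C2),
        "PCp3": np.log(C2) / C2,
    }
    return {name: int(np.argmin(V + ks * sigma2_hat * pen)) for name, pen in g.items()}

rng = np.random.default_rng(4)
T, N, r = 100, 200, 3
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) + np.sqrt(r) * rng.standard_normal((T, N))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # demean and standardise each series
result = pcp_select(X, kmax=8)
print(result)
```

In this configuration the three criteria are expected to agree; their penalties differ enough to matter mainly when min{N, T} is small.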
Since $V(k, \hat F^k) = N^{-1} \sum_{i=1}^{N} \hat\sigma_i^2$, where $\hat\sigma_i^2 = \hat e_i'\hat e_i/T$, the criteria generalize the $C_p$ criterion of Mallows (1973), developed for selection of models in strict time-series or cross-section contexts, to a panel data setting. For this reason, we refer to these statistics as Panel $C_p$ ($PC_p$) criteria. Like the $C_p$ criterion, $\hat\sigma^2$ provides the proper scaling to the penalty term. In applications, it can be replaced by $V(\text{kmax}, \hat F^{\text{kmax}})$. The proposed penalty functions are based on the sample size in the smaller of the two dimensions. All three criteria satisfy conditions (i) and (ii) of Theorem 2 since $C_{NT}^{-2} \approx (N+T)/(NT) \to 0$ as $N, T \to \infty$. However, in finite samples, $C_{NT}^{-2} \le (N+T)/(NT)$. Hence, the three criteria, although asymptotically equivalent, will have different properties in finite samples.⁹
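The finite-sample inequality and conditions (i) and (ii) can be verified with a few lines of algebra. The derivation below is our own worked step, not one stated in the text; it writes $C^2 = C_{NT}^2 = \min\{N, T\}$ and $x = NT/(N+T)$, and omits the positive scaling factor $\hat\sigma^2$.

```latex
% (N+T)/(NT) = 1/N + 1/T = 1/x, and 1/C^2 <= 1/N + 1/T <= 2/C^2
\frac{1}{C^2} \le \frac{1}{N} + \frac{1}{T} \le \frac{2}{C^2}
\quad\Longrightarrow\quad \frac{C^2}{2} \le x \le C^2 .

% PC_{p1}: g_1 = \ln(x)/x
g_1 \to 0, \qquad C^2 g_1 = \frac{C^2}{x}\,\ln x \;\ge\; \ln x \;\ge\; \ln\frac{C^2}{2} \to \infty .

% PC_{p2}: g_2 = \ln(C^2)/x
g_2 \le \frac{2\ln C^2}{C^2} \to 0, \qquad C^2 g_2 = \frac{C^2}{x}\,\ln C^2 \;\ge\; \ln C^2 \to \infty .

% PC_{p3}: g_3 = \ln(C^2)/C^2
g_3 \to 0, \qquad C^2 g_3 = \ln C^2 \to \infty .
```

The first line also gives $C_{NT}^{-2} \le (N+T)/(NT)$ directly, since $x \le C^2$.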
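Corollary 1 leads to consideration of the following three criteria:
$$ IC_{p1}(k) = \ln(V(k, \hat F^k)) + k \left(\frac{N+T}{NT}\right) \ln\left(\frac{NT}{N+T}\right); $$
$$ IC_{p2}(k) = \ln(V(k, \hat F^k)) + k \left(\frac{N+T}{NT}\right) \ln C_{NT}^2; \tag{9} $$
$$ IC_{p3}(k) = \ln(V(k, \hat F^k)) + k \left(\frac{\ln C_{NT}^2}{C_{NT}^2}\right). $$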
The main advantage of these three panel information criteria ($IC_p$) is that they do not depend on the choice of kmax through $\hat\sigma^2$, which could be desirable in practice. The scaling by $\hat\sigma^2$ is implicitly performed by the logarithmic transformation of $V(k, \hat F^k)$ and thus is not required in the penalty term.
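A matching sketch for the $IC_p$ criteria uses a hypothetical Python/NumPy helper, `icp_select`, under the same assumptions as above (principal-components factors, $V(k, \hat F^k)$ from the eigenvalues of XX′, simulated data with N = 200, T = 100, r = 3). The only changes from the $PC_p$ version are the logarithm of V and the absence of $\hat\sigma^2$, so raising kmax does not rescale the penalty.

```python
import numpy as np

def icp_select(X, kmax):
    """k minimising ICp1, ICp2 and ICp3 for a T x N panel X (no sigma^2 scaling needed)."""
    T, N = X.shape
    eig = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]      # descending eigenvalues
    V = np.array([(eig.sum() - eig[:k].sum()) / (N * T) for k in range(kmax + 1)])
    C2 = min(N, T)                                         # C_NT^2 = min{N, T}
    ks = np.arange(kmax + 1)
    g = {
        "ICp1": (N + T) / (N * T) * np.log(N * T / (N + T)),
        "ICp2": (N + T) / (N * T) * np.log(C2),
        "ICp3": np.log(C2) / C2,
    }
    return {name: int(np.argmin(np.log(V) + ks * pen)) for name, pen in g.items()}

rng = np.random.default_rng(4)
T, N, r = 100, 200, 3
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) + np.sqrt(r) * rng.standard_normal((T, N))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # demean and standardise each series

result = icp_select(X, kmax=8)
print(result, icp_select(X, kmax=12))        # same choice for a larger kmax
```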
The proposed criteria differ from the conventional $C_p$ and information criteria used in time-series analysis in that $g(N,T)$ is a function of both N and T. To understand why the penalty must be specified as a function of the sample size in

9 Note that PCp1 and PCp2, and likewise, ICp1 and ICp2, apply specifically to the principal components estimator because $C_{NT}^2 = \min\{N, T\}$ is used in deriving them. For alternative estimators satisfying Corollary 2, criteria PCp3 and ICp3 are still applicable with $C_{NT}$ replaced by $\tilde C_{NT}$.
where the factors are T × r matrices of N(0, 1) variables, and the factor loadings are N(0, 1) variates. Hence, the common component of $X_{it}$, denoted by $c_{it}$, has variance r. Results with $\lambda_{ij}$ uniformly distributed are similar and will not be reported. Our base case assumes that the idiosyncratic component has the same variance as the common component (i.e., θ = r). We consider thirty configurations of the data. The first five simulate plausible asset pricing applications with five years of monthly data (T = 60) on 100 to 2000 asset returns. We then increase T to 100. Configurations with N = 60, T = 100 and 200 are plausible sizes of datasets for sectors, states, regions, and countries. Other configurations are considered to assess the general properties of the proposed criteria. All computations were performed using Matlab Version 5.3.
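The Monte Carlo design can be sketched as follows. This Python/NumPy code is a simplified illustration, not the authors' Matlab code: it draws N(0, 1) factors, loadings, and errors for $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$ with θ = r, demeans and standardizes each series before the eigen computation, and reports the average $\hat k$ from PCp1 and ICp1 over a small, arbitrary number of replications (10 rather than 1000). The helper names are hypothetical.

```python
import numpy as np

def simulate_panel(N, T, r, theta, rng):
    """T x N panel X_it = sum_j lambda_ij F_tj + sqrt(theta) e_it, all draws N(0, 1)."""
    F = rng.standard_normal((T, r))
    Lam = rng.standard_normal((N, r))
    e = rng.standard_normal((T, N))
    return F @ Lam.T + np.sqrt(theta) * e

def khat_pcp1_icp1(X, kmax=8):
    """Estimated number of factors under PCp1 and ICp1 after standardising each series."""
    T, N = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eig = np.sort(np.linalg.eigvalsh(Z @ Z.T))[::-1]
    V = np.array([(eig.sum() - eig[:k].sum()) / (N * T) for k in range(kmax + 1)])
    ks = np.arange(kmax + 1)
    g1 = (N + T) / (N * T) * np.log(N * T / (N + T))
    k_pcp1 = int(np.argmin(V + ks * V[kmax] * g1))   # sigma^2 replaced by V(kmax)
    k_icp1 = int(np.argmin(np.log(V) + ks * g1))
    return k_pcp1, k_icp1

rng = np.random.default_rng(5)
N, T, r, reps = 200, 100, 5, 10
estimates = np.array([khat_pcp1_icp1(simulate_panel(N, T, r, theta=r, rng=rng)) for _ in range(reps)])
print(estimates.mean(axis=0))   # average k_hat for (PCp1, ICp1)
```

For N = 200 and T = 100 with r = 5, Table III reports averages of 5.00 for both criteria, so the sketch should return values at or very near 5.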
Reported in Tables I to III are the averages of $\hat k$ over 1000 replications, for r = 1, 3, and 5, respectively, assuming that $e_{it}$ is homoskedastic N(0, 1). For all cases, the maximum number of factors, kmax, is set to 8.¹⁰ Prior to computation of the eigenvectors, each series is demeaned and standardized to have unit variance. Of the three $PC_p$ criteria that satisfy Theorem 2, PCp3 is less robust than PCp1 and PCp2 when N or T is small. The $IC_p$ criteria generally have properties very similar to the $PC_p$ criteria. The term NT/(N + T) provides a small sample correction to the asymptotic convergence rate of $C_{NT}^2$ and has the effect of adjusting the penalty upwards. The simulations show this adjustment to be desirable. When min{N, T} is 40 or larger, the proposed criteria give precise estimates of the number of factors. Since our theory is based on large N and T, it is not surprising that for very small N or T, the proposed criteria are inadequate. Results reported in the last five rows of each table indicate that the $IC_p$ criteria tend to underparameterize, while the $PC_p$ criteria tend to overparameterize, but the problem is still less severe than for the AIC and the BIC, which we now consider.
The AICs and BICs that are functions of only N or T tend to choose too many factors. The AIC3 performs somewhat better than AIC1 and AIC2, but still tends to overparameterize. At first glance, the BIC3 appears to perform well. Although BIC3 resembles PCp2, the former penalizes an extra factor more heavily since $\ln(NT) > \ln C_{NT}^2$. As can be seen from Tables II and III, the BIC3 tends to underestimate r, and the problem becomes more severe as r increases.
Table IV relaxes the assumption of homoskedasticity. Instead, we let $e_{it} = e_{it}^1$ for t odd, and $e_{it} = e_{it}^1 + e_{it}^2$ for t even, where $e_{it}^1$ and $e_{it}^2$ are independent N(0, 1). Thus, the variance in the even periods is twice as large as in the odd periods. For brevity, we only report results for r = 5. PCp1, PCp2, ICp1, and ICp2 continue to select the true number of factors very accurately and dominate the remaining criteria considered.
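The heteroskedastic design is easy to reproduce. The Python/NumPy sketch below (the helper name is hypothetical and the sizes are arbitrary) builds $e_{it} = e_{it}^1 + \delta_t e_{it}^2$ with $\delta_t = 1$ in even periods and 0 in odd periods, indexing time from t = 1 as in the text, and compares the even-period and odd-period variances.

```python
import numpy as np

def heteroskedastic_errors(N, T, rng):
    """e_it = e1_it + delta_t * e2_it, delta_t = 1 for t even and 0 for t odd (t = 1..T)."""
    e1 = rng.standard_normal((T, N))
    e2 = rng.standard_normal((T, N))
    t = np.arange(1, T + 1)                         # time index starting at 1
    delta = (t % 2 == 0).astype(float)[:, None]     # T x 1 indicator of even periods
    return e1 + delta * e2

rng = np.random.default_rng(6)
e = heteroskedastic_errors(N=500, T=200, rng=rng)
odd_var = e[0::2].var()    # rows 0, 2, ... are t = 1, 3, ... (odd periods)
even_var = e[1::2].var()   # rows 1, 3, ... are t = 2, 4, ... (even periods)
print(round(even_var / odd_var, 2))
```

The printed ratio should be close to 2, the variance ratio stated in the text.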
We then vary the variance of the idiosyncratic errors relative to the common component. When θ < r, the variance of the common component is relatively large. Not surprisingly, the proposed criteria give precise estimates of r. These results are not reported for brevity. Table V considers the case θ = 2r. Since the variance of the idiosyncratic component is larger than the
10 In time-series analysis, a rule such as $8\,\mathrm{int}[(T/100)^{1/4}]$ considered in Schwert (1989) is sometimes used to set kmax, but no such guide is available for panel analysis. Until further results are available, a rule that replaces T in Schwert's rule by min{N, T} could be considered.
TABLE I
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; r = 1; θ = 1.
N     T     PCp1 PCp2 PCp3 ICp1 ICp2 ICp3 AIC1 BIC1 AIC2 BIC2 AIC3 BIC3
100   40    1.02 1.00 2.97 1.00 1.00 1.00 8.00 2.97 8.00 8.00 7.57 1.00
100   60    1.00 1.00 2.41 1.00 1.00 1.00 8.00 2.41 8.00 8.00 7.11 1.00
200   60    1.00 1.00 1.00 1.00 1.00 1.00 8.00 1.00 8.00 8.00 5.51 1.00
500   60    1.00 1.00 1.00 1.00 1.00 1.00 5.21 1.00 8.00 8.00 1.57 1.00
1000  60    1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00
2000  60    1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00
100   100   1.00 1.00 3.24 1.00 1.00 1.00 8.00 3.24 8.00 3.24 6.68 1.00
200   100   1.00 1.00 1.00 1.00 1.00 1.00 8.00 1.00 8.00 8.00 5.43 1.00
500   100   1.00 1.00 1.00 1.00 1.00 1.00 8.00 1.00 8.00 8.00 1.55 1.00
1000  100   1.00 1.00 1.00 1.00 1.00 1.00 1.08 1.00 8.00 8.00 1.00 1.00
2000  100   1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00
40    100   1.01 1.00 2.69 1.00 1.00 1.00 8.00 8.00 8.00 2.69 7.33 1.00
60    100   1.00 1.00 2.25 1.00 1.00 1.00 8.00 8.00 8.00 2.25 6.99 1.00
60    200   1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 8.00 1.00 5.14 1.00
60    500   1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 4.67 1.00 1.32 1.00
60    1000  1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00 1.00 1.00
60    2000  1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00 1.00 1.00
4000  60    1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00
4000  100   1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00
8000  60    1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00
8000  100   1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00
60    4000  1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00 1.00 1.00
100   4000  1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00 1.00 1.00
60    8000  1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00 1.00 1.00
100   8000  1.00 1.00 1.00 1.00 1.00 1.00 8.00 8.00 1.00 1.00 1.00 1.00
10    50    8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 7.18
10    100   8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 5.88
20    100   4.73 3.94 6.29 1.00 1.00 1.00 8.00 8.00 8.00 6.29 8.00 1.00
100   10    8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00
100   20    5.62 4.81 7.16 1.00 1.00 1.00 8.00 7.16 8.00 8.00 8.00 1.00
Notes: Tables I–VIII report the estimated number of factors ($\hat k$) averaged over 1000 simulations. The true number of factors is r and kmax = 8. When the average of $\hat k$ is an integer, the corresponding standard error is zero. In the few cases when the averaged $\hat k$ over replications is not an integer, the standard errors are no larger than .6. In view of the precision of the estimates in the majority of cases, the standard errors in the simulations are not reported. The last five rows of each table are for models of small dimensions (either N or T is small).
common component, one might expect the common factors to be estimated with less precision. Indeed, ICp1 and ICp2 underestimate r when min{N, T} < 60, but the criteria still select values of k that are very close to r for other configurations of the data.
The models considered thus far have idiosyncratic errors that are uncorrelated across units and across time. For these strict factor models, the preferred criteria are PCp1, PCp2, ICp1, and ICp2. It should be emphasized that the results reported are the averages of $\hat k$ over 1000 simulations. We do not report the standard deviations of these averages because they are identically zero except for a few
TABLE II
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; r = 3; θ = 3.
N     T     PCp1 PCp2 PCp3 ICp1 ICp2 ICp3 AIC1 BIC1 AIC2 BIC2 AIC3 BIC3
100   40    3.00 3.00 3.90 3.00 3.00 3.00 8.00 3.90 8.00 8.00 7.82 2.90
100   60    3.00 3.00 3.54 3.00 3.00 3.00 8.00 3.54 8.00 8.00 7.53 2.98
200   60    3.00 3.00 3.00 3.00 3.00 3.00 8.00 3.00 8.00 8.00 6.14 3.00
500   60    3.00 3.00 3.00 3.00 3.00 3.00 5.95 3.00 8.00 8.00 3.13 3.00
1000  60    3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 3.00
2000  60    3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 3.00
100   100   3.00 3.00 4.23 3.00 3.00 3.00 8.00 4.23 8.00 4.23 7.20 3.00
200   100   3.00 3.00 3.00 3.00 3.00 3.00 8.00 3.00 8.00 8.00 6.21 3.00
500   100   3.00 3.00 3.00 3.00 3.00 3.00 8.00 3.00 8.00 8.00 3.15 3.00
1000  100   3.00 3.00 3.00 3.00 3.00 3.00 3.01 3.00 8.00 8.00 3.00 3.00
2000  100   3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 3.00
40    100   3.00 3.00 3.70 3.00 3.00 3.00 8.00 8.00 8.00 3.70 7.63 2.92
60    100   3.00 3.00 3.42 3.00 3.00 3.00 8.00 8.00 8.00 3.42 7.39 2.99
60    200   3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 8.00 3.00 5.83 3.00
60    500   3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 5.44 3.00 3.03 3.00
60    1000  3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 3.00 3.00 3.00
60    2000  3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 3.00 3.00 3.00
4000  60    3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 2.98
4000  100   3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 3.00
8000  60    3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 2.97
8000  100   3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 3.00
60    4000  3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 3.00 3.00 2.99
100   4000  3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 3.00 3.00 3.00
60    8000  3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 3.00 3.00 2.98
100   8000  3.00 3.00 3.00 3.00 3.00 3.00 8.00 8.00 3.00 3.00 3.00 3.00
10    50    8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 7.21
10    100   8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 6.01
20    100   5.22 4.57 6.62 2.95 2.92 2.98 8.00 8.00 8.00 6.62 8.00 2.68
100   10    8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00
100   20    6.00 5.29 7.39 2.95 2.91 2.99 8.00 7.39 8.00 8.00 8.00 2.72
cases for which the average itself is not an integer. Even for these latter cases,
the standard deviations do not exceed 0.6.
We next modify the assumption on the idiosyncratic errors to allow for serial and cross-section correlation. These errors are generated from the process
$$ e_{it} = \rho e_{i,t-1} + v_{it} + \sum_{j=-J,\, j\neq 0}^{J} \beta v_{i-j,t}. $$
The case of pure serial correlation obtains when the cross-section correlation parameter β is zero. Since, for each i, the unconditional variance of $e_{it}$ is $1/(1-\rho^2)$, the more persistent the idiosyncratic errors are, the larger their variances are relative to the common factors, and the precision of the estimates can be expected to fall. However, even with ρ = .5, Table VI shows that the estimates provided by the proposed criteria are still very good. The case of pure cross-
TABLE III
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; r = 5; θ = 5.
N     T     PCp1 PCp2 PCp3 ICp1 ICp2 ICp3 AIC1 BIC1 AIC2 BIC2 AIC3 BIC3
100   40    4.99 4.98 5.17 4.88 4.68 4.99 8.00 5.17 8.00 8.00 7.94 3.05
100   60    5.00 5.00 5.07 4.99 4.94 5.00 8.00 5.07 8.00 8.00 7.87 3.50
200   60    5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 6.91 3.80
500   60    5.00 5.00 5.00 5.00 5.00 5.00 6.88 5.00 8.00 8.00 5.01 3.88
1000  60    5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 3.82
2000  60    5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 3.59
100   100   5.00 5.00 5.42 5.00 5.00 5.01 8.00 5.42 8.00 5.42 7.75 4.16
200   100   5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 7.06 4.80
500   100   5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 5.02 4.97
1000  100   5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 4.98
2000  100   5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 4.98
40    100   5.00 4.99 5.09 4.86 4.69 5.00 8.00 8.00 8.00 5.09 7.86 2.96
60    100   5.00 5.00 5.05 4.99 4.94 5.00 8.00 8.00 8.00 5.05 7.81 3.46
60    200   5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 8.00 5.00 6.71 3.83
60    500   5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 6.44 5.00 5.00 3.91
60    1000  5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 3.79
60    2000  5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 3.58
4000  60    5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 3.37
4000  100   5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 4.96
8000  60    5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 3.10
8000  100   5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 4.93
60    4000  5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 3.35
100   4000  5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 4.96
60    8000  5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 3.12
100   8000  5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 4.93
10    50    8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 7.28
10    100   8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 6.30
20    100   5.88 5.41 6.99 4.17 3.79 4.68 8.00 8.00 8.00 6.99 8.00 2.79
100   10    8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00
100   20    6.49 5.94 7.62 4.24 3.87 4.81 8.00 7.62 8.00 8.00 8.00 2.93
TABLE IV
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; $e_{it} = e_{it}^1 + \delta_t e_{it}^2$ ($\delta_t = 1$ for t even, $\delta_t = 0$ for t odd); r = 5; θ = 5.
N     T     PCp1 PCp2 PCp3 ICp1 ICp2 ICp3 AIC1 BIC1 AIC2 BIC2 AIC3 BIC3
100   40    4.96 4.86 6.09 4.09 3.37 4.93 8.00 6.09 8.00 8.00 8.00 1.81
100   60    4.99 4.90 5.85 4.69 4.18 5.01 8.00 5.85 8.00 8.00 8.00 2.08
200   60    5.00 4.99 5.00 4.93 4.87 5.00 8.00 5.00 8.00 8.00 8.00 2.22
500   60    5.00 5.00 5.00 4.99 4.98 5.00 8.00 5.00 8.00 8.00 7.91 2.23
1000  60    5.00 5.00 5.00 5.00 5.00 5.00 7.97 5.00 8.00 8.00 6.47 2.02
2000  60    5.00 5.00 5.00 5.00 5.00 5.00 5.51 5.00 8.00 8.00 5.03 1.72
100   100   5.00 4.98 6.60 4.98 4.79 5.24 8.00 6.60 8.00 6.60 8.00 2.56
200   100   5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 8.00 3.33
500   100   5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 7.94 3.93
1000  100   5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 6.13 3.98
2000  100   5.00 5.00 5.00 5.00 5.00 5.00 5.36 5.00 8.00 8.00 5.00 3.85
40    100   4.94 4.80 5.39 4.04 3.30 4.90 8.00 8.00 8.00 5.39 7.99 1.68
60    100   4.98 4.88 5.41 4.66 4.14 5.00 8.00 8.00 8.00 5.41 7.99 2.04
60    200   5.00 4.99 5.00 4.95 4.87 5.00 8.00 8.00 8.00 5.00 7.56 2.14
60    500   5.00 5.00 5.00 4.99 4.98 5.00 8.00 8.00 7.29 5.00 5.07 2.13
60    1000  5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 1.90
60    2000  5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 1.59
4000  60    5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 1.46
4000  100   5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 3.67
8000  60    5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 1.16
8000  100   5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 3.37
60    4000  5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 1.30
100   4000  5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 3.62
60    8000  5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 1.08
100   8000  5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 3.29
10    50    8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 7.27
10    100   8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 6.34
20    100   6.13 5.62 7.23 2.85 2.23 3.93 8.00 8.00 8.00 7.23 8.00 1.86
100   10    8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00
100   20    7.52 6.99 7.99 3.31 2.64 6.17 8.00 7.99 8.00 8.00 8.00 2.30
the estimates. However, it will generally be true that for the proposed criteria to be as precise in approximate as in strict factor models, N has to be fairly large relative to J, β cannot be too large, and the errors cannot be too persistent, as required by theory. It is also noteworthy that the BIC3 has very good properties in the presence of cross-section correlations (see Tables VII and VIII) and the criterion can be useful in practice even though it does not satisfy all the conditions of Theorem 2.
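The serially and cross-sectionally correlated errors used in these experiments can be generated along the following lines. This Python/NumPy sketch rests on assumptions the text does not settle: it does not say how units near the edges of the cross-section (where i − j falls outside 1, ..., N) or the initial value of the AR(1) recursion are handled, so here out-of-range neighbours are simply dropped, the recursion starts at zero, and a burn-in of 100 periods is discarded. The helper name `correlated_errors` is hypothetical.

```python
import numpy as np

def correlated_errors(N, T, rho, beta, J, rng, burn=100):
    """e_it = rho * e_i,t-1 + v_it + beta * sum_{j=-J..J, j != 0} v_{i-j,t}.

    Assumptions (not specified in the text): neighbours outside 1..N are dropped,
    the recursion starts at zero, and the first `burn` periods are discarded.
    """
    v = rng.standard_normal((T + burn, N))
    u = v.copy()                                  # u_it = v_it + beta * (neighbour terms)
    for j in range(1, J + 1):
        u[:, j:] += beta * v[:, :-j]              # adds v_{i-j,t}
        u[:, :-j] += beta * v[:, j:]              # adds v_{i+j,t}
    e = np.zeros_like(u)
    e[0] = u[0]
    for t in range(1, T + burn):
        e[t] = rho * e[t - 1] + u[t]              # AR(1) in the time dimension
    return e[burn:]                               # T x N

rng = np.random.default_rng(7)
e = correlated_errors(N=400, T=500, rho=0.5, beta=0.0, J=0, rng=rng)     # pure serial
e_cs = correlated_errors(N=400, T=500, rho=0.0, beta=0.5, J=1, rng=rng)  # pure cross-section
print(round(e.var(), 3), round(e_cs[:, 1:-1].var(), 3))
```

With β = 0 the sample variance should be near $1/(1-\rho^2) = 4/3$, the value given in the text; with ρ = 0 and J = 1, an interior unit has variance $1 + 2\beta^2 = 1.5$.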
TABLE V
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; r = 5; θ = r × 2.
N     T     PCp1 PCp2 PCp3 ICp1 ICp2 ICp3 AIC1 BIC1 AIC2 BIC2 AIC3 BIC3
100   40    4.63 4.29 5.14 2.79 1.91 4.47 8.00 8.00 8.00 5.14 7.93 0.82
100   60    4.78 4.41 5.06 3.73 2.61 4.96 8.00 8.00 8.00 5.06 7.86 0.92
200   60    4.90 4.80 5.00 4.42 4.03 4.94 8.00 8.00 8.00 5.00 6.92 0.93
500   60    4.96 4.94 4.99 4.77 4.68 4.92 8.00 8.00 6.88 4.99 5.01 0.77
1000  60    4.97 4.97 4.98 4.88 4.86 4.93 8.00 8.00 5.00 4.98 5.00 0.56
2000  60    4.98 4.98 4.99 4.91 4.89 4.92 8.00 8.00 5.00 4.99 5.00 0.34
100   100   4.96 4.67 5.42 4.64 3.61 5.01 8.00 5.42 8.00 5.42 7.74 1.23
200   100   5.00 4.99 5.00 4.98 4.90 5.00 8.00 8.00 8.00 5.00 7.05 1.80
500   100   5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 8.00 5.00 5.02 2.19
1000  100   5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 2.17
2000  100   5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 2.06
40    100   4.61 4.25 5.07 2.65 1.84 4.48 8.00 5.07 8.00 8.00 7.83 0.74
60    100   4.76 4.38 5.05 3.66 2.60 4.97 8.00 5.05 8.00 8.00 7.81 0.92
60    200   4.90 4.78 5.00 4.43 4.07 4.95 8.00 5.00 8.00 8.00 6.70 0.88
60    500   4.97 4.95 4.99 4.78 4.71 4.93 6.44 4.99 8.00 8.00 5.00 0.74
60    1000  4.98 4.97 4.99 4.87 4.84 4.92 5.00 4.99 8.00 8.00 5.00 0.51
60    2000  4.99 4.98 4.99 4.89 4.88 4.92 5.00 4.99 8.00 8.00 5.00 0.32
4000  60    4.99 4.99 4.99 4.92 4.92 4.93 8.00 8.00 5.00 4.99 5.00 0.18
4000  100   5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 1.72
8000  60    4.99 4.99 4.99 4.92 4.92 4.93 8.00 8.00 5.00 4.99 5.00 0.08
8000  100   5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 1.40
60    4000  4.99 4.99 4.99 4.93 4.92 4.95 5.00 4.99 8.00 8.00 5.00 0.15
100   4000  5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 1.70
60    8000  4.99 4.99 4.99 4.92 4.92 4.93 5.00 4.99 8.00 8.00 5.00 0.08
100   8000  5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 1.40
100   10    8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 7.24
100   20    8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 6.18
10    50    5.73 5.22 6.90 1.67 1.33 2.79 8.00 6.90 8.00 8.00 8.00 1.12
10    100   8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00
20    100   6.39 5.79 7.57 1.85 1.44 3.04 8.00 8.00 8.00 7.57 8.00 1.31
TABLE VI
DGP: $X_{it} = \sum_{j=1}^{r}\lambda_{ij}F_{tj} + \sqrt{\theta}\,e_{it}$; $e_{it} = \rho e_{i,t-1} + v_{it} + \beta\sum_{j=-J,\,j\neq 0}^{J}v_{i-j,t}$; $r = 5$; $\theta = 5$, $\rho = .5$, $\beta = 0$, $J = 0$.
N T PCp1 PCp2 PCp3 ICp1 ICp2 ICp3 AIC1 BIC1 AIC2 BIC2 AIC3 BIC3
100 40 7.31 6.59 8.00 5.52 4.53 8.00 8.00 8.00 8.00 8.00 8.00 2.97
100 60 6.11 5.27 8.00 5.00 4.76 8.00 8.00 8.00 8.00 8.00 8.00 3.09
200 60 5.94 5.38 7.88 5.01 4.99 7.39 8.00 7.88 8.00 8.00 8.00 3.31
500 60 5.68 5.39 6.79 5.00 5.00 5.11 8.00 6.79 8.00 8.00 8.00 3.41
1000 60 5.41 5.27 6.02 5.00 5.00 5.00 8.00 6.02 8.00 8.00 8.00 3.27
2000 60 5.21 5.14 5.50 5.00 5.00 5.00 8.00 5.50 8.00 8.00 8.00 3.06
100 100 5.04 5.00 8.00 5.00 4.97 8.00 8.00 8.00 8.00 8.00 8.00 3.45
200 100 5.00 5.00 7.75 5.00 5.00 7.12 8.00 7.75 8.00 8.00 8.00 4.26
500 100 5.00 5.00 5.21 5.00 5.00 5.00 8.00 5.21 8.00 8.00 8.00 4.68
1000 100 5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 8.00 4.73
2000 100 5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 8.00 4.69
40 100 5.37 5.05 7.30 4.58 4.08 5.82 8.00 8.00 8.00 7.30 8.00 2.45
60 100 5.13 4.99 7.88 4.93 4.67 7.40 8.00 8.00 8.00 7.88 8.00 2.80
60 200 5.00 5.00 5.02 4.99 4.96 5.00 8.00 8.00 8.00 5.02 8.00 2.84
60 500 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 8.00 5.00 7.53 2.72
60 1000 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.72 5.00 5.04 2.54
60 2000 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 2.28
4000 60 5.11 5.08 5.22 5.00 5.00 5.00 8.00 5.22 8.00 8.00 8.00 2.81
4000 100 5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 8.00 4.62
8000 60 5.05 5.05 5.08 5.00 5.00 5.00 8.00 5.08 8.00 8.00 8.00 2.55
8000 100 5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 8.00 4.37
60 4000 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 1.92
100 4000 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 4.21
60 8000 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 1.64
100 8000 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00 5.00 3.97
100 10 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 7.47
100 20 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 6.69
10 50 7.16 6.68 7.89 3.57 2.92 5.70 8.00 8.00 8.00 7.89 8.00 2.42
10 100 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00
20 100 8.00 7.99 8.00 7.93 7.58 8.00 8.00 8.00 8.00 8.00 8.00 3.92
7 concluding remarks
In this paper, we propose criteria for the selection of factors in large dimensional panels. The main appeal of our results is that they are developed under the assumption that $N, T \to \infty$ and are thus appropriate for many datasets typically used in macroeconomic analysis. Some degree of correlation in the errors is also allowed. The criteria should be useful in applications in which the number of factors has traditionally been assumed rather than determined by the data.
TABLE VII
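DGP: $X_{it} = \sum_{j=1}^{r}\lambda_{ij}F_{tj} + \sqrt{\theta}\,e_{it}$; $e_{it} = \rho e_{i,t-1} + v_{it} + \beta\sum_{j=-J,\,j\neq 0}^{J}v_{i-j,t}$; $r = 5$; $\theta = 5$, $\rho = 0.0$, $\beta = .20$, $J = \max\{N/20, 10\}$.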
N T PCp1 PCp2 PCp3 ICp1 ICp2 ICp3 AIC1 BIC1 AIC2 BIC2 AIC3 BIC3
100 40 5.50 5.27 6.02 5.09 5.01 5.63 8.00 6.02 8.00 8.00 7.98 4.24
100 60 5.57 5.24 6.03 5.15 5.02 5.96 8.00 6.03 8.00 8.00 7.96 4.72
200 60 5.97 5.94 6.00 5.88 5.76 5.99 8.00 6.00 8.00 8.00 7.63 4.89
500 60 5.01 5.01 5.10 5.00 5.00 5.01 7.44 5.10 8.00 8.00 6.00 4.93
1000 60 5.00 5.00 5.00 5.00 5.00 5.00 5.98 5.00 8.00 8.00 5.93 4.93
2000 60 5.00 5.00 5.00 5.00 5.00 5.00 5.05 5.00 8.00 8.00 5.01 4.88
100 100 5.79 5.30 6.31 5.43 5.04 6.03 8.00 6.31 8.00 6.31 7.95 4.98
200 100 6.00 6.00 6.00 6.00 5.98 6.00 8.00 6.00 8.00 8.00 7.84 5.00
500 100 5.21 5.11 5.64 5.06 5.03 5.41 8.00 5.64 8.00 8.00 6.02 5.00
1000 100 5.00 5.00 5.00 5.00 5.00 5.00 6.00 5.00 8.00 8.00 6.00 5.00
2000 100 5.00 5.00 5.00 5.00 5.00 5.00 5.72 5.00 8.00 8.00 5.41 5.00
40 100 5.17 5.06 5.95 5.00 4.98 5.30 8.00 8.00 8.00 5.95 7.96 4.22
60 100 5.30 5.06 6.01 5.03 5.00 5.87 8.00 8.00 8.00 6.01 7.94 4.69
60 200 5.35 5.16 5.95 5.04 5.01 5.65 8.00 8.00 8.00 5.95 7.39 4.89
60 500 5.43 5.29 5.83 5.05 5.02 5.35 8.00 8.00 7.49 5.83 6.04 4.94
60 1000 5.55 5.45 5.79 5.08 5.05 5.25 8.00 8.00 6.01 5.79 6.00 4.93
60 2000 5.64 5.59 5.76 5.07 5.04 5.17 8.00 8.00 6.00 5.76 6.00 4.91
4000 60 5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 4.84
4000 100 5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00
8000 60 5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 4.72
8000 100 5.00 5.00 5.00 5.00 5.00 5.00 5.00 5.00 8.00 8.00 5.00 5.00
60 4000 5.65 5.63 5.72 5.05 5.04 5.09 8.00 8.00 6.00 5.72 6.00 4.85
100 4000 6.00 6.00 6.00 6.00 6.00 6.00 8.00 8.00 6.14 6.00 6.02 5.00
60 8000 5.67 5.66 5.71 5.04 5.04 5.05 8.00 8.00 6.00 5.71 6.00 4.77
100 8000 6.00 6.00 6.00 6.00 6.00 6.00 8.00 8.00 6.00 6.00 6.00 5.00
100 10 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 7.34
100 20 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 6.49
10 50 6.23 5.84 7.18 4.82 4.67 5.14 8.00 8.00 8.00 7.18 8.00 3.72
10 100 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00
20 100 6.75 6.27 7.75 4.97 4.73 5.71 8.00 7.75 8.00 8.00 8.00 3.81
TABLE VIII
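DGP: $X_{it} = \sum_{j=1}^{r}\lambda_{ij}F_{tj} + \sqrt{\theta}\,e_{it}$; $e_{it} = \rho e_{i,t-1} + v_{it} + \beta\sum_{j=-J,\,j\neq 0}^{J}v_{i-j,t}$; $r = 5$; $\theta = 5$, $\rho = .50$, $\beta = .20$, $J = \max\{N/20, 10\}$.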
N T PCp1 PCp2 PCp3 ICp1 ICp2 ICp3 AIC1 BIC1 AIC2 BIC2 AIC3 BIC3
100 40 7.54 6.92 8.00 6.43 5.52 8.00 8.00 8.00 8.00 8.00 8.00 4.14
100 60 6.57 5.93 8.00 5.68 5.28 8.00 8.00 8.00 8.00 8.00 8.00 4.39
200 60 6.52 6.15 7.97 6.00 5.91 7.84 8.00 7.97 8.00 8.00 8.00 4.68
500 60 6.16 5.97 7.12 5.40 5.30 5.92 8.00 7.12 8.00 8.00 8.00 4.76
1000 60 5.71 5.56 6.20 5.03 5.02 5.08 8.00 6.20 8.00 8.00 8.00 4.76
2000 60 5.33 5.26 5.61 5.00 5.00 5.00 8.00 5.61 8.00 8.00 8.00 4.69
100 100 5.98 5.71 8.00 5.72 5.27 8.00 8.00 8.00 8.00 8.00 8.00 4.80
200 100 6.01 6.00 7.95 6.00 5.99 7.78 8.00 7.95 8.00 8.00 8.00 5.03
500 100 5.89 5.81 6.06 5.59 5.46 5.94 8.00 6.06 8.00 8.00 8.00 5.00
1000 100 5.13 5.09 5.37 5.01 5.01 5.09 8.00 5.37 8.00 8.00 8.00 5.00
2000 100 5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 8.00 5.00
40 100 5.88 5.46 7.55 5.07 4.93 6.57 8.00 8.00 8.00 7.55 8.00 3.76
60 100 5.84 5.45 7.96 5.24 5.05 7.79 8.00 8.00 8.00 7.96 8.00 4.25
60 200 5.67 5.44 5.99 5.20 5.07 5.83 8.00 8.00 8.00 5.99 8.00 4.42
60 500 5.59 5.47 5.88 5.13 5.08 5.48 8.00 8.00 8.00 5.88 7.91 4.50
60 1000 5.61 5.54 5.81 5.13 5.08 5.34 8.00 8.00 6.91 5.81 6.15 4.40
60 2000 5.64 5.60 5.74 5.11 5.08 5.22 8.00 8.00 6.00 5.74 6.00 4.27
4000 60 5.12 5.10 5.24 5.00 5.00 5.00 8.00 5.24 8.00 8.00 8.00 4.56
4000 100 5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 8.00 5.00
8000 60 5.05 5.05 5.08 5.00 5.00 5.00 8.00 5.08 8.00 8.00 8.00 4.37
8000 100 5.00 5.00 5.00 5.00 5.00 5.00 8.00 5.00 8.00 8.00 8.00 5.00
60 4000 5.63 5.61 5.70 5.07 5.06 5.12 8.00 8.00 6.00 5.70 6.00 4.04
100 4000 6.00 6.00 6.00 6.00 6.00 6.00 8.00 8.00 6.44 6.00 6.17 5.00
60 8000 5.63 5.62 5.68 5.06 5.05 5.07 8.00 8.00 6.00 5.68 6.00 3.83
100 8000 6.00 6.00 6.00 6.00 6.00 6.00 8.00 8.00 6.08 6.00 6.02 5.00
100 10 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 7.54
100 20 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 6.85
10 50 7.34 6.87 7.93 4.84 4.37 6.82 8.00 8.00 8.00 7.93 8.00 3.41
10 100 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00
20 100 8.00 8.00 8.00 7.99 7.84 8.00 8.00 8.00 8.00 8.00 8.00 4.54
of the estimated common factors and common components (i.e., $\hat\lambda_i'\hat F_t$). But using Theorem 1, it may be possible to obtain these limiting distributions. For example, the rate of convergence of $\hat F_t$ derived in this paper could be used to examine the statistical property of the forecast $\hat y_{T+1|T}$ in Stock and Watson's framework. It would be useful to show that $\hat y_{T+1|T}$ is not only a consistent but a $\sqrt{T}$-consistent estimator of $y_{T+1}$, conditional on the information up to time T (provided that N is of no smaller order of magnitude than T). Additional asymptotic results are currently being investigated by the authors.
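To make the two-step forecast discussed above concrete, the sketch below first estimates factors by principal components and then regresses $y_{t+1}$ on a constant and $\hat F_t$, using the fitted coefficients at $t = T$ to forecast $y_{T+1}$. The names `pca_factors` and `diffusion_index_forecast` are hypothetical, and the one-step-ahead linear regression is our simplifying reading of Stock and Watson's framework; nothing here establishes the $\sqrt{T}$-consistency discussed in the text.

```python
import numpy as np

def pca_factors(X, k):
    """Principal components factors for a T x N panel X:
    sqrt(T) times the eigenvectors of XX' for the k largest eigenvalues."""
    T = X.shape[0]
    _, vec = np.linalg.eigh(X @ X.T)          # eigenvalues in ascending order
    return np.sqrt(T) * vec[:, ::-1][:, :k]

def diffusion_index_forecast(X, y, k):
    """Regress y_{t+1} on (1, F_hat_t) for t = 1..T-1 and return the fitted
    value at t = T, i.e. a forecast of y_{T+1}."""
    F = pca_factors(X, k)
    Z = np.column_stack([np.ones(len(F)), F])
    coef, *_ = np.linalg.lstsq(Z[:-1], y[1:], rcond=None)
    return float(Z[-1] @ coef)

# Illustrative noise-free example: y_{t+1} = F_t' beta exactly.
rng = np.random.default_rng(0)
T, N, r = 100, 50, 2
F = rng.standard_normal((T, r))
X = F @ rng.standard_normal((r, N))
beta = np.array([1.0, -0.5])
y = np.zeros(T)
y[1:] = F[:-1] @ beta
forecast = diffusion_index_forecast(X, y, r)
```

Because the regression absorbs the arbitrary rotation of $\hat F_t$, the forecast does not depend on which normalization of the factors is used.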
The foregoing analysis has assumed a static relationship between the observed data and the factors. Our model allows $F_t$ to be a dependent process, e.g., $A(L)F_t = \epsilon_t$, where $A(L)$ is a polynomial matrix of the lag operator. However, we do not consider the case in which the dynamics enter into $X_t$ directly. If the method developed in this paper is applied to such a dynamic model, the estimated number of factors gives an upper bound of the true number of factors. Consider the data generating process $X_{it} = a_iF_t + b_iF_{t-1} + e_{it}$. From the dynamic point of view, there is only one factor. The static approach treats the model as having two factors, unless the factor loading matrix has a rank of one.
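The point about dynamic factors can be checked numerically. The sketch below simulates $X_{it} = a_iF_t + b_iF_{t-1} + e_{it}$ with a single factor and compares the leading eigenvalues of $XX'/(NT)$: two of them stand well clear of the noise eigenvalues, which is why a static criterion would tend to count two factors. The function name, the i.i.d. Gaussian draws for $a_i$, $b_i$, $F_t$ and $e_{it}$, and the sample sizes are our own illustrative assumptions.

```python
import numpy as np

def leading_eigenvalues(N=200, T=200, n_top=4, seed=1):
    """Largest eigenvalues of XX'/(NT) for X_it = a_i F_t + b_i F_{t-1} + e_it."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal(T + 1)      # one dynamic factor; f[0] plays the role of F_0
    a = rng.standard_normal(N)          # loadings on F_t
    b = rng.standard_normal(N)          # loadings on F_{t-1}
    e = rng.standard_normal((T, N))     # idiosyncratic errors
    X = np.outer(f[1:], a) + np.outer(f[:-1], b) + e
    eig = np.linalg.eigvalsh(X @ X.T / (N * T))
    return eig[::-1][:n_top]            # largest first

print(leading_eigenvalues())
```

If `b` were set to a multiple of `a`, the loading matrix would have rank one and only one eigenvalue would separate from the noise.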
The literature on dynamic factor models is growing. Assuming N is fixed,
Sargent and Sims (1977) and Geweke (1977) extended the static strict factor
model to allow for dynamics. Stock and Watson (1998) suggested how dynamics
can be introduced into factor models when both N and T are large, although
their empirical applications assumed a static factor structure. Forni et al. (2000a)
further allowed Xit to also depend on the leads of the factors and proposed a
graphic approach for estimating the number of factors. However, determining
the number of factors in a dynamic setting is a complex issue. We hope that
the ideas and methodology introduced in this paper will shed light on a formal
treatment of this problem.
APPENDIX
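Lemma 1: Under Assumptions A–C, we have for some $M_1 < \infty$, and for all N and T,
(i) $T^{-1}\sum_{s=1}^{T}\sum_{t=1}^{T}\gamma_N(s,t)^2 \le M_1$,
(ii) $E\Big(T^{-1}\sum_{t=1}^{T}\big\|N^{-1/2}e_t'\Lambda^0\big\|^2\Big) = E\Big(T^{-1}\sum_{t=1}^{T}\big\|N^{-1/2}\sum_{i=1}^{N}e_{it}\lambda_i^0\big\|^2\Big) \le M_1$,
(iii) $E\Big(T^{-2}\sum_{t=1}^{T}\sum_{s=1}^{T}\big(N^{-1}\sum_{i=1}^{N}X_{it}X_{is}\big)^2\Big) \le M_1$,
(iv) $E\big\|(NT)^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{it}\lambda_i^0\big\|^2 \le M_1$.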
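Proof: Consider (i). Let $\rho(s,t) = \gamma_N(s,t)/[\gamma_N(s,s)\gamma_N(t,t)]^{1/2}$. Then $|\rho(s,t)| \le 1$. From $\gamma_N(s,s) \le M$,
$$\begin{aligned}
T^{-1}\sum_{s=1}^{T}\sum_{t=1}^{T}\gamma_N(s,t)^2 &= T^{-1}\sum_{s=1}^{T}\sum_{t=1}^{T}\gamma_N(s,s)\gamma_N(t,t)\rho(s,t)^2\\
&\le M\,T^{-1}\sum_{s=1}^{T}\sum_{t=1}^{T}[\gamma_N(s,s)\gamma_N(t,t)]^{1/2}|\rho(s,t)|\\
&= M\,T^{-1}\sum_{s=1}^{T}\sum_{t=1}^{T}|\gamma_N(s,t)| \le M^2
\end{aligned}$$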
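by Assumptions B and C3. For (iii), it is sufficient to prove $E|X_{it}|^4 \le M_1$ for all $(i,t)$. Now $E|X_{it}|^4 \le 8E|\lambda_i^{0\prime}F_t^0|^4 + 8E|e_{it}|^4 \le 8\bar\lambda^4E\|F_t^0\|^4 + 8E|e_{it}|^4 \le M_1$ for some $M_1$ by Assumptions A, B, and C1. Finally, for (iv),
$$\begin{aligned}
E\Big\|(NT)^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{it}\lambda_i^0\Big\|^2 &= \frac{1}{NT}\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T}\sum_{s=1}^{T}E(e_{it}e_{js})\,\lambda_i^{0\prime}\lambda_j^0\\
&\le \bar\lambda^2\,\frac{1}{NT}\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T}\sum_{s=1}^{T}|\tau_{ij,ts}| \le \bar\lambda^2M
\end{aligned}$$
by Assumption C4.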
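For intuition on Lemma 1(i), the sketch below evaluates both sides of the first inequality in the proof for a case where, purely as an assumption for this example, the errors are AR(1) in time with $\gamma_N(s,t) = \rho^{|s-t|}/(1-\rho^2)$. The helper `lemma1_i_check` is hypothetical. For such errors $T^{-1}\sum_s\sum_t|\gamma_N(s,t)| \le 1/(1-\rho)^2$, so the computed quantities stay bounded as T grows.

```python
import numpy as np

def lemma1_i_check(T, rho=0.5):
    """Return (lhs, rhs, M) with lhs = T^{-1} sum_{s,t} gamma(s,t)^2,
    rhs = M * T^{-1} sum_{s,t} |gamma(s,t)| and M = max_s gamma(s,s),
    for the AR(1) autocovariance gamma(s,t) = rho^|s-t| / (1 - rho^2)."""
    idx = np.arange(T)
    gamma = rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho**2)
    M = gamma.diagonal().max()           # bound on gamma_N(s, s)
    lhs = np.sum(gamma**2) / T
    rhs = M * np.sum(np.abs(gamma)) / T
    return lhs, rhs, M

for T in (10, 100, 1000):
    lhs, rhs, M = lemma1_i_check(T)
    print(T, round(lhs, 3), round(rhs, 3))
```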
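For $H^{k\prime} = (\tilde F^{k\prime}F^0/T)(\Lambda^{0\prime}\Lambda^0/N)$, we have
$$\hat F_t^k - H^{k\prime}F_t^0 = T^{-1}\sum_{s=1}^{T}\tilde F_s^k\gamma_N(s,t) + T^{-1}\sum_{s=1}^{T}\tilde F_s^k\zeta_{st} + T^{-1}\sum_{s=1}^{T}\tilde F_s^k\eta_{st} + T^{-1}\sum_{s=1}^{T}\tilde F_s^k\xi_{st},$$
where
$$\zeta_{st} = \frac{e_s'e_t}{N} - \gamma_N(s,t),\qquad \eta_{st} = F_s^{0\prime}\Lambda^{0\prime}e_t/N,\qquad \xi_{st} = F_t^{0\prime}\Lambda^{0\prime}e_s/N = \eta_{ts}.$$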
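Note that $H^k$ depends on N and T. Throughout, we will suppress this dependence to simplify the notation. We also note that $\|H^k\| = O_p(1)$ because $\|H^k\| \le \|\tilde F^{k\prime}\tilde F^k/T\|^{1/2}\,\|F^{0\prime}F^0/T\|^{1/2}\,\|\Lambda^{0\prime}\Lambda^0/N\|$ and each of the three norms on the right-hand side is $O_p(1)$ (by the normalization $\tilde F^{k\prime}\tilde F^k/T = I_k$ and Assumptions A and B). Because $(x+y+z+u)^2 \le 4(x^2+y^2+z^2+u^2)$, we have $\|\hat F_t^k - H^{k\prime}F_t^0\|^2 \le 4(a_t+b_t+c_t+d_t)$, where
$$a_t = T^{-2}\Big\|\sum_{s=1}^{T}\tilde F_s^k\gamma_N(s,t)\Big\|^2,\qquad b_t = T^{-2}\Big\|\sum_{s=1}^{T}\tilde F_s^k\zeta_{st}\Big\|^2,$$
$$c_t = T^{-2}\Big\|\sum_{s=1}^{T}\tilde F_s^k\eta_{st}\Big\|^2,\qquad d_t = T^{-2}\Big\|\sum_{s=1}^{T}\tilde F_s^k\xi_{st}\Big\|^2.$$
It follows that $(1/T)\sum_{t=1}^{T}\|\hat F_t^k - H^{k\prime}F_t^0\|^2 \le (4/T)\sum_{t=1}^{T}(a_t+b_t+c_t+d_t)$.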
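Now $\big\|\sum_{s=1}^{T}\tilde F_s^k\gamma_N(s,t)\big\|^2 \le \sum_{s=1}^{T}\|\tilde F_s^k\|^2\cdot\sum_{s=1}^{T}\gamma_N(s,t)^2$. Thus,
$$T^{-1}\sum_{t=1}^{T}a_t \le T^{-1}\Big(T^{-1}\sum_{s=1}^{T}\|\tilde F_s^k\|^2\Big)\cdot\Big(T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}\gamma_N(s,t)^2\Big) = O_p(T^{-1})$$
by Lemma 1(i).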
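For $b_t$, we have
$$\begin{aligned}
\sum_{t=1}^{T}b_t &= T^{-2}\sum_{t=1}^{T}\Big\|\sum_{s=1}^{T}\tilde F_s^k\zeta_{st}\Big\|^2 = T^{-2}\sum_{t=1}^{T}\sum_{s=1}^{T}\sum_{u=1}^{T}\tilde F_s^{k\prime}\tilde F_u^k\,\zeta_{st}\zeta_{ut}\\
&\le \Big(T^{-2}\sum_{s=1}^{T}\sum_{u=1}^{T}\big(\tilde F_s^{k\prime}\tilde F_u^k\big)^2\Big)^{1/2}\Big(T^{-2}\sum_{s=1}^{T}\sum_{u=1}^{T}\Big(\sum_{t=1}^{T}\zeta_{st}\zeta_{ut}\Big)^2\Big)^{1/2}\\
&\le \Big(T^{-1}\sum_{s=1}^{T}\|\tilde F_s^k\|^2\Big)\cdot\Big(T^{-2}\sum_{s=1}^{T}\sum_{u=1}^{T}\Big(\sum_{t=1}^{T}\zeta_{st}\zeta_{ut}\Big)^2\Big)^{1/2}.
\end{aligned}$$
From $E\big(\sum_{t=1}^{T}\zeta_{st}\zeta_{ut}\big)^2 = E\sum_{t=1}^{T}\sum_{v=1}^{T}\zeta_{st}\zeta_{ut}\zeta_{sv}\zeta_{uv} \le T^2\max_{s,t}E|\zeta_{st}|^4$ and
$$E|\zeta_{st}|^4 = \frac{1}{N^2}E\Big|N^{-1/2}\sum_{i=1}^{N}[e_{it}e_{is} - E(e_{it}e_{is})]\Big|^4 \le N^{-2}M,$$
the second factor above is $O_p(T/N)$, so that $T^{-1}\sum_{t=1}^{T}b_t = O_p(N^{-1})$. For $c_t$,
$$c_t = T^{-2}\Big\|\sum_{s=1}^{T}\tilde F_s^kF_s^{0\prime}\Lambda^{0\prime}e_t/N\Big\|^2 \le N^{-2}\|e_t'\Lambda^0\|^2\Big(T^{-1}\sum_{s=1}^{T}\|\tilde F_s^k\|^2\Big)\Big(T^{-1}\sum_{s=1}^{T}\|F_s^0\|^2\Big) = N^{-2}\|e_t'\Lambda^0\|^2\,O_p(1).$$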
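It follows that
$$T^{-1}\sum_{t=1}^{T}c_t = O_p(1)\,N^{-1}\,T^{-1}\sum_{t=1}^{T}\Big\|\frac{e_t'\Lambda^0}{\sqrt N}\Big\|^2 = O_p(N^{-1})$$
by Lemma 1(ii). The term $T^{-1}\sum_{t=1}^{T}d_t = O_p(N^{-1})$ can be proved similarly. Combining these results, we have $T^{-1}\sum_{t=1}^{T}(a_t+b_t+c_t+d_t) = O_p(N^{-1}) + O_p(T^{-1})$.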
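The rate in Theorem 1 can be illustrated by simulation. The sketch below constructs $\tilde F^k$, $\hat F^k = X\tilde\Lambda^k/N$ and $H^k$ exactly as in the proof above, for k = r and a model in which factors, loadings and errors are i.i.d. N(0,1) (our assumption), and reports $T^{-1}\sum_t\|\hat F_t^k - H^{k\prime}F_t^0\|^2$. With N = T this quantity should shrink roughly in proportion to $C_{NT}^{-2} = 1/\min(N,T)$. The helper `theorem1_msd` is hypothetical.

```python
import numpy as np

def theorem1_msd(N, T, r=2, seed=0):
    """T^{-1} sum_t ||F_hat_t - H' F0_t||^2 with k = r, following the proof of Theorem 1."""
    rng = np.random.default_rng(seed)
    F0 = rng.standard_normal((T, r))
    Lam0 = rng.standard_normal((N, r))
    X = F0 @ Lam0.T + rng.standard_normal((T, N))       # T x N panel
    _, vec = np.linalg.eigh(X @ X.T)
    F_tilde = np.sqrt(T) * vec[:, ::-1][:, :r]          # F~'F~/T = I_r
    Lam_tilde = X.T @ F_tilde / T                       # Lambda~ = X'F~/T
    F_hat = X @ Lam_tilde / N                           # F^ = X Lambda~/N
    H = (Lam0.T @ Lam0 / N) @ (F0.T @ F_tilde / T)      # so H' = (F~'F0/T)(Lam0'Lam0/N)
    dev = F_hat - F0 @ H                                # row t is (F^_t - H'F0_t)'
    return float(np.mean(np.sum(dev**2, axis=1)))

for n in (50, 100, 200, 400):
    print(n, round(theorem1_msd(n, n), 4))
```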
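To prove Theorem 2, we need additional results.
Lemma 2: For any k, $1 \le k \le r$, and $H^k$ defined as the matrix in Theorem 1,
$$V(k,\hat F^k) - V(k,F^0H^k) = O_p(C_{NT}^{-1}).$$
Proof: For the true factor matrix with r factors and $H^k$ defined in Theorem 1, let $M_{FH}^0 = I - P_{FH}^0$ denote the idempotent matrix spanned by the null space of $F^0H^k$. Correspondingly, let $M_{\hat F}^k = I_T - \hat F^k(\hat F^{k\prime}\hat F^k)^{-1}\hat F^{k\prime} = I - P_{\hat F}^k$. Then
$$V(k,\hat F^k) = N^{-1}T^{-1}\sum_{i=1}^{N}X_i'M_{\hat F}^kX_i,$$
$$V(k,F^0H^k) = N^{-1}T^{-1}\sum_{i=1}^{N}X_i'M_{FH}^0X_i,$$
$$V(k,\hat F^k) - V(k,F^0H^k) = N^{-1}T^{-1}\sum_{i=1}^{N}X_i'(P_{FH}^0 - P_{\hat F}^k)X_i.$$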
Thus, $N^{-1}T^{-1}\sum_{i=1}^{N}X_i'(P_{\hat F}^k - P_{FH}^0)X_i = I + II + III + IV$. We consider each term in turn.
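$$\begin{aligned}
I &= N^{-1}T^{-2}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{s=1}^{T}(\hat F_t^k - H^{k\prime}F_t^0)'D_k^{-1}(\hat F_s^k - H^{k\prime}F_s^0)X_{it}X_{is}\\
&\le \Big(T^{-2}\sum_{t=1}^{T}\sum_{s=1}^{T}\big[(\hat F_t^k - H^{k\prime}F_t^0)'D_k^{-1}(\hat F_s^k - H^{k\prime}F_s^0)\big]^2\Big)^{1/2}\cdot\Big(T^{-2}\sum_{t=1}^{T}\sum_{s=1}^{T}\Big(N^{-1}\sum_{i=1}^{N}X_{it}X_{is}\Big)^2\Big)^{1/2}\\
&\le \Big(T^{-1}\sum_{t=1}^{T}\|\hat F_t^k - H^{k\prime}F_t^0\|^2\Big)\cdot\|D_k^{-1}\|\cdot O_p(1) = O_p(C_{NT}^{-2})
\end{aligned}$$
by Theorem 1 and Lemma 1(iii). We used the fact that $\|D_k^{-1}\| = O_p(1)$, which is proved below.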
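$$\begin{aligned}
II &= N^{-1}T^{-2}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{s=1}^{T}(\hat F_t^k - H^{k\prime}F_t^0)'D_k^{-1}H^{k\prime}F_s^0X_{it}X_{is}\\
&\le \Big(T^{-2}\sum_{t=1}^{T}\sum_{s=1}^{T}\|\hat F_t^k - H^{k\prime}F_t^0\|^2\cdot\|H^{k\prime}F_s^0\|^2\cdot\|D_k^{-1}\|^2\Big)^{1/2}\cdot\Big(T^{-2}\sum_{t=1}^{T}\sum_{s=1}^{T}\Big(N^{-1}\sum_{i=1}^{N}X_{it}X_{is}\Big)^2\Big)^{1/2}\\
&\le \Big(T^{-1}\sum_{t=1}^{T}\|\hat F_t^k - H^{k\prime}F_t^0\|^2\Big)^{1/2}\cdot\|D_k^{-1}\|\cdot\Big(T^{-1}\sum_{s=1}^{T}\|H^{k\prime}F_s^0\|^2\Big)^{1/2}\cdot O_p(1)\\
&= \Big(T^{-1}\sum_{t=1}^{T}\|\hat F_t^k - H^{k\prime}F_t^0\|^2\Big)^{1/2}\cdot O_p(1) = O_p(C_{NT}^{-1}).
\end{aligned}$$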
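$$\begin{aligned}
IV &= N^{-1}T^{-2}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{s=1}^{T}F_t^{0\prime}H^k(D_k^{-1} - D_0^{-1})H^{k\prime}F_s^0X_{it}X_{is}\\
&\le \|D_k^{-1} - D_0^{-1}\|\cdot N^{-1}\sum_{i=1}^{N}\Big(T^{-1}\sum_{t=1}^{T}\|H^{k\prime}F_t^0\|\cdot|X_{it}|\Big)^2 = \|D_k^{-1} - D_0^{-1}\|\cdot O_p(1),
\end{aligned}$$
where $O_p(1)$ is obtained because the term is bounded by $\|H^k\|^2\,(1/T)\sum_{t=1}^{T}\|F_t^0\|^2\,(1/(NT))\sum_{i=1}^{N}\sum_{t=1}^{T}X_{it}^2$, which is $O_p(1)$ by Assumption A and $E(X_{it}^2) \le M$. Next, we prove that $\|D_k - D_0\| = O_p(C_{NT}^{-1})$. From
$$\begin{aligned}
D_k - D_0 &= \frac{\hat F^{k\prime}\hat F^k}{T} - \frac{H^{k\prime}F^{0\prime}F^0H^k}{T} = T^{-1}\sum_{t=1}^{T}\big(\hat F_t^k\hat F_t^{k\prime} - H^{k\prime}F_t^0F_t^{0\prime}H^k\big)\\
&= T^{-1}\sum_{t=1}^{T}(\hat F_t^k - H^{k\prime}F_t^0)(\hat F_t^k - H^{k\prime}F_t^0)'\\
&\quad + T^{-1}\sum_{t=1}^{T}(\hat F_t^k - H^{k\prime}F_t^0)F_t^{0\prime}H^k + T^{-1}\sum_{t=1}^{T}H^{k\prime}F_t^0(\hat F_t^k - H^{k\prime}F_t^0)',
\end{aligned}$$
we have
$$\|D_k - D_0\| \le T^{-1}\sum_{t=1}^{T}\|\hat F_t^k - H^{k\prime}F_t^0\|^2 + 2\Big(T^{-1}\sum_{t=1}^{T}\|\hat F_t^k - H^{k\prime}F_t^0\|^2\Big)^{1/2}\cdot\Big(T^{-1}\sum_{t=1}^{T}\|H^{k\prime}F_t^0\|^2\Big)^{1/2} = O_p(C_{NT}^{-2}) + O_p(C_{NT}^{-1})$$
by Theorem 1.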
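Lemmas 2 to 4 all work with the residual variance $V(k,F) = N^{-1}T^{-1}\sum_i X_i'M_FX_i$. The sketch below computes it for an arbitrary $T\times k$ regressor matrix and checks the principal components property that the appendix relies on: no $T\times k$ matrix, in particular no $F^0H^k$, yields a smaller value than the principal components estimate $\hat F^k$. The helper names and the simulated design are illustrative assumptions.

```python
import numpy as np

def V(X, F):
    """V(k, F) = (NT)^{-1} sum_i X_i' M_F X_i for a T x N panel X and a T x k matrix F."""
    T, N = X.shape
    resid = X - F @ np.linalg.lstsq(F, X, rcond=None)[0]   # M_F X, column by column
    return np.sum(resid**2) / (N * T)

def pc_factors(X, k):
    """sqrt(T) times the eigenvectors of XX' for the k largest eigenvalues."""
    T = X.shape[0]
    _, vec = np.linalg.eigh(X @ X.T)
    return np.sqrt(T) * vec[:, ::-1][:, :k]

rng = np.random.default_rng(0)
T, N, r, k = 60, 80, 3, 2
F0 = rng.standard_normal((T, r))
X = F0 @ rng.standard_normal((r, N)) + rng.standard_normal((T, N))
H = rng.standard_normal((r, k))          # an arbitrary r x k matrix
print(V(X, pc_factors(X, k)), V(X, F0 @ H))
```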
Lemma 3: For the matrix $H^k$ defined in Theorem 1, and for each k with k < r, there exists a $\tau_k > 0$ such that
$$\mathop{\mathrm{plim\,inf}}_{N,T\to\infty}\,\big[V(k,F^0H^k) - V(r,F^0)\big] = \tau_k.$$
Proof:
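$$\begin{aligned}
V(k,F^0H^k) - V(r,F^0) &= N^{-1}T^{-1}\sum_{i=1}^{N}X_i'(P_F^0 - P_{FH}^0)X_i\\
&= N^{-1}T^{-1}\sum_{i=1}^{N}(F^0\lambda_i^0 + e_i)'(P_F^0 - P_{FH}^0)(F^0\lambda_i^0 + e_i)\\
&= N^{-1}T^{-1}\sum_{i=1}^{N}\lambda_i^{0\prime}F^{0\prime}(P_F^0 - P_{FH}^0)F^0\lambda_i^0\\
&\quad + 2N^{-1}T^{-1}\sum_{i=1}^{N}e_i'(P_F^0 - P_{FH}^0)F^0\lambda_i^0\\
&\quad + N^{-1}T^{-1}\sum_{i=1}^{N}e_i'(P_F^0 - P_{FH}^0)e_i\\
&= I + II + III.
\end{aligned}$$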
First, note that $P_F^0 - P_{FH}^0 \ge 0$. Hence, $III \ge 0$. For the first two terms,
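$$\begin{aligned}
I &= \operatorname{tr}\Big(T^{-1}F^{0\prime}(P_F^0 - P_{FH}^0)F^0\,N^{-1}\sum_{i=1}^{N}\lambda_i^0\lambda_i^{0\prime}\Big)\\
&= \operatorname{tr}\Big(\Big[\frac{F^{0\prime}F^0}{T} - \frac{F^{0\prime}F^0H^k}{T}\Big(\frac{H^{k\prime}F^{0\prime}F^0H^k}{T}\Big)^{-1}\frac{H^{k\prime}F^{0\prime}F^0}{T}\Big]\cdot N^{-1}\sum_{i=1}^{N}\lambda_i^0\lambda_i^{0\prime}\Big)\\
&\to \operatorname{tr}\Big(\big[\Sigma_F - \Sigma_FH_0^k(H_0^{k\prime}\Sigma_FH_0^k)^{-1}H_0^{k\prime}\Sigma_F\big]\cdot D\Big) = \operatorname{tr}(A\cdot D),
\end{aligned}$$
where $A = \Sigma_F - \Sigma_FH_0^k(H_0^{k\prime}\Sigma_FH_0^k)^{-1}H_0^{k\prime}\Sigma_F$ and $H_0^k$ is the limit of $H^k$ with $\operatorname{rank}(H_0^k) = k < r$. Now $A \neq 0$ because $\operatorname{rank}(\Sigma_F) = r$ (Assumption A). Also, A is positive semi-definite and $D > 0$ (Assumption B). This implies that $\operatorname{tr}(A\cdot D) > 0$.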
Remark: Stock and Watson (1998) studied the limit of $H^k$. The convergence of $H^k$ to $H_0^k$ holds jointly in T and N and does not require any restriction between T and N.
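Now
$$II = 2N^{-1}T^{-1}\sum_{i=1}^{N}e_i'P_F^0F^0\lambda_i^0 - 2N^{-1}T^{-1}\sum_{i=1}^{N}e_i'P_{FH}^0F^0\lambda_i^0.$$
For the first term,
$$\begin{aligned}
\Big|N^{-1}T^{-1}\sum_{i=1}^{N}e_i'P_F^0F^0\lambda_i^0\Big| &= \Big|N^{-1}T^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{it}F_t^{0\prime}\lambda_i^0\Big|\\
&\le \Big(T^{-1}\sum_{t=1}^{T}\|F_t^0\|^2\Big)^{1/2}\cdot\frac{1}{\sqrt N}\Big(T^{-1}\sum_{t=1}^{T}\Big\|\frac{1}{\sqrt N}\sum_{i=1}^{N}e_{it}\lambda_i^0\Big\|^2\Big)^{1/2} = O_p\Big(\frac{1}{\sqrt N}\Big).
\end{aligned}$$
The last equality follows from Lemma 1(ii). The second term is also $O_p(1/\sqrt N)$, and hence $II = O_p(1/\sqrt N) \to 0$.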
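Lemma 3 says that leaving out true factors costs a non-vanishing amount of fit. A simple way to see this is to compare $V(k,F^0H^k)$ with $V(r,F^0)$ when $H^k$ selects the first k < r true factors. That is one admissible rank-k choice and our simplification; in the lemma, $H^k$ is the specific matrix from Theorem 1. In the i.i.d. N(0,1) design assumed here the gap stays close to the variance of the omitted common component (about one) as N and T grow, rather than shrinking to zero.

```python
import numpy as np

def V(X, F):
    """(NT)^{-1} times the sum of squared residuals from regressing each column of X on F."""
    T, N = X.shape
    resid = X - F @ np.linalg.lstsq(F, X, rcond=None)[0]
    return np.sum(resid**2) / (N * T)

def lemma3_gap(N, T, r=3, k=2, seed=0):
    """V(k, F0 H) - V(r, F0) with H selecting the first k of the r true factors."""
    rng = np.random.default_rng(seed)
    F0 = rng.standard_normal((T, r))
    Lam0 = rng.standard_normal((N, r))
    X = F0 @ Lam0.T + rng.standard_normal((T, N))
    H = np.eye(r)[:, :k]                 # r x k selection matrix, rank k < r
    return V(X, F0 @ H) - V(X, F0)

for n in (50, 200, 800):
    print(n, round(lemma3_gap(n, n), 3))
```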
Lemma 4: For any fixed k with $k \ge r$, $V(k,\hat F^k) - V(r,\hat F^r) = O_p(C_{NT}^{-2})$.
Proof:
$$|V(k,\hat F^k) - V(r,\hat F^r)| \le |V(k,\hat F^k) - V(r,F^0)| + |V(r,F^0) - V(r,\hat F^r)|.$$
Let $H^k$ be as defined in Theorem 1, now with rank r because $k \ge r$. Let $H^{k+}$ be the generalized inverse of $H^k$ such that $H^kH^{k+} = I_r$. From $X_i = F^0\lambda_i^0 + e_i$, we have $X_i = F^0H^kH^{k+}\lambda_i^0 + e_i$. This implies
$$\begin{aligned}
X_i &= \hat F^kH^{k+}\lambda_i^0 + e_i - (\hat F^k - F^0H^k)H^{k+}\lambda_i^0\\
&= \hat F^kH^{k+}\lambda_i^0 + u_i,
\end{aligned}$$
where $u_i = e_i - (\hat F^k - F^0H^k)H^{k+}\lambda_i^0$.
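Note that
$$V(k,\hat F^k) = N^{-1}T^{-1}\sum_{i=1}^{N}u_i'M_{\hat F}^ku_i,$$
$$V(r,F^0) = N^{-1}T^{-1}\sum_{i=1}^{N}e_i'M_F^0e_i,$$
$$\begin{aligned}
V(k,\hat F^k) &= N^{-1}T^{-1}\sum_{i=1}^{N}\big[e_i - (\hat F^k - F^0H^k)H^{k+}\lambda_i^0\big]'M_{\hat F}^k\big[e_i - (\hat F^k - F^0H^k)H^{k+}\lambda_i^0\big]\\
&= N^{-1}T^{-1}\sum_{i=1}^{N}e_i'M_{\hat F}^ke_i - 2N^{-1}T^{-1}\sum_{i=1}^{N}\lambda_i^{0\prime}H^{k+\prime}(\hat F^k - F^0H^k)'M_{\hat F}^ke_i\\
&\quad + N^{-1}T^{-1}\sum_{i=1}^{N}\lambda_i^{0\prime}H^{k+\prime}(\hat F^k - F^0H^k)'M_{\hat F}^k(\hat F^k - F^0H^k)H^{k+}\lambda_i^0\\
&= a + b + c.
\end{aligned}$$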
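$$\begin{aligned}
c &\le N^{-1}T^{-1}\sum_{i=1}^{N}\lambda_i^{0\prime}H^{k+\prime}(\hat F^k - F^0H^k)'(\hat F^k - F^0H^k)H^{k+}\lambda_i^0\\
&\le \Big(T^{-1}\sum_{t=1}^{T}\|\hat F_t^k - H^{k\prime}F_t^0\|^2\Big)\cdot\Big(N^{-1}\sum_{i=1}^{N}\|\lambda_i^0\|^2\Big)\|H^{k+}\|^2\\
&= O_p(C_{NT}^{-2})\cdot O_p(1)
\end{aligned}$$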
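by Theorem 1. For term b, we use the fact that $|\operatorname{tr}(A)| \le r\|A\|$ for any $r\times r$ matrix A. Thus
$$\begin{aligned}
|b| &= 2T^{-1}\Big|\operatorname{tr}\Big(H^{k+\prime}(\hat F^k - F^0H^k)'M_{\hat F}^k\,N^{-1}\sum_{i=1}^{N}e_i\lambda_i^{0\prime}\Big)\Big|\\
&\le 2r\|H^{k+}\|\cdot\Big\|\frac{\hat F^k - F^0H^k}{\sqrt T}\Big\|\cdot\Big\|\frac{1}{\sqrt T\,N}\sum_{i=1}^{N}e_i\lambda_i^{0\prime}\Big\|\\
&\le 2r\|H^{k+}\|\cdot\Big(T^{-1}\sum_{t=1}^{T}\|\hat F_t^k - H^{k\prime}F_t^0\|^2\Big)^{1/2}\cdot\frac{1}{\sqrt N}\Big(\frac{1}{T}\sum_{t=1}^{T}\Big\|\frac{1}{\sqrt N}\sum_{i=1}^{N}e_{it}\lambda_i^0\Big\|^2\Big)^{1/2}\\
&= O_p(C_{NT}^{-1})\cdot\frac{1}{\sqrt N} = O_p(C_{NT}^{-2})
\end{aligned}$$
by Theorem 1 and Lemma 1(ii). Therefore,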
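$$V(k,\hat F^k) = N^{-1}T^{-1}\sum_{i=1}^{N}e_i'M_{\hat F}^ke_i + O_p(C_{NT}^{-2}).$$
Moreover,
$$N^{-1}T^{-1}\sum_{i=1}^{N}e_i'P_F^0e_i \le O_p(1)\,T^{-1}N^{-1}\sum_{i=1}^{N}\Big\|T^{-1/2}\sum_{t=1}^{T}F_t^0e_{it}\Big\|^2 = O_p(T^{-1}) \le O_p(C_{NT}^{-2})$$
by Assumption D. Thus
$$0 \ge N^{-1}T^{-1}\sum_{i=1}^{N}e_i'P_{\hat F}^ke_i + O_p(C_{NT}^{-2}).$$
This implies that $0 \le N^{-1}T^{-1}\sum_{i=1}^{N}e_i'P_{\hat F}^ke_i = O_p(C_{NT}^{-2})$. In summary, $V(k,\hat F^k) - V(r,F^0) = O_p(C_{NT}^{-2})$.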
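Lemma 4 can be illustrated in the same spirit. The sketch below obtains $V(k,\hat F^k)$ from the eigenvalues of $XX'$ (the principal components fit removes the k largest ones) and reports the scaled overfitting gain $C_{NT}^2[V(r,\hat F^r) - V(k,\hat F^k)]$ for k > r. Under the i.i.d. Gaussian design assumed here, this scaled quantity stays of the same order as N = T grows, in line with the $O_p(C_{NT}^{-2})$ rate. The function names are hypothetical.

```python
import numpy as np

def V_path(X, kmax):
    """V(k, F_hat^k) for k = 0..kmax: (NT)^{-1} times the sum of all but the
    k largest eigenvalues of XX'."""
    T, N = X.shape
    eig = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]
    total = np.sum(X**2)                 # trace of XX'
    return np.array([(total - eig[:k].sum()) / (N * T) for k in range(kmax + 1)])

def scaled_overfit_gap(N, T, r=3, k=5, seed=0):
    """C_NT^2 * [V(r, F_hat^r) - V(k, F_hat^k)] for one simulated panel."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) + rng.standard_normal((T, N))
    V = V_path(X, k)
    C2 = min(N, T)                       # C_NT^2 = min(N, T)
    return C2 * (V[r] - V[k])

for n in (100, 200, 400):
    print(n, round(scaled_overfit_gap(n, n), 3))
```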
Proof of Theorem 2: We shall prove that $\lim_{N,T\to\infty}P(PC(k) < PC(r)) = 0$ for all $k \neq r$ and $k \le k_{\max}$. Since
$$PC(k) - PC(r) = V(k,\hat F^k) - V(r,\hat F^r) - (r-k)\,g(N,T),$$
it is sufficient to prove $P[V(k,\hat F^k) - V(r,\hat F^r) < (r-k)g(N,T)] \to 0$ as $N,T\to\infty$. Consider $k < r$. We have the identity:
$$V(k,\hat F^k) - V(r,\hat F^r) = [V(k,\hat F^k) - V(k,F^0H^k)] + [V(k,F^0H^k) - V(r,F^0H^r)] + [V(r,F^0H^r) - V(r,\hat F^r)].$$
Lemma 2 implies that the first and the third terms are both $O_p(C_{NT}^{-1})$. Next, consider the second term. Because $F^0H^r$ and $F^0$ span the same column space, $V(r,F^0H^r) = V(r,F^0)$. Thus the second term can be rewritten as $V(k,F^0H^k) - V(r,F^0)$, which has a positive limit by Lemma 3. Hence, $P(PC(k) < PC(r)) \to 0$ if $g(N,T) \to 0$ as $N,T\to\infty$. Next, for $k \ge r$,
$$P(PC(k) - PC(r) < 0) = P[V(r,\hat F^r) - V(k,\hat F^k) > (k-r)g(N,T)].$$
By Lemma 4, $V(r,\hat F^r) - V(k,\hat F^k) = O_p(C_{NT}^{-2})$. For $k > r$, $(k-r)g(N,T) \ge g(N,T)$, which converges to zero at a slower rate than $C_{NT}^{-2}$. Thus for $k > r$, $P(PC(k) < PC(r)) \to 0$ as $N,T\to\infty$.

For $k < r$, Lemmas 2 and 3 imply that $V(k,\hat F^k)/V(r,\hat F^r) > 1 + \epsilon_0$ for some $\epsilon_0 > 0$ with large probability for all large N and T. Thus $\ln[V(k,\hat F^k)/V(r,\hat F^r)] \ge \epsilon_0/2$ for large N and T. Because $g(N,T) \to 0$, we have $IC(k) - IC(r) \ge \epsilon_0/2 - (r-k)g(N,T) \ge \epsilon_0/3$ for large N and T with large probability. Thus, $P(IC(k) - IC(r) < 0) \to 0$. Next, consider $k > r$. Lemma 4 implies that $V(k,\hat F^k)/V(r,\hat F^r) = 1 + O_p(C_{NT}^{-2})$. Thus $\ln[V(k,\hat F^k)/V(r,\hat F^r)] = O_p(C_{NT}^{-2})$. Because $(k-r)g(N,T) \ge g(N,T)$, which converges to zero at a slower rate than $C_{NT}^{-2}$, it follows that
$$P(IC(k) - IC(r) < 0) \le P\big(O_p(C_{NT}^{-2}) + g(N,T) < 0\big) \to 0.$$
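Because $V(k,\hat F^k)$ equals $(NT)^{-1}$ times the sum of all but the k largest eigenvalues of $XX'$, the criteria covered by this proof can be computed from a single eigendecomposition. The sketch below implements the $PC_{p1}$ to $PC_{p3}$ and $IC_{p1}$ to $IC_{p3}$ criteria with the penalties $g(N,T)$ proposed in the main text, using $\hat\sigma^2 = V(k_{\max},\hat F^{k_{\max}})$ in the PC criteria. The function name, the search over $k = 0,\ldots,k_{\max}$, and the simulated check are our own assumptions, not the authors' code.

```python
import numpy as np

def select_num_factors(X, kmax=8, criterion="ICp2"):
    """Estimate the number of factors in a T x N panel X.
    criterion is one of PCp1, PCp2, PCp3, ICp1, ICp2, ICp3."""
    T, N = X.shape
    C2 = min(N, T)                                   # C_NT^2
    g = {
        "p1": (N + T) / (N * T) * np.log(N * T / (N + T)),
        "p2": (N + T) / (N * T) * np.log(C2),
        "p3": np.log(C2) / C2,
    }[criterion[-2:]]
    eig = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]
    total = np.sum(X**2)
    V = np.array([(total - eig[:k].sum()) / (N * T) for k in range(kmax + 1)])
    k = np.arange(kmax + 1)
    if criterion.startswith("PC"):
        crit = V + k * V[kmax] * g                   # sigma_hat^2 = V(kmax, F_hat^kmax)
    else:
        crit = np.log(V) + k * g                     # IC criteria
    return int(np.argmin(crit))

# Illustrative panel with three strong factors and unit-variance noise.
rng = np.random.default_rng(0)
T = N = 200
X = rng.standard_normal((T, 3)) @ rng.standard_normal((3, N)) + rng.standard_normal((T, N))
for c in ("PCp1", "PCp2", "ICp1", "ICp2", "ICp3"):
    print(c, select_num_factors(X, criterion=c))
```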
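Proof of Corollary 2: Theorem 2 is based on Lemmas 2, 3, and 4. Lemmas 2 and 3 are still valid with $\hat F^k$ replaced by $\hat G^k$ and $C_{NT}$ replaced by $\tilde C_{NT}$. This is because their proof only uses the convergence rate of $\hat F_t$ given in (5), which is replaced by (8). But the proof of Lemma 4 does make use of the principal component property of $\hat F^k$ such that $V(k,\hat F^k) - V(r,F^0) \le 0$ for $k \ge r$, which is not necessarily true for $\hat G^k$. We shall prove that Lemma 4 still holds when $\hat F^k$ is replaced by $\hat G^k$ and $C_{NT}$ is replaced by $\tilde C_{NT}$. That is, for $k \ge r$,
$$V(k,\hat G^k) - V(r,\hat G^r) = O_p(\tilde C_{NT}^{-2}). \tag{12}$$
Because the left-hand side of (12) equals $[V(k,\hat G^k) - V(r,F^0)] - [V(r,\hat G^r) - V(r,F^0)]$, it suffices to show that for $k \ge r$,
$$V(k,\hat G^k) - V(r,F^0) = O_p(\tilde C_{NT}^{-2}). \tag{13}$$
Note that for $k \ge r$,
$$V(k,\hat F^k) \le V(k,\hat G^k) \le V(r,\hat G^r). \tag{14}$$
The first inequality follows from the definition that the principal component estimator gives the smallest sum of squared residuals, and the second inequality follows from the least squares property that adding more regressors does not increase the sum of squared residuals. Because $\tilde C_{NT}^2 \le C_{NT}^2$, we can rewrite (10) as
$$V(k,\hat F^k) - V(r,F^0) = O_p(\tilde C_{NT}^{-2}). \tag{15}$$
If we can prove that
$$V(r,\hat G^r) - V(r,F^0) = O_p(\tilde C_{NT}^{-2}), \tag{16}$$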
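then (14), (15), and (16) imply (13). To prove (16), we follow the same arguments as in the proof of Lemma 4 to obtain
$$V(r,\hat G^r) - V(r,F^0) = \frac{1}{NT}\sum_{i=1}^{N}e_i'P_{\hat G}^re_i - \frac{1}{NT}\sum_{i=1}^{N}e_i'P_F^0e_i + O_p(\tilde C_{NT}^{-2}),$$
where $P_{\hat G}^r = \hat G^r(\hat G^{r\prime}\hat G^r)^{-1}\hat G^{r\prime}$; see (11). Because the second term on the right-hand side is shown in Lemma 4 to be $O_p(T^{-1})$, it suffices to prove that the first term is $O_p(\tilde C_{NT}^{-2})$. Now,
$$\frac{1}{NT}\sum_{i=1}^{N}e_i'P_{\hat G}^re_i \le \big\|(\hat G^{r\prime}\hat G^r/T)^{-1}\big\|\,\frac{1}{N}\sum_{i=1}^{N}\big\|e_i'\hat G^r/T\big\|^2.$$
Because $H^r$ is of full rank, we have $\|(\hat G^{r\prime}\hat G^r/T)^{-1}\| = O_p(1)$ (this follows from the same arguments used in proving $\|D_k^{-1}\| = O_p(1)$). Next,
$$\begin{aligned}
\frac{1}{N}\sum_{i=1}^{N}\big\|e_i'\hat G^r/T\big\|^2 &\le \frac{2}{NT}\sum_{i=1}^{N}\Big\|\frac{1}{\sqrt T}\sum_{t=1}^{T}F_t^0e_{it}\Big\|^2\|H^r\|^2 + 2\Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{it}^2\Big)\Big(\frac{1}{T}\sum_{t=1}^{T}\|\hat G_t^r - H^{r\prime}F_t^0\|^2\Big)\\
&= O_p(T^{-1})\,O_p(1) + O_p(1)\,O_p(\tilde C_{NT}^{-2}) = O_p(\tilde C_{NT}^{-2})
\end{aligned}$$
by Assumption D and (8). This completes the proof of (16) and hence Corollary 2.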
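Inequality (14) rests on two least-squares facts, which the sketch below checks numerically. As a purely hypothetical stand-in for an alternative estimator, $\hat G^r$ is taken to be the true factors plus noise, and $\hat G^k$ appends $k - r$ further columns to it so that the regressor sets are nested, as the second inequality requires. The helper names and the design are our assumptions.

```python
import numpy as np

def V(X, F):
    """(NT)^{-1} times the sum of squared residuals from regressing each column of X on F."""
    T, N = X.shape
    resid = X - F @ np.linalg.lstsq(F, X, rcond=None)[0]
    return np.sum(resid**2) / (N * T)

def pc_factors(X, k):
    """sqrt(T) times the eigenvectors of XX' for the k largest eigenvalues."""
    T = X.shape[0]
    _, vec = np.linalg.eigh(X @ X.T)
    return np.sqrt(T) * vec[:, ::-1][:, :k]

rng = np.random.default_rng(0)
T, N, r, k = 100, 120, 3, 5
F0 = rng.standard_normal((T, r))
X = F0 @ rng.standard_normal((r, N)) + rng.standard_normal((T, N))
G_r = F0 + 0.3 * rng.standard_normal((T, r))                # stand-in for G_hat^r
G_k = np.hstack([G_r, rng.standard_normal((T, k - r))])     # nested: contains G_r
v_pc, v_gk, v_gr = V(X, pc_factors(X, k)), V(X, G_k), V(X, G_r)
print(v_pc, v_gk, v_gr)
```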
REFERENCES
Cragg, J., and S. Donald (1997): “Inferring the Rank of a Matrix,” Journal of Econometrics, 76,
223–250.
Dhrymes, P. J., I. Friend, and N. B. Gultekin (1984): "A Critical Reexamination of the Empirical Evidence on the Arbitrage Pricing Theory," Journal of Finance, 39, 323–346.
Donald, S. (1997): "Inference Concerning the Number of Factors in a Multivariate Nonparametric Relationship," Econometrica, 65, 103–132.
Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2000a): “The Generalized Dynamic Factor
Model: Identification and Estimation,” Review of Economics and Statistics, 82, 540–554.
Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2000b): "Reference Cycles: The NBER Methodology Revisited," CEPR Discussion Paper 2400.
Forni, M., and M. Lippi (1997): Aggregation and the Microfoundations of Dynamic Macroeconomics.
Oxford, U.K.: Oxford University Press.
Forni, M., and M. Lippi (2000): "The Generalized Dynamic Factor Model: Representation Theory," Mimeo, Università di Modena.
Forni, M., and L. Reichlin (1998): “Let’s Get Real: a Factor-Analytic Approach to Disaggregated
Business Cycle Dynamics,” Review of Economic Studies, 65, 453–473.
Geweke, J. (1977): “The Dynamic Factor Analysis of Economic Time Series,” in Latent Variables in
Socio Economic Models, ed. by D. J. Aigner and A. S. Goldberger. Amsterdam: North Holland.
Geweke, J., and R. Meese (1981): "Estimating Regression Models of Finite but Unknown Order," International Economic Review, 22, 55–70.
Ghysels, E., and S. Ng (1998): “A Semi-parametric Factor Model for Interest Rates and Spreads,”
Review of Economics and Statistics, 80, 489–502.
Gregory, A., and A. Head (1999): “Common and Country-Specific Fluctuations in Productivity,
Investment, and the Current Account,” Journal of Monetary Economics, 44, 423–452.
Gregory, A., A. Head, and J. Raynauld (1997): “Measuring World Business Cycles,” Interna-
tional Economic Review, 38, 677–701.
Lehmann, B. N., and D. Modest (1988): “The Empirical Foundations of the Arbitrage Pricing
Theory,” Journal of Financial Economics, 21, 213–254.
Lewbel, A. (1991): “The Rank of Demand Systems: Theory and Nonparametric Estimation,” Econo-
metrica, 59, 711–730.
Mallows, C. L. (1973): “Some Comments on Cp ,” Technometrics, 15, 661–675.
Ross, S. (1976): "The Arbitrage Theory of Capital Asset Pricing," Journal of Economic Theory, 13, 341–360.
Rubin, D. B., and D. T. Thayer (1982): "EM Algorithms for ML Factor Analysis," Psychometrika, 47, 69–76.
Sargent, T., and C. Sims (1977): “Business Cycle Modelling without Pretending to Have too
much a Priori Economic Theory,” in New Methods in Business Cycle Research, ed. by C. Sims.
Minneapolis: Federal Reserve Bank of Minneapolis.
Schwert, G. W. (1989): “Tests for Unit Roots: A Monte Carlo Investigation,” Journal of Business
and Economic Statistics, 7, 147–160.
Stock, J. H., and M. Watson (1989): "New Indexes of Coincident and Leading Economic Indicators," in NBER Macroeconomics Annual 1989, ed. by O. J. Blanchard and S. Fischer. Cambridge: M.I.T. Press.
Stock, J. H., and M. Watson (1998): "Diffusion Indexes," NBER Working Paper 6702.
Stock, J. H., and M. Watson (1999): "Forecasting Inflation," Journal of Monetary Economics, 44, 293–335.