

Factor Models for Matrix-Valued High-Dimensional Time Series

Dong Wang
Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544

Xialu Liu
Management Information Systems Department, San Diego State University, San Diego, CA 92182

Rong Chen
Department of Statistics, Rutgers University, Piscataway, NJ 08854

June 2017

arXiv:1610.01889v2 [stat.ME] 21 Jun 2017

Abstract

In finance, economics and many other fields, data in a matrix form are often observed over time. For example, many economic indicators are obtained in different countries over time, and various financial characteristics of many companies are reported over time. Although it is natural to turn a matrix observation into a long vector and then use standard vector time series models or factor analysis, it is often the case that the columns and rows of a matrix represent different sets of information that are closely interrelated in a very structural way. We propose a novel factor model that maintains and utilizes the matrix structure to achieve greater dimensional reduction, as well as to find clearer and more interpretable factor structures. The estimation procedure and its theoretical properties are investigated and demonstrated with simulated and real examples.

1 Introduction
Time series analysis is widely used in many applications. Univariate time series, when one ob-
serves one variable through time, is well studied, with linear models (e.g. Box and Jenkins, 1976;
Brockwell and Davis, 1991; Tsay, 2005), nonlinear models (e.g. Engle, 1982; Bollerslev, 1986;
Tong, 1990), and nonparametric models (e.g. Fan and Yao, 2003). Multivariate time series and panel time series, when one observes a vector or a panel of variables through time, are also a long-studied but still active field (e.g. Tiao and Box, 1981; Tiao and Tsay, 1989; Engle and Kroner, 1995; Stock and Watson, 2004; Lütkepohl, 2005; Tsay, 2014, and others).

Chen’s research was supported in part by National Science Foundation grants DMS-1503409 and DMS-1209085.
Corresponding author: Rong Chen, Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA.
Email: [email protected].

US Japan ··· China
GDP Xt,11 Xt,12 ··· Xt,1p
Unemployment Xt,21 Xt,22 ··· Xt,2p
Inflation Xt,31 Xt,32 ··· Xt,3p
Payout Ratio Xt,41 Xt,42 ··· Xt,4p

Table 1: Illustration of a matrix-valued time series

Such analysis not only reveals the temporal dynamics of the time series, but also explores the relationship among a
group of time series, using the available information more fully. Often, the investigation of the
relationship among the time series is the objective of the study.
Matrix-valued time series, when one observes a group of variables structured in a well defined
matrix form over time, has not been studied. Such a time series is encountered in many appli-
cations. For example, in economics, countries routinely report a set of economic indicators (e.g.
GDP growth, unemployment rate, inflation index and others) every quarter. Table 1 depicts such
a matrix-valued time series. One can concentrate on one cell in Table 1, say the US unemployment rate series $\{X_{t,21},\ t = 1, 2, \ldots\}$, and build a univariate time series model. Or one can concentrate on one column in Table 1, say all economic indicators of the US, $\{(X_{t,11}, \ldots, X_{t,41})'\}$, and study it as a vector time series. Similarly, if one is interested in modeling GDP growth of the group of countries, a panel time series model can be built for the first row $\{(X_{t,11}, \ldots, X_{t,1p})\}$ in Table 1.
However, there are certainly relationships among all variables in the table, and the matrix structure is extremely important. For example, the variables in the same column (same country) would have a stronger inter-relationship; the same holds for the variables in the same row (same indicator). Hence it is important to analyze the entire group of variables while fully preserving and utilizing its matrix structure.
There are many other examples. Investors may be interested in the time series of a group of
financials (e.g. asset/equity ratio, dividend per share, and revenue) for a group of companies, the
import-export volume among a group of countries, pollution and environmental variables (e.g.
PM2.5, ozone level, temperature, moisture, wind speed, etc) observed at a group of stations. In
this article we study such a matrix-valued time series.
Matrix-valued data have been studied (e.g. Gupta and Nagar, 2000; Kollo and von Rosen, 2006; Werner et al., 2008; Leng and Tang, 2012; Yin and Li, 2012; Zhao and Leng, 2014; Zhou, 2014; Zhou and Li, 2014), but these studies mainly focus on independent observations. The concept of a matrix-valued time series was introduced by Walden and Serroukh (2002) and applied in signal and image processing; still, the temporal dependence of the series was not fully exploited for model building.
In this article, we focus on high-dimensional matrix-valued time series data. In some cases, the dimensions of the matrix may be as large as, or even larger than, the length of the observed series. A well-known issue accompanying high-dimensional data is the curse of dimensionality. We adopt a factor model approach. Factor analysis can effectively reduce the number of parameters involved, and is a powerful statistical approach to extracting hidden
driving processes, or latent factor processes, from an observed stochastic process. In the past
decades, factor models for high-dimensional time series data have drawn great attention from
both econometricians and statisticians (e.g. Chamberlain and Rothschild, 1983; Forni et al., 2000;
Bai and Ng, 2002; Hallin and Liška, 2007; Pan and Yao, 2008; Lam et al., 2011; Fan et al., 2011;
Lam and Yao, 2012; Fan et al., 2013; Chang et al., 2015; Liu and Chen, 2016).
With the above observations and motivations, in this article we aim to develop factor models for matrix-valued time series that fully explore the matrix structure. The rest of this article is organized as follows. In Section 2, the model setting is introduced and its interpretations are discussed in detail. Section 3 presents an estimation procedure and studies the theoretical properties of the estimators. Simulation results are shown in Section 4 and two real data examples are given in Sections 5 and 6. Section 7 provides a brief summary. All proofs are in the Appendix.

2 Matrix Factor Models


Let $X_t$ ($t = 1, \ldots, T$) be a matrix-valued time series, where each $X_t$ is a matrix of size $p_1 \times p_2$,
$$
X_t = \begin{pmatrix} X_{t,11} & \cdots & X_{t,1p_2} \\ \vdots & \ddots & \vdots \\ X_{t,p_1 1} & \cdots & X_{t,p_1 p_2} \end{pmatrix}.
$$

We propose the following factor model for matrix-valued time series,
$$
X_t = R F_t C' + E_t, \qquad t = 1, 2, \ldots, T. \tag{1}
$$
Here, $F_t$ is a $k_1 \times k_2$ unobserved matrix-valued time series of common fundamental factors, $R$ is a $p_1 \times k_1$ front loading matrix, $C$ is a $p_2 \times k_2$ back loading matrix, and $E_t$ is a $p_1 \times p_2$ error matrix. In model (1), the common fundamental factors $F_t$ drive all dynamics and co-movement of $X_t$. $R$ and $C$ reflect the importance of the common factors and their interactions.
Similar to multivariate factor models, we assume that the matrix-valued time series is driven by a few latent factors. Unlike the classical factor model, the factors $F_t$ in model (1) are assumed to be organized in a matrix form. Correspondingly, we adopt two loading matrices $R$ and $C$ to capture the dependency between each individual time series in the matrix observations and the matrix factors. In the following we provide two interpretations of the loading matrices. We first introduce some notation. For a matrix $A$, we use $a_{i\cdot}$ and $a_j$ to represent the $i$-th row and the $j$-th column of $A$, respectively, and $A_{ij}$ to denote the $(i,j)$-th element of $A$.

Interpretation I: To isolate effects, assume $k_1 = p_1$ and $R = I_{p_1}$, so that $X_t = F_t C' + E_t$. In this case, each column of $X_t$ is a linear combination of the columns of $F_t$. Take the example shown in Table 1 and consider the first column of $X_t$ (the US economic indicators),
$$
\overset{\text{US}}{\begin{pmatrix} \text{GDP} \\ \text{Unem} \\ \text{Inf} \\ \text{PayR} \end{pmatrix}_t}
= C_{11}\, \overset{f_{t,1}}{\begin{pmatrix} \text{F-GDP} \\ \text{F-Unem} \\ \text{F-Inf} \\ \text{F-PayR} \end{pmatrix}_t}
+ \cdots + C_{1k_2}\, \overset{f_{t,k_2}}{\begin{pmatrix} \text{F-GDP} \\ \text{F-Unem} \\ \text{F-Inf} \\ \text{F-PayR} \end{pmatrix}_t}
+ e_{t,\text{US}}.
$$
It is seen that the US GDP only depends on the first row of F t . Similarly, other countries’ GDP
also only depends on the first row of F t . Hence we can view the first row of F t as the GDP
factors. Similarly, the second row of F t can be considered as the unemployment factors. There
is no interaction between the indicators in this setting (when R = I). The loading matrix C
reflects how each country (column of X t ) depends on the columns of F t , hence reflects column
interactions, or the interactions between the countries. Because of this, we will call C the column
loading matrix.
Similarly, the rows of $F_t$ can be viewed as common factors of all rows of $X_t$, and the front loading matrix $R$ as the row loading matrix. Assuming $k_2 = p_2$ and $C = I_{p_2}$, it follows that $X_t = R F_t + E_t$, so each row of $X_t$ is a linear combination of the rows of $F_t$. Consider the first row of $X_t$,
$$
(\text{GDP}_{\text{US}}, \text{GDP}_{\text{Japan}}, \ldots, \text{GDP}_{\text{China}})_t
= R_{11} f_{t,1\cdot} + R_{12} f_{t,2\cdot} + \cdots + R_{1k_1} f_{t,k_1\cdot} + e_{t,\text{GDP}\cdot},
$$
where $f_{t,j\cdot} = (\text{F-US}, \text{F-Japan}, \ldots, \text{F-China})_t$ denotes the $j$-th row of $F_t$.

It is seen that all economic movements (of each country) are driven by $k_1$ (row) common factors. For example, every US indicator depends on only the first column of $F_t$; hence the first column of $F_t$ can be viewed as the US factor, and the second column of $F_t$ as the Japan factor. The loading matrix $R$ reflects how each indicator depends on the rows of $F_t$. It reflects row interactions, the interactions between the indicators within each country. Because of this, we will call $R$ the row loading matrix.
Obviously, column and row interactions are of interest and importance. One way to introduce interaction is to assume an additive structure, combining the column and row factor models,
$$
X_t = R F_{1t} + F_{2t} C' + E_t, \qquad t = 1, 2, \ldots, T.
$$
However, the number of factors in this model is large ($k_1 \times p_2 + p_1 \times k_2$). A more parsimonious model is the direct interaction in model (1), in which the number of factors is only $k_1 \times k_2$.

Interpretation II: We can view the model (1) as a two-step hierarchical model.

Step 1: For each fixed row $i = 1, 2, \ldots, p_1$, using data $\{x_{t,i\cdot},\ t = 1, 2, \ldots, T\}$, we can find a $p_2 \times k_2$ dimensional loading matrix $C^{(i)}$ and $k_2$-dimensional factors $\{g_{t,i\cdot} = (G_{t,i1}, \ldots, G_{t,ik_2}),\ t = 1, 2, \ldots, T\}$ under a standard vector factor model setting. That is,
$$
(X_{t,i1}, \ldots, X_{t,ip_2}) = (G_{t,i1}, \ldots, G_{t,ik_2})\, C^{(i)\prime} + (H_{t,i1}, \ldots, H_{t,ip_2}), \qquad t = 1, 2, \ldots, T.
$$
Let $G_t$ be the $p_1 \times k_2$ matrix formed with the $p_1$ rows $g_{t,i\cdot}$, and let $H_t$ be the $p_1 \times p_2$ error matrix formed with the rows $\{h_{t,i\cdot} = (H_{t,i1}, \ldots, H_{t,ip_2})\}$.
Step 2: Suppose each column $j = 1, 2, \ldots, k_2$ of the assembled factor matrix $G_t$ obtained in Step 1 also follows a factor structure, with a $p_1 \times k_1$ loading matrix $R^{(j)}$ and a $k_1$-dimensional factor $f_{t,j}$. That is,
$$
\begin{pmatrix} G_{t,1j} \\ \vdots \\ G_{t,p_1 j} \end{pmatrix}
= R^{(j)} \begin{pmatrix} F_{t,1j} \\ \vdots \\ F_{t,k_1 j} \end{pmatrix}
+ \begin{pmatrix} H^{*}_{t,1j} \\ \vdots \\ H^{*}_{t,p_1 j} \end{pmatrix}, \qquad t = 1, 2, \ldots, T.
$$

This step reveals the common factors that drive the co-movements in $G_t$. Let $F_t$ be the $k_1 \times k_2$ matrix formed with the columns $f_{t,j}$, and let $H^{*}_t$ be the $p_1 \times k_2$ error matrix formed with the columns $\{h^{*}_{t,j} = (H^{*}_{t,1j}, \ldots, H^{*}_{t,p_1 j})'\}$.
Step 3: Assembly. With the above two-step factor analysis and notation, assume $R^{(1)} = \cdots = R^{(k_2)} = R$ and $C^{(1)} = \cdots = C^{(p_1)} = C$. We then have
$$
X_t = G_t C' + H_t \quad \text{and} \quad G_t = R F_t + H^{*}_t.
$$
Hence
$$
X_t = R F_t C' + H^{*}_t C' + H_t = R F_t C' + E_t,
$$
where $E_t = H^{*}_t C' + H_t$. This is identical to (1).

Here we provide some additional remarks of model (1).


Remark 1: Let vec(·) be the vectorization operator, i.e., vec(·) converts a matrix to a vector by
stacking columns of the matrix on top of each other. The classical factor analysis treats vec(X t )
as the observations, and a factor model is in the form of

vec(X t ) = Φft + et , t = 1, 2, . . . , T, (2)

where Φ is a p1 p2 × k loading matrix, ft of length k is the latent factor, et is the error term, and
k is the total number of factors. On the other hand, note that model (1) can be re-written as

vec(X t ) = (C ⊗ R)vec(F t ) + vec(E t ). (3)

Assume k = k1 k2 . Then model (3) is a special case of model (2), with a Kronecker product
structured loading matrix. Hence model (1) is a restricted version of model (2), assuming a
special structure for the loading spaces. The number of parameters in the loading matrix $\Phi$ of model (2) is $(p_1 k_1) \times (p_2 k_2)$, whereas it is $p_1 k_1 + p_2 k_2$ for the loading matrices $R$ and $C$ in model (1). Therefore, model (1) significantly reduces the dimension of the problem.
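For illustration, the identity underlying (3) and the parameter counts can be checked numerically. The following minimal NumPy sketch (with arbitrary small dimensions chosen only for the example; it is not part of the paper) verifies that $\mathrm{vec}(R F_t C') = (C \otimes R)\,\mathrm{vec}(F_t)$:

```python
import numpy as np

# Check vec(R F C') = (C kron R) vec(F) on arbitrary small dimensions.
rng = np.random.default_rng(0)
p1, p2, k1, k2 = 4, 3, 2, 2
R = rng.standard_normal((p1, k1))
C = rng.standard_normal((p2, k2))
F = rng.standard_normal((k1, k2))

lhs = (R @ F @ C.T).flatten(order="F")        # vec() stacks columns
rhs = np.kron(C, R) @ F.flatten(order="F")
print(np.allclose(lhs, rhs))                  # True

# Loading parameters: unrestricted Phi vs. the Kronecker-structured C kron R.
print(p1 * p2 * k1 * k2, "vs", p1 * k1 + p2 * k2)
```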
Remark 2: Interpretation II also reveals the reduction in the number of factors compared to using factor models for each column panel or row panel. Note that, if one ignores the interconnection between the rows and obtains individual factor models for each row, as in Step 1, the total number of factors is $p_1 \times k_2$. These factors may have connections across rows. Step 2 exploits such correlations and uses another factor model to reduce the number of factors from $p_1 \times k_2$ to $k_1 \times k_2$.
Remark 3: We also observed that in practice, the total number of factors used in our model
may be larger than the number of factors needed in the vectorized factor model (2). This is
possible since the vectorized model simultaneously exploits common driving features in all series,
while the matrix factor model does it by working on the row vectors separately first (Step 1), then
condensing them by the columns (Step 2). Such a two-step approach may result in redundancy
(highly correlated factors) which may be further simplified. Because we are forcing the factors to
assume a neat matrix structure, it is difficult to have simplifications such as having one or several
elements in the factor matrix F t be constant zero. Since k1 and k2 are usually small, we will
tolerate such redundancy. One extension is to assume that the factor matrix F t , after a certain
rotation, has a block diagonal structure, resulting in a multi-term factor model
$$
X_t = \sum_{i=1}^{s} R_i F_{it} C_i' + E_t, \qquad t = 1, 2, \ldots, T, \tag{4}
$$
where $F_{it}$ is a $k_{i1} \times k_{i2}$ factor matrix, and $\sum_{i=1}^{s} k_{i1} = k_1$ and $\sum_{i=1}^{s} k_{i2} = k_2$. This will reduce the number of factors from $k_1 \times k_2$ to $\sum_{i=1}^{s} k_{i1} k_{i2}$, with a corresponding dimension reduction in the loading matrices as well. We are currently investigating the properties and estimation procedures of such a multi-term factor model.
Remark 4: As in all factor model settings, the properties or assumptions on the observed process $X_t$ are inferred from the assumptions on the factor and noise processes, since the observed series are assumed to be linear combinations of the factor processes plus noise. Indirectly, we assume that all autocovariance matrices of lag $h \ge 1$ of all series lie in a structured $k_1 k_2 \times k_1 k_2$ space, but we make no assumption on the contemporaneous covariance matrix, as we do not assume any contemporaneous covariance structure on the error $E_t$.
Remark 5: Models similar to model (1) have been proposed and studied for principal component analysis of matrix-valued data (e.g. Paatero and Tapper, 1994; Yang et al., 2004; Ye, 2005; Ding and Ye, 2005; Zhang and Zhou, 2005; Crainiceanu et al., 2011; Wang et al., 2016). In those studies, the matrix-valued observations $X_t$ are assumed to be independent, and the focus is primarily on principal component analysis. To the best of our knowledge, our paper is the first to consider factor models for matrix-valued time series data.
In this article, we extend the methods described in Lam et al. (2011) and Lam and Yao (2012) for the vector-valued factor model (2) to the matrix-valued factor model (1). We propose estimators for the loading spaces and the numbers of row and column factors, investigate their theoretical properties, and establish their convergence rates. Simulated and real examples are presented to illustrate the performance of the proposed estimators, to compare the asymptotics under different conditions with different factor strengths, and to explore interactions between row and column factors.

3 Estimation and Modeling Procedures


Because of the latent nature of the factors, various assumptions are imposed to 'define' a factor. Two common sets of assumptions are used. One assumes that the factors must have an impact on most of the series, while weak serial dependence is allowed for the idiosyncratic noise process; see Chamberlain and Rothschild (1983); Forni et al. (2000); Bai and Ng (2002); Hallin and Liška (2007), among others. The other assumes that the factors should capture all dynamics of the observed process, hence the idiosyncratic noise process has no serial dependence (but may have strong cross-sectional dependence); see Pan and Yao (2008); Lam et al. (2011); Lam and Yao (2012); Chang et al. (2015); Liu and Chen (2016). Here we adopt the second assumption and assume that the vectorized error $\mathrm{vec}(E_t)$ is a white noise process with mean 0 and covariance matrix $\Sigma_e$, and is independent of the factor process $\mathrm{vec}(F_t)$. For ease of presentation, we assume throughout this paper that the process $F_t$ has mean 0, and that the observations $X_t$ are centered and standardized.
For the vector-valued factor model (2), it is well known that there is an identifiability issue between the factors $f_t$ and the loading matrix $\Phi$. A similar problem arises in the proposed matrix-valued factor model (1). Let $U_1$ and $U_2$ be two invertible matrices of sizes $k_1 \times k_1$ and $k_2 \times k_2$. Then the triplets $(R, F_t, C)$ and $(R U_1,\ U_1^{-1} F_t (U_2')^{-1},\ C U_2)$ are equivalent under model (1), and hence model (1) is not identifiable. However, with a similar argument as in Lam et al. (2011) and Lam and Yao (2012), the column spaces of the loading matrices $R$ and $C$ are uniquely determined. Hence, in the following, we focus on the estimation of the column spaces of $R$ and $C$, denoted by $\mathcal{M}(R)$ and $\mathcal{M}(C)$ and referred to as the row factor loading space and the column factor loading space, respectively.
We can further decompose $R$ and $C$ as
$$
R = Q_1 W_1 \quad \text{and} \quad C = Q_2 W_2,
$$
where $Q_i$ is a $p_i \times k_i$ matrix with orthonormal columns and $W_i$ is a $k_i \times k_i$ non-singular matrix, for $i = 1, 2$. Let $\mathcal{M}(Q_i)$ denote the column space of $Q_i$. Then $\mathcal{M}(Q_1) = \mathcal{M}(R)$ and $\mathcal{M}(Q_2) = \mathcal{M}(C)$. Hence, the estimation of the column spaces of $R$ and $C$ is equivalent to the estimation of the column spaces of $Q_1$ and $Q_2$.
Write
$$
Z_t = W_1 F_t W_2', \qquad t = 1, 2, \ldots, T,
$$
as a transformed latent factor process. Then model (1) can be re-expressed as
$$
X_t = Q_1 Z_t Q_2' + E_t, \qquad t = 1, 2, \ldots, T. \tag{5}
$$

Equation (5) can be viewed as another formulation of the matrix-valued factor model with orthonormal loading matrices. Since $\mathcal{M}(R) = \mathcal{M}(Q_1)$ and $\mathcal{M}(C) = \mathcal{M}(Q_2)$, we will work with models (1) and (5) interchangeably, whichever is more convenient.

3.1 Estimation
To estimate the matrix-valued factor model (1), we follow closely the idea of Lam et al. (2011) and Lam and Yao (2012) for estimating vector-valued factor models. The key idea is to calculate auto-cross-covariances of the time series and then construct a Box-Ljung type statistic in matrix form. Under the matrix factor model and the white idiosyncratic noise assumption, the space spanned by such a matrix is directly linked to the loading matrices. In what follows, we illustrate the method for obtaining an estimate of $\mathcal{M}(R)$. The column space of $C$ can be estimated in the same way using the transposes of the $X_t$'s.
Let the $j$-th columns of $X_t$, $R$, $C$, $Q_i$ and $E_t$ be $x_{t,j}$, $r_j$, $c_j$, $q_{i,j}$ and $\epsilon_{t,j}$, respectively, and let $r_{k\cdot}$, $c_{k\cdot}$ and $q_{i,k\cdot}$ denote the $k$-th rows of $R$, $C$ and $Q_i$. Then it follows from (1) and (5) that
$$
x_{t,j} = R F_t c_{j\cdot}' + \epsilon_{t,j} = Q_1 Z_t q_{2,j\cdot}' + \epsilon_{t,j}, \qquad j = 1, 2, \ldots, p_2. \tag{6}
$$
From the zero-mean assumptions on both $F_t$ and $E_t$, we have $E(x_{t,j}) = 0$.


Let $h$ be a positive integer. Define
$$
\Omega_{zq,ij}(h) = \frac{1}{T-h}\sum_{t=1}^{T-h} \mathrm{Cov}\!\left(Z_t q_{2,i\cdot}',\ Z_{t+h} q_{2,j\cdot}'\right), \tag{7}
$$
$$
\Omega_{x,ij}(h) = \frac{1}{T-h}\sum_{t=1}^{T-h} \mathrm{Cov}\!\left(x_{t,i},\ x_{t+h,j}\right), \tag{8}
$$
for $i, j = 1, 2, \ldots, p_2$. Plugging (6) into (8) and using the assumption that $E_t$ is white, it follows that
$$
\Omega_{x,ij}(h) = Q_1 \Omega_{zq,ij}(h) Q_1', \tag{9}
$$

for h ≥ 1. For a pre-determined integer h0 , define


$$
M_1 = \sum_{h=1}^{h_0} \sum_{i=1}^{p_2} \sum_{j=1}^{p_2} \Omega_{x,ij}(h)\, \Omega_{x,ij}(h)'. \tag{10}
$$
By (9) and (10), it follows that
$$
M_1 = Q_1 \left( \sum_{h=1}^{h_0} \sum_{i=1}^{p_2} \sum_{j=1}^{p_2} \Omega_{zq,ij}(h)\, \Omega_{zq,ij}(h)' \right) Q_1'. \tag{11}
$$

Suppose the matrix $M_1$ has rank $k_1$ (Condition 5 in Section 3.2). From (11), we can see that each column of $M_1$ is a linear combination of the columns of $Q_1$, and thus $M_1$ and $Q_1$ have the same column space, that is, $\mathcal{M}(M_1) = \mathcal{M}(Q_1)$. It follows that the eigen-space of $M_1$ is the same as $\mathcal{M}(Q_1)$. Hence, $\mathcal{M}(Q_1)$ can be estimated by the space spanned by the leading eigenvectors of the sample version of $M_1$. Assume that $M_1$ has $k_1$ distinct nonzero eigenvalues, and let $q_{1,j}$ be the unit eigenvector corresponding to the $j$-th largest eigenvalue. As there are two unit eigenvectors corresponding to each eigenvalue, we use the one with positive $\mathbf{1}' q_{1,j}$. We can now uniquely define $Q_1$ by
$$
Q_1 = (q_{1,1}, q_{1,2}, \ldots, q_{1,k_1}).
$$

Now we construct the sample versions of these quantities and introduce the estimation procedure. For any positive integer $h$ and a prescribed positive integer $h_0$, let
$$
\hat{\Omega}_{x,ij}(h) = \frac{1}{T-h}\sum_{t=1}^{T-h} x_{t,i}\, x_{t+h,j}', \tag{12}
$$
$$
\hat{M}_1 = \sum_{h=1}^{h_0} \sum_{i=1}^{p_2} \sum_{j=1}^{p_2} \hat{\Omega}_{x,ij}(h)\, \hat{\Omega}_{x,ij}(h)'. \tag{13}
$$
Then $\mathcal{M}(Q_1)$ can be estimated by $\mathcal{M}(\hat{Q}_1)$, where $\hat{Q}_1 = (\hat{q}_{1,1}, \ldots, \hat{q}_{1,k_1})$ and $\hat{q}_{1,1}, \ldots, \hat{q}_{1,k_1}$ are the eigenvectors of $\hat{M}_1$ corresponding to its $k_1$ largest eigenvalues.
In practice, the number of row factors $k_1$ is usually unknown. It can be estimated by an eigenvalue ratio estimator similar to that of Lam and Yao (2012). Let $\hat{\lambda}_{1,1} \ge \hat{\lambda}_{1,2} \ge \cdots \ge \hat{\lambda}_{1,p_1} \ge 0$ be the ordered eigenvalues of $\hat{M}_1$. Then
$$
\hat{k}_1 = \arg\min_{1 \le i \le p_1/2} \frac{\hat{\lambda}_{1,i+1}}{\hat{\lambda}_{1,i}}.
$$

$Q_2$ and $k_2$ can be estimated by applying the same procedure to the transposes of the $X_t$'s to construct $M_2$ and $\hat{M}_2$. Once $\hat{Q}_1$ and $\hat{Q}_2$ are obtained, the estimate of $Z_t$ can be found via a general linear regression, since
$$
\mathrm{vec}(X_t) = (Q_2 \otimes Q_1)\, \mathrm{vec}(Z_t) + \mathrm{vec}(E_t).
$$
Together with the orthonormality of $\hat{Q}_1$ and $\hat{Q}_2$ and the properties of the Kronecker product, it follows that
$$
\hat{Z}_t = \hat{Q}_1' X_t \hat{Q}_2.
$$
Let $S_t$ be the dynamic signal part of $X_t$, that is, $S_t = R F_t C' = Q_1 Z_t Q_2'$. A natural estimator of $S_t$ is then
$$
\hat{S}_t = \hat{Q}_1 \hat{Q}_1' X_t \hat{Q}_2 \hat{Q}_2'. \tag{14}
$$
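For illustration, the estimation procedure of this subsection can be summarized in a short NumPy sketch. The function below is our own illustrative code (function and variable names are ours, not an official implementation): it builds $\hat{M}_1$ from (12)-(13), extracts the leading eigenvectors with the sign convention $\mathbf{1}'\hat{q}_{1,j} > 0$, applies the eigenvalue-ratio estimator when the number of factors is not supplied, and the commented lines show $\hat{Z}_t$ and $\hat{S}_t$ as in (14). The observations are assumed to be a centered array X of shape (T, p1, p2).

```python
import numpy as np

def estimate_loading(X, k=None, h0=1):
    """Estimate a loading matrix Q-hat from centered data X of shape (T, p1, p2):
    build M-hat as in (13) from the lagged column cross-covariances (12), and
    take its leading eigenvectors.  If k is None, the eigenvalue-ratio estimator
    is used to choose the number of factors."""
    T, p1, p2 = X.shape
    M = np.zeros((p1, p1))
    for h in range(1, h0 + 1):
        for i in range(p2):
            for j in range(p2):
                # Omega-hat_{x,ij}(h) = (T-h)^{-1} sum_t x_{t,i} x_{t+h,j}'
                Omega = X[:T - h, :, i].T @ X[h:, :, j] / (T - h)
                M += Omega @ Omega.T
    eigval, eigvec = np.linalg.eigh(M)
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]         # descending order
    if k is None:
        ratios = eigval[1:p1 // 2 + 1] / eigval[:p1 // 2]  # lambda_{i+1}/lambda_i
        k = int(np.argmin(ratios)) + 1
    Q = eigvec[:, :k]
    Q = Q * np.sign(Q.sum(axis=0))                         # sign convention 1'q > 0
    return Q, eigval

# Row loadings from X, column loadings from the transposed observations; then
# Z_t-hat = Q1' X_t Q2 and S_t-hat = Q1 Q1' X_t Q2 Q2' as in (14).
# Q1, _ = estimate_loading(X, k1, h0=1)
# Q2, _ = estimate_loading(X.transpose(0, 2, 1), k2, h0=1)
# Z_hat = Q1.T[None] @ X @ Q2
# S_hat = Q1[None] @ Z_hat @ Q2.T
```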

Remark 6: In theory any $h_0$ can be used to estimate the loading spaces, as long as one of the $\Omega_{x,ij}(h)$ has full rank for some $i, j = 1, \ldots, p_2$ and $h = 1, \ldots, h_0$. Although the resulting estimators converge at the same rate, the estimate based on the lag at which the autocorrelation is strongest is the most efficient. We demonstrate the impact of $h_0$ in the matrix factor model setting in Section 4. As the autocorrelation is often strongest at small time lags, a relatively small $h_0$ is usually adopted (Lam et al., 2011; Chang et al., 2015; Liu and Chen, 2016). A larger $h_0$ strengthens the signal, but also adds more noise to the estimation of $M_i$.
Remark 7: K-fold cross-validation can be used for model selection between the matrix-valued factor model (1) and the vector-valued factor model (2), and among models with different numbers of factors. Specifically, we first partition the data $D$ into $K$ subsets $D_1, \ldots, D_K$, and fit a factor model to each of the sets $D \setminus D_k$. We then use the estimated loading spaces, together with the data in $D_k$, to obtain the dynamic signal process $S_t$ for $D_k$ and the corresponding out-of-sample residuals. The residual sum of squares (RSS) over the $K$ folds is then used for model comparison. Rolling validation, which uses only the data before a block for estimation, can be used as well.
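A minimal sketch of this cross-validation is given below. It is our own illustration: it reuses the estimate_loading() sketch above, takes the folds as contiguous blocks (one possible choice; Remark 7 does not prescribe the splitting), and returns the out-of-sample RSS.

```python
import numpy as np

def cv_rss(X, k1, k2, n_folds=10, h0=1):
    """Out-of-sample residual sum of squares for a matrix factor model with
    (k1, k2) factors: fit the loading spaces on the data outside each fold,
    project the held-out observations onto them, and accumulate the squared
    residuals."""
    T = X.shape[0]
    rss = 0.0
    for idx in np.array_split(np.arange(T), n_folds):
        train = np.delete(np.arange(T), idx)
        Q1, _ = estimate_loading(X[train], k1, h0)
        Q2, _ = estimate_loading(X[train].transpose(0, 2, 1), k2, h0)
        S = Q1[None] @ (Q1.T[None] @ X[idx] @ Q2) @ Q2.T   # out-of-sample signal
        rss += np.sum((X[idx] - S) ** 2)
    return rss
```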

3.2 Theoretical Properties of the Estimator


In this section, we study the asymptotic properties of the estimators under the setting that all
T , p1 and p2 grow to infinity while k1 and k2 are being fixed. In the following, for any matrix
Y , we use rank(Y ), kY k2 , kY kF , kY kmin , and σj (Y ) to denote the rank, the spectral norm,
the Frobenius norm, the smallest nonzero singular value and the j-th largest singular value of Y .
When Y is a square matrix, we denote by tr(Y ), λmax (Y ) and λmin (Y ) the trace, maximum and
minimum eigenvalues of Y , respectively. We write a  b when a = O(b) and b = O(a). Define
T −h
1 X
Σf (h) = Cov (vec(F t ), vec(F t+h )) , and Σe = Var(vec(E t )).
T −h
t=1

The following regularity conditions are imposed before we derive the asymptotics of the esti-
mators.

Condition 1. The vector-valued process $\mathrm{vec}(F_t)$ is $\alpha$-mixing. Specifically, for some $\gamma > 2$, the mixing coefficients satisfy $\sum_{h=1}^{\infty} \alpha(h)^{1-2/\gamma} < \infty$, where
$$
\alpha(h) = \sup_{i}\ \sup_{A \in \mathcal{F}_{-\infty}^{i},\ B \in \mathcal{F}_{i+h}^{\infty}} |P(A \cap B) - P(A)P(B)|,
$$
and $\mathcal{F}_i^j$ is the $\sigma$-field generated by $\{\mathrm{vec}(F_t) : i \le t \le j\}$.

Condition 2. Let $F_{t,ij}$ be the $(i,j)$-th entry of $F_t$. For any $i = 1, \ldots, k_1$, $j = 1, \ldots, k_2$, and $t = 1, \ldots, T$, we assume $E(|F_{t,ij}|^{2\gamma}) \le C$, where $C$ is a positive constant and $\gamma$ is given in Condition 1. In addition, there exists an $1 \le h \le h_0$ such that $\mathrm{rank}(\Sigma_f(h)) \ge k$ and $\|\Sigma_f(h)\|_2 \asymp O(1) \asymp \sigma_k(\Sigma_f(h))$, where $k = \max\{k_1, k_2\}$, as $p_1$ and $p_2$ go to infinity and $k_1$ and $k_2$ are fixed. For $i = 1, \ldots, k_1$ and $j = 1, \ldots, k_2$,
$$
\frac{1}{T-h}\sum_{t=1}^{T-h} \mathrm{Cov}(f_{t,i}, f_{t+h,i}) \ne 0, \qquad \frac{1}{T-h}\sum_{t=1}^{T-h} \mathrm{Cov}(f_{t,j\cdot}, f_{t+h,j\cdot}) \ne 0.
$$

The latent process does not have to be stationary, but it needs to satisfy the mixing condition (Condition 1) and the boundedness condition (Condition 2), which are weaker than stationarity. For example, a process with a deterministic seasonal variance component or a deterministic regime-switching mechanism is not stationary but is mixing. We do not need to assume any specific model for the latent process $\{F_t\}$, since we only use the eigen-analysis based on autocovariances of the observed process at nonzero lags.
Under Condition 2, $\Sigma_f(h)$ may not be of full rank, which means that some redundancy in the factors is allowed. Condition 2 also guarantees that there is no redundant row or column in $F_t$, and that in each row or column there is at least one factor with serial dependence at lag $h$. Greater dimension reduction can be achieved by the multi-term factor model in (4), whose properties and estimation procedures we are currently investigating.

Condition 3. Each element of Σe remains bounded as p1 and p2 increase to infinity.

In model (1), $R F_t C'$ can be viewed as the signal part of the observation $X_t$, and $E_t$ as the noise. The signal strength, or the strength of the factors, can be measured by the $L_2$-norm of the loading matrices, which is assumed to grow with the dimensions.

Condition 4. There exist constants $\delta_1, \delta_2 \in [0, 1]$ such that $\|R\|_2^2 \asymp p_1^{1-\delta_1} \asymp \|R\|_{\min}^2$ and $\|C\|_2^2 \asymp p_2^{1-\delta_2} \asymp \|C\|_{\min}^2$, as $p_1$ and $p_2$ go to infinity and $k_1$ and $k_2$ are fixed.

The rates $\delta_1$ and $\delta_2$ are called the strength of the row factors and the strength of the column factors, respectively. They measure the growth rate of the amount of information carried by the observed process $X_t$ on the common factors as the dimensions increase, relative to the growth rate of the amount of noise. When $\delta_i = 0$, the factors are strong; when $\delta_i > 0$, the factors are weak, meaning that the information contained in $X_t$ on the factors grows more slowly than the noise introduced as $p_i$ increases. For a detailed discussion of factor strength, see Lam and Yao (2012).

Condition 5. M i has ki distinct positive eigenvalues for i = 1, 2.

As stated in Section 3, only $\mathcal{M}(Q_1)$ and $\mathcal{M}(Q_2)$ are uniquely determined, while $Q_1$ and $Q_2$ are not. However, when the eigenvalues of $M_i$ are distinct, we can uniquely define $Q_i$ as $Q_i = (q_{i,1}, \ldots, q_{i,k_i})$, where $q_{i,1}, \ldots, q_{i,k_i}$ are the unit eigenvectors of $M_i$ corresponding to its $k_i$ largest eigenvalues $\lambda_{i,1} > \lambda_{i,2} > \cdots > \lambda_{i,k_i}$, chosen so that $\mathbf{1}'q_{i,1}, \mathbf{1}'q_{i,2}, \ldots, \mathbf{1}'q_{i,k_i}$ are all positive, for $i = 1, 2$.
The following theorems show the rate of convergence for estimators of loading spaces and the
eigenvalues.

Theorem 1. Under Conditions 1-5 and $p_1^{\delta_1} p_2^{\delta_2} T^{-1/2} = o(1)$, it holds that
$$
\|\hat{Q}_i - Q_i\|_2 = O_p\!\left(p_1^{\delta_1} p_2^{\delta_2} T^{-1/2}\right), \qquad i = 1, 2.
$$

Concerning the impact of δi ’s, it is not surprising that the stronger the factors are, the more
useful information the observed process carries and the faster the estimators converge. More
interestingly, the strengths of row factors and column factors δ1 and δ2 determine the rates
together. An increase in the strength of row factors is able to improve the estimation of the
column factors loading space and vice versa.

When $p_1$ and $p_2$ are fixed, the convergence rate for estimating the loading matrices is $\sqrt{T}$. If the loadings are strong ($\delta_i = 0$), the rate is also $\sqrt{T}$, since the signal is as strong as the noise and the increase in dimensions does not affect the estimation of the loading spaces. When the $\delta_i$'s are not 0, the noise increases faster than the useful information; in this case, increases in dimension dilute the information, resulting in less efficient estimators.

Theorem 2. With Conditions 1-5 and $p_1^{\delta_1} p_2^{\delta_2} T^{-1/2} = o(1)$, the eigenvalues $\hat{\lambda}_{i,1}, \ldots, \hat{\lambda}_{i,p_i}$ of $\hat{M}_i$, sorted in descending order, satisfy
$$
|\hat{\lambda}_{i,j} - \lambda_{i,j}| = O_p\!\left(p_1^{2-\delta_1} p_2^{2-\delta_2} T^{-1/2}\right), \qquad j = 1, 2, \ldots, k_i,
$$
and
$$
|\hat{\lambda}_{i,j}| = O_p\!\left(p_1^{2} p_2^{2} T^{-1}\right), \qquad j = k_i + 1, \ldots, p_i,
$$
where $\lambda_{i,1} > \lambda_{i,2} > \cdots > \lambda_{i,k_i}$ are the eigenvalues of $M_i$, for $i = 1, 2$.

Theorem 2 shows that the estimators of the nonzero eigenvalues of $M_i$ converge more slowly than those of the zero eigenvalues. It provides theoretical support for the ratio estimator proposed in Section 3.1.
The following theorem gives the theoretical properties of the estimator $\hat{S}_t$ in (14).

Theorem 3. If Conditions 1-5 hold, $p_1^{\delta_1} p_2^{\delta_2} T^{-1/2} = o(1)$, and $\|\Sigma_e\|_2$ is bounded, we have
$$
p_1^{-1/2} p_2^{-1/2} \|\hat{S}_t - S_t\|_2
= O_p\!\left(a_1 \|\hat{Q}_1 - Q_1\|_2\right) + O_p\!\left(a_2 \|\hat{Q}_2 - Q_2\|_2\right) + O_p\!\left(p_1^{-1/2} p_2^{-1/2}\right)
= O_p\!\left(p_1^{\delta_1/2} p_2^{\delta_2/2} T^{-1/2} + p_1^{-1/2} p_2^{-1/2}\right),
$$
where $a_1 \asymp a_2 \asymp O\!\left(p_1^{-\delta_1/2} p_2^{-\delta_2/2} T^{-1/2}\right)$.

The theorem shows that, in order to estimate the signal $S_t$ consistently, the dimensions $p_1$ and $p_2$ must go to infinity, so that there is sufficient information on $S_t$ at each time point $t$.
Since $Q_i$ is not identifiable in model (1), another measure of the accuracy of the estimated factor loading matrices is the distance between $\mathcal{M}(Q_i)$ and $\mathcal{M}(\hat{Q}_i)$. For two orthonormal matrices $O_1$ and $O_2$ of sizes $p \times q_1$ and $p \times q_2$, define
$$
D(O_1, O_2) = \left(1 - \frac{1}{\max(q_1, q_2)} \mathrm{tr}\!\left(O_1 O_1' O_2 O_2'\right)\right)^{1/2}.
$$
Then $D(O_1, O_2)$ is a quantity between 0 and 1; it equals 0 if the column spaces of $O_1$ and $O_2$ are the same, and 1 if they are orthogonal.
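In code, $D(O_1, O_2)$ is a one-liner; the sketch below is purely illustrative, and the small clip only guards against tiny negative values caused by rounding.

```python
import numpy as np

def space_distance(O1, O2):
    """D(O1, O2): 0 when the column spaces coincide, 1 when they are orthogonal."""
    q = max(O1.shape[1], O2.shape[1])
    return float(np.sqrt(max(0.0, 1.0 - np.trace(O1 @ O1.T @ O2 @ O2.T) / q)))
```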

Theorem 4. If Conditions 1-5 hold and $p_1^{\delta_1} p_2^{\delta_2} T^{-1/2} = o(1)$, we have
$$
D(\hat{Q}_i, Q_i) = O_p\!\left(p_1^{\delta_1} p_2^{\delta_2} T^{-1/2}\right), \qquad i = 1, 2.
$$
Theorem 4 shows that the error in estimating the loading spaces is of the same order as that of the estimated $Q_i$'s.

4 Simulation
In this section, we study the numerical performance of the proposed matrix-valued approach. In all simulations, the observed data $X_t$ are simulated according to model (1),
$$
X_t = R F_t C' + E_t, \qquad t = 1, 2, \ldots, T.
$$
We choose the dimensions of the latent factor process $F_t$ to be $k_1 = 3$ and $k_2 = 2$. The entries of $F_t$ are simulated as $k_1 k_2$ independent processes with $N(0,1)$ innovations, where the types and coefficients of the processes will be specified later. The entries of $R$ and $C$ are independently sampled from the uniform distributions $U(-p_i^{-\delta_i/2}, p_i^{-\delta_i/2})$ for $i = 1, 2$, respectively. The error process $E_t$ is a white noise process with mean 0 and a Kronecker product covariance structure, that is, $\mathrm{Cov}(\mathrm{vec}(E_t)) = \Gamma_2 \otimes \Gamma_1$, where $\Gamma_1$ and $\Gamma_2$ are of sizes $p_1 \times p_1$ and $p_2 \times p_2$, respectively. Both $\Gamma_1$ and $\Gamma_2$ have 1 on the diagonal and 0.2 on the off-diagonal entries. For all simulations, the reported results are based on 200 simulation runs.
We first study the performance of the proposed approach in estimating the loading spaces. In this part, the $k_1 k_2 = 6$ latent factors are independent AR(1) processes with AR coefficients $[-0.5\ \ 0.6;\ 0.8\ -0.4;\ 0.7\ \ 0.3]$. We consider three pairs of $(\delta_1, \delta_2)$ combinations: $(0.5, 0.5)$, $(0.5, 0)$ and $(0, 0)$. For each pair of $\delta_1$ and $\delta_2$, the dimensions $(p_1, p_2)$ are chosen to be $(20, 20)$, $(20, 50)$ and $(50, 50)$. The sample size $T$ is set to $0.5 p_1 p_2$, $p_1 p_2$, and $2 p_1 p_2$. We take $h_0 = 1$, which is sufficient for the AR(1) model, as will be shown later. The sketch below illustrates this simulation setting.
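The following is our own illustrative code for this data-generating process (the AR(1) factor processes are started at zero without a burn-in period, and the variable names are ours):

```python
import numpy as np

def simulate(T, p1, p2, delta1=0.0, delta2=0.0, seed=0):
    """Simulate X_t = R F_t C' + E_t with k1 = 3, k2 = 2 AR(1) factors, uniform
    loadings whose scale is governed by the factor strengths (delta1, delta2),
    and white noise whose vectorized covariance is Gamma2 kron Gamma1
    (1 on the diagonal, 0.2 off the diagonal)."""
    rng = np.random.default_rng(seed)
    k1, k2 = 3, 2
    phi = np.array([[-0.5, 0.6], [0.8, -0.4], [0.7, 0.3]])   # AR(1) coefficients
    F = np.zeros((T, k1, k2))
    for t in range(1, T):
        F[t] = phi * F[t - 1] + rng.standard_normal((k1, k2))
    R = rng.uniform(-p1 ** (-delta1 / 2), p1 ** (-delta1 / 2), size=(p1, k1))
    C = rng.uniform(-p2 ** (-delta2 / 2), p2 ** (-delta2 / 2), size=(p2, k2))
    L1 = np.linalg.cholesky(np.full((p1, p1), 0.2) + 0.8 * np.eye(p1))
    L2 = np.linalg.cholesky(np.full((p2, p2), 0.2) + 0.8 * np.eye(p2))
    E = L1[None] @ rng.standard_normal((T, p1, p2)) @ L2.T[None]
    return R[None] @ F @ C.T[None] + E, R, F, C

# Example: X, R, F, C = simulate(T=800, p1=20, p2=20)
```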
Table 2 shows the results for estimating the loading spaces $\mathcal{M}(Q_1)$ and $\mathcal{M}(Q_2)$. The accuracies are measured by $D(\hat{Q}_1, Q_1)$ and $D(\hat{Q}_2, Q_2)$ using the correct $k_1$ and $k_2$, respectively. The results show that with stronger signals and more sample points, the estimation accuracy increases. Moreover, increasing the strength of one loading matrix improves the estimation accuracy of both loading spaces.
With the same simulated data, we compare the proposed matrix-valued approach with the vector-valued approach of Lam and Yao (2012) through the estimation accuracy of the total loading matrix $Q = Q_2 \otimes Q_1$. In what follows, the subscripts mat and vec denote our approach and Lam and Yao (2012)'s method, respectively. The estimate $\hat{Q}_{\mathrm{mat}}$ is computed as $\hat{Q}_{\mathrm{mat}} = \hat{Q}_2 \otimes \hat{Q}_1$ once $\hat{Q}_1$ and $\hat{Q}_2$ are obtained by our approach. For the vector-valued approach, we apply Lam and Yao (2012)'s method to the observations $\{\mathrm{vec}(X_t),\ t = 1, 2, \ldots, T\}$ to obtain $\hat{Q}_{\mathrm{vec}}$. Table 3 presents the estimation accuracies of $Q$ measured by $D_{\mathrm{vec}}(\hat{Q}, Q)$ and $D_{\mathrm{mat}}(\hat{Q}, Q)$. It shows that the matrix approach clearly improves the estimation accuracy over the vector-valued approach.
We next demonstrate the performance of the matrix-valued approach in estimating the numbers of factors, $k_1$ and $k_2$. The data are the same as in Table 2 with $\delta_1 = \delta_2 = 0$, and hence the true rank pair is $(3, 2)$. Table 4 shows the relative frequencies of the estimated rank pairs over 200 simulation runs. The four pairs $(2, 1)$, $(2, 2)$, $(3, 1)$ and $(3, 2)$ appear frequently in all combinations of $p_1$, $p_2$ and $T$. The row for the true rank pair $(3, 2)$ is highlighted. It shows that the relative frequency of correctly estimating the true rank pair improves with increasing sample size $T$.
                          T = 0.5 p1 p2               T = p1 p2                   T = 2 p1 p2
δ1   δ2   p1  p2    D(Q̂1,Q1)   D(Q̂2,Q2)    D(Q̂1,Q1)   D(Q̂2,Q2)    D(Q̂1,Q1)   D(Q̂2,Q2)
0.5 0.5 20 20 5.96(0.19) 7.12(0.03) 5.80(0.07) 7.09(0.01) 5.73(0.04) 7.08(0.01)
20 50 5.87(0.15) 7.07(0.02) 5.77(0.04) 7.05(0.01) 5.74(0.02) 7.04(0.01)
50 50 6.26(0.56) 7.05(0.01) 5.73(0.13) 7.04(0.00) 5.61(0.03) 7.03(0.00)
0.5 0 20 20 5.36(0.41) 5.42(2.22) 4.27(1.13) 1.66(1.70) 1.52(0.75) 0.54(0.17)
20 50 5.02(0.67) 5.15(1.61) 1.82(0.77) 1.32(0.60) 0.54(0.18) 0.53(0.17)
50 50 3.68(0.48) 3.44(1.23) 1.31(0.20) 0.65(0.19) 0.51(0.07) 0.28(0.08)
0 0 20 20 0.55(0.16) 0.44(0.10) 0.36(0.08) 0.31(0.06) 0.24(0.04) 0.22(0.04)
20 50 0.25(0.06) 0.36(0.07) 0.16(0.03) 0.26(0.05) 0.10(0.02) 0.18(0.03)
50 50 0.13(0.02) 0.12(0.02) 0.09(0.01) 0.08(0.01) 0.06(0.01) 0.06(0.01)

Table 2: Means and standard deviations (in parentheses) of D(Q̂i, Qi), i = 1, 2, over 200 simulation runs. For ease of presentation, all numbers in this table are the true numbers multiplied by 10.

                          T = 0.5 p1 p2               T = p1 p2                   T = 2 p1 p2
δ1   δ2   p1  p2    Dvec(Q̂,Q)   Dmat(Q̂,Q)    Dvec(Q̂,Q)   Dmat(Q̂,Q)    Dvec(Q̂,Q)   Dmat(Q̂,Q)
b Q) Dmat (Q, b Q) Dvec (Q,
b Q) Dmat (Q,b Q) Dvec (Q,
b Q) Dmat (Q, b Q)
0.5 0.5 20 20 8.75(0.17) 8.26(0.07) 8.24(0.18) 8.19(0.03) 7.62(0.17) 8.16(0.02)
20 50 8.72(0.10) 8.20(0.06) 8.40(0.09) 8.15(0.01) 7.92(0.16) 8.13(0.01)
50 50 8.51(0.14) 8.34(0.22) 7.62(0.14) 8.13(0.05) 6.81(0.06) 8.09(0.01)
0.5 0 20 20 6.40(0.29) 7.19(1.13) 5.50(0.31) 4.66(1.45) 4.37(0.45) 1.64(0.72)
20 50 5.64(0.24) 6.75(1.13) 4.75(0.35) 2.30(0.80) 3.37(0.45) 0.78(0.20)
50 50 5.07(0.10) 4.92(0.94) 4.46(0.29) 1.47(0.23) 2.73(0.46) 0.59(0.08)
0 0 20 20 3.64(0.23) 0.71(0.16) 2.77(0.16) 0.48(0.08) 2.07(0.13) 0.33(0.04)
20 50 2.84(0.18) 0.44(0.07) 2.13(0.10) 0.30(0.05) 1.56(0.07) 0.21(0.03)
50 50 1.85(0.10) 0.18(0.02) 1.34(0.06) 0.12(0.01) 0.97(0.04) 0.09(0.01)

Table 3: Means and standard deviations (in parentheses) of D(Q̂, Q) over 200 replicates. For ease of presentation, all numbers are the true numbers multiplied by 10.

p1 = 20, p2 = 20 p1 = 20, p2 = 50 p1 = 50, p2 = 50
(k̂1 , k̂2 ) T = .5p T = p T = 2p T = .5p T = p T = 2p T = .5p T = p T = 2p
(2,1) 0.2 0.055 0 0.32 0.005 0 0 0 0
(2,2) 0.055 0.04 0 0.025 0.005 0 0 0 0
(3,1) 0.19 0.215 0.01 0.47 0.325 0.005 0.005 0 0
(3,2) 0.365 0.66 0.985 0.17 0.665 0.995 0.995 1 1
Others 0.19 0.03 0.005 0.015 0 0 0 0 0

Table 4: Relative frequency of estimated rank pair (k̂1 , k̂2 ) over 200 runs. The row with the true
rank pair (3, 2) is highlighted. Here p = p1 p2 .

k̂ = 1 k̂ = 2 k̂ = 3 k̂ = 4 k̂ = 6 Others
p1 p2 T vec mat vec mat vec mat vec mat vec mat vec mat
20 20 .5p 0.25 0.125 0.33 0.22 0.035 0.19 0.015 0.07 0.345 0.365 0.025 0.03
p 0.055 0.02 0.105 0.06 0 0.215 0 0.045 0.83 0.66 0.01 0
2p 0 0 0.005 0 0 0.01 0 0.005 0.995 0.985 0 0
20 50 .5p 0.03 0.015 0.62 0.32 0 0.47 0 0.025 0.34 0.17 0.01 0
p 0 0 0.14 0.005 0 0.325 0 0.005 0.86 0.665 0 0
2p 0 0 0 0 0 0.005 0 0 1 0.995 0 0
50 50 .5p 0.07 0 0 0 0 0.005 0 0 0.93 0.995 0 0
p 0 0 0 0 0 0 0 0 1 1 0 0
2p 0 0 0 0 0 0 0 0 1 1 0 0

Table 5: Relative frequency of estimated total rank k̂ over 200 replicates for both the vector and
matrix-valued approaches. The column with the true rank k = 6 is highlighted. Here p = p1 p2 .

Table 5 shows a comparison between the matrix- and vector-valued approaches in estimating the total number of latent factors $k = k_1 k_2$. The column with the true rank $k = 6$ is highlighted. The results show that the two approaches perform similarly when the sample size $T$ is large. For smaller $T$, the matrix-valued approach has a high probability of selecting the rank pair $(3, 1)$, and hence the frequency of estimating the true rank $k$ decreases.
We now study the effect of the lag parameter $h_0$. The $k_1 k_2$ factors are assumed to be independent and to follow the same model, which is either an AR(1) or an MA(2) model. For the AR(1) model, the coefficient of all the factors is 0.9, 0.6 or 0.3. For the MA(2) model, we consider the case $f_t = e_t + 0.9 e_{t-2}$. We take $\delta_1 = \delta_2 = 0$, $T = p_1 p_2$, and compare the estimation accuracies of the two loading spaces for four lag choices, $h_0 = 1, 2, 3, 4$. Table 6 shows the results of $D(\hat{Q}_1, Q_1)$ and $D(\hat{Q}_2, Q_2)$. It is seen that, for the AR(1) processes, taking $h_0 = 1$ is sufficient. A larger $h_0$ in fact decreases the performance, especially when the AR coefficient is small. Note that a larger $h_0$ increases the signal strength in the matrix $M$, but also increases the noise level in its sample version $\hat{M}$. For an AR(1) model with a small AR coefficient, the autocorrelation at higher lags is relatively small, so the additional signal strength is limited. For the MA(2) process, one must use $h_0 \ge 2$, since the lag-1 autocovariance matrix is zero and does not provide any information. $h_0 = 2$ performs best, since all higher lags carry no additional information but add a significant amount of noise.
                         h0 = 1                    h0 = 2                    h0 = 3                    h0 = 4
AR(1)   p1  p2   D(Q̂1,Q1)   D(Q̂2,Q2)    D(Q̂1,Q1)   D(Q̂2,Q2)    D(Q̂1,Q1)   D(Q̂2,Q2)    D(Q̂1,Q1)   D(Q̂2,Q2)
0.9 20 20 0.13(0.02) 0.10(0.02) 0.13(0.02) 0.10(0.02) 0.14(0.02) 0.10(0.02) 0.14(0.02) 0.11(0.02)
20 50 0.05(0.01) 0.07(0.01) 0.05(0.01) 0.07(0.01) 0.05(0.01) 0.07(0.01) 0.06(0.01) 0.08(0.01)
50 50 0.03(0.00) 0.03(0.00) 0.03(0.00) 0.03(0.00) 0.03(0.00) 0.03(0.00) 0.03(0.00) 0.03(0.00)
0.6 20 20 0.36(0.07) 0.26(0.04) 0.41(0.08) 0.27(0.05) 0.47(0.10) 0.28(0.05) 0.53(0.12) 0.29(0.05)
20 50 0.15(0.03) 0.19(0.02) 0.18(0.04) 0.20(0.03) 0.21(0.05) 0.21(0.03) 0.24(0.06) 0.21(0.03)
50 50 0.09(0.01) 0.07(0.01) 0.10(0.01) 0.08(0.01) 0.11(0.02) 0.08(0.01) 0.12(0.02) 0.08(0.01)
0.3 20 20 1.56(0.72) 0.57(0.13) 2.31(0.99) 0.60(0.16) 2.82(1.04) 0.63(0.18) 3.12(1.05) 0.67(0.18)
20 50 0.64(0.21) 0.45(0.10) 1.14(0.49) 0.49(0.13) 1.66(0.68) 0.55(0.19) 2.13(0.82) 0.63(0.24)
50 50 0.26(0.05) 0.17(0.03) 0.39(0.08) 0.18(0.03) 0.56(0.11) 0.20(0.04) 0.74(0.14) 0.21(0.04)
MA(2) 20 20 2.60(1.11) 0.88(0.28) 0.48(0.12) 0.27(0.05) 0.59(0.15) 0.28(0.05) 0.68(0.17) 0.28(0.06)
20 50 2.76(1.16) 1.13(0.56) 0.21(0.04) 0.21(0.03) 0.27(0.06) 0.22(0.04) 0.32(0.07) 0.22(0.04)
50 50 2.85(1.15) 0.68(0.23) 0.11(0.02) 0.08(0.01) 0.13(0.02) 0.08(0.01) 0.15(0.02) 0.08(0.01)

Table 6: Means and standard deviations (in parentheses) of D(Q̂1, Q1) and D(Q̂2, Q2) for different lag parameters h0. All numbers are the true numbers multiplied by 10.

p1 = p2   T       Dvec(Q̂,Q)   Dmat(Q̂,Q)   Dvec(Ŝ,S)   Dmat(Ŝ,S)
10 50 6.26(0.38) 3.66(0.92) 4.05(0.28) 3.41(0.39)
200 4.11(0.33) 1.42(0.42) 3.02(0.19) 2.62(0.15)
1000 2.12(0.13) 0.50(0.09) 2.48(0.05) 2.40(0.04)
5000 0.99(0.05) 0.21(0.03) 2.38(0.02) 2.36(0.01)
20 50 5.65(0.39) 2.36(1.26) 3.07(0.27) 1.86(0.61)
200 3.64(0.23) 0.71(0.16) 1.96(0.16) 1.11(0.05)
1000 1.88(0.11) 0.29(0.04) 1.25(0.05) 1.02(0.01)
5000 0.87(0.03) 0.13(0.01) 1.04(0.01) 1.00(0.00)
50 50 5.81(0.35) 3.17(1.47) 2.64(0.26) 1.95(0.79)
200 3.78(0.24) 0.62(0.19) 1.57(0.16) 0.56(0.09)
1000 2.04(0.10) 0.21(0.03) 0.82(0.06) 0.42(0.01)
5000 0.97(0.04) 0.09(0.01) 0.49(0.02) 0.40(0.00)

Table 7: Means and standard deviations (in parentheses) of estimation accuracies of Q and S.
All numbers are the true numbers multiplied by 10.

Next we study the performance of recovering the signal $S_t$. The latent factors are simulated in the same way as the data in Table 2. We take $\delta_1 = \delta_2 = 0$, $p_1 = p_2 = 10, 20, 50$, and $T = 50, 200, 1000, 5000$. The recovery accuracy of $\hat{S}_t$, denoted by $D(\hat{S}, S)$, is estimated by the average of $\|\hat{S}_t - S_t\|_2$ over $t = 1, 2, \ldots, T$, further normalized by $\sqrt{p_1 p_2}$, that is,
$$
D(\hat{S}, S) = p_1^{-1/2} p_2^{-1/2}\, \frac{1}{T}\sum_{t=1}^{T} \|\hat{S}_t - S_t\|_2.
$$
Table 7 presents the results of $D(\hat{S}, S)$ and $D(\hat{Q}, Q)$ for the two approaches. It shows that, when $T$ is relatively large (hence $Q$ is estimated relatively accurately), increasing $p$ improves the estimation of $S$. For the same $p$ and relatively large $T$, further increasing $T$ has only a limited benefit in improving the estimation of $S$. The estimation accuracy of $S$ for the proposed matrix-valued approach is better than that of the vector-valued approach, though the relative improvement decreases as $T$ increases, even when the improvement in estimating $Q$ is significant.

k̂2 = 1 k̂2 = 2 k̂2 = 3
k̂1 vec mat vec mat vec mat
1 0.83 0.83 0.72 0.75 0.63 0.75
2 0.72 0.71 0.58 0.56 0.49 0.55
3 0.63 0.67 0.49 0.47 0.46 0.46
4 0.58 0.66 0.46 0.47 0.44 0.43

Table 8: Means of out-of-sample RSS/SST for 10-fold cross-validation over 200 simulation runs.
The cell corresponding to the true order (3, 2) is highlighted.

Next, we conduct a 10-fold cross-validation study. The data are generated in the same way as the data in Table 2, with $\delta_1 = \delta_2 = 0$, $p_1 = p_2 = 20$ and $T = 1000$. We vary the estimated numbers of factors $k_1$ and $k_2$ over all combinations of $k_1 = 1, 2, 3, 4$ and $k_2 = 1, 2, 3$. The means of the out-of-sample RSS/SST are reported in Table 8. For the matrix-valued approach, the RSS/SST decreases rapidly as $k_1$ and $k_2$ increase, until they reach the true rank pair $(3, 2)$ (highlighted in the table). The RSS/SST values then remain roughly the same with increasing estimated ranks when $k_1 > 3$ and $k_2 > 2$. For the vector-valued approach, $k = k_1 k_2$, hence the values in the table are the same for the same $k_1 k_2$ value (e.g. $(2, 3)$ and $(3, 2)$ are equivalent). Its performance improves quickly as $k$ increases until $k = 6$, the true number of factors; the performance then remains roughly the same for $\hat{k} > 6$.

5 Real Example: Fama-French 10 by 10 Series


In this section we illustrate the matrix factor model using the Fama-French 10 by 10 return series.
A universe of stocks is grouped into 100 portfolios, according to ten levels of market capital (size)
and ten levels of book to equity ratio (BE). Their monthly returns from January 1964 to December,
2015 for total 624 months and overall 62,400 observations are used in this analysis. For more de-
tailed information, see http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data library.html.
All the 100 series are clearly related to the overall market condition. In this analysis we simply
subtract the corresponding monthly excess market return from each of the series, resulting in 100
market-adjusted return series. We chose not to fit a standard CAPM model to each of the series
to remove the market effect, as it will involve estimating 100 different betas. The market return
data are obtained from the same website above.
Figure 1 shows the time series plot of the 100 (standardized) series, and Figures 2 and 3 show the logarithms and ratios of the eigenvalues of $\hat{M}_1$ and $\hat{M}_2$ for the row (size) and column (BE) loading matrices. Since the series show very small autocorrelation beyond $h = 1$, in this example we use $h_0 = 1$; the results using $h_0 = 2$ are similar. Although the eigenvalue ratio estimator presented in Section 3.1 indicates $k_1 = k_2 = 1$, we use $k_1 = k_2 = 2$ here for illustration. Tables 9 and 10 show the estimated loading matrices after a varimax rotation that maximizes the variance of the squared factor loadings (see the sketch after Table 10), scaled by 30 for a cleaner view. For size, it is seen that there are possibly two or three groups.
Factor S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
1 -13 -14 -13 -13 -10 -5 -2 1 6 7
2 0 0 -2 3 5 12 12 18 15 5

Table 9: Fama-French series: Size loading matrix after rotation and scaling.

Factor BE1 BE2 BE3 BE4 BE5 BE6 BE7 BE8 BE9 BE10
1 -21 -14 -11 -9 -4 -1 -1 -4 1 3
2 -9 2 3 7 9 10 10 10 13 14

Table 10: Fama-French series: BtoE loading matrix after rotation and scaling.
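The rotation reported in Tables 9 and 10 (and later in Table 12) is an ordinary varimax rotation; a generic implementation might look as follows. This is an illustrative sketch, not the code used for the paper, and the tables additionally scale the rotated loadings by 30.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate the loading matrix Phi by an orthogonal matrix that maximizes the
    variance of the squared loadings (standard SVD-based varimax iteration)."""
    p, k = Phi.shape
    R, obj = np.eye(k), 0.0
    for _ in range(max_iter):
        Lam = Phi @ R
        u, s, vt = np.linalg.svd(
            Phi.T @ (Lam ** 3 - (gamma / p) * Lam @ np.diag((Lam ** 2).sum(axis=0))))
        R = u @ vt
        obj, obj_old = s.sum(), obj
        if obj_old != 0 and obj / obj_old < 1 + tol:
            break
    return Phi @ R

# Rotated, scaled loadings as displayed in the tables (Q1_hat denotes an
# estimated row loading matrix obtained as in Section 3.1):
# print(np.round(30 * varimax(Q1_hat)))
```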

The 1st to 5th smallest size portfolios load heavily (with roughly equal weights) on the first row of the factor matrix, while the 6th to 9th smallest size portfolios load heavily (with roughly equal weights) on the second row of the factor matrix. The largest (10th) size portfolio behaves similarly to the other large size portfolios, but with some differences. We note that the Fama-French size factor proposed in Fama and French (1993) is constructed using the return differences between the largest 30% of the companies (combining the 8th to 10th size portfolios) and the smallest 30% of the companies (combining our 1st to 3rd size portfolios).
Turning to the book-to-equity ratio, Table 10 shows a different pattern in the column loading matrix. There seem to be three groups. The 2nd to 4th smallest BE portfolios load heavily on the first column of the factor matrix; the 5th to 10th BE portfolios load heavily on the second column of the factor matrix; and the smallest (1st) BE portfolio loads heavily on both columns of the factor matrix, with different loading coefficients.
Figure 4 shows the estimated factor matrices over time. They can potentially be used to replace the Fama-French size factor (SMB) and book-to-equity factor (HML) in a Fama-French factor model for asset pricing, factor trading and other uses, though further analysis is needed to assess their effectiveness. A cross-correlation study shows few significant cross-correlations at lags larger than 0 among the factors, though the factors show some strong contemporaneous correlation, as the factor matrices are subject to rotation; in our case we performed a rotation to reveal the group structure in the loading matrices. A principal component analysis of the four factor series reveals that three principal components explain 98% of the variation in the four factors, hence there may still be some redundancy in the factors and the model may be further simplified.
Figure 5 shows the logarithms and ratios of the eigenvalues of $\hat{M}$ of Lam et al. (2011) for the vectorized factor model (2). Models with various numbers of factors were estimated, and a comparison using a version of rolling validation is shown in Table 11. Specifically, for each year between 1996 and 2015, we use all data available before that year to fit a matrix (or vector) factor model and estimate the corresponding loading matrices. Using these estimated loading matrices and the observed 12 months of data in that year, we estimate the factors and the corresponding residuals.
[Figure 1: Time series plot of Fama-French 10 by 10 series.]

[Figure 2: Fama-French series: logarithms and ratios of eigenvalues of M̂1 for the row (size) loading matrix.]

[Figure 3: Fama-French series: logarithms and ratios of eigenvalues of M̂2 for the column (BE) loading matrix.]

[Figure 4: Fama-French series: estimated factors.]

[Figure 5: Fama-French series: logarithms and ratios of eigenvalues of M̂ for the vectorized factor model.]

Model           factor   # factors   # parameters   RSS
Vector model    6        6           600            14,149
Vector model    5        5           500            14,565
Vector model    4        4           400            15,365
Vector model    3        3           300            16,262
Matrix model    (3,3)    9           60             13,530
Matrix model    (3,2)    6           50             14,166
Matrix model    (2,3)    6           50             14,514
Matrix model    (2,2)    4           40             14,973
Matrix model    (0,0)    0           0              29,193

Table 11: Comparison of different models for the Fama-French series.

The total sum of squares of the 12 monthly residuals of the 100 series over the 20 years is reported. The RSS corresponding to model (0, 0) is the total sum of squares of the observed 100 series over the 20 years being studied. It is seen that the matrix factor model with (2, 2) factor matrices performs better than the vectorized factor model with an equal number of factors and many more parameters in the loading matrices. The (3, 2) matrix factor model performs similarly to the 6-factor vectorized factor model, but the number of parameters used is much smaller. A sketch of this rolling validation is given below.
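The following is our own illustrative code for the rolling validation described above. It reuses the estimate_loading() sketch from Section 3.1 and assumes a centered array X of 624 monthly observations of shape (624, 10, 10) starting in January 1964; the exact bookkeeping may differ from that used for Table 11.

```python
import numpy as np

def rolling_rss(X, k1, k2, start_year=1964, first_test_year=1996,
                last_test_year=2015, h0=1):
    """For each test year, fit the loading spaces on all data before that year,
    project the 12 held-out months onto them, and accumulate the residual sum
    of squares."""
    rss = 0.0
    for year in range(first_test_year, last_test_year + 1):
        split = 12 * (year - start_year)          # months available before `year`
        Q1, _ = estimate_loading(X[:split], k1, h0)
        Q2, _ = estimate_loading(X[:split].transpose(0, 2, 1), k2, h0)
        test = X[split:split + 12]
        S = Q1[None] @ (Q1.T[None] @ test @ Q2) @ Q2.T
        rss += np.sum((test - S) ** 2)
    return rss

# e.g. rolling_rss(X, 2, 2) would mimic the (2, 2) matrix-model entry in Table 11.
```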

6 Real Example: Series of Company Financials


In this example we analyze the series of financial data reported by a group of 200 companies. We constructed 16 financial characteristics based on the companies' quarterly financial reports. The list of variables and their definitions is given in Appendix 2. The period runs from the first quarter of 2006 to the fourth quarter of 2015, 10 years with a total of 40 observations. The total number of time series is 3,200.
Figures 6 and 7 show the eigenvalues and their ratios of $\hat{M}_1$ and $\hat{M}_2$ for the row factors and column factors. The estimated dimensions $k_1$ and $k_2$ are both 3, though we use $k_1 = 5$ and $k_2 = 20$ for this illustration, with interesting results. Estimation is done using $h_0 = 2$.
The estimated row loading matrix is rotated to maximize its variance, with the potential grouping shown by the shaded areas in Table 12. It shows the loading of each financial on the five rows of the factor matrix, after proper scaling (30 times) and reordering for easy visualization. The two financials in Group 1 load almost exclusively on Row 1 of the factor matrix, with almost the same weights. The six financials in Group 2 load heavily on Row 2, again with almost the same weights.
22
Eigenvalues of Row Matrix Ratio of Eigenvalues

1.0
● ●
250000

0.9

exp(diff(log(eigval1[1:15])))
● ●


0.8

150000

● ● ●

0.7

0.6
● ●


0 50000

0.5
● ● ●
● ● ●

0.4
● ● ● ●

2 4 6 8 10 12 14 2 4 6 8 10 12 14

Figure 6: Financial series: Eigenvalues and their ratios of M


c 1.

Figure 7: Financial series: Eigenvalues and their ratios of M̂_2.

Row Factor F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15 F16
1 21 21 -1 -1 -1 -1 -1 4 -2 2 1 0 1 0 0 -1
2 1 1 -13 -9 -11 -11 -13 -12 7 -6 -1 3 -1 1 0 1
3 0 0 0 -10 0 -1 0 -1 -11 9 -24 -4 -1 2 -2 0
4 0 0 -3 2 5 4 -2 -2 19 21 -2 4 1 3 0 -1
5 0 0 -1 1 2 0 -2 -2 -4 2 1 9 -8 -13 -14 -18

Table 12: Financial series: Loading matrix after a varimax rotation and scaling

Group 1 AssetE.R LiabilityE.R


Group 2 Earnings.R EPS Oper.M Profit.Margin ROA ROE
Group 3 Cash.PS Gross.Margin Revenue.PS
Group 4 Payout.R Profit.G.Q Profit.G.Y Revenue.G.Q Revenue.G.Y

Table 13: Financial series: Grouping of company financials

same weights. The three financials in Group 3 load on Rows 3 and 4, with somewhat different
weights. Finally, the five financials in Group 4 mainly load on Row 5 of the factor matrix, with
Payout.Ratio having opposite weights from the others.
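The rotation used here can be reproduced with a standard varimax routine; a compact version (Kaiser's formulation) is sketched below, where Q1_hat is assumed to hold the estimated 16 × 5 row loading matrix. This is a generic implementation, not the authors' code.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-8):
    """Varimax rotation of a p x k loading matrix Phi.

    Returns the rotated loadings Phi @ R and the orthogonal rotation R.
    """
    p, k = Phi.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        Lam = Phi @ R
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            Phi.T @ (Lam ** 3 - (gamma / p) * Lam @ np.diag((Lam ** 2).sum(axis=0)))
        )
        R = u @ vt
        d_new = s.sum()
        if d_new < d_old * (1 + tol):
            break
        d_old = d_new
    return Phi @ R, R

# Rotated, scaled and rounded loadings in the spirit of Table 12:
# table12 = np.round(30 * varimax(Q1_hat)[0])
```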
The detailed grouping is shown in Table 13. Group 1 consists of the asset-to-equity ratio and the
liability-to-equity ratio, two very closely related measures. Group 2 consists of six
measures of earnings and returns. Group 3 consists of cash and revenue per share, and gross
margin. Group 4 consists of profit growth and revenue growth relative to the previous quarter
and to the same quarter of the previous year; the Payout Ratio variable is also included in this group. Such
groupings are largely as expected.
Based on the 200 rows of the estimated column loading matrix (corresponding to the companies),
after rotation to maximize the variance, the companies are grouped into 6 groups. Table
14 shows the grouping against the industry classification index. The pattern is not as
clear as the clustering of the row loading matrix, but we still make some interesting discoveries.
Industrial companies are mainly clustered in Groups 1 to 3; Health Care companies in Groups
2 and 3; Information Technology companies in Groups 1, 3 and 5; and Materials companies in
Group 3. Looking from the other angle, we find that Group 4 mainly contains Energy companies;
Group 5 mainly contains Consumer Discretionary, Financials and Information Technology
companies; and Group 6 mainly contains Utility companies.
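The paper does not spell out the exact grouping rule; one simple possibility, used in the hypothetical sketch below, is to assign each company to the rotated factor column on which it has the largest absolute loading and then cross-tabulate the labels against the industry classification. Q2_rot and sectors are assumed inputs.

```python
import numpy as np

def industry_crosstab(Q2_rot, sectors, n_groups=6):
    """Cross-tabulate company groups against industry labels.

    Q2_rot  : (p2, k2) rotated column loading matrix (one row per company).
    sectors : length-p2 sequence of industry labels.
    Each company is assigned to the group (among the first n_groups factor
    columns) where its absolute loading is largest.
    """
    sectors = np.asarray(sectors)
    groups = np.argmax(np.abs(Q2_rot[:, :n_groups]), axis=1)
    return {
        sec: np.bincount(groups[sectors == sec], minlength=n_groups)
        for sec in np.unique(sectors)
    }
```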
Figure 8 shows all 100 factor series in the 5 by 20 factor matrix. Interpretation of the
factors is difficult. There is significant redundancy and correlation among the factors, since we
have 100 factors while the time series length is only 40. Clearly the model tends to overfit. This
example is for illustration purposes only, though we do find interesting features.
Table 15 shows a simple comparison between matrix factor models and vectorized factor
models of various sizes and numbers of factors. Since the time series is short, the table shows

Figure 8: Financial series: Plot of the 100 series in the factor matrix

Table 14: Financial series: Matching the companies and the industry (industries: Energy, Utilities, Materials, Financials, Industrials, Health Care, Consumer Staples, Information Technology, Consumer Discretionary, Telecommunications Services)
Model RSS RSS/SST # factors # parameters
Matrix model (4,10) 86,739 0.701 40 2,064
Matrix model (4,20) 74,848 0.610 80 4,064
Matrix model (5,10) 84,517 0.688 50 2,080
Matrix model (5,20) 71,535 0.582 100 4,080
Matrix model (5,30) 65,037 0.530 150 6,080
Vector model 3 79,704 0.650 3 9,600
Vector model 4 73,457 0.598 4 12,800
Vector model 5 68,428 0.557 5 16,000
Vector model 6 63,031 0.514 6 19,200

Table 15: Financial series: Comparison of different models for the company financials series

in-sample residual sums of squares. Again, it is seen that the matrix factor models use far
fewer parameters in the loading matrices to achieve similar estimation performance. The number of
parameters involved is large because we are jointly modeling 3,200 time series.
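The parameter counts in Table 15 follow directly from the sizes of the loading matrices; the small sketch below (a hypothetical helper) makes the comparison explicit.

```python
def loading_parameters(p1, p2, k1=None, k2=None, k=None):
    """Number of entries in the loading matrices.

    Matrix factor model with a k1 x k2 factor matrix: p1*k1 + p2*k2.
    Vectorized factor model with k factors:           (p1*p2)*k.
    """
    if k is None:
        return p1 * k1 + p2 * k2   # matrix factor model
    return p1 * p2 * k             # vectorized factor model

# Financial example with p1 = 16 financials and p2 = 200 companies:
print(loading_parameters(16, 200, k1=5, k2=20))  # 4080, matching Table 15
print(loading_parameters(16, 200, k=6))          # 19200, matching Table 15
```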

7 Summary
In this paper we propose a matrix factor model for high-dimensional matrix-valued time series,
along with an estimation procedure. Theoretical analysis establishes the asymptotic properties of
the estimators. Simulations and real examples are used to illustrate the model and the finite-sample
properties of the estimators. The real examples show the usefulness of the model and its ability
to reveal interesting features of high-dimensional time series. A significant amount of effort is still
needed to investigate model validation and model comparison procedures for the proposed model.
Extensions to multi-term models and approaches to reducing factor redundancy are important
research topics. Extending the model to a dynamic factor model, with an imposed dynamic
structure on the factor matrix, will be useful for prediction and for better understanding the
dynamic nature of matrix-valued time series.

Acknowledgments
We thank the Editors and two anonymous referees for their helpful and insightful comments and
suggestions, which led to significant improvements of the paper in its motivation and justification,
the design of the simulation study, and the analysis of the real examples.

References
Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models.
Econometrica, 70(1):191–221.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econo-
metrics, 31(3):307–327.

Box, G. and Jenkins, G. (1976). Time Series Analysis, Forecasting and Control. Holden Day:
San Francisco.

Brockwell, P. and Davis, R. A. (1991). Time Series: Theory and Methods. Springer.

Chamberlain, G. and Rothschild, M. (1983). Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica, 51(5):1281–1304.

Chang, J., Guo, B., and Yao, Q. (2015). High dimensional stochastic regression with latent
factors, endogeneity and nonlinearity. Journal of Econometrics, 189(2):297–312.

Crainiceanu, C. M., Caffo, B. S., Luo, S., Zipunnikov, V. M., and Punjabi, N. M. (2011). Pop-
ulation Value Decomposition, a Framework for the Analysis of Image Populations. Journal of
the American Statistical Association, 106(495):775–790.

Ding, C. and Ye, J. (2005). 2-Dimensional Singular Value Decomposition for 2D Maps and Images.
In Proc. SIAM Int’l Conf. Data Mining (SDM’05), pages 32–43.

Engle, R. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50:987–1007.

Engle, R. and Kroner, K. (1995). Multivariate simultaneous generalized ARCH. Econometric Theory, 11(1):122–150.

Fama, E. F. and French, K. R. (1993). The cross-section of expected stock returns. Journal of
Finance, 47:427–465.

Fan, J., Liao, Y., and Mincheva, M. (2011). High dimensional covariance matrix estimation in approximate factor models. Annals of Statistics, 39(6):3320–3356.

Fan, J., Liao, Y., and Mincheva, M. (2013). Large covariance estimation by thresholding prin-
cipal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical
Methodology), 75(4):603–680.

Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods.
Springer.

Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000). The generalized dynamic-factor model:
identification and estimation. Review of Economics and Statistics, 82(4):540–554.

Gupta, A. K. and Nagar, D. K. (2000). Matrix Variate Distributions. Chapman & Hall/CRC,
Boca Raton, FL.

Hallin, M. and Liška, R. (2007). Determining the number of factors in the general dynamic factor
model. Journal of the American Statistical Association, 102(478):603–617.

Kollo, T. and von Rosen, D. (2006). Advanced multivariate statistics with matrices, volume 579.
Springer.

Lam, C. and Yao, Q. (2012). Factor modeling for high-dimensional time series: inference for the
number of factors. Annals of Statistics, 40(2):694–726.

Lam, C., Yao, Q., and Bathia, N. (2011). Estimation of latent factors for high-dimensional time
series. Biometrika, 98(4):901–918.

Leng, C. and Tang, C. Y. (2012). Sparse matrix graphical models. Journal of the American
Statistical Association, 107(499):1187–1200.

Liu, X. and Chen, R. (2016). Regime-switching factor models for high-dimensional time series.
Statistica Sinica, 26:1427–1451.

Lütkepohl, H. (2005). New introduction to multiple time series analysis. Springer, Berlin.

Merikoski, J. K. and Kumar, R. (2004). Inequalities for spreads of matrix sums and products.
Applied Mathematics E-Notes, 4:150–159.

Paatero, P. and Tapper, U. (1994). Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2):111–126.

Pan, J. and Yao, Q. (2008). Modelling multiple time series via common factors. Biometrika,
95(2):365–379.

Stock, J. and Watson, M. (2004). An empirical comparison of methods for forecasting using many predictors. Technical Report, Department of Economics, Harvard University.

Tiao, G. and Box, G. (1981). Modelling multiple time series with applications. Journal of the
American Statistical Association, 76(376):802–816.

Tiao, G. and Tsay, R. (1989). Model specification in multivariate time series. Journal of the
Royal Statistical Society: Series B (Statistical Methodology), 51(2):157–213.

Tong, H. (1990). Nonlinear Time Series Analysis: A Dynamical System Approach. London:
Oxford University Press.

Tsay, R. (2005). Analysis of Financial Time Series. New York: Wiley.

Tsay, R. (2014). Multivariate Time Series Analysis. New York: Wiley.

Walden, A. and Serroukh, A. (2002). Wavelet analysis of matrix-valued time series. Proceedings:
Mathematical, Physical and Engineering Sciences, 458(2017):157–179.

Wang, D., Shen, H., and Truong, Y. (2016). Efficient dimension reduction for high-dimensional
matrix-valued data. Neurocomputing, 190:25–34.

Werner, K., Jansson, M., and Stoica, P. (2008). On estimation of covariance matrices with
Kronecker product structure. IEEE Transactions on Signal Processing, 56(2):478–491.

Yang, J., Zhang, D., Frangi, A. F., and Yang, J. (2004). Two-Dimensional PCA: A New Approach
to Appearance-Based Face Representation and Recognition. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 26(1):131–137.

Ye, J. (2005). Generalized Low Rank Approximations of Matrices. Machine Learning, 61(1-
3):167–191.

Yin, J. and Li, H. (2012). Model selection and estimation in the matrix normal graphical model.
Journal of Multivariate Analysis, 107(0):119–140.

Zhang, D. and Zhou, Z. (2005). (2D)2 PCA: Two-directional two-dimensional PCA for efficient
face representation and recognition. Neurocomputing, 69(1):224–231.

Zhao, J. and Leng, C. (2014). Structured lasso for regression with matrix covariates. Statistica
Sinica, 24:799–814.

Zhou, H. and Li, L. (2014). Regularized matrix regression. Journal of the Royal Statistical Society:
Series B (Statistical Methodology), 76(2):463–483.

Zhou, S. (2014). Gemini: Graph estimation with matrix variate normal instances. Annals of
Statistics, 42(2):532–562.

Appendix 1: Proofs
We start by defining some quantities used in the proofs. Write
\[
\begin{aligned}
\Omega_{s,ij}(h) &= \frac{1}{T-h}\sum_{t=1}^{T-h} \mathrm{Cov}\bigl(R F_t c_{i\cdot}',\, R F_{t+h} c_{j\cdot}'\bigr), \\
\Omega_{fc,ij}(h) &= \frac{1}{T-h}\sum_{t=1}^{T-h} \mathrm{Cov}\bigl(F_t c_{i\cdot}',\, F_{t+h} c_{j\cdot}'\bigr), \\
\widehat{\Omega}_{s,ij}(h) &= \frac{1}{T-h}\sum_{t=1}^{T-h} R F_t c_{i\cdot}' c_{j\cdot} F_{t+h}' R', \\
\widehat{\Omega}_{fc,ij}(h) &= \frac{1}{T-h}\sum_{t=1}^{T-h} F_t c_{i\cdot}' c_{j\cdot} F_{t+h}', \\
\widehat{\Omega}_{s\epsilon,ij}(h) &= \frac{1}{T-h}\sum_{t=1}^{T-h} R F_t c_{i\cdot}' \epsilon_{t+h,j}', \\
\widehat{\Omega}_{\epsilon s,ij}(h) &= \frac{1}{T-h}\sum_{t=1}^{T-h} \epsilon_{t,i} c_{j\cdot} F_{t+h}' R', \\
\widehat{\Omega}_{\epsilon,ij}(h) &= \frac{1}{T-h}\sum_{t=1}^{T-h} \epsilon_{t,i} \epsilon_{t+h,j}'.
\end{aligned}
\]

The following lemma establishes the entry-wise convergence rate of the covariance matrix
estimation of the vectorized latent factor process vec(F t ).

Lemma 1. Let $F_{t,ij}$ denote the $ij$-th entry of $F_t$. Under Conditions 1 and 2, for any $i, k = 1, 2, \ldots, k_1$ and $j, l = 1, 2, \ldots, k_2$, it follows that
\[
\frac{1}{T-h}\sum_{t=1}^{T-h} F_{t,ij}F_{t+h,kl} - \mathrm{Cov}(F_{t,ij}, F_{t+h,kl}) = O_p(T^{-1/2}).
\]

Proof. Under Conditions 1 and 2, by Davydov's inequality, it follows that
\[
\begin{aligned}
& E\left(\frac{1}{T-h}\sum_{t=1}^{T-h} F_{t,ij}F_{t+h,kl} - \mathrm{Cov}(F_{t,ij}, F_{t+h,kl})\right)^2 \\
&\quad= \frac{1}{(T-h)^2}\sum_{|t_1-t_2|\le h} E\bigl\{[F_{t_1,ij}F_{t_1+h,kl} - E(F_{t_1,ij}F_{t_1+h,kl})][F_{t_2,ij}F_{t_2+h,kl} - E(F_{t_2,ij}F_{t_2+h,kl})]\bigr\} \\
&\qquad+ \frac{1}{(T-h)^2}\sum_{|t_1-t_2|> h} E\bigl\{[F_{t_1,ij}F_{t_1+h,kl} - E(F_{t_1,ij}F_{t_1+h,kl})][F_{t_2,ij}F_{t_2+h,kl} - E(F_{t_2,ij}F_{t_2+h,kl})]\bigr\} \\
&\quad\le \frac{C}{T-h} + \frac{C}{T-h}\sum_{u=1}^{T-2h-1}\alpha(u)^{1-2/\gamma} = O(T^{-1}).
\end{aligned}
\]
Here $C$ denotes a constant.

Under the matrix-valued factor model (1), $R F_t C'$ can be viewed as the signal part and $E_t$ as the
noise. The following lemma concerns the rates of convergence for estimation of the signal, the
noise, and the interaction between the two.

Lemma 2. Under Conditions 1-4, it holds that
\[
\begin{aligned}
\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{s,ij}(h) - \Omega_{s,ij}(h)\|_2^2 &= O_p(p_1^{2-2\delta_1} p_2^{2-2\delta_2} T^{-1}), \\
\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{s\epsilon,ij}(h)\|_2^2 &= O_p(p_1^{2-\delta_1} p_2^{2-\delta_2} T^{-1}), \\
\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{\epsilon s,ij}(h)\|_2^2 &= O_p(p_1^{2-\delta_1} p_2^{2-\delta_2} T^{-1}), \\
\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{\epsilon,ij}(h)\|_2^2 &= O_p(p_1^2 p_2^2 T^{-1}).
\end{aligned}
\]

Proof. Firstly, we have
\[
\begin{aligned}
\|\widehat{\Omega}_{fc,ij}(h) - \Omega_{fc,ij}(h)\|_2^2
&\le \|\widehat{\Omega}_{fc,ij}(h) - \Omega_{fc,ij}(h)\|_F^2
= \|\mathrm{vec}(\widehat{\Omega}_{fc,ij}(h) - \Omega_{fc,ij}(h))\|_2^2 \\
&= \left\|\frac{1}{T-h}\sum_{t=1}^{T-h} \mathrm{vec}\bigl(F_t c_{i\cdot}' c_{j\cdot} F_{t+h}' - E(F_t c_{i\cdot}' c_{j\cdot} F_{t+h}')\bigr)\right\|_2^2 \\
&= \left\|\frac{1}{T-h}\sum_{t=1}^{T-h} \bigl[F_{t+h} \otimes F_t - E(F_{t+h} \otimes F_t)\bigr]\mathrm{vec}(c_{i\cdot}' c_{j\cdot})\right\|_2^2 \\
&\le \left\|\frac{1}{T-h}\sum_{t=1}^{T-h} \bigl(F_{t+h} \otimes F_t - E(F_{t+h} \otimes F_t)\bigr)\right\|_2^2 \|\mathrm{vec}(c_{i\cdot}' c_{j\cdot})\|_2^2 \\
&\le \left\|\frac{1}{T-h}\sum_{t=1}^{T-h} \bigl(F_{t+h} \otimes F_t - E(F_{t+h} \otimes F_t)\bigr)\right\|_F^2 \|c_{i\cdot}' c_{j\cdot}\|_F^2 \\
&\le \left\|\frac{1}{T-h}\sum_{t=1}^{T-h} \bigl(F_{t+h} \otimes F_t - E(F_{t+h} \otimes F_t)\bigr)\right\|_F^2 \|c_{i\cdot}\|_2^2 \cdot \|c_{j\cdot}\|_2^2.
\end{aligned}
\]
Hence, by Condition 4 and Lemma 1, it follows that
\[
\begin{aligned}
\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{s,ij}(h) - \Omega_{s,ij}(h)\|_2^2
&= \sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|R(\widehat{\Omega}_{fc,ij}(h) - \Omega_{fc,ij}(h))R'\|_2^2 \\
&\le \|R\|_2^4 \left\|\frac{1}{T-h}\sum_{t=1}^{T-h} \bigl(F_{t+h} \otimes F_t - E(F_{t+h} \otimes F_t)\bigr)\right\|_F^2 \left(\sum_{i=1}^{p_2}\|c_{i\cdot}\|_2^2\right)^2 \\
&= \|R\|_2^4 \left\|\frac{1}{T-h}\sum_{t=1}^{T-h} \bigl(F_{t+h} \otimes F_t - E(F_{t+h} \otimes F_t)\bigr)\right\|_F^2 \|C\|_F^4 \\
&\le k_2^2 \|R\|_2^4 \left\|\frac{1}{T-h}\sum_{t=1}^{T-h} \bigl(F_{t+h} \otimes F_t - E(F_{t+h} \otimes F_t)\bigr)\right\|_F^2 \|C\|_2^4
= O_p(p_1^{2-2\delta_1} p_2^{2-2\delta_2} T^{-1}).
\end{aligned}
\]

Similarly, for the interaction component between signal and noise, we have
\[
\begin{aligned}
\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{s\epsilon,ij}(h)\|_2^2
&\le \sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|R\|_2^2 \left\|\frac{1}{T-h}\sum_{t=1}^{T-h} F_t c_{i\cdot}' \epsilon_{t+h,j}'\right\|_2^2 \\
&\le \|R\|_2^2 \sum_{j=1}^{p_2}\left\|\frac{1}{T-h}\sum_{t=1}^{T-h} \epsilon_{t+h,j} \otimes F_t\right\|_2^2 \left(\sum_{i=1}^{p_2}\|c_{i\cdot}\|_2^2\right)
= O_p(p_1^{2-\delta_1} p_2^{2-\delta_2} T^{-1}),
\end{aligned}
\]
and
\[
\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{\epsilon s,ij}(h)\|_2^2 = O_p(p_1^{2-\delta_1} p_2^{2-\delta_2} T^{-1}).
\]
Lastly, for the noise term, we have
\[
\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{\epsilon,ij}(h)\|_2^2
= \sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \left\|\frac{1}{T-h}\sum_{t=1}^{T-h} \epsilon_{t,i}\epsilon_{t+h,j}'\right\|_2^2 = O_p(p_1^2 p_2^2 T^{-1}).
\]

With the four rates established in Lemma 2, we can now study the rate of convergence for the
observed covariance matrix $\widehat{\Omega}_{x,ij}(h)$.

Lemma 3. Under Conditions 1-4, it holds that
\[
\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{x,ij}(h) - \Omega_{x,ij}(h)\|_2^2 = O_p(p_1^2 p_2^2 T^{-1}).
\]

Proof. From the definition of $\widehat{\Omega}_{x,ij}(h)$ in (12), we can decompose $\widehat{\Omega}_{x,ij}(h)$ into four parts as follows,
\[
\begin{aligned}
\widehat{\Omega}_{x,ij}(h) &= \frac{1}{T-h}\sum_{t=1}^{T-h} x_{t,i} x_{t+h,j}'
= \frac{1}{T-h}\sum_{t=1}^{T-h} (R F_t c_{i\cdot}' + \epsilon_{t,i})(R F_{t+h} c_{j\cdot}' + \epsilon_{t+h,j})' \\
&= \widehat{\Omega}_{s,ij}(h) + \widehat{\Omega}_{s\epsilon,ij}(h) + \widehat{\Omega}_{\epsilon s,ij}(h) + \widehat{\Omega}_{\epsilon,ij}(h).
\end{aligned}
\]
Then by Lemma 2, it follows that
\[
\begin{aligned}
\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{x,ij}(h) - \Omega_{x,ij}(h)\|_2^2
&\le 4\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \Bigl(\|\widehat{\Omega}_{s,ij}(h) - \Omega_{s,ij}(h)\|_2^2 + \|\widehat{\Omega}_{s\epsilon,ij}(h)\|_2^2 + \|\widehat{\Omega}_{\epsilon s,ij}(h)\|_2^2 + \|\widehat{\Omega}_{\epsilon,ij}(h)\|_2^2\Bigr) \\
&= O_p(p_1^2 p_2^2 T^{-1}).
\end{aligned}
\]

Lemma 4. Under Conditions 1-4, and $p_1^{\delta_1} p_2^{\delta_2} T^{-1/2} = o(1)$, it holds that
\[
\|\widehat{M}_1 - M_1\|_2 = O_p(p_1^{2-\delta_1} p_2^{2-\delta_2} T^{-1/2}).
\]

Proof. From the definitions of $\widehat{M}_1$ and $M_1$ in (13) and (11), it follows that
\[
\begin{aligned}
\|\widehat{M}_1 - M_1\|_2
&= \left\|\sum_{h=1}^{h_0}\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \bigl(\widehat{\Omega}_{x,ij}(h)\widehat{\Omega}_{x,ij}'(h) - \Omega_{x,ij}(h)\Omega_{x,ij}'(h)\bigr)\right\|_2 \\
&\le \sum_{h=1}^{h_0}\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \Bigl(\|(\widehat{\Omega}_{x,ij}(h) - \Omega_{x,ij}(h))(\widehat{\Omega}_{x,ij}(h) - \Omega_{x,ij}(h))'\|_2 + 2\|\Omega_{x,ij}(h)\|_2\|\widehat{\Omega}_{x,ij}(h) - \Omega_{x,ij}(h)\|_2\Bigr) \\
&\le \sum_{h=1}^{h_0}\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{x,ij}(h) - \Omega_{x,ij}(h)\|_2^2 + 2\sum_{h=1}^{h_0}\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\Omega_{x,ij}(h)\|_2\|\widehat{\Omega}_{x,ij}(h) - \Omega_{x,ij}(h)\|_2.
\end{aligned}
\]
We have
\[
\begin{aligned}
\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\Omega_{x,ij}(h)\|_2^2
&= \sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|R\,\Omega_{fc,ij}(h)\,R'\|_2^2
\le \|R\|_2^4 \sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\Omega_{fc,ij}(h)\|_2^2 \\
&\le \|R\|_2^4 \cdot \left\|\frac{1}{T-h}\sum_{t=1}^{T-h} E(F_{t+h} \otimes F_t)\right\|_2^2 \cdot \left(\sum_{i=1}^{p_2}\|c_{i\cdot}\|_2^2\right)^2
= O(p_1^{2-2\delta_1} p_2^{2-2\delta_2}). \qquad (15)
\end{aligned}
\]
By (15) and Lemma 3,
\[
\begin{aligned}
\left(\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\Omega_{x,ij}(h)\|_2\|\widehat{\Omega}_{x,ij}(h) - \Omega_{x,ij}(h)\|_2\right)^2
&\le \left(\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\Omega_{x,ij}(h)\|_2^2\right)\cdot\left(\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{x,ij}(h) - \Omega_{x,ij}(h)\|_2^2\right) \\
&\le O_p(p_1^{2-2\delta_1} p_2^{2-2\delta_2}\, p_1^2 p_2^2 T^{-1}) = O_p(p_1^{4-2\delta_1} p_2^{4-2\delta_2} T^{-1}), \qquad (16)
\end{aligned}
\]
where the second inequality follows from the Cauchy-Schwarz inequality.
From (16), Lemma 3, and the condition $p_1^{\delta_1} p_2^{\delta_2} T^{-1/2} = o(1)$, it follows that
\[
\|\widehat{M}_1 - M_1\|_2 = O_p(p_1^{2-\delta_1} p_2^{2-\delta_2} T^{-1/2}).
\]

Lemma 5. Under Conditions 2 and 3, we have
\[
\lambda_i(M_1) \asymp p_1^{2-2\delta_1} p_2^{2-2\delta_2}, \qquad i = 1, 2, \ldots, k_1,
\]
where $\lambda_i(M_1)$ denotes the $i$-th largest eigenvalue of $M_1$.

Proof. By definition, we have
\[
\Omega_{fc,ij}(h) = \frac{1}{T-h}\sum_{t=1}^{T-h} E\bigl[(c_{i\cdot} \otimes I_{k_1})\mathrm{vec}(F_t)\mathrm{vec}(F_{t+h})'(c_{j\cdot}' \otimes I_{k_1})\bigr]
= (c_{i\cdot} \otimes I_{k_1})\Sigma_f(h)(c_{j\cdot}' \otimes I_{k_1}).
\]
Under Conditions 2-3 and by properties of the Kronecker product we have
\[
\begin{aligned}
\lambda_{k_1}(M_1)
&= \lambda_{k_1}\left(\sum_{h=1}^{h_0}\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} R\,\Omega_{fc,ij}(h)R'R\,\Omega_{fc,ij}'(h)R'\right) \\
&\ge \|R\|_{\min}^4 \cdot \lambda_{k_1}\left(\sum_{h=1}^{h_0}\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \Omega_{fc,ij}(h)\Omega_{fc,ij}'(h)\right) \\
&= \|R\|_{\min}^4 \cdot \lambda_{k_1}\left(\sum_{h=1}^{h_0}\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} (c_{i\cdot} \otimes I_{k_1})\Sigma_f(h)(c_{j\cdot}' \otimes I_{k_1})(c_{j\cdot} \otimes I_{k_1})\Sigma_f'(h)(c_{i\cdot}' \otimes I_{k_1})\right) \\
&\ge \|R\|_{\min}^4 \cdot \lambda_{k_1}\left(\sum_{h=1}^{h_0}\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} (c_{i\cdot} \otimes I_{k_1})\Sigma_f(h)(c_{j\cdot}'c_{j\cdot} \otimes I_{k_1})\Sigma_f'(h)(c_{i\cdot}' \otimes I_{k_1})\right) \\
&= \|R\|_{\min}^4 \cdot \lambda_{k_1}\left(\sum_{h=1}^{h_0}\sum_{i=1}^{p_2} (c_{i\cdot} \otimes I_{k_1})\Sigma_f(h)(C'C \otimes I_{k_1})\Sigma_f'(h)(c_{i\cdot}' \otimes I_{k_1})\right) \\
&= \|R\|_{\min}^4 \cdot \lambda_{k_1}\left(\sum_{h=1}^{h_0}\sum_{i=1}^{p_2} (c_{i\cdot} \otimes I_{k_1})\Sigma_f(h)(C' \otimes I_{k_1})(C \otimes I_{k_1})\Sigma_f'(h)(c_{i\cdot}' \otimes I_{k_1})\right) \\
&= \|R\|_{\min}^4 \cdot \lambda_{k_1}\left(\sum_{h=1}^{h_0}\sum_{i=1}^{p_2} (C \otimes I_{k_1})\Sigma_f'(h)(c_{i\cdot}' \otimes I_{k_1})(c_{i\cdot} \otimes I_{k_1})\Sigma_f(h)(C' \otimes I_{k_1})\right) \\
&= \|R\|_{\min}^4 \cdot \lambda_{k_1}\left(\sum_{h=1}^{h_0} (C \otimes I_{k_1})\Sigma_f'(h)(C'C \otimes I_{k_1})\Sigma_f(h)(C' \otimes I_{k_1})\right).
\end{aligned}
\]
Since $C'C$ is a $k_2 \times k_2$ symmetric positive definite matrix, we can find a $k_2 \times k_2$ positive definite matrix $U$ such that $C'C = UU'$ and $\|U\|_2^2 \asymp O(p_2^{1-\delta_2}) \asymp \|U\|_{\min}^2$. By the properties of the Kronecker product, we can show that $\sigma_1(U \otimes I_{k_1}) \asymp O(p_2^{1/2-\delta_2/2}) \asymp \sigma_{k_1 k_2}(U \otimes I_{k_1})$. Under Condition 2, using Theorem 9 in Merikoski and Kumar (2004), it follows that $\sigma_{k_1}\bigl(\Sigma_f'(h)(U \otimes I_{k_1})\bigr) \asymp O(p_2^{1/2-\delta_2/2})$. Using Theorem 9 in Merikoski and Kumar (2004) again, we have
\[
\begin{aligned}
\lambda_{k_1}(M_1)
&\ge \|R\|_{\min}^4 \cdot \lambda_{k_1}\left(\sum_{h=1}^{h_0} (C \otimes I_{k_1})\Sigma_f'(h)(U \otimes I_{k_1})(U' \otimes I_{k_1})\Sigma_f(h)(C' \otimes I_{k_1})\right) \\
&= \|R\|_{\min}^4 \cdot \lambda_{k_1}\left(\sum_{h=1}^{h_0} (U' \otimes I_{k_1})\Sigma_f(h)(C'C \otimes I_{k_1})\Sigma_f'(h)(U \otimes I_{k_1})\right) \\
&= \|R\|_{\min}^4 \cdot \lambda_{k_1}\left(\sum_{h=1}^{h_0} (U' \otimes I_{k_1})\Sigma_f(h)(U \otimes I_{k_1})(U' \otimes I_{k_1})\Sigma_f'(h)(U \otimes I_{k_1})\right) \\
&\ge \|R\|_{\min}^4 \cdot \sigma_{k_1}\bigl((U' \otimes I_{k_1})\Sigma_f'(h)(U \otimes I_{k_1})\bigr)^2 = O(p_1^{2-2\delta_1} p_2^{2-2\delta_2}).
\end{aligned}
\]

Proof of Theorem 1

Proof. By Lemmas 1-5, and Lemma 3 in Lam et al. (2011), Theorem 1 follows.

Proof of Theorem 2

Proof. The proof is quite similar to that of Theorem 1 of Lam and Yao (2012). We denote by $\widehat{\lambda}_{1,j}$
and $\widehat{q}_{1,j}$ the $j$-th largest eigenvalue of $\widehat{M}_1$ and its corresponding eigenvector, respectively.
The corresponding population eigenvalues and eigenvectors of the matrix $M_1$ are denoted by $\lambda_{1,j}$ and $q_{1,j}$. Let
$\widehat{Q}_1 = (\widehat{q}_{1,1}, \ldots, \widehat{q}_{1,k_1})$ and $Q_1 = (q_{1,1}, \ldots, q_{1,k_1})$. We have
\[
\lambda_{1,j} = q_{1,j}' M_1 q_{1,j}, \quad \text{and} \quad \widehat{\lambda}_{1,j} = \widehat{q}_{1,j}' \widehat{M}_1 \widehat{q}_{1,j}, \qquad j = 1, \ldots, p_1.
\]

We can decompose $\widehat{\lambda}_{1,j} - \lambda_{1,j}$ by
\[
\widehat{\lambda}_{1,j} - \lambda_{1,j} = \widehat{q}_{1,j}' \widehat{M}_1 \widehat{q}_{1,j} - q_{1,j}' M_1 q_{1,j} = I_1 + I_2 + I_3 + I_4 + I_5,
\]
where
\[
\begin{aligned}
I_1 &= (\widehat{q}_{1,j} - q_{1,j})'(\widehat{M}_1 - M_1)\widehat{q}_{1,j}, &
I_2 &= (\widehat{q}_{1,j} - q_{1,j})' M_1 (\widehat{q}_{1,j} - q_{1,j}), &
I_3 &= (\widehat{q}_{1,j} - q_{1,j})' M_1 q_{1,j}, \\
I_4 &= q_{1,j}'(\widehat{M}_1 - M_1)\widehat{q}_{1,j}, &
I_5 &= q_{1,j}' M_1 (\widehat{q}_{1,j} - q_{1,j}).
\end{aligned}
\]

q 1,j − q 1,j k2 ≤ kQ
For j = 1, . . . , k1 , kb b 1 − Q1 k2 = Op (hT ), where hT = pδ1 pδ2 T −1/2 by The-
1 2
orem 1, and kM 1 k2 = Op (p2−δ 1
1 2−δ2
p 2 ). By Lemma 4, we have kI k
1 2 and kI2 k2 are of order
2−2δ1 2−2δ2 2 2−2δ1 2−2δ2
Op (p1 p2 hT ) and kI3 k2 , kI4 k2 and kI5 k2 are of order Op (p1 p2 hT ). So |λ
b1,j − λ1,j | =
2−2δ1 2−2δ2 2−δ1 2−δ2 −1/2
Op (p1 p2 hT ) = Op (p1 p2 T ).
For $j = k_1+1, \ldots, p_1$, define
\[
\widetilde{M}_1 = \sum_{h=1}^{h_0}\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \widehat{\Omega}_{x,ij}(h)\Omega_{x,ij}'(h), \quad
\widehat{B}_1 = (\widehat{q}_{1,k_1+1}, \ldots, \widehat{q}_{1,p_1}), \quad \text{and} \quad
B_1 = (q_{1,k_1+1}, \ldots, q_{1,p_1}).
\]
It can be shown that $\|\widehat{B}_1 - B_1\|_2 = O_p(h_T)$, similarly to the proof of Theorem 1 with Lemma 3 in Lam et al. (2011). Hence, $\|\widehat{q}_{1,j} - q_{1,j}\|_2 \le \|\widehat{B}_1 - B_1\|_2 = O_p(h_T)$.
Since $\lambda_{1,j} = 0$ for $j = k_1+1, \ldots, p_1$, consider the decomposition
\[
\widehat{\lambda}_{1,j} = \widehat{q}_{1,j}' \widehat{M}_1 \widehat{q}_{1,j} = K_1 + K_2 + K_3,
\]
where
\[
K_1 = \widehat{q}_{1,j}'(\widehat{M}_1 - \widetilde{M}_1 - \widetilde{M}_1' + M_1)\widehat{q}_{1,j}, \quad
K_2 = 2\widehat{q}_{1,j}'(\widetilde{M}_1 - M_1)(\widehat{q}_{1,j} - q_{1,j}), \quad
K_3 = (\widehat{q}_{1,j} - q_{1,j})' M_1 (\widehat{q}_{1,j} - q_{1,j}).
\]

By Lemma 2 and Lemma 4,
\[
\begin{aligned}
K_1 &= \sum_{h=1}^{h_0}\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|(\widehat{\Omega}_{x,ij}(h) - \Omega_{x,ij}(h))'\widehat{q}_{1,j}\|_2^2
\le \sum_{h=1}^{h_0}\sum_{i=1}^{p_2}\sum_{j=1}^{p_2} \|\widehat{\Omega}_{x,ij}(h) - \Omega_{x,ij}(h)\|_2^2 = O_p(p_1^2 p_2^2 T^{-1}), \\
|K_2| &= O_p(\|\widetilde{M}_1 - M_1\|_2 \cdot \|\widehat{q}_{1,j} - q_{1,j}\|_2) = O_p(\|\widetilde{M}_1 - M_1\|_2 \cdot \|\widehat{B}_1 - B_1\|_2) = O_p(p_1^2 p_2^2 T^{-1}), \\
|K_3| &= O_p(\|\widehat{B}_1 - B_1\|_2^2 \cdot \|M_1\|_2) = O_p(p_1^{2-2\delta_1} p_2^{2-2\delta_2} h_T^2) = O_p(p_1^2 p_2^2 T^{-1}).
\end{aligned}
\]
Hence $\widehat{\lambda}_{1,j} = O_p(p_1^2 p_2^2 T^{-1})$.
If we use the transpose of $X_t$ to construct $M_2$, we can obtain the asymptotic properties of
the eigenvalues of the estimated $M_2$ in a similar way.

Proof of Theorem 3

Proof.
\[
\begin{aligned}
\widehat{S}_t - S_t
&= \widehat{Q}_1\widehat{Q}_1'X_t\widehat{Q}_2\widehat{Q}_2' - Q_1 Z_t Q_2'
= \widehat{Q}_1\widehat{Q}_1'(Q_1 Z_t Q_2' + E_t)\widehat{Q}_2\widehat{Q}_2' - Q_1Q_1'Q_1 Z_t Q_2'Q_2Q_2' \\
&= \widehat{Q}_1\widehat{Q}_1'Q_1 Z_t Q_2'(\widehat{Q}_2\widehat{Q}_2' - Q_2Q_2') + (\widehat{Q}_1\widehat{Q}_1' - Q_1Q_1')Q_1 Z_t Q_2' + \widehat{Q}_1\widehat{Q}_1'E_t\widehat{Q}_2\widehat{Q}_2' \\
&= I_1 + I_2 + I_3.
\end{aligned}
\]
By Theorem 1, we have
\[
\begin{aligned}
\|I_1\|_2 &\le 2\|Z_t\|_2\|\widehat{Q}_2 - Q_2\|_2 = O_p(p_1^{1/2-\delta_1/2} p_2^{1/2-\delta_2/2}\|\widehat{Q}_2 - Q_2\|_2) = O_p(p_1^{1/2+\delta_1/2} p_2^{1/2+\delta_2/2} T^{-1/2}), \\
\|I_2\|_2 &\le 2\|\widehat{Q}_1 - Q_1\|_2\|Z_t\|_2 = O_p(p_1^{1/2-\delta_1/2} p_2^{1/2-\delta_2/2}\|\widehat{Q}_1 - Q_1\|_2) = O_p(p_1^{1/2+\delta_1/2} p_2^{1/2+\delta_2/2} T^{-1/2}), \\
\|I_3\|_2 &\le \|\widehat{Q}_1'E_t\widehat{Q}_2\|_2 = \|(\widehat{Q}_2' \otimes \widehat{Q}_1')\mathrm{vec}(E_t)\|_2 \le k_1 k_2 \|\Sigma_e\|_2 = O_p(1).
\end{aligned}
\]
The conclusion follows.

Proof of Theorem 4

Proof. We assume that $Q_1$ is uniquely defined as $Q_1 = (q_{1,1}, q_{1,2}, \ldots, q_{1,k_1})$, where $q_{1,1}, \ldots, q_{1,k_1}$
are the eigenvectors of $M_1$ corresponding to the largest $k_1$ eigenvalues $\lambda_{1,1}, \ldots, \lambda_{1,k_1}$, with $\lambda_{1,1} >
\lambda_{1,2} > \ldots > \lambda_{1,k_1}$. Then, similarly to the proof of Theorem 3 in Liu and Chen (2016), we can obtain
the results.

Appendix 2: Definitions of Financials Used


The following table shows the definition of the company financials used in the analysis. Some
are directly reported by the company in their quarterly reports, and some are derived using the
reported figures.

In calculating profit growth ratio, an NA is recorded when profit changes from negative to
positive or from positive to negative.
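As an illustration of this convention, the hypothetical helper below computes a quarter-over-quarter growth series and inserts an NA whenever the sign changes between consecutive quarters.

```python
import numpy as np

def growth_ratio(x):
    """Growth ratio x_t / x_{t-1} - 1, with NA (np.nan) whenever the sign of x
    changes between the two periods (or the previous value is zero)."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    prev, curr = x[:-1], x[1:]
    ok = (prev != 0) & (np.sign(prev) == np.sign(curr))
    out[1:][ok] = curr[ok] / prev[ok] - 1.0
    return out

# Example: Profit.G.Q for a profit series that turns from a loss to a gain
# growth_ratio([5.0, 6.0, -2.0, 3.0])  ->  [nan, 0.2, nan, nan]
```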

Short Name Variable Name Calculation
Profit.M Profit Margin Net Income/Revenue
Oper.M Operating Margin Operating Income / Revenue
EPS Diluted Earnings per share from report
Gross.Margin Gross Margin Gross Profit / Revenue
ROE Return on equity Net Income / Shareholders Equity
ROA Return on assets Net Income / Total Assets
Revenue.PS Revenue Per Share Revenue / Shares Outstanding
LiabilityE.R Liability/Equity Ratio Total Liabilities / Shareholders Equity
AssetE.R Asset/Equity Ratio Total Assets / Shareholders Equity
Earnings.R Basic Earnings Power Ratio EBIT / Total Assets
Payout.R Payout Ratio Dividend Per Share / EPS Basic
Cash.PS Cash Per Share Cash and other / Shares Outstanding
Revenue.G.Q Revenue Growth over last Quarter Revenue/ Revenue Last Quarter −1
Revenue.G.Y Revenue Growth over same Quarter Last Year Revenue/ Revenue Last Year −1
Profit.G.Q Profit Growth over last Quarter Profit / Profit Last Quarter −1
Profit.G.Y Profit Growth over same Quarter last Year Profit / Profit Last Year −1
