Multivariate Gaussian and Student-t Process Regression for Multi-Output Prediction
Abstract  The Gaussian process model for vector-valued functions has been shown to be a useful method for multi-output prediction. The existing approach to this model is to re-formulate the matrix-variate Gaussian distribution as a multivariate normal distribution. Although effective in many cases, this re-formulation is not always workable and is difficult to extend, because not all matrix-variate distributions can be transformed into corresponding multivariate distributions; the matrix-variate Student-t distribution is one such case. In this paper, we propose a new derivation of multivariate Gaussian process regression (MV-GPR), in which the model settings, derivations, and computations are all performed directly in matrix form, rather than by vectorizing the matrices as in the existing methods. Furthermore, we introduce the multivariate Student-t process and derive a new method, multivariate Student-t process regression (MV-TPR), for multi-output prediction. Both MV-GPR and MV-TPR have closed-form expressions for the marginal likelihoods and predictive distributions. The usefulness of the proposed methods is illustrated through several simulated examples. In particular, we verify empirically that MV-TPR is superior on the datasets considered, including air quality prediction and bike rental prediction. Finally, the proposed methods are shown to produce profitable investment strategies in the stock markets.
1 Introduction
Over the last few decades, Gaussian process regression (GPR) has proven to be a powerful and effective method for non-linear regression problems thanks to many favorable properties, such as a simple way of obtaining and expressing uncertainty in predictions, the capability of capturing a wide variety of behaviour through its parameters, and a natural Bayesian interpretation [21, 4]. In 1996, Neal [20] revealed that many Bayesian regression models based on neural networks converge to Gaussian processes (GP) in the limit of an infinite number of hidden units [26]. GP has been suggested as a replacement for supervised neural networks in non-linear regression [17, 27] and classification [17]. Furthermore, GP has excellent capability for forecasting time series [6, 5].
Despite the popularity of GPR in various modelling tasks, there remains a conspicuous limitation: the majority of GPR models are implemented for a single response variable, or the multiple response variables are modelled independently without considering their correlation [4, 25]. In order to resolve the multi-output prediction problem, Gaussian process regression for vector-valued functions has been proposed and is regarded as a pragmatic and straightforward method. The core of this method is to vectorize the multiple response variables and construct a "big" covariance, which describes the correlations between the inputs as well as between the outputs [4, 25, 7, 2]. This modelling strategy is feasible because a matrix-variate Gaussian distribution can be re-formulated as a multivariate Gaussian distribution [7, 14]. Intrinsically, Gaussian process regression for vector-valued functions is still a conventional Gaussian process regression model, since it merely vectorizes the multiple response variables, which are then assumed to follow a GP with a suitably constructed kernel. As an extension, it is natural to consider more general elliptical process models for multi-output prediction. However, the vectorization approach cannot be used to extend multi-output GPR, because the equivalence between vectorized matrix-variate and multivariate distributions only exists in the Gaussian case [14].
To overcome this drawback, we propose another derivation of dependent Gaussian process regression, named multivariate Gaussian process regression (MV-GPR), where the model settings, derivations, and computations are all performed directly in matrix form, rather than by vectorizing the matrices as in the existing methods. MV-GPR is a more straightforward method and can be implemented in the same way as conventional GPR. In fact, our proposed MV-GPR is not only a new derivation of the Gaussian process for multi-output prediction, but also a simple framework for deriving more general elliptical process models for multiple outputs. Hence, we further introduce the multivariate Student-t process and then derive a new method, multivariate Student-t process regression (MV-TPR), for multi-output prediction. The usefulness of the proposed methods is illustrated through several simulated examples. Furthermore, we verify empirically that MV-TPR is superior on some widely used datasets, including air quality prediction and bike rental prediction. The proposed methods are then applied to stock market modelling and are shown to produce profitable stock investment strategies.
In summary, the main contributions of this paper are as follows. (1) We propose a concise and straightforward derivation of dependent Gaussian process regression, MV-GPR. (2) Based on the framework of the MV-GPR derivation, we obtain a new method for multi-output prediction, MV-TPR. (3) The effectiveness of the proposed MV-GPR and MV-TPR is illustrated through several simulated examples. (4) MV-TPR shows its superiority in the air quality prediction and the bike rental prediction. (5) We apply MV-GPR and MV-TPR to produce profitable investment strategies in the stock markets.
The paper is organized as follows. Section 2 introduces some preliminaries on the matrix-variate Gaussian and Student-t distributions together with their useful properties. In Section 3, the multivariate Gaussian process and the multivariate Student-t process are defined and their regression models are presented. Numerical experiments on simulated data, real datasets, and stock market investment are presented in Section 4. Section 5 concludes the paper.
2 Preliminaries
A random matrix X ∈ R^{n×d} follows a matrix-variate Gaussian distribution with mean matrix M ∈ R^{n×d}, column covariance Σ ∈ R^{n×n} and row covariance Ω ∈ R^{d×d} if its density is

p(X | M, Σ, Ω) = (2π)^{−dn/2} det(Σ)^{−d/2} det(Ω)^{−n/2} etr(−(1/2) Ω^{−1}(X − M)^T Σ^{−1}(X − M)),   (1)

where etr(·) denotes the exponential of the matrix trace and Σ, Ω are positive semi-definite. This is denoted X ∼ MN_{n,d}(M, Σ, Ω) or, without loss of clarity, X ∼ MN(M, Σ, Ω).
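For concreteness, the density (1) can be evaluated numerically as in the following sketch (Python with NumPy/SciPy is assumed; the function name and the use of Cholesky factors, which require Σ and Ω to be strictly positive definite, are our choices rather than the paper's):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def matrix_normal_logpdf(X, M, Sigma, Omega):
    """Log-density of the matrix-variate Gaussian MN_{n,d}(M, Sigma, Omega) in Eq. (1).

    X, M   : (n, d) arrays
    Sigma  : (n, n) column covariance, assumed positive definite
    Omega  : (d, d) row covariance, assumed positive definite
    """
    n, d = X.shape
    E = X - M
    cS, lowS = cho_factor(Sigma, lower=True)
    cO, lowO = cho_factor(Omega, lower=True)
    # log-determinants from the Cholesky factors
    logdet_Sigma = 2.0 * np.sum(np.log(np.diag(cS)))
    logdet_Omega = 2.0 * np.sum(np.log(np.diag(cO)))
    # quadratic term: tr(Omega^{-1} (X-M)^T Sigma^{-1} (X-M))
    quad = np.trace(cho_solve((cO, lowO), E.T) @ cho_solve((cS, lowS), E))
    return (-0.5 * n * d * np.log(2 * np.pi)
            - 0.5 * d * logdet_Sigma
            - 0.5 * n * logdet_Omega
            - 0.5 * quad)
```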
The matrix-variate Gaussian distribution can be re-formulated as a multivariate Gaussian distribution of the vectorized matrix, where vec(·) denotes the vectorization operator and ⊗ the Kronecker product (also called the tensor product).
Let X ∼ MN_{n,d}(M, Σ, Ω) and partition X, M, Σ and Ω into row and column blocks in the same way as in Theorem 7 below, where n_1, n_2, d_1, d_2 are the numbers of rows or columns of the corresponding blocks. Then,

1. X_{1r} ∼ MN_{n_1,d}(M_{1r}, Σ_{11}, Ω),
2. X_{2r} | X_{1r} ∼ MN_{n_2,d}(M_{2r} + Σ_{21} Σ_{11}^{−1}(X_{1r} − M_{1r}), Σ_{22·1}, Ω),

where Σ_{22·1} and Ω_{22·1} are the Schur complements [29] of Σ_{11} and Ω_{11}, respectively,

Σ_{22·1} = Σ_{22} − Σ_{21} Σ_{11}^{−1} Σ_{12},   Ω_{22·1} = Ω_{22} − Ω_{21} Ω_{11}^{−1} Ω_{12}.
A random matrix X ∈ R^{n×d} follows a matrix-variate Student-t distribution with ν degrees of freedom if its density is

p(X | ν, M, Σ, Ω) = [Γ_n(½(ν + d + n − 1)) / (π^{dn/2} Γ_n(½(ν + n − 1)))] det(Σ)^{−d/2} det(Ω)^{−n/2} × det(I_n + Σ^{−1}(X − M) Ω^{−1}(X − M)^T)^{−½(ν + d + n − 1)},   (2)

where Γ_n(·) is the multivariate gamma function; this is denoted X ∼ MT_{n,d}(ν, M, Σ, Ω).
Theorem 6 (Asymptotics) Let X ∼ MT_{n,d}(ν, M, Σ, Ω); then X →^d MN_{n,d}(M, Σ, Ω) as ν → ∞, where "→^d" denotes convergence in distribution.
Theorem 7 (Marginalization and conditional distribution) Let X ∼ MT_{n,d}(ν, M, Σ, Ω), and partition X, M, Σ and Ω as

X = [X_{1r}; X_{2r}] = [X_{1c}, X_{2c}],   M = [M_{1r}; M_{2r}] = [M_{1c}, M_{2c}],

where X_{1r}, M_{1r} have n_1 rows, X_{2r}, M_{2r} have n_2 rows, X_{1c}, M_{1c} have d_1 columns, X_{2c}, M_{2c} have d_2 columns, and Σ and Ω are partitioned into the corresponding 2 × 2 blocks (Σ_{11}, Σ_{12}, Σ_{21}, Σ_{22}) and (Ω_{11}, Ω_{12}, Ω_{21}, Ω_{22}). Then,

1. X_{1r} ∼ MT_{n_1,d}(ν, M_{1r}, Σ_{11}, Ω),
2. X_{2r} | X_{1r} ∼ MT_{n_2,d}(ν + n_1, M_{2r} + Σ_{21} Σ_{11}^{−1}(X_{1r} − M_{1r}), Σ_{22·1}, Ω + (X_{1r} − M_{1r})^T Σ_{11}^{−1}(X_{1r} − M_{1r})),

where Σ_{22·1} and Ω_{22·1} are the Schur complements of Σ_{11} and Ω_{11}, respectively,

Σ_{22·1} = Σ_{22} − Σ_{21} Σ_{11}^{−1} Σ_{12},   Ω_{22·1} = Ω_{22} − Ω_{21} Ω_{11}^{−1} Ω_{12}.
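A small sketch of how the conditional parameters in Theorem 7 can be computed for a row partition is given below (NumPy is assumed; the function name and the assumption that the blocks are ordered with the observed rows first are ours):

```python
import numpy as np

def mvt_row_conditional(nu, M, Sigma, Omega, X1r, n1):
    """Parameters of X2r | X1r for X ~ MT_{n,d}(nu, M, Sigma, Omega) (Theorem 7).

    M: (n, d) mean; Sigma: (n, n); Omega: (d, d); X1r: (n1, d) observed row block.
    Returns (nu_cond, M_cond, Sigma_cond, Omega_cond).
    """
    S11, S12 = Sigma[:n1, :n1], Sigma[:n1, n1:]
    S21, S22 = Sigma[n1:, :n1], Sigma[n1:, n1:]
    M1r, M2r = M[:n1, :], M[n1:, :]
    R = X1r - M1r
    S11_inv_R = np.linalg.solve(S11, R)                   # Sigma_11^{-1} (X1r - M1r)
    nu_cond = nu + n1
    M_cond = M2r + S21 @ S11_inv_R
    Sigma_cond = S22 - S21 @ np.linalg.solve(S11, S12)    # Schur complement Sigma_{22.1}
    Omega_cond = Omega + R.T @ S11_inv_R                  # Omega + R^T Sigma_11^{-1} R
    return nu_cond, M_cond, Sigma_cond, Omega_cond
```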
As with the derivation of GPR, we need to define the multivariate Gaussian process before deriving MV-GPR. Following the definition of a GP, a multivariate Gaussian process should be a collection of random vector-valued variables, any finite number of which have a matrix-variate Gaussian distribution. We therefore define a multivariate Gaussian process as follows.
Definition 3 (MV-GP) f is a multivariate Gaussian process on X with vector-valued mean function u : X → R^d, covariance function (also called kernel) k : X × X → R and positive semi-definite parameter matrix Ω ∈ R^{d×d} if any finite collection of vector-valued variables has a joint matrix-variate Gaussian distribution,

[f(x_1)^T, . . . , f(x_n)^T]^T ∼ MN(M, Σ, Ω),  n ∈ N,

where f, u ∈ R^d are row vectors whose components are the functions {f_i}_{i=1}^d and {µ_i}_{i=1}^d respectively. Furthermore, M ∈ R^{n×d} with M_{ij} = µ_j(x_i), and Σ ∈ R^{n×n} with Σ_{ij} = k(x_i, x_j). Σ is sometimes called the column covariance matrix while Ω is the row covariance matrix. We denote f ∼ MGP(u, k, Ω).
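By Definition 3, a draw of f at finitely many inputs is a matrix-variate Gaussian sample. A sketch using the standard construction X = M + A Z B^T, with Σ = A A^T and Ω = B B^T, is shown below (NumPy assumed; the function name and the jitter term added for numerical stability are our choices):

```python
import numpy as np

def sample_mv_gp(X_points, mean_fn, kernel_fn, Omega, rng=None, jitter=1e-10):
    """Draw one sample of f at finite locations per Definition 3 (MV-GP).

    X_points : (n, p) inputs; mean_fn maps an input to a length-d row vector;
    kernel_fn(xi, xj) is the scalar column covariance k; Omega is (d, d).
    """
    rng = np.random.default_rng(rng)
    n, d = len(X_points), Omega.shape[0]
    M = np.vstack([mean_fn(x) for x in X_points])                        # (n, d) mean
    Sigma = np.array([[kernel_fn(xi, xj) for xj in X_points] for xi in X_points])
    A = np.linalg.cholesky(Sigma + jitter * np.eye(n))
    B = np.linalg.cholesky(Omega)
    Z = rng.standard_normal((n, d))
    return M + A @ Z @ B.T                                               # (n, d) sample
```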
For the regression model, we use a zero mean function and incorporate the noise variance σ_n² into the kernel, k′(x_i, x_j) = k(x_i, x_j) + σ_n² δ_{ij}, so that

[f(x_1)^T, . . . , f(x_n)^T]^T ∼ MN(0, K′, Ω),

where K′ is the n × n covariance matrix whose (i, j)-th element is [K′]_{ij} = k′(x_i, x_j).

To predict a new variable f_* = [f_{*1}, . . . , f_{*m}]^T at the test locations X_* = [x_{n+1}, . . . , x_{n+m}]^T, the joint distribution of the training observations Y = [y_1^T, · · · , y_n^T]^T and the predictive targets f_* is given by

[Y; f_*] ∼ MN(0, [K′(X, X), K′(X_*, X); K′(X_*, X)^T, K′(X_*, X_*)], Ω),   (3)

where K′(X_*, X) denotes the n × m matrix of covariances k′ between all pairs of training and test inputs. Consequently, by the conditional distribution of the matrix-variate Gaussian, the predictive distribution is

f_* | X, Y, X_* ∼ MN(M̂, Σ̂, Ω̂),   (4)

where

M̂ = K′(X_*, X)^T K′(X, X)^{−1} Y,   (5)
Σ̂ = K′(X_*, X_*) − K′(X_*, X)^T K′(X, X)^{−1} K′(X_*, X),   (6)
Ω̂ = Ω.   (7)
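Equations (5)-(7) can be computed with a single Cholesky factorization of K′(X, X), as in the following sketch (NumPy/SciPy assumed; the function name, argument names, and the n × m shape convention for K′(X_*, X) are our choices):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def mv_gpr_predict(Y, K_train, K_star, K_test, Omega):
    """Predictive parameters of MV-GPR, Eqs. (5)-(7).

    Y       : (n, d) training outputs
    K_train : (n, n) K'(X, X), kernel matrix with noise variance on the diagonal
    K_star  : (n, m) K'(X_*, X), cross-covariances between training and test inputs
    K_test  : (m, m) K'(X_*, X_*)
    Omega   : (d, d) row (output) covariance
    """
    c, low = cho_factor(K_train, lower=True)
    M_hat = K_star.T @ cho_solve((c, low), Y)                     # Eq. (5)
    Sigma_hat = K_test - K_star.T @ cho_solve((c, low), K_star)   # Eq. (6)
    Omega_hat = Omega                                             # Eq. (7)
    return M_hat, Sigma_hat, Omega_hat
```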
3.1.1 Kernel
Although there are two covariance matrices in MV-GPR, the column covariance and the row covariance, only the column covariance, which depends on the inputs, is treated as the kernel, since it encodes our assumptions about the function we wish to learn and defines the closeness and similarity between data points [22]. Of course, the choice of kernel has a profound impact on the performance of a multivariate Gaussian process, as well as of a multivariate Student-t process. The most widely used kernel is the Squared Exponential (SE), defined as

k_SE(x, x′) = s_f² exp(− ‖x − x′‖² / (2ℓ²)),

where s_f² is the signal variance (which can also be regarded as an output-scale amplitude) and ℓ is the input (length or time) scale [23]. The kernel can also be defined with Automatic Relevance Determination (ARD),

k_SEard(x, x′) = s_f² exp(− (x − x′)^T Θ^{−1}(x − x′) / 2),

where Θ is a diagonal matrix with elements {ℓ_i²}_{i=1}^p, the length scales for each corresponding input dimension.
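The two kernels can be written compactly as follows (a sketch in Python; the function names and argument conventions are ours):

```python
import numpy as np

def k_se(x, x2, s2_f, ell):
    """Squared Exponential kernel: k_SE(x, x') = s_f^2 exp(-||x - x'||^2 / (2 l^2))."""
    r2 = np.sum((np.asarray(x) - np.asarray(x2)) ** 2)
    return s2_f * np.exp(-0.5 * r2 / ell ** 2)

def k_se_ard(x, x2, s2_f, ells):
    """SEard kernel with one length scale per input dimension (Theta = diag(ells^2))."""
    diff = np.asarray(x) - np.asarray(x2)
    return s2_f * np.exp(-0.5 * np.sum((diff / np.asarray(ells)) ** 2))
```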
L = (nd/2) ln(2π) + (d/2) ln det(K′) + (n/2) ln det(Ω) + (1/2) tr((K′)^{−1} Y Ω^{−1} Y^T).   (10)

The derivatives of the negative log marginal likelihood with respect to the parameters σ_n², θ_i, φ_ij and ϕ_ii are as follows:

∂L/∂σ_n² = (d/2) tr((K′)^{−1}) − (1/2) tr(α_{K′} Ω^{−1} α_{K′}^T),
∂L/∂θ_i = (d/2) tr((K′)^{−1} ∂K′_θ/∂θ_i) − (1/2) tr(α_{K′} Ω^{−1} α_{K′}^T ∂K′_θ/∂θ_i),
∂L/∂φ_ij = (n/2) tr[Ω^{−1}(E_{ij} Φ^T + Φ E_{ij})] − (1/2) tr[α_Ω (K′)^{−1} α_Ω^T (E_{ij} Φ^T + Φ E_{ij})],
∂L/∂ϕ_ii = (n/2) tr[Ω^{−1}(J_{ii} Φ^T + Φ J_{ii})] − (1/2) tr[α_Ω (K′)^{−1} α_Ω^T (J_{ii} Φ^T + Φ J_{ii})],

where α_{K′} = (K′)^{−1} Y, α_Ω = Ω^{−1} Y^T, E_{ij} is the d × d elementary matrix having unity in the (i, j)-th element and zeros elsewhere, and J_{ii} is the same as E_{ij} but with the unity replaced by e^{ϕ_ii}. The details can be found in Appendix A.2.
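A sketch of evaluating the negative log marginal likelihood (10) is given below (NumPy/SciPy assumed; a zero mean function and strictly positive definite K′ and Ω are assumed, and the function name is ours). Together with the gradients above, it can be supplied to any gradient-based optimiser for hyperparameter estimation.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def mv_gpr_nll(Y, K_prime, Omega):
    """Negative log marginal likelihood of MV-GPR, Eq. (10), for a zero mean function.

    Y       : (n, d) observations
    K_prime : (n, n) K' = K_theta + sigma_n^2 I
    Omega   : (d, d) row covariance
    """
    n, d = Y.shape
    cK, lowK = cho_factor(K_prime, lower=True)
    cO, lowO = cho_factor(Omega, lower=True)
    logdet_K = 2.0 * np.sum(np.log(np.diag(cK)))
    logdet_O = 2.0 * np.sum(np.log(np.diag(cO)))
    # tr((K')^{-1} Y Omega^{-1} Y^T) = tr((Omega^{-1} Y^T)((K')^{-1} Y))
    quad = np.trace(cho_solve((cO, lowO), Y.T) @ cho_solve((cK, lowK), Y))
    return (0.5 * n * d * np.log(2 * np.pi)
            + 0.5 * d * logdet_K
            + 0.5 * n * logdet_O
            + 0.5 * quad)
```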
3.2 Comparison
In fact, comparing the results of the existing methods [25, 2] with our proposed method (Equations (8) and (9)), the predictive distributions are the same if we consider the noise-free regression model. However, our proposed MV-GPR has a more straightforward formulation, in which the model settings, derivations, and computations are all performed directly in matrix form, rather than by vectorizing the matrices.

Furthermore, it is not difficult to see that Theorem 2 (the vectorizable property) plays an essential role in the derivation of the existing multi-output Gaussian process methods. In other words, the classical derivation relies on the equivalence between the vectorized matrix-variate Gaussian distribution and the multivariate Gaussian distribution. Unfortunately, the book [14] presents a simple example showing that no analogous equivalence exists between the vectorized matrix-variate Student-t distribution and the multivariate Student-t distribution. The structure of the existing derivation therefore cannot be applied to the Student-t version. The derivation of our proposed MV-GPR, however, is based on calculations in matrix form and does not rely on the equivalence in Theorem 2. As a consequence, the structure of our MV-GPR can be applied naturally to derive MV-TPR, and it thus becomes easy to obtain a dependent Student-t process regression model for multi-output prediction by defining a multivariate Student-t process.

To conclude, our proposed MV-GPR provides not only a new derivation of the Gaussian process for multi-output prediction, but also a simple framework to derive more general elliptical process models for multiple outputs, namely MV-TPR.
[f(x_1)^T, . . . , f(x_n)^T]^T ∼ MT(ν, M, Σ, Ω),  n ∈ N,

where f, u ∈ R^d are row vectors whose components are the functions {f_i}_{i=1}^d and {µ_i}_{i=1}^d respectively. Furthermore, M ∈ R^{n×d} with M_{ij} = µ_j(x_i), and Σ ∈ R^{n×n} with Σ_{ij} = k(x_i, x_j). We denote f ∼ MTP(ν, u, k, Ω).
With little modification, the MV-TPR model can be formulated along the same lines as MV-GPR based on the definition of the multivariate Student-t process, and it is briefly presented below.

Given n pairs of observations {(x_i, y_i)}_{i=1}^n, x_i ∈ R^p, y_i ∈ R^d, we assume

[y_1^T, . . . , y_n^T]^T ∼ MT(ν, 0, K′, Ω),

where ν is the degree of freedom of the Student-t process and the remaining parameters have the same meaning as in the MV-GPR model. Consequently, the predictive distribution is obtained as

f_* | X, Y, X_* ∼ MT(ν̂, M̂, Σ̂, Ω̂),

where

ν̂ = ν + n,   (12)
M̂ = K′(X_*, X)^T K′(X, X)^{−1} Y,   (13)
Σ̂ = K′(X_*, X_*) − K′(X_*, X)^T K′(X, X)^{−1} K′(X_*, X),   (14)
Ω̂ = Ω + Y^T K′(X, X)^{−1} Y.   (15)
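A sketch of computing the MV-TPR predictive parameters (12)-(15) is shown below (NumPy/SciPy assumed; names and shape conventions follow the MV-GPR prediction sketch above):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def mv_tpr_predict(nu, Y, K_train, K_star, K_test, Omega):
    """Predictive parameters of MV-TPR, Eqs. (12)-(15).

    nu      : degrees of freedom
    Y       : (n, d) training outputs
    K_train : (n, n) K'(X, X);  K_star : (n, m) K'(X_*, X);  K_test : (m, m) K'(X_*, X_*)
    Omega   : (d, d) row covariance
    """
    n = Y.shape[0]
    c, low = cho_factor(K_train, lower=True)
    nu_hat = nu + n                                               # Eq. (12)
    M_hat = K_star.T @ cho_solve((c, low), Y)                     # Eq. (13)
    Sigma_hat = K_test - K_star.T @ cho_solve((c, low), K_star)   # Eq. (14)
    Omega_hat = Omega + Y.T @ cho_solve((c, low), Y)              # Eq. (15)
    return nu_hat, M_hat, Sigma_hat, Omega_hat
```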
The derivatives of the negative log marginal likelihood with respect to ν, σ_n², θ_i, φ_ij and ϕ_ii are as follows, where τ = ν + n − 1, U = K′ + Y Ω^{−1} Y^T, α_Ω = Ω^{−1} Y^T and ψ_n(·) denotes the multivariate digamma function:

∂L/∂ν = (1/2) ln det(U) − (1/2) ln det(K′) + (1/2) ψ_n(τ/2) − (1/2) ψ_n((τ + d)/2),
∂L/∂σ_n² = ((τ + d)/2) tr(U^{−1}) − (τ/2) tr((K′)^{−1}),
∂L/∂θ_i = ((τ + d)/2) tr(U^{−1} ∂K′_θ/∂θ_i) − (τ/2) tr((K′)^{−1} ∂K′_θ/∂θ_i),
∂L/∂φ_ij = −((τ + d)/2) tr[U^{−1} α_Ω^T (E_{ij} Φ^T + Φ E_{ij}) α_Ω] + (n/2) tr[Ω^{−1}(E_{ij} Φ^T + Φ E_{ij})],
∂L/∂ϕ_ii = −((τ + d)/2) tr[U^{−1} α_Ω^T (J_{ii} Φ^T + Φ J_{ii}) α_Ω] + (n/2) tr[Ω^{−1}(J_{ii} Φ^T + Φ J_{ii})].
4 Experiments
We first consider simulated data from two specific functions. The true model used to generate the data is

y = [f_1(x), f_2(x)] + [ε^(1), ε^(2)],
f_1(x) = 2x · cos(x),   f_2(x) = 1.5x · cos(x + π/5),

where the vector noise is a sample from a multivariate Gaussian process, [ε^(1), ε^(2)] ∼ MGP(0, k_SE, Ω). We select k_SE with parameters [ℓ, s_f²] = [ln(1.001), ln(5)] and Ω = [1, 0.25; 0.25, 1]. The covariate x takes 100 equally spaced values in [−10, 10], giving a sample of 100 observations for y_1 and y_2.
For model training, we use fewer points with one part missing: the z-th data points with z = {3r + 1}_{r=1}^{12} ∪ {3r + 2}_{r=22}^{32} are selected for both y_1 and y_2. The prediction is performed at all 100 covariate values equally spaced in [−10, 10]. The RMSEs between the predicted values and the true values of f_1(x) and f_2(x) are calculated. At the same time, the conventional GPR and TPR models are fitted to the two outputs independently and used as a comparison with the proposed models. The process above is repeated 1000 times; the results are reported in Table 1 and an example of the predictions is given in Figure 1. The ARMSE (Average Root Mean Square Error) for the 100-point predictions repeated 1000 times is defined by

ARMSE = (1/1000) Σ_{i=1}^{1000} ( (1/100) Σ_{j=1}^{100} (ŷ_ij − y_ij)² )^{1/2},

where y_ij is the j-th observation in the i-th experiment and ŷ_ij is the j-th prediction in the i-th experiment.
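In code, the ARMSE is simply the mean over experiments of the per-experiment RMSE (a sketch; the array shapes are our assumption):

```python
import numpy as np

def armse(y_true, y_pred):
    """ARMSE over repeated experiments, as defined above.

    y_true, y_pred : arrays of shape (n_experiments, n_points), e.g. (1000, 100).
    """
    rmse_per_experiment = np.sqrt(np.mean((y_pred - y_true) ** 2, axis=1))
    return np.mean(rmse_per_experiment)
```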
Table 1 The ARMSE by the different models (multivariate Gaussian noisy data)
The same experiment is also conducted for the case where the vector noise is a sample from a multivariate Student-t process, [ε^(1), ε^(2)] ∼ MTP(3, 0, k_SE, Ω), with k_SE parameters [ℓ, s_f²] = [ln(1.001), ln(5)] and Ω = [1, 0.25; 0.25, 1]. The resulting ARMSEs are presented in Table 2 and an example of the predictions is shown in Figure 2.

From the tables and figures, it can be seen that the multivariate process regression models are able to recover a more desirable pattern in the gap than the conventional GPR and TPR models used independently. This also shows that taking the correlation between the two outputs into consideration improves the prediction accuracy compared with modelling each output independently.
Fig. 1 Predictions for MV-GP noisy data using different models. From panels (a) to (d): predictions for y_1 by MV-GPR, MV-TPR, GPR and TPR. From panels (e) to (h): predictions for y_2 by MV-GPR, MV-TPR, GPR and TPR. The solid blue lines are predictions, the solid red lines are the true functions and the circles are the observations. The dashed lines represent the 95% confidence intervals.
Fig. 2 Predictions for MV-TP noisy data using different models. From panels (a) to (d): predictions for y_1 by MV-GPR, MV-TPR, GPR and TPR. From panels (e) to (h): predictions for y_2 by MV-GPR, MV-TPR, GPR and TPR. The solid blue lines are predictions, the solid red lines are the true functions and the circles are the observations. The dashed lines represent the 95% confidence intervals.
Table 2 The ARMSE by the different models (multivariate Student-t noisy data)

In particular, MV-TPR performs better than MV-GPR for both types of noisy data. This may be explained by the fact that MV-TPR has greater modelling flexibility, with one more parameter that can capture the degrees of freedom of the data, while still taking the correlation between the two responses into account. The precise reason remains to be studied in future work.

It is notable that the predictive variance of MV-GPR is much smaller than that of the independent GPR model. This is likely caused by the loss of information in the independent model. As discussed in [15], the prediction uncertainty of GPR is useful in building the prediction model by ensemble learning.
We further test our proposed methods on two real datasets¹. The selected mean function is zero-offset and the selected kernel is SEard. Before the experiments, all the data have been normalized by

ỹ_i = (y_i − µ) / σ,

where µ and σ are the sample mean and standard deviation of the data {y_i}_{i=1}^n respectively.
The dataset contains 9358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device, with 15 attributes [10]. We delete all the points with missing attributes (887 points remain). The first 864 points are considered in our experiment because the data are observed hourly (1 day = 24 hours), and the whole dataset is divided into 9 subsets (each subset contains 4 days of data, 864 data points in total). In the experiment, the input comprises 9 attributes: time, true hourly averaged CO concentration in mg/m³ (COGT), true hourly averaged overall Non-Methanic HydroCarbons concentration in µg/m³ (NMHCGT), true hourly averaged Benzene concentration in µg/m³ (C6H6GT), true hourly averaged NOx concentration in ppb (NOx), true hourly averaged NO2 concentration in µg/m³ (NO2), absolute humidity (AH), temperature (T) and relative humidity (RH). The output consists of 5 attributes: the hourly averaged sensor responses of PT08.S1 (tin oxide), PT08.S2 (titania), PT08.S3 (tungsten oxide), PT08.S4 (tungsten oxide) and PT08.S5 (indium oxide).
¹ These datasets are from the UC Irvine Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php
(b) MAE

Outputs                             MV-GPR        MV-TPR        GPR           TPR
(Median of 9      PT08S1CO          0.240         0.204         0.212         0.223
subsets' MAEs)    PT08S2NMHC        6.39 × 10⁻³   1.15 × 10⁻²   1.80 × 10⁻⁴   9.26 × 10⁻⁵
                  PT08S3NOx         0.141         0.122         0.115         0.120
                  PT08S4NO2         0.095         0.089         0.079         0.073
                  PT08S5O3          0.231         0.210         0.199         0.205
MMO                                 0.240         0.210         0.212         0.223
This dataset contains the hourly and daily counts of rental bikes between 2011 and 2012 in the Capital Bikeshare system, with the corresponding weather and seasonal information [12]. There are 16 attributes. We test our proposed methods for multi-output prediction on the daily count dataset. After deleting all the points with missing attributes, we use the first 168 data points in autumn, because the data are observed daily (1 week = 7 days), and the whole dataset is divided into 8 subsets (each subset contains 3 weeks of data). In the experiment, the input comprises 8 attributes: normalized temperature, normalized feeling temperature, normalized humidity, normalized wind speed, whether the day is a holiday, day of the week, whether the day is a working day, and the weather situation (weathersit). The output consists of 2 attributes: the count of casual users (Casual) and the count of registered users (Registered).
(b) MAE

                                    MV-GPR   MV-TPR   GPR     TPR
Outputs (median      Casual         0.558    0.488    0.540   0.546
of 8 subsets' MAEs)  Registered     0.897    0.855    0.916   0.907
MMO                                 0.897    0.855    0.916   0.907
The results are shown in Table 4, and we can again verify empirically that MV-TPR performs best in terms of MSE and MAE.
In the previous subsections, the examples show the usefulness of our proposed methods in terms of more accurate prediction. Furthermore, our proposed methods can be applied to produce trading strategies for stock market investment. It is known that accurate prediction of the future of an equity market is almost impossible. A more realistic idea is to build a strategy based on the Buy&Sell signals produced by the different prediction models [1]. In this paper, we consider a developed Dollar 100 (dD100) criterion to evaluate the prediction models. The dD100 criterion reflects the theoretical future value of $100 invested at the beginning and traded according to the signals constructed from the predicted and realized values. The details of the dD100 criterion are described in Section 4.3.2.

Furthermore, the equity index is an important measure of the value of a stock market and is used by many investors making trades and by scholars studying stock markets. An index is computed from the weighted average of the selected stocks' prices, so it describes how the whole stock market under consideration performs over a period, and many trading strategies for a stock or a portfolio therefore take the information contained in the index into account. As a result, our experimental predictions for specific stocks are also based on the indices.
We obtain daily price data, containing the opening, closing, and adjusted closing prices, for the stocks (the details are given in Section 4.3.3 and Section 4.3.4) and three main US indices, the Dow Jones Industrial Average (INDU), S&P 500 (SPX), and NASDAQ (NDX), from Yahoo Finance for the period 2013 - 2014. The log returns of the adjusted closing price and the inter-day log returns are defined by

Log return: LR_i = ln(ACP_i / ACP_{i−1}),
Inter-day log return: ILR_i = ln(CP_i / OP_i),

where ACP_i is the adjusted closing price of the i-th day (i > 1), CP_i is the closing price of the i-th day, and OP_i is the opening price of the i-th day. Therefore, there are in total 503 daily log returns and inter-day log returns for all the stocks and indices from 2013 to 2014.
The sliding-window method is used for our prediction models, namely GPR, TPR, MV-GPR, and MV-TPR, based on the indices INDU, SPX, and NDX. The training sample size is set to 303 and is used to forecast the next 10 days; the training set is then updated by dropping the earliest 10 days and adding the latest 10 days when the window is moved. The sliding-forward process is run 20 times, resulting in a total of 200 prediction days in groups of 10. The updated training set allows all the models and parameters to adapt to the dynamic structure of the equity market [1]. Specifically, the inputs consist of the log returns of the 3 indices, the targets are the log returns of multiple stocks, and the Squared Exponential kernel with automatic relevance determination (SEard) is used for all of these prediction models.
It is noteworthy that the predicted log returns of the stocks are used to produce buy or sell signals for trading rather than to recover an exact pattern of the future. The signal BS produced by the predicted log returns of a stock is defined by

BS_i = LR̂_i − LR_i + ILR_i,

where {LR̂_i}_{i=1}^{200} are the predicted log returns of a specific stock, {LR_i}_{i=1}^{200} are the true log returns, and {ILR_i}_{i=1}^{200} are the inter-day log returns. The Buy&Sell strategy relying on the signal BS is described in Table 5.
Decision   Condition
Buy        LR̂_i > 0 and BS_i > 0, and we hold the position in cash
Sell       LR̂_i < 0 and BS_i < 0, and we hold the position in shares
Keep       No action is taken in all other cases
Note that the stock positions in our experiment are measured in Dollars rather than in numbers of shares, which means that in theory we can buy or sell an exact Dollar value of a stock. For example, if the stock price is $37 and we only have $20, we can still buy $20 worth of the stock rather than borrowing $17 to buy one share.
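The following sketch illustrates the Buy&Sell rule of Table 5 together with the dD100 bookkeeping in a simplified form; execution at adjusted opening prices, marking an open position at the final close, and zero transaction costs are simplifying assumptions of ours, not the paper's exact accounting.

```python
import numpy as np

def dollar100_strategy(lr_pred, lr_true, ilr, adj_open, adj_close, start=100.0):
    """Simplified dD100 simulation of the Buy&Sell rule in Table 5.

    lr_pred, lr_true, ilr : predicted log returns, true log returns, inter-day log returns
    adj_open, adj_close   : adjusted opening and closing prices over the same days
    """
    bs = lr_pred - lr_true + ilr          # BS_i = LR^_i - LR_i + ILR_i = ln(ACP^_i / AOP_i)
    cash, shares = start, 0.0
    for i in range(len(bs)):
        if lr_pred[i] > 0 and bs[i] > 0 and cash > 0:        # Buy: move cash into stock
            shares = cash / adj_open[i]
            cash = 0.0
        elif lr_pred[i] < 0 and bs[i] < 0 and shares > 0:    # Sell: move stock into cash
            cash = shares * adj_open[i]
            shares = 0.0
        # otherwise: keep the current position
    return cash + shares * adj_close[-1]   # mark any open position at the last close
```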
Furthermore, it is also necessary to explain why we choose the signal BS. By its definition, it can be rewritten as

BS_i = ln(ACP̂_i / ACP_{i−1}) − ln(ACP_i / ACP_{i−1}) + ln(CP_i / OP_i)
     = ln(ACP̂_i / ACP_{i−1}) − ln(ACP_i / ACP_{i−1}) + ln(ACP_i / AOP_i)
     = ln(ACP̂_i / AOP_i),

where AOP_i is the adjusted opening price of the i-th day². A positive BS_i therefore indicates that the predicted adjusted closing price is higher than the adjusted opening price, and vice versa.
In recent years, "Chinese concept stocks" have received extensive attention among international investors owing to the fast development of the Chinese economy, and an increasing number of Chinese firms have been traded in international stock markets [16]. A "Chinese concept stock" is a stock issued by a firm whose assets or earnings have essential activities in Mainland China. Undoubtedly, all these stocks are heavily influenced by the political and economic environment of China. For this reason, they theoretically have a potential and non-negligible correlation, which is likely reflected in the movement of their stock prices, and a multiple-target prediction that takes this potential relationship into consideration should perform better. Therefore, the first real-data example is based on the 3 biggest Chinese companies, described in Table 6.
We apply the MV-GPR, MV-TPR, GPR and TPR strategies, and the results are shown in Figure 3. Furthermore, Table B1, Table B2 and Table B3 in the appendix summarize the results by period for each stock. In particular, the Buy&Sell signal examples for each stock are shown in Table B4, Table B5 and Table B6 respectively, along with other relevant details.

² Actually, this value has to be regarded as the adjusted opening price, since all the shares are counted in Dollars. The adjusted opening price is easy to compute from the real opening price and the dividend information.
³ The figure of 0.025% is a comprehensive estimate based on the NASDAQ website: http://nasdaq.cchwallstreet.com/
Fig. 3 The movement of an invested $100 over 200 days for 3 Chinese stocks in the US market. The top 4 lines in the legend are Buy&Sell strategies based on the 4 prediction models, MV-GPR, MV-TPR, GPR, and TPR, respectively. The last 4 lines are Buy&Hold strategies for the stock and for the three indices, INDU, NDX, and SPX, respectively.
From Figure 3, there is no doubt that a $100 investment in each stock increased sharply over the 200-day period using the Buy&Sell strategies, no matter whether the stock went up or down during this period. In particular, the stock prices of BIDU and NTES rose gradually, while CTRP hit a peak and then decreased on a large scale. Nevertheless, the Buy&Sell strategies based on the different prediction models still achieved considerable profits compared with the Buy&Hold strategies for the corresponding investments. However, the different prediction models perform differently for each stock. For BIDU, the GPR-based models, including MV-GPR and GPR, outperform the TPR-based models, including MV-TPR and TPR. For NTES, all the models for the Buy&Sell strategy have similar performance.
Owing to the globalization of capital, there has been a significant shift in the relative importance of national and economic influences in the world's largest equity markets, and the impact of industrial sector effects is now gradually replacing that of country effects in these markets [3]. Therefore, a further example is carried out across the diverse industrial sectors in the Dow 30 from the New York Stock Exchange (NYSE) and NASDAQ.

Initially, the stocks in the Dow 30 have to be classified by industrial sector. There are two main industry classification taxonomies, the Industry Classification Benchmark (ICB) and the Global Industry Classification Standard (GICS)⁴. In our research, ICB is used to segregate markets into sectors within the macro-economy. The stocks in the Dow 30 are classified in Table 7. Because the multivariate process models require at least two related stocks in one group, the first (Basic Materials) and the last industrial sector (Telecommunications), each consisting of only one stock, are excluded. As a result, our experiments are performed seven times, once for each of the seven grouped industrial sectors: Oil&Gas, Industrials, Consumer Goods, Health Care, Consumer Services, Financials and Technology.

⁴ Note that the terms "industry" and "sector" are reversed from the Global Industry Classification Standard (GICS).
Secondly, the 4 models, MV-GPR, MV-TPR, GPR and TPR, are applied in the same way as in Section 4.3.3, and the ranking of the stock investment performance is listed in Table 8 (the details are summarized in Table C1). On the whole, for each stock, using the Buy&Sell strategy is much better than using the Buy&Hold strategy, regardless of the industrial sector. Specifically, MV-GPR gives a satisfactory overall performance in the sectors Industrials, Consumer Services and Financials, while MV-TPR ranks higher in Health Care in general. The optimal investment strategy in Health Care is MV-TPR, while in the Technology sector using GPR seems to be the most profitable.
It should be noted that in this paper we assume that the different outputs are observed at the same covariate values. In practice, the different responses are sometimes observed at different locations. This is difficult for the proposed method, since all the outputs have to be collected in a matrix rather than in a vector of adjustable length, and thus each response has to depend on the same inputs. Additionally, the kernel used in our model is the squared exponential and is the same for each output, whilst it may be better to use different kernels for different outputs [25]. All these problems remain for further exploration.
References
1. Akbilgic, O., Bozdogan, H., Balaban, M.E.: A novel hybrid RBF neural networks model as a
forecaster. Statistics and Computing 24(3), 365–375 (2014)
2. Alvarez, M.A., Rosasco, L., Lawrence, N.D., et al.: Kernels for vector-valued functions: A review. Foundations and Trends in Machine Learning 4(3), 195–266 (2012)
3. Baca, S.P., Garbe, B.L., Weiss, R.A.: The rise of sector effects in major equity markets. Financial
Analysts Journal 56(5), 34–40 (2000)
4. Boyle, P., Frean, M.: Dependent Gaussian processes. In: Advances in neural information pro-
cessing systems, pp. 217–224 (2005)
5. Brahim-Belhouari, S., Bermak, A.: Gaussian process for nonstationary time series prediction.
Computational Statistics and Data Analysis 47(4), 705–712 (2004)
6. Brahim-Belhouari, S., Vesin, J.M.: Bayesian learning using Gaussian process for time series pre-
diction. In: Statistical Signal Processing, 2001. Proceedings of the 11th IEEE Signal Processing
Workshop on, pp. 433–436. IEEE (2001)
7. Chakrabarty, D., Biswas, M., Bhattacharya, S., et al.: Bayesian nonparametric estimation of
Milky Way parameters using matrix-variate data, in a new gaussian process based method.
Electronic Journal of Statistics 9(1), 1378–1403 (2015)
8. Chen, Z., Wang, B.: How priors of initial hyperparameters affect Gaussian process regression
models. arXiv preprint arXiv:1605.07906 (2016)
9. Dawid, A.P.: Some matrix-variate distribution theory: notational considerations and a bayesian
application. Biometrika 68(1), 265–274 (1981)
10. De Vito, S., Massera, E., Piga, M., Martinotto, L., Di Francia, G.: On field calibration of an elec-
tronic nose for benzene estimation in an urban pollution monitoring scenario. Sensors and
Actuators B: Chemical 129(2), 750–757 (2008)
11. Duvenaud, D., Lloyd, J.R., Grosse, R., Tenenbaum, J.B., Ghahramani, Z.: Structure discovery in
nonparametric regression through compositional kernel search. arXiv preprint arXiv:1302.4922
(2013)
12. Fanaee-T, H., Gama, J.: Event labeling combining ensemble detectors and background knowl-
edge. Progress in AI 2(2-3), 113–127 (2014)
13. Gentle, J.E.: Matrix algebra: theory, computations, and applications in statistics. Springer Sci-
ence & Business Media (2007)
14. Gupta, A.K., Nagar, D.K.: Matrix variate distributions, vol. 104. CRC Press (1999)
15. Liu, Y., Gao, Z.: Real-time property prediction for an industrial rubber-mixing process with
probabilistic ensemble gaussian process regression models. Journal of Applied Polymer Sci-
ence 132(6) (2015)
16. Luo, Y., Fang, F., Esqueda, O.A.: The overseas listing puzzle: Post-IPO performance of Chinese
stocks and adrs in the US market. Journal of multinational financial management 22(5), 193–211
(2012)
17. MacKay, D.J.: Gaussian processes-a replacement for supervised neural networks? (1997)
18. MacKay, D.J.: Introduction to Gaussian processes. NATO ASI Series F Computer and Systems
Sciences 168, 133–166 (1998)
19. Neal, R.M.: Monte carlo implementation of Gaussian process models for bayesian regression
and classification. arXiv preprint physics/9701026 (1997)
20. Neal, R.M.: Bayesian learning for neural networks, vol. 118. Springer Science & Business Media
(2012)
21. Rasmussen, C.E.: Evaluation of Gaussian processes and other methods for non-linear regres-
sion. University of Toronto (1999)
22. Rasmussen, C.E., Williams, C.K.: Gaussian processes for machine learning, vol. 1. MIT press
Cambridge (2006)
23. Roberts, S., Osborne, M., Ebden, M., Reece, S., Gibson, N., Aigrain, S.: Gaussian processes for time-series modelling. Phil. Trans. R. Soc. A 371(1984), 20110550 (2013)
24. Shah, A., Wilson, A.G., Ghahramani, Z.: Student-t processes as alternatives to Gaussian pro-
cesses. In: AISTATS, pp. 877–885 (2014)
25. Wang, B., Chen, T.: Gaussian process regression with multiple response variables. Chemomet-
rics and Intelligent Laboratory Systems 142, 159–165 (2015)
26. Williams, C.K.: Computing with infinite networks. Advances in neural information processing
systems pp. 295–301 (1997)
27. Williams, C.K., Rasmussen, C.E.: Gaussian processes for regression. In: Advances in neural
information processing systems, pp. 514–520 (1996)
28. Wilson, A.G.: Covariance kernels for fast automatic pattern discovery and extrapolation with Gaussian processes. Ph.D. thesis, University of Cambridge (2014)
29. Zhang, F.: The Schur complement and its applications, vol. 4. Springer Science & Business
Media (2006)
30. Zhu, S., Yu, K., Gong, Y.: Predictive matrix-variate t models. In: Advances in Neural Informa-
tion Processing Systems, pp. 1721–1728 (2008)
According to the chain rule for matrix derivatives [13], letting U = f(X), the derivative of the function g(U) with respect to X is

∂g(U)/∂X_ij = tr[ (∂g(U)/∂U)^T ∂U/∂X_ij ],

where X is an n × m matrix. Additionally, there are two other useful formulas for derivatives with respect to X:

∂ ln det(X)/∂X = X^{−T},   ∂ tr(A X^{−1} B)/∂X = −(X^{−1} B A X^{−1})^T,

where X is an n × n matrix, A is a constant m × n matrix and B is a constant n × m matrix.
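These identities are easy to verify numerically; a small sketch (NumPy assumed; the function name is ours) checks the first one by central finite differences:

```python
import numpy as np

def check_logdet_gradient(n=4, eps=1e-6, seed=0):
    """Numerically verify d ln det(X) / dX = X^{-T} on a random positive definite X."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    X = A @ A.T + n * np.eye(n)          # positive definite, so det(X) > 0
    analytic = np.linalg.inv(X).T
    numeric = np.zeros_like(X)
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n)); E[i, j] = eps
            numeric[i, j] = (np.log(np.linalg.det(X + E))
                             - np.log(np.linalg.det(X - E))) / (2 * eps)
    return np.max(np.abs(analytic - numeric))   # should be very small (around 1e-9)
```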
L = (nd/2) ln(2π) + (d/2) ln det(Σ) + (n/2) ln det(Ω) + (1/2) tr(Σ^{−1}(Y − M) Ω^{−1}(Y − M)^T),   (18)

where Σ = K + σ_n² I. Since there are several parameters in the kernel k, we denote K = K_θ with parameter set Θ = {θ_1, θ_2, . . .}. Besides, we write the parameter matrix as Ω = Φ Φ^T, since Ω is positive semi-definite, where

Φ = [ φ_11   0      · · ·   0
      φ_21   φ_22   · · ·   0
      ⋮      ⋮      ⋱      ⋮
      φ_d1   φ_d2   · · ·   φ_dd ].

To guarantee the uniqueness of Φ, the diagonal elements are restricted to be positive, and we denote ϕ_ii = ln(φ_ii) for i = 1, 2, · · · , d. Therefore,

∂Σ/∂σ_n² = I_n,   ∂Σ/∂θ_i = ∂K′_θ/∂θ_i,   ∂Ω/∂φ_ij = E_ij Φ^T + Φ E_ij,   ∂Ω/∂ϕ_ii = J_ii Φ^T + Φ J_ii,

where E_ij is the d × d elementary matrix having unity in the (i, j)-th element and zeros elsewhere, and J_ii is the same as E_ij but with the unity replaced by e^{ϕ_ii}.
The derivatives of the negative log likelihood with respect to σ_n², θ_i, φ_ij and ϕ_ii are as follows. The derivative with respect to θ_i is

∂L/∂θ_i = (d/2) ∂ ln det(Σ)/∂θ_i + (1/2) ∂ tr(Σ^{−1}(Y − M)Ω^{−1}(Y − M)^T)/∂θ_i
        = (d/2) tr[(∂ ln det(Σ)/∂Σ)^T ∂Σ/∂θ_i] + (1/2) tr[(∂ tr(Σ^{−1} G)/∂Σ)^T ∂Σ/∂θ_i]
        = (d/2) tr(Σ^{−1} ∂K′_θ/∂θ_i) − (1/2) tr(Σ^{−1} G Σ^{−1} ∂K′_θ/∂θ_i)
        = (d/2) tr(Σ^{−1} ∂K′_θ/∂θ_i) − (1/2) tr(α_Σ Ω^{−1} α_Σ^T ∂K′_θ/∂θ_i),   (19)

where G = (Y − M) Ω^{−1}(Y − M)^T and α_Σ = Σ^{−1}(Y − M). The fourth equality is due to the symmetry of Σ.

Since ∂Σ/∂σ_n² = I_n, the derivative with respect to σ_n² is

∂L/∂σ_n² = (d/2) tr(Σ^{−1}) − (1/2) tr(α_Σ Ω^{−1} α_Σ^T).   (20)
∂L/∂φ_ij = (n/2) ∂ ln det(Ω)/∂φ_ij + (1/2) ∂ tr(Σ^{−1}(Y − M)Ω^{−1}(Y − M)^T)/∂φ_ij
         = (n/2) tr(Ω^{−1} ∂Ω/∂φ_ij) − (1/2) tr[((Ω^{−1}(Y − M)^T Σ^{−1}(Y − M)Ω^{−1})^T)^T ∂Ω/∂φ_ij]
         = (n/2) tr(Ω^{−1} ∂Ω/∂φ_ij) − (1/2) tr(α_Ω Σ^{−1} α_Ω^T ∂Ω/∂φ_ij)
         = (n/2) tr[Ω^{−1}(E_ij Φ^T + Φ E_ij)] − (1/2) tr[α_Ω Σ^{−1} α_Ω^T (E_ij Φ^T + Φ E_ij)],   (21)

where the third equality is due to the symmetry of Ω. Similarly, the derivative with respect to ϕ_ii is

∂L/∂ϕ_ii = (n/2) ∂ ln det(Ω)/∂ϕ_ii + (1/2) ∂ tr(Σ^{−1}(Y − M)Ω^{−1}(Y − M)^T)/∂ϕ_ii
         = (n/2) tr[Ω^{−1}(J_ii Φ^T + Φ J_ii)] − (1/2) tr[α_Ω Σ^{−1} α_Ω^T (J_ii Φ^T + Φ J_ii)].   (22)
The negative log likelihood of observations Y ∼ MT_{n,d}(ν, M, Σ, Ω), where M ∈ R^{n×d}, Σ ∈ R^{n×n}, Ω ∈ R^{d×d}, is

L = (1/2)(ν + d + n − 1) ln det(I_n + Σ^{−1}(Y − M)Ω^{−1}(Y − M)^T) + (d/2) ln det(Σ) + (n/2) ln det(Ω) + ln Γ_n(½(ν + n − 1)) + (dn/2) ln π − ln Γ_n(½(ν + d + n − 1))
  = (1/2)(ν + d + n − 1) ln det(Σ + (Y − M)Ω^{−1}(Y − M)^T) − ((ν + n − 1)/2) ln det(Σ) + ln Γ_n(½(ν + n − 1)) − ln Γ_n(½(ν + d + n − 1)) + (n/2) ln det(Ω) + (dn/2) ln π.

Letting U = Σ + (Y − M)Ω^{−1}(Y − M)^T and α_Ω = Ω^{−1}(Y − M)^T, the derivatives of U with respect to σ_n², θ_i, ν, φ_ij and ϕ_ii are

∂U/∂σ_n² = I_n,   ∂U/∂θ_i = ∂K′_θ/∂θ_i,   ∂U/∂ν = 0,   (23)
∂U/∂φ_ij = −(Y − M)Ω^{−1}(∂Ω/∂φ_ij)Ω^{−1}(Y − M)^T = −α_Ω^T (∂Ω/∂φ_ij) α_Ω,   (24)
∂U/∂ϕ_ii = −(Y − M)Ω^{−1}(∂Ω/∂ϕ_ii)Ω^{−1}(Y − M)^T = −α_Ω^T (∂Ω/∂ϕ_ii) α_Ω.   (25)
∂L/∂φ_ij = ((τ + d)/2) ∂ ln det(U)/∂φ_ij + (n/2) ∂ ln det(Ω)/∂φ_ij
         = −((τ + d)/2) tr[U^{−1} α_Ω^T (E_ij Φ^T + Φ E_ij) α_Ω] + (n/2) tr[Ω^{−1}(E_ij Φ^T + Φ E_ij)].   (29)

Similarly, the derivative with respect to ϕ_ii is

∂L/∂ϕ_ii = −((τ + d)/2) tr[U^{−1} α_Ω^T (J_ii Φ^T + Φ J_ii) α_Ω] + (n/2) tr[Ω^{−1}(J_ii Φ^T + Φ J_ii)].   (30)
Table B1 The movement of an invested $100 over 200 days, split into 20 periods (Stock: BIDU)
Table B2 The movement of an invested $100 over 200 days, split into 20 periods (Stock: CTRP)
Table B3 The movement of an invested $100 over 200 days, split into 20 periods (Stock: NTES)
Table B4 The detailed movement of an invested $100 for the last 10-day period (Period 20, Stock: BIDU)
191 Buy 263.29 Buy 226.00 Buy 267.69 Buy 232.93 136.60 106.25 112.37 107.51
192 Keep 272.74 Keep 234.11 Keep 277.29 Keep 241.29 141.50 108.83 115.14 110.09
193 Keep 275.50 Keep 236.48 Keep 280.10 Keep 243.73 142.94 108.99 115.52 110.60
194 Keep 275.94 Keep 236.85 Keep 280.55 Keep 244.12 143.16 109.94 115.84 111.02
195 Sell 275.70 Sell 236.65 Sell 280.31 Sell 243.91 142.28 110.33 115.45 111.21
196 Buy 273.26 Buy 234.55 Buy 277.83 Buy 241.75 140.95 110.37 115.55 111.20
197 Keep 277.87 Keep 238.52 Keep 282.52 Keep 245.83 143.33 110.51 116.39 111.56
198 Keep 272.35 Keep 233.77 Keep 276.90 Keep 240.94 140.48 110.42 116.35 111.66
199 Sell 270.29 Keep 233.57 Sell 274.81 Sell 239.13 140.36 110.08 115.53 111.11
200 Keep 270.29 Sell 233.29 Keep 274.81 Keep 239.13 139.12 109.10 114.29 109.97
Table B5 The detailed movement of an invested $100 for the last 10-day period (Period 20, Stock: CTRP)
191 Buy 175.02 Buy 208.09 Buy 168.75 Buy 182.07 83.65 106.25 112.37 107.51
192 Keep 180.77 Keep 214.92 Keep 174.28 Keep 188.05 86.39 108.83 115.14 110.09
193 Keep 185.40 Keep 220.43 Keep 178.75 Keep 192.87 88.61 108.99 115.52 110.60
194 Sell 184.91 Sell 220.28 Sell 178.28 Sell 192.36 88.55 109.94 115.84 111.02
195 Keep 184.91 Buy 222.82 Keep 178.28 Keep 192.36 88.45 110.33 115.45 111.21
196 Keep 184.91 Keep 222.82 Keep 178.28 Keep 192.36 88.22 110.37 115.55 111.20
197 Buy 185.16 Keep 222.82 Buy 178.52 Buy 192.62 88.92 110.51 116.39 111.56
198 Keep 184.06 Keep 222.82 Keep 177.46 Keep 191.47 88.39 110.42 116.35 111.66
199 Sell 183.25 Keep 222.82 Sell 176.67 Sell 190.63 88.45 110.08 115.53 111.11
200 Keep 183.25 Keep 222.82 Keep 176.67 Keep 190.63 89.20 109.10 114.29 109.97
Table B6 The detailed movement of an invested $100 for the last 10-day period (Period 20, Stock: NTES)
191 Buy 176.57 Buy 174.49 Buy 174.06 Buy 171.33 151.35 106.25 112.37 107.51
192 Keep 184.84 Keep 182.66 Keep 182.21 Keep 179.36 158.44 108.83 115.14 110.09
193 Keep 184.16 Keep 181.98 Keep 181.54 Keep 178.69 157.86 108.99 115.52 110.60
194 Keep 185.46 Keep 183.27 Keep 182.82 Keep 179.95 158.97 109.94 115.84 111.02
195 Sell 185.33 Keep 179.25 Sell 182.69 Sell 179.83 155.49 110.33 115.45 111.21
196 Buy 187.22 Keep 180.86 Buy 184.55 Buy 181.66 156.88 110.37 115.55 111.20
197 Keep 187.75 Keep 181.38 Keep 185.08 Keep 182.18 157.33 110.51 116.39 111.56
198 Sell 188.32 Keep 177.52 Keep 181.15 Keep 178.30 153.98 110.42 116.35 111.66
199 Keep 188.32 Sell 176.70 Sell 180.31 Sell 177.48 153.29 110.08 115.53 111.11
200 Keep 188.32 Keep 176.70 Keep 180.31 Keep 177.48 153.52 109.10 114.29 109.97
Table C2 The detailed industry portfolio investment results under different strategies