Article
A Multivariate Kernel Approach to Forecasting the
Variance Covariance of Stock Market Returns
Ralf Becker 1 , Adam Clements 2 and Robert O’Neill 3, *
1 Economics, School of Social Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, UK;
[email protected]
2 School of Economics and Finance, Queensland University of Technology, Brisbane City, QLD 4000, Australia;
[email protected]
3 The Business School, University of Huddersfield, Huddersfield HD1 3DH, UK
* Correspondence: r.o’[email protected]; Tel: +44-01484-471-853
Abstract: This paper introduces a multivariate kernel based forecasting tool for the prediction of
variance-covariance matrices of stock returns. The method introduced allows for the incorporation
of macroeconomic variables into the forecasting process of the matrix without resorting to a
decomposition of the matrix. The model makes use of similarity forecasting techniques and it
is demonstrated that several popular techniques can be thought of as special cases of this approach.
A forecasting experiment demonstrates the potential for the technique to improve the statistical
accuracy of forecasts of variance-covariance matrices.
1. Introduction
Forecasting variance-covariance matrices (VCMs) is an important issue in finance, having
applications in portfolio selection and risk management as well as being directly used in the pricing of
several financial assets. In recent years an increasing body of literature has developed multivariate
models to forecast this matrix; these include the DCC model of Engle and Sheppard (2001), the VARFIMA
model of Chiriac and Voev (2011) and the Riskmetrics approach of J.P. Morgan (1996). All of these models can
be used to forecast the VCM of a portfolio and do so only using returns data from the assets under
consideration.
Previous studies, focusing on modelling the volatility of single assets, have identified economic
predictor variables that may be related to the variance of returns and attempted to utilise such
variables in forecasting. For example, Aït-Sahalia and Brandt (2001) investigate which of a range of
factors, such as dividend yields and default spreads, influence stock volatility. However, advances in
terms of multivariate volatility models are complicated by the requirement that forecasts of VCMs must
be positive semi-definite (psd) and symmetric, restrictions which make the incorporation of predictor
variables difficult. As the dimension of the problem increases two issues arise. First, without complex
restrictions this will result in a proliferation of parameters, making identification and estimation
difficult. Second, the implicit assumption of model stability becomes less defendable as the model
dimension grows.
In this paper a semi-parametric kernel based forecasting method is proposed where forecasts
are based on a weighted average of past observations of VCMs. This approach builds on the work
by Clements et al. (2011), who show that in a univariate setting, employing kernels to determine
weighting structures dependent on the similarity of volatility observations through time can improve
forecast accuracy when compared to more established methods. As this approach essentially generates VCM
forecasts as weighted averages of past VCMs, it guarantees symmetry and positive semi-definiteness
by construction and hence avoids the issues discussed earlier.
The proposed method is similar in spirit to Riskmetrics forecasts and the Heterogeneous
Autoregressive (HAR) model of Corsi (2009)1 . However the proposed approach does not make
the potentially restrictive assumption that more recent observations attract a larger weight, with the
weights being a decreasing function of the time difference between when the forecast is formed and
the time at which a VCM was observed2 . Additionally, the impact of predictor variables is easily
included within the kernel weighting function while avoiding the problems discussed above that
are commonly encountered with multivariate models. The methodology proposed here is merely a
forecasting tool. It is not meant to represent an underlying data generating process and, as a
representation of that unknown process, it is surely misspecified.
Its value lies in (potentially) improved forecast quality.
An empirical analysis is undertaken to examine the efficacy of the proposed forecasting framework.
Given the nature of the approach, and the potentially wide range of exogenous variables, it is not
straightforward to design a representative simulation experiment. As a result, a thorough and careful
forecasting exercise is undertaken, focusing on forecasting the variance-covariance matrix of the
returns on 20 large U.S. stocks. A range of predictor variables including matrix similarity measures,
interest rate information, commodity returns, and a range of macroeconomic data and option implied
volatility are used.
This empirical analysis is designed to address a number of issues. Does the proposed
forecasting approach compare favourably to more established forecasting techniques for relatively
high dimensional VCMs? Do the predictor variables help improve the accuracy of VCM forecasts?
And finally, does the use of matrix comparison measures lead to improved forecasting performance?
Overall, the results of the forecasting experiment are promising in that they establish that the proposed
non-parametric approach produces forecasts of the VCM that are statistically superior to those from
a range of competing models. The results also demonstrate that the variables which measure the
similarity of VCM realisations can significantly improve on forecasts based only on kernels that are a
function of time. However, there is little evidence to show that using any of the other variables adds
significantly to forecast performance.
The paper proceeds as follows. Section 2 introduces important terminology and notation and
offers an overview of the current forecasting approaches including the role played by exogenous
predictor variables. Section 3 shows how a number of common forecasting methods can be expressed
as a kernel based approach. Section 4 outlines the methodology underlying the proposed forecasting
approach. Section 5 describes the data used in the empirical analysis. Section 6 outlines the structure
of the empirical analysis including the forecasting exercise and the competing models. Sections 7 and 8
report the results of the empirical analysis focusing on the behaviour of the kernel weighting functions
and forecast performance respectively. Section 9 provides concluding comments.
2. Background
This section will discuss the framework on which this paper builds. Important notation and
terminology will be presented followed by an outline of the existing approaches to forecasting the VCM.
1 The HAR approach has not been applied to forecasting the full variance-covariance matrix. To do so would require a range
of possible transformations to ensure positive definiteness, which leads to a deterioration in forecast performance.
2 In practice the HAR model will deliver a decreasing step-function, although it could also produce non-decreasing
step functions.
3 Please refer to the discussion of the literature in Barndorff-Nielsen et al. (2011) for more information on recent developments
in this area.
4 The scale matrix is the conditional expectation of the rVCM.
The simplest way to obtain psd forecasts of the rVCM is to produce forecasts by merely averaging
past observations of the rVCM which, by construction, are psd themselves. The Riskmetrics approach
to forecasting the variance-covariance matrix is based on this principle, with an exponentially weighted
moving average (EWMA) applied to the history of $r_t r_t'$. Fleming et al. (2003) use a similar weighting
scheme applied directly to Vt in order to demonstrate the economic benefit of forecasting using
RVCMs as opposed to daily returns. The use of the EWMA scheme imposes decaying weights. A
recent approach gaining popularity is the Heterogeneous Autoregressive (HAR) model of Corsi (2009)
which can be applied to forecasting the rVCM. Similar to the Riskmetrics approach, the weights in a
HAR model decline with time but as a step function rather than smoothly. As shown by Chiriac and
Voev (2011) and Bauer and Vorkink (2011), HAR models can be applied to the transformed elements
(Cholesky or Matrix Logarithm transformation) of the rVCM. This is appealing as it facilitates the
straightforward estimation of rather complex dynamics of these elements which in turn can be used to
produce psd forecasts (see details in Section 6.2).
The approach proposed here generates forecasts that are weighted averages of previously observed
rVCMs. However the weight applied to past observations of the rVCM is not solely determined by
the lags at which it is observed. This approach builds upon Clements et al. (2011) who developed a
univariate volatility forecasting scheme, where forecasts are a weighted average of historical values of
realized volatility and the weights are related to the similarity between historical volatility and volatility
at the time at which the forecast is formed. Clements et al. (2011) show that at a 1 day forecast horizon
such an approach performs well against competing volatility forecasting techniques.
This principle is extended to the multivariate setting in this paper with a kernel based approach
proposed for forecasting the VCM and is an application of the general technique of empirical similarity,
as described more generally in Gilboa et al. (2006). The kernel density acts as a similarity function and
the forecasts of the rVCM are similarity weighted averages. This technique allows for the weights to
be determined by a vector of variables rather than only one variable (e.g., time difference as in the
Riskmetrics or HAR models). It is notable that the only previous explicit use of similarity forecasting in
volatility forecasting is Golosnoy et al. (2014), who used the general approach to combine univariate
forecasts of stock return volatility using similarity based weights which compare the forecast period to
previous periods; in their case, similarity is computed from the closeness of the competing models'
volatility forecasts at the current forecast point. It is shown that the proposed method encompasses
Riskmetrics as a special case5 .
5 Gijbels et al. (1999) show that the Riskmetrics approach can be interpreted as a kernel approach in which weights on
historical observations are determined by the lag at which a realization was observed. See Section 3.
Commodity prices, such as gold (Sjaastad and Scacciavillani 1996) and oil (Sadorsky 1999; Hamilton 1996) prices, have
also been linked to stock market volatility and are therefore considered here as potential variables to
contribute to the kernel weighting functions.
The final variable that falls into this category is implied volatility, namely the VIX index of
the Chicago Board Options Exchange (CBOE). This is often interpreted as the market's view of future stock
market volatility. This measure has been used in the context of univariate volatility forecasting
(Poon and Granger 2003; Blair et al. 2001) and is considered here as another variable in the multivariate
kernel weighting scheme.
Another important class of variables considered for the kernel weighting algorithm are scalar
transformations of matrices as they can be used to establish the closeness or similarity of matrices.
The idea is to give higher weight to past observations from periods when the rVCM was similar to the
current rVCM (regardless of how distant in time that observation is). To the best of our knowledge
such variables have not previously been used in the context of VCM forecasting. There is, however,
a literature that discusses matrix distance measures. Moskowitz (2003) proposes three statistics to
evaluate the closeness of rVCMs. The first metric compares the matrix eigenvalues, the second
looks at the relative differences between the individual matrix elements and the third considers how
many of the correlations have the same sign in the matrices. These three metrics will be utilised to
determine the level of similarity between two rVCMs. Other functions used to compare matrices,
often called loss functions, have been discussed in the forecast evaluation literature (for example
in Laurent et al. 2013). One such loss function is the Stein distance, also known as the MVQLIKE
function6 . This loss function is shown to perform well in discriminating between VCM forecasts in
Becker et al. (2014) and Laurent et al. (2012) and represents another useful tool for comparing VCMs.
when observations are equally spaced in time and $\lambda$ is a smoothing parameter, $0 < \lambda < 1$, commonly
set at a value recommended in J.P. Morgan (1996). From recursive substitution and with $H_1 = r_1 r_1'$,
the forecast of the VCM can be expressed as
$$H_{T+1} = (1 - \lambda) \sum_{j=0}^{T-1} \lambda^j r_{T-j} r_{T-j}' \qquad (2)$$
The sum of the weights is equal to $1 - \lambda^T$ which, as noted in Gijbels et al. (1999), approaches
1 as $T$ approaches infinity. However, in order to normalise the sum of the weights to be exactly 1,
the Riskmetrics model can be restated as
$$H_{T+1} = \frac{\sum_{j=0}^{T-1} \lambda^j r_{T-j} r_{T-j}'}{\sum_{j=0}^{T-1} \lambda^j} \qquad (3)$$
which can be reformulated with kernel weights7, defining $h = -1/\log(\lambda)$ and $K(u) = \exp(u) 1_{u \leq 0}$,
allowing (3) to be restated as
$$H_{T+1} = \frac{\sum_{t=1}^{T} K\left(\frac{t-T}{h}\right) r_t r_t'}{\sum_{t=1}^{T} K\left(\frac{t-T}{h}\right)} = \sum_{t=1}^{T} W_{rm,t} V_{rm,t}. \qquad (4)$$
This replicates the conclusion of Gijbels et al. (1999) that Riskmetrics is a zero degree local
polynomial kernel estimate with bandwidth $h$. From a practical point of view, the Riskmetrics kernel
weights, $W_{rm,t} = K\left(\frac{t-T}{h}\right) / \sum_{j=1}^{T} K\left(\frac{j-T}{h}\right)$, are based on how close observations of
$V_{rm,t} = r_t r_t'$ are to time $T$, the period at which a forecast is being made. The largest weight is attached
to the observation at time $T$, with the weights decaying exponentially as the lag increases.
In the univariate volatility context, the HAR model is based on a step kernel, rather than the
smoothly decaying kernel built under the Riskmetrics approach. In the context of multivariate
forecasting the application of either Riskmetrics or HAR is hampered by the fact that the estimation of
kernel bandwidths is not straightforward. This is the reason why Riskmetrics approaches tend to be
applied with fixed, pre-determined bandwidths. However, it is argued here that a bandwidth can be
estimated with a cross-validation approach. It is useful to demonstrate that the Riskmetrics model
can be represented as a kernel based model as this highlights that the basic methodology proposed
here encompasses existing popular methods, while at the same time including measures of closeness
of dimensions other than time. While this may be the case, empirical results show that time based
weighting remains important.
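To make the equivalence between Equations (3) and (4) concrete, the following minimal Python sketch (not part of the original paper; the function name and the illustrative choice $\lambda = 0.94$ are assumptions) computes the normalised Riskmetrics weights both directly and via the kernel $K(u) = \exp(u)1_{u\le0}$ with bandwidth $h = -1/\log(\lambda)$, confirming that the two formulations coincide.

```python
import numpy as np

def riskmetrics_weights(T, lam=0.94):
    """Normalised Riskmetrics weights computed two equivalent ways (Equations (3) and (4))."""
    # Exponential smoothing weights by lag j = 0, ..., T-1 (lag 0 = most recent observation).
    j = np.arange(T)
    w_ewma = lam**j / np.sum(lam**j)

    # Kernel formulation: K(u) = exp(u) for u <= 0, bandwidth h = -1/log(lambda).
    h = -1.0 / np.log(lam)
    t = np.arange(1, T + 1)                    # observation times t = 1, ..., T
    u = (t - T) / h
    k = np.exp(u) * (u <= 0)
    w_kernel = k / k.sum()                     # weight on V_{rm,t} = r_t r_t'
    return w_ewma, w_kernel[::-1]              # reorder the kernel weights by lag

w1, w2 = riskmetrics_weights(T=500)
print(np.allclose(w1, w2))                     # True: both schemes produce identical weights
```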
4. Methodology
This section presents the method by which the kernel weighting scheme and subsequent forecasts
of the VCM are obtained. The inputs are a set of p variables, which may contain information relevant
to forecasting the VCM, and a time series of rVCMs. Calculation of the n × n rVCM, Vt , is a non-trivial
issue. Here it is computed using two standard methods from the realized (co)variance literature and it
is assumed that Vt is psd. The method used to calculate the matrices used in the rest of this paper is
now described.
The calculation of Vt is accomplished along the lines proposed by Barndorff-Nielsen et al. (2011),
producing a multivariate realised kernel (MVRK) estimate9 .
$$V_t = \sum_{h=-H}^{H} k(h/H)\, \Gamma_{t,h}$$
$$\Gamma_{t,h} = \sum_{i=1+h}^{M} q_{t,i} q_{t,i-h}', \qquad \text{for } h \geq 0$$
$$\Gamma_{t,h} = \Gamma_{t,-h}'$$
Both the kernel function k ( ) and the bandwidth, H, are chosen in the manner recommended
in Barndorff-Nielsen et al. (2011). The kernel is the Parzen kernel and the bandwidth is estimated
from the data. Importantly, this estimate will produce a positive semi-definite matrix that allows for
non-synchronous trading and the existence of some microstructure noise10 .
$$H_{T+d} = \sum_{t=1}^{T-d} W_t^{(d)} V_{t+d}. \qquad (5)$$
As this is a weighted combination of symmetric, psd matrices, $H_{T+d}$ also inherits these properties
and so is a valid covariance matrix without resorting to parameter restrictions or transformations.
As discussed in Section 3 the Riskmetrics and HAR forecasting models can be seen as special cases of
this approach.
The focus of much of the remainder of this section is a description of how the optimal weights in (5),
$\omega_t$, are found. In order to ensure that the weights sum to one the following normalisation is imposed,
$$W_t^{(d)} = \frac{\omega_t}{\sum_{i=1}^{T-d} \omega_i} \qquad (6)$$
which allows Equation (5) to be interpreted as a weighted average, ensuring an appropriate scaling for $H_{T+d}$.
The central idea is to determine which of the past time periods experienced conditions most
similar to those at the time of forming the forecast, T, a logic based on the similarity forecasting
framework of Gilboa et al. (2011). More weight is placed on the VCMs that occurred over
the d periods following the dates that were most similar to time T, the forecast point. The similarity
of historical periods to time T is determined using p variables and employs a multivariate kernel to
calculate the raw weight applicable to day t, hence
$$\omega_t = \prod_{j=1}^{p} K_j(\Phi_{t,j}, \Phi_{T,j}, h_j) \qquad (7a)$$
9 The computations of the MVRK were done using the “realized_multivariate_kernel” function of Kevin Sheppard’s MFE
Toolbox for MATLAB https://www.kevinsheppard.com/MFE_Toolbox.
10 The following empirical analysis was repeated with an alternative estimator, using intra-daily 5 min return data. None of
the results reported in this paper changes qualitatively when using this alternative estimator. Some results that use this
alternative estimator for the rVCM are reported in Section 8.
11 In this paper we restrict our applied analysis to the case where d = 1 however in general there is no reason why the approach
should not be extended to multi-day forecasts, although this requires consideration of the impact and inclusion of overnight
returns in the construction of realized VCMs.
where $\Phi_{T,j}$ is the element from the Tth row and jth column of the (T × p) dimensional data matrix Φ,
which collects all T observations for the p potential weighting variables, and $h_j$ is the bandwidth for the jth
variable.
For continuous variables K j (Φt,j , Φ T,j , h j ) is the standard normal density kernel12 (Silverman 1986;
Bowman 1997) defined as
$$K_j(\Phi_{t,j}, \Phi_{T,j}, h_j) = (2\pi)^{-0.5} \exp\left(-\frac{1}{2}\left(\frac{\Phi_{T,j} - \Phi_{t,j}}{h_j}\right)^2\right). \qquad (7b)$$
In the case of a discrete variable, such as a bull/bear market dummy used below, the discrete
univariate kernel proposed by Aitchison and Aitken (1976) is used. The form of the kernel is
$$K_j(\Phi_{t,j}, \Phi_{T,j}, h_j) = \begin{cases} 1 - h_j & \text{if } \Phi_{t,j} = \Phi_{T,j} \\ h_j/(s_j - 1) & \text{if } \Phi_{t,j} \neq \Phi_{T,j} \end{cases} \qquad (7c)$$
where s j is the number of possible values the discrete variable can take (s j = 2 in the case of the
bull/bear market variable). In the two state discrete case h j ∈ [0, 0.5]. If h j = 0.5 the value of the
discrete variable has no impact on the forecast, while if h j = 0 we disregard data points which do not
share the same discrete variable value as Φ T,j .
As discussed earlier, it is possible to think of several time based approaches to forecasting $\Sigma_t$;
thus a kernel based on Riskmetrics weighting is used here to explicitly account for time. When time is
included as one of the p variables, the kernel takes the form
$$K_j(\Phi_{t,j}, \Phi_{T,j}, h_j) = \frac{h_j^{T-t}}{\sum_{q=1}^{T-1} h_j^{T-q}} \qquad (7d)$$
is employed, which has the same structure as the Riskmetrics approach in Equation (3). However, here a
flexible bandwidth, h j ∈ [0, 1], is allowed as opposed to a pre-specified value as in J.P. Morgan (1996).
This time kernel will generally tend to produce weighting patterns for time which are similar to those
produced by other exponential smoothing approaches. The largest weights will be placed on the most
recent observations of the realized VCM and will generally fall away to zero quickly. This is important
in the multivariate kernel as the multiplicative nature of Equation (7a) means that this property will
be inherited by the weights used in the multivariate kernel. While the general approach presented
through Equations (5), (6) and (7a) captures the Riskmetrics approach as a special case, it introduces a
significant amount of additional flexibility, by allowing the weights Wt to be determined from a set of
p variables other than just time13 .
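To make this concrete, the following Python sketch (hypothetical function names; only the two-state discrete kernel is implemented, and bandwidths are taken as given) builds the raw weights of Equation (7a) from the kernels (7b)–(7d), normalises them as in (6), and forms the weighted-average forecast of Equation (5).

```python
import numpy as np

def kernel_weights(Phi, T, d, bandwidths, kinds):
    """Raw weights omega_t for t = 1, ..., T-d from the product kernel in Equation (7a).
    Phi: (T, p) array of weighting variables; kinds[j] in {"cont", "disc", "time"}."""
    t = np.arange(T - d)                       # 0-based indices for t = 1, ..., T-d
    phi_T = Phi[T - 1]                         # variable values at the forecast origin T
    omega = np.ones(T - d)
    for j, (h, kind) in enumerate(zip(bandwidths, kinds)):
        if kind == "cont":                     # Gaussian kernel, Equation (7b)
            u = (phi_T[j] - Phi[t, j]) / h
            omega *= (2.0 * np.pi) ** -0.5 * np.exp(-0.5 * u**2)
        elif kind == "disc":                   # Aitchison-Aitken kernel, Equation (7c), s_j = 2
            omega *= np.where(Phi[t, j] == phi_T[j], 1.0 - h, h)
        else:                                  # exponentially decaying time kernel, Equation (7d)
            k = h ** ((T - 1) - t)
            omega *= k / k.sum()
    return omega

def kernel_forecast(V, Phi, bandwidths, kinds, d=1):
    """Forecast H_{T+d} = sum_t W_t V_{t+d}, a weighted average of past rVCMs (Equations (5)-(6))."""
    T = V.shape[0]
    omega = kernel_weights(Phi, T, d, bandwidths, kinds)
    W = omega / omega.sum()                    # normalisation of Equation (6)
    return np.einsum("t,tij->ij", W, V[d:])    # weight the VCMs observed d periods later
```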
where σj is the standard deviation of the jth variable. Although this rule of thumb provides a simple
method for choosing bandwidths, as noted in Wand and Jones (1995) these bandwidths may be
sub-optimal.
Importantly, if one was to optimise (using cross-validation) the bandwidth parameters, the optimised
bandwidths h j , will reflect the importance of the jth element in Φ for determining the optimal weights Wt .
As noted in Li and Racine (2007, pp. 140–41), irrelevant (continuous) variables are associated with h j = ∞.
For binary variables (and kernel as in Equation (7c)) and a time variable (and a kernel as in Equation (7d))
the bandwidths h j = 0.5 and h j = 1 respectively represent irrelevant variables.
Cross-validation is a bandwidth optimisation strategy introduced by Rudemo (1982) and
Bowman (1984). It selects bandwidths to minimise the mean integrated squared error (MISE) of density
estimates and is generally recommended as the method of choice in the context of non-parametric
density and regression analysis (see Wand and Jones 1995; Li and Racine 2007). As forecast performance
rather than density estimation is of interest here, the bandwidths are obtained by minimising
the MVQLIKE of the forecasts. Alternative loss functions, such as MSE are available, however
they are not considered as most are not robust to estimation error in the volatility estimates, see
Patton and Sheppard (2009). This choice is discussed further in Section 8.1.
$$CV_{MVQ}(h) = \frac{1}{K} \sum_{\tau=T-K+1}^{T} MVQLIKE(H_\tau(h)) \qquad (9)$$
The bandwidths that minimise (9) are then used in Equations (5), (6) and (7a) in order to forecast $H_{T+1}$.
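A minimal Python sketch of this criterion follows, assuming the hypothetical kernel_forecast helper from the earlier sketch; the hold-out length K and the recursive use of all data up to each forecast point are illustrative choices, and the resulting function can be passed to any numerical optimiser over the bandwidth vector.

```python
import numpy as np

def mvqlike(H, V):
    """MVQLIKE / Stein distance between a forecast H and an observed VCM V (see Equation (13))."""
    A = np.linalg.solve(H, V)                  # H^{-1} V
    return np.trace(A) - np.linalg.slogdet(A)[1] - V.shape[0]

def cv_mvq(bandwidths, V, Phi, kinds, K=300):
    """Cross-validation criterion of Equation (9): mean MVQLIKE of the last K one-step forecasts."""
    T = V.shape[0]
    losses = []
    for tau in range(T - K, T):
        # forecast of V[tau] formed using only data observed before period tau
        H_tau = kernel_forecast(V[:tau], Phi[:tau], bandwidths, kinds, d=1)
        losses.append(mvqlike(H_tau, V[tau]))
    return float(np.mean(losses))
```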
14 This measure has been successfully used in the forecast evaluation literature, e.g., in Laurent et al. (2013), and is sometimes
called the Stein distance measure.
15 The following argument is, for notational ease, made for 1 period ahead forecasts but the extension to d period forecasts is
straight forward.
16 We set T − K = 300, which means that every forecast used in cross-validation is based on a minimum of 300 observations.
suggest that a cross-validation approach in the context of a multivariate kernel regression should,
asymptotically, deliver bandwidth estimates that approach their irrelevant values discussed above
(h j = ∞, h j = 0.5 and h j = 1 respectively for continuous, binary and time variables), meaning there
should be no need to manually eliminate irrelevant variables.
To begin, when all variables were considered jointly, difficulties were encountered in the optimisation
process and the non-linear bandwidth optimisation of (9) was unable to identify an optimum.
An alternative strategy is therefore proposed which first attempts to eliminate variables that contribute
little to improving forecasts, before identifying optimal bandwidths only for the remaining subset of
variables. This is achieved as follows. Each variable is used individually in Φ to determine
kernel weights.
The optimal bandwidth, $\tilde{h}_j$, for each variable is found by minimising the criterion in (9).
The optimal $CV_{MVQ}(\tilde{h}_j)$ is then compared to a benchmark $CV_{MVQ}^R$ obtained from taking simple moving
averages of past VCMs to form a forecast. The rationale is that a relevant variable should deliver
improvements compared to a naïve approach. Weighting variables that do not improve on the
$CV_{MVQ}^R$ by at least 1% are then eliminated17.
In short the process of variable elimination and bandwidth optimisation can be summarised in
the following three step procedure:
1. For each of the p variables considered for inclusion in the multivariate kernel, apply cross
validation to obtain the optimal bandwidth when only that variable is included in the kernel
estimator. These are referred to as the univariate optimised bandwidths $\tilde{h}_j$, j = 1, ..., p.
2. Compare the forecasting performance of the univariate optimised bandwidths from Step 1,
$CV_{MVQ}(\tilde{h}_j)$, against $CV_{MVQ}^R$ from a simple moving average forecast model. Any of the
p variables that fail to improve on the rolling average forecast performance by at least 1% are
eliminated at this stage, as they are considered to have little value for forecasting. We are left with
$p^* \leq p$ variables used as weighting variables.
3. Estimate the multivariate optimised bandwidths $h_j^*$ for the $p^*$ variables that are not eliminated
in Step 2 by minimising the cross validation criterion in Equation (9). As opposed to Step 1, this
optimisation is done simultaneously over all $p^*$ bandwidths.
Having obtained the optimised bandwidths from Step 3, we then forecast the VCM for the d
day-ahead time period ending at T + d using (7a) in combination with the relevant kernel definitions
in Equations (7b), (7c) or (7d).
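The three steps can be wired together as in the following sketch, which reuses the hypothetical cv_mvq and mvqlike helpers from the earlier sketches; the bandwidth search ranges, the equally weighted historical-average benchmark, the hold-out length and the optimiser choices are placeholders rather than the settings used in the paper.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def select_and_optimise(V, Phi, kinds, K=300, threshold=0.01):
    """Step 1: univariate bandwidths; Step 2: elimination against a historical-average
    benchmark; Step 3: joint bandwidth optimisation over the surviving p* variables."""
    T, p = Phi.shape
    # benchmark CV_MVQ^R: equally weighted averages of past VCMs over the hold-out period
    cv_bench = np.mean([mvqlike(V[:tau].mean(axis=0), V[tau]) for tau in range(T - K, T)])

    kept, start = [], []
    for j in range(p):
        res = minimize_scalar(lambda h: cv_mvq([h], V, Phi[:, [j]], [kinds[j]], K),
                              bounds=(1e-3, 10.0), method="bounded")
        if res.fun < (1.0 - threshold) * cv_bench:   # keep variables beating the benchmark by 1%
            kept.append(j)
            start.append(res.x)

    # joint optimisation of the remaining bandwidths
    res = minimize(lambda h: cv_mvq(h, V, Phi[:, kept], [kinds[j] for j in kept], K),
                   x0=np.array(start), method="Nelder-Mead")
    return kept, res.x
```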
5. Data
The stock return data and additional predictor variables used are now outlined. The predictor
variables can be grouped into two classes, namely those which are based on observations of the
variance-covariance matrix and those which represent exogenous macroeconomic variables. All models
considered below also make use, either explicitly or implicitly, of a time variable which is defined as
the number of trading days between two points in time.
Intra-day price data for the 20 stocks was obtained from the NYSE Trade and Quote database via the Wharton Research Data Service for the period covering
02/01/1997–31/12/2012. This delivers 4,026 trading days of information. Appendix A lists the 20
stocks included in the analysis18.
This data is used to create realisations of the variance-covariance matrix, Vt , as described in
Section 4.119 . These realisations of the VCM are then used to create the variables based on
comparisons of the elements of the matrix, which are then included in the kernel model.
$$\frac{\sqrt{\text{trace}(V_t' V_t)}}{\sqrt{\text{trace}(V_T' V_T)}} \qquad (10)$$
Values closer to 1 indicate a greater degree of similarity. The second statistic, adopted from
Moskowitz (2003), evaluates the absolute element-wise differences between the matrices Vt and
V T . The sum of all absolute differences is standardised by the sum of all elements in V T (ElemDiff ).
The statistic is defined as
$$\frac{\iota'\,|V_T - V_t|\,\iota}{\iota'\, V_T\, \iota} \qquad (11)$$
where ι is an n × 1 vector of ones. For identical matrices this statistic will take a value of 0.
A third metric suggested in Moskowitz (2003) is:
$$\frac{1}{m} \sum_{i=1}^{m} I\left\{\text{sign}\left(\text{vech}(C_t - \bar{C})_i\right) = \text{sign}\left(\text{vech}(C_T - \bar{C})_i\right)\right\}. \qquad (12)$$
This makes use of the realized correlation matrices $C_t$ and $C_T$20. $I\{\}$ is an indicator taking the
value of 1 when the statement inside the brackets is true and 0 otherwise, and $m = n(n-1)/2$ is the
number of unique correlations in the n × n correlation matrix. Equation (12) compares how similar $C_t$
and C T are in relation to the average realized correlation matrix C̄. This measure compares correlations
to their long run-average values. sign(vech(Ct − C̄)i ) delivers a positive (negative) sign if the realized
correlation (of the ith unique element) at time t is larger (smaller) than the relevant average correlation.
The statistic considered here essentially calculates the proportion of the m unique elements in Ct
that have identical patterns of deviations from the long-run correlations as those in C T (SignDiff ).
If matrices are identical with respect to this measure this statistic will take a value of 1.
The weighting scheme also employs a comparison of matrices using the MVQLIKE loss function
(Laurent et al. 2012) due to it being a robust multivariate loss function (as well as playing a key role in
the cross validation procedure), defined as
$$\text{tr}\left(V_t^{-1} V_T\right) - \log\left|V_t^{-1} V_T\right| - n \qquad (13)$$
18 Wharton Research Data Services (WRDS) was used in preparing this paper. This service and the data available thereon
constitute valuable intellectual property and trade secrets of WRDS and/or its third-party suppliers.
19 The data cleaning advice provided in Barndorff-Nielsen et al. (2009) is followed.
20 The realized correlation matrices are calculated from $C_t = D_t^{-1} V_t D_t^{-1}$, where $D_t$ is an (n × n) diagonal matrix with $\sqrt{V_{iit}}$ on
the ith diagonal element and $V_{iit}$ is the (i, i) element of $V_t$.
such that matrices which are identical will deliver a statistic of value 0. These four statistics are used
to measure the degree of similarity between the VCMs at time t and time T. The variable selection
and bandwidth estimation strategy described previously will determine which of these variables are
relevant for VCM forecasting.
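For reference, a compact Python sketch of these four comparison variables follows (the function name is hypothetical; the unique correlations in Equation (12) are taken as the off-diagonal upper-triangular elements, and Equation (10) is implemented as the reconstructed norm ratio shown above).

```python
import numpy as np

def similarity_measures(Vt, VT, Ct, CT, Cbar):
    """The four matrix comparison variables of Equations (10)-(13) between times t and T."""
    n = Vt.shape[0]
    # Equation (10): norm ratio, close to 1 when the matrices have similar overall magnitude
    eig_ratio = np.sqrt(np.trace(Vt.T @ Vt)) / np.sqrt(np.trace(VT.T @ VT))
    # Equation (11): element-wise absolute differences standardised by the elements of V_T
    ones = np.ones(n)
    elem_diff = ones @ np.abs(VT - Vt) @ ones / (ones @ VT @ ones)
    # Equation (12): share of unique correlations deviating from the long-run average
    # correlation matrix Cbar in the same direction in C_t and C_T
    iu = np.triu_indices(n, k=1)
    sign_diff = np.mean(np.sign((Ct - Cbar)[iu]) == np.sign((CT - Cbar)[iu]))
    # Equation (13): MVQLIKE / Stein distance between V_t and V_T
    A = np.linalg.solve(Vt, VT)
    mvq = np.trace(A) - np.linalg.slogdet(A)[1] - n
    return eig_ratio, elem_diff, sign_diff, mvq
```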
21 Difference between 1 and 10 year maturity treasury yield curve rates for US treasury issued bonds, see http://www.treasury.
gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yield.
22 Difference between yields on Moody’s Aaa and Baa rated corporate bonds. Data obtained from https://research.stlouisfed.
org/fred2/categories/119.
23 Gold price is the Gold Fixing price in the London Bullion Market, 3:00 pm London time, from https://research.stlouisfed.org/
fred2/series/GOLDPMGBD228NLBM#. Oil price is Brent crude oil, price per barrel, obtained from Datastream with the
identifier OILBREN.
24 While all these variables are available on a daily frequency, the methodology can easily handle lower frequency data such as
Industrial Production and inflation measures which were used to model slow moving stock market volatility by Engle et al.
(2013).
25 While, of course, bull and bear markets are not synonymous with booms and recessions, we feel that the use of the more
narrow definition of a stock market state is justified for the problem at hand. The algorithm identifies bull and bear periods
based on monthly data, daily data is often too noisy to support identification of broad trends. As a result once the algorithm
identifies a month as belonging to a bull/bear period all of the constituent days are also assumed to belong to this period.
26 Data used here is closing price of the S&P500 index for the last day of the month.
27 Daily observations of the CBOE volatility index, data obtained from: http://www.cboe.com/micro/vix/historical.aspx.
28 Temperature data was obtained from the University of Dayton’s daily temperature archive. See http://academic.udayton.
edu/kissock/http/Weather/.
6. Empirical Framework
The proposed model is to be viewed as a forecasting tool only and is not designed to represent an
underlying data generating process. Thus, as its potential lies in improved forecast accuracy,
this analysis is in the tradition of the work by Engle et al. (2013), Bauer and Vorkink (2011) and
Chiriac and Voev (2011). The empirical application of the kernel technique presented in this paper is
designed to answer the following questions. First, does the forecasting approach introduced in Section 4
compare favourably to more established forecasting techniques for relatively high dimensional VCMs?
Second, do the predictor (economic) indicators discussed in Section 5.2.2 provide valuable information
for the purposes of VCM forecasting? Third, do the matrix comparison variables help to improve
forecasting performance? These questions will eventually be answered in Section 8. To that end the
following forecasting structure is devised. The full sample comprises daily data from 2 January 1997 to
29 November 2012. 2,901 one day ahead forecasts are produced for the purposes of the forecast
analysis, beginning with a forecast for 19 June 2001 and finishing with a forecast for 31 December 2012. The
next two Subsections (Section 6.1) describe the variations of Kernel forecasting models used followed
by a description of their competitor models (Section 6.2). Section 7 analyses the weight vectors used
in these forecasts in order to highlight the different characteristics produced by the different models.
In Section 8 a formal forecast evaluation is presented.
$$H_{T+1} = (1 - \lambda) \sum_{j=0}^{T-1} \lambda^j V_{T-j}. \qquad (14)$$
A forecast is also generated from Equation (14) where the smoothing parameter is chosen by cross
validation in the same manner in which cross-validation is used to optimise bandwidths for the kernel
forecasting models. In fact one can think of this model as a special case of the kernel forecasting model,
a model that uses time as its only weighting variable29 . In subsequent results these two models are
denoted RM and RM_Opt respectively.
$$X_{t+1} = \beta_0 + \beta_1 X_t^{(d)} + \beta_2 X_t^{(w)} + \beta_3 X_t^{(bw)} + \beta_4 X_t^{(m)} + e_{t+1} \qquad (15)$$
The constant $\beta_0$ is an (m × 1) vector of element specific constants and $\beta_i$ for i = 1, ..., 4 are scalar
coefficients which determine the weight for the daily, $X_t^{(d)}$, weekly, $X_t^{(w)}$, bi-weekly, $X_t^{(bw)}$, and monthly,
$X_t^{(m)}$, trailing averages (1, 5, 10 and 22 day) of the elements in the Cholesky decomposition. Importantly,
these parameters can be estimated by OLS. Using the estimated coefficients, forecasts for $X_{T+1}$, $\hat{X}_{T+1}$,
can be produced, which in turn can be used to produce forecasts for the (n × n) rVCM, $H_{T+1}$31,
by reversing the vech( ) operation and using the Cholesky decomposition32.
will be denoted as HAR_CD. The parameters of the model are re-estimated at each of the forecast
points considered using either a fixed window length of about 4 years worth of data or a recursive,
increasing window.
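A sketch of this competitor model follows (a simplified Python rendition, not the authors' implementation): the unique Cholesky elements are stacked, Equation (15) is estimated as a pooled OLS regression with element-specific intercepts and common slopes via a within transformation, and the fitted values are re-transformed into a psd forecast.

```python
import numpy as np

def har_cd_forecast(V_hist):
    """Sketch of the HAR_CD competitor (Equation (15)): HAR regression on the unique
    Cholesky elements of past rVCMs (assumed positive definite), re-transformed to H_{T+1}."""
    T, n, _ = V_hist.shape
    idx = np.tril_indices(n)
    X = np.array([np.linalg.cholesky(V)[idx] for V in V_hist])      # (T, m) Cholesky elements

    def trail(k):                                                   # k-day trailing averages
        return np.array([X[max(0, t - k + 1):t + 1].mean(axis=0) for t in range(T)])

    regs = [X, trail(5), trail(10), trail(22)]                      # daily, weekly, bi-weekly, monthly
    y = X[22:]                                                      # targets X_{t+1}
    R = np.stack([r[21:-1] for r in regs], axis=-1)                 # (T-22, m, 4) lagged averages

    # within (demeaned) OLS gives the common slopes; element-specific intercepts follow from the means
    y_d = y - y.mean(axis=0)
    R_d = R - R.mean(axis=0)
    A = np.einsum("tmi,tmj->ij", R_d, R_d)
    b = np.einsum("tmi,tm->i", R_d, y_d)
    beta = np.linalg.solve(A, b)                                    # beta_1, ..., beta_4
    beta0 = y.mean(axis=0) - R.mean(axis=0) @ beta                  # element-specific constants

    x_hat = beta0 + np.stack([r[-1] for r in regs], axis=-1) @ beta # forecast of X_{T+1}
    L = np.zeros((n, n)); L[idx] = x_hat
    return L @ L.T                                                  # psd forecast H_{T+1}
```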
This transform/model/re-transform (TMR) approach is extremely convenient as the particular
transformation chosen, here the Cholesky transformation, ensures that the re-transformed
variance-covariance matrix forecast is psd without having to impose any restrictions on the chosen forecasting
model of the transformed unique elements Xt . The Cholesky decomposition is not the only decomposition
that can be used, Bauer and Vorkink (2011) propose the use of a matrix logarithm transformation. Therefore
a HAR_LOG forecast is also generated based on the matrix logarithm transformation33.
HAR type forecasting models can be seen as a step kernel forecast for XT +1 that, by design, puts
0 weight on all realisations of Xt for which T + 1 − t > 21. The reason that this approach does not
perfectly fit into the framework of the kernel forecasting model (as described by Equations (5), (6)
and (7a)) as a special case is that it applies a kernel-type approach to XT +1 , the unique elements of
the Cholesky (or matrix logarithm) decomposition rather than the rVCM directly; the latter being a
non-linear combination of the former. However, the step kernel interpretation will still be useful in
terms of understanding what lags of information are being used. It would be conceptually possible to
apply a step kernel approach directly to the rVCM. This would, for instance, replace the smooth kernel
in the Riskmetrics forecasting model (14). But as argued above, there would be no easy way to estimate
these parameters and one would have to apply a cross-validation type approach as for RM_Opt.
In comparison to the smooth kernel applied in RM_Opt, the step kernel of a HAR-type model appears
more restrictive and will not be used as a time based weighting function in the kernel forecasting approach.
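The step-kernel interpretation can be seen directly by writing out the weight that the HAR regression in Equation (15) implicitly places on each lag of X_t, as in the short Python sketch below (the coefficient values are purely illustrative).

```python
import numpy as np

def har_step_weights(b1, b2, b3, b4, max_lag=22):
    """Implied weight on lag ell of X_t under Equation (15): a step function that is
    constant within the 1/5/10/22-day averaging windows and zero beyond 22 days."""
    lags = np.arange(max_lag)                  # 0 = most recent observation
    return (b1 * (lags == 0)
            + b2 / 5.0 * (lags < 5)
            + b3 / 10.0 * (lags < 10)
            + b4 / 22.0 * (lags < 22))

print(har_step_weights(0.35, 0.30, 0.20, 0.15))   # weights fall away in discrete steps
```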
30 Chiriac and Voev (2011) propose a VARFIMA model rather than the simpler to estimate HAR model although there seems
little forecasting improvement from using this different model.
31 Recall that forecasts of the VCM were labelled as H.
32 It should be noted that the elements of H T +1 are non-linear combinations of the elements in X̂T +1 . Therefore, while
this procedure can produce unbiased forecasts for XT +1 , it will not deliver unbiased forecasts for VT+1 . While
Chiriac and Voev (2011) devise a bias correction strategy they also conclude that it is likely to be practically negligible and
hence we refrain from applying this bias correction. The same issue and conclusion are reached in Bauer and Vorkink (2011).
33 This was also implemented in Chiriac and Voev (2011).
The process begins with a set of forecasting models Γ0 . The first stage of the process tests the
null hypothesis that all of the models considered have equal predictive accuracy (EPA). Let Hit be
the forecast of the VCM at time t as produced by the ith forecasting model. Σt is the observed VCM
(essentially a consistent estimate34 ) at time t. Then a loss function is based on a comparison of these,
L(Hit , Σt ). The evaluation of the EPA hypothesis is based on loss differentials between the values of
the loss functions for different models where the loss differential between forecasting models i and j
for time t, dij,t , is defined as
$$d_{ij,t} = L(H_{it}, \Sigma_t) - L(H_{jt}, \Sigma_t) \qquad (16)$$
Stationarity of the dij,t is one of the assumptions for the application of the block bootstrap
procedure used to establish the MCS. This is difficult to establish in the context of the loss functions used
here, which are a scalar mapping of a matrix. It is well known that the presence of estimated parameters
makes these considerations even more intractable. Therefore the MCS methodology is applied here
in the knowledge that the validity of its assumptions cannot be established. Nevertheless it is the
best available technology to tackle the current research question (also see Caporin and McAleer 2012;
Laurent et al. 2012; and Becker et al. 2014, for applications of the MCS in a similar context).
If all of the forecast models are equally accurate then the loss differentials between all pairs of
forecast models should not be significantly different from zero. The null hypothesis of EPA is then
$$H_0: \; E\left[d_{ij,t}\right] = 0 \quad \forall\, i > j \in \Gamma \qquad (17)$$
and failure to reject H0 implies all forecasting models in the set Γ0 have equal predictive ability. The
test (17) is conducted using the semi-quadratic test statistic described in Hansen and Lunde (2007).
If the null hypothesis is rejected at an α confidence level, the worst performing model is removed and
the process is repeated with the reduced set of forecasting models, Γ1 . This process is iterated until the
test of equal predictive accuracy cannot be rejected, or a single model remains. The model(s) which
survive form the MCS with α confidence35 .
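The building blocks of this procedure can be sketched in Python as follows (illustrative only: the functions compute the MVQLIKE losses and the pairwise loss differentials of Equation (16), while the block-bootstrap EPA test and iterative elimination are left to a dedicated MCS routine such as the one in the MFE toolbox referenced below).

```python
import numpy as np

def mvqlike_loss(H, S):
    """Robust MVQLIKE loss L(H_it, Sigma_t) between a forecast H and the VCM proxy S."""
    A = np.linalg.solve(H, S)
    return np.trace(A) - np.linalg.slogdet(A)[1] - S.shape[0]

def loss_differentials(forecasts, sigma):
    """Pairwise loss differentials d_{ij,t} of Equation (16).
    forecasts: dict mapping model name -> list of (n, n) forecasts; sigma: list of VCM proxies."""
    T = len(sigma)
    L = {name: np.array([mvqlike_loss(f[t], sigma[t]) for t in range(T)])
         for name, f in forecasts.items()}
    names = list(L)
    return {(i, j): L[i] - L[j] for i in names for j in names if i < j}
```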
The loss function used is the MVQLIKE (Stein distance) function described above in (13). This is
a robust loss function, as described in Laurent et al. (2013). Becker et al. (2014) and Laurent et al.
(2012) established that this loss function, compared to other loss functions, identifies a correctly
specified forecasting model in a smaller MCS, hence it is more discriminatory. Analysis is also
conducted using mean average deviation (MAD) and mean square error (MSE) loss functions36.
However, consistent with findings in Becker et al. (2014) and Laurent et al. (2012), these loss functions tend
to be non-discriminatory (MSE) or inconsistent (MAD) in the sense of Patton and Sheppard (2009).
Therefore the main conclusions drawn here are based on the MVQLIKE results but those utilising
MAD and MSE are also shown to illustrate how the results change in the way predicted by the earlier
literature.
34 In the below forecast experiments we use the realized VCM, Vt , using a regular 5 min grid of intra-daily returns, in place of
Σt as it is a consistent estimator of the unobserved VCM. To establish the robustness of the results we also use the realised
multivariate kernel. Some such robustness results will be included in subsequent tables.
35 We use the mcs function implemented in Kevin Sheppard’s MFE toolbox for MATLAB (https://www.kevinsheppard.com/
MFE_Toolbox).
36 See the definitions in Section 8.1.
Figure 1. Graph of weights (vertical axis) for six different forecasting methods on T = 19 June 2001.
Time lag relative to the period T at which the forecast is formed on the horizontal axis.
The estimated weights for four different kernel models are shown in the top and middle rows.
They produce more flexible weighting structures which need not be decreasing as the time difference
to the time of the forecast, T, increases. The most obvious example of this is in the kernel which
includes only matrix distance measures (K_D, top left), this model includes no explicit time variable
nor macroeconomic indicators, and hence the weights show no consistent pattern with reference to
37 Recall that the HAR model is not a special case of the General Kernel forecasting model ((5), (6) and (7a)), but it is still valid
and instructive to look at the distribution of lags used in the HAR.
time. It should be noted though that the largest weights are still given to the Vt s that are closest to
the forecasting point. This method produces positive weights for lags larger than the maximum lag of
100 days shown in Figure 1, but the largest weights, for this particular example, still relate to
observations very close to T (values close to 0 on the horizontal axis).
When the economic predictor variables are included (K_DM, top right), or a time variable itself
(K_DT, middle left) the kernel weighting scheme becomes increasingly influenced by time, however in
neither case do the weights monotonically decrease as the time lag increases. When all of the proposed
variables are included in the kernel (K_DTM, middle right), at least for this particular day, the largest
weight is placed on the most recent observations. But also note that there is a very distinct hump
which allocates larger weights to observations two weeks prior to T than to those one week prior to T.
Lastly it is interesting to compare the weighting functions for the RM and the K_DT models.
While the inclusion of the distance measures described in Section 5.2.1 does not (in this particular example)
change the general shape of the weighting function, significantly positive weights now extend out to a lag
of 50 rather than 25.
The differences in the weighting patterns in Figure 1 are an important illustration of how the
kernel method allows for flexible weights. The results in Section 8 will consider, on the basis of a small
experiment, whether these weighting patterns can be translated into improved statistical accuracy of
forecasts for the variance-covariance matrix.
Histograms showing the distribution of weighted average lag values, $\bar{L}_T$, are shown in Figure 2, where
$$\bar{L}_T = \sum_{i=1}^{n} W_i \cdot i \quad \text{where } i = T - t. \qquad (18)$$
These histograms provide evidence that the weighted lags from the kernel models are
noticeably different from those of the models which focus exclusively on time. An increased variance
of $\bar{L}_T$ appears sensible, as it indicates that the kernel forecasting methods do utilise
information that has previously been ignored by the time based forecasting methods; however, a sole
reliance on matrix distance measures seems implausible. The variation in $\bar{L}_T$ for K_D
evident in Figure 2 is too great to make it a plausible forecasting model. When comparing the
histograms of $\bar{L}_T$ from RM_Opt and K_DT, qualitatively similar shapes are observed (right skewed
distributions) but the kernel method does allocate significantly more weight to older observations.
Values of $\bar{L}_T > 12$ are extremely rare for the RM_Opt model but occur frequently for K_DT.
The right tail of the distribution gets even longer when we either also include macroeconomic variables
(K_DTM) or exclude the time variable (K_DM). Lastly, the HAR forecasting model again seems overly
restrictive in its use of past information38.
38 Of course one could allow for longer lag use in a HAR-type model by allowing longer averages than the standard maximum
of 22 days.
Figure 3 illustrates the percentage improvement in fit (as measured by
the QLIKE/Stein measure) of a selection of variables39.
Figure 2. Histograms of weighted lags, L̄ T as per Equation (18), for six different forecasting methods
during the forecast period. The statistics are calculated from the forecasting models using the RVCM
and the recursive sampling scheme.
Weighting variables are included in the multivariate kernel if, when they are used as the sole
weighting variable, the accuracy of the resulting forecasts improves by at least 1% compared to an historical
average forecast. The results indicate that almost all variables pass this test and so are included in
the multivariate kernel in almost all time periods. The only variables excluded from the multivariate
kernel using the RMVK VCMs and a recursive estimation scheme are the temperature in Dubai40 and
the sign difference variables which are excluded in all periods, the elementwise difference measure,
which is excluded in the first three time periods and the bull and bear dummy which is excluded
in only the first period. All of the other variables discussed in Section 5.2 are included in all of the
estimation periods.
39 The results for the variables not shown here, to keep the image readable, are similar to the ones shown.
40 The temperature variable was introduced as a sensibility check. In fact, when using the rolling (rather than recursive) sampling
scheme, this variable does survive the first elimination step. This could be due to the seasonal nature of this variable, which may
pick up some element of local trending in the variance covariance matrix.
Figure 3. Percentage improvement of the QLIKE/Stein statistics, applied to a hold-out sample (see
Section 4.3) when using a variable as the only kernel weighting variable, compared to a simple
historical average (equal weights). The evaluation is undertaken for the kernel models using RVCM
and a recursive sample scheme. This exercise is repeated every 264 trading days. The results for the
variables not represented in this Figure are qualitatively comparable to one of the included variables.
Eigenvalue Ratios (Equation (10)) is similar to the included MVQlike; Sign Differences (Equation (12))
is similar to the Temperature in Dubai; Risk Premium and Oil Price are similar to the Yield Spread and
VIX and Gold Price are similar to MVQlike.
The low threshold for improvement in the preliminary univariate kernel analysis is sufficient to
render the joint multivariate optimisation problem feasible by eliminating uninformative variables.
Otherwise, the numerical optimisation of the multivariate bandwidths can run for long periods without
converging on an optimal solution, as any uninformative variables have bandwidths large enough to
make all densities for that variable equal and hence have no discriminatory power.
Figure 3 is based on forecasts of the RVCM with a recursive sampling scheme. The results remain
qualitatively very similar when using the RMVK rather than the RVCM as a proxy for the variance
covariance matrix. When using the rolling sampling scheme the results are again qualitatively fairly
similar. What the analysis of univariate improvements in Figure 3 illustrates is how important the
respective weighting variables are when considered in isolation.
The one stark difference between the use of RMVK and RVCM in the univariate kernels occurs in
the variable selection exercise for the 2007–2010 sample period. The rolling sample exhibits significantly
reduced improvements of the univariate kernels relative to the historical rolling average, during the
2007–2010 period, thus all of the lines in Figure 3 dip towards the x-axis in the 2007–2010 period when
using RVCM rather than RMVK. Otherwise the results illustrated in Figure 3 can be thought of as
being a good representation of univariate kernel behaviour across sampling schemes and methods of
obtaining observations of the VCM.
Eventually these variables are used in combination, and establishing which of these variables are
most influential in terms of determining the weights Wt is not straightforward. At each point in time,
the influence of variable j is a function of its own bandwidth $h_j$, its value at the time of forecasting,
$\Phi_{T,j}$, its difference to all previous values $\Phi_{t,j}$ for all t < T, and also the respective bandwidths and
variable values of all other variables (see Equations (7a)–(7d)).
In order to gain an insight into the importance of individual variables, a detailed analysis of
Wt is undertaken, potentially including all weighting variables (K_DTM) using RVCM proxies and a
recursive estimation scheme. At each forecast period T, the weighted average lag as per Equation (18),
$\bar{L}_T$ is calculated. New weights41, $W_t^{-j}$, are then calculated excluding the jth weighting
variable, and a corresponding $\bar{L}_T^{-j}$ is then determined. If a particular weighting variable j was not
influential in the weight calculation at a particular forecast period T, the values of $D_T = \bar{L}_T^{-j} - \bar{L}_T$ will be
close to 0; conversely, values significantly different from 0 indicate that the jth variable was important
at that particular T.
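A short sketch of these two diagnostics follows (it reuses the hypothetical kernel_weights helper introduced earlier; function and variable names are illustrative).

```python
import numpy as np

def weighted_average_lag(W, T):
    """Weighted average lag of Equation (18) for weights W on observations t = 1, ..., T-1."""
    lags = T - np.arange(1, T)                 # lag i = T - t for each weighted observation
    return float(np.sum(W * lags))

def influence(V, Phi, bandwidths, kinds, j):
    """D_T: change in the weighted average lag when the jth weighting variable is excluded."""
    T = V.shape[0]
    w = kernel_weights(Phi, T, 1, bandwidths, kinds)
    w /= w.sum()
    keep = [k for k in range(Phi.shape[1]) if k != j]
    w_j = kernel_weights(Phi[:, keep], T, 1,
                         [bandwidths[k] for k in keep], [kinds[k] for k in keep])
    w_j /= w_j.sum()
    return weighted_average_lag(w_j, T) - weighted_average_lag(w, T)
```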
Figure 4 plots the resulting values for DT across all forecasting periods and for all weighting
variables. The most dominant feature of these results is that the time variable is by far the most influential
variable (note the different scale for the time variable). Only in periods when the time variable is
least influential (2002–2003 and 2009–2012) is a significant influence exerted by the other weighting
variables. In particular the Yield Spread, MVQLIKE and the VIX are influential during the 2002–2003
period and the MVQLIKE, the element wise difference (ElemDiff ), the VIX and the Gold Price are
important between 2009 and 2012. These findings are consistent with those obtained from evaluating
the importance of individual weighting variables in Figure 3.
Figure 4. Graph of DT for each of the 12 variables across the forecasting period.
8. Analysis-Forecast Evaluation
This section presents the formal forecast evaluation. The forecasting results for the full sample are
presented in Section 8.1. An analysis of sub-sample results concludes this Section.
Both a rolling (fixed estimation window length) and a recursive (uses all available data at the point
of forecasts) estimation scheme are used. The analysis is also undertaken for two different estimators
of realised covariance matrices to be used in the kernel model (5). The realised multivariate kernel
estimator (RMVK) as described in Section 4.1 and a standard estimator of realised covariation using
5 min intra-day returns (RVCM) are used.
In Table 2 results of MCS analyses of the forecasts from the models are presented. MCS p-values
are reported with values smaller than 0.05 indicating that the respective model is excluded from the
95% MCS.
Table 2. Model Confidence Set (MCS) p-values for different forecast models. On the basis of 2901 daily
1-day ahead forecasts (19 June 2001 to 31 December 2012). The MCS algorithm is applied to the indicated
loss functions. Values larger than 0.05 indicate models that would be included in 95% confidence MCS.
Σt is proxied by the realized variance covariance matrix using a regular grid of 5 min intra-day returns.
Recursive/Rolling indicates the type of estimation window used. RMVK represents models that used the
multivariate kernel estimates to estimate the variance covariance matrix while RVCM indicates that the
model used an estimate based on a regular grid of 5 min intra-day returns. Loss functions: (MV)QLIKE as
defined in (13); MSE is the mean squared difference between the forecast and observed variance covariance
matrix as measured across all elements, $\text{vec}(H_{it} - \Sigma_t)'\,\text{vec}(H_{it} - \Sigma_t)/n^2$, scaled by $10^8$.
To interpret these results, begin by concentrating on the results for forecasts using a recursive
scheme. When evaluating forecasts with MVQLIKE, K_DT is the only remaining model in the MCS,
with K_DTM having MCS p-values just below 5%. These results are interesting in a number of
respects. First, it is important to note that the addition of matrix distance measures delivers significant
improvements in the VCM forecasts. This is indicated by the fact that the RM_Opt forecasts (which
are equivalent to kernel forecasts with only time as a weighting variable) are not included in the MCS.
Second, the addition of exogenous variables (M, in addition to matrix distance measures, D) appears
not to improve the forecasts. In fact, they seem to have a slightly detrimental effect on the resulting
forecasts, noting that K_DTM is marginally rejected from the MCS (at α = 0.05 but not at α = 0.01).
Third, the previously discussed differences in terms of summary statistics between the kernel forecasts
using the time variable (K_DT and K_DTM) and those that do not (K_D and K_DM) turn out to be statistically
significant. Consequently, the inclusion of a time variable as one of the weighting variables is
important in order to make the best use of the matrix distance and exogenous variables.
When turning to the results (focusing on those for recursive sampling) using MSE the MCS
methodology is unable to identify any of the models as being inferior to any other. Confirming the
results of Becker et al. (2014) we find the MSE criterion to be unable to discriminate between forecasting
models. The results that are based on the rolling sampling scheme are somewhat different and less
favourable for the kernel forecasting methodology. Of course, it was argued earlier that, as long as
the time variable is included as a weighting variable, the recursive scheme is a sensible choice for the
kernel methodology as it allows the forecasting model to access information from “distant” history
if the variable similarities demand this. It also seems that restricting the available information via
a rolling scheme is to the detriment of the method. A similar result can be seen in the MCS results.
Where K_DT was judged to be superior to other forecasting models under the recursive sampling
scheme, it is now either not superior to the Riskmetrics forecasting model (RM_Opt) or it is indeed
judged to be inferior (in the case of a RVCM proxy).
Overall these results indicate that the qualitative differences we identified in the weighting
functions (7) can result in statistically significant forecast differences. It is, however, important to note
that such results would need to be corroborated by many more forecasting scenarios before we could
make general statements about the use of the method as a forecasting tool.
42 As our first forecasting period is the 19 June 2001, the first of these sub-samples has somewhat fewer observations, 384, than
the others which all have around 500 observations.
Table 3. Results of MCS analysis of 1 day ahead forecasts over two-year sub-samples of the period 2001–2012.
Results in this Table are based on forecasting models that use the RVCM as the estimate of the variance
covariance matrix in the forecasting models, and as the proxy for Σt, the variance covariance matrix. The sampling
method is recursive and the loss function used in the MCS algorithm is the QLIKE loss function.
These sub-sample results are robust to using RMVK in the forecasting model and as a proxy for
the variance covariance matrix in the forecast evaluation. When a rolling sampling scheme is used,
the RM models are always in the MCS and other kernel models are occasionally included. This finding
is consistent with the earlier results based on the rolling sampling scheme43 .
Author Contributions: This paper is based on a Chapter of a PhD thesis submitted by Robert O’Neill to the
University of Manchester in 2011. All authors contributed significantly to the development of the paper and the
improvement to the idea presented in that thesis.
Conflicts of Interest: There are no conflicts of interest.
References
Aït-Sahalia, Yacine, and Michael W. Brandt. 2001. Variable selection for portfolio choice. The Journal of Finance 56:
1297–351.
Aitchison, J., and C. G. G. Aitken. 1976. Multivariate binary discrimination by the kernel method. Biometrika 63:
413–20.
Barndorff-Nielsen, Ole E., Peter Reinhard Hansen, Asger Lunde, and Neil Shephard. 2009. Realized kernels in
practice: Trades and quotes. Econometrics Journal 12: C1–C32.
Barndorff-Nielsen, Ole E., Peter Reinhard Hansen, Asger Lunde, and Neil Shephard. 2011. Multivariate realised
kernels: Consistent positive semi-definite estimators of the covariation of equity prices with noise and
non-synchronous trading. Journal of Econometrics 162: 149–69.
Bauer, Gregory H., and Keith Vorkink. 2011. Forecasting multivariate realized stock market volatility. Journal of
Econometrics 160: 93–101. doi:10.1016/j.jeconom.2010.03.021.
Becker, Ralf, Adam Clements, Mark Doolan, and Stan Hurn. 2014. Selecting volatility forecasting models for portfolio allocation purposes. International Journal of Forecasting 31: 849–61. doi:10.1016/j.ijforecast.2013.11.007.
Blair, Bevan J., Ser-Huang Poon, and Stephen J. Taylor. 2001. Forecasting S&P 100 volatility: The incremental
information content of implied volatilities and high-frequency index returns. Journal of Econometrics 105: 5–26.
Bowman, Adrian W. 1984. An alternative method of cross-validation for the smoothing of density estimates.
Biometrika 71: 353–60.
Bowman, Adrian W. 1997. Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus
Illustrations. Oxford: Clarendon Press.
Caporin, Massimiliano, and Michael McAleer. 2012. Robust Ranking Multivariate GARCH Models by Problem Dimension: An Empirical Evaluation. Working Paper No. 815, Institute of Economic Research, Kyoto University, Kyoto, Japan.
Campbell, John Y. 1987. Stock returns and the term structure. Journal of Financial Economics 18: 373–99.
Campbell, John Y., Martin Lettau, Burton G. Malkiel, and Yexiao Xu. 2001. Have individual stocks become more
volatile? An empirical exploration of idiosyncratic risk. The Journal of Finance 56: 1–43.
Christensen, Kim, Silja Kinnebrock, and Mark Podolskij. 2010. Pre-averaging estimators of the ex-post covariance
matrix in noisy diffusion models with non-synchronous data. Journal of Econometrics 159: 116–33.
Clements, Adam E., Stan Hurn, and Ralf Becker. 2011. Semi-Parametric Forecasting of Realized Volatility.
Studies in Nonlinear Dynamics & Econometrics 15: 1–21.
Chiriac, Roxana, and Valeri Voev. 2011. Modelling and forecasting multivariate realized volatility. Journal of
Applied Econometrics 26: 922–47.
Corsi, Fulvio. 2009. A simple approximate long-memory model of realized volatility. Journal of Financial
Econometrics 7: 1–23.
Engle, Robert F., Eric Ghysels, and Bumjean Sohn. 2013. Stock market volatility and macroeconomic fundamentals.
The Review of Economics and Statistics 95: 776–97.
Engle, Robert F., and Kevin Sheppard. 2001. Theoretical and Empirical Properties of Dynamic Conditional
Correlation Multivariate GARCH. NBER Working Paper No. 8554, NBER, Cambridge, MA, USA.
Fama, Eugene F., and Kenneth R. French. 1989. Business conditions and expected returns on stocks and bonds.
Journal of Financial Economics 25: 23–49.
Fleming, Jeff, Chris Kirby, and Barbara Ostdiek. 2003. The economic value of volatility timing using ’realized’
volatility. Journal of Financial Economics 67: 473–509.
Gijbels, Irène, Alun Lloyd Pope, and M. P. Wand. 1999. Understanding exponential smoothing via kernel regression. Journal of the Royal Statistical Society: Series B 61: 39–50.
Gilboa, Itzhak, Offer Lieberman, and David Schmeidler. 2006. Empirical similarity. The Review of Economics and
Statistics 88: 433–44. doi:10.1162/rest.88.3.433.
Gilboa, Itzhak, Offer Lieberman, and David Schmeidler. 2011. A similarity-based approach to prediction. Journal of Econometrics 162: 124–31.
Golosnoy, Vasyl, Bastian Gribisch, and Roman Liesenfeld. 2012. The conditional autoregressive Wishart model for multivariate stock market volatility. Journal of Econometrics 167: 211–23.
Golosnoy, Vasyl, Alain Hamid, and Yarema Okhrin. 2014. The empirical similarity approach for volatility prediction. Journal of Banking & Finance 40: 321–29. doi:10.1016/j.jbankfin.2013.12.009.
Hamilton, James D., and Gang Lin. 1996. Stock market volatility and the business cycle. Journal of Applied
Econometrics 11: 573–93.
Hamilton, James D. 1996. This is what happened to the oil price-macroeconomy relationship. Journal of Monetary
Economics 38: 215–20.
Hansen, Peter Reinhard, and Asger Lunde. 2007. MULCOM 1.00, Econometric toolkit for multiple comparisons (packaged with the Mulcom package). Unpublished.
Hansen, Peter Reinhard, Asger Lunde, and James M. Nason. 2003. Choosing the best volatility models: the model
confidence set approach. Oxford Bulletin of Economics and Statistics 65: 839–61.
Harvey, Campbell R. 1989. Time-varying conditional covariance in tests of asset pricing models. Journal of Financial
Economics 24: 289–317.
Harvey, Campbell R. 1991. The Specification Of Conditional Expectations. Working Paper, Duke University,
Durham, NC, USA.
Heiden, Moritz D. 2015. Pitfalls of the Cholesky Decomposition for Forecasting Multivariate Volatility. Available
online: http://ssrn.com/abstract=2686482 (accessed on 29 September 2017).
J.P. Morgan. 1996. Riskmetrics Technical Document, 4th ed. New York: J.P. Morgan.
Laurent, Sébastien, Jeroen V. K. Rombouts, and Francesco Violante. 2012. On the forecasting accuracy of
multivariate GARCH models. Journal of Applied Econometrics 27: 934–55.
Laurent, Sébastien, Jeroen V. K. Rombouts, and Francesco Violante. 2013. On loss functions and ranking forecasting
performances of multivariate volatility models. Journal of Econometrics 173: 1–10.
Li, Qi, and Jeffrey S. Racine. 2007. Nonparametric Econometrics: Theory and Practice. Princeton: Princeton University Press.
Moskowitz, Tobias J. 2003. An analysis of covariance risk and pricing anomalies. The Review of Financial Studies 16:
417–57.
Pagan, Adrian R., and Kirill A. Sossounov. 2003. A simple framework for analysing bull and bear markets. Journal
of Applied Econometrics 18: 23–46.
Patton, Andrew J., and Kevin Sheppard. 2009. Evaluating volatility and correlation forecasts. In Handbook of Financial Time Series. Edited by Torben Gustav Andersen, Richard A. Davis, Jens-Peter Kreiß and Thomas V. Mikosch. Berlin: Springer.
Poon, Ser-Huang, and Clive W. J. Granger. 2003. Forecasting volatility in financial markets: A review. Journal of
Economic Literature 41: 478–539.
Rudemo, Mats. 1982. Empirical choice of histograms and kernel density estimators. Scandinavian Journal of
Statistics 9: 65–78.
Sadorsky, Perry. 1999. Oil price shocks and stock market activity. Energy Economics 21: 449–69.
Schwert, G. William. 1989. Why does stock market volatility change over time? The Journal of Finance 44: 1115–53.
Silvennoinen, Annastiina, and Timo Teräsvirta. 2009. Multivariate GARCH Models. In Handbook of Financial Time Series. Edited by Torben Gustav Andersen, Richard A. Davis, Jens-Peter Kreiß and Thomas V. Mikosch. Berlin: Springer.
Silverman, Bernard W. 1986. Density Estimation for Statistics and Data Analysis. London: Chapman & Hall.
Sjaastad, Larry A., and Fabio Scacciavillani. 1996. The price of gold and the exchange rate. Journal of International
Money and Finance 15: 79–97.
Wand, M. P., and M. C. Jones. 1995. Kernel Smoothing. London: Chapman & Hall.
Whitelaw, Robert F. 1994. Time variations and covariations in the expectation and volatility of stock market
returns. The Journal of Finance 49: 515–41.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).