GARCH + Deep Learning
Abstract
In this paper, we develop a hybrid approach to forecasting the volatility and risk of financial
instruments by combining common econometric GARCH time series models with deep learning
neural networks. For the latter, we employ Gated Recurrent Unit (GRU) networks, whereas four
different specifications are used as the GARCH component: standard GARCH, EGARCH,
GJR-GARCH and APARCH. Models are tested using daily logarithmic returns on the S&P 500 index
as well as gold and Bitcoin prices, with the three assets representing quite distinct volatility
dynamics. As the main volatility estimator, also underlying the target function of our hybrid models,
we use the price-range-based Garman-Klass estimator, modified to incorporate the opening and
closing prices. Volatility forecasts resulting from the hybrid models are employed to evaluate the
assets’ risk using the Value-at-Risk (VaR) and Expected Shortfall (ES) at two different tolerance
levels of 5% and 1%. Gains from combining the GARCH and GRU approaches are discussed in the
contexts of both the volatility and risk forecasts. In general, it can be concluded that the hybrid solutions produce more accurate point volatility forecasts, although this does not necessarily translate into superior VaR and ES forecasts.
Keywords: Value-at-Risk, Expected Shortfall, risk management, financial time series, neural
networks.
1. Introduction and literature review
Measuring and predicting volatility and investment risk of financial assets are perennial
problems of great importance both for scientists and practitioners, with the relevant literature
abounding in model specifications and quantitative methods designed to address these tasks. To date, the most common approach has been developed within the area of financial econometrics, where the prices of financial instruments are typically assumed to form some conditionally heteroscedastic stochastic processes, the exact specification of which, along with their estimation and statistical inference, constitutes a key part of researchers' endeavours. A basic group of such tools for modelling and forecasting volatility (and, consequently, risk) are the GARCH models developed by Bollerslev (1986) and Taylor (1986), generalising the ARCH specification proposed by Engle (1982).
The input information in the GARCH models, driving current volatility, comprises primarily past
return rates and their conditional variances. Voluminous subsequent research aimed at modifications
and extensions of the original GARCH structure, also by admitting various types of the conditional
distribution. This resulted in a considerable diversity of the GARCH class, with some of the most
popular specifications including EGARCH (Nelson, 1991), APARCH (Ding, Engle & Granger, 1993),
GJR-GARCH (Glosten, Jagannathan & Runkle, 1993), and TGARCH (Zakoian, 1994).
A parallel trend in financial time series modelling and forecasting follows the development of
machine learning tools, particularly artificial neural networks (ANNs). These models, often treated as
“black boxes”, are regarded as nonlinear and nonparametric techniques in which no a priori
assumption concerning the mathematical form (equation) of the model is formulated. The function
mapping input data into output signals (forecasts) is formed at the stage of training the model,
implemented on the basis of a learning set including historical quotations. Over recent years, in data
analyses, both researchers and practitioners have increasingly used dynamic ANNs equipped with the
ability to remember and process information from some recent period of time. These tools include
mainly deep-learning-based recurrent neural networks (RNNs; introduced by Hopfield, 1982, and further developed by Rumelhart, Hinton & Williams, 1986), in particular Long Short-Term Memory networks (LSTM; Hochreiter & Schmidhuber, 1997), and, utilised in the present research, Gated Recurrent Unit (GRU) neural networks (Chung et al., 2014), which constitute simplified modifications of LSTM.
Quite recently, a new promising research trend has emerged (including also our present
paper), in which attempts are made to integrate formal tools based on the GARCH methodology with
currently developed neural models based on deep learning, capable of memorising the dynamics of the analysed phenomenon. Research on this type of hybrid models has been undertaken in many works. In particular, to cite only those most pertinent to the current paper, Kristjanpoller & Minutolo have
applied hybrid models (based on feed-forward back-propagation neural network and GARCH) to
predict the volatility of gold (Kristjanpoller & Minutolo, 2015) and oil prices (Kristjanpoller &
Minutolo, 2016). Hu, Ni & Wen (2020) developed a hybrid deep learning method combining GARCH
with neural networks and applied it to forecasting the volatility of copper price. Finally, in (Liu & So,
2020), a GARCH model was incorporated into an LSTM network for improving the prediction of
stock volatility.
2. Methodology
Below we briefly present the methodological framework of this study, the basis of which is
formed by the class of GARCH models (Subsection 2.1), later combined with GRU neural networks
to yield the final, GARCH-GRU hybrid specifications (Subsection 2.2). We close this section with a
concise presentation of the ex post volatility forecast accuracy measures employed in our work
(Subsection 2.3).
2.1 GARCH models

Let $r_t$ denote the daily logarithmic rate of return,

$$r_t = \ln\frac{P_t}{P_{t-1}}, \qquad (1)$$

with $P_t$ and $P_{t-1}$ standing for the instrument's prices at time $t$ and $t-1$, respectively. Typically, in financial econometrics, a series of the log returns is modelled as the sum of the conditional mean of the returns (given the past information, $\psi_{t-1}$) and an error term, $\varepsilon_t$:

$$r_t = E(r_t|\psi_{t-1}) + \varepsilon_t,$$

where $E(r_t|\psi_{t-1})$ usually takes some ARMA form. To capture some well-documented empirical characteristics of most financial time series (with volatility clustering and fat tails among other, widely recognised features; Tsay, 2010), the errors are usually defined as some conditionally heteroscedastic process, typically of the very broad GARCH family, extending the basic ARCH structure introduced by Engle (1982) and later generalised into GARCH by Bollerslev (1986). The error term therein is defined as the product

$$\varepsilon_t = z_t h_t^{1/2}, \qquad (2)$$

where the random variables $z_t \sim iid(0,1)$ form a sequence of independent and identically distributed standardised errors (with zero mean and unit variance), while $h_t^{1/2}$ is the return's conditional standard deviation, usually referred to as the volatility.
In this research, the four GARCH specifications most commonly entertained in the literature are considered: 'standard' GARCH (Bollerslev, 1986), the Glosten-Jagannathan-Runkle GARCH
(GJR-GARCH; Glosten et al. 1993), the Exponential GARCH (EGARCH; Nelson, 1991), and the
Asymmetric Power ARCH (APARCH; Ding et al., 1993). Below, we briefly present their underlying
equations defining the dynamics of conditional variance (a detailed and comprehensive review of
univariate GARCH model specifications can be found, e.g., in Teräsvirta, 2009).
GARCH
The volatility equation takes the form:
$$h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^{2} + \sum_{j=1}^{p} \beta_j h_{t-j}, \qquad (3)$$

where $h_t$ is the conditional variance at time $t$, and the parameters are subject to restrictions ensuring positive $h_t$ for each $t$: $\alpha_0 > 0$, $\alpha_i \geq 0$ for $i = 1, \ldots, q$, and $\beta_j \geq 0$ for $j = 1, \ldots, p$.
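To make the recursion in Eq. (3) concrete, a minimal Python sketch of the GARCH(1,1) conditional-variance path is given below; the parameter values are purely illustrative and not estimates obtained from our data.

```python
import numpy as np

def garch11_variance(eps, alpha0, alpha1, beta1):
    """Conditional variance h_t of a GARCH(1,1), i.e. Eq. (3) with p = q = 1.

    eps    : array of (mean-corrected) returns, i.e. the error terms
    alpha0, alpha1, beta1 : parameters with alpha0 > 0, alpha1 >= 0, beta1 >= 0
    """
    h = np.empty_like(eps, dtype=float)
    h[0] = eps.var()                      # initialise with the sample variance
    for t in range(1, len(eps)):
        h[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * h[t - 1]
    return h

# illustrative parameter values only
eps = np.random.default_rng(0).standard_t(df=5, size=1000) * 0.01
h = garch11_variance(eps, alpha0=1e-6, alpha1=0.08, beta1=0.90)
vol = np.sqrt(h)                          # conditional standard deviation (volatility)
```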
GJR-GARCH
The volatility equation takes the form:
$$h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^{2} + \sum_{i=1}^{q} \omega_i I_{t-i}\varepsilon_{t-i}^{2} + \sum_{j=1}^{p} \beta_j h_{t-j}, \qquad (4)$$

where $I_{t-i} = 1$ when $\varepsilon_{t-i} \leq 0$, and $I_{t-i} = 0$ otherwise. Additionally, $\alpha_0 > 0$, $\alpha_i \geq 0$, $\alpha_i + \omega_i \geq 0$ for $i = 1, \ldots, q$, and $\beta_j \geq 0$ for $j = 1, \ldots, p$.
EGARCH
The volatility equation takes the form:
$$\ln h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i\left\{\theta z_{t-i} + \gamma\big[|z_{t-i}| - E(|z_{t-i}|)\big]\right\} + \sum_{j=1}^{p} \beta_j \ln h_{t-j}, \qquad (5)$$
APARCH
The volatility equation takes the form:
$$h_t^{\delta} = \alpha_0 + \sum_{i=1}^{q} \alpha_i\big(|\varepsilon_{t-i}| - \gamma_i \varepsilon_{t-i}\big)^{\delta} + \sum_{j=1}^{p} \beta_j h_{t-j}^{\delta}. \qquad (6)$$
Three types of conditional distributions, most commonly entertained in the literature, are used in this study for the standardised error term, $z_t$: the normal distribution, as well as Student's t- and skewed Student's t-distributions.
Estimation (through the maximum likelihood approach) and forecasting in the GARCH models have been implemented in numerous libraries available in the R programming environment, among which the rugarch package (see Ghalanos, 2022a, b) appears to be one of the most popular and comprehensive; it is also the one employed in this work.
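For readers working in Python rather than R, a broadly analogous workflow is sketched below using the arch package; this is merely an illustrative assumption on our part, not the toolchain used in this study, shown here for a GJR-GARCH(1,1) with a Student's t-distribution.

```python
# Hypothetical Python analogue of the rugarch workflow, based on the `arch` package.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(1)
returns = rng.normal(scale=1.0, size=2707)   # placeholder for daily log returns (in %)

# GJR-GARCH(1,1): vol='GARCH' with one asymmetry term (o=1); Student's t errors
am = arch_model(returns, mean='Constant', vol='GARCH', p=1, o=1, q=1, dist='t')
res = am.fit(disp='off')

fc = res.forecast(horizon=1)                 # one-step-ahead forecast
h_next = fc.variance.iloc[-1, 0]             # forecast of the conditional variance
vol_next = np.sqrt(h_next)                   # forecast of the volatility
```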
2.2 GARCH-GRU hybrid models

The neural network component of the hybrid models developed in this paper relies on GRU
neural networks, introduced by Chung et al. (2014), as a simplified version of more popular LSTM
networks (see Figure 1). GRU networks use a single gate to control both forgetting old information and updating the network state, which allows them to achieve results similar to those of LSTMs while significantly reducing the training time.
The functions within a GRU network cell can be described by the following equations:
$$z_t = \sigma_g(W_z x_t + U_z o_{t-1} + b_z), \qquad (7)$$
$$r_t = \sigma_g(W_r x_t + U_r o_{t-1} + b_r), \qquad (8)$$
$$\tilde{o}_t = \phi_o\big(W_o x_t + U_o (r_t \odot o_{t-1}) + b_o\big), \qquad (9)$$
$$o_t = (1 - z_t) \odot o_{t-1} + z_t \odot \tilde{o}_t, \qquad (10)$$

where $x_t$ is the input vector, $o_t$ is the output vector, $\tilde{o}_t$ is the candidate output vector, while $z_t$ and $r_t$ are the update gate and the reset gate vectors, respectively. The matrices $W$ and $U$ (subscripted according to the pertinent equations) as well as the vectors $b$ comprise the net's parameters, $\sigma_g$ and $\phi_o$ are the sigmoid and hyperbolic tangent activation functions, and, finally, $\odot$ denotes the Hadamard product. See Goodfellow et al. (2016) for a detailed description of GRU networks and a comparison with other types of recurrent networks.
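A single forward pass through the GRU cell of Eqs. (7)-(10) can be written directly in NumPy, as in the sketch below; the parameters are randomly initialised and serve illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, o_prev, W, U, b):
    """One GRU step following Eqs. (7)-(10); W, U, b are dicts keyed by 'z', 'r', 'o'."""
    z_t = sigmoid(W['z'] @ x_t + U['z'] @ o_prev + b['z'])             # update gate, Eq. (7)
    r_t = sigmoid(W['r'] @ x_t + U['r'] @ o_prev + b['r'])             # reset gate,  Eq. (8)
    o_cand = np.tanh(W['o'] @ x_t + U['o'] @ (r_t * o_prev) + b['o'])  # candidate,   Eq. (9)
    o_t = (1.0 - z_t) * o_prev + z_t * o_cand                          # output,      Eq. (10)
    return o_t

# toy dimensions: 3 inputs (|r_t|, GKYZ estimate, GARCH forecast), 8 hidden units
rng = np.random.default_rng(0)
n_in, n_hid = 3, 8
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in)) for k in 'zro'}
U = {k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in 'zro'}
b = {k: np.zeros(n_hid) for k in 'zro'}
o = np.zeros(n_hid)
for x in rng.normal(size=(6, n_in)):   # a 6-day input sequence, as in Section 2.2
    o = gru_step(x, o, W, U, b)
```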
Three input variables are used in this research: the absolute log returns, volatility
estimates (obtained by means of a given method; see below), and the volatility forecasts derived from
a given GARCH model. The incorporation of the latter into the model yields a hybrid structure,
further referred to as GARCH-GRU (see Figure 2).
The model is trained to target the Garman-Klass volatility estimator, modified to include the gap between the previous day's closing and the current day's opening prices, following Yang & Zhang (2000). Specifically, the volatility estimate (denoted as GKYZ) is given by the formula:
$$\sigma^{2,GKYZ} = \frac{1}{n}\sum_{i=1}^{n}\left[\left(\ln\frac{O_i}{C_{i-1}}\right)^{2} + \frac{1}{2}\left(\ln\frac{H_i}{L_i}\right)^{2} - (2\ln 2 - 1)\left(\ln\frac{C_i}{O_i}\right)^{2}\right], \qquad (11)$$
where 𝑂𝑖, 𝐻𝑖, 𝐿𝑖 and 𝐶𝑖, respectively, denote the opening, the highest, the lowest and the closing price
at time i. Finally, n denotes the number of daily log returns used to calculate the estimate (we set n =
10). Additionally, we scale the estimator to match the magnitude of the volatility estimates with the
ones retrieved from GARCH models. To that end, the following formula proposed by Fiszeder (2005,
2009) is employed:
$$Scaled\,\sigma^{GKYZ} = \frac{a}{b}\cdot\sigma^{GKYZ}, \qquad (12)$$
$$a = \frac{1}{T}\sum_{t=1}^{T} r_t^{2}, \qquad b = \frac{1}{T}\sum_{t=1}^{T} \sigma_t^{GKYZ,2}, \qquad (13)$$
where T denotes the initial sample size used for the estimation of a GARCH model.
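As an illustration, the GKYZ estimator of Eq. (11) and the scaling of Eqs. (12)-(13) could be computed as in the following pandas sketch; the column names o, h, l, c for the OHLC prices, as well as the function names, are our own assumptions.

```python
import numpy as np
import pandas as pd

def gkyz_volatility(ohlc: pd.DataFrame, n: int = 10) -> pd.Series:
    """Rolling GKYZ volatility: square root of the variance in Eq. (11) over n-day windows.

    ohlc must contain columns 'o', 'h', 'l', 'c' (open, high, low, close).
    """
    var = (np.log(ohlc['o'] / ohlc['c'].shift(1)) ** 2
           + 0.5 * np.log(ohlc['h'] / ohlc['l']) ** 2
           - (2.0 * np.log(2.0) - 1.0) * np.log(ohlc['c'] / ohlc['o']) ** 2
           ).rolling(n).mean()
    return np.sqrt(var)

def scale_gkyz(sigma_gkyz: pd.Series, returns: pd.Series, T: int) -> pd.Series:
    """Scaling of Eqs. (12)-(13), computed on the first T observations."""
    a = (returns.iloc[:T] ** 2).mean()        # Eq. (13): mean squared return
    b = (sigma_gkyz.iloc[:T] ** 2).mean()     # Eq. (13): mean squared GKYZ volatility
    return (a / b) * sigma_gkyz               # Eq. (12), applied literally
```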
The specific architecture of the GRU component used in this research consists of three GRU-type layers with 512/256/128 neurons and a single-neuron dense layer at the output. Each of the GRU layers uses the ReLU (Rectified Linear Unit) activation function, a dropout rate of 0.3, and an L2 kernel regulariser set to 0.00001; the best model is selected according to the validation-set MSE across all epochs.
For the network optimisation, we use the Adam optimiser (Kingma & Ba, 2017) with the learning rate set to 0.0009. The loss function is defined as the mean square error between the GKYZ volatility estimates at time t+1 and the network output (predictions; see Eq. 14). Datasets fed into the network are divided into mini-batches of 500 data points, while each batch is divided into sequences of 6 days, based on which a single-day prediction is produced. The tuning process is performed by means of KerasTuner with the Hyperband algorithm (O'Malley et al., 2019). Finally, the model is trained for 150 epochs, with a model checkpoint callback retaining the weights corresponding to the lowest loss value observed during training.
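A minimal Keras sketch of the GRU component described above (three GRU layers of 512/256/128 units, ReLU activations, dropout 0.3, L2 kernel regularisation of 0.00001, a single-neuron dense output, Adam with learning rate 0.0009, and the MSE loss) might look as follows; any layer options beyond those stated in the text are our assumptions rather than the exact configuration used in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

SEQ_LEN, N_FEATURES = 6, 3   # 6-day input sequences, 3 input variables (Section 2.2)

def build_gru_model():
    model = tf.keras.Sequential([
        layers.Input(shape=(SEQ_LEN, N_FEATURES)),
        layers.GRU(512, activation='relu', dropout=0.3,
                   kernel_regularizer=regularizers.l2(1e-5), return_sequences=True),
        layers.GRU(256, activation='relu', dropout=0.3,
                   kernel_regularizer=regularizers.l2(1e-5), return_sequences=True),
        layers.GRU(128, activation='relu', dropout=0.3,
                   kernel_regularizer=regularizers.l2(1e-5)),
        layers.Dense(1),                     # one-step-ahead GKYZ volatility forecast
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0009), loss='mse')
    return model

# checkpoint keeping the weights with the lowest validation loss across the 150 epochs
checkpoint = tf.keras.callbacks.ModelCheckpoint('best_gru.keras', monitor='val_loss',
                                                save_best_only=True)
# model.fit(X_train, y_train, validation_split=0.33, epochs=150,
#           batch_size=500, callbacks=[checkpoint])
```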
2.3 Ex post volatility forecasts evaluation
Ex post assessment of the volatility forecasts produced by the various GARCH and
GARCH-GRU models is carried out in this study with respect to two aspects. First, the overall point
prediction accuracy is measured by three standard forecast error metrics:
$$MSE = \frac{1}{n}\sum_{t=1}^{n}\left(\sigma_{t+1}^{GKYZ} - \sigma_{t+1}^{f}\right)^{2}, \qquad (14)$$
$$MAE = \frac{1}{n}\sum_{t=1}^{n}\left|\sigma_{t+1}^{GKYZ} - \sigma_{t+1}^{f}\right|, \qquad (15)$$
$$HMSE = \frac{1}{n}\sum_{t=1}^{n}\left(\frac{\sigma_{t+1}^{GKYZ} - \sigma_{t+1}^{f}}{\sigma_{t+1}^{GKYZ}}\right)^{2}, \qquad (16)$$
where $\sigma_{t+1}^{GKYZ}$ is the volatility estimated using the GKYZ estimator and $\sigma_{t+1}^{f}$ is the volatility forecast obtained from a given model.
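For completeness, the three loss functions of Eqs. (14)-(16) amount to the following short NumPy sketch.

```python
import numpy as np

def forecast_errors(sigma_gkyz, sigma_f):
    """MSE, MAE and HMSE of Eqs. (14)-(16) for aligned arrays of GKYZ estimates and forecasts."""
    e = sigma_gkyz - sigma_f
    mse = np.mean(e ** 2)
    mae = np.mean(np.abs(e))
    hmse = np.mean((e / sigma_gkyz) ** 2)
    return mse, mae, hmse
```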
The second aspect of the forecast evaluation in this paper is the accuracy of risk measures. To that end, volatility predictions resulting from the GARCH and GARCH-GRU models are used to produce long-position Value at Risk (VaR) and Expected Shortfall (ES) forecasts:
$$VaR_{t+1}(\alpha) = -r_{t+1}^{f} - \sigma_{t+1}^{f}\, q_{\alpha}^{z}, \qquad (18)$$
$$ES_{t+1}(\alpha) = E\big(r_{t+1} \mid r_{t+1} < VaR_{t+1}(\alpha)\big) = r_{t+1}^{f} + \sigma_{t+1}^{f}\, E\big(z_t \mid z_t < q_{\alpha}^{z}\big), \qquad (19)$$

where $r_{t+1}^{f}$ and $\sigma_{t+1}^{f}$ are, respectively, the forecasted return and volatility at time $t+1$, while $q_{\alpha}^{z}$ denotes the $\alpha$-th quantile of the distribution assumed for $z_t$ (see Doman & Doman, 2009; McNeil & Frey, 2000). Notice that the return predictions, $r_{t+1}^{f}$, are generated in our paper only through the underlying
GARCH model, and thereby are not further processed through the hybrid GARCH-GRU structure.
This limitation is intended here to ensure that any differences between the VaR and ES predictions
stemming from GARCH and GARCH-GRU models remain attributable solely to the accuracy of the
volatility forecasts produced by the two approaches. An analysis of further potential gains from using the hybrid models' return predictions instead remains beyond the scope of the current research.
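Under the assumption of a standard normal distribution for $z_t$, Eqs. (18)-(19) translate into the sketch below; the Student's t- and skewed-t-based variants used in the paper follow analogously with the corresponding quantiles and tail expectations. The implementation is our own illustration, not the paper's code.

```python
import numpy as np
from scipy import stats

def var_es_normal(r_fcst, sigma_fcst, alpha=0.05):
    """One-step-ahead VaR and ES of Eqs. (18)-(19) under a standard normal z_t.

    r_fcst, sigma_fcst : forecasted return and volatility for t+1
    alpha              : tolerance level (0.05 or 0.01 in this study)
    """
    q = stats.norm.ppf(alpha)                   # alpha-quantile of z_t
    tail_mean = -stats.norm.pdf(q) / alpha      # E(z_t | z_t < q), closed form for N(0, 1)
    var = -r_fcst - sigma_fcst * q              # Eq. (18)
    es = r_fcst + sigma_fcst * tail_mean        # Eq. (19): expected return given a VaR exceedance
    return var, es
```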
Backtesting of VaR and ES forecasts is performed by means of standard tools. For the former,
we use two procedures. First, the Kupiec (1995) test is employed to examine the unconditional
coverage property (stated by the null hypothesis), that is the consistency between the empirical and
expected numbers of VaR exceedances, under a chosen tolerance level. Both significantly higher and
lower numbers of exceedances cause the null to be rejected. Second, through the conditional coverage test of Christoffersen (1998, 2001) we check whether the VaR hits are independent (thus not occurring in clusters) and whether the empirical VaR hit ratio coincides with the assumed tolerance probability.
The two statements jointly form the null hypothesis.
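The Kupiec statistic is straightforward to compute; the sketch below is our own implementation of the standard likelihood-ratio formula, not the code used in this study.

```python
import numpy as np
from scipy import stats

def kupiec_pof(exceedances, alpha):
    """Kupiec (1995) proportion-of-failures test of unconditional coverage.

    exceedances : boolean array, True where the realised loss exceeded VaR
    alpha       : VaR tolerance level (expected hit ratio)
    Returns the LR statistic and its p-value (chi-square with 1 df).
    """
    n = len(exceedances)
    x = int(np.sum(exceedances))
    pi_hat = x / n
    # log-likelihoods under the null (hit probability alpha) and the alternative (pi_hat)
    ll_null = (n - x) * np.log(1 - alpha) + x * np.log(alpha)
    ll_alt = (n - x) * np.log(1 - pi_hat) + x * np.log(pi_hat) if 0 < x < n else 0.0
    lr = -2.0 * (ll_null - ll_alt)
    return lr, 1.0 - stats.chi2.cdf(lr, df=1)
```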
To backtest the Expected Shortfall predictions, we resort to the McNeil and Frey (2000) test, with the null assuming that the mean of the ES exceedances equals zero. The test results are reported in two variants, one under the exact distribution of the test statistic, while the other uses a
bootstrapped distribution, with the latter approach accounting for a possible misspecification of the
underlying distribution of the standardised residuals.
3. Empirical analysis
3.1 Data
The following three data sets of daily logarithmic rates of return are analysed in our research,
each representing quite a distinct type of financial asset: the S&P 500 index (5-day week, quotations
over 6 April 2009 to 31 December 2020), Bitcoin (BTC/USD; 7-day week, quotations over 5 August
2013 to 31 December 2020), and gold (XAU/USD; 5-day week, quotations over 8 July 2010 to 31
December 2020). The time ranges of the data sets ensure an equal number of 2707 observations in
each case. Figure 3 displays the asset prices, while Figure 4 shows the returns (in percentage points)
and squared returns. Empirical distributions of the returns (along with a normal distribution fit) are
presented in Figure 5.
Figure 4. Returns and squared returns of S&P 500, Bitcoin and gold
Figure 4 reveals a much higher volatility of Bitcoin as compared with S&P 500 and gold,
which is fairly typical, bearing in mind a generally elevated volatility of all cryptocurrencies in recent
years (see the standard deviations reported in Table 1, although the coefficients of variation, CV,
presented therein may suggest otherwise, which is only attributable to relatively lower means of the
S&P 500 and gold returns). In addition, Bitcoin scores much higher returns (in absolute terms) than
the other two assets. According to Table 1, all the data distributions exhibit a noticeable negative
skewness and leptokurtosis, both commonly observed features of financial data.
Figure 5. Histograms of S&P 500, Bitcoin and gold returns, along with a normal distribution fit (line)
For the forecasting evaluation of the models (presented in the following section), all the data
sets are divided in such a manner as to ensure the same number of observations for each asset at
corresponding stages of analysis. Specifically, a rolling window scheme is employed for both
GARCH and hybrid GARCH-GRU models. The size of the rolling window for the GARCH models is
set to 504 days (the models are re-estimated upon arrival of each new observation). The GARCH
component order is fixed to p=q=1 (see Eqs. 3-6) for all the assets (as typically done in the empirical
literature), while the ARMA component is reduced to AR(1) for S&P 500, and only a constant for
Bitcoin and gold, with the choices supported by a preliminary analysis (the results left unreported for
the sake of brevity) employing the Bayesian information criterion (BIC).
For the neural network stage, the data is divided into a series of rolling training sets, each of
1008 observations, and a series of rolling test sets, each comprising 504 observations. Each time, 33% of the training set (336 observations) is used as a validation set.
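The rolling scheme described above (1008-day training windows, 504-day test windows, with one third of each training window held out for validation) can be generated, for instance, as in the following sketch; the placement of the validation block at the end of each training window and the step size of the roll are our own assumptions.

```python
def rolling_splits(n_obs, train_len=1008, test_len=504, val_frac=1/3):
    """Yield (train, val, test) index ranges for consecutive rolling windows."""
    start = 0
    while start + train_len + test_len <= n_obs:
        train_end = start + train_len
        val_len = int(train_len * val_frac)            # 336 observations for 1008-day windows
        yield (range(start, train_end - val_len),       # training part
               range(train_end - val_len, train_end),   # validation part
               range(train_end, train_end + test_len))  # test part
        start += test_len                               # roll forward by one test window

# example: 2707 observations, as for each asset in this study
for tr, va, te in rolling_splits(2707):
    pass  # fit the GRU on tr/va and forecast over te
```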
The total number of the ex post evaluated predictions obtained from the GARCH and hybrid
models is 1194, and is the same for each asset (although the corresponding time ranges vary: 7 April
2016 to 31 December 2020 for S&P 500, 18 May 2016 to 31 December 2020 for gold, and 29
September 2017 to 31 December 2020 for Bitcoin).
Empirical results obtained for each of the three assets are discussed below in the following fashion. First, we compare the models in terms of the MSE loss function, testing its values by means of the Diebold-Mariano test, with low p-values favouring the hybrid model; see Tables 2, 5, and 8. Then, results for VaR exceedances are presented (Tables 3, 6, and 9). Finally, based on the foregoing, a selection of the best performing models is analysed in more detail, with respect to both the overall volatility forecast accuracy and risk prediction (Tables 4, 7, and 10), the latter including the 1% and 5% Value at Risk as well as the 5% Expected Shortfall.
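The Diebold-Mariano comparison reduces to a t-type test on the loss differential; a simple one-step-ahead version (squared-error loss, plain variance estimate, one-sided alternative favouring the hybrid model) is sketched below as our own illustration.

```python
import numpy as np
from scipy import stats

def dm_test(sigma_true, fcst_garch, fcst_hybrid):
    """One-sided Diebold-Mariano test on squared-error loss differentials.

    H0: equal accuracy; H1: the hybrid forecasts have lower expected loss.
    For one-step-ahead forecasts only the lag-0 autocovariance of the differential is used.
    """
    d = (sigma_true - fcst_garch) ** 2 - (sigma_true - fcst_hybrid) ** 2
    n = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)
    p_value = 1.0 - stats.norm.cdf(dm)     # small p-value favours the hybrid model
    return dm, p_value
```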
Table 2 indicates that the best performing (in terms of MSE) is the EGARCH-GRU model
with a skewed Student’s t-distribution, although only by a rather narrow margin as compared with
some other specifications, like GJR-GARCH-GRU with either a symmetric or skewed t-distribution,
and even ‘standard’ GARCH-GRU with a skewed t-distribution. Forecasts obtained from the winning
specification are displayed in Figure 6, along with the GKYZ volatility estimates and the ‘pure’
EGARCH model (with a skewed Student’s t-distribution).
Overall, the results presented in Table 2 consistently indicate that combining GARCH models with GRU networks significantly enhances the forecast accuracy, with all of the Diebold-Mariano test p-values remaining below 0.05.
Table 2. Comparison of volatility forecasts based on the MSE loss function across all models and
distributions for S&P 500
Metrics / Model G(N) G(N)-GRU G(STD) G(STD)-GRU G(SSTD) G(SSTD)-GRU
Next, we compare the models in terms of VaR unconditional coverage, with Table 3
presenting the actual number of VaR exceedances and hit ratios (in percentage terms) for both VaR
tolerance levels, i.e. 5% and 1%. The results indicate that the most accurate VaR hit coverage is
attained by the GJR-GARCH-GRU models with a normal and a skewed Student’s t-distribution for
the 5% tolerance level, and the APARCH model with a skewed Student’s t-distribution for the 1%
tolerance (paths of the 5% and 1% VaR forecasts along with their violations are displayed in Figure
7). Overall, and contrary to Table 2, the results here provide only mixed conclusions as to gains from
resorting to the hybrid models, since combining GARCH with the GRU networks does not necessarily
bring the VaR hit ratios closer to the expected 5% and 1% tolerance levels.
Table 3. Number of VaR exceedances across all models for S&P 500
VaR / Model G(N) G(N)-GRU G(STD) G(STD)-GRU G(SSTD) G(SSTD)-GRU
A selection of best performing models, according to MSE and VaR exceedances, is further
analysed in more detail. Results shown in Table 4 indicate that the two GJR-GARCH-GRU models
pass the unconditional coverage test for the 5% tolerance level, but rather fail the conditional coverage
test, implying some clustering of the VaR violations (as might have already been expected from
Figure 7, to some extent). In addition, and somewhat disappointingly, these models also fail the ES backtest.
On the other hand, the APARCH model with a skewed Student’s t-distribution, preferred in
terms of the 1% VaR prediction, performs well in all three tests. Nevertheless, the model’s
performance for the 5% tolerance level is clearly surpassed by the other specifications.
Table 4. Detailed comparison of the best performing models for S&P 500
Metrics / Model E(SSTD)-GRU GJR(N)-GRU GJR(SSTD)-GRU AP(SSTD)
As indicated by Table 5, the best performing model for Bitcoin (in terms of point volatility
forecasts) is the conditionally normal APARCH-GRU structure, with four other hybrid specifications
being close seconds: GARCH-GRU and GJR-GARCH-GRU, with both symmetric and skewed
t-distributions. As a side note, the result may imply that simpler GARCH specifications, constituting
some special cases of APARCH, may require more sophisticated, heavy-tailed conditional
distributions to offset their simpler volatility structure.
Overall, and similarly to the case of the S&P 500, combining GARCH and GRU models largely improves volatility forecasts. Enhancing a GARCH model with a GRU network results in a one- or two-order-of-magnitude drop in MSE, even though in some cases the difference appears either statistically insignificant (conditionally normal EGARCH vs. EGARCH-GRU) or at least not as statistically significant as one could expect (conditionally t-distributed EGARCH vs. EGARCH-GRU, and APARCH vs. APARCH-GRU with a skewed t-distribution). However surprising or aberrant these results may appear, they remain largely attributable to a single erratically high (compared to the target GKYZ estimates) volatility forecast obtained from the above-mentioned 'pure' GARCH models, linked to the outbreak of the COVID-19 pandemic (see Figure 8). Conceivably, this volatility over-prediction is due to the additional inverse transformations required to recover the forecast of the conditional standard deviation from a volatility equation defined inherently either for the logarithm of the variance (as in EGARCH; see Eq. 5) or for some power transformation thereof (as in APARCH; see Eq. 6). Ultimately, these discrepancies between the 'pure' and hybrid EGARCH (and APARCH) volatility forecasts inflate the long-run variance estimate underlying the DM test, which shrinks the value of the test statistic and thus increases the p-value.
Table 5. Comparison of volatility forecasts based on the MSE loss function across all models and
distributions for Bitcoin
Metrics / Model G(N) G(N)-GRU G(STD) G(STD)-GRU G(SSTD) G(SSTD)-GRU
Table 6 presents the number and hit ratios of VaR exceedances. Despite the earlier results indicating a considerable gain from combining GARCH with GRU models for volatility forecasting, this effect, somewhat disappointingly, does not necessarily translate into superior VaR prediction performance of the hybrid structures. For both of the tolerance levels under consideration, it is a 'pure' APARCH model that produces VaR estimates with a hit ratio nearest to the expected one: the conditionally normal APARCH for the 5% tolerance level, and APARCH with a t-distribution for the
1% tolerance (see Figure 9). Overall, as inferred from Table 6, combining GARCH with GRU models
may lead to either more conservative or more liberal VaR predictions, as compared with the ones from
the corresponding GARCH structures.
Table 7 presents detailed results for a selection of the best models (based on the MSE and
VaR coverage performance). In general, the discrepancy between the models’ performance in terms of
either volatility forecasting or VaR and ES prediction is striking. The conditionally normal APARCH model combined with a GRU network produces by far the most accurate forecasts of the returns' volatility, yet fails to yield uniformly satisfactory outcomes for risk prediction. Conversely, it is the 'pure' APARCH models with a normal and a Student's t-distribution that yield the best VaR and ES forecasts for the 5% and 1% tolerances, respectively (however poorly they forecast the volatility itself). Such a divergence in the models' performance may lead one to conclude that the GKYZ volatility estimates, employed in training the GRU component of the hybrid model structures, are somewhat deficient for the follow-up task of risk assessment. Conceivably, this may be due not only to the generally high volatility of the Bitcoin returns, but also to a strikingly high 'volatility of volatility': numerous and pronounced spikes in the modelled series interspersing otherwise relatively 'regular' returns (see Figure 4).
According to Table 8, the most accurate volatility forecasts for the returns on gold are
produced by a hybrid EGARCH-GRU model with a Student’s t-distribution (see Figure 10), although
most of the remaining hybrid specifications could be regarded as ‘close seconds’.
Overall, combining the GARCH with GRU models largely enhances the prediction throughout all the resulting specifications. However, similarly to the two previous assets, the 'hybridisation' does not improve the VaR prediction uniformly (see Table 9). Nevertheless, it is still the hybrid models: the t-distributed GARCH-GRU, the normally distributed EGARCH-GRU, and the GJR-GARCH-GRU with a skewed t-distribution, that perform the best at the 5% tolerance level. By contrast, 'pure' GARCH, EGARCH and GJR-GARCH models (with a skewed t-distribution) prove the best for the 1% VaR prediction.
Table 8. Comparison of volatility forecasts based on the MSE loss function across all models and
distributions for gold
Metrics / Model G(N) G(N)-GRU G(STD) G(STD)-GRU G(SSTD) G(SSTD)-GRU
Table 9. Number of VaR exceedances across all models for gold
VaR / Model G(N) G(N)-GRU G(STD) G(STD)-GRU G(SSTD) G(SSTD)-GRU
Table 10 presents a detailed analysis for a selection of the best performing models for gold. The results are fairly mixed, with the indication of a superior specification hinging on the choice of a particular metric, thereby precluding an ultimate winner. Nevertheless, they still partially support the hybrid approach of combining the GARCH with GRU models.
Table 10. Detailed comparison of the best performing models for gold
Metrics / Model E(STD)-GRU GJR(SSTD)-GRU GJR(SSTD) E(N)-GRU
4. Conclusions
The main aim of the paper was to develop hybrid GARCH-GRU models for the task of
financial volatility and risk prediction, thereby bridging the most common, ‘classic’ econometric tools
for volatility dynamics (GARCH models) with deep machine learning methods. The approach was
tested on three financial assets displaying distinct volatility dynamics: S&P 500, Bitcoin and gold.
In summary, it can be concluded that no single model specification would do best in all the
situations. Overall, however, for each of the three assets under consideration, it was a hybrid model
that emerged superior for point volatility forecasting (in terms of MSE): EGARCH-GRU models for S&P 500 and gold (under a skewed and a symmetric t-distribution, respectively), and APARCH-GRU (with a normal distribution) for Bitcoin. Nonetheless, this general outcome is hardly
a surprise, given that the main task of the GRU network in the hybrid models was to minimise the
MSE loss.
Somewhat disappointingly, the gains in volatility prediction from the hybrid GARCH-GRU structures do not appear to translate uniformly into superior Value at Risk and Expected Shortfall forecasts. From Tables 4, 7 and 10 it can be inferred that the choice of a winning specification largely hinges on both the asset at hand and the tolerance level. Using, for brevity, the model acronyms from the tables, the following models proved the most valid with respect to the risk assessment at the tolerance levels of 5% and 1%:
The above list indicates clearly that hybridising GARCH with GRU models does not
necessarily yield superior risk forecasts. Moreover, all of the listed specifications differ from the ones
that proved most accurate for volatility forecasting in terms of MSE, mentioned in the previous
paragraph. This, in turn, may call into question the very choice of the target function underlying the GRU components in the hybrid models advanced in this paper (here hinged on point volatility forecast accuracy), suggesting that the function may need to be redefined specifically for the task of VaR and/or ES prediction. This line of research is left for future work.
On the whole, the research findings corroborate the potential and purposefulness of
combining ‘classic’ econometric models for volatility dynamics with deep machine learning
approaches for the purpose of improving, in general, the results produced by the former. Nevertheless, the results presented in the current paper preclude unequivocal conclusions as to the empirical advantages of such 'hybridisation', leaving them largely dependent on the particular financial asset and task at hand.
Acknowledgements
The authors acknowledge financial support from subsidies granted to the Krakow University of
Economics.
References
4. Bollerslev T. (1986), Generalized Autoregressive Conditional Heteroskedasticity, Journal of Econometrics, Vol. 31(3): 307–327.
5. Bams, D., Blanchard, G., Lehnert, T. (2017). Volatility measures and Value-at-Risk.
International Journal of Forecasting, Vol. 33: 848–863.
6. Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H.,
Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for
Statistical Machine Translation. arXiv:1406.1078 [Cs, Stat].
http://arxiv.org/abs/1406.1078, Accessed 22 August 2023.
7. Chung, J., Gulcehre, C., Cho, K., Bengio, Y. (2014). Empirical Evaluation of Gated
Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555 [cs.NE],
https://arxiv.org/abs/1412.3555, Accessed 22 August 2023.
8. Christoffersen, P. F. (1998). Evaluating Interval Forecasts. International Economic
Review, Vol: 39(4): 841–862.
9. Christoffersen, P., Hahn, J., Inoue, A. (2001). Testing and Comparing Value-at-Risk
Measures. Journal of Empirical Finance Vol. 8(3): 325-342.
10. Diebold, F. X., Mariano, R. S. (1995). Comparing Predictive Accuracy. Journal of
Business & Economic Statistics, Vol. 13(3): 253–263.
11. Ding Z., Engle R.F., Granger C.W.J. (1993), A long memory property of stock market return and a new model, Journal of Empirical Finance, Vol. 1(1): 83-106.
12. Doman M., Doman R. (2009). Modelowanie zmienności i ryzyka. Metody ekonometrii
finansowej, Wolters Kluwer, Kraków.
13. Du, Z., Wang, M., & Xu, Z. (2019). On Estimation of Value-at-Risk with Recurrent
Neural Network, 2019 Second International Conference on Artificial Intelligence for
Industries (AI4I), 103–106.
14. Engle R.F. (1982), Autoregressive Conditional Heteroscedasticity with Estimates of
Variance of United Kingdom Inflation, Econometrica, Vol. 50 (4): 987–1008.
15. Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks
for financial market predictions. European Journal of Operational Research, Vol. 270(2),
654–669.
16. Fiszeder P. (2005), Forecasting the volatility of the Polish stock index – WIG20. In: Forecasting Financial Markets. Theory and Applications. Łódź.
17. Fiszeder P. (2007), Prognozowanie VaR – zastosowanie wielorównaniowych modeli
GARCH, Modelowanie i prognozowanie gospodarki narodowej, Prace i Materiały
Wydziału Zarządzania Uniwersytetu Gdańskiego, 365-376.
18. Fiszeder P. (2009), Modele klasy GARCH w empirycznych badaniach finansowych,
Wydawnictwo Naukowe Uniwersytetu Mikołaja Kopernika, Toruń.
19. Garman, M. B., & Klass, M. J. (1980). On the Estimation of Security Price Volatilities
from Historical Data. The Journal of Business, Vol. 53(1): 67–78.
20. Ghalanos, A. (2022a). Introduction to the rugarch package (Version 1.4-3),
https://cran.r-project.org/web/packages/rugarch/vignettes/Introduction_to_the_rugarch_p
ackage.pdf, Accessed 22 August 2023.
21. Ghalanos, A. (2022b). rugarch: Univariate GARCH models,
https://cran.r-project.org/web/packages/rugarch/rugarch.pdf, Accessed 22 August 2023.
22. Glosten L.R., Jagannathan R., Runkle D.E. (1993), On the relation between the expected value and the volatility of the nominal excess return on stocks, The Journal of Finance, Vol. 48(5): 1779-1801.
23. Goodfellow I., Bengio Y., Courville A. (2016). Deep Learning, The MIT Press.
24. Harvey, D., Leybourne, S., Newbold, P. (1997). Testing the equality of prediction mean squared errors. International Journal of Forecasting, Vol. 13(2): 281-291.
25. Higgins M.L., Bera A.K. (1992), A class of nonlinear ARCH models, International
Economic Review, Vol: 33 (1): 137-158.
26. Hochreiter S., Schmidhuber J. (1997), Long Short-Term Memory. Neural Computation.
Vol. 9(8): 1735–1780.
27. Hopfield J.J. (1982), Neural networks and physical systems with emergent collective
computational abilities, Proceedings of the National Academy of Sciences, Vol. 79(8):
2554–2558.
28. Hu Y., Ni J., Wen L. (2020), A hybrid deep learning approach by integrating
LSTM-ANN networks with GARCH model for copper price volatility prediction,
Physica A: Statistical Mechanics and its Applications, Vol. 557, Article 124907.
29. Jacobs, K. (2017), Python Deep Learning Tutorial: Create A GRU (RNN) In TensorFlow,
https://www.data-blogger.com/2017/08/27/gru-implementation-tensorflow/, Accessed 10
December 2021.
30. Kim, H. Y., Won, C. H. (2018). Forecasting the volatility of stock price index: A hybrid
model integrating LSTM with multiple GARCH-type models. Expert Systems with
Applications, Vol. 103: 25–37.
31. Kingma, D. P., Ba, J. (2017). Adam: A Method for Stochastic Optimization.
ArXiv:1412.6980 [Cs], http://arxiv.org/abs/1412.6980, Accessed 22 August 2023.
32. Kristjanpoller, W., & Hernández, E. (2017). Volatility of main metals forecasted by a
hybrid ANN-GARCH model with regressors. Expert Systems with Applications, Vol. 84:
290–300.
33. Kristjanpoller, W., & Minutolo, M. C. (2018). A hybrid volatility forecasting framework
integrating GARCH, artificial neural network, technical analysis and principal
components analysis. Expert Systems with Applications, Vol. 109: 1–11.
34. Kristjanpoller W., Minutolo M.C. (2016), Forecasting volatility of oil price using an artificial neural network-GARCH model, Expert Systems with Applications, Vol. 65: 233-241.
35. Kristjanpoller W., Minutolo M.C. (2015), Gold price volatility: A forecasting approach
using the Artificial Neural Network–GARCH model, Expert Systems with Applications,
Vol. 42: 7245-7251.
36. Kupiec, P. (1995). Techniques for Verifying the Accuracy of Risk Measurement Models
(SSRN Scholarly Paper ID 6697). Social Science Research Network,
https://papers.ssrn.com/abstract=6697, Accessed 22 August 2023.
37. Liu W.K., & So M.K.P. (2020), A GARCH model with artificial neural networks,
Information, Vol. 11(10): 489.
38. McNeil, A. J., & Frey, R. (2000), Estimation of tail-related risk measures for
heteroscedastic financial time series: an extreme value approach. Journal of Empirical
Finance, Vol. 7(3): 271–300.
39. Małecka, M. (2016), Weryfikacja hipotez w ocenie ryzyka rynkowego, Wydawnictwo
Uniwersytetu Łódzkiego, Łódź.
40. Mincer, J., & Zarnowitz, V., (1969), The Evaluation of Economic Forecasts. In:
Economic Forecasts and Expectations: Analysis of Forecasting Behavior and
Performance, National Bureau of Economic Research, Inc,
https://EconPapers.repec.org/RePEc:nbr:nberch:1214.
41. Nelson D.B. (1991), Conditional heteroscedasticity in asset returns: a new approach,
Econometrica, Vol. 59(2): 347-370.
42. O’Malley, T., Bursztein, E., Long, J., Chollet, F., Jin, H., Invernizzi, L. and others,
(2019), KerasTuner. https://github.com/keras-team/keras-tuner, Accessed 22 August
2023.
43. Piontek K., Papla D. (2005), Wykorzystanie wielorównaniowych modeli AR-GARCH w pomiarze ryzyka metodą VaR, Prace Naukowe Akademii Ekonomicznej we Wrocławiu, pp. 126-138.
44. Taylor S. (1986), Modelling Financial Time Series. Wiley.
45. Teräsvirta, T. (2009). An Introduction to Univariate GARCH Models. In: Mikosch, T.,
Kreiß, JP., Davis, R., Andersen, T. (eds) Handbook of Financial Time Series. Springer,
Berlin, Heidelberg.
46. Tsay R.S. (2010), Analysis of Financial Time Series, John Wiley & Sons, Chicago.
47. Rumelhart D.E., Hinton G.E., Williams R.J. (1986), Learning representations by back-propagating errors, Nature, Vol. 323(6088): 533–536.
48. Yang, D., & Zhang, Q. (2000), Drift‐Independent Volatility Estimation Based on High,
Low, Open, and Close Prices. The Journal of Business, Vol. 73(3): 477–492.
49. Zakoian J.M. (1994), Threshold heteroscedasticity models. Journal of Economic
Dynamics and Control, Vol. 18(5): 931-955.