A Hybrid Deep Learning Approach by Integrating LSTM-ANN Networks with GARCH Model for Copper Price Volatility Prediction
PII: S0378-4371(20)30469-6
DOI: https://doi.org/10.1016/j.physa.2020.124907
Reference: PHYSA 124907
Please cite this article as: Y. Hu, J. Ni and L. Wen, A hybrid deep learning approach by integrating
LSTM-ANN networks with GARCH model for copper price volatility prediction, Physica A
(2020), doi: https://doi.org/10.1016/j.physa.2020.124907.
Yan Hu
E-mail: [email protected]
School of Finance, Southwestern University of Finance and Economics

Jian Ni (Corresponding author)
E-mail: [email protected]

Liu Wen
Tel.: +86 28 87352835
E-mail: [email protected]
Abstract: Forecasting the copper price volatility is an important yet challenging task. Given the nonlinear and time-varying characteristics of numerous factors affecting the copper price, we propose a novel hybrid method to forecast copper price volatility. Two important techniques are synthesized in this method. One is the classic GARCH model, which encodes useful statistical information about the time-varying copper price volatility in a compact form via the GARCH forecasts. The other is the powerful deep neural network, which combines the GARCH forecasts with both domestic and international market factors to search for better nonlinear features; it also combines the long short-term memory (LSTM) network with the traditional artificial neural network (ANN) to generate better volatility forecasts. Our method synthesizes the merits of these two techniques and is especially suitable for the task of copper price volatility prediction. The empirical results show that the GARCH forecasts can serve as informative features to significantly increase the predictive power of the neural network model, and that the integration of the LSTM and ANN networks is an effective approach to construct useful deep neural network structures to boost the prediction performance. Further, we conducted a series of sensitivity analyses of the neural network architecture to optimize the prediction results. The results suggest that the choice between LSTM and BLSTM networks for the hybrid model should consider the forecast horizon, while the ANN configurations should be fine-tuned according to the measure of prediction errors.
1. Introduction
As a key material in various industrial applications, copper is the most actively traded base
metal. However, copper prices are highly volatile and depend on many external factors. Such
inherent high volatility makes prediction modelling more challenging. In fact, the application of traditional prediction models (e.g., ARMA and GARCH) has revealed their limitations in this setting. Nevertheless, improving the volatility forecasts of copper prices is a worthy endeavor, since volatility is an
important proxy for market risk. In particular, Kristjanpoller and Minutolo (2015) and García
and Kristjanpoller (2019) have adopted volatility for measuring the risk of the commodity market.
Ederington and Lee (1993) and Fuertes et al. (2009) have also applied volatility to measure the
risk in the stock market. Other applications of volatility as a measure of market risk can be
found in Kristjanpoller et al. (2014), Kristjanpoller and Hernández (2017), Kim and Won (2018),
and references therein. Therefore, the ability to predict the volatility of copper prices with
greater precision is critical for market participants. How to accurately predict volatility is still
an unsolved issue. To address this issue, a hybrid volatility prediction model is developed in this study by synthesizing the state-of-the-art deep learning technique with the classic GARCH model.

The GARCH model proposed by Bollerslev (1986) provides a way to model a change in variance in a time series. Later on, different extensions of the basic ARCH and GARCH models have been developed, including EGARCH (Nelson, 1991), APGARCH (Ding et al., 1993), SWARCH (Hamilton and Susmel, 1994), FIAPGARCH (Tse, 1998), HYGARCH (Davidson, 2004), and more. However, due to the complex nonlinear correlation structure among variables and the large volume of data, the prediction results of these GARCH-type models are often unsatisfactory.
The recent development of the deep learning methodology originates from the artificial
neural network (ANN) models, which are designed to mimic the knowledge-acquisition and
organizational skills of the human brain (Bergerson and Wunsch, 1991; Sharda and Patil, 1992).
Many comparative studies between ANNs and traditional prediction models (e.g., ARMA and GARCH) have been conducted with regard to their predictive performance. The high accuracy of ANNs in volatility prediction for various commodities has been demonstrated; see Hamid and Iqbal (2004), Parisi et al. (2008), Azadeh et al. (2012) and Yazdani-Chamzini et al. (2012).
Furthermore, hybrid ANN and GARCH-type models are usually found to have advantages
in comparison with ANNs or time series models. Bildirici and Ersin (2013) used a neural
network augmented GARCH model to predict oil prices. They concluded that neural network models
were promising. Kristjanpoller et al. (2014) applied a hybrid ANN-GARCH model to forecast
volatility in three Latin American indexes from Brazil, Chile, and Mexico. They demonstrated
that neural network models can improve the predictions from GARCH models. Similar studies
about hybrid models with ANNs can be seen as Kristjanpoller and Minutolo (2015), Cui et al.
(2015), Lu et al. (2016), Lahmiri (2017) and Kristjanpoller and Hernández (2017).
Extensions of ANNs also improve the accuracy of copper price prediction. A recurrent
neural network (RNN) is one of the extensions of ANNs. In the RNN, connections between
nodes form a directed graph along a temporal sequence. This allows the neural network to
exhibit temporal dynamic behavior. Unlike feedforward neural networks, RNNs can use their
internal state (memory) to process sequences of inputs. This makes them applicable to tasks such as speech recognition (Li and Wu, 2015). Various forms of RNNs have performed well for prediction tasks related to time series data (Herman and Schrauwen, 2013), such as the long short-term memory (LSTM) and bidirectional LSTM (BLSTM) networks. The LSTM network, which was first introduced by Hochreiter and Schmidhuber (1997), is well suited to dealing with time series data; see Maknickienė and Maknickas (2012), Nelson et al. (2017), Kim et al. (2018), and Wu et al. (2019). BLSTM is proposed to access both the past and future
information by combining a forward hidden layer and a backward hidden layer (Liu and Guo,
2019). Sardelich and Manandhar (2018) used a model combining the BLSTM and stacked
LSTM neural networks to predict daily stock volatility. The proposed model outperformed the
well-known GARCH(1,1) model in many sectors (e.g., financial, health care, etc.). Eapen et al. (2019) found that CNN layers combined with BLSTM yielded better predictions of stock market indexes. This progress in deep learning methodology opens up possibilities for us to develop new prediction models that can better forecast the volatility of copper prices. To enhance the prediction power, the best forecasts of the GARCH model are used as inputs for the hybrid models that combine GARCH and neural networks. Six neural network models (i.e., ANN, LSTM-ANN, BLSTM-ANN, GARCH-ANN, GARCH-LSTM-ANN, and GARCH-BLSTM-ANN) are developed in this study. A comparative study of the proposed models is conducted to find the best prediction models. In Section 2, we describe the methodology. In Section 3, we conduct an empirical analysis to assess the prediction results.
2. Methodology
Return estimation
The closing price of copper plays a crucial role in the metal market. Moreover, investors are
more concerned with return rather than the price. Therefore, the volatility of returns is
forecasted in this paper. The returns of copper on day $t$ are computed using the following equation:

$$R_t = (\log P_t - \log P_{t-1}) \times 100\% \qquad (1)$$

where $R_t$ is the return of the time series on day $t$, and $P_t$ and $P_{t-1}$ are the closing prices of the financial time series on days $t$ and $t-1$, respectively.
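As a concrete illustration of Eq. (1), the return series can be computed from a series of daily closing prices. The sketch below is our own and assumes the prices are held in a pandas Series; the paper does not specify any particular software.

```python
import numpy as np
import pandas as pd

def log_returns(close: pd.Series) -> pd.Series:
    """Daily log returns R_t = (log P_t - log P_{t-1}) * 100, as in Eq. (1)."""
    return ((np.log(close) - np.log(close.shift(1))) * 100.0).dropna()
```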
Volatility estimation
Volatility plays a very important role in the metal market. As a classic metric of volatility,
variance reflects how much copper has varied during a certain period. In this paper, we compare
the predicted and realized volatilities (i.e., variances) to assess the performance of various
models. Realized volatility ($RV_t$) on day $t$ over the following $T$ trading days is calculated by the following equation:

$$RV_t = \frac{1}{T} \sum_{j=t}^{t+T-1} (R_j - \bar{R})^2 \qquad (2)$$

where $T$ is the number of trading days after day $t$, $R_j$ is the return of copper on day $j$, and $\bar{R}$ is the average return of copper during these $T$ trading days. $RV_t$ is the actual volatility. In this study, we consider three cases, i.e., $T = 10, 15, 20$. For $T = 10$, $RV_t$ is the realized volatility over the next 10 trading days (two weeks) from day $t$. Similarly, the $RV_t$ values for $T = 15$ and $T = 20$ correspond to the realized volatilities over the next 15 trading days (three weeks) and 20 trading days (four weeks) from day $t$, respectively.
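The forward-looking realized variance of Eq. (2) can be computed with a rolling window shifted into the future. This is a minimal sketch of ours (not the authors' code), assuming the return series from Eq. (1) is a pandas Series.

```python
import pandas as pd

def realized_volatility(returns: pd.Series, T: int) -> pd.Series:
    """RV_t of Eq. (2): population variance of the returns over days t .. t+T-1."""
    # A backward rolling variance shifted by T-1 days aligns the window's start with day t.
    return returns.rolling(T).var(ddof=0).shift(-(T - 1)).dropna()
```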
Data standardization
Standardization is very important before feeding data into the ANN and LSTM networks, as it can enhance the training efficiency of the prediction models. In this paper, we choose the Min–Max method to normalize the raw input data. Following this method, the input variables are calculated as follows:

$$\tilde{x}_{i,t} = \frac{x_{i,t} - x_{i,\min}}{x_{i,\max} - x_{i,\min}} \qquad (3)$$

where $\tilde{x}_{i,t} \in [0,1]$ is the standardized value of the $i$-th feature on day $t$, $x_{i,t}$ is the actual value of the $i$-th feature on day $t$, and $x_{i,\min}$ and $x_{i,\max}$ are the minimum and maximum values of the $i$-th feature, respectively.
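In practice the Min–Max scaling of Eq. (3) is typically fitted on the training portion only and then reused on the test portion; this detail, like the use of scikit-learn and the placeholder data below, is our assumption, since the paper does not publish its code.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(700, 21))   # placeholder matrix: 21 explanatory-variable returns
X_test = rng.normal(size=(300, 21))

scaler = MinMaxScaler(feature_range=(0, 1))     # implements Eq. (3) column by column
X_train_scaled = scaler.fit_transform(X_train)  # min/max taken from the training set
X_test_scaled = scaler.transform(X_test)        # same min/max applied to the test set
```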
Measures of prediction errors

To evaluate the performance of the models, four measures of prediction errors are used, including the mean squared error (MSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the root mean squared error (RMSE):

$$MSE = \frac{1}{N} \sum_{t=1}^{N} (PV_t - RV_t)^2 \qquad (4)$$

$$MAE = \frac{1}{N} \sum_{t=1}^{N} \left| PV_t - RV_t \right| \qquad (5)$$

$$MAPE = \frac{1}{N} \sum_{t=1}^{N} \left| 1 - PV_t / RV_t \right| \qquad (6)$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (PV_t - RV_t)^2} \qquad (7)$$

where $PV_t$ and $RV_t$ are the predicted and realized volatilities of the copper return series, respectively, and $N$ is the number of predictions. A lower value of each measure indicates a better prediction.
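The four error measures of Eqs. (4)-(7) are straightforward to compute; the helper below is a small sketch of ours using NumPy.

```python
import numpy as np

def error_measures(pv: np.ndarray, rv: np.ndarray) -> dict:
    """MSE, MAE, MAPE and RMSE of Eqs. (4)-(7) for predicted (pv) vs. realized (rv) volatility."""
    diff = pv - rv
    return {
        "MSE": float(np.mean(diff ** 2)),
        "MAE": float(np.mean(np.abs(diff))),
        "MAPE": float(np.mean(np.abs(1.0 - pv / rv))),
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
    }
```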
We note that the four measures of prediction errors (i.e., MSE, MAE, RMSE, and MAPE) introduced above are widely applied in the literature. For example, Fuertes et al. (2009) and Kim and Won (2018) have applied MSE, MAE, and MAPE in analyzing stock volatility forecasts. Kristjanpoller et al. (2014) have utilized the four measures (MSE, MAE, RMSE, and MAPE) to examine the volatility prediction results of hybrid neural network models for three Latin-American stock exchange indexes. Other applications of these four measures in analyzing prediction models of the commodity market include Bentes (2015), Zhang et al. (2015), and Kristjanpoller and Hernández (2017). Besides, these four measures have also been adopted in studying prediction models in the foreign exchange market (Sermpinis et al., 2012; Petropoulos et al., 2017; Henriquez and Kristjanpoller, 2019). To be in line with the literature, all four measures of prediction errors (i.e., MSE, MAE, RMSE, and MAPE) are utilized in this study.
GARCH model

The GARCH model describes the variance of the current error term as a function of the error terms of previous periods. The specification for the GARCH(p, q) model is defined as:

$$R_t = \mu + a_t \qquad (8)$$

$$a_t = \sigma_t \varepsilon_t \qquad (9)$$

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i a_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2 \qquad (10)$$

where $\varepsilon_t$ is an i.i.d. innovation with zero mean and unit variance. The constraints $\alpha_0 > 0$, $\alpha_i \geq 0$ and $\beta_j \geq 0$ ensure that the conditional variance of GARCH(p, q) is always positive, and $\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1$ ensures that the variance is finite.
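As an illustration of how the GARCH forecasts later used as network inputs can be produced, the sketch below fits a GARCH(1,1) model with the Python `arch` package; the package choice, the constant-mean specification, and the placeholder data are our assumptions rather than details given in the paper.

```python
import numpy as np
import pandas as pd
from arch import arch_model

# Placeholder return series; in practice this is the copper return series from Eq. (1).
rng = np.random.default_rng(0)
returns = pd.Series(rng.standard_normal(2500))

model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
result = model.fit(disp="off")
sigma2 = result.conditional_volatility ** 2           # in-sample conditional variance series
forecast_var = result.forecast(horizon=10).variance   # multi-step variance forecasts
```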
ANN model
ANN is a network of artificial neurons, which can receive inputs, change their internal states according to the inputs, and then compute outputs based on the inputs and internal states. These artificial neurons have weights that can be modified by a process called learning. Figure 1 shows a neural network model that sequentially calculates the value of the output layer from the input layer by using the output of the previous layer as the input for the current layer.

[Figure 1: Structure of a feedforward neural network with an input layer, hidden layer, and output layer.]

$$y = f\Big( \sum_{j} f\Big( \sum_{i} x_i w_{ij} + b_j \Big) w_{jk} + b_k \Big) \qquad (11)$$

Equation (11) shows that the input variable $x_i$ is multiplied by the weight $w_{ij}$ and summed with the bias $b_j$; $f(\cdot)$ is the activation function, and the result of this layer becomes the input of the next layer.
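For clarity, the nested mapping of Eq. (11) for a single hidden layer can be written directly in NumPy; this is our own sketch, with tanh as an assumed activation function.

```python
import numpy as np

def ann_forward(x, W1, b1, W2, b2, f=np.tanh):
    """Feedforward pass of Eq. (11): y = f(f(x W1 + b1) W2 + b2)."""
    hidden = f(x @ W1 + b1)      # hidden-layer activations f(sum_i x_i w_ij + b_j)
    return f(hidden @ W2 + b2)   # output-layer value
```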
LSTM model

LSTM is a classic type of RNN which can deal with the exploding and vanishing gradient problems. It is normally augmented by recurrent gates called forget gates. Different from previous neural networks such as the ANN, LSTM can learn deep learning tasks that require a long-term memory of events. LSTM can also handle inputs or signals that have both low- and high-frequency components. For more technical details, please see Hochreiter and Schmidhuber (1997) and Gers et al. (2000). The structure of an LSTM unit is illustrated in Figure 2.

[Figure 2: Structure of an LSTM unit with forget, input, and output gates acting on the inputs $x_t$ and $h_{t-1}$.]
$$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f) \qquad (12)$$

$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i) \qquad (13)$$

$$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o) \qquad (14)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c) \qquad (15)$$

$$h_t = o_t \odot \tanh(c_t) \qquad (16)$$

As Figure 2 illustrates, the LSTM contains a memory cell ($c_t$) and three gates: an input gate ($i_t$), a forget gate ($f_t$), and an output gate ($o_t$). In equations (12)-(16), the initial values are $c_0 = 0$ and $h_0 = 0$, and the operator $\odot$ denotes the element-wise product (i.e., the Hadamard product). At time $t$, $x_t$ represents the input vector and $h_t$ is the hidden state vector, which is also known as the output vector of the LSTM unit. The $W$ and $b$ terms are weight matrices and bias parameters that need to be learned during training, $\sigma(\cdot)$ is the sigmoid function, and $\tanh(\cdot)$ is the hyperbolic tangent function.
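To make Eqs. (12)-(16) concrete, the single-step update of an LSTM cell can be written in a few lines of NumPy. This is a didactic sketch of ours; in practice a framework layer (e.g., a Keras LSTM layer) would be used instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update implementing Eqs. (12)-(16).

    W and b are dicts keyed by gate name ('f', 'i', 'o', 'c'); each W[g] maps the
    stacked vector [h_{t-1}; x_t] to the hidden dimension.
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])                        # forget gate, Eq. (12)
    i_t = sigmoid(W["i"] @ z + b["i"])                        # input gate,  Eq. (13)
    o_t = sigmoid(W["o"] @ z + b["o"])                        # output gate, Eq. (14)
    c_t = f_t * c_prev + i_t * np.tanh(W["c"] @ z + b["c"])   # cell state,  Eq. (15)
    h_t = o_t * np.tanh(c_t)                                  # hidden state, Eq. (16)
    return h_t, c_t
```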
BLSTM model

The design of BLSTM is to access both past and future information by combining a forward hidden layer and a backward hidden layer (Liu and Guo, 2019). As shown in Figure 3, BLSTM can access long-range information in two opposite directions: one layer processing the sequential data in the forward direction, the other in the backward direction. The equations for a BLSTM unit are given in (17)-(19):

$$\overrightarrow{h}_t = f(W_{x\overrightarrow{h}} x_t + W_{\overrightarrow{h}\overrightarrow{h}} \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}}) \qquad (17)$$

$$\overleftarrow{h}_t = f(W_{x\overleftarrow{h}} x_t + W_{\overleftarrow{h}\overleftarrow{h}} \overleftarrow{h}_{t+1} + b_{\overleftarrow{h}}) \qquad (18)$$

$$y_t = W_{\overrightarrow{h}y} \overrightarrow{h}_t + W_{\overleftarrow{h}y} \overleftarrow{h}_t + b_y \qquad (19)$$

where $x_t$ represents the input vector, $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the forward and backward hidden state vectors, and $y_t$ is the output vector. The $W$ and $b$ terms are weight matrices and bias parameters which need to be learned during training. Figure 3 shows the structure of the BLSTM network.

[Figure 3: Structure of the BLSTM network, in which each input $x_t$ feeds a forward hidden state $\overrightarrow{h}_t$ and a backward hidden state $\overleftarrow{h}_t$ that are combined into the output $y_t$.]
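In a deep-learning framework, the forward and backward passes of Eqs. (17)-(19) are available as a bidirectional wrapper around an LSTM layer. The snippet below is a sketch using Keras, which is our choice of framework; the layer width and window size are illustrative only.

```python
import tensorflow as tf

# One BLSTM block: a forward LSTM and a backward LSTM run over the input window
# and their final hidden states are concatenated (cf. Eqs. (17)-(19)).
blstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(32), merge_mode="concat"
)
x = tf.random.normal((8, 252, 22))   # (batch, time steps, features), e.g. a one-year window
h = blstm(x)                         # shape (8, 64): forward and backward states stacked
```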
To exploit both the memory capability of the LSTM (BLSTM) network and the nonlinear mapping ability of the ANN, the LSTM (BLSTM) network is combined with the ANN model. A simple illustration of the LSTM-ANN (BLSTM-ANN) structure is shown in Figure 4.

[Figure 4: Structure of the LSTM-ANN network (Input → LSTM block → ANN block → Output) and the BLSTM-ANN network (Input → BLSTM block → ANN block → Output).]
We now proceed to formally introduce the six prediction models of copper price volatility that will be tested and analyzed in detail. By integrating the aforementioned LSTM-ANN and BLSTM-ANN networks with the GARCH model, we can construct two hybrid prediction models (GARCH-LSTM-ANN and GARCH-BLSTM-ANN), in which the GARCH volatility forecasts serve as extra inputs to the neural network model. The effectiveness of incorporating GARCH forecasts is then assessed by benchmarking these two hybrid models against models of the same network structure but without the GARCH forecasts (i.e., LSTM-ANN and BLSTM-ANN). On the other hand, to test the effectiveness of introducing the memory units (LSTM or BLSTM) into the hybrid model, the memory-free ANN structure is also introduced for comparison purposes, resulting in two additional benchmark models (ANN and GARCH-ANN). Thus, we have six different types of prediction models; we compare and test their prediction performances from different angles to develop insights into the most suitable architecture for copper price volatility prediction.
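To fix ideas, a GARCH-LSTM-ANN model of the kind described above can be assembled as below. This is only a sketch under our own assumptions: the Keras framework, ReLU activations, the Adam optimizer, and the convention that the GARCH variance forecast enters as one additional input feature are not specified in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_garch_lstm_ann(window: int = 252, n_features: int = 22,
                         n_layers: int = 6, n_neurons: int = 20) -> tf.keras.Model:
    """GARCH-LSTM-ANN sketch: explanatory variables plus the GARCH forecast
    (n_features columns) over a one-year window predict the realized volatility RV_t."""
    inp = layers.Input(shape=(window, n_features))
    x = layers.LSTM(n_neurons)(inp)                    # memory (LSTM) block
    for _ in range(n_layers):                          # ANN block with configuration ANN(l, n)
        x = layers.Dense(n_neurons, activation="relu")(x)
    out = layers.Dense(1)(x)                           # predicted volatility
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

# The GARCH-BLSTM-ANN variant replaces the LSTM block with
# layers.Bidirectional(layers.LSTM(n_neurons)); dropping the GARCH feature column
# gives the plain LSTM-ANN / BLSTM-ANN benchmarks.
```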
3. Empirical analysis
We collect data from January 1, 2008 to December 31, 2018, totaling eleven years. The data
source is the WIND database. A group of explanatory variables are utilized, which could
potentially improve the volatility forecasts. The detailed description of the explanatory
variables is given in Table 1. The variables are related to the main metal prices, the main stock
market indices, currency, the main metal futures, and interest rate.
Table 1: Variables and descriptions.
Metal markets CP Copper spot price of Yangtze River nonferrous metals, China
Interest rate SHI Shanghai Interbank Offered Rate
We use the daily closing prices of these variables to compute their daily returns. Missing values are filled with the last valid values of previous trading days. Then, each dataset is sequentially divided into two sets, i.e., the training set, which contains 70% of the data, and the testing set, which contains the remaining 30% of the data. Table 2 presents the descriptive statistics for the daily returns of these variables.
Note: The critical value at 5% for Jarque–Bera test is 5.99. ADF is the stationarity test, and the critical
value at 5% is −2.86. The ARCH(12) statistic corresponds to the ARCH-LM test with 12 lags, where the
probability distribution is $\chi^2(12)$ and the critical value at the 5% level is 21.03.
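The preprocessing described above (forward-filling missing prices and the sequential 70/30 split) can be sketched as follows; the placeholder data and the pandas-based implementation are our own assumptions.

```python
import numpy as np
import pandas as pd

# Placeholder price panel; in practice, the 21 daily closing-price series listed in Table 1.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2008-01-01", "2018-12-31")
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(len(dates), 21)), axis=0)),
    index=dates,
)

prices = prices.ffill()                                          # carry forward last valid price
returns = ((np.log(prices) - np.log(prices.shift(1))) * 100).dropna()
split = int(len(returns) * 0.7)                                  # sequential 70/30 split
train, test = returns.iloc[:split], returns.iloc[split:]
```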
From Table 2, it is shown that the mean copper return (CP) is close to 0 (-0.0075%), and its standard deviation is close to 1%. The ADF test is significant at 5%, which indicates that the time series are stationary. However, the Jarque–Bera (normality) test shows that the copper returns are not normally distributed. Besides, ARCH(12) is the heteroskedasticity test used to identify the presence of ARCH effects in copper returns; the null hypothesis is rejected, indicating the existence of ARCH effects. Moreover, Table 2 shows that the other explanatory variables also exhibit complex statistical features, such as non-normality and ARCH effects. These complex statistical features imply that it is difficult to model these time series with a single conventional method, which motivates the combination of conventional statistical methods (e.g., GARCH) and state-of-the-art deep learning techniques (e.g., LSTM) to improve copper price volatility prediction.
Next, we depict the copper price, as well as its three-week variance, from 2008 to 2018 in Figure 5.
Figure 5: Time series plots for daily closing price and the variance of three weeks.
From Figure 5, it can be observed that, in the second half of 2008, the price of copper
suddenly dived, coinciding with the subprime crisis. This unexpected crash greatly increased
the subsequent market volatility in the second half of 2008 and the beginning of 2009.
The lower the correlation between the explanatory variables, the more information the models can extract to achieve a better fit. For this reason, the explanatory variables are studied with correlation analysis and principal component analysis. Both analyses are conducted in terms of returns. The heat map of the return correlation coefficient matrix is shown in Figure 6.
Figure 6: Heat map of return correlation coefficient matrix.
From Figure 6, we can observe that the correlation coefficients are dispersed, which means that we need suitable methods to extract useful information. Also, from the principal component analysis (shown in the Appendix), we can find that with 19 eigenvectors, an explained variance ratio of 98.85% is achieved, and 99.45% is achieved with 20 eigenvectors. These results also show that the information carried by the explanatory variables is spread over many dimensions rather than concentrated in a few factors.
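The principal component analysis reported in the Appendix can be reproduced along the following lines; the use of scikit-learn and the placeholder input matrix are our assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
returns_scaled = rng.normal(size=(2700, 21))   # placeholder for the standardized return matrix

pca = PCA()                                    # full decomposition over the 21 variables
pca.fit(returns_scaled)
cum_var = np.cumsum(pca.explained_variance_ratio_)    # cumulative explained variance (CuP)
n_components_99 = int(np.searchsorted(cum_var, 0.99)) + 1
```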
In this section, the six different types of prediction models are empirically tested and compared, i.e., ANN, LSTM-ANN, BLSTM-ANN, GARCH-ANN, GARCH-LSTM-ANN, and GARCH-BLSTM-ANN. The empirical study aims to unravel four main questions:
1) Is the inclusion of GARCH forecasts as network inputs an effective way to enhance the prediction performance? 2) Can the integration of RNNs with GARCH forecasts significantly reduce the prediction errors? 3) Which neural network architecture, particularly the choice between LSTM and BLSTM, works better for the task of copper price volatility prediction? 4) How should the ANN configuration (e.g., the number of layers and neurons) in the hybrid model be fine-tuned?
To this end, different choices of the number of hidden layers and neurons in each layer are tested and compared. Each ANN configuration is denoted by ANN(l, n), where l corresponds to the number of hidden layers and n corresponds to the number of neurons in each layer. We set the hidden layers and neurons of the ANN according to the results of Kristjanpoller and Hernández (2017); thus, $l \in \{4, 5, 6\}$ and $n \in \{10, 20\}$. As for the GARCH model, we have tested the model performance for various parameter values ($p$ and $q$, see equation (10)) using the copper price data, and found that in general the best choice is $p = 1$ and $q = 1$. Thus, the GARCH(1,1) model is adopted in the following experiments.
In the first set of experiments, we use one year of trading data (i.e., a one-year input window of 252 trading days) to forecast the volatility over the next two weeks (two-week ahead forecasts, or $T = 10$ in equation (2)). Four measures of prediction errors (i.e., MSE, MAPE, MAE, and RMSE) are utilized to evaluate the prediction performances. To facilitate the prediction error analysis, all returns are scaled up by a factor of 100. The results are presented in Table 3.
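Before turning to the results, we note that the pairing of one-year input windows with the forward-looking volatility targets can be sketched as follows; whether the window ends on day t-1 or day t, and the exact feature layout, are our assumptions.

```python
import numpy as np

def make_windows(features: np.ndarray, rv: np.ndarray, window: int = 252):
    """Pair each 252-day feature window with the realized volatility RV_t that
    starts at the first day after the window (cf. Eq. (2))."""
    X, y = [], []
    for t in range(window, len(rv)):
        X.append(features[t - window:t])   # one-year history up to day t-1
        y.append(rv[t])                    # forward-looking target at day t
    return np.asarray(X), np.asarray(y)
```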
Table 3: Prediction performances of two-week ahead forecasts for different hybrid neural network configurations. (Var% is the percentage variation of the error measures.)

[Table 3 body omitted: Panels A and B report the MSE, MAPE, MAE, and RMSE (with Var%) for groups S1-S3 of each model and ANN(l, n) configuration.]
From Table 3, we have the following key observations. First, although the network models can outperform the GARCH model on their own, incorporating GARCH forecasts as inputs can further enhance the prediction power of the network models. For example, the MSE of GARCH-LSTM-ANN(6,20) is 3.56E-03, which is smaller than that of LSTM-ANN(6,20), with an MSE of 4.35E-03. The MAPE of the GARCH-augmented model is likewise lower than that of its counterpart, for which the MAPE is 4.4011. Similar results can be found for MAE and RMSE, in spite of a few exceptions. Second, both LSTM and BLSTM can improve the prediction performance of the ANN: the GARCH-LSTM-ANN and GARCH-BLSTM-ANN models outperform the GARCH-ANN model, and both the LSTM-ANN and BLSTM-ANN models demonstrate better performances than the ANN model, with only a few exceptions.

Next, the GARCH-LSTM-ANN architecture is better than the other five network architectures. The MSE, MAPE, MAE, and RMSE of the GARCH-LSTM-ANN models are the lowest in general, with only one exception: the MAE of LSTM-ANN(6,10) and that of the corresponding GARCH-LSTM-ANN model are almost the same. Then, let us focus on the GARCH-LSTM-ANN models. In terms of MSE, the GARCH-LSTM-ANN architecture with the 6-layer, 20-neuron ANN configuration is the best, whereas a different configuration gives the best prediction model in terms of MAE. These results suggest that the ANN configuration should be fine-tuned according to the chosen error measure.
We next investigate the case of three-week ahead forecasts ($T = 15$ in equation (2)). The results are presented in Table 4.
Table 4: Prediction performances of three-week ahead forecasts for different hybrid neural network configurations. (Var% is the percentage variation of the error measures.)

[Table 4 body omitted: Panels A and B report the MSE, MAPE, MAE, and RMSE (with Var%) for groups S1-S3 of each model and ANN(l, n) configuration.]
From Table 4, we have the following key observations. First, we can find that the GARCH model underperforms the network models, and incorporating GARCH forecasts as inputs can still enhance the prediction power of the network models in general. This can be illustrated by the comparison between the LSTM-ANN and GARCH-LSTM-ANN models. As for the comparison between BLSTM-ANN and GARCH-BLSTM-ANN, the MSE values of the GARCH-BLSTM-ANN models are generally lower than those of the BLSTM-ANN models. Similar results can be found for the other error measures (MAPE, MAE and RMSE) and for the comparison between the ANN and GARCH-ANN models. Besides, the GARCH-BLSTM-ANN architecture appears to be better than the other five network architectures, since the MSE, MAPE, MAE, and RMSE of the GARCH-BLSTM-ANN models are always the lowest.
We next examine the best network structure. It is clear that both LSTM-ANN and BLSTM-
ANN models demonstrate better performances than ANN models. Thus, the incorporation of
LSTM or BLSTM is effective. As for the ANN configurations (hidden layers and neurons), it
still appears that the best configuration should be dependent on the choice of error measures.
After the investigation of the two-week and three-week ahead forecasts, we then turn to four-week ahead forecasts ($T = 20$ in equation (2)) to analyze the robustness of the results. The results are presented in Table 5.
Table 5: Prediction performances of four-week ahead forecasts for different hybrid neural network configurations. (Note: Var% is the percentage variation of the error measures.)

[Table 5 body omitted: Panels A and B report the MSE, MAPE, MAE, and RMSE (with Var%) for groups S1-S3 of each model and ANN(l, n) configuration.]
From Table 5, we have the following key observations. First, we can find that incorporating GARCH forecasts as inputs can enhance the prediction power of the network models. For example, the MSE of GARCH-BLSTM-ANN(5,20) is smaller than that of BLSTM-ANN(5,20), of which the MSE is 4.08E-03. Besides, it is clear that both LSTM and BLSTM can improve the prediction performance of the ANN, which is consistent with the former observations for the two-week and three-week forecasts.

Next, the GARCH-LSTM-ANN architecture still shows better predictive power than the other five network architectures in general. For example, the MAPE of GARCH-LSTM-ANN(6,10) is 0.9237; while this is the worst among all GARCH-LSTM-ANN configurations, it is still better than the other prediction models. Then, let us examine the influence of the network configuration to optimize the prediction results. We find that when prediction errors are measured in terms of MSE, MAPE, MAE, and RMSE, the best prediction models are not identical; that is, the optimal configurations (hidden layers and neurons) of the prediction model differ slightly across error measures.
To sum up, we can draw the following conclusions from Tables 3, 4 and 5. First, incorporating GARCH forecasts as inputs can enhance the prediction power of the network models in general. Second, the integration of memory networks (i.e., LSTM and BLSTM) with the classical ANN can improve the prediction power of the entire neural network. Third, the best model architecture for volatility prediction is GARCH-LSTM-ANN for two-week and four-week ahead forecasts, whereas for three-week ahead forecasts the best architecture is GARCH-BLSTM-ANN.
Further, the predicted and realized copper price volatilities in both the training and testing sets are shown in Figure 7. The cases of two-week, three-week, and four-week ahead forecasts are all considered. Figure 7 shows that the deviation between the predicted and realized volatilities generally becomes larger as the forecast horizon grows from two weeks to four weeks. This is consistent with the intuition that it should be easier to make predictions about the nearer future. Moreover, from Figure 7, it seems that the forecasting accuracy is not dependent on the volatility level. For instance, for the two-week and three-week ahead forecasts in the testing set, the realized volatilities around November 2016 are extremely high, whereas the realized volatilities around August 2017 are relatively low. However, there are significantly large prediction errors around both November 2016 and August 2017. Thus, the size of the prediction error is not necessarily dependent on the level of the realized volatility.
[Figure 7: Predicted and realized copper price volatilities in the training and testing sets for the two-week, three-week, and four-week ahead forecasts.]
4. Conclusions
We have investigated the integration of the deep learning methodology and the GARCH model to improve the prediction of copper price volatility. Facing the long memory phenomenon in time series data, recurrent neural networks (RNNs) are inherently suitable to distill information from sequences of inputs. Thus, the conventional ANN is combined with two RNNs (LSTM and BLSTM) to generate hybrid neural networks. Besides, GARCH models are trained and their best forecasts are used as extra inputs to augment the training data for the hybrid neural networks. As a result, six different types of hybrid neural network models (i.e., ANN, LSTM-ANN, BLSTM-ANN, GARCH-ANN, GARCH-LSTM-ANN, and GARCH-BLSTM-ANN) are developed and tested. We further conduct a series of sensitivity analyses of the ANN configuration (i.e., hidden layers and neurons) for each neural network architecture to optimize the prediction results.
Through the empirical analyses, we have several major findings that can contribute to the literature. First, we find that the GARCH forecasts can serve as informative features to substantially boost the volatility prediction, which complements the results of Kristjanpoller and Hernández (2017). We also find that incorporating RNNs (LSTM and BLSTM) into the hybrid GARCH–ANN network can further improve the volatility prediction performance; such a finding highlights the efficacy of the deep learning methodology and advances the results of Kristjanpoller and Hernández (2017). Besides, we have conducted a variety of experiments to find the best architecture of the hybrid neural network. In particular, we find that the choice between LSTM and BLSTM networks should depend on the forecast horizon: BLSTM works better for three-week ahead volatility forecasts, while LSTM works better for the other cases. Further, the empirical results suggest that the ANN configuration in the hybrid model should be optimized depending on the choice of the measure of prediction errors. These findings show that the prediction of copper price volatility can be effectively improved through neatly integrating GARCH forecasts with the ANN, LSTM, and BLSTM networks. This proposed hybrid approach, we hope, will motivate further investigation into the exploration and exploitation of deep learning methodology in solving time series forecasting problems.
Appendix
Table A.1: Principal component analysis results. (Note: CuP represents cumulative proportion
of explained variance)
Panel A
Variable PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
AP 0.21 -0.24 -0.14 -0.10 0.57 0.05 0.09 0.14 -0.13 0.08
CP 0.27 -0.27 -0.05 -0.23 -0.06 0.08 0.10 -0.20 -0.05 0.16
CSI 0.15 -0.01 -0.12 -0.05 -0.40 -0.04 -0.28 0.73 0.14 0.18
DJIA 0.14 0.32 -0.01 -0.47 0.00 0.17 0.17 0.08 0.09 -0.39
EUR 0.13 -0.13 0.48 -0.20 0.01 -0.04 -0.11 0.01 0.19 0.34
FTSE 0.20 0.32 -0.04 -0.40 -0.03 0.07 0.07 0.04 0.15 -0.16
GP 0.09 -0.13 0.55 -0.08 -0.07 0.07 0.31 0.00 0.17 0.24
HAP 0.25 -0.16 -0.14 0.07 0.47 0.03 -0.02 0.37 0.02 -0.05
HCP 0.29 -0.23 -0.06 0.01 -0.17 0.07 0.01 0.04 0.05 0.01
HZP 0.29 -0.21 -0.07 0.11 -0.24 0.06 0.04 0.05 -0.20 -0.21
LAP 0.26 0.02 0.09 0.23 0.24 -0.18 -0.12 -0.11 0.48 -0.23
LCP 0.32 0.00 0.04 0.09 -0.08 -0.04 -0.05 -0.17 0.27 -0.10
LMAP 0.22 0.34 0.09 0.23 0.20 -0.10 -0.06 -0.03 -0.05 0.22
LMCP 0.27 0.33 0.00 0.10 -0.02 0.01 0.01 -0.04 -0.17 0.24
LMZP 0.25 0.31 0.02 0.26 -0.10 -0.03 -0.01 -0.02 -0.35 0.16
LZP 0.30 -0.01 0.03 0.26 -0.20 -0.07 -0.05 -0.16 0.06 -0.31
OIL 0.17 0.30 0.01 -0.17 0.09 0.13 0.10 0.00 -0.15 0.19
SHI -0.01 0.00 0.11 0.07 0.07 0.81 -0.54 -0.13 0.01 -0.02
USD -0.04 0.04 -0.36 0.31 -0.09 0.42 0.55 0.01 0.40 0.26
YEN -0.07 0.02 0.50 0.28 0.04 0.20 0.33 0.37 -0.22 -0.37
ZP 0.25 -0.30 -0.06 -0.11 -0.14 0.08 0.15 -0.18 -0.35 -0.03
CuP (%) 30.39 42.42 49.41 54.77 59.76 64.62 69.16 73.39 77.12 80.67
Panel B
Variable PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC18 PC19 PC20 PC21
AP 0.03 -0.14 0.05 0.19 0.28 -0.20 -0.08 0.29 0.46 0.05 -0.04
CP -0.12 -0.05 -0.32 0.31 -0.02 0.01 0.36 -0.19 -0.25 0.42 -0.30
CSI -0.09 -0.07 0.10 0.31 0.06 -0.07 -0.06 0.01 -0.02 0.01 -0.01
DJIA 0.17 -0.10 -0.01 0.05 -0.39 -0.47 -0.02 0.00 0.00 0.01 0.05
EUR 0.59 0.40 -0.08 -0.06 0.01 -0.05 -0.03 0.01 0.06 -0.02 0.00
FTSE 0.13 -0.16 0.01 -0.11 0.43 0.61 0.11 0.02 0.05 -0.04 -0.01
GP -0.33 -0.44 0.37 -0.19 -0.01 -0.07 -0.01 0.00 -0.02 -0.01 -0.01
HAP 0.04 0.03 0.05 -0.43 -0.05 0.01 0.07 -0.41 -0.41 -0.07 0.04
HCP -0.21 0.05 -0.39 -0.26 -0.28 0.08 0.34 0.24 0.29 -0.44 0.14
HZP 0.14 0.08 0.23 -0.31 -0.28 0.24 -0.28 0.16 0.19 0.47 -0.20
LAP -0.06 0.08 0.16 0.27 -0.16 0.13 0.07 0.46 -0.31 0.02 -0.04
LCP -0.21 0.05 -0.32 -0.06 0.26 -0.15 -0.41 -0.21 0.08 0.25 0.48
LMAP 0.07 -0.22 -0.02 0.28 -0.44 0.28 -0.03 -0.39 0.33 -0.01 0.04
LMCP -0.03 -0.11 -0.31 -0.14 0.08 -0.12 -0.42 0.22 -0.25 -0.23 -0.48
LMZP 0.23 -0.14 0.09 -0.11 0.15 -0.20 0.43 0.25 -0.15 0.22 0.39
LZP 0.08 0.08 0.26 0.04 0.30 -0.28 0.22 -0.34 0.23 -0.25 -0.37
OIL -0.47 0.67 0.29 0.03 -0.01 0.01 0.04 -0.01 0.05 0.02 0.03
SHI -0.02 -0.09 0.05 0.02 0.01 0.01 0.00 0.01 -0.01 0.00 0.00
YEN -0.02 0.14 -0.31 0.22 0.11 0.12 0.03 0.00 -0.03 0.03 -0.01
ZP 0.15 -0.02 0.22 0.36 -0.02 0.15 -0.27 -0.04 -0.28 -0.41 0.31
CuP (%) 83.88 86.87 89.58 91.99 93.95 95.71 96.94 97.92 98.85 99.45 100
References
Azadeh, A., Moghaddam, M., Khakzad, M., & Ebrahimipour, V. (2012). A flexible neural
Bentes, S.R. (2015). Forecasting volatility in gold returns under the GARCH, IGARCH and
Bergerson, K., & Wunsch, D. C. (1991). A commodity trading model based on a neural
Bildirici, M., & Ersin, Ö. Ö. (2013). Forecasting oil prices: Smooth transition and neural
Cui, L., Huang, K., & Cai, H. J. (2015). Application of a TGARCH-wavelet neural network to
arbitrage trading in the metal futures market in China. Quantitative Finance, 15(2), 371–
384.
Davidson, J. (2004). Moment and memory properties of linear conditional heteroscedasticity models, and a new model. Journal of Business & Economic Statistics, 22(1), 16–29.
Ding, Z., Engle, R., & Granger, C. (1993). A long memory property of stock market returns and
Eapen, J., Bein, D., & Verma, A. (2019). Novel deep learning model with CNN and bi-
directional LSTM for improved stock market index prediction. IEEE 9th Annual
Ederington, L. H., & Lee, J. H. (1993). How Markets Process Information: News Releases and
Volatility. Journal of Finance, 48(4), 1161-1191.
Fuertes, A. M., Izzeldin, M., & Kalotychou, E. (2009). On forecasting daily stock volatility: the
25(2), 259-281.
García, D., & Kristjanpoller, W. (2019). An adaptive forecasting approach for copper price
volatility through hybrid and non-hybrid models. Applied Soft Computing, 74, 466-478.
Graves, A., Liwicki, M., Fernández, S., Bertolami, R., Bunke, H., & Schmidhuber, J. (2009). A
Hamid, S. A., & Iqbal, Z. (2004). Using neural networks for forecasting volatility of S&P 500
Henriquez, J., & Kristjanpoller, W. (2019). A combined Independent Component Analysis–Neural Network model for forecasting exchange rate variation. Applied Soft Computing.
Herman, M., & Schrauwen, B. (2013). Training and analyzing deep recurrent neural networks.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8),
1735–1780.
Kim, H. Y. & Won C. H. (2018). Forecasting the volatility of stock price index: A hybrid model
integrating LSTM with multiple GARCH-type models. Expert Systems with Applications,
103, 25-37.
Kristjanpoller, W., Fadic, A., & Minutolo, M. C. (2014). Volatility forecast using hybrid neural
network models. Expert Systems with Applications, 41(5), 2437-2442.
Kristjanpoller, W., & Hernández P, E. (2017). Volatility of main metals forecasted by a hybrid
ANN-GARCH model with regressors. Expert Systems with Applications 84, 290-300.
Kristjanpoller, W., & Minutolo, M. (2015). Gold price volatility: A forecasting approach using
the artificial neural network–GARCH model. Expert System with Applications, 42(5),
7245-7251.
Lahmiri, S. (2017). Modeling and predicting historical volatility in exchange rate markets.
Physica A: Statistical Mechanics and its Applications, 471, 387-395.
Li, X. G. & Wu, X. H. (2015). Constructing long short-term memory based deep recurrent
neural networks for large vocabulary speech recognition. 2015 IEEE International
Liu, G., & Guo, J. (2019). Bidirectional LSTM with attention mechanism and convolutional
Lu, X., Que, D., & Cao, G. (2016). Volatility forecast based on the hybrid artificial neural
Maknickienė, N., & Maknickas, A. (2012). Application of neural network for forecasting of
exchange rates and forex trading. Proceedings of the 7th international scientific
Nelson, D. M. Q., Pereira, A. C. M., & Oliveira, R. A. d. (2017). Stock market's price movement prediction with LSTM neural networks. 2017 International Joint Conference on Neural Networks (IJCNN), 1419-1426.
Parisi, A., Parisi, F., & Díaz, D. (2008). Forecasting gold price changes: Rolling and recursive
Petropoulos, A., Chatzis, S. P., Siakoulis, V., & Vlachogiannakis, N. (2017). A stacked
generalization system for automated FOREX portfolio trading. Expert Systems with
Sardelich, M. & Manandhar, S. (2018). Multimodal deep learning for short-term stock volatility
prediction. arXiv:1812.10479.
Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE
Sermpinis, G., Dunis, C. L., Laws, J., & Stasinakis, C. (2012). Forecasting and trading the
EUR/USD exchange rate with stochastic Neural Network combination and time-varying
Sharda, R., & Patil, R. B. (1992). A connectionist approach to time series prediction: an
Wu, Y. X., Wu, Q. B., & Zhu, J. Q. (2019). Improved EEMD-based crude oil price
forecasting using LSTM networks. Physica A: Statistical Mechanics and its Applications,
516, 114-124.
Yazdani-Chamzini, A., Yakhchali, S. H., Volungevičienė, D., & Zavadskas, E. K. (2012).
Forecasting gold price changes by using adaptive network fuzzy inference system. Journal
Zhang, J., Zhang, Y., & Zhang, L. (2015). A novel hybrid method for crude oil price forecasting.
Highlights:
1. We develop a novel hybrid deep learning method to improve forecasts of copper price volatility
2. The hybrid method combines GARCH with neural network models (ANN and LSTM)
3. The empirical results confirm the effectiveness of the proposed method for volatility forecasts
4. The choices between various network configurations are examined to optimize the forecasts
Yan Hu: Data curation, Software, Formal analysis, Visualization, Writing - Original draft
preparation. Jian Ni: Conceptualization, Methodology, Supervision, Resources, Writing -
Reviewing and Editing. Liu Wen: Software, Validation.
Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.
☐The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests: