

Financial Time Series Forecasting
Using Deep Learning Network

Preeti, Ankita Dagar, Rajni Bala, and Ram Pal Singh

Department of Computer Science, Deen Dayal Upadhyaya College,
University of Delhi, New Delhi, India
[email protected]

Abstract. The analysis of financial time series for predicting future developments has been a challenging problem for decades. A forecasting technique based upon the machine learning paradigm and a deep learning network, namely the Extreme Learning Machine with Auto-encoder (ELM-AE), is proposed. The efficacy and effectiveness of ELM-AE have been compared with several existing forecasting methods: Generalized Autoregressive Conditional Heteroskedasticity (GARCH), General Regression Neural Network (GRNN), Multi-Layer Perceptron (MLP), Random Forest (RF) and Group Method of Data Handling (GMDH). Experimental results have been computed on two different time series datasets, namely Gold Price and Crude Oil Price. The results indicate that the implemented model outperforms the existing models in terms of quantitative measures such as mean squared error (MSE).

Keywords: Auto-encoder · Deep learning · ELM · Forecasting · Time series

1 Introduction
Financial time series [3] are inherently noisy and nonstationary. A nonstationary time series is one whose statistical parameters, such as the mean, median and standard deviation, change over time. These characteristics continuously alter the relationship between the input and output variables. It has been observed in the literature that, during forecasting, recent observations have more impact than distant observations. The same approach can be applied when forecasting nonstationary time series [14].
The literature shows that a number of models have been built upon time series data for forecasting and prediction analysis. The two widely used families of techniques for time series forecasting are statistical and computational methods. Statistical methods are used by economists for price forecasting studies. Popular time series forecasting techniques include exponential smoothing and autoregressive models such as the Autoregressive Integrated Moving Average (ARIMA) [10] and the GARCH family of models [18]. However, the recent trend shows that machine learning algorithms have outperformed classical statistical techniques [7]. Among the various machine learning techniques, Support Vector Regression (SVR) and Artificial Neural Networks (ANN) [11] have been widely used to forecast time series data. Oancea [10] used an ANN to forecast crude oil price data. Another variant of the ANN, the Multi-Layer Perceptron (MLP), has also been applied to Gold-Price and Crude-Oil Price data [12]. However, the above-mentioned methods can easily fall into local minima, making it difficult to achieve a globally optimal solution. Besides MLP, several other techniques such as GARCH, GRNN, GMDH and RF have also been used for prediction on time series data [12]. Apart from neural network family models, Kim [8] studied the SVR technique for stock market price prediction. Furthermore, SVR with adaptive parameters has been used for financial forecasting and achieved good generalization performance [4]. Since financial time series data is prone to noise, modelling it with SVR can lead to overfitting and underfitting problems [4].
Recently, deep learning (DL) [5,15] has gained attention in pattern recognition, computer vision, natural language processing, bio-informatics and several other machine learning fields, since it performs strongly on a number of difficult tasks. It is evident from the related work that several machine learning techniques have been used for forecasting different time series, but very few efforts have been made to study this problem by combining machine learning with deep learning. Kuremoto et al. [9] and Shen et al. [16] used deep belief networks for financial market prediction. However, one deep learning approach, namely the autoencoder, has rarely been explored for time series prediction. Therefore, this paper focuses on combining an autoencoder with a machine learning technique to build a forecasting model.
The remainder of the paper is structured as follows. Section 2 gives a brief overview of ELM and the ELM auto-encoder, together with an overview of deep learning. Section 3 describes the proposed algorithm along with a description of the financial time series data. In Sect. 4, experimental results are presented and discussed. Finally, conclusions and future perspectives are discussed in Sect. 5.

2 Brief Theories About ELM and Auto-Encoder

This section presents an overview of the basics of the Extreme Learning Machine and the ELM Auto-encoder.

2.1 Extreme Learning Machine

Extreme Learning Machine (ELM) is an efficient learning method proposed to train single-hidden-layer feed-forward neural networks (SLFNs) [6]. An SLFN is a feed-forward neural network with a single hidden layer connecting the input layer to the output layer. For the past decades, the Back-Propagation (BP) algorithm, based upon gradient descent, has been used for training SLFNs. However, gradient-descent-based learning methods are usually very slow due to iterative learning, and they might not always converge to the global minimum. Because of these issues, such learning algorithms for feed-forward neural networks might not lead to a well-generalized solution. Unlike these traditional learning algorithms, ELM trains an SLFN by choosing the input weights and hidden layer biases arbitrarily, so there is no iterative tuning of parameters. After choosing the input weights and hidden layer biases randomly, the SLFN can be treated as a linear system, and the output weights can then be determined through a simple generalized inverse operation on the hidden layer output matrix.
Given a training set of $N$ random samples $S = \{(x_i, t_i) \mid x_i = [x_{i1}, \cdots, x_{in}]^T \in \mathbb{R}^n,\ t_i = [t_{i1}, \cdots, t_{im}]^T \in \mathbb{R}^m,\ i = 1, 2, \cdots, N\}$, ELM is mathematically modeled as [6]

$$\sum_{i=1}^{\tilde{N}} \beta_i g_i(x_j) = \sum_{i=1}^{\tilde{N}} \beta_i g(w_i \cdot x_j + b_i) = o_j, \qquad j = 1, \cdots, N \tag{1}$$

where $w_i = [w_{i1}, \cdots, w_{in}]^T$ is the weight vector connecting the input layer to the $i$th hidden node, $n$ is the number of input features, $m$ is the number of output classes, $\tilde{N}$ is the number of hidden nodes, and $g(x)$ is the (infinitely differentiable) activation function. In this study, the radial basis function (RBF) has been used as the activation function. The steps used by ELM for training an SLFN are as follows:

1. For $i = 1, 2, \cdots, \tilde{N}$, the input weights $w_i$ and biases $b_i$ are chosen randomly and fixed.
2. In the next step, the hidden layer output matrix of the network, denoted by $H$, is computed as

$$H(w_1, \cdots, w_{\tilde{N}}, b_1, \cdots, b_{\tilde{N}}, x_1, \cdots, x_N) = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \cdots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}} \tag{2}$$

where the $i$th column of $H$ is the output of the $i$th hidden node with respect to the inputs $x_1, x_2, \cdots, x_N$.
3. The output weight vector $\beta$ connecting the hidden nodes and the output nodes is calculated as $\beta = H^{\dagger} T$, where $T = [t_1, \cdots, t_N]^T$ and $H^{\dagger}$ is the Moore–Penrose generalized inverse of the matrix $H$ [13].
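
For concreteness, the following is a minimal NumPy sketch of this training procedure. The helper names (elm_train, elm_predict) and the Gaussian form used here for the RBF activation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def elm_train(X, T, n_hidden, gamma=1.0, seed=0):
    """Sketch of ELM training (steps 1-3); X: (N, n) inputs, T: (N, m) targets."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # Step 1: input weights and hidden biases chosen randomly and kept fixed.
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, n))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden layer output matrix H (Eq. 2); assumed RBF form:
    # a Gaussian applied to the affine projection w_i . x_j + b_i.
    H = np.exp(-gamma * (X @ W.T + b) ** 2)
    # Step 3: output weights via the Moore-Penrose pseudoinverse.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta, gamma=1.0):
    """Forward pass with the fixed random hidden layer and learned beta."""
    H = np.exp(-gamma * (X @ W.T + b) ** 2)
    return H @ beta
```

Since the hidden parameters are never tuned, the only fitted quantity is $\beta$, which is why training reduces to a single least-squares solve.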

2.2 Auto-Encoder
Artificial neural networks with deep architectures have become a powerful tool for representing the high-level abstract features of high-dimensional data. Based on the concept of ELM, the extreme learning machine with auto-encoder (ELM-AE) [17] was proposed as an unsupervised learning algorithm. The aim of an auto-encoder is to learn a new encoding for a set of data using a deep learning architecture. The basic idea of ELM-AE is a two-stage process. Given a dataset

$X = [x_1^T, x_2^T, \cdots, x_N^T]$, where $N$ represents the number of samples: in the first stage, the $n_i$ input features of the original data are mapped onto $n_h$ hidden neurons. Depending upon the relative sizes of $n_i$ and $n_h$, three different architectures of ELM-AE are possible: (1) a compressed architecture, where $n_i > n_h$; (2) an equal-dimension architecture, where $n_i = n_h$; (3) a sparse architecture, where $n_i < n_h$. The mapping of an input $x_i$ with $n_i$ features to the $n_h$-dimensional space is calculated as

$$h(x_i) = g(a^T x_i + b) \tag{3}$$
where $h(x_i) \in \mathbb{R}^{n_h}$ is the hidden layer output vector with respect to $x_i$, $a$ and $b$ are the input weight matrix and the biases of the hidden units respectively, and $g(\cdot)$ is an activation function, which can be any nonlinear piecewise continuous function. The output vector of the auto-encoder is then calculated using

$$f(x_i) = h(x_i)^T \beta, \qquad i = 1, 2, \ldots, N \tag{4}$$

In the second stage of ELM-AE, the output layer weight matrix $\beta$ is computed by minimizing the error loss function. The closed-form solution for $\beta$ is

$$\beta = (H^T H + I_{n_h}/C)^{-1} H^T X \tag{5}$$

where $C$ is a regularization factor, i.e. a penalty coefficient on the training error. Given the original data $X$, the new enriched data $X_{new}$, which is the representation of $X$ in the $n_h$-dimensional space, is obtained as $X_{new} = X\beta^T$. This $X_{new}$ dataset is then used to build an ELM model for forecasting the time series.
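
A minimal NumPy sketch of the two stages follows; the function name and the tanh activation are illustrative assumptions (Eq. 5 holds for any admissible activation), and this is not the authors' exact code.

```python
import numpy as np

def elm_ae_features(X, n_hidden, C=1.0, seed=0):
    """Sketch of ELM-AE: returns X_new, the n_h-dimensional re-representation of X."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    # Stage 1: random mapping of the n_i input features to n_h hidden units (Eq. 3).
    A = rng.uniform(-1.0, 1.0, size=(n_in, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ A + b)
    # Stage 2: closed-form, L2-regularized output weights (Eq. 5).
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ X)
    # Enriched data: representation of X in the n_h-dimensional space.
    return X @ beta.T        # X_new = X beta^T
```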

3 Proposed Methodology

The framework followed to forecast time series data is depicted in Fig. 1, which shows, as a flow chart, the stages followed to predict the forecast value of a time series.

Fig. 1. System flow for forecasting financial time series.

The step-by-step procedure for forecasting a financial time series using the proposed ELM-AE is described as follows. Let $Y = (y_1, y_2, \cdots, y_k, y_{k+1}, \cdots, y_N)$ be the set of $N$ observations of a financial series recorded at times $t = (1, 2, \cdots, k, k+1, \cdots, N)$ respectively. Then the output variable, i.e. the label, is forecasted as follows:

1. Consider a memory order M = 3. The memory order M denotes the number of previous values on which the current value $y_{cur}$ at time $t_{cur}$ depends. Formulate the data with M input features and an output variable (label), where $y_{M+1}$ depends on the previous M records of the series. The exact formulation of the data from the given series Y is as follows:

$$\begin{bmatrix} \text{Input features} & \text{Label} \\ y_1 \;\; y_2 \;\; \cdots \;\; y_M & y_{M+1} \\ y_2 \;\; y_3 \;\; \cdots \;\; y_{M+1} & y_{M+2} \\ y_3 \;\; y_4 \;\; \cdots \;\; y_{M+2} & y_{M+3} \\ \vdots \;\; \vdots \;\; \cdots \;\; \vdots & \vdots \\ y_{N-M} \;\; y_{N-M+1} \;\; \cdots \;\; y_{N-1} & y_N \end{bmatrix}_{(N-M) \times (M+1)} \tag{6}$$

2. Normalize the data obtained in Eq. 6 to the range [0, 1]. This transformation is required to remove the effect of differences between the smallest and largest values of a time series dataset.
3. The normalized dataset is divided into training and test sets in an 80/20 partition. The extreme learning machine algorithm is applied to the training set to obtain a model. While developing the forecasting model using ELM, a grid search is performed to obtain the best values for the parameters, i.e. the number of hidden neurons and the RBF parameter. The grid search is performed over the number of hidden neurons $\in \{1, 3, \cdots, 99, 101\}$ and the activation function parameter $\in \{2^{-6}, 2^{-5}, \cdots, 2^{0}, 2^{1}\}$. The obtained model is evaluated on the test set.

4. For each value of M from 4 to 7, repeat steps 1 to 3. The memory order is varied from 3 to 7 to determine the minimum number of previous records on which $y_{cur}$ depends.
5. The MSE obtained for every candidate memory order M is compared to find the best memory order, denoted best M, i.e. the one with the least mean squared error (a code sketch of this search is given below).
6. The best M is used to formulate the data again as in step 1, and the obtained data is normalized to the range [0, 1].
7. Next, the auto-encoder is applied to the normalized data with memory order best M, varying the number of hidden neurons $n_h$ from best M to 30. With $n_h$ hidden neurons, the auto-encoder produces a new representation of the feature set, which is used for further processing.
8. The enriched feature set is used to train ELM with the optimal parameters obtained from the grid search, yielding forecasts of the training set as in step 3, and the MSE is computed.
9. Steps 7 and 8 are repeated over the range of values of $n_h$, and the corresponding mean squared errors are stored.
10. The resultant model with the least MSE is found at some value of $n_h$; the predictions for the test set are then obtained using that model.
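
Combining the earlier sketches, steps 1-5 can be expressed roughly as below. The make_windows helper and the fixed hidden-layer size are illustrative assumptions, and the step-3 grid search over the number of hidden neurons and the RBF parameter is omitted for brevity.

```python
import numpy as np

def make_windows(y, M):
    """Steps 1/6: formulate the series as M lagged inputs plus a label (Eq. 6)."""
    X = np.stack([y[i:i + M] for i in range(len(y) - M)])
    return X, y[M:]

def select_memory_order(y, orders=range(3, 8), n_hidden=15):
    """Steps 1-5: pick the memory order M with the least test-set MSE."""
    best_M, best_err = None, np.inf
    lo, hi = y.min(), y.max()
    y01 = (y - lo) / (hi - lo)                 # step 2: scale to [0, 1]
    for M in orders:
        X, t = make_windows(y01, M)            # step 1: lagged inputs + label
        n_tr = int(0.8 * len(X))               # step 3: 80/20 split
        W, b, beta = elm_train(X[:n_tr], t[:n_tr, None], n_hidden)
        pred = elm_predict(X[n_tr:], W, b, beta)
        err = float(np.mean((t[n_tr:, None] - pred) ** 2))
        if err < best_err:                     # step 5: least MSE wins
            best_M, best_err = M, err
    return best_M
```

Steps 7-10 then repeat the same train-and-score loop, but first pass the windows for the selected best M through elm_ae_features (Sect. 2.2), sweeping the autoencoder's $n_h$ from best M to 30 and keeping the model with the least MSE.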
Computational complexity is measured here as the total number of function evaluations. The proposed ELM-AE algorithm follows a two-stage process. In the first stage, the autoencoder is applied, whose running time is O(mN), where m is the size of the random feature subspace and N is the number of training instances. The second stage runs the ELM algorithm, together with a grid search for the optimal parameters, which also takes O(mN). Therefore, the total computational complexity of ELM-AE is O(mN).

4 Experimental Design
This section briefly describes the datasets used for the experiments and the performance measure used to evaluate the proposed model.

4.1 Datasets Description


The datasets used for the experiments are based on daily US Dollar (USD) prices of two commodities, namely Gold and Crude Oil. These two datasets are used to test the effectiveness of the proposed forecasting model. The Gold Price data is obtained from [1] and the Crude-Oil Price data from [2]. The total number of observations in the Gold-Price data is 12,630 and in the Crude-Oil Price data is 7,760. Each dataset is partitioned into an 80% training set and a 20% test set. Thus, for the Gold-Price dataset the training set contains 10,104 records and the test set 2,526; correspondingly, the Crude-Oil Price dataset is partitioned into 6,208 and 1,552 records respectively.

4.2 Data Pre-Processing

The Gold-Price and Crude-Oil Price data used in this study are pre-processed before being used with the proposed model. The two major pre-processing steps are as follows:

1. Phase Space Reconstruction: The original time series is a single column giving the closing value at each time t, i.e. $Y = (y_1, y_2, \cdots, y_k, y_{k+1}, \cdots, y_N)$ is the set of N observations at times $t = (1, 2, \cdots, k, k+1, \cdots, N)$ respectively. In this phase, given a memory order M, the original data is converted into a set of input features and a label. The resulting dataset is used for further processing (see the toy example after this list).
2. Normalization: Different time series datasets may have different ranges of values, so after phase space reconstruction the obtained data is normalized to the range [0, 1]. This normalized data is then used for training and testing the model.
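
As a quick illustration of phase space reconstruction, applying the make_windows sketch from Sect. 3 to a toy series of eight values with memory order M = 3 gives five rows of three lagged inputs plus a label:

```python
>>> import numpy as np
>>> y = np.arange(1.0, 9.0)        # toy series y1, ..., y8
>>> X, t = make_windows(y, M=3)    # illustrative helper from Sect. 3
>>> X.shape, t.shape
((5, 3), (5,))
>>> X[0], float(t[0])              # first row: (y1, y2, y3) -> label y4
(array([1., 2., 3.]), 4.0)
```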

4.3 Performance Measures Used

Among several performance measures, the Mean Squared Error (MSE) is used to evaluate the performance of the proposed model on the time series data. It is defined as in Eq. 7:

$$MSE = \frac{\sum_{t=1}^{N} (y_t - \tilde{y}_t)^2}{N} \tag{7}$$

where N represents the total number of forecasts, and $y_t$ and $\tilde{y}_t$ are the actual and forecasted values at time t respectively. The MSE (Eq. 7) measures the average of the squared errors, i.e. the squared deviations of the forecasts from the true values. MSE is a useful accuracy measure for a continuous variable: the smaller the mean squared error, the better the model.
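
Eq. 7 translates directly into code; a minimal sketch:

```python
import numpy as np

def mse(y_actual, y_forecast):
    """Eq. 7: mean of squared deviations between actual and forecasted values."""
    y_actual, y_forecast = np.asarray(y_actual), np.asarray(y_forecast)
    return float(np.mean((y_actual - y_forecast) ** 2))
```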

5 Results and Discussion

The previous section discussed the experimental design of the proposed method. This section presents the results obtained for each dataset and discusses them.

5.1 Gold Price (USD)

ELM-AE is first analyzed on the gold price dataset. The proposed ELM-AE method, being based on the ELM technique, is sensitive to a user-defined parameter: the number of hidden neurons. Figure 2 depicts the MSE obtained while predicting the Gold Price on the test dataset with different numbers of hidden neurons. For a particular number of hidden neurons, the depicted MSE is the average MSE over 20 executions. As shown in Fig. 2, the minimum MSE of 0.000015 for the Gold Price data is obtained with 15 hidden neurons and memory order 5. Table 1 presents the MSE values obtained by the different forecasting models, including GARCH, MLP, GRNN, GMDH, RF, ELM and ELM-AE, for both the training and test sets of the Gold Price (USD) dataset. It can be observed from Table 1 that the proposed models are better than the other models in terms of MSE. Also, among the two forecasting methods used in this study, the MSE obtained with ELM-AE on the test set is better than with the ELM method. The actual gold price index and the predicted values obtained from the ELM-AE model are illustrated in Fig. 3. It can be observed that the predicted values of the gold price time series are very close to the actual values, showing that the deep learning based method for time series forecasting has good predictive capability.

Fig. 2. MSE obtained for Gold Price data

Table 1. Comparison of results for Gold-Price data using proposed and other models

Method      Training set MSE  Test set MSE
GARCH [12]  0.000565          0.001100
MLP [12]    0.001800          0.003200
GRNN [12]   0.000451          0.000918
GMDH [12]   0.000452          0.000903
RF [12]     0.000498          0.001100
ELM         0.0000186         0.0000158
ELM-AE      0.0000192         0.0000150

5.2 Crude-Oil Price (USD)

The performance of the proposed ELM-AE method is also evaluated on the Crude-Oil Price dataset. The proposed model was run for different numbers of hidden neurons. The MSE obtained while forecasting the Crude-Oil Price test dataset with different numbers of hidden neurons is depicted in Fig. 4.

Fig. 3. Predictions of test set of Gold Price series data.

Fig. 4. MSE obtained for Crude-Oil Price data

The MSE depicted in Fig. 4 is the average MSE over 20 runs; the minimum MSE of 0.0000465 is obtained with 29 hidden neurons and memory order 3. Table 2 presents the MSE values obtained by the different forecasting models, including GARCH, MLP, GRNN, GMDH, RF, ELM and ELM-AE, for both the training and test sets of the Crude-Oil Price (USD) dataset.
Table 2. Comparison of results for Crude-Oil Price data using proposed and other models

Method      Training set MSE  Test set MSE
GARCH [12]  0.003527          0.002351
MLP [12]    0.005400          0.002700
GRNN [12]   0.002100          0.001800
GMDH [12]   0.002700          0.001800
RF [12]     0.003500          0.002000
ELM         0.00005263        0.00005561
ELM-AE      0.00005631        0.00004651

It can be observed from Table 2 that the two models presented in this study are better, in terms of MSE, than the other models proposed in related work. The autoencoder was applied to the crude-oil price data with different numbers of hidden neurons to retrieve an enriched feature set, and the results show that the number of hidden neurons used by the autoencoder plays an important role in forecasting.

Fig. 5. Predictions of test set of crude oil price series data

Figure 5 depicts the actual and predicted values for the crude-oil series data. It can be seen in Fig. 5 that the actual and predicted values of the crude-oil data are very close to each other, showing that the prediction capability of the proposed ELM-AE method is good.

6 Conclusion
The main aim of this paper was to study a deep learning based technique for time series prediction. The paper proposed an autoencoder-based ELM algorithm, ELM-AE, to forecast different time series. Experiments were performed on two different financial time series, i.e. Gold Price (USD) and Crude-Oil Price (USD). The proposed ELM-AE yielded statistically significant improvements, in terms of MSE, over other forecasting models such as GARCH, MLP, GRNN, GMDH and RF on the two time series datasets. The strong performance of ELM-AE is due to applying a deep learning approach, the autoencoder, before the ELM algorithm, which yielded an enriched set of features for further modelling. The proposed auto-encoder-based ELM-AE model thus provides quite promising forecasting results, which outperform the other methods and indicate its potential for forecasting other financial and non-financial time series.

References
1. http://www.quandl.com/LBMA/GOLD-Gold-Price-London-Fixing
2. http://www.quandl.com/data/FRED/DCOILBRENTEU-Crude-Oil-Prices-
Brent-Europe

3. Abu-Mostafa, Y.S., Atiya, A.F.: Introduction to financial forecasting. Appl. Intell. 6(3), 205–213 (1996)
4. Cao, L.J., Tay, F.E.H.: Support vector machine with adaptive parameters in finan-
cial time series forecasting. IEEE Trans. Neural Netw. 14(6), 1506–1518 (2003)
5. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neu-
ral networks. Science 313(5786), 504–507 (2006)
6. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and appli-
cations. Neurocomputing 70(1), 489–501 (2006)
7. Karia, A.A., Bujang, I., Ismail, A.: Forecasting on crude palm oil prices using
artificial intelligence approaches. Am. J. Oper. Res. 3(2), 259 (2013)
8. Kim, K.J.: Financial time series forecasting using support vector machines. Neu-
rocomputing 55(1), 307–319 (2003)
9. Kuremoto, T., Kimura, S., Kobayashi, K., Obayashi, M.: Time series forecasting using a deep belief network with restricted Boltzmann machines. Neurocomputing 137, 47–56 (2014)
10. Oancea, B., Ciucu, Ş.C.: Time series forecasting using neural networks. arXiv
preprint arXiv:1401.1333 (2014)
11. Parisi, A., Parisi, F., Dı́az, D.: Forecasting gold price changes: rolling and recursive
neural network models. J. Multinational Financ. Manag. 18(5), 477–487 (2008)
12. Pradeepkumar, D., Ravi, V.: Forecasting financial time series volatility using par-
ticle swarm optimization trained quantile regression neural network. Appl. Soft
Comput. 58, 35–52 (2017)
13. Rao, C.R., Mitra, S.K.: Generalized inverse of matrices and its applications (1971)
14. Refenes, A., Bentz, Y., Bunn, D.W., Burgess, A.N., Zapranis, A.D.: Financial time
series modelling with discounted least squares backpropagation. Neurocomputing
14(2), 123–138 (1997)
15. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61,
85–117 (2015)
16. Shen, F., Chao, J., Zhao, J.: Forecasting exchange rate using deep belief networks
and conjugate gradient method. Neurocomputing 167, 243–253 (2015)
17. Sun, K., Zhang, J., Zhang, C., Hu, J.: Generalized extreme learning machine
autoencoder and a new deep neural network. Neurocomputing 230, 374–381 (2017)
18. Zhang, G.P.: Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50, 159–175 (2003)
