
MRC-LSTM: A Hybrid Approach of Multi-scale Residual CNN and LSTM to Predict Bitcoin Price

Qiutong Guo, Shun Lei, Qing Ye∗, Zhiyang Fang
School of Computer Science, Sichuan University, Chengdu, China
∗ indicates the corresponding author.

arXiv:2105.00707v1 [q-fin.TR] 3 May 2021

Abstract—Bitcoin, one of the major cryptocurrencies, presents great opportunities and challenges with its tremendous potential returns accompanying high risks. The high volatility of Bitcoin and the complex factors affecting it make the study of effective price forecasting methods of great practical importance to financial investors and researchers worldwide. In this paper, we propose a novel approach called MRC-LSTM, which combines a Multi-scale Residual Convolutional neural network (MRC) and a Long Short-Term Memory (LSTM) to implement Bitcoin closing price prediction. Specifically, the multi-scale residual module is based on one-dimensional convolution, which is not only capable of adaptively detecting features of different time scales in multivariate time series, but also enables the fusion of these features. LSTM has the ability to learn long-term dependencies in series and is widely used in financial time series forecasting. By combining these two methods, the model is able to obtain highly expressive features and efficiently learn the trends and interactions of multivariate time series. In the study, the impact of external factors such as macroeconomic variables and investor attention on the Bitcoin price is considered in addition to the trading information of the Bitcoin market. We performed experiments to predict the daily closing price of Bitcoin (USD), and the experimental results show that MRC-LSTM significantly outperforms a variety of other network structures. Furthermore, we conduct additional experiments on two other cryptocurrencies, Ethereum and Litecoin, to further confirm the effectiveness of the MRC-LSTM in short-term forecasting for multivariate time series of cryptocurrencies.

I. INTRODUCTION

Bitcoin is the world's first distributed super-sovereign digital currency, proposed and established by Satoshi Nakamoto [1] in 2009. It relies on an electronic payment system based on cryptography and P2P (peer-to-peer) technology. By using encryption algorithms and automatic authentication mechanisms, it is difficult to crack or forge, making its transactions more secure and transparent. Since the birth of Bitcoin, it has quickly gained widespread attention.

As a new type of cryptocurrency, Bitcoin trades 24/7 and can be exchanged for many major currencies at a low cost of foreign exchange. Compared to other traditional financial assets, Bitcoin provides investors with a new type of portfolio management. Existing research [2], [3] indicated that Bitcoin has an apparent role in the portfolio management market. Empirical analysis by Anne H. et al. [4] also corroborates the investability of Bitcoin. In recent years, its rich portfolio and the potential for high returns have attracted the attention of an increasing number of financial investors. However, the Bitcoin market is highly volatile and subject to frequent speculative bubbles [5], [6], thus its trading risks are enormous. Consequently, the study of effective price forecasting methods is of great practical importance to investors, researchers, and policymakers around the world, due to the tremendous opportunities and challenges posed by Bitcoin.

In recent years, machine learning methods, especially deep neural networks (DNNs), have increasingly been applied to market forecasting in cryptocurrencies. The Bitcoin market is highly volatile and generates a large amount of highly non-linear transaction data. In order to effectively explore these dynamic data, we need models which are able to analyze the internal interactions and hidden patterns in the data. Several researchers [7], [8] have shown that DNN models are well suited for learning lagged correlations between step-wise trends in large financial time series. In a literature review [9] of comparative studies between artificial neural networks and traditional statistical models, it was found that in 72% of 96 cases, artificial neural network models were shown to have better predictive performance. Consequently, there is no doubt that, due to the highly nonlinear and volatile nature of financial markets, DNN models are increasingly applied in the field of finance, especially for financial time series forecasting.

In this paper, a novel DNN model consisting of a Multi-scale Residual Convolutional Neural Network with Long Short-Term Memory (LSTM) is proposed for Bitcoin price prediction. In the study, not only is the influence of transaction information such as the Bitcoin historical price on the closing price of Bitcoin considered, but external influences such as macroeconomic factors and investors' attention are also introduced. The proposed model consists of two main parts. The first is the proposed multi-scale residual module based on one-dimensional convolution. It contains an identity mapping and three branching networks, where the size of the convolutional kernel is different for each branching network. The second part is the LSTM network, which is able to further learn the trends of the multivariate time series and the interactions between the series, and to output the final predicted values. Different from previous financial multivariate time series forecasting, we construct a multi-scale residual module,
which can also be called a three-bypass residual module, in which information from these bypasses can be shared with each other. Moreover, this structure can extract potential features that have a high impact on the Bitcoin price at multiple time scales and integrate them into highly expressive feature vectors after the concatenation operation. In addition to the Bitcoin price prediction experiments, we also conducted extensive experiments on datasets of two other cryptocurrencies (Litecoin and Ethereum) to confirm the strong ability of the proposed MRC-LSTM model for short-term price prediction of cryptocurrencies.

The remainder of the paper is organized as follows. Section II reviews the relevant literature. Section III introduces the methodology, including the proposed model. Section IV presents our experiments, while our empirical results and discussion are shown in Section V. Finally, Section VI concludes the paper.

II. RELATED LITERATURE

In this section, related work on cryptocurrency price prediction and the background knowledge for the MRC-LSTM are provided.

A. Neural Network Approaches

Salim Lahmiri et al. [10] first applied deep neural networks (DNNs) to cryptocurrency price prediction, elucidated the short-term predictability of cryptocurrencies, and found that the predictive accuracy of Long Short-Term Memory (LSTM) neural networks is higher than that of Generalized Regression Neural Networks (GRNN) [11]. Subsequently, a growing body of research has emerged in this area [12], [13]. LSTM neural networks [14] have been shown to be significantly effective in forecasting Bitcoin prices due to their ability to identify long-term dependencies and store both long-term and short-term temporal information. Researchers have obtained better results with the LSTM network in their works to predict Bitcoin prices and compare them with the performance of different models [15], [16].

Further, Convolutional Neural Networks (CNNs) have also been applied to cryptocurrency market forecasting. In one study [17], researchers combined a CNN with an LSTM neural network for high-frequency market trend prediction for a variety of cryptocurrencies. Their empirical analysis shows that the addition of convolutional layers improves the prediction performance and that the hybrid network structure provides the best prediction results in the experiment. Yan Li et al. [18] propose a hybrid neural network model based on CNN and LSTM neural networks, and experimental results show that the CNN-LSTM hybrid neural network can effectively improve the accuracy of value prediction and direction prediction compared to single-structure neural networks. The hybrid neural network model combining CNN with LSTM has also been used by many researchers for time series prediction of financial data such as gold volatility prediction [19], stock prices, etc. Its excellent performance has also been demonstrated in many other fields [20]–[22].

B. Time Series Forecasting

Time series research has always been an important field of machine learning. By building neural networks based on deep learning, we are able to extract and exploit the hidden information represented by the raw data of digital currencies in order to make accurate and efficient price predictions [10]. In recent years, researchers have continuously made progress in the field of financial time series forecasting [23]–[25]. Many traditional research approaches focus on learning internal patterns, such as autocorrelation, in a particular time series. However, in real-world scenarios, especially in cryptocurrencies, stock markets and other financial domains, in most cases we are dealing with multivariate time series. Changwei Hu et al. [26] developed a deep learning structural time series model to handle correlated multivariate time series inputs. Their model is able to leverage dependencies among multiple correlated time series and extract weighted differencing features for better trend learning. The existence of close correlations among many multivariate time series motivates us to consider not only intra-series pattern learning but also inter-series pattern learning when dealing with such tasks.

C. Residual Network

As one of the milestones in the evolution of CNNs, the Residual Network (ResNet) [27] has achieved impressive, record-breaking performance on many challenging tasks. ResNet shares some similarities with Highway networks, such as residual blocks and shortcut connections. ResNet simplifies the training of very deep networks by bypassing signals from one layer to the next through identity connections. The basic idea underlying residual learning is the branching of gradient propagation paths. For CNNs, this idea was first introduced in the form of parallel paths in the Inception models of [28]. Many research works [29], [30] have exploited the multi-level features in CNNs through skip connections and found them to be effective for a variety of visual tasks. GoogLeNet [28], [31] proposed an "Inception module" that connects feature maps generated by filters of different sizes to increase the diversity of feature extraction. Meanwhile, GoogLeNet increased the network width through skip connections to improve the robustness and expressiveness of the network. Gao Huang et al. [32] proposed the Dense Convolutional Network (DenseNet) to exploit the potential of the network through feature reuse. In contrast to ResNet, it introduces a direct connection from any layer to all subsequent layers. In addition, DenseNet combines features by concatenating them, rather than combining features through summation. These works show us the utility of skip connections, and DenseNet also shows us how to connect feature maps via concatenation. Our work is partly inspired by these two ideas and explores their application to model building for short-term price forecasting of cryptocurrencies.

III. METHODOLOGIES

In this section, the details of our proposed method MRC-LSTM and its basic blocks are described.
A. ResNet

In order to solve the problems of vanishing and exploding gradients caused by network deepening, He et al. proposed a new DNN, ResNet [27]. ResNet consists of many residual units, shown in Fig. 1, and each residual unit can be represented by the following Equations 1 and 2:

y_t = h(x_t) + F(x_t, w_t)    (1)

x_{t+1} = f(y_t)    (2)

where F is a residual function, f is a ReLU function, w_t is the weight matrix, and x_t and y_t are the inputs and outputs of the t-th layer. The function h is an identity mapping given by Equation 3:

h(x_t) = x_t    (3)

In a residual block, skip connections can effectively aggregate historical information, reduce the loss of features and information to some extent, and enable the network to learn richer content. Motivated by the idea of residual units, our proposed model applies the method of skip connections.

Fig. 1. The residual block.
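To make Equations (1)–(3) concrete, the following is a minimal PyTorch sketch of a residual unit with an identity shortcut. The layer sizes and the class name are illustrative assumptions rather than the exact configuration used in the paper.

import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    # y_t = h(x_t) + F(x_t, w_t), with h the identity mapping and f a ReLU.
    def __init__(self, channels):
        super().__init__()
        # F(x_t, w_t): a small stack of 1D convolutions (illustrative choice)
        self.residual = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        y = x + self.residual(x)  # identity shortcut: h(x_t) = x_t
        return self.relu(y)       # x_{t+1} = f(y_t)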
B. LSTM

LSTM is based on the Recurrent Neural Network (RNN) model and can effectively solve the vanishing and exploding gradient problems of the RNN model. The addition of a special gating mechanism makes it possible to solve the problem of long-term dependence, and it is suitable for time series processing and prediction as well as natural language generation [33]–[35]. Fig. 2 gives the basic unit of the LSTM neural network. Its basic unit is the memory block, which contains the memory cell and three gates that control the memory cell, namely, the input gate, the output gate, and the forget gate.

Fig. 2. Basic unit of LSTM network.

The forget gate is mainly used to calculate the degree of information retention and discarding. The output f_t represents the probability of forgetting the state of the underlying cell layer, and is calculated as follows:

f_t = σ(W_f x_t + U_f h_{t-1} + b_f)    (4)

In Equation (4), x_t indicates the input of the current cell, h_{t-1} represents the output of the previous cell, and σ is the sigmoid function.

In the input gate, the sigmoid activation function is used to calculate which information needs to be updated. Then, the tanh activation function is used to get a vector of candidate values C̃_t, after which the previous state C_{t-1} is updated to C_t. The formulas are Equations 5, 6 and 7:

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)    (5)

C̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)    (6)

C_t = f_t · C_{t-1} + i_t · C̃_t    (7)

The output gate is used to calculate the extent of the information output at the current moment. The information is filtered by the sigmoid activation function to obtain o_t. Then the tanh activation function is used to obtain the desired information h_t:

o_t = σ(W_o x_t + U_o h_{t-1} + b_o)    (8)

h_t = o_t · tanh(C_t)    (9)
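As an illustration of Equations (4)–(9), the sketch below computes a single LSTM step with the gates written out explicitly; in practice torch.nn.LSTM implements the same recurrence. The parameter layout (dictionaries of weight matrices) is an assumption made for readability.

import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b hold the parameters of the forget (f), input (i),
    # candidate (c) and output (o) transformations.
    f_t = torch.sigmoid(x_t @ W['f'] + h_prev @ U['f'] + b['f'])   # Eq. (4)
    i_t = torch.sigmoid(x_t @ W['i'] + h_prev @ U['i'] + b['i'])   # Eq. (5)
    c_tilde = torch.tanh(x_t @ W['c'] + h_prev @ U['c'] + b['c'])  # Eq. (6)
    c_t = f_t * c_prev + i_t * c_tilde                             # Eq. (7)
    o_t = torch.sigmoid(x_t @ W['o'] + h_prev @ U['o'] + b['o'])   # Eq. (8)
    h_t = o_t * torch.tanh(c_t)                                    # Eq. (9)
    return h_t, c_t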
C. The Proposed Network Architecture

Inspired by ResNet, and based on the need to deal with time series problems, we developed a novel building block, the Multi-scale Residual Convolutional block (MRC), based on one-dimensional temporal convolution. It is combined with an LSTM neural network to form a new hybrid network architecture (MRC-LSTM) to perform cryptocurrency time series prediction. In the following, we first introduce how to design the multi-scale residual block, and then the proposed MRC-LSTM model, i.e., how the new building block can be combined with LSTM to predict the Bitcoin price.

The Multi-scale Residual Block. This is the first part of MRC-LSTM, which is utilized to extract potential features with high expressiveness at different time scales from the dataset. We constructed a three-bypass convolutional layer with convolutional kernels of different sizes, and joined a skip connection to further aggregate historical information efficiently. Due to the high volatility of Bitcoin and the short-term predictability of cryptocurrencies [10], the Bitcoin market is suitable for short-term forecasting, and the longer the forecast period, the worse the forecast performance. Therefore, we select 1D convolutional kernels of size 1, 2, and 3 to slide over the sequences in the temporal domain, which means that such a multi-scale residual module can simultaneously extract the trends and the hidden interactions of the data within 1, 2, and 3 adjacent days in the sequences. In addition, it is known that the window size of the kernel in a one-dimensional convolution will affect the network learning effect. It is likely to miss local feature information when the window size is too large; when the window size is small, local features are easy to extract, but the correlation between local features may be reduced. Hence, the "multi-scale" design of the proposed model can take into account the advantages of both large and small windows. Fig. 3 shows the design of the residual module.

Fig. 3. The Structure of Multi-scale Residual Block.

When it comes to the way to combine the extracted features of the three bypasses, here we draw on the idea of DenseNet, which differs from ResNet in that we never combine features by summing them before they are passed to the next layer; instead, we combine features by concatenating them. We know that the concatenation operation is not viable when the feature-map size changes. Therefore, in order to keep the feature-map size uniform, we need to perform zero-padding when performing the 1D convolutions. Then, the identity mapping is concatenated in the depth direction with the feature maps of the three paths and fed into the subsequent layer.

At the end of the module, the use of 1 × 1 2D convolutional kernels enables cross-channel information interaction and expansion. In this way, the feature maps extracted for the three prediction cycles are fused, and the network adaptively extracts useful information from these hierarchical features. Meanwhile, non-linearity can be added while keeping the feature map scale constant.
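A minimal PyTorch sketch of the three-bypass design described above is given below: three 1D convolutional branches with kernel sizes 1, 2 and 3, zero-padding so that the temporal length is preserved, depth-wise concatenation with the identity mapping, and a final 1 × 1 convolution for cross-channel fusion. The channel counts, the use of a 1D (rather than 2D) fusion convolution, and the class name are illustrative assumptions; the exact parameters are those given in Fig. 3.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleResidualBlock(nn.Module):
    # Three-bypass 1D convolutional block with an identity shortcut (a sketch).
    def __init__(self, in_channels=16, branch_channels=16):
        super().__init__()
        # one branch per kernel size (1, 2 and 3 adjacent days)
        self.branch1 = nn.Conv1d(in_channels, branch_channels, kernel_size=1)
        self.branch2 = nn.Conv1d(in_channels, branch_channels, kernel_size=2)
        self.branch3 = nn.Conv1d(in_channels, branch_channels, kernel_size=3, padding=1)
        # 1 x 1 convolution fusing the concatenated feature maps across channels
        self.fuse = nn.Conv1d(in_channels + 3 * branch_channels, in_channels, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, x):                        # x: (batch, channels, time)
        b1 = self.branch1(x)
        # the size-2 kernel needs one extra zero on the left to keep the length unchanged
        b2 = self.branch2(F.pad(x, (1, 0)))
        b3 = self.branch3(x)
        out = torch.cat([x, b1, b2, b3], dim=1)  # identity map concatenated in depth
        return self.relu(self.fuse(out))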
Hybrid MRC-LSTM model. The network consists of two main parts: the first is the multi-scale residual module for extracting features from the multivariate time series, and the second is the LSTM layer for learning pattern changes and predicting prices. Fig. 4 shows the structure of the MRC-LSTM neural network. The network contains an input layer, a 1D convolutional layer, the multi-scale residual module, an LSTM layer, a fully connected layer, and an output layer. During the construction of the MRC-LSTM, we first use a set of 1D convolutional kernels with a window size of 1 to slide in the time direction for initial feature extraction and feature augmentation of the sequence. Then, the multi-scale residual module is utilized to extract and integrate local features across multiple time scales. Finally, an LSTM network is used to learn relationships across time steps for price forecasting.

We introduce activation functions to incorporate nonlinearities into the network. Common activation functions include Sigmoid, Tanh, ReLU, and SELU. In this paper, the rectified linear unit (ReLU) is selected as the nonlinear activation function, which is expressed as Equation (10):

f(x) = x, if x > 0;  f(x) = 0, if x ≤ 0    (10)

Fig. 4. The overall architecture of the MRC-LSTM model. Conv 1D, one-dimensional convolutional layer; MRC, multi-scale residual block; LSTM, long short-term memory; FC, fully connected layer.
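Putting the pieces together, the sketch below wires an initial 1D convolution, the multi-scale residual block, an LSTM layer and a fully connected output in the order shown in Fig. 4. The 11 input attributes of Table I, 16 initial kernels of size 1, a single LSTM layer with 50 units and the 5-day window follow the settings reported in Section IV-B; the class name, tensor layout and the use of the last time step for the prediction are illustrative assumptions.

import torch
import torch.nn as nn

class MRCLSTM(nn.Module):
    # Sketch of the MRC-LSTM pipeline: Conv1D -> MRC block -> LSTM -> FC.
    def __init__(self, n_features=11, conv_channels=16, lstm_hidden=50):
        super().__init__()
        self.conv1 = nn.Conv1d(n_features, conv_channels, kernel_size=1)  # initial expansion
        self.mrc = MultiScaleResidualBlock(conv_channels, conv_channels)  # sketched above
        self.lstm = nn.LSTM(conv_channels, lstm_hidden, num_layers=1, batch_first=True)
        self.fc = nn.Linear(lstm_hidden, 1)                               # next-day closing price

    def forward(self, x):              # x: (batch, window=5, n_features)
        z = x.transpose(1, 2)          # -> (batch, n_features, time) for Conv1d
        z = torch.relu(self.conv1(z))
        z = self.mrc(z)                # multi-scale feature extraction and fusion
        z = z.transpose(1, 2)          # back to (batch, time, channels) for the LSTM
        out, _ = self.lstm(z)
        return self.fc(out[:, -1, :])  # prediction from the last time step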

IV. EXPERIMENTS

A. Data Collection and Pre-processing

The dataset used in this experiment includes the daily closing price of Bitcoin and 10 types of internal (Bitcoin trading data) and external (macroeconomic variables and investor attention) information that have an impact on the price of Bitcoin, from October 25, 2015 to October 17, 2020, for a total of 1820 records. The dataset is divided into 80% as the training set and 20% as the test set. The training set is further divided into a training set (80%) and a validation set (20%) for evaluating performance and avoiding overfitting. Table I shows all the attributes that compose the dataset.

TABLE I
LIST OF INPUT ATTRIBUTES

Transaction Information   Macroeconomic Variables   Investor Attention
the Open price            S&P 500 Index             Google Trends
the Close price           GVZ                       -
the Highest price         VIX                       -
the Lowest price          -                         -
the Weighted price        -                         -
Volume (BTC)              -                         -
Volume (Currency)         -                         -

The internal information in the dataset refers to the daily transaction data of Bitcoin, including the open price, the closing price, the highest price, the lowest price, the weighted price, the trade volume in Bitcoins and the market's currency. They are collected from the official website (https://bitcoincharts.com).

As for external factors, we take the macroeconomic factors and investor attention into consideration. As a new type of investment instrument, the price of Bitcoin is considered to be related to several macroeconomic variables. The study by Leon Li considered four representative volatility indices published by the Chicago Board Options Exchange (CBOE), including the Volatility Index (VIX) and the Gold Price Volatility Index (GVZ). He believed that volatility indices can be used to speculate and trade on market sentiment regarding future volatility [36]. Some researchers have found that there is a relationship between the price of Bitcoin and gold, crude oil, and stock market indices [37]. Therefore, this paper introduces macroeconomic variables that have been proven to have predictive power for the Bitcoin market in order to predict Bitcoin prices. Ultimately, the following macroeconomic factors were chosen for the experiment: the S&P 500 Index, GVZ, and VIX. They are all daily data from the Wind Database.

Moreover, in recent years, a number of researchers have found that Google Trends, which reflects investors' attention, plays a significant role in predicting the Bitcoin market [38], [39]. Da et al. [40] pointed out that search activity can reflect investors' attention. Therefore, we employ Google Trends (https://trends.google.com) as one of the predictors.

The prediction period used in this experiment is five days, which means the closing price of Bitcoin on the sixth day is predicted using the characteristic parameter data from the previous five days. In the meantime, our proposed residual module is capable of using three convolutional kernels of different scales to extract the temporal characteristics of a multivariate sequence within one, two, and three adjacent days in a 5-day period. The richer potential information contained in the sequences is extracted by obtaining receptive fields of different sizes.
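The five-day windowing can be written as a short helper: each sample stacks the previous five days of all input attributes, and the target is the sixth-day closing price. The function name and the NumPy-based layout are assumptions made for illustration.

import numpy as np

def make_windows(features, close, window=5):
    # features: (n_days, n_attributes) array; close: (n_days,) closing prices.
    # Returns X with shape (n_samples, window, n_attributes) and y with shape
    # (n_samples,), where y[i] is the closing price on the day after each window.
    X, y = [], []
    for t in range(len(features) - window):
        X.append(features[t:t + window])
        y.append(close[t + window])
    return np.array(X), np.array(y)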
The original data is usually normalized to eliminate scale effects between indicators before modeling. Common normalization methods are min-max normalization and Z-score. In this experiment, min-max normalization is used. This approach, as shown in Equation (11), varies the original data linearly and maps the values of the data between 0 and 1:

x_norm = (x_t − x_min) / (x_max − x_min)    (11)

where x_t and x_norm are the input sample data and the data after normalization, and x_min and x_max are the minimum and maximum values of the samples. After modeling, the target outputs need to be anti-normalized:

ŷ_t = y_norm (y_max − y_min) + y_min    (12)

where ŷ_t is the output prediction value, y_min and y_max are the minimum and maximum values of the target data, and y_norm is the predicted value produced directly by the model.
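A small sketch of Equations (11) and (12): min-max scaling fitted on the training data, and the corresponding inverse transform applied to the network output to recover prices in USD. The function names are illustrative.

def minmax_transform(x, x_min, x_max):
    # Equation (11): map values linearly into [0, 1]
    return (x - x_min) / (x_max - x_min)

def minmax_inverse(y_norm, y_min, y_max):
    # Equation (12): map normalized predictions back to the original price scale
    return y_norm * (y_max - y_min) + y_min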
B. Process and Parameter Settings

The proposed neural network structure in this paper consists of two main parts. Through repeated experiments, the parameters of the proposed network are determined as follows.

• In the first part, a 1D convolutional layer is used for initial feature extraction and expansion. There are 16 convolutional kernels of size 1 in this layer. After that, the features obtained from the upper layer are input into the proposed residual module for feature extraction and integration at multiple time scales. The parameters of the convolutional layers in the residual module are shown in Fig. 3.

• In the second part of the network structure, the features extracted by the residual module are input into the LSTM layer, which has a single layer with 50 neurons. Since the features extracted by the 1D convolution-based residual module are along the temporal dimension, they can be entered into the LSTM to learn the long-range changes in the patterns of the trends in the sequence. Finally, the fully connected layer is used and the predicted value is output.

In the experiment, the loss function is the MSE, the optimization function is the Adam optimization algorithm, and the batch size is 50. In addition, we use a piecewise decay method for the learning rate, with an initial learning rate of 0.001; the learning rate decreases to 0.3 times its previous value every 500 epochs, for a total of 2000 epochs. Each time the neural network outputs a value, a sliding window is used for prediction. The experiments are compared and evaluated with actual values to obtain the final predictive performance. All algorithms are implemented using PyTorch (https://pytorch.org/).
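Under these settings, the training loop can be sketched roughly as follows: MSE loss, the Adam optimizer with an initial learning rate of 0.001, a batch size of 50, and a step decay that multiplies the learning rate by 0.3 every 500 epochs over 2000 epochs. The function name and the data tensors are placeholders; MRCLSTM and make_windows refer to the sketches above.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, X_train, y_train, epochs=2000, batch_size=50):
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    # piecewise decay: multiply the learning rate by 0.3 every 500 epochs
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.3)
    loader = DataLoader(TensorDataset(X_train, y_train), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb).squeeze(-1), yb)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model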
Fig. 5. Comparison diagrams of prediction and actual curves: (a) MLP, MRC-LSTM and actual curves; (b) LSTM, MRC-LSTM and actual curves; (c) CNN-LSTM, MRC-LSTM and actual curves.

C. Performance Evaluation Criteria

In this work, four error functions, the Mean Absolute Error (MAE) (Equation (13)), the Root Mean Square Error (RMSE) (Equation (14)), the Mean Absolute Percentage Error (MAPE) (Equation (15)) and the coefficient of determination (R^2), are introduced as performance metrics to quantify the ability of the DNN to forecast prices.

MAE = (1/N) Σ_{i=1}^{N} |y_i − f_i|    (13)

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (y_i − f_i)^2 )    (14)

MAPE = (100/N) Σ_{i=1}^{N} |(y_i − f_i) / y_i|    (15)

where y_i and f_i are the i-th actual value and predicted value, and N is the number of test samples. They reflect the deviation between the predicted and actual values: the larger their value, the greater the error in the forecast.

The R^2 value reflects the accuracy of the model; it ranges from 0 to 1, with 1 denoting a perfect match:

R^2 = Σ_{i=1}^{n} (ŷ_i − ȳ)^2 / Σ_{i=1}^{n} (y_i − ȳ)^2    (16)

where ŷ_i represents the predicted value, ȳ is the average value, and y_i is the observed value.
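The four metrics of Equations (13)–(16) can be computed directly with NumPy, as in the sketch below (the function name is illustrative, and R^2 follows the form given in Equation (16)):

import numpy as np

def evaluate(y_true, y_pred):
    # MAE, RMSE, MAPE (%) and R^2 as defined in Equations (13)-(16)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                    # Eq. (13)
    rmse = np.sqrt(np.mean(err ** 2))             # Eq. (14)
    mape = 100.0 * np.mean(np.abs(err / y_true))  # Eq. (15)
    r2 = np.sum((y_pred - y_true.mean()) ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # Eq. (16)
    return mae, rmse, mape, r2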
TABLE II
BITCOIN: THE ERRORS OF DIFFERENT DEEP NEURAL NETWORKS

Architecture   MAE      RMSE     MAPE   R^2 (%)
MLP            246.55   345.50   2.31   87.95
LSTM           212.71   317.24   1.99   89.83
CNN            191.92   268.26   1.77   92.73
CNN-LSTM       176.79   270.66   1.66   92.60
MRC-LSTM       166.52   261.44   1.56   93.10

TABLE III
ETH AND LTC: THE ERRORS OF DIFFERENT DEEP NEURAL NETWORKS

               Ethereum (ETH)          Litecoin (LTC)
Architecture   MAE    RMSE   MAPE      MAE    RMSE   MAPE
MLP            1.04   1.53   7.83      0.42   0.70   9.67
LSTM           0.76   1.17   5.74      0.32   0.38   7.65
CNN            0.81   1.23   5.92      0.39   0.44   9.37
CNN-LSTM       0.77   1.24   6.20      0.27   0.37   6.43
MRC-LSTM       0.70   1.13   5.43      0.14   0.26   3.17

V. RESULTS AND DISCUSSION

The algorithms used for comparison in the experiments are the Multilayer Perceptron (MLP), the Two-dimensional Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and CNN-LSTM. Table II shows the accuracy metrics of the proposed model compared to the benchmarks using different architectures.

From Table II, it is apparent that the MLP has the largest prediction error, while the LSTM outperforms the MLP. The LSTM can solve the long-term dependence problem in time series samples, making it more suitable for solving time series prediction problems than the MLP.

The CNN-LSTM model consists of a two-dimensional convolutional layer for feature extraction first, and then the extracted features are fed into the LSTM for price prediction. The prediction accuracy of the LSTM model is just lower than that of the CNN-LSTM, which illustrates that when there are many features and few samples, the LSTM cannot fully extract the important potential information from the data, which will affect the training effect of the neural network to some extent.

The error of the MRC-LSTM prediction results is smaller than that of the CNN-LSTM, making it the best-performing algorithm in this experiment. This indicates that the feature extraction strategy of the multi-scale residual block is more efficient. The main reason is that, on the one hand, in the one-dimensional convolutional layer, the convolutional kernel slides only along the time dimension and performs the convolution operation on all feature values at each time step, which reduces the problem of multicollinearity among the features. On the other hand, extracting features from multivariate sequences at three different scales improves the expression of local features and helps the LSTM make more accurate and fast predictions.

Besides, Fig. 5 shows the effect of our proposed network by extending the prediction map to several sample points, and three sets of comparisons are made. It can be directly seen from the figure that the predicted values derived from the MRC-LSTM model are overall closer to the actual values compared to several other models. The main reason is that the proposed residual module enables the model to combine the features extracted from the underlying convolution into the upper convolution and learn the trends embedded in the multivariate time series from different time scales. Consequently, it can make efficient use of the features in the series and improve the prediction performance of the model.

It is evident from Fig. 5 that the predictions of both the MRC-LSTM and several other models have some degree of lag with respect to the actual values. This is probably because, despite the outstanding performance of the multi-scale residual convolution module in feature extraction as described above, we still cannot ignore the working principle of the LSTM, which uses information from previous lags to predict future instances. Moreover, as the Bitcoin market is a highly dynamic system, the patterns and dynamics present in that system are not always the same. Therefore, when the LSTM module misses a large leap, it makes a quick correction in time with new information, which in turn creates the curves in Fig. 5.
To further evaluate the performance of our proposed model for short-term forecasting of cryptocurrencies, we also use the same sets of models for price forecasting of two other major cryptocurrencies, ETH and LTC. In this experiment, both the ETH and LTC data sets contain daily transaction data from April 15, 2016 to November 20, 2020. This includes the highest price, the lowest price, the bid price, the ask price and the volume, and all these data are from Quandl (https://www.quandl.com/). Table III shows the prediction errors under different DNN architectures. As this table shows, the results indicate that the LSTM outperforms the MLP in the time series task. The CNN-LSTM model uses the CNN for feature extraction before using the LSTM for prediction, which results in a smaller error than the LSTM. Finally, it is apparent from the table that the MRC-LSTM has the best performance. Our proposed MRC-LSTM model improves the feature extraction method by using a multi-scale residual convolutional layer for feature extraction, which gives the best prediction results in the experiments. The empirical analysis shows that the proposed model also performs well in predicting the financial time series of the other two cryptocurrencies.

VI. CONCLUSION

In this study, a hybrid method consisting of a multi-scale residual block and an LSTM network is proposed to predict the Bitcoin price. Specifically, the multi-scale residual block in this hybrid model is able to extract rich features at different time scales and also strengthen the representational ability of these features. Besides, the utilization of local residual learning leads to a reduction in computational complexity as well as improving the performance of the DNN. Different from some traditional studies, we introduce external influences such as macroeconomic variables and investor attention when performing price forecasting, so as to combine various factors that may affect Bitcoin prices. Furthermore, sufficient experimental results confirm that our proposed model has better prediction results than other single-structured models. The efficient feature extraction and integration capability of the proposed residual block is also demonstrated in the experimental comparison. In addition, we perform price prediction with the MRC-LSTM for two other major cryptocurrencies, ETH and LTC, further demonstrating the effectiveness of our model for multivariate time series forecasting of cryptocurrencies. In summary, compared with several state-of-the-art works, the proposed DNN model achieves better prediction performance, but it also has some limitations, since the Bitcoin market is particularly sensitive to some national policies, regulatory and market events, etc. Future work could focus on comprehensive metrics which measure investors' attention, for more timely detection of Bitcoin market volatility and thus more accurate price prediction.

REFERENCES

[1] S. Nakamoto and A. Bitcoin, "A peer-to-peer electronic cash system," Bitcoin. URL: https://bitcoin.org/bitcoin.pdf, vol. 4, 2008.
[2] A. Dyhrberg, "Bitcoin, gold and the dollar – a garch volatility analysis," Finance Research Letters, vol. 16, 11 2015.
[3] A. H. Dyhrberg, "Hedging capabilities of bitcoin. is it the virtual gold?" Finance Research Letters, vol. 16, pp. 139–144, 2016.
[4] A. H. Dyhrberg, S. Foley, and J. Svec, "How investible is bitcoin? analyzing the liquidity and transaction costs of bitcoin markets," Economics Letters, vol. 171, pp. 140–143, 2018.
[5] J. E.-T. Cheah and J. Fry, "Speculative bubbles in bitcoin markets? an empirical investigation into the fundamental value of bitcoin," Economics Letters, vol. 130, 02 2015.
[6] S. Corbet, B. Lucey, and L. Yarovaya, "Datestamping the bitcoin and ethereum bubbles," Finance Research Letters, vol. 26, pp. 81–88, 2018.
[7] R. Adcock and N. Gradojevic, "Non-fundamental, non-parametric bitcoin forecasting," Physica A: Statistical Mechanics and its Applications, vol. 531, p. 121727, 2019.
[8] M. Nakano, A. Takahashi, and S. Takahashi, "Bitcoin technical trading with artificial neural network," Physica A: Statistical Mechanics and its Applications, vol. 510, pp. 587–609, 2018.
[9] M. Khashei and M. Bijari, "An artificial neural network (p,d,q) model for timeseries forecasting," Expert Systems with Applications, vol. 37, no. 1, pp. 479–489, 2010.
[10] S. Lahmiri and S. Bekiros, "Cryptocurrency forecasting with deep learning chaotic neural networks," Chaos, Solitons & Fractals, vol. 118, pp. 35–40, 2019.
[11] D. F. Specht et al., "A general regression neural network," IEEE Transactions on Neural Networks, vol. 2, no. 6, pp. 568–576, 1991.
[12] H. Liang, W. Lei, P. Y. Chan, Z. Yang, M. Sun, and T.-S. Chua, "Pirhdy: Learning pitch-, rhythm-, and dynamics-aware embeddings for symbolic music," Proceedings of the 28th ACM International Conference on Multimedia, Oct 2020. [Online]. Available: http://dx.doi.org/10.1145/3394171.3414032
[13] F. Zhu, W. Lei, C. Wang, J. Zheng, S. Poria, and T.-S. Chua, "Retrieving and reading: A comprehensive survey on open-domain question answering," 2021.
[14] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[15] N. Uras, L. Marchesi, M. Marchesi, and R. Tonelli, "Forecasting bitcoin closing price series using linear regression and neural networks models," arXiv preprint arXiv:2001.01127, 2020.
[16] P. Linardatos and S. Kotsiantis, "Bitcoin price prediction combining data and text mining," in Advances in Integrations of Intelligent Methods. Springer, 2020, pp. 49–63.
[17] S. Alonso-Monsalve, A. L. Suárez-Cetrulo, A. Cervantes, and D. Quintana, "Convolution on neural networks for high-frequency trend prediction of cryptocurrency exchange rates using technical indicators," Expert Systems with Applications, vol. 149, p. 113250, 2020.
[18] Y. Li and W. Dai, "Bitcoin price forecasting method based on cnn-lstm hybrid neural network model," The Journal of Engineering, vol. 2020, 01 2020.
[19] A. Vidal and W. Kristjanpoller, "Gold volatility prediction using a cnn-lstm approach," Expert Systems with Applications, vol. 157, p. 113481, 2020.
[20] C. Tian, J. Ma, C. Zhang, and P. Zhan, “A deep neural network model
for short-term load forecast based on long short-term memory network
and convolutional neural network,” Energies, vol. 11, no. 12, p. 3493,
2018.
[21] W. Lei, X. Wang, M. Liu, I. Ilievski, X. He, and M.-Y. Kan,
“Swim: A simple word interaction model for implicit discourse relation
recognition,” in Proceedings of the Twenty-Sixth International Joint
Conference on Artificial Intelligence, IJCAI-17, 2017, pp. 4026–4032.
[Online]. Available: https://doi.org/10.24963/ijcai.2017/562
[22] W. Lei, X. Jin, M.-Y. Kan, Z. Ren, X. He, and D. Yin,
“Sequicity: Simplifying task-oriented dialogue systems with single
sequence-to-sequence architectures,” in Proceedings of the 56th
Annual Meeting of the Association for Computational Linguistics
(Volume 1: Long Papers). Melbourne, Australia: Association for
Computational Linguistics, Jul. 2018, pp. 1437–1447. [Online].
Available: https://www.aclweb.org/anthology/P18-1133
[23] A. A. Ariyo, A. O. Adewumi, and C. K. Ayo, “Stock price prediction
using the arima model,” in 2014 UKSim-AMSS 16th International
Conference on Computer Modelling and Simulation. IEEE, 2014, pp.
106–112.
[24] R. Adhikari and R. Agrawal, “A combination of artificial neural network
and random walk models for financial time series forecasting,” Neural
Computing and Applications, vol. 24, no. 6, pp. 1441–1449, 2014.
[25] J. Cao, Z. Li, and J. Li, “Financial time series forecasting model
based on ceemdan and lstm,” Physica A: Statistical Mechanics and its
Applications, vol. 519, pp. 127–139, 2019.
[26] C. Hu, Y. Hu, and S. Seo, “A deep structural model for analyzing
correlated multivariate time series,” in 2019 18th IEEE International
Conference On Machine Learning And Applications (ICMLA). IEEE,
2019, pp. 69–74.
[27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 770–778.
[28] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan,
V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,”
in Proceedings of the IEEE conference on computer vision and pattern
recognition, 2015, pp. 1–9.
[29] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks
for semantic segmentation,” in Proceedings of the IEEE conference on
computer vision and pattern recognition, 2015, pp. 3431–3440.
[30] S. Yang and D. Ramanan, “Multi-scale recognition with dag-cnns,” in
Proceedings of the IEEE international conference on computer vision,
2015, pp. 1215–1223.
[31] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking
the inception architecture for computer vision,” in Proceedings of the
IEEE conference on computer vision and pattern recognition, 2016, pp.
2818–2826.
[32] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely
connected convolutional networks,” in Proceedings of the IEEE confer-
ence on computer vision and pattern recognition, 2017, pp. 4700–4708.
[33] W. Lei, X. Jin, M.-Y. Kan, Z. Ren, X. He, and D. Yin, “Sequicity:
Simplifying task-oriented dialogue systems with single sequence-to-
sequence architectures,” in Proceedings of the 56th Annual Meeting of
the Association for Computational Linguistics (Volume 1: Long Papers),
2018, pp. 1437–1447.
[34] X. Jin, W. Lei, Z. Ren, H. Chen, S. Liang, Y. Zhao, and D. Yin, “Explicit
state tracking with semi-supervisionfor neural dialogue generation,” in
Proceedings of the 27th ACM International Conference on Information
and Knowledge Management, 2018, pp. 1403–1412.
[35] L. Pan, W. Lei, T.-S. Chua, and M.-Y. Kan, "Recent advances in neural question generation," arXiv preprint arXiv:1905.08949, 2019.
[36] L. Li, "Risk of investing in volatility products: A regime-switching approach," Investment Analysts Journal, pp. 1–16, 2020.
[37] S. Vassiliadis, P. Papadopoulos, M. Rangoussi, T. Konieczny, and J. Gralewski, "Bitcoin value analysis based on cross-correlations," Journal of Internet Banking and Commerce, vol. 22, no. S7, p. 1, 2017.
[38] A. Urquhart, "What causes the attention of bitcoin?" Economics Letters, vol. 166, pp. 40–44, 2018.
[39] A. Yelowitz and M. Wilson, "Characteristics of bitcoin users: an analysis of google search data," Applied Economics Letters, vol. 22, no. 13, pp. 1030–1036, 2015.
[40] Z. Da, J. Engelberg, and P. Gao, "In search of attention," The Journal of Finance, vol. 66, no. 5, pp. 1461–1499, 2011.
