MRC-LSTM A Hybrid Approach of Multi-Scale
∗ indicates the corresponding author.

Abstract—Bitcoin, one of the major cryptocurrencies, presents great opportunities and challenges, with tremendous potential returns accompanied by high risks. The high volatility of Bitcoin and the complex factors affecting it make the study of effective price forecasting methods of great practical importance to financial investors and researchers worldwide. In this paper, we propose a novel approach called MRC-LSTM, which combines a Multi-scale Residual Convolutional neural network (MRC) and a Long Short-Term Memory (LSTM) network to predict the Bitcoin closing price. Specifically, the multi-scale residual module is based on one-dimensional convolution, which is not only capable of adaptively detecting features at different time scales in multivariate time series, but also enables the fusion of these features. LSTM, which is widely used in financial time series forecasting, has the ability to learn long-term dependencies in series. By combining these two methods, the model is able to obtain highly expressive features and efficiently learn the trends and interactions of multivariate time series. In the study, the impact of external factors such as macroeconomic variables and investor attention on the Bitcoin price is considered in addition to the trading information of the Bitcoin market. We performed experiments to predict the daily closing price of Bitcoin (USD), and the experimental results show that MRC-LSTM significantly outperforms a variety of other network structures. Furthermore, we conduct additional experiments on two other cryptocurrencies, Ethereum and Litecoin, to further confirm the effectiveness of MRC-LSTM in short-term forecasting for multivariate time series of cryptocurrencies.

I. INTRODUCTION

Bitcoin is the world's first distributed, super-sovereign digital currency, proposed and established by Satoshi Nakamoto [1] in 2009. It relies on an electronic payment system based on cryptography and P2P (peer-to-peer) technology. Because it uses encryption algorithms and automatic authentication mechanisms, it is difficult to crack or forge, which makes its transactions more secure and transparent. Since its birth, Bitcoin has quickly gained widespread attention.

As a new type of cryptocurrency, Bitcoin trades 24/7 and can be exchanged for many major currencies at a low foreign exchange cost. Compared to traditional financial assets, Bitcoin offers investors a new type of portfolio management. Existing research [2], [3] indicated that Bitcoin has an apparent role in the portfolio management market. Empirical analysis by Anne H et al. [4] also corroborates the investability of Bitcoin. In recent years, its rich portfolio and the potential for high returns have attracted the attention of an increasing number of financial investors. However, the Bitcoin market is highly volatile and subject to frequent speculative bubbles [5], [6], so its trading risks are enormous. Consequently, the study of effective price forecasting methods is of great practical importance to investors, researchers, and policymakers around the world, given the tremendous opportunities and challenges posed by Bitcoin.

In recent years, machine learning methods, especially deep neural networks (DNNs), have increasingly been applied to market forecasting in cryptocurrencies. The Bitcoin market is highly volatile and generates a large amount of highly non-linear transaction data. To explore these dynamic data effectively, we need models that are able to analyze the internal interactions and hidden patterns in the data. Several researchers [7], [8] have shown that DNN models are well suited for learning lagged correlations between step-wise trends in large financial time series. A literature review [9] of comparative studies between artificial neural networks and traditional statistical models found that in 72% of 96 cases, artificial neural network models showed better predictive performance. Consequently, owing to the highly nonlinear and volatile nature of financial markets, DNN models are increasingly applied in the field of finance, especially for financial time series forecasting.

In this paper, a novel DNN model consisting of a Multi-scale Residual Convolutional Neural Network with Long Short-Term Memory (LSTM) is proposed for Bitcoin price prediction. The study considers not only the influence of transaction information, such as the historical Bitcoin price, on the closing price of Bitcoin, but also external influences such as macroeconomic factors and investor attention. The proposed model consists of two main parts. The first is the proposed multi-scale residual module based on one-dimensional convolution. It contains an identity mapping and three branching networks, where the size of the convolutional kernel is different for each branching network. The second part is the LSTM network, which is able to further learn the trends of the multivariate time series and the interactions between the series, and to output the final predicted values. Different from previous financial multivariate time series forecasting, we construct a multi-scale residual module,
which can also be called a three-bypass residual module, in which information from these bypasses can be shared with each other. Moreover, this structure can extract potential features that have a high impact on the Bitcoin price at multiple time scales and integrate them into highly expressive feature vectors through the concatenation operation. In addition to the Bitcoin price prediction experiments, we also conducted extensive experiments on datasets of two other cryptocurrencies (Litecoin and Ethereum) to confirm the strong ability of the proposed MRC-LSTM model for short-term price prediction of cryptocurrencies.

The remainder of the paper is organized as follows. Section II reviews the relevant literature. Section III introduces the methodology, including the proposed model. Section IV presents our experiments, while our empirical results and discussion are shown in Section V. Finally, Section VI concludes the paper.

II. RELATED LITERATURE

This section reviews related work on cryptocurrency price prediction and the background knowledge underlying MRC-LSTM.

A. Neural Network Approaches

Salim Lahmiri et al. [10] were the first to apply deep neural networks (DNNs) to cryptocurrency price prediction. They elucidated the short-term predictability of cryptocurrencies and found that the predictive accuracy of Long Short-Term Memory (LSTM) neural networks is higher than that of Generalized Regression Neural Networks (GRNN) [11]. Subsequently, a growing body of research has emerged in this area [12], [13]. LSTM neural networks [14] have been shown to be significantly effective in forecasting Bitcoin prices due to their ability to identify long-term dependencies and store both long-term and short-term temporal information. Researchers who used LSTM networks to predict Bitcoin prices and compared them with other models obtained better results with the LSTM network [15], [16].

Further, Convolutional Neural Networks (CNNs) have also been applied to cryptocurrency market forecasting. In one study [17], researchers combined a CNN with an LSTM neural network for high-frequency market trend prediction for a variety of cryptocurrencies. Their empirical analysis shows that the addition of convolutional layers improves the prediction performance and that the hybrid network structure provides the best prediction results in the experiment. Yan Li et al. [18] propose a hybrid neural network model based on CNN and LSTM neural networks, and their experimental results show that the CNN-LSTM hybrid neural network can effectively improve the accuracy of value prediction and direction prediction compared to a single-structure neural network. The hybrid neural network model combining CNN with LSTM has also been used by many researchers for time series prediction of financial data such as gold volatility [19] and stock prices. Its excellent performance has also been demonstrated in many other fields [20]–[22].

B. Time Series Forecasting

Time series research has always been an important field of machine learning. By building neural networks based on deep learning, we are able to extract and exploit the hidden information in the raw data of digital currencies in order to make accurate and efficient price predictions [10]. In recent years, researchers have continuously made progress in the field of financial time series forecasting [23]–[25]. Many traditional research approaches focus on learning internal patterns, such as autocorrelation, in a particular time series. However, in real-world scenarios, especially in cryptocurrencies, stock markets, and other financial domains, in most cases we are dealing with multivariate time series. Changwei Hu et al. [26] developed a deep learning structural time series model to handle correlated multivariate time series inputs. Their model is able to leverage dependencies among multiple correlated time series and extract weighted differencing features for better trend learning. The existence of close correlations among many multivariate time series motivates us to consider not only intra-series pattern learning but also inter-series pattern learning when dealing with such tasks.

C. Residual Network

As one of the milestones in the evolution of CNNs, the Residual Network (ResNet) [27] has achieved impressive, record-breaking performance on many challenging tasks. ResNet shares some similarities with Highway networks, such as residual blocks and shortcut connections. ResNet simplifies the training of very deep networks by bypassing signals from one layer to the next through identity connections. The basic idea underlying residual learning is the branching of gradient propagation paths. For CNNs, this idea was first introduced in the form of parallel paths in the Inception models of [28].

Many research works [29], [30] have exploited the multi-level features in CNNs through skip connections and found them to be effective for a variety of visual tasks. GoogLeNet [28], [31] proposed an "Inception module" that connects feature maps generated by filters of different sizes to increase the diversity of feature extraction. Meanwhile, GoogLeNet increased the network width by skipping connections to improve the robustness and expressiveness of the network. Gao Huang et al. [32] proposed the Dense Convolutional Network (DenseNet) to exploit the potential of the network through feature reuse. In contrast to ResNet, it introduces a direct connection from any layer to all subsequent layers. In addition, DenseNet combines features by concatenating them rather than by summing them. These works show the utility of skip connections, and DenseNet also shows how to connect feature maps via concatenation. Our work is partly inspired by these two ideas and explores their application to model building for short-term price forecasting of cryptocurrencies.

III. METHODOLOGIES

This section describes the details of our proposed method, MRC-LSTM, and its basic building blocks.
A. ResNet
To address the vanishing and exploding gradient problems caused by deepening networks, He et al. proposed a new DNN, ResNet [27]. ResNet consists of many residual units, as shown in Fig. 1, and each residual unit can be represented by the following Equations (1) and (2).
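As a rough illustration of the residual-unit idea, and not a reproduction of Equations (1) and (2), the sketch below assumes the common formulation from [27] in which a unit outputs its input plus a learned residual, y = x + F(x). The names `residual_unit` and `toy_branch` are hypothetical, and the toy branch is a fixed scaling that stands in for the convolutional layers; any activation after the addition is omitted.

```python
from typing import Callable, List

Vector = List[float]


def residual_unit(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return x + F(x): the identity shortcut plus the residual branch."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        # The identity shortcut requires matching shapes for the addition.
        raise ValueError("residual branch must preserve the input shape")
    return [xi + fi for xi, fi in zip(x, fx)]


def toy_branch(x: Vector) -> Vector:
    """Hypothetical residual branch: a fixed scaling, not a trained layer."""
    return [0.5 * xi for xi in x]


out = residual_unit([1.0, 2.0, 4.0], toy_branch)

# If the residual branch outputs zeros, the unit reduces to the identity
# mapping, so the input signal passes through unchanged.
zero_out = residual_unit([1.0, 2.0, 4.0], lambda x: [0.0] * len(x))
```

The second call shows why the shortcut helps very deep networks: a unit whose residual branch contributes nothing still passes its input through intact.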
Trends as one of the predictors, which is derived from Google Trends².

TABLE I
LIST OF INPUT ATTRIBUTES

Transaction Information    Macroeconomic Variables    Investor Attention
Open price                 S&P 500 Index              Google Trends
Close price                GVZ                        -
Highest price              VIX                        -
Lowest price               -                          -
Weighted price             -                          -
Volume (BTC)               -                          -
Volume (Currency)          -                          -

The prediction period used in this experiment is five days, which means that the closing price of Bitcoin on the sixth day is predicted using the attribute data from the previous five days. Meanwhile, the proposed residual module uses three convolutional kernels of different sizes to extract the temporal characteristics of the multivariate sequence across one, two, and three adjacent days within the 5-day period. These differently sized receptive fields capture richer latent information contained in the sequences.

The original data is usually normalized before modeling to eliminate scale effects between indicators. The common normalization methods are min-max normalization and Z-score normalization. In the experiment, min-max normalization is used. This approach, shown in Equation (11), transforms the original data linearly and maps the values to the range between 0 and 1.

$$x_{norm} = \frac{x_t - x_{min}}{x_{max} - x_{min}} \tag{11}$$

where $x_t$ and $x_{norm}$ are the input sample data and the data after normalization, and $x_{min}$ and $x_{max}$ are the minimum and maximum values of the samples. After modeling, the target outputs need to be denormalized.

$$\hat{y}_t = y_{norm}(y_{max} - y_{min}) + y_{min} \tag{12}$$

where $\hat{y}_t$ is the denormalized predicted value, $y_{min}$ and $y_{max}$ are the minimum and maximum values of the target data, and $y_{norm}$ is the predicted value output directly by the model.

B. Process and Parameters Settings

The proposed neural network structure consists of two main parts. Through repeated experiments, the parameters of the proposed network were determined as follows.

• In the first part, a 1D convolutional layer is used for initial feature extraction and expansion. There are 16 convolutional kernels of size 1 in this layer. After that, the features obtained from this layer are input into the proposed residual module for feature extraction and integration at multiple time scales. The parameters of the convolutional layers in the residual module are shown in Fig. 3.
• In the second part of the network structure, the features extracted by the residual module are input into the LSTM layer, which has one layer with 50 neurons. Since the features extracted by the 1D convolution-based residual module lie along the temporal dimension, they can be fed into the LSTM to learn the long-range temporal patterns of the trends in the sequence. Finally, a fully connected layer outputs the predicted value.

In the experiment, the loss function is the MSE, the optimizer is Adam, and the batch size is 50. In addition, we use the piecewise decay method for learning rate decay, with an initial learning rate of 0.001; the learning rate is reduced to 0.3 times its value every 500 epochs, for a total of 2000 epochs. Predictions are made with a sliding window, with the network outputting one value at each step. The predicted values are compared with the actual values to evaluate the final predictive performance. All algorithms are implemented using PyTorch³.
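The learning rate schedule described above can be sketched in a few lines. This is a minimal illustration that assumes the decay is multiplicative (the rate is multiplied by 0.3 at epochs 500, 1000, and 1500), which is one reading of the description; the function name `piecewise_lr` and its parameter names are hypothetical.

```python
def piecewise_lr(epoch: int,
                 initial_lr: float = 0.001,
                 decay_factor: float = 0.3,
                 step_epochs: int = 500) -> float:
    """Learning rate at a given epoch under step-wise multiplicative decay."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # Integer division counts how many decay steps have already happened.
    return initial_lr * decay_factor ** (epoch // step_epochs)


# Learning rate at a few epochs of the 2000-epoch run (epochs 0..1999).
schedule = {epoch: piecewise_lr(epoch) for epoch in (0, 499, 500, 1000, 1500, 1999)}
```

Under the same assumption, PyTorch's `torch.optim.lr_scheduler.StepLR` with `step_size=500` and `gamma=0.3` expresses the same schedule when stepped once per epoch.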
2 https://trends.google.com
3 https://pytorch.org/
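Equations (11) and (12) and the five-day prediction window can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single made-up closing-price series instead of the full multivariate input, it computes the minimum and maximum over the whole series, and the function names are hypothetical.

```python
from typing import List, Tuple


def min_max_normalize(values: List[float]) -> Tuple[List[float], float, float]:
    """Equation (11): map values linearly into [0, 1]; also return min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(y_norm: float, lo: float, hi: float) -> float:
    """Equation (12): map a normalized prediction back to the original scale."""
    return y_norm * (hi - lo) + lo


def make_windows(series: List[float], window: int = 5) -> List[Tuple[List[float], float]]:
    """Pair each run of `window` consecutive days with the following day's value."""
    pairs = []
    for start in range(len(series) - window):
        pairs.append((series[start:start + window], series[start + window]))
    return pairs


# Made-up closing prices, for illustration only.
closes = [100.0, 110.0, 105.0, 120.0, 130.0, 125.0, 140.0]
norm, lo, hi = min_max_normalize(closes)
pairs = make_windows(norm, window=5)

# A normalized value maps back to its original price.
restored = denormalize(norm[3], lo, hi)
```

With seven days of data and a five-day window, the sketch yields two input/target pairs: days 1 to 5 predict day 6, and days 2 to 6 predict day 7.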
[Figure panels: (a) MLP, MRC-LSTM, and actual curves; (b) LSTM, MRC-LSTM, and actual curves; (c) CNN-LSTM, MRC-LSTM, and actual curves]