Hybrid LSTM and GRU For Cryptocurrency Price Forecasting Based On Social Network Sentiment Analysis Using FinBERT
ABSTRACT Cryptocurrencies are digital assets that are widely used for trading and investing. One of the
characteristics that traders take advantage of for profit is the high volatility of the price. Its volatile and
rapidly changing prices have made cryptocurrency price predictions a challenging and highly sought-after
research topic. Cryptocurrency price predictions usually only use historical prices on the dataset, while
price movements are also influenced by other aspects such as sentiment contained in social media. This
study proposes a new machine learning method to predict Ethereum and Solana cryptocurrency prices, which
integrates historical cryptocurrency price data and social media sentiment as inputs to the prediction model.
FinBERT, a pre-trained sentiment analysis model, is used to convert the sentiment implied in social network
tweets into daily sentiment scores, which are then combined with the historical market price data. A hybrid
LSTM-GRU model is trained on the combined dataset to perform cryptocurrency price prediction. The
experimental results show that the presented method can successfully predict Ethereum and Solana price
movements and outperforms all the benchmark models.
INDEX TERMS FinBERT, social network, sentiment analysis, hybrid LSTM-GRU, Ethereum prediction,
Solana prediction.
Term Memory) has exceeded the performance of the moving average method in predicting cryptocurrency prices [6].
The performance of various machine learning algorithms has been compared on the cryptocurrency price prediction or forecasting problem [7], [8], [9]. Recurrent Neural Network (RNN) algorithms such as LSTM and GRU have achieved better cryptocurrency price prediction performance than ARIMA (Autoregressive Integrated Moving Average), one of the machine learning methods commonly used in price forecasting [10]. A hybrid LSTM-GRU model achieved the highest performance score among the compared algorithms for foreign exchange price prediction [11].
Cryptocurrency price prediction or forecasting has been a challenging subject of research due to unstable and rapidly changing prices. Cryptocurrency price volatility is caused by several factors such as popularity, transaction cost and speed, market trends, news, public sentiment, and other factors [12]. Sentiment in social media has an important correlation with, and influence on, cryptocurrency price movements [13], [14], [15]. One example is a tweet from Elon Musk about one of the famous cryptocurrencies, Dogecoin. Bitcoin and Dogecoin prices went up and down rapidly, mainly because of that particular tweet [16]. Many traders make buying and selling decisions based on the macro and micro public sentiment on social media.
Without a doubt, Bitcoin is the most famous cryptocurrency, and most of the research on cryptocurrency price prediction revolves around Bitcoin. As mentioned before, a cryptocurrency is the currency for a specific blockchain project. Several aspects of blockchain make many parties want to build their projects using blockchain technology, one of which is its decentralized nature: no single large party owns the blockchain or can manage every activity and manipulate the data recorded on it [17]. With the smart contract feature, users can build their application or program on top of the blockchain network, so the application will be decentralized and protected from hacker attacks [18]. Ethereum is the blockchain network with smart contract features that has the highest adoption, and its cryptocurrency, Ether (ETH), has the second largest market capitalization after Bitcoin [19]. Another widely adopted smart contract blockchain network, which offers faster transaction speed than Ethereum, is Solana, with its token SOL [20].
A. MOTIVATION
In this research, the daily closing prices of the Ethereum and Solana cryptocurrencies (ETH and SOL) are predicted using a hybrid RNN machine learning architecture, LSTM-GRU. Recurrent Neural Networks (RNNs) are a type of neural network architecture commonly used for sequential data processing. However, they suffer from a significant weakness known as the "vanishing gradient" problem. The vanishing gradient problem arises during the training phase of RNNs, particularly when using backpropagation through time (BPTT) to update the network’s weights. As the network processes sequential data, it calculates gradients that indicate how much each weight should be adjusted to minimize the error in the predictions. These gradients are backpropagated through time to update the network’s parameters. The issue occurs when the gradients become extremely small as they are backpropagated from the later time steps back to the initial time steps. This happens because the gradients are calculated through the multiplication of many intermediate gradients through time. As these gradients get multiplied, their values decrease exponentially, causing them to vanish or become very close to zero. LSTM and GRU are RNN architectures designed specifically to address the vanishing gradient and exploding gradient problems. The LSTM and GRU hybrid architecture is one of the best machine learning algorithms for time-series forecasting such as cryptocurrency price prediction [21].
Although both are capable of solving the vanishing gradient and exploding gradient problems, they have different gates. LSTM has three gates: the input gate, the forget gate, and the output gate. GRU has two gates: the reset gate and the update gate. In certain cases, GRU may be simpler and require fewer parameters, while LSTM can have a larger memory capacity but be more computationally complex [11]. Therefore, combining these two algorithms is expected to create a dynamic gate that optimizes the model.
Moreover, sentiment is considered able to influence stock price movements, as seen in several previous studies [22], [23].
B. CONTRIBUTION
Combining LSTM and GRU is designed to solve the vanishing gradient and exploding gradient problems. It creates a balance between the complexity and memory capacity required by the model. It helps control the flow of information and the gradient flow within the network. The attributes used for these cryptocurrency price predictions are the daily closing, opening, high, and low prices.
This research also takes advantage of the sentiment contained in social network tweets, combined with historical cryptocurrency data such as daily closing, opening, high, and low prices, to predict the daily closing price of the cryptocurrency. Tweet sentiment data is obtained using FinBERT, a model derived from BERT that focuses on Natural Language Processing (NLP) problems in the financial context.
Therefore, the main contribution of this paper is a cryptocurrency price prediction model based on the LSTM-GRU algorithm that includes social network sentiment data extracted with FinBERT.
LSTM, GRU, and LSTM-GRU are evaluated and compared with and without sentiment data. The performance is measured in MSE, RMSE, MAE, and MAPE.
The remainder of this paper is organized as follows. Section II gives a brief review of related work. Section III provides a detailed description of the proposed
current cell, h_t is the output of the current cell, and X_t is the input to the current cell. The information obtained is filtered by the input gate, forget gate, and output gate to determine which information should be remembered and which should be forgotten. The operations within an LSTM cell are based on sigmoid and element-wise multiplication operations, which allow the network to learn when to add new information, forget old information, and output relevant information. The LSTM architecture has been widely used in various applications, such as natural language processing, speech recognition, time series prediction, and more. Its ability to handle long-term dependencies and prevent gradient-related issues makes it a powerful choice for sequence modeling tasks.
C. GATED RECURRENT UNIT
The gated recurrent unit (GRU) is one of the RNN architectures. Like LSTM, GRU also addresses the vanishing gradient problem that occurs in RNNs. The GRU has a similar architecture to the LSTM but is simpler because it does not have a cell state (C_t) and has fewer gates. The GRU takes the output of the previous cell, h_{t−1}, and the input to the current cell, X_t, and generates the output of the current cell. The GRU has a reset gate for short-term memory and an update gate for long-term memory. The architecture of the GRU is depicted in Figure 2.
D. BERT
Bidirectional Encoder Representations from Transformers, commonly abbreviated as BERT, is a machine learning model for Natural Language Processing (NLP) that achieved state-of-the-art results on 11 NLP tasks [32]. BERT is a modification of the Transformer architecture designed to pre-train bidirectional representations from unlabeled text data. The pre-trained BERT model can be adapted by adding an output layer to solve various problems that have specific needs and goals (fine-tuning).
There are two main stages in BERT, namely pre-training and fine-tuning. In the pre-training stage, BERT is trained to understand the input language or text and the context of each sentence. The pre-training tasks are Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). Before BERT is trained with MLM and NSP, all input texts must go through the input embeddings stage, the details of which can be seen in Figure 3. In MLM, 15% of the words in each sentence are omitted and replaced by the [MASK] token, and BERT is trained to fill each [MASK] token with the correct word. The NSP process trains BERT to understand the order of sentences, that is, whether one sentence follows another. By also understanding the position of each word in the sentence, BERT is able to understand the meaning of each word according to the context of the sentence. Text input goes through the Sentence Embedding and Positional Embedding processes first to assist the NSP process. As depicted in Figure 3, each sentence and word is given a specific code to identify the sentence order and word order.
After BERT has understood the meaning and context of each input text, it will proceed to the part where the
fine-tuning stage is needed. Fine-tuning is the stage of modifying the output layer of the BERT architecture and adapting it to specific tasks such as sentiment classification, word classification, question answering, and sentence marking, according to the dataset to be used.
E. FINBERT
FinBERT is a specialized language model based on the BERT (Bidirectional Encoder Representations from Transformers) architecture that is specifically designed for financial sentiment analysis and financial text classification tasks. It is trained on a large corpus of financial news articles, earnings call transcripts, and other financial text data [28].
FinBERT leverages the pre-training of BERT, which is a transformer-based model trained on a massive amount of general-domain text data. However, FinBERT further fine-tunes the BERT model on financial text data to make it more adept at understanding the nuances of financial language and capturing financial sentiment.
The advantages of FinBERT lie in its ability to understand the unique vocabulary, jargon, and context of financial text. By pre-training on financial data and then fine-tuning on specific financial tasks, FinBERT can provide more accurate sentiment analysis and classification for financial text [33].
III. PROPOSED METHOD
The proposed method is shown in Figure 3. The detailed steps are described in the subsections below.
B. DATA COLLECTION FROM SOCIAL MEDIA
1) TWEETS ABOUT SOLANA AND ETHEREUM
The data collected is text from social network tweets about Solana and Ethereum, to be processed and used as sentiment data. The other data collected is historical cryptocurrency market data that contains information about price movements in the daily timeframe. Tweet data is obtained directly from the social network by using a Python library called SNScrape, which collects social network data according to the search query entered by the user. In this case, up to 50 tweets with a minimum of 20 retweets each are collected per day. Also, only tweets that contain the words "ethereum", "solana", "eth", or "sol" are included, and tweets with the word "giveaway" or "airdrop" are filtered out to minimize outlier data.
2) TWEETS PRE-PROCESSING
The tweet data obtained must go through the pre-processing stage before sentiment extraction is carried out. This stage is very important and can improve the accuracy of sentiment classification, because tweet data has attributes that cannot be considered features and have no influence in determining sentiment. Therefore, the data must be cleaned before going to the next process. The pre-processing stage on social network data includes removing signs or symbols such as "#" and "@", removing the "RT" retweet sign, deleting newline marks, and removing all URLs or links.
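The tweet selection rules and cleaning steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`is_relevant`, `clean_tweet`, `select_for_day`) are hypothetical, keywords are assumed to match whole words, spam words are assumed to match as substrings, only the "#" and "@" symbols are removed (the tagged word is kept), and when more than 50 tweets qualify the first 50 are kept. The scraping step itself is not shown.

```python
import re

# Values taken from the text; the matching rules below are assumptions.
KEYWORDS = ("ethereum", "solana", "eth", "sol")
SPAM_WORDS = ("giveaway", "airdrop")
MIN_RETWEETS = 20
MAX_TWEETS_PER_DAY = 50


def is_relevant(text, retweets):
    """Keep tweets that mention a keyword, contain no spam word, and have enough retweets."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    has_keyword = any(k in words for k in KEYWORDS)   # whole-word match (assumption)
    has_spam = any(s in lowered for s in SPAM_WORDS)  # substring match (assumption)
    return has_keyword and not has_spam and retweets >= MIN_RETWEETS


def clean_tweet(text):
    """Remove URLs, the leading RT marker, '#' and '@' symbols, and newlines."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs or links
    text = re.sub(r"^\s*RT\b:?\s*", "", text)          # leading retweet marker
    text = text.replace("#", "").replace("@", "")      # drop the symbol, keep the word
    text = re.sub(r"\s+", " ", text)                   # newlines and repeated whitespace
    return text.strip()


def select_for_day(tweets):
    """tweets: list of (text, retweets) pairs for one day; returns cleaned texts."""
    kept = [clean_tweet(text) for text, retweets in tweets if is_relevant(text, retweets)]
    return kept[:MAX_TWEETS_PER_DAY]  # keeping the first 50 is an assumption
```

URLs are removed before the "#" and "@" symbols so that fragments of a link are not left behind once its punctuation is stripped.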
with and without sentiment, and also GRU with and without sentiment. The non-hybrid LSTM and GRU models each have 50 neurons and follow the same training parameters as described in Table 2. This prediction comparison uses several error metrics, namely Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Squared Error (RMSE), with the aim of measuring the prediction error compared to the actual data. For all three metrics, the smaller the error, the more accurate the prediction. These three error measures were chosen because they are often used as evaluation methods for time series forecasting [34]. Previous research comparing algorithms for cryptocurrency price prediction also used these evaluation methods [21].
IV. EXPERIMENT RESULT
There are two parts to the experiment in this study. The first part is a comparison between several machine learning models without the sentiment score, and the second part is a comparison between the same machine learning models, but with the sentiment score added to the historical dataset. The machine learning models used in the experiment are LSTM, GRU, and hybrid LSTM-GRU. Ethereum and Solana are tested separately, but with the same treatment.
Table 3 and Table 4 show the experiment results for Ethereum and Solana without sentiment. RNN algorithms such as LSTM and GRU usually produce inconsistent results due to their randomization element. To deal with this, each experiment is trained and tested five times, and the average value is used for comparison. The results show that GRU is better than LSTM, and hybrid LSTM-GRU is better than GRU. MAE, MAPE, and RMSE are used as the evaluation metrics to determine which model has the lowest error rate.
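The three error metrics and the averaging over repeated runs used in this comparison can be sketched as follows. This is a minimal sketch using the standard definitions, not the authors' code: the function names (`mae`, `mape`, `rmse`, `average_runs`) are hypothetical, MAPE is expressed in percent, and actual prices are assumed to be nonzero.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average absolute difference."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the average squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def average_runs(run_scores):
    """Average each metric over repeated runs; run_scores is a list of dicts with the same keys."""
    keys = run_scores[0].keys()
    return {k: sum(run[k] for run in run_scores) / len(run_scores) for k in keys}
```

For a model trained and tested five times, one score dictionary per run (for example `{"MAE": ..., "MAPE": ..., "RMSE": ...}`) would be passed to `average_runs` to obtain the values compared across models.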
Table 5 and Table 6 show the experiment results for Ethereum and Solana with sentiment. As in the experiments without sentiment, the hybrid LSTM-GRU results are also better than those of GRU and LSTM. To obtain a representative result and to make it easier to compare, the results from the five training and testing runs for Ethereum and Solana prices have been averaged and summarized in Table 7 and Table 8. Table 7 and Table 8 show that the MAE, MAPE, and RMSE of the hybrid LSTM-GRU model with the sentiment dataset are the smallest in all test cases. This shows that the proposed method is better than the
hybrid LSTM-GRU which does not use the daily sentiment score for the prediction. Not only the hybrid LSTM-GRU, but all models that included the sentiment dataset achieved a 0.5% to 1% MAPE improvement over models trained without the sentiment dataset.
It is also shown that although the hybrid LSTM-GRU has the best performance, GRU performs better than LSTM in experiments both with and without the sentiment dataset. Another finding is that although including the sentiment dataset improves the performance of LSTM, GRU, and hybrid LSTM-GRU, the performance is also highly dependent on the machine learning model used. This is shown in Table 7 and Table 8, where the performance of the hybrid LSTM-GRU without the sentiment dataset is better than that of LSTM or GRU with sentiment. A line plot of the Ethereum price prediction, made using the Pyplot and Seaborn Python libraries, can be seen in Figure 5. These plots visualize the prediction results of the hybrid LSTM-GRU algorithm using sentiment from the social network. The blue and orange lines represent the results of training and testing, with 80% of the data used for training and 20% for testing and prediction. The predicted results are represented by a purple line. This study does not report computation cost because computation time was not measured. This research combines LSTM and GRU in parallel, which will have a higher computation cost than LSTM or GRU individually.
This study proposed a new method to predict the prices of two widely known cryptocurrencies, namely Ethereum and Solana. Our time-series prediction method utilizes the sentiment value contained in every tweet that discusses Ethereum or Solana. Sentiment values extracted using FinBERT are grouped per day into daily sentiment values, which are then combined with daily historical data. The combined dataset is then fed into the hybrid LSTM-GRU, which previous research identifies as one of the best models for time-series forecasting.
This research shows that the proposed method of adding a sentiment score from social networks, extracted using FinBERT, can improve prediction performance compared with using only commonly used time series prediction models such as LSTM, GRU, and even hybrids of the two. Social media is a platform where traders and investors express their opinions, which also affect changes in asset prices. Most cryptocurrency price predictions do not consider the sentiment of social media. The use of this sentiment data is expected to help researchers improve the performance of their models and help traders and investors maximize their profits.
This study has several limitations that can be addressed in future research. Sentiment is sourced only from the social network, and it is difficult to choose relevant tweets because some tweets can be considered outliers that do not express sentiment well. For higher performance improvements in future research, sentiment for cryptocurrencies may be taken from several other sources such as Google Trends, forums, and the crypto community. It is also possible that filtering for sentiment can be more focused and other sentiment extraction models can be used.
ACKNOWLEDGMENT
This work was supported by Bina Nusantara University, Jakarta, Indonesia.
