Chapter 2

BACKGROUND AND RELATED WORK

The background section provides the information a reader needs in order to follow the approach of this thesis. It includes a brief review of the key distinctions between the conventional and cryptocurrency markets, an introduction to Bitcoin, the fundamentals of Machine Learning and Deep Learning, the fundamentals of Sentiment Analysis and its tools, and the Sequential Data Deep Neural Networks used in the thesis.

2.1 Cryptocurrency
A cryptocurrency is a digital asset designed to function as a medium of exchange, employing cryptography to secure transactions, prevent payment manipulation, and codify the rules for issuing new units of currency. Unlike traditional financial systems, cryptocurrencies operate on a decentralized peer-to-peer network, with each transaction recorded in a public ledger known as the blockchain. This allows network participants to transfer assets directly without an intermediary. A pair of public and private keys is used to transfer cryptocurrency between parties. Bitcoin, which debuted in 2009, is the most popular cryptocurrency and the first decentralized currency to operate on a blockchain. Bitcoin is a purely virtual currency; there are no physical coins. New bitcoins are produced by mining, a process that validates transactions using computing power. Miners contribute processing power to solve difficult mathematical/cryptographic puzzles and receive bitcoins when a transaction is successfully validated. The solution to such a puzzle is called a Proof-of-Work and functions as a signature indicating that the miner expended considerable computational power. Every 10 minutes, a new block is mined and appended to the blockchain. Because the Proof-of-Work algorithm requires that new blocks be created in chronological sequence, it is extremely difficult to reverse previously accepted transactions. The mining process also prevents double spending, a problem that traditional financial institutions must address as well. These properties of mining serve as a genuine security measure against fraudulent transactions [18], [19].

2.1.1 The Cryptocurrency Market


Cryptocurrencies have made considerable inroads into the world of banking in recent years. More than 2,000 other cryptocurrencies have been developed since the debut of Bitcoin, the vast majority of them based on the same underlying blockchain technology. The market capitalization of actively traded cryptocurrencies had topped 140 billion USD as of December 2018.

Figure (2.1) The market capitalization peaked in January 2018, with a value above 800 billion dollars [20].

Every cryptocurrency has a preset supply of coins/tokens, thereby creating an anti-inflationary monetary policy. As a result, there is potential for a significant increase in value. From the standpoint of economic theory, it is unclear whether a cryptocurrency is a store of value or a medium of exchange. In practice, cryptocurrencies are currently used both as a means of trade and as an asset for speculation. As a result, the market can be regarded as distinct from other markets. Significant media coverage and an injection of cash into the technology have encouraged a number of financial institutions to investigate the use of cryptocurrency and blockchain technologies. According to studies, the market is currently highly volatile. Some of the key variables causing ecosystem volatility are:
• Bad media attention
• Unevenly distributed currency circulation
• Automated trading bots
• Insufficient institutional capital
• Poor regulatory control
• Security problems

Figure (2.2) A chart demonstrating Bitcoin price fluctuation in relation to the US dollar [22].

The market's novelty and volatility offer an intriguing opportunity to apply sophisticated analytical tools to estimate future cryptocurrency values. Accessible market data APIs and an open-source ecosystem serve as the foundation for developing useful frameworks and applications for time series forecasting. We take advantage of these options and believe it is worthwhile to apply machine learning techniques to forecast future cryptocurrency values.

2.1.2 Cryptocurrency Market vs. Traditional Market


Cryptocurrencies have recently taken the world by storm. According to Time.com, the total market value of all cryptocurrencies exceeded $3 trillion by the end of 2021. As a result, many people, particularly investors, have shifted their focus to cryptocurrency trading. Yet what distinguishes this market from typical investments such as stocks? Volumes could be written about the nature of cryptocurrency and stock investing; this section simply provides a basic overview of the key differences between the cryptocurrency and traditional markets.
Stock Market
Stock markets are usually the first thing that springs to mind for a novice investor. They represent the most prevalent type of conventional investment, and profits have been made through them for over a century. It is critical to grasp the true meaning of stocks in order to appreciate the distinctions between them and the cryptocurrency markets. Stocks are ownership stakes in a publicly traded firm or business. Each share of stock purchased by an investor represents a percentage of ownership in the firm. Stockholders can make money in two ways [31]: capital gains, which occur when investors sell their stock holdings to others at a higher price, and dividends and cash flow, since if a corporation pays dividends, every investor can profit from the company's long-term earnings. As a result, the price of a stock and its overall performance are dictated by the company's real performance and success. The price may rise and fall in tandem with the rise and decline of the business. It is also vital to understand that until the firm declares bankruptcy and closes, its stock continues to exist.
The stock market's prices are influenced by a number of factors. The most important ones are the following:
1. Stock prices fluctuate (up or down) based on investors' expectations of a firm's future performance; for example, if investors are optimistic and believe the company is on track for success, the stock's price is likely to grow. Ultimately, prices are determined by the company's profitability and its capacity to grow earnings over time.
2. No single stock, or small group of stocks, has substantial market dominance. The MANGA stocks (Meta, Amazon, Netflix, Google, and Apple) lead this market, accounting for over one-fifth of the whole S&P 500, yet even this is insufficient for one party to monopolize the market.
3. While stock prices are volatile, they are significantly less volatile than cryptocurrencies. Compared to the cryptocurrency market, the stock market is more predictable and hence more stable. Stock prices do fluctuate over time and may, at the extreme, soar or plummet by 100% within a single year.
4. The conventional market is strongly regulated by central banks and governments. Nearly all trades are conducted through major, centralized exchanges such as the New York Stock Exchange. As a result, prices are governed by a centralized pricing mechanism, which limits unpredictability in price changes over time.
Cryptocurrency Markets
Unlike traditional currencies, cryptocurrencies are entirely digital assets. The Euro, for example, is backed by a tangible component (cash) that an owner may use digitally or physically withdraw. Cryptocurrencies, on the other hand, exist and can be used only as digital assets. The two most common types of cryptocurrencies are pure currencies such as Bitcoin (which investors may only sell, buy, or exchange) and utility tokens such as Ethereum (which investors can likewise sell, buy, and trade). The primary distinction is that utility tokens are part of more complex software platforms and may be used in other types of assets, such as NFTs.
The opportunity to purchase coins when prices are low and sell them when prices are high is the major profit source in this market; this is also referred to as capital gains. In essence, investors profit from a cryptocurrency if they can convince another investor to purchase it at a higher price.
Because of their fundamental peculiarities, cryptocurrency price drivers are significantly more complicated and opaque than traditional market price drivers. The primary underlying drivers of cryptocurrency values are as follows:
1) Cryptocurrencies are not physical assets, are not supported by cash flows, and have no real-world value to influence or maintain the market. The market instead depends on public sentiment to drive prices. If the public's opinions and attitudes support cryptocurrencies, the market's price will reflect this.
2) In contrast to the traditional market, in which the price cannot collapse entirely unless the corporation representing the actual asset goes out of business, cryptocurrency values can reach zero if all traders abandon the coin and no longer recognize its existence.
3) Despite the fact that there are over 10,000 currencies, the whole market is tied to just one cryptocurrency (Bitcoin), which controls more than 70% of the market.
4) The cryptocurrency market is regarded as the most volatile; values may triple or fall to a third within the same day. For example, from May 8th to May 19th, 2021, the price of a single Bitcoin fell by more than 62% in just ten days.
5) The market is not regulated by central authorities or banks. Rather, all transactions are confirmed by the network, and cryptocurrencies are transacted directly between the recipient and the sender. Because there is no such third party, there is no centralized pricing mechanism for cryptocurrencies, making the market very volatile and unpredictable.
2.1.3 Bitcoin
In the original Bitcoin whitepaper [39], published in 2008, Satoshi Nakamoto proposed a trust-free electronic cash system backed by a cryptographically verifiable blockchain that acts as an immutable public ledger for all transactions. Bitcoin operates in a decentralized fashion. Because no regulatory body can manage the bitcoin supply, artificial inflation or deflation of the currency is unfeasible. The supply of fresh bitcoins is predictable: when a new block is added to the blockchain, roughly every 10 minutes, a preset quantity of new bitcoins is produced. Because of its limited supply, Bitcoin's price is mostly driven by demand. This demand is frequently affected by popular views, opinions, or news about Bitcoin. Historically, cryptocurrencies have responded quickly to both positive and negative Bitcoin headlines. Earlier attempts to regulate Bitcoin in China, for example [40, 41], resulted in significant price fluctuations in the crypto market. Once the Chinese government's efforts to control cryptocurrency were publicly revealed on 19.5.2021, the price of bitcoin plunged about 30% in 24 hours, from $43,546.12 to $30,681.50. Just a month before, on 14.4.2021, the bitcoin price had reached an all-time high of $63,109.69 [42]. Similarly, multiple Bitcoin-related tweets by Elon Musk, CEO of Tesla and SpaceX, have generated substantial price movements in bitcoin [43, 44]. Such occurrences have shown that bitcoin demand may be unpredictable. Yet the value of bitcoin has risen at an exponential rate over time [45]. Bitcoin's usefulness as a payment system has grown less attractive due to its deflationary nature [46]. Instead, the emphasis has shifted to bitcoin's potential as an investment vehicle and a store of wealth, comparable to gold and other precious metals [47].
2.2 Machine Learning
2.2.1 The Basic Algorithms
Machine learning refers to the ability of a system to recognize and infer relationships from a given data set without being explicitly programmed. As the amount of data available for learning increases, the algorithms' performance improves. Machine learning tasks are classified into broad categories according to their input:
Supervised Learning
In supervised learning, an algorithm is given a training dataset that usually contains labeled examples. The learning method entails seeing examples of a random vector x and an associated value y and then constructing a function to forecast y from x by estimating the probability distribution p(y|x, D) [23]. When the target output y takes a limited set of values (e.g., green, blue, red), the learning problem is referred to as classification. When the intended output y is instead a numeric, continuous random variable (e.g., a price), a regression model is used. This type of forecast is appropriate for automated trading and will be used in this research.
Unsupervised Learning
An unsupervised learning system is given a data set containing examples without labels. Without any explicit feedback, the learning algorithm iterates through multiple samples of a random vector x in an attempt to discover the probability distribution p(x) or identify underlying structure. One of the most common tasks of unsupervised learning is to divide a dataset into groups based on the individual inputs. The most widely used algorithm in unsupervised learning is k-means.
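As an illustration, here is a minimal k-means sketch with scikit-learn (again an assumed library choice); the data points are invented:

```python
# Minimal sketch: clustering unlabeled examples with k-means.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]])  # unlabeled x
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # group assignments discovered without labels
print(kmeans.cluster_centers_)  # structure underlying the data distribution
```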
Reinforcement Learning
Reinforcement Learning (RL) is a branch of machine learning in which an agent learns through trial and error in an interactive environment, using feedback from its own actions and experiences in order to maximize a numerical reward signal. Together with supervised and unsupervised learning, RL makes up one of the three major machine learning paradigms. The learning method does not employ a training dataset to learn patterns; rather, the agent must discover which actions yield the biggest reward. The method operates in a closed loop, so the actions of the learning system influence its subsequent inputs. Reinforcement Learning is thus distinguished by three key characteristics: a closed-loop architecture, no direct instructions on what to do, and consequences of actions and rewards that play out over time [24]. Reinforcement learning has become increasingly common in recent years and is widely used in a variety of systems and problems. A toy sketch illustrating the trial-and-error loop appears after the figure below.

Figure (2.3) An example of how the learning process is performed in supervised, unsupervised, and reinforcement learning.
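The following toy sketch makes the trial-and-error loop concrete: tabular Q-learning on an invented five-state corridor in which the agent is rewarded only for reaching the final state. The environment and all parameters are illustrative assumptions, not part of this thesis.

```python
# Toy sketch of reinforcement learning: tabular Q-learning on an invented
# 5-state corridor; the agent earns a reward of 1 only at the final state.
import random

n_states, n_actions = 5, 2                 # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.5      # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # Trial and error: explore a random action with probability epsilon,
        # otherwise exploit the action currently believed to be best.
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda act: Q[s][act])
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0   # numerical reward signal
        # Closed loop: the action's outcome feeds back into future choices.
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print(Q)  # the learned values favor moving right, toward the reward
```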

2.2.2 Introduction to Neural Networks


Neural Networks, also known as Artificial Neural Networks (ANN), are computational models inspired by biological neural networks. Nerve cells are the foundational components of the nervous system and perform massively complex computing tasks. Together with other brain cells, they create an enormous communication system with substantial computational power. Each neuron is made up of dendrites, a cell body, an axon, and axon terminals. The cell body receives input signals arriving from the dendrites and processes them in the nucleus, from which an output value is delivered to the axon terminals via the axon.
Figure (2.4) An illustration of a biological neuron

Formally, ANNs are simplified mathematical structures representing a class of nonlinear statistical models that produce plausible results when applied to learning tasks. ANNs have proven to be extremely effective in a variety of applications, including computer vision, speech recognition, machine translation, time series forecasting, and many others.
2.2.2.1 Feed-Forward Neural Networks
Feed-forward neural networks are the most basic neural network design. Such a network is characterized as a directed acyclic graph with nodes representing neurons and edges representing connections. The nodes are organized in n layers, which are connected by edges that transmit signals. The output of a neuron in the i-th layer is coupled to the input of a neuron in the (i+1)-th layer until the signal reaches the final output. By design, this architecture lacks feedback links and contains no cycles.
2.2.2.2 Neuron
A neuron is the basic building unit of every ANN. Each neuron receives an input signal and produces an output. The first layer, shown in blue, is known as the input layer and contains the feature vector (x1, x2, ..., xn). This layer is linked to the neurons via connections that carry numerical weights (w1, w2, ..., wj), which determine the relevance of each input to the output [27]. The architecture also includes a bias component b, which has a fixed value of 1 and is associated with the weight wj0 in our example.
Figure (2.5) An illustration of a computational neuron

To form an activation, the neuron first calculates the weighted sum of its inputs:

$$a_j = \sum_{i=1}^{N} w_{ji}^{(1)} x_i + w_{j0}^{(1)}$$

The sum is then passed through an activation function h(·):

$$z_j = h(a_j)$$

The previous equation was written to add clarity to the subsequent ones. The superscript (1) indicates that the parameters belong to the first layer of the ANN, and the subscript j indexes the neuron. The quantities z_j are the activation outputs of the neurons in the hidden layer. We use the following calculation to compute the activations of the neurons in the output layer:

$$a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}$$

where k = 1, ..., K indexes the outputs of the next layer, in this instance the second. A sigmoid function is a widely used choice for the nonlinear activation function h(·), although alternative functions are rapidly gaining ground in the field:

$$h = \sigma(a) = \frac{1}{1 + \exp(-a)}$$
At this point, the outputs of the neurons are transformed by an optional activation function to yield the overall network's final output:

$$y_k(x, w) = \sigma\left(\sum_{j=1}^{M} w_{kj}^{(2)}\, h\left(\sum_{i=1}^{N} w_{ji}^{(1)} x_i + w_{j0}^{(1)}\right) + w_{k0}^{(2)}\right)$$

This mathematical statement describes how forward propagation operates in general. The model accepts a collection of input variables x_i and uses them to generate a set of output values y_k governed by an adjustable weight vector w.
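The forward pass above can be written out directly in NumPy. The following is a minimal sketch with illustrative shapes and random weights; it simply evaluates the equations of this section:

```python
# Minimal sketch of forward propagation: a two-layer network with sigmoid
# activations (shapes and values are illustrative).
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
N, M, K = 3, 4, 2                      # inputs, hidden neurons, outputs
W1 = rng.normal(size=(M, N))           # first-layer weights  w_ji^(1)
b1 = rng.normal(size=M)                # first-layer biases   w_j0^(1)
W2 = rng.normal(size=(K, M))           # second-layer weights w_kj^(2)
b2 = rng.normal(size=K)                # second-layer biases  w_k0^(2)

x = np.array([0.5, -1.0, 2.0])         # input vector (x_1, ..., x_N)
a_hidden = W1 @ x + b1                 # weighted sums a_j
z = sigmoid(a_hidden)                  # hidden activations z_j = h(a_j)
a_out = W2 @ z + b2                    # output-layer sums a_k
y = sigmoid(a_out)                     # final outputs y_k(x, w)
print(y)
```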

2.2.2.3 Structure of the Network


A Single-Layer Network, often known as a Perceptron, is the simplest neural network. The model incorporates N weighted inputs that are fed directly into the output. The perceptron is widely regarded as one of the earliest machine learning methods. It was built for image recognition in a US military-funded research program and laid the groundwork for the ANNs to come. Despite its promise, the perceptron was only able to learn linearly separable patterns and was unable to handle more complicated situations. The Multi-Layer Perceptron (MLP) was eventually introduced, offering greater processing capability and substantially boosting adaptability to nonlinear problems.
Figure (2.6) A multi-layer perceptron with one hidden layer.

The MLP network has at least three layers: the input layer, the hidden layer, and the output layer. Beyond the input neurons, each subsequent layer is made up of N neurons that process the weighted inputs from the previous layer before passing the signal on to the next layer. The hidden layers are the core computational body of a neural network and represent one of its defining characteristics. When building the neural network, the number of neurons and hidden layers must be chosen carefully: underfitting can occur when too few neurons or hidden layers are used, while overfitting occurs when there are too many neurons or hidden layers in the structure. A multi-layer perceptron with one hidden layer is shown in Figure (2.6).
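For illustration, a minimal PyTorch sketch of an MLP with a single hidden layer; the layer sizes are assumptions that would have to be tuned, since too few hidden units risk underfitting and too many risk overfitting:

```python
# Illustrative sketch of an MLP with one hidden layer in PyTorch;
# the layer sizes below are assumptions, not values from this thesis.
import torch.nn as nn

n_inputs, n_hidden, n_outputs = 10, 32, 1
mlp = nn.Sequential(
    nn.Linear(n_inputs, n_hidden),   # input layer -> hidden layer
    nn.Sigmoid(),                    # nonlinear activation h(.)
    nn.Linear(n_hidden, n_outputs),  # hidden layer -> output layer
)
print(mlp)
```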

2.2.2.4 Learning
Thus far, neural networks have been presented as a class of nonlinear functions with predictive potential. Like the biological neuron, the artificial neuron must be trained in order to attain or identify the intended pattern. This is a vital phase that necessitates careful adjustment of the parameters through a number of training phases and approaches. In our training dataset (x1, y1), ..., (xn, yn), xi is the input vector and yi is the output vector for all i ∈ {1, ..., n}. Errors are common during the training of neural networks. The loss function is used to quantify the discrepancy between the predicted value y and the label t_k; the error of each training example is computed with it. Despite its rigor, the loss function only gives information for a single training example. To assess the model more broadly, a cost, also known as a cost function, must be defined. It measures how well the model performs over the whole training set. The error function averages the previously described loss over the training set to produce the following:

$$E(w) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, w) - t_n \right)^2$$

2.2.2.5 Gradient Descent


The goal of the neural network's training process is to reduce its error function, which we accomplish by using the Gradient Descent optimization method. The gradient is the vector of partial derivatives of the error function E(w). The weight update is stated as follows:

$$w^{(\tau+1)} = w^{(\tau)} - \eta\, \nabla E\left(w^{(\tau)}\right)$$

and indicates that the weight vector moves in the direction of the steepest decrease of the error function. Here τ indexes the iteration steps, and η > 0 denotes the step size or learning rate. The learning rate is a hyper-parameter that must be tuned manually and plays the most important role in the learning process. It influences how rapidly the neural network adapts to the target outputs in the training dataset, i.e., how rapidly the parameters update and how quickly gradient descent finds the cost function's optimum. If the hyper-parameter is set too low, gradient descent will require an excessive amount of time to reach the minimum. A learning rate that is too high, on the other hand, might cause the error function to diverge. As a general guideline, the learning rate is set on the order of 0.01 to 0.001.
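As a minimal sketch of these updates, the following NumPy code runs gradient descent on a simple least-squares error function (an illustrative choice so the gradient has a closed form); the data and learning rate are assumptions:

```python
# Minimal NumPy sketch of gradient descent on a least-squares error
# E(w) = 1/(2N) * sum_n (w . x_n - t_n)^2; data and step size illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # training inputs x_n
w_true = np.array([2.0, -1.0, 0.5])
t = X @ w_true                         # training targets t_n

w = np.zeros(3)                        # initial weights w^(0)
eta = 0.1                              # learning rate (hyper-parameter)
for tau in range(500):                 # iteration index tau
    grad = X.T @ (X @ w - t) / len(X)  # gradient of E at w^(tau)
    w = w - eta * grad                 # w^(tau+1) = w^(tau) - eta * grad

print(w)                               # approaches w_true as E shrinks
```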

2.2.3 Deep Learning


For the past few decades, neural networks have played an essential role in advancing computing tasks such as voice recognition, image recognition, and others. Ongoing neural network research and growth in the available datasets led to the construction of progressively more complicated neural systems, giving rise to the term Deep Learning [29]. Deep networks with numerous hidden layers can process richer signals than shallower neural network models and surpass them in many essential pattern recognition applications [30]. Deep neural networks (DNN) can learn a large number of parameters from a training dataset. The additional layers enable feature extraction and manipulation, which is extremely useful for complicated tasks such as computer vision or natural language processing [31]. The fundamental principle behind DNNs is to have the input data represented at various levels of abstraction, ranging from basic to complicated. The networks are built in a hierarchical fashion, with higher-level characteristics extracted from lower-level features [32].

2.2.3.1 Sequential Data Deep Neural Networks


Recurrent neural networks (RNNs) are neural network models frequently employed to solve sequence problems. RNNs remain prominent in challenges such as Natural Language Processing (NLP), language and voice recognition, image captioning, and so on. Several studies have shown that RNNs yield promising outcomes, surpassing alternative designs on a variety of difficult tasks [13, 6, 42, 40, 21]. Given that RNNs were influenced by other artificial neural networks such as Convolutional Neural Networks (CNNs) and Feed-Forward Neural Networks (FNNs), they all use information to train and learn patterns and associations [46]. The primary difference is that RNNs contain internal memory, so previous inputs affect the present result. This is a significant benefit, especially when dealing with sequential or temporal information. For example, the words preceding the one we are predicting influence our forecast of the next word in a particular phrase. In such cases, it is critical to have a memory that integrates previous state information and produces outputs depending on it. In classic artificial neural networks we generally deal with one input and one output. Working with sequential data, on the other hand, can be difficult since we frequently have varying input and output lengths. As a consequence, depending on the application, multiple forms of RNNs may be employed. These kinds can be classified as follows [8]:
1. One-to-one: one input and one output.
2. Many-to-one: several inputs (a sequence) but only one output, as in predicting the price of a stock market index the next day.
3. One-to-many: a single input and a number of outputs (a sequence of outputs).
4. Many-to-many: a sequence of inputs and a sequence of outputs.
The RNN block/cell is the one feature that all of the aforementioned designs share, as seen in Figure (2.7). The RNN cell is a unit that uses the current input and prior memories to generate a hidden state. The hidden state is then passed on to the following RNN block (next timestep); it may also be used to produce an output at each timestep (one-to-many or many-to-many) [28].
The hidden state of the RNN cell is expressed as follows:

$$h^{(t)} = \tanh\left(W_{hx}\, x^{(t)} + W_{hh}\, h^{(t-1)} + b_h\right)$$

where t is the current timestep, W_hx is an n × m weight matrix, W_hh an n × n weight matrix (with n the hidden size and m the input size), and b_h a vector of length n representing the hidden-state biases. Furthermore, the output of each RNN unit may be expressed as follows:

$$y^{(t)} = \mathrm{softmax}\left(W_{yh}\, h^{(t)} + b_y\right)$$

where y^{(t)} is the output of the unit at the present timestep t, W_yh is a weight matrix, and b_y is a vector representing the output biases.

Figure (2.7) A Simplified Architecture of RNN cells
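A minimal NumPy sketch of one timestep of the RNN cell equations above, with illustrative shapes and random weights:

```python
# Minimal sketch of a single RNN cell step: the hidden state combines the
# current input with the previous hidden state (shapes are illustrative).
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_hidden, n_input, n_output = 4, 3, 2
W_hx = rng.normal(size=(n_hidden, n_input))
W_hh = rng.normal(size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
W_yh = rng.normal(size=(n_output, n_hidden))
b_y = np.zeros(n_output)

def rnn_step(x_t, h_prev):
    h_t = np.tanh(W_hx @ x_t + W_hh @ h_prev + b_h)   # hidden state update
    y_t = softmax(W_yh @ h_t + b_y)                   # output at timestep t
    return h_t, y_t

# Run a short sequence: the hidden state carries memory across timesteps.
h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_input)):
    h, y = rnn_step(x_t, h)
print(y)
```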

It is worth noting that the design of the RNN block may be somewhat altered to address a specific issue. The most typical modifications use activation functions other than tanh and softmax. Researchers have been attempting to develop and construct different types of RNNs due to their dynamic design [7, 48, 12, 47]. The following are the most prevalent variations:
Bidirectional Recurrent Neural Networks (BRNNs)
The output of RNNs is directly affected by the present and previous inputs, which is termed memory. Yet the next or future inputs have no effect on the present output. This can be an issue when the problem at hand necessitates such relationships. As a result, researchers developed BRNNs [7].
LSTM
Loss of memory between timesteps is another problem of plain RNNs. To address this issue, LSTMs maintain two memories: a long-term and a short-term memory. The long-term memory stores information that the network considers important to keep, whereas the short-term memory stores data from recent timesteps. By using both memories, the network is able to retain all essential data from prior states that affects the present output [12]. Figure (2.8) depicts the general design of an LSTM unit.
BiLSTM (Bidirectional Long Short-Term Memory):
BiLSTMs were suggested to address the same problem as BRNNs, using LSTM cells rather than plain RNN cells [48].
GRU (Gated Recurrent Unit):
Similarly to LSTMs, GRUs were proposed to address RNNs' vanishing memory. Nevertheless, unlike the LSTM, the GRU consists of only two gates (a reset gate and an update gate). Moreover, the GRU does not have a separate cell state; it instead employs the previous hidden state (also known as working memory) as the current cell state [47]. Several studies have indicated that GRUs perform better than LSTMs in dealing with long-term memories [38]. Moreover, because GRUs are less computationally demanding than LSTMs, training and prediction times tend to be much shorter [38]. A short sketch comparing the two layer types follows the figures below.

Figure (2.8) A Simplified Architecture of an LSTM cell


Figure (2.9) below depicts the GRU unit's simplified architecture.

Figure (2.9) A Simplified Architecture of a GRU cell
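As a small illustration of the LSTM/GRU difference, the following PyTorch sketch instantiates both layers with the same (assumed) dimensions and compares their parameter counts; the GRU's leaner gating structure translates into roughly a quarter fewer parameters here:

```python
# Illustrative PyTorch sketch: LSTM vs. GRU with identical assumed sizes.
import torch
import torch.nn as nn

input_size, hidden_size = 8, 32
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
gru = nn.GRU(input_size, hidden_size, batch_first=True)

x = torch.randn(1, 10, input_size)        # (batch, timesteps, features)
lstm_out, (h_n, c_n) = lstm(x)            # LSTM keeps a separate cell state c_n
gru_out, h_n = gru(x)                     # GRU uses only the hidden state

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(lstm), count(gru))            # the GRU has ~25% fewer parameters
```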

2.2.4 Cost Functions


Using and minimizing a cost function during training is the typical strategy for addressing a regression problem and generalizing a model to operate on testing data. There are several methods for assessing the accuracy of the final forecasts. One of the most popular is to use a cost function to calculate the difference between the predicted and actual results. These functions are intended to communicate how much a model's prediction differs from the true value. The term "Error Function" is often used instead of "Cost Function" to indicate that the returned result is the error of the prediction. Many cost functions are available for regression problems. The most prevalent error functions, which are also the ones used in this research, are as follows:
SSE (Sum of Squared Errors):
The sum of squared errors (SSE) is the most classic and widely used training function. As the name implies, it returns the sum of the squared differences between the real and predicted outcomes. The major advantage of this function is that it amplifies the cost/error value, which is useful during the training stage when dealing with tiny numbers. It is defined as:

$$SSE = \sum_{i=0}^{n} \left( y_i - \hat{y}_i \right)^2$$

where $y_i$ denotes the actual outcome and $\hat{y}_i$ the predicted one. Although the SSE is extensively employed in conventional problems, it does not work well in many practical circumstances, particularly when dealing with enormous volumes of data: even if the predicted outcomes are close to the real ones, the accumulated cost value will be high. As a result, interpreting this number when testing the model can be difficult.
L1 Loss:
The L1 Loss is a cost function that computes the sum of the absolute differences between the actual and predicted outcomes. It is defined as:

$$L1\text{-}Loss = \sum_{i=0}^{n} \left| y_i - \hat{y}_i \right|$$

Mean Square Error (MSE):


The MSE computes the mean of the squared discrepancies between the prediction and the actual result. It is defined as:

$$MSE = \frac{1}{n} \sum_{i=0}^{n} \left( y_i - \hat{y}_i \right)^2$$

The MSE is always non-negative, with a value of zero indicating a perfect prediction. Even though MSE is often chosen, it does not handle outliers well, since the effect of an outlier is amplified by squaring the difference. To address this constraint, other functions that use the absolute difference rather than the squared difference are recommended.
Mean Absolute Error (MAE):
Unlike the MSE, the MAE estimates the mean of the absolute differences between the prediction and the actual output. Using the absolute value rather than the square preserves the true scale of the difference between the values, which may be useful for verifying the findings of a given model. The MAE is defined as follows:

$$MAE = \frac{1}{n} \sum_{i=0}^{n} \left| y_i - \hat{y}_i \right|$$
Root Mean Square Error (RMSE):
The Root Mean Square Error (RMSE) is the square root of the MSE. The purpose of this cost function is to undo the squaring of the differences via a square root, producing an interpretable cost value on the same scale as the predicted outcome values. This technique is commonly used to test and verify regression models and to compare them with other methodologies. The RMSE is defined as follows:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=0}^{n} \left( y_i - \hat{y}_i \right)^2}$$
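The following NumPy sketch implements the cost functions above directly from their definitions; the example values are invented:

```python
# NumPy sketch of the cost functions above, written from their definitions
# (y = actual values, y_hat = predicted values).
import numpy as np

def sse(y, y_hat):     return np.sum((y - y_hat) ** 2)
def l1_loss(y, y_hat): return np.sum(np.abs(y - y_hat))
def mse(y, y_hat):     return np.mean((y - y_hat) ** 2)
def mae(y, y_hat):     return np.mean(np.abs(y - y_hat))
def rmse(y, y_hat):    return np.sqrt(mse(y, y_hat))

y = np.array([100.0, 102.0, 98.0])       # actual prices (illustrative)
y_hat = np.array([101.0, 101.0, 99.0])   # predicted prices
for f in (sse, l1_loss, mse, mae, rmse):
    print(f.__name__, f(y, y_hat))
```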

2.3 Sentiment Analysis


Sentiment analysis is a subfield of natural language processing. Pozzi et al. [48] describe sentiment analysis as the problem of developing automated algorithms that can identify emotional features in textual content, such as sentiment, views, and feelings [48]. They emphasize in the same paper that sentiment classification is made up of multiple subtasks, including polarity classification, opinion summarization, subjectivity classification, sarcasm detection, and identification of false opinions, among others. Throughout this thesis, the word "sentiment" will refer to the text's polarity, i.e., whether the text appears positive, neutral, or negative [48].

2.3.1 VADER (Valence Aware Dictionary and sEntiment Reasoner)
VADER was first proposed as a sentiment analysis model by Hutto et al. [51]. The researchers describe its scoring method in the accompanying literature [49]. For a text input, VADER generates a compound sentiment score, a normalized sum of the adjusted polarity (intensity) scores of the terms in the lexicon [49]. The compound score is a numeric value in [−1, 1] (−1 being the most negative and +1 the most positive). If necessary, the thresholds proposed by the researchers [49] can be used to interpret the compound score as a categorical sentiment value:

$$\mathrm{Sentiment}(compound) = \begin{cases} \text{positive}, & \text{if } compound > 0.05 \\ \text{neutral}, & \text{if } -0.05 \le compound \le 0.05 \\ \text{negative}, & \text{if } compound < -0.05 \end{cases}$$

In addition to the compound score, VADER outputs the ratios of positive, neutral, and negative terms in the text [49]. According to Hutto et al. [51], VADER is a rule-based model with its own gold-standard lexicon, in which each word holds a valence score, a decimal number in [−4, 4], signifying the intensity of the term. VADER analyzes the polarity of each word in the text when calculating the compound sentiment [51]. Words like "fine" and "good" have intensity values of 0.9 and 3.2, respectively, while terms like "discomforting" and "worst" have intensity values of −1.6 and −3.4, respectively [49]. VADER is especially successful in social media environments since the gold-standard lexicon was established by analyzing random tweets on Twitter [51]. As a result, the lexicon contains modern Internet colloquialisms such as initialisms (LOL, WTF), emoticons (:(, ;-)), and emojis [51].
In addition to the gold-standard lexicon, VADER incorporates five grammar and syntax heuristics that enable a more extensive analysis of the sentiment [51]. The five rules are:
1. Punctuation. Adding punctuation, particularly exclamation marks, may raise the intensity of a phrase; e.g., “Buy Ethereum!!!” is more intense than “Buy Ethereum” [51].
2. Capitalization. CAPITALIZED words may intensify a statement; e.g., “Ethereum OUTLAWED in China” is more intense than “Ethereum outlawed in China” [51].
3. Degree modifiers. Different degree modifiers may raise or lower the intensity of a statement; e.g., “This is highly excellent for Ethereum” is more intense than “This is marginally good for Ethereum” [51].
4. Contrastive “but”. The sentence “I love to pay with Ethereum, but regrettably it hasn’t achieved a broader acceptance.” displays a change in sentiment following the word “but” [51].
5. Trigram analysis. According to the researchers, examining words in groups of three can catch approximately 90% of the long-range negations that flip the meaning of a phrase, e.g., “The future of Ethereum isn’t really evident” [51].
Unlike machine learning-based sentiment analysis systems, VADER does not need any training data, as the lexicon is already present [51]. Additionally, being fully rule-based, VADER can swiftly handle vast volumes of data. When using VADER through the Natural Language Toolkit (NLTK) module in Python, it is feasible to adapt the lexicon by adding additional words and polarity values [50].
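A minimal usage sketch of VADER via NLTK's SentimentIntensityAnalyzer (NLTK's standard interface; the example sentence is invented), including the threshold rule given above:

```python
# Minimal VADER usage sketch through NLTK; the lexicon download is a
# one-time setup step, and the example sentence is illustrative.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")          # fetch the gold-standard lexicon
sia = SentimentIntensityAnalyzer()
scores = sia.polarity_scores("Buy Ethereum!!!")
print(scores)   # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}

# Classify the compound score using the thresholds above.
c = scores["compound"]
label = "positive" if c > 0.05 else "negative" if c < -0.05 else "neutral"
print(label)
```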

2.3.2 TextBlob
TextBlob is a Python package for text analysis. It offers a straightforward application programming interface for popular natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and others (Loria, 2020). This section makes use of the library's built-in dictionary. TextBlob measures polarity and subjectivity in a text using WordNet, a lexical database of semantic relationships among words. The polarity score is a float in [−1, 1]. The subjectivity score is a float in the range [0, 1], with 0.0 being very objective and 1.0 being extremely subjective. To demonstrate, consider the word "calm," which has a polarity score of 0.3 and a subjectivity score of 0.75. The polarity score means that the text has a somewhat positive sentiment. According to the subjectivity score, this word conveys more personal views, feelings, or judgment than actual facts. TextBlob assigns these scores by iterating through the textual data, looking for phrases and words to which subjectivity and polarity can be assigned. When it has finished looping, it returns the average of all the scores found in the text. TextBlob's sentiment component includes two implementations of sentiment classification: the above-mentioned pattern analyzer, which is the default choice, and the Naive Bayes analyzer. The Naive Bayes analyzer was trained on movie reviews: by correlating the review text with the numeric rating, the analyzer learned which phrases and contexts to judge as favorable or unfavorable. Overall, movie reviews are more subjective than the news pieces in our database. As a consequence, the standard pattern analyzer was employed. The WordNet dictionary serves as the foundation for the TextBlob dictionary, which contains 2,917 scored words. Some words, such as "excellent," receive several scores because they have several senses; the average over all senses is used to calculate the final score.
It is also worth noting that TextBlob considers negations and degree amplifiers. Negation is handled by multiplying the polarity by −0.5: if something is described as "not difficult," the polarity score of "difficult" is multiplied by −0.5. When a negation is introduced, the subjectivity is left unchanged. TextBlob handles amplifiers by imbuing strengthening words like "very" with an intensity score; the intensity score of "very" is 1.3. "Very challenging" is scored by multiplying both the polarity and subjectivity scores of "challenging" by 1.3. Both polarity and subjectivity always remain within their bounds. Negation paired with modifiers is an intriguing edge case: the inverse intensity of the modifier is applied to both polarity and subjectivity, in addition to the multiplication by −0.5 for the polarity (Loria, 2018).

2.3.3 Flair
Flair is a cutting-edge NLP framework built on PyTorch. It incorporates recent research and makes it simple to apply different word embeddings to various NLP tasks. Its pre-trained sentiment model provides a tool for sentiment analysis that does not require custom training. Unlike TextBlob and VADER, which produce a sentiment score ranging from −1 to 1, Flair's sentiment model produces a predicted label together with a confidence level. The confidence level runs from 0 to 1, with 1 representing extremely confident and 0 extremely unsure.
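A minimal usage sketch of Flair's pre-trained sentiment model (the example sentence is invented; loading 'en-sentiment' downloads the model on first use):

```python
# Minimal Flair usage sketch with the pre-trained English sentiment model.
from flair.data import Sentence
from flair.models import TextClassifier

classifier = TextClassifier.load("en-sentiment")
sentence = Sentence("Bitcoin reached an all-time high today")
classifier.predict(sentence)
# Shows the predicted label (POSITIVE/NEGATIVE) with a confidence in [0, 1].
print(sentence.labels)
```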

2.4 Related Works


Bitcoin is the first and most widely used cryptocurrency. It is a blockchain-based ledger system built on cryptography and community technology. Many numerical methods have been developed in the world of financial science to anticipate Bitcoin's future value. Such models may provide investing recommendations for quantitative traders.
Bitcoin price projections, like those of other assets such as stocks and minerals, are a series of ongoing predictions, since Bitcoin values fluctuate over time. One notable difference between Bitcoin and a stock is that stocks trade only at specific hours on weekdays, whereas the Bitcoin market operates around the clock, and traders may purchase or sell bitcoins at any moment, which may also lead to price movements at inopportune times. According to research by the American Institute for Economic Research (AIER), internationally significant news and sentiment may trigger huge changes in the price of cryptocurrency [2]. Several studies employ sentiment analysis based on tweets to estimate Bitcoin prices [1, 3]. It is useful to investigate people's reactions to Bitcoin from tweets, as Twitter is an exceptionally rich collection of data about how individuals feel about a specific issue. Prior techniques of opinion analysis based on cryptocurrency comments can be split into two categories: dictionary-based techniques, such as the valence aware dictionary and sentiment reasoner (VADER) [4], and machine learning-based strategies. McNally et al. [5] introduced two predictive models relying on recurrent neural networks (RNNs) and long short-term memory (LSTM) and evaluated them against an autoregressive integrated moving average (ARIMA) model [6], a time series forecasting model that has been frequently used in the past. They created classifier models using Bitcoin price data to forecast whether the future Bitcoin price would rise or fall given recent values. In the study [5], the RNN and LSTM models were shown to be superior to the ARIMA model. In addition to price data, Saad and Mohaisen [7] examined Bitcoin blockchain data, including the number of Bitcoin wallets and distinct addresses, transaction mining complexity, hashing rate, and so on, and built predictive models based on the factors that are substantially correlated with the value of bitcoin. They evaluated numerous prediction models, including linear regression, random forests [8], gradient boosting [9], and neural networks. Jang et al. [10] introduced a sliding-window LSTM model and demonstrated that it beat regression analysis, SVR, neural networks, and standard LSTM prediction models. Likewise, Shintate and Pichl [11] designed and demonstrated a deep learning-based random sampling approach that outperforms LSTM-based models. A more distinctive technique predicts trade volumes and individual stock movements using social signals and mood analytics [12]. Sentiment in social media, notably on Twitter, can be utilized to predict stock index fluctuations [16]. While there is little indication that forecasts based on sentiment provide large returns when trading stocks [13], one study was able to develop a trading strategy using social media opinion for cryptocurrency [15]. Another study expanded the amount of information on alternative cryptocurrencies and suggested a way to anticipate price swings in Bitcoin, Ethereum, and Ripple utilizing sentiment analytics [14].
