Chapter 2
The background chapter provides the necessary material on all of the topics a reader
may need in order to follow this thesis' approach. It includes a brief review of the
key distinctions between the conventional and cryptocurrency marketplaces, an
introduction to Bitcoin, fundamentals of Machine Learning and Deep Learning,
fundamentals of Sentiment Analysis and its tools, and the sequential-data Deep
Neural Networks used in the thesis.
2.1 Cryptocurrency
A cryptocurrency is a digital asset meant to function as a means of exchange by
employing cryptography to encrypt transactions, prevent payment manipulation,
and codify the rules for issuing new units of money. Unlike traditional financial
systems, cryptocurrencies function on a decentralized peer-to-peer network,
with each transaction recorded in a public ledger known as the blockchain. This
allows network participants to transmit assets directly without the use of an
intermediary. A pair of public and private keys is used to transfer cryptocurrency
between parties. Bitcoin, which debuted in 2009, is the most popular
cryptocurrency and the first decentralized currency to operate on a blockchain.
Bitcoin is a virtual currency; there are no physical coins. New bitcoins are produced
by mining, a process that validates transactions using computational power.
Miners contribute processing power to solve challenging
mathematical/cryptographic puzzles and receive bitcoins in exchange when a
transaction is successfully validated. Proof-of-Work is the solution to such a
puzzle and functions as a signature indicating that the miner expended
considerable computational power. Every 10 minutes, a new block is mined and
appended to the blockchain. Because the Proof-of-Work algorithm requires that
new blocks be created in chronological sequence, it is extremely difficult to
reverse previously accepted transactions. The mining process also prevents
double spending, a problem that present financial institutions must contend with.
These qualities of mining also serve as a genuine security measure against
fraudulent transactions [18],[19].
Figure (2.1): The market capitalization peaked in January 2018, with a value above 800 billion dollars [20].
Figure (1.3): An example of how the learning process is performed in supervised, unsupervised, and reinforcement learning.
Equation (4.5) was written to add clarity to subsequent equations. The parameters are
placed in the first layer of the ANN, as shown by the superscript (1) and subscript j.
In addition, (4.3) gives the activation outputs of the neurons in the hidden layers.
We use the following calculation to estimate the activations of the neurons in
the output layer:
a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}
where k = 1, ..., K indexes the outputs of the next layer, in this instance the second,
and K is their total number. The sigmoid function is a widely acknowledged choice for
the nonlinear activation function h(·), although alternative functions are rapidly
gaining ground in the field:
h = \sigma(a) = \frac{1}{1 + \exp(-a)}
At this point, the outputs of the neurons are passed through an optional activation
function to yield the overall network's final output:

y_k(x, w) = \sigma\left( \sum_{j=1}^{M} w_{kj}^{(2)} \, h\left( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right) + w_{k0}^{(2)} \right)
The MLP network has at least three layers: the input layer, the hidden layer, and
the output layer. Beyond the input neurons, each subsequent layer is made
up of N neurons that process the weighted inputs from the previous layer before
progressively passing the signal to the next layer. The hidden layers are the
core computational body of neural networks and represent one of the network's
defining characteristics. When building a neural network, the number of neurons and
hidden layers must be chosen carefully. Underfitting can occur when too few
neurons or hidden layers are used, whereas overfitting occurs when there are too many
neurons or hidden layers in the structure. The multi-layer perceptron is shown in
Figure (4.4).
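As a minimal illustration, the following Python sketch (using NumPy, with randomly initialized toy weights purely for demonstration) computes the forward pass described by the equations above: hidden activations z_j = h(a_j) followed by output activations y_k = σ(a_k).

```python
import numpy as np

def sigmoid(a):
    # h = sigma(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP.

    x:  input vector of length D
    W1, b1: first-layer weights (M, D) and biases (M,)
    W2, b2: second-layer weights (K, M) and biases (K,)
    """
    z = sigmoid(W1 @ x + b1)  # hidden activations z_j = h(a_j^(1))
    a = W2 @ z + b2           # a_k = sum_j w_kj^(2) z_j + w_k0^(2)
    return sigmoid(a)         # y_k(x, w) = sigma(a_k)

# Toy usage: D = 3 inputs, M = 4 hidden neurons, K = 2 outputs.
rng = np.random.default_rng(0)
y = mlp_forward(rng.normal(size=3),
                rng.normal(size=(4, 3)), np.zeros(4),
                rng.normal(size=(2, 4)), np.zeros(2))
print(y)
```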
2.2.2.4 Learning
Thus far, neural networks have been presented as a category of nonlinear
functions with predictive potential. The artificial neuron, like the biological
neuron, must be trained in order to attain or identify the intended pattern. This is
a vital phase that necessitates a thorough adjustment of the parameters through
a number of training phases and approaches. In our training dataset (x1,
y1), ..., (xn, yn), xi is the input vector and yi is the output vector for all i = 1, ..., n.
Errors are common during the training of neural networks. The loss function is
used to quantify the discrepancy between the predicted value y and the label
tk. The error of each training sample is computed using this function.
Despite its rigor, the loss function only gives information for a single
training sample. To generalize this observation, a cost, also known as a cost
function, must be defined. It assesses how well the model performs over the
entire training set. The error function aggregates the previously described loss
function over the training set to produce the following:
E(w) = \frac{1}{2} \sum_{n=1}^{N} \lVert y(x_n, w) - t_n \rVert^2
The parameters are then updated by the gradient descent rule,

w^{(\tau + 1)} = w^{(\tau)} - \eta \nabla E(w^{(\tau)}),

which means that the parameter vector advances in the direction of the largest
decrease in the error function. τ identifies distinct iteration steps, and η > 0
denotes the step size or learning rate. It is a hyper-parameter that must be manually
tuned and plays the most important role in the learning process.
It influences how rapidly the neural network adapts to the target outputs in the
training dataset, i.e., how rapidly the parameters are updated and how quickly
gradient descent finds the cost function's optimum. If the hyper-parameter is set
too low, gradient descent will require an excessive amount of time to reach the
minimum. A high learning rate, on the other hand, might cause the error
function to diverge. As a general guideline, the learning rate is set to values on
the order of 0.01 to 0.001.
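The following is a minimal Python sketch of this update rule; the toy error function E(w) = ½‖w‖², whose gradient is simply w, and the function names are illustrative only.

```python
import numpy as np

def gradient_descent(grad_E, w0, eta=0.01, steps=1000):
    """Plain batch gradient descent: w^(tau+1) = w^(tau) - eta * grad E(w^(tau)).

    eta is the learning rate: too small -> very slow convergence,
    too large -> the error function may diverge.
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - eta * grad_E(w)
    return w

# Toy error E(w) = 0.5 * ||w||^2, whose gradient is simply w.
w_opt = gradient_descent(lambda w: w, w0=[3.0, -2.0], eta=0.1, steps=200)
print(w_opt)  # approaches the minimum at [0, 0]
```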
The hidden state of the RNN unit is updated at each timestep as

h^{(t)} = \tanh(W_{hx} x^{(t)} + W_{hh} h^{(t-1)} + b_h)

where t is the current timestep, W_{hx} and W_{hh} are matrices of length n and
width m, respectively, while b_h is a vector of length n representing the hidden-state
biases. Furthermore, the output of each RNN unit may be expressed as follows:
y^{(t)} = \mathrm{SoftMax}(W_{yh} h^{(t)} + b_y)
where y^{(t)} is the output of the unit at the present timestep t, W_{yh} is a matrix of
length n and width m, and b_y is a vector of length n representing the output
biases.
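A single timestep of such a unit can be sketched in Python as follows; the weight shapes and the helper name rnn_step are illustrative, not part of any particular library.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    return e / e.sum()

def rnn_step(x_t, h_prev, W_hx, W_hh, b_h, W_yh, b_y):
    """One vanilla RNN timestep:
    h_t = tanh(W_hx x_t + W_hh h_(t-1) + b_h)
    y_t = SoftMax(W_yh h_t + b_y)
    """
    h_t = np.tanh(W_hx @ x_t + W_hh @ h_prev + b_h)
    y_t = softmax(W_yh @ h_t + b_y)
    return h_t, y_t

# Toy usage: input size m = 3, hidden/output size n = 4.
rng = np.random.default_rng(0)
n, m = 4, 3
h, y = rnn_step(rng.normal(size=m), np.zeros(n),
                rng.normal(size=(n, m)), rng.normal(size=(n, n)), np.zeros(n),
                rng.normal(size=(n, n)), np.zeros(n))
print(h, y)
```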
It is worth noting that the design of the RNN block may be somewhat altered to
address a specific issue. The most typical modifications use activation functions
other than tanh and SoftMax. Researchers have been attempting to develop and
construct different types of RNNs due to their dynamic design [7, 48, 12, 47].
The following are the most prevalent variations:
Bidirectional Recurrent Neural Networks
The output of RNNs is directly influenced by the present and previous inputs,
which is termed memory. Yet, the next or future entries have no effect on the
present output. This can be an issue when the problem at hand necessitates such
relationships. As a result, researchers developed BRNNs [7].
LSTM
Loss of memory between timesteps is another problem of plain RNNs.
To address this issue, LSTMs maintain two memories: a long-term and a short-
term memory. The long-term memory stores information that the network
considers important to keep, whereas the short-term memory stores data
from recent timesteps. By using both memories, the network is able to
retain all essential data from prior states that affects the present output [12].
Figure 2 depicts the general design of LSTM units.
BiLSTM (Bidirectional Long Short-Term Memory):
BiLSTMs, like BRNNs, were proposed to address the same problem, but
using LSTM cells rather than RNN cells [48].
GRU (Gated Recurrent Unit):
Similarly to LSTMs, GRUs were proposed to address RNNs' vanishing
memory. Nevertheless, unlike the LSTM, the GRU consists of only two gates (a reset
gate and an update gate). Moreover, the GRU does not have a cell state. It rather
employs the previous hidden state (also known as working memory) as the
current cell state [47]. As a result, numerous studies have indicated that GRUs
perform better than LSTMs in dealing with long-term memories [38]. Moreover,
because GRUs are less computationally demanding than LSTMs, training and
prediction times tend to be much shorter [38].
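For concreteness, the following sketch shows how these variants are typically instantiated with PyTorch's built-in torch.nn.LSTM and torch.nn.GRU layers (the tensor sizes are arbitrary toy values); note how the LSTM returns a separate cell state while the GRU does not, and how the bidirectional flag doubles the output width.

```python
import torch
import torch.nn as nn

# Toy batch: 8 sequences, 20 timesteps, 5 features per timestep.
x = torch.randn(8, 20, 5)

lstm = nn.LSTM(input_size=5, hidden_size=16, batch_first=True)
gru = nn.GRU(input_size=5, hidden_size=16, batch_first=True)
bilstm = nn.LSTM(input_size=5, hidden_size=16, batch_first=True,
                 bidirectional=True)  # processes the sequence in both directions

out_lstm, (h_n, c_n) = lstm(x)  # LSTM returns hidden state AND cell state
out_gru, h_gru = gru(x)         # GRU has no separate cell state
out_bi, _ = bilstm(x)

print(out_lstm.shape, out_gru.shape, out_bi.shape)
# torch.Size([8, 20, 16]) torch.Size([8, 20, 16]) torch.Size([8, 20, 32])
```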
Mean Squared Error (MSE):

MSE = \frac{1}{n} \sum_{i=0}^{n} (y_i - \hat{y}_i)^2
The MSE is always non-negative, with a value of zero indicating a perfect
prediction. Even though MSE is often chosen, it does not handle outliers well,
since squaring the difference amplifies the effect of an outlier. To address this
constraint, other functions that use the absolute difference rather than the
squared difference are recommended.
Mean Absolute Error (MAE):
Unlike the MSE function, the MAE averages the absolute differences between
the prediction and the true output. When dealing with negative numbers, using
the absolute value rather than the square preserves the true difference between
the values, which may be useful for verifying the findings of a certain model.
The MAE is defined as follows:
MAE = \frac{1}{n} \sum_{i=0}^{n} \lvert y_i - \hat{y}_i \rvert
Root Mean Square Error (RMSE):
The Root Mean Square Error (RMSE) is the square root of the MSE. The purpose
of this cost function is to undo the squaring of the differences by taking a square
root, in order to generate an interpretable cost value that uses the same scale as
the predicted outcome values. This technique is commonly used to test and verify
regression models and to compare them with other methodologies. The RMSE is
defined as follows:
RMSE = \sqrt{\frac{1}{n} \sum_{i=0}^{n} (y_i - \hat{y}_i)^2}
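The three metrics can be computed in a few lines of Python with NumPy, as sketched below; the toy arrays are illustrative and chosen so that the single large error dominates the MSE far more than the MAE.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))  # same scale as the target values

y_true = np.array([1.0, 2.0, 3.0, 100.0])
y_pred = np.array([1.1, 1.9, 3.2, 80.0])
print(mse(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))
# The single outlier (100 vs 80) inflates the MSE far more than the MAE.
```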
\mathrm{Sentiment}(compound) =
\begin{cases}
\text{positive}, & \text{if } compound > 0.05 \\
\text{neutral}, & \text{if } -0.05 \le compound \le 0.05 \\
\text{negative}, & \text{if } compound < -0.05
\end{cases}
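A direct Python transcription of this classification rule might look as follows (the function name classify_compound is illustrative):

```python
def classify_compound(compound):
    """Map a VADER compound score to a sentiment label using the thresholds above."""
    if compound > 0.05:
        return "positive"
    if compound < -0.05:
        return "negative"
    return "neutral"

print(classify_compound(0.6), classify_compound(0.0), classify_compound(-0.3))
# positive neutral negative
```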
VADER outputs the ratios of positive, neutral, and negative terms in the text in
addition to the compound score [49]. According to Hutto et al. [51], VADER
is a rule-based paradigm with its own gold-standard lexicon, where each word
holds a valence score, a decimal number in [−4, 4], signifying the strength
of the term. VADER analyzes the polarity of each word within the text to
calculate the composite sentiment [51]. Words like "fine" and "good" have
intensity values of 0.9 and 3.2, respectively, while terms like "discomforting"
and "worst" have intensity values of −1.6 and −3.4, respectively [49]. VADER is
especially successful in social networking environments, since the gold-standard
lexicon was established by analyzing random tweets on Twitter [51]. As a
result, the lexicon contains modern Internet colloquialisms such as initialisms
(LOL, WTF), emoticons (:(, ;-)), and emojis [51].
In addition to using the gold-standard vocabulary, VADER incorporates five
grammar and syntax rules that enable a more extensive analysis of the
sentiment [51]. The five principles are:
1. Punctuation. Adding punctuation, particularly the exclamation mark, may
raise the emotion of a phrase, i.e., “Buy Ethereum!!!” is much more
powerful than “Buy Ethereum” [51].
2. Capitalization. CAPITALIZED words may enhance a statement, i.e.,
“ETHEREUM OUTLAWED IN CHINA” is much more powerful than “Ethereum
outlawed in China” [51].
3. Degree modifiers. The use of different degree modifiers may raise or
decrease the impact of a statement, i.e. “This is highly excellent for
Ethereum” is more intense than “This is marginally good for Ethereum”
[51].
4. Contrastive “but”. The line “I love to pay with Ethereum, but regrettably
it hasn’t achieved a broader acceptance.” displays a change in attitude
following the word but [51].
5. Trigram analysis. According to the researchers, examining words in
groups of three may discover approximately 90% of long-range negations
that flip the meaning of a phrase, i.e., “The future of Ethereum isn’t really
evident” [51].
Unlike machine learning-based sentiment analysis systems, VADER does not
need any training data, as the lexicon is already present [51]. Additionally,
being fully rule-based, VADER can swiftly handle vast volumes of data. When
using VADER through the Natural Language Toolkit (NLTK) module in Python, it
is feasible to adapt the lexicon by adding additional words and polarity values
[50].
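As a sketch, VADER can be used through NLTK as follows; the custom word "hodl" and its valence of 2.0 are purely illustrative additions.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # fetch the gold-standard lexicon once
sia = SentimentIntensityAnalyzer()

print(sia.polarity_scores("Buy Ethereum!!!"))
print(sia.polarity_scores("Buy Ethereum"))
# Each call returns the 'neg', 'neu', 'pos' ratios and the 'compound' score;
# the exclamation marks raise the compound score of the first phrase.

# The lexicon is a plain dict, so custom words and valences can be added:
sia.lexicon.update({"hodl": 2.0})  # illustrative word and valence only
```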
2.3.2 TextBlob
TextBlob is a Python package for text analysis. It offers a straightforward
application programming interface for popular natural language processing
(NLP) tasks such as part-of-speech tagging, noun phrase extraction,
Sentiment Analysis, classification, translation, and others (Loria, 2020). This
section makes use of the library's built-in dictionary. TextBlob measures
polarity and subjectivity in a text using WordNet, a lexical dataset of semantic
relationships among words. The polarity score is a float in [−1, 1]. The
subjectivity score is a float in the range [0, 1], with 0.0 being very objective
and 1.0 being extremely subjective. To demonstrate, consider the word "calm,"
which has a polarity score of 0.3 and a subjectivity score of 0.75. The polarity
score means that the text has a moderately positive sentiment. According to the
subjectivity score, this word conveys more personal opinion, emotion, or
judgment than actual facts. TextBlob assigns these scores by iterating through
the text, looking for phrases and words to which subjectivity and
polarity can be assigned. When it has finished looping, it returns the average of
all the scores found in the text. TextBlob's sentiment component
includes two implementations of sentiment classification: the
above-mentioned pattern analyzer, which is the default choice, and the Naive
Bayes analyzer. The Naive Bayes analyzer was trained on movie reviews. By
connecting the review text with the numeric rating, the analyzer learned which
phrases and contexts to judge favorable or unfavorable. Overall, movie reviews
are more subjective than the news pieces in the database. As a consequence,
the standard pattern analyzer was employed. The WordNet dictionary serves as
the foundation for the TextBlob dictionary. The dictionary has 2,917 scored
words. Some words, such as "excellent," get several scores because they have
many meanings; the average over all senses is used to calculate the final score.
Table 7 shows how TextBlob rates several terms.
It is also worth noting that TextBlob considers negation and degree modifiers.
Negation is handled by multiplying the polarity by −0.5. If
something is described as "not difficult," the polarity score of "difficult" is
multiplied by −0.5. When a negation is introduced, the subjectivity is left
unchanged. TextBlob handles amplifiers by giving strengthening words like
"very" an intensity score. The intensity score for "very" is 1.3. "Very
challenging" is calculated by multiplying both the polarity and subjectivity scores
by 1.3. It is vital to remember that both polarity and subjectivity remain within
their boundaries. Negation paired with modifiers is an intriguing edge case: the
inverse intensity of the modifier is applied to both polarity and subjectivity, in
addition to multiplying the polarity by −0.5 (Loria, 2018).
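A short sketch of the TextBlob API illustrates these effects; the exact scores depend on the library's lexicon, so only the general behavior is shown.

```python
from textblob import TextBlob

for text in ["good", "not good", "very good"]:
    # .sentiment is a namedtuple: Sentiment(polarity, subjectivity)
    print(text, TextBlob(text).sentiment)
# Negation scales the polarity of "good"; "very" multiplies both
# polarity and subjectivity by its intensity score.
```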
2.3.3 Flair
Flair is a cutting-edge NLP framework built on PyTorch. It incorporates recent
research and makes it simple to apply different word embeddings to various NLP
tasks. Its pre-trained sentiment model provides a tool for sentiment analysis that
does not require custom training. Unlike TextBlob and VADER, which produce
a sentiment score ranging from −1 to 1, Flair's sentiment model produces the
predicted label together with a confidence level. The confidence level runs from
0 to 1, with 1 representing extreme confidence and 0 extreme uncertainty.
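A minimal sketch of using Flair's pre-trained sentiment model follows; the example sentence is illustrative, and the model file is downloaded on first use.

```python
from flair.models import TextClassifier
from flair.data import Sentence

# Load the pre-trained English sentiment model (downloaded on first use).
classifier = TextClassifier.load("en-sentiment")

sentence = Sentence("I love to pay with Ethereum")
classifier.predict(sentence)
print(sentence.labels)  # e.g. a POSITIVE label with its confidence score
```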