Figure 1. Development of the number of publications and citations for individual years from (Zuzana, 2021)
deep neural networks to design an indicator based on investor sentiment. Their predictions using this technique outperformed other widely recognized predictors, showing that AI can be very effective in estimating sentiment.
Both long-term and short-term decisions have been a point of interest for researchers in this field, but Millner & Heyen (Millner & Heyen, 2021) state that short-term decision-making systems are of higher priority, as they can be more useful for managers who need to adjust their decisions effectively. Millner & Heyen also point to the more difficult nature of long-term predictions: they require new scientific approaches that reduce model misspecification, whereas short-term predictions can often be substantially improved simply by reducing measurement errors in the initial conditions, in other words by increasing the quality of the observations. Petrelli and Cesarini (Petrelli & Cesarini, 2021) specifically recommend the use of AI in high-frequency trading systems.
Many researchers suggest that all businesses, especially those in competitive relationships with each other, will have to adopt AI approaches, because failure to deploy these methods can endanger their business when competing with firms that do use AI (Milana & Ashta, 2021). It is worth mentioning that AI application in this field goes beyond data analytics, as it can provide a feedback loop and allow itself to
Figure 2. Keyword occurrence analysis from (Zuzana, 2021)
Sell or Buy Amount = Maximum Buying Amount × Action Value

If the agent tries to buy or sell more than its available balance, the environment executes the action at the maximum possible amount. To facilitate the learning process, we add a penalty, denoted by "ξ", to the agent's reward signal each time it exceeds its available balance.
We define a "step" as each time the agent makes an observation, takes an action, and produces a new state. At each step of the running algorithm we calculate the difference between "the asset value plus the account balance value" (the gross value of the account) before and after the step, and multiply that difference by a reward scalar to normalize it. This scalar factor is 10⁻⁴ for our cryptocurrency pairs. The resulting value makes up the main part of the reward signal and is summed with the previously introduced penalty ξ to form the final reward signal at each step; a sketch of this computation is given below.

total_timesteps (The total number of samples to train on): 200000

The hyperparameters involved in the Soft Actor Critic algorithm are as follows:

gamma (Discount factor): 0.99
learning_rate: 0.01
buffer_size (Size of the replay buffer): 1000
batch_size (Minibatch size for each gradient update): 1000
ent_coef (Entropy regularization coefficient): auto (using 0.1 as the initial value)
learning_starts (Number of steps of the model to collect transitions for before learning starts): 200
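The following is a minimal Python sketch of the action-clipping and reward logic described above. It is not the authors' implementation: the function and variable names, the assumption that the action value lies in [−1, 1], the penalty magnitude, and the sign convention for ξ are all illustrative choices, since only the overall scheme is given in the text.

```python
# Sketch of the step/reward logic described above (illustrative names and values).
REWARD_SCALING = 1e-4        # reward scalar used for the cryptocurrency pairs
BALANCE_PENALTY = 0.1        # assumed penalty magnitude xi (not specified in the text)

def apply_action(action_value, max_buy_amount, balance, asset_holding, price):
    """Convert the agent's action in [-1, 1] into a (possibly clipped) trade."""
    desired = max_buy_amount * action_value        # Sell-or-Buy Amount
    penalty = 0.0
    if desired > 0:                                # buying
        affordable = balance / price
        if desired > affordable:                   # exceeds available cash balance
            desired, penalty = affordable, BALANCE_PENALTY
        balance -= desired * price
        asset_holding += desired
    else:                                          # selling (desired <= 0)
        if -desired > asset_holding:               # exceeds available asset balance
            desired, penalty = -asset_holding, BALANCE_PENALTY
        balance += -desired * price
        asset_holding += desired
    return balance, asset_holding, penalty

def step_reward(gross_before, gross_after, penalty):
    """Scaled change in the gross account value, reduced by the balance-violation penalty."""
    return (gross_after - gross_before) * REWARD_SCALING - penalty
```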
Figure 5. Overall Structure of The Proposed Expert System for DRL Methods.
Green lines indicate the train phase and red lines indicate the exertion phase
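The hyperparameter names listed above match the Soft Actor Critic implementation in Stable-Baselines3, although the text does not explicitly name the library. The snippet below is therefore only a sketch of how those values could be wired into such an implementation, with `trading_env` standing for the trading environment described earlier.

```python
from stable_baselines3 import SAC

# `trading_env` is assumed to be a Gym-style environment implementing the
# observation/action/reward logic sketched above.
model = SAC(
    "MlpPolicy",
    trading_env,
    gamma=0.99,            # discount factor
    learning_rate=0.01,
    buffer_size=1000,      # size of the replay buffer
    batch_size=1000,       # minibatch size for each gradient update
    ent_coef="auto_0.1",   # automatic entropy tuning, initial coefficient 0.1
    learning_starts=200,   # transitions collected before learning starts
    verbose=1,
)
model.learn(total_timesteps=200_000)
```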
A. ETH-USDT: