BACHELOR’S DISSERTATION
REPORT 1
Cryptocurrencies Price Prediction
Lecturer in charge:
Dr. Mohammad Javad Faraji
Contents
1 Introduction
2 Machine Learning Technology
3 CryptoPredictions
4 Models
4.1 Random Forest
4.2 LSTM
4.3 GRU
4.4 Orbit
4.5 ARIMA
4.6 SARIMAX
4.7 Prophet
4.8 XGBoost
5 Model Performance
5.1 Cross Validation
5.2 Metrics
5.2.1 Mean Absolute Error (MAE)
5.2.2 Mean Squared Error (MSE)
5.2.3 Root Mean Squared Error (RMSE)
5.2.4 Mean Absolute Percentage Error (MAPE)
5.2.5 Symmetric Mean Absolute Percentage Error (SMAPE)
5.2.6 Mean Absolute Scaled Error (MASE)
5.2.7 Mean Squared Logarithmic Error (MSLE)
6 Results
6.1 Accuracy Score & F1-Score
6.2 Recall Score & Precision Score
6.3 MAPE, SMAPE, MASE, and MSLE
6.4 Results in Bitcoin
6.5 Deduction
7 Conclusion
8 References
1 Introduction
Cryptocurrency is a form of digital currency that regulates the generation of currency units and verifies the transfer of funds using encryption techniques. Notably, cryptocurrencies are not governed by a central authority and operate on a decentralized structure. Since the launch of Bitcoin in 2009, cryptocurrencies have
revolutionized the way people transfer money. The concept was first proposed in 1998 by the computer scientist Wei Dai, who developed a cryptography-based system that could be used to ease payments between parties. This system, called "b-money," laid the groundwork for future cryptocurrencies.
2 Machine Learning Technology
Machine Learning is a powerful and effective choice for trading strategies [4]. Its ability to uncover hidden data relationships that may elude human observation makes it invaluable for predicting numeric outputs like price or volume and identifying categorical outputs such as trends. By providing the model with heuristic input data, traders can leverage a wide array of machine learning models to gain insights and make informed trading decisions.
Several machine learning models have proven successful in trading. Regression models, including linear regression [5] and support vector regression [6], offer accurate price movement estimation based on historical data. Classification models like decision trees [7] and random forests [8] excel at identifying market trends and making categorical predictions. Neural networks, such as deep learning models [9], are highly adept at capturing complex patterns in financial data.
Extensive research has demonstrated the efficacy of machine learning in trading, with studies showing superior performance compared to traditional strategies and higher returns [10] [11]. Furthermore, machine learning techniques have been employed to analyze alternative data sources like social media sentiment [12] and news articles [13] to gain a competitive edge in the market.
Machine learning provides traders with a diverse set of models and techniques
that enhance trading strategies. As technology continues to advance and more
data becomes available, the role of machine learning in the financial markets is
expected to grow significantly.
3 CryptoPredictions
In order to provide community with a platform in which different models and
cryptocurrencies are available, we have designed a library named CryptoPredic-
tions. Previous cryptocurrencies price forecasting papers used different metrics
and dataset settings, which caused ambiguities and interpretation problems. To
reduce those differences, we created a CryptoPredictions (a library with 8 models,
30 indicators, and 10 metrics).
2. Before the advent of our library, users had to run different codebases for different models, making it difficult to compare them fairly. Fortunately, CryptoPredictions has made it possible to conduct a unified and equitable evaluation of different models.
3. With Hydra, users can easily structure and understand the arguments, which makes it far easier to run the code under different settings and check the results.
4. While some models may perform exceptionally well in terms of accuracy, they often require a well-defined strategy for successful trading. Our backtester can help users determine the effectiveness of a given model in real-world scenarios.
4 Models
This section describes the different models that are used in the library.
4.1 Random Forest
In general, if the variables chosen at each node are highly connected, small values of m result in favorable outcomes; at each node, m observations are used for training and the remaining M − m observations are used for testing. Globally, given a training set D of size n, m new training sets $D_i$, $i = 1, \dots, m$, of size $n'$ are created by sampling with replacement. Each decision tree is then trained on its data set $D_i$. To make a forecast for a new observation, it is passed through the nodes of each tree, as shown in Figure 1.
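To make the bagging procedure concrete, the following is a minimal sketch in Python with scikit-learn; the synthetic price series, the lag count, and the hyperparameters are illustrative assumptions, not the settings used in CryptoPredictions.

```python
# Bagging sketch: each tree trains on a bootstrap sample of lagged prices
# and forecasts are averaged over all trees.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(size=500)) + 100.0  # synthetic price series

# Build lagged features: predict the next price from the last 5 prices.
lags = 5
X = np.column_stack([prices[i:len(prices) - lags + i] for i in range(lags)])
y = prices[lags:]

# Each of the 100 trees sees a bootstrap sample (drawn with replacement);
# max_features controls how many variables are tried at each split.
model = RandomForestRegressor(n_estimators=100, max_features="sqrt", random_state=0)
model.fit(X[:-50], y[:-50])
preds = model.predict(X[-50:])  # averaged over all trees
```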
4.2 LSTM
LSTM (Long Short-Term Memory) is another form of RNN module. Hochreiter and Schmidhuber (1997) [19] created the LSTM, which was later developed and popularized by several researchers. Like an RNN, the LSTM network consists of modules with recurrent connectivity; the distinction lies in the connectivity between the hidden layers. The expanded structure of the RNN is depicted in Figure 2. The only difference between the RNN and the LSTM is the memory cell in the structure's hidden layer, whose design of three distinct gates efficiently resolves the gradient issues. Figure 3 depicts the LSTM memory structure of the hidden layer [20].
Figure 2 illustrates a deficiency of the RNN that can be observed on the input side. This problem was identified by Bengio et al. (1994) [21]. When there is a very large gap between the early inputs $x_0, x_1$ and the later steps $x_t, x_{t+1}$, the hidden state $h_{t+1}$ that requires information relevant to $x_0, x_1$ cannot learn to link it, because the old memory that is saved becomes increasingly useless over time as it is overwritten or replaced by new memory.
As shown in Figure 3, the LSTM's special units (recurrent hidden layers) contain memory blocks. In addition to memory cells with self-connections that store the network's temporal state, the memory blocks contain multiplicative units called gates that regulate the flow of information. In the original architecture, each memory block comprised an input gate and an output gate. The input gate controls the flow of input activations into the memory cell, and the output gate regulates the flow of cell activations from the cell to the remainder of the network. The forget gate was added to the memory block later [22].
Figure 2: The Expanded Structure of RNN [18]
This addressed a shortcoming of LSTM models that prevented them from processing continuous input streams that were not divided into subsequences. The forget gate scales the internal state of the cell before adding it as input to the cell via its self-recurrent link, thereby forgetting or resetting the cell's memory in an adaptive manner. In addition, the contemporary LSTM architecture includes peephole connections from its internal cells to the gates in the same cell in order to learn precise output timing [23].
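As a concrete illustration of the gated architecture described above, here is a minimal PyTorch sketch of an LSTM price model; the single-feature input, the 24-step window, and the layer sizes are assumptions for illustration, not the library's configuration.

```python
# One-layer LSTM regressor: nn.LSTM implements the input, forget, and
# output gates described above; a linear head maps the last hidden
# state to a one-step-ahead price prediction.
import torch
import torch.nn as nn

class PriceLSTM(nn.Module):
    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden)
        return self.head(out[:, -1])   # predict from the last time step

model = PriceLSTM()
batch = torch.randn(8, 24, 1)          # 8 windows of 24 hourly prices
next_price = model(batch)              # shape: (8, 1)
```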
4.3 GRU
A gated recurrent unit (GRU) was presented [24] to enable each recurrent unit
to capture adaptive dependencies on several time scales. Similar to the LSTM
unit, the GRU possesses gating units that influence the flow of information inside
the unit, but without distinct memory cells.
The architecture of the Gated Recurrent Unit: in Figure 4, we have a GRU cell that is comparable to an LSTM cell or an RNN cell.
Reset Gate (short-term memory)
$$r_t = \sigma(x_t U_r + H_{t-1} W_r)$$
This resembles the LSTM gate equation: the sigmoid function limits $r_t$ to $(0, 1)$, with weight matrices $U_r$ and $W_r$.
Update Gate (long-term memory)
Similarly, we have an update gate for long-term memory, with the equation shown below.
$$u_t = \sigma(x_t U_u + H_{t-1} W_u)$$
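The two gate equations can be written out directly; the following NumPy sketch of one GRU step assumes a particular candidate-state formulation and blending convention (papers differ on which term $u_t$ weights), so treat it as one common variant rather than the definitive form.

```python
# One GRU step built from the reset and update gate equations above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, U_r, W_r, U_u, W_u, U_h, W_h):
    r_t = sigmoid(x_t @ U_r + h_prev @ W_r)   # reset gate, squashed to (0, 1)
    u_t = sigmoid(x_t @ U_u + h_prev @ W_u)   # update gate, squashed to (0, 1)
    # Candidate state: the reset gate decides how much past state to use.
    h_tilde = np.tanh(x_t @ U_h + (r_t * h_prev) @ W_h)
    # Update gate blends old memory with the candidate state.
    return u_t * h_prev + (1.0 - u_t) * h_tilde
```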
4.4 Orbit
Uber's Orbit is an open-source package designed to ease time series inferences and forecasts using structural Bayesian time series models for real-world applications and scientific study [26]. It employs probabilistic programming languages such as Stan [27] and Pyro [28] while providing a familiar and intuitive initialize-fit-predict interface for time series workloads.
It introduces a collection of refined Bayesian exponential smoothing models with a wide range of priors, model type specifications, and noise distribution options. The model includes a novel global trend term that is effective for short-term time series. Most significantly, it includes a well-crafted Python package named Orbit (Object-oriented Bayesian Time Series). The underlying MCMC sampling and optimization are handled by the probabilistic programming languages Stan (Carpenter et al., 2017) and Pyro (Bingham et al., 2019). Pyro, created by Uber researchers, is a universal probabilistic programming language (PPL) written in Python and supported on the backend by PyTorch and JAX. Orbit presently supports a subset of the available prediction and sampling algorithms for Pyro estimation.
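A typical use of the initialize-fit-predict interface looks like the sketch below; the DLT model choice, the column names, and the hourly seasonality of 24 are assumptions, and the exact arguments should be checked against the installed Orbit version.

```python
# Initialize-fit-predict sketch with Orbit's DLT model, assuming
# DataFrames `train_df` and `test_df` with 'date' and 'close' columns.
from orbit.models import DLT

dlt = DLT(response_col="close", date_col="date",
          seasonality=24,          # hourly data with a daily cycle (assumption)
          estimator="stan-mcmc")   # Stan handles the MCMC sampling
dlt.fit(df=train_df)
predicted_df = dlt.predict(df=test_df)  # point forecast plus intervals
```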
4.5 ARIMA
The Autoregressive Integrated Moving Average (ARIMA) method was developed in 1970 by George Box and Gwilym Jenkins and is also known as the Box-Jenkins method [29]. The ARIMA method ignores independent variables entirely when predicting, making it suited to statistically interdependent (dependent) data, and it relies on properties such as autocorrelation, trend, or seasonality. The ARIMA method can forecast from historical data even when the underlying influences are difficult to interpret, has a high degree of accuracy in short-term forecasting, and can deal with seasonal variations in the data.
The ARIMA family is classified into four categories: Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), and Autoregressive Integrated Moving Average (ARIMA) [30][31].
1. Autoregressive (AR)
It was introduced by Yule in 1926 and expanded by Walker in 1932. This model assumes that current data is influenced by data from prior periods. It is called autoregressive because the variable is regressed against its own prior values. The AR method is used to determine the order p, which represents a value's dependence on its nearest previous values [32].
The general form of an AR model of order p (AR(p)), i.e. ARIMA(p, 0, 0), is:
$$X_t = \mu + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + e_t$$
The common form of the combined AR and MA process, ARMA or ARIMA(p, 0, q), is:
$$X_t = \mu + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \dots - \theta_q e_{t-q}$$
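As a brief illustration, an ARIMA(p, 0, q) model can be fitted with statsmodels as sketched below; the order (2, 0, 1) and the `prices` series are illustrative assumptions, not tuned choices.

```python
# Fit an ARIMA model and forecast one day ahead of hourly data.
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(prices, order=(2, 0, 1))  # p=2 AR lags, d=0, q=1 MA lag
fitted = model.fit()
forecast = fitted.forecast(steps=24)    # predict the next 24 hours
```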
4.6 SARIMAX
ARMA is the combination of the AR and MA models. Adding an integration
operator to an ARMA model produces an ARIMA model. A SARIMAX model
incorporates exogenous variables assessed at time t that influence the value of
input data at time t and integer multipliers of seasonality [34]. The parameters
required to define the SARIMAX model are listed in Table.
Finally, the Seasonal Autoregressive Integrated Moving Average model with eXogenous variables, SARIMAX(p, d, q) × (P, D, Q)s, is stated as follows:
$$\Theta(L)^p \, \theta(L^s)^P \, \Delta^d \, \Delta_s^D \, y_t = \phi(L)^q \, \phi(L^s)^Q \, \Delta^d \, \Delta_s^D \, \epsilon_t + \sum_{i=1}^{n} \beta_i \, x_t^i$$
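A hedged sketch of fitting this model with statsmodels follows; the orders, the seasonal period s = 24, and the use of trading volume as the exogenous variable are illustrative assumptions.

```python
# SARIMAX with an exogenous regressor: `prices` is y_t, `volume` is x_t.
from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(prices,
                exog=volume,                   # exogenous variables x_t
                order=(1, 1, 1),               # (p, d, q)
                seasonal_order=(1, 1, 1, 24))  # (P, D, Q, s)
fitted = model.fit(disp=False)
# Future exogenous values must be supplied for the forecast horizon.
forecast = fitted.forecast(steps=24, exog=future_volume)
```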
4.7 Prophet
Prophet is a method for forecasting time series data based on an additive model in which non-linear trends are fitted with annual, weekly, and daily seasonality, in addition to holiday effects. It is most effective when applied to time series with substantial seasonal effects and multiple seasons of historical data. Prophet is robust to missing data and fluctuations in the trend, and it typically handles outliers well [35].
As long as Prophet accurately captures the conditional mean and conditional variance, it should function adequately. Mathematically, the model is the additive decomposition
$$y(t) = g(t) + s(t) + h(t) + \epsilon_t$$
where $g(t)$ is the trend, $s(t)$ the periodic seasonality, $h(t)$ the holiday effects, and $\epsilon_t$ the error term.
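A minimal sketch of the standard Prophet workflow follows; the input column names ('date', 'close') and the 24-hour horizon are hypothetical choices for illustration.

```python
# Prophet expects a DataFrame with 'ds' (timestamp) and 'y' (value).
from prophet import Prophet

df = train_df.rename(columns={"date": "ds", "close": "y"})
m = Prophet(daily_seasonality=True)  # hourly data, so daily cycles matter
m.fit(df)
future = m.make_future_dataframe(periods=24, freq="h")
forecast = m.predict(future)         # yhat plus uncertainty intervals
```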
4.8 XGBoost
Gradient boosted machine (GBM) trees are a supervised learning method that learns from labeled data without assuming a fixed model form. XGBoost is a popular gradient-boosting library. It supports GPU training, distributed computing, and parallelization. It is accurate, adaptable to many forms of data and situations, well documented, and extremely user-friendly.
XGBoost is an abbreviation for Extreme Gradient Boosting. It is a properly parallelized and optimized version of the gradient boosting technique. Parallelizing the entire boosting procedure drastically reduces training time.
Rather than training a single best model on the data (as conventional approaches do), ensemble methods of this kind train hundreds of models on various subsets of the training dataset and then combine their outputs, for instance by voting.
In many situations, XGBoost is superior to conventional gradient-boosting methods. The Python implementation provides access to a huge array of internal parameters that can be modified to improve precision and accuracy. Parallelization, regularization, non-linearity, cross-validation, and scalability are some of XGBoost's most essential characteristics.
The XGBoost algorithm works by iteratively estimating a function. To begin, we generate a sequence based on the function's gradients. The following equation models a specific type of gradient descent: $\frac{\partial F}{\partial x}(x_t)$ specifies the direction in which the function decreases, as it represents the loss function to minimize; $\epsilon_{x_t}$ corresponds to the learning rate in gradient descent and is the rate of change fitted to the loss function, and the resulting update is anticipated to replicate the loss's behavior adequately.
$$F_{x_{t+1}} = F_{x_t} + \epsilon_{x_t} \frac{\partial F}{\partial x}(x_t)$$
To iterate over the model and determine its optimal formulation, we must describe the entire formula as a sequence and identify a function that converges to the function's minimum. This function serves as an error metric that helps us minimize losses and sustain performance over time; the series approaches the minimum value of the function. The following notation denotes the error function used when evaluating a gradient boosting regressor [36].
$$f(x, \theta) = \sum_i l(F(X_i, \theta), y_i)$$
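A short sketch of fitting an XGBoost regressor on lagged price features follows; `X_train`, `y_train`, and the hyperparameters are illustrative assumptions, not tuned values.

```python
# Gradient-boosted regression with XGBoost's scikit-learn wrapper.
import xgboost as xgb

model = xgb.XGBRegressor(
    n_estimators=300,     # number of boosting rounds
    learning_rate=0.05,   # the epsilon step size in the update above
    max_depth=4,          # regularizes each tree
    n_jobs=-1,            # parallelized tree construction
)
model.fit(X_train, y_train)
preds = model.predict(X_test)
```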
5 Model Performance
In this section, we present a common validation method for adjusting hyperparameters, together with the different metrics used to validate the predictions.
5.1 Cross Validation
Figure 6
Figure 7
Method: after setting the number of splits, the training dataset is split into equal subsets; for instance, there are six equal subsets in the figure above. In the $k$-th iteration of the outer loop, the first $k$ subsets are used as the training set and the $(k+1)$-th subset is used as the validation set.
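This expanding-window scheme can be reproduced with scikit-learn's TimeSeriesSplit, as in the sketch below; the estimator `model` and the arrays `X`, `y` are placeholders for any scikit-learn style model and feature matrix.

```python
# Expanding-window cross validation: 5 splits partition the data into
# 6 equal subsets, matching the figure above.
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for k, (train_idx, val_idx) in enumerate(tscv.split(X), start=1):
    # Iteration k: the first k subsets train the model,
    # the (k+1)-th subset validates it.
    model.fit(X[train_idx], y[train_idx])
    print(f"fold {k}: score = {model.score(X[val_idx], y[val_idx]):.3f}")
```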
Figure 8
5.2 Metrics
After obtaining the final predictions of the model, validation is usually carried out by calculating the metrics below. RMSE combines the best of the MSE and MAE worlds, but since the error is squared it can still be less interpretable; furthermore, it is scale-dependent.
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(A_i - F_i)^2}$$
$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left(100 \times \frac{|A_i - F_i|}{A_i}\right)$$
$$SMAPE = \frac{1}{n}\sum_{i=1}^{n}\left(200 \times \frac{|A_i - F_i|}{|A_i| + |F_i|}\right)$$
In the case of no seasonality:
$$MASE = \frac{\frac{1}{n}\sum_{i=1}^{n}|A_i - F_i|}{\frac{1}{T-1}\sum_{t=2}^{T}|A_t - A_{t-1}|} = \frac{MAE}{\frac{1}{T-1}\sum_{t=2}^{T}|A_t - A_{t-1}|}$$
In the case of seasonality with period m:
$$MASE = \frac{\frac{1}{n}\sum_{i=1}^{n}|A_i - F_i|}{\frac{1}{T-m}\sum_{t=m+1}^{T}|A_t - A_{t-m}|} = \frac{MAE}{\frac{1}{T-m}\sum_{t=m+1}^{T}|A_t - A_{t-m}|}$$
This is the mean absolute scaled error for both seasonal and non-seasonal time series, and it is probably the best and fairest metric to use, since it compares the output to the naive forecast.
Naive forecasts are the most cost-effective forecasting model, and provide
a benchmark against which more sophisticated models can be compared. This
forecasting method is only suitable for time series data. Using the naive approach,
forecasts are produced that are equal to the last observed value. This method
works quite well for economic and financial time series, which often have patterns
that are difficult to reliably and accurately predict. If the time series is believed
to have seasonality, the seasonal naive approach may be more appropriate where
the forecasts are equal to the value from last season.
In time series notation: $\hat{y}_{T+h|T} = y_T$
In MASE, if the error is less than one, it can be concluded that the forecast is better than the averaged naive forecast; conversely, if it is greater than one, the forecast is worse than the averaged naive forecast. The advantages of this metric are scale independence and penalizing under- and over-forecasting equally.
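As a small worked sketch, MASE against the naive benchmark can be computed as follows; the function assumes 1-D NumPy arrays, with m the seasonal period and m = 1 recovering the non-seasonal case.

```python
# MASE: out-of-sample MAE scaled by the in-sample MAE of the naive
# forecast y_t = y_{t-m}.
import numpy as np

def mase(actual, forecast, train, m=1):
    mae = np.mean(np.abs(actual - forecast))
    naive_mae = np.mean(np.abs(train[m:] - train[:-m]))
    return mae / naive_mae  # < 1 means better than the naive benchmark
```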
6 Results
To provide a fair evaluation, we compare the different models across cryptocurrencies and metrics. The training dataset is identical for all models, covering '2022-11-13 13:30:00' to '2023-01-01 09:30:00'; the test dataset is likewise identical for all models and covers '2023-01-01 10:30:00' to '2023-02-16 10:30:00'. We used hourly data and report the results in the graphs below.
Figure 9
Figure 10
We observe the same pattern as in the former graph. Meanwhile, in these two metrics the results for ADA, AVAX, and AXS show lower scores than the others, which makes the prediction task harder.
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15
Figure 16
In the BTC case, we can see similar results. The details of the results are shown in Figures 14 to 26.
6.5 Deduction
With respect to the previous graphs, it can be concluded that despite the superb results of SARIMAX and ARIMA on some metrics, they have problems with predicting the price accurately. Meanwhile, Orbit works better in terms of MAPE, MAE, SMAPE, MASE, and MSLE. Lastly, Prophet not only shows a good performance in accuracy and F1-score but also demonstrates a stunning result in MAPE, MAE, SMAPE, and MSLE compared to other methods.
Figure 17
Figure 18
Figure 19
Figure 20
Figure 21
Figure 22
Figure 23
Figure 24
Figure 25
Figure 26
7 Conclusion
In summary, this report explored various aspects of cryptocurrency forecasting, machine learning models, and evaluation metrics. The introduction provided an overview of cryptocurrencies, their decentralized nature, and their significant impact on the financial landscape.
The section on machine learning technology highlighted the suitability of machine learning models for cryptocurrency trading strategies, emphasizing their ability to uncover hidden data relationships.
The CryptoPredictions library was introduced as a valuable platform for cryptocurrency price forecasting, designed to overcome challenges such as dataset scarcity and the need for a unified evaluation of different models. The library's features, including data collection, model evaluation, and indicator calculation, were outlined.
The models section covered several prominent models used for cryptocurrency forecasting, including Random Forest, LSTM, GRU, Orbit, ARIMA, SARIMAX, Prophet, and XGBoost. Each model was briefly described, showcasing its unique characteristics and applications.
Lastly, the discussion delved into model performance evaluation, highlighting the importance of cross-validation for hyperparameter tuning and model selection. Various metrics, such as MAPE, MAE, SMAPE, MASE, MSLE, accuracy, and F1-score, were identified as essential tools for assessing the accuracy and effectiveness of the forecasting models.
Overall, while different models showed varying levels of performance in terms of accuracy and metrics, it was observed that Orbit and particularly Prophet consistently demonstrated strong results across multiple evaluation criteria. These models exhibited the potential to provide accurate and reliable cryptocurrency price predictions.
It is worth noting that the field of cryptocurrency forecasting is dynamic and evolving, and further research and experimentation are necessary to continually improve prediction accuracy and adapt to changing market conditions. The CryptoPredictions library and the models discussed in this exploration provide valuable tools and insights for researchers, traders, and investors seeking to navigate the world of cryptocurrency with greater confidence and understanding.
8 References
[1] S. Nakamoto, ”Bitcoin: A Peer-to-Peer Electronic Cash System,” 2008.
[3] Makarov, I., & Schoar, A. (2020). Trading and arbitrage in cryptocurrency
markets. Journal of Financial Economics, 135(2), 293–319.
[4] McNally, Sean, et al. "Predicting the Price of Bitcoin Using Machine Learning." 2018 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2018, pp. 339–43.
[5] Geron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and
TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems.
O’Reilly Media.
[6] Drucker, H., Burges, C. J., Kaufman, L., Smola, A., Vapnik, V. (1997). Support vector regression machines. Advances in Neural Information Processing Systems, 9, 155-161.
[7] Breiman, L., Friedman, J., Stone, C. J., Olshen, R. A. (1984). Classification
and regression trees. CRC press.
[9] LeCun, Y., Bengio, Y., Hinton, G. (2015). Deep learning. Nature, 521(7553),
436-444.
[10] Zheng, H., Shi, J., Zhang, X., Li, F., Li, G. (2020). Deep reinforcement
learning for stock trading: From models to reality. IEEE Transactions on
Neural Networks and Learning Systems, 32(6), 2563-2575.
[11] Grootveld, M., Hallerbach, W. (2018). Machine learning for trading. The
Journal of Portfolio Management, 44(3), 113-125.
[12] Bollen, J., Mao, H., Zeng, X. (2011). Twitter mood predicts the stock
market. Journal of computational science, 2(1), 1-8.
[13] Ma, J., Gao, W., Fan, Y. (2020). News-driven stock market prediction
using multi-scale deep neural networks. Expert Systems with Applications,
150, 113274.
[17] Biau, G. (2012). Analysis of a random forests model. The Journal of Machine
Learning Research, 13(1), 1063–1095.
[20] F. Qian and X. Chen, “Stock Prediction Based on LSTM under Different
Stability,” in 2019 IEEE 4th International Conference on Cloud Computing
and Big Data Analysis (ICCCBDA), 2019, pp. 483–486.
[24] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, Yoshua Bengio. "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling"
[25] https://www.analyticsvidhya.com/blog/2021/03/introduction-to-gated-recurrent-unit-gru/
[26] Edwin Ng, Zhishi Wang, Huigang Chen, Steve Yang, and Slawek Smyl. 2021.
Orbit: Probabilistic Forecast with Exponential Smoothing. arXiv:2004.08492
[stat.CO]
[27] Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel Lee. "Stan: A Probabilistic Programming Language"
[28] Eli Bingham, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj
Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall,
Noah D. Goodman. “Pyro: Deep Universal Probabilistic Programming”
[35] https://facebook.github.io/prophet/
[36] Tianqi Chen, Carlos Guestrin, “XGBoost: A Scalable Tree Boosting System”