x′ = (x − mean(X)) / (max(X) − min(X))    (3)

• MAE(X, h) = (1/n) Σᵢ₌₁ⁿ |h(xᵢ) − yᵢ|
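For concreteness, a small NumPy sketch of the normalization in equation (3) and of the MAE measure; the function names and sample values below are illustrative only and are not taken from the paper's code:

    import numpy as np

    def normalize(x):
        # Equation (3): x' = (x - mean(X)) / (max(X) - min(X))
        return (x - x.mean()) / (x.max() - x.min())

    def mae(h_x, y):
        # Mean Absolute Error: average of |h(x_i) - y_i|
        return np.mean(np.abs(h_x - y))

    prices = np.array([6800.0, 7150.0, 6900.0, 7400.0])    # illustrative values
    print(normalize(prices))                                # spans a range of width 1 around 0
    print(mae(np.array([1.0, 2.0]), np.array([1.5, 1.5])))  # 0.5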
• Cell state (memory for next input): ct = (it ∗ C̃t) + (ft ∗ ct−1)

• Calculating new state: ht = ot ∗ tanh(ct)

As can be seen from the equations, each gate has its own set of weights. In the cell-state equation, the product of the input gate and the intermediate cell state is added to the product of the forget gate and the old cell state. The output of this operation is then used to calculate the new state. This advanced cell, with four interacting layers instead of the single tanh layer of a plain RNN, makes the LSTM well suited for sequence prediction.

4.6 Hyperparameters

4.6.1 Optimizer

While Stochastic Gradient Descent is used in many neural network problems, it tends to converge to a local minimum, which of course is a problem for Bitcoin price prediction. Other good optimizers are variations of adaptive learning algorithms, such as Adam, Adagrad, and RMSProp. Adam was found to work slightly better than the rest, which is why we chose it. (All of these come packed with Keras.)

4.6.2 Loss function

The performance measure for regression problems will typically be either RMSE (Root Mean Square Error) or MAE (Mean Absolute Error).

4.6.4 Dropout Rate

Regularization is a technique for constraining the weights of the network. While simple neural networks use L1 and L2 regularization, in multi-layer networks dropout regularization takes its place. It randomly sets some input units to 0 in order to prevent overfitting. Hence, its value represents the fraction of disabled neurons in the preceding layer and ranges from 0 to 1. We tried 0.25 and 0.3 and finally decided on 0.3.

4.6.5 Number of Neurons in hidden layers

We opted for 10 neurons in the hidden layers; having more neurons is costly, as the training process lasts longer, and trying a larger number did not give improved results.

4.6.6 Epochs

Rather arbitrarily, we decided on 100 epochs, after trying other values such as 50 and 20. As with the number of hidden-layer neurons, the more epochs, the longer training takes to finish, since one epoch is a full iteration over the training data. It may also overfit the model.

4.6.7 Batch Size

We decided to feed the network batches of 120 samples (again, this number is a guess).
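To tie these choices together, a minimal Keras sketch consistent with the hyperparameters above and with the layer stack described in the next subsection might look as follows; the input_shape values (window length and number of features) are placeholders, since the actual input shape is discussed earlier in the paper, and the training arrays X_train and y_train are assumed to exist:

    from keras.models import Sequential
    from keras.layers import LSTM, Dense, Dropout, Activation

    window_size, n_features = 30, 1   # placeholder input shape, not the paper's values

    model = Sequential()
    model.add(LSTM(10, input_shape=(window_size, n_features)))  # 10 neurons in the hidden layer
    model.add(Dropout(0.3))                                     # dropout rate of 0.3
    model.add(Dense(1))                                         # fully connected output layer
    model.add(Activation('linear'))                             # linear activation for regression

    # Adam optimizer; MAE loss, matching the error measure reported in the results
    model.compile(optimizer='adam', loss='mae')

    # 100 epochs with a batch size of 120
    # model.fit(X_train, y_train, epochs=100, batch_size=120)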
4.6.8 Architecture of Network
We used the Sequential API of Keras, rather than the
functional one. The overall architecture is as follows:
• 1 LSTM Layer: The LSTM layer is the inner one, and all the gates mentioned at the very beginning are already implemented by Keras, with a default activation of hard-sigmoid [Keras2015]. The LSTM parameters are the number of neurons and the input shape, as discussed above.
• 1 Dropout Layer: Typically this is used before the Dense layer. In Keras, a dropout layer can be added after any hidden layer; in our case, it comes after the LSTM.
• 1 Dense Layer: This is the regular fully con-
nected layer.
• 1 Activation Layer: Because we are solving a regression problem, the last layer should give the linear combination of the activations of the previous layer with the weight vectors. Therefore, this activation is a linear one. Alternatively, it could be passed as a parameter to the previous Dense layer.

5 Results and Analysis

In this section we show the results of our LSTM model. It was noted during training that the higher the batch size (200) (Fig. 7, 8), the worse the prediction on the test set. Of course this is no surprise, since the more training, the more prone to overfitting the model becomes. While it is difficult to predict the price of Bitcoin, we see that features are critical to the algorithm. Future work includes trying out the Gated Recurrent Unit variant of the RNN, as well as tuning the existing hyper-parameters. Below we show the loss from the Mean Absolute Error function when using the model to predict the training and test data.

Figure 5: Error loss during training

The data we gathered for Bitcoin, even though it has been collected over years, might have become interesting, producing a historic interpretation, only in the last couple of years. Furthermore, a breakthrough evolution in peer-to-peer transactions is ongoing and is transforming the landscape of payment services. While it seems not all doubts have been settled, the time might be perfect to act. We think it is difficult to give a mature thought on Bitcoin for the future.

References

[Nakamoto2008] Satoshi Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System.

[BitcoinWiki2017] Bitcoin Wiki. Deep Learning with Python. https://en.bitcoin.it/wiki/Controlled_supply