mathematics

Article
Deep Learning Methods for Modeling Bitcoin Price
Prosper Lamothe-Fernández 1 , David Alaminos 2, * , Prosper Lamothe-López 3
and Manuel A. Fernández-Gámez 4
1 Department of Financing and Commercial Research, UDI of Financing, Calle Francisco Tomás y Valiente, 5,
Universidad Autónoma de Madrid, 28049 Madrid, Spain; [email protected]
2 Department of Economic Theory and Economic History, Campus El Ejido s/n, University of Malaga,
29071 Malaga, Spain; [email protected]
3 Rho Finanzas Partner, Calle de Zorrilla, 21, 28014 Madrid, Spain; [email protected]
4 Department of Finance and Accounting, Campus El Ejido s/n, University of Malaga, 29071 Malaga, Spain
* Correspondence: [email protected]
!"#!$%&'(!
Received: 25 June 2020; Accepted: 28 July 2020; Published: 30 July 2020 !"#$%&'

Abstract: A precise prediction of the Bitcoin price is an important aspect of digital financial markets
because it improves the valuation of an asset belonging to a decentralized control market. Numerous
studies have examined the accuracy of models built from sets of factors. The previous literature shows
that models for the prediction of Bitcoin suffer from poor performance and do not select the most
significant variables, so further progress on predictive models is needed. This paper presents a
comparison of deep learning methodologies for forecasting the Bitcoin price and, on that basis, a new
prediction model with the ability to estimate accurately. A sample of 29 initial factors was used,
which made it possible to apply explanatory factors covering different aspects of the formation of the
price of Bitcoin. Different methods were applied to the sample under study to achieve a robust model,
namely deep recurrent convolutional neural networks, which have shown the importance of transaction
costs and mining difficulty in the Bitcoin price, among other factors. Our results have a great potential
impact on the adequacy of asset pricing against the uncertainties derived from digital currencies,
providing tools that help to achieve stability in cryptocurrency markets. Our models offer high and
stable success results over a future prediction horizon, which is useful for the asset valuation of
cryptocurrencies like Bitcoin.

Keywords: bitcoin; deep learning; deep recurrent convolutional neural networks; forecasting;
asset pricing

1. Introduction
Bitcoin is a cryptocurrency built on free software and peer-to-peer networks as an irreversible
private payment platform. Bitcoin lacks a physical form and is not backed by any public body,
so no intervention by a government agency or other agent is necessary to transact [1].
These transactions are made through the blockchain system. Blockchain is an open accounting ledger
that records transactions between two parties efficiently, leaving a permanent mark that is
impossible to erase, which makes this tool a decentralized validation protocol that is difficult to
manipulate and carries a low risk of fraud. The blockchain system is not subject to any individual entity [2].
The concept of Bitcoin originated from that of the cryptocurrency, or virtual currency [3].
Cryptocurrencies are a monetary medium that is not affected by public regulation, nor subject to a
regulatory body; they are governed only by the activity and rules developed by their developers.
Cryptocurrencies are virtual currencies that can be created and stored only electronically [4].
The cryptocurrency is designed to serve as a medium of exchange and, for this, it uses cryptographic
systems to secure the transaction and control the subsequent creation of the cryptocurrency.
Cryptocurrency is a subset of
Mathematics 2020, 8, 1245; doi:10.3390/math8081245 www.mdpi.com/journal/mathematics



digital currency designed to function as a medium of exchange, in which cryptography is used to
secure the transactions and control the future creation of the cryptocurrency.
Forecasting the Bitcoin price is vitally important for both asset managers and independent investors.
Although Bitcoin is a currency, it cannot be studied like a traditional currency, for which economic
theories about uncovered interest rate parity, future cash-flow models, and purchasing power parity
matter, since the standard factors of the relationship between supply and demand cannot
be applied in a digital currency market like Bitcoin's [5]. Moreover, Bitcoin has
characteristics that make it useful to the agents who invest in it, such as transaction speed,
dissemination, decentralization, and the large virtual community of people interested in discussing and
providing relevant information about digital currencies, mainly Bitcoin [6].
Velankar and colleagues [7] attempted to predict the daily price change sign as accurately as
possible using Bayesian regression and a generalized linear model. To do this, they considered the daily
trends of the Bitcoin market and focused on the characteristics of Bitcoin transactions, reaching an
accuracy of 51% with the generalized linear model. McNally and co-workers [8] studied the precision
with which the direction of the Bitcoin price in United States Dollars (USD) can be predicted. They used
a recurrent neural network (RNN), a long short-term memory (LSTM) network, and the autoregressive
integrated moving average (ARIMA) method. The LSTM network obtained the highest classification
accuracy, at 52%, with a root mean square error (RMSE) of 8%. As expected, the non-linear deep learning
methods exceeded the ARIMA method's forecasts. For their part, Yogeshwaran and co-workers [9]
applied convolutional and recurrent neural networks to predict the price of Bitcoin using data from
a time interval of 5 min to 2 h, with convolutional neural networks showing a lower level of error,
at around 5%. Demir and colleagues [10] predicted the price of Bitcoin using methods such as long
short-term memory networks, naïve Bayes, and the nearest neighbor algorithm. These methods
achieved accuracy rates between 81.2% and 97.2%. Rizwan, Narejo, and Javed [11] continued the
application of deep learning methods with the RNN and LSTM techniques. Their results showed an
accuracy of 52% and an 8% RMSE for the LSTM. Linardatos and Kotsiantis [12] reported similar findings
after using eXtreme Gradient Boosting (XGBoost) and LSTM; they concluded that the latter technique
yielded the lower RMSE, at 0.999. Despite the superiority of computational techniques, Felizardo and
colleagues [13] showed that ARIMA had a lower error rate than methods such as random forest (RF),
support vector machine (SVM), LSTM, and WaveNets for predicting the future price of Bitcoin. Finally,
other works introduced new deep learning methods, such as Dutta, Kumar, and Basu [14], who applied
both LSTM and the gated recurrent unit (GRU) model; the latter showed the best error result, with an
RMSE of 0.019. Ji and co-workers [15] predicted the price of Bitcoin with different methodologies such
as deep neural networks (DNN), the LSTM model, and convolutional neural networks. They obtained a
precision of 60%, leaving the improvement of precision with deep learning techniques and a better
definition of significant variables as a future line of research. These authors show the need for stable
prediction models, not only with data in and out of the sample, but also in forecasts of future results.
To contribute to the robustness of Bitcoin price prediction models, the present study develops a
comparison of deep learning methodologies to predict and model the Bitcoin price and, as a consequence,
a new model that generates better forecasts of the Bitcoin price and its future behavior. This model
achieves accuracy levels above 95% and was constructed from a sample of 29 variables. Different methods
were applied in the construction of the Bitcoin price prediction model to build a reliable model,
contrasted with methodologies used in previous works to check which technique achieves a high
predictive capacity; specifically, deep recurrent convolutional neural networks, deep neural decision
trees, and deep support vector machines were used. Furthermore, this work attempts to obtain a model
that is not only highly accurate, but also robust and stable over a future horizon when predicting new
observations, something that has not yet been reported by previous works [7–15], but which some authors
demand for the development of these models and their real contribution [9,12].

We make two main contributions to the literature. First, we consider new explanatory variables for
modeling the Bitcoin price, testing the importance of variables that have not been considered
so far. This has important implications for investors, who will know which indicators provide reliable,
accurate, and potential forecasts of the Bitcoin price. Second, we improve the prediction accuracy
relative to that obtained in previous studies with innovative methodologies.
This study is structured as follows: Section 2 explains the theory of the methods applied. Section 3
offers details of the data and the variables used in this study. Section 4 develops the results obtained.
Section 5 provides the conclusions of the study and the purposes of the models obtained.

2. Deep Learning Methods


As previously stated, different deep learning methods have been applied for the development of the
Bitcoin price prediction models. We use this type of methodology because of the high predictive capacity
it has shown in the previous literature on asset pricing, in order to meet one of the objectives of this
study, which is to achieve a robust model. Specifically, deep recurrent convolutional neural networks,
deep neural decision trees, and deep learning linear support vector machines have been used.
The characteristics of each classification technique are detailed below, together with the method used
to analyze the sensitivity of the variables, in particular the method of Sobol [16], which is needed to
determine the significance level of the variables used in the prediction of the Bitcoin price, fulfilling
the need identified in the previous literature for the task of feature selection [15].

2.1. Deep Recurrent Convolution Neural Network (DRCNN)


Recurrent neural networks (RNNs) have been applied for prediction in different fields due to their
strong predictive performance. Within the structure of the RNN, the result is formed from the previously
made computations [17]. Given an input sequence vector x, the hidden nodes of a layer s, and the
output of a hidden layer y, the network can be estimated as explained in Equations (1) and (2):

s_t = \sigma(W_{xs} x_t + W_{ss} s_{t-1} + b_s) \quad (1)

y_t = o(W_{so} s_t + b_y) \quad (2)

where W_{xs}, W_{ss}, and W_{so} define the weights from the input layer x to the hidden layer s,
b_s and b_y are the biases of the hidden layer and the output layer, and \sigma and o are the
activation functions. The short-time Fourier transform (STFT) of the input signal is given by Equation (3):

\mathrm{STFT}\{z(t)\}(\tau, \omega) \equiv T(\tau, \omega) = \int_{-\infty}^{+\infty} z(t)\, \omega(t - \tau)\, e^{-j\omega t}\, dt \quad (3)

where z(t) is the vibration signal, \omega(t) is the Gaussian window function centered around 0, and
T(\tau, \omega) is the function that expresses the vibration signals. To calculate the hidden layers
with the convolutional operation, Equations (4) and (5) are applied:

S_t = \sigma(W_{TS} \ast T_t + W_{SS} \ast S_{t-1} + B_s) \quad (4)

Y_t = o(W_{YS} \ast S_t + B_y) \quad (5)

where \ast denotes the convolution operation and the W terms indicate the convolution kernels.


Recurrent convolutional neural networks (RCNNs) can be stacked to establish a deep architecture,
called the deep recurrent convolutional neural network (DRCNN) [18,19]. To use the DRCNN method
in the predictive task, Equation (6) determines how the last phase of the model serves as a supervised
learning layer:

\hat{r} = \sigma(W_h \ast h + b_h) \quad (6)

where W_h is the weight and b_h is the bias. The model calculates the residuals caused by the difference
between the predicted and the actual observations in the training stage [20]. Stochastic gradient descent
is applied to learn the parameters. Considering that the observation at time t is r, the loss
function is determined as shown in Equation (7):

L(r, \hat{r}) = \frac{1}{2} \| r - \hat{r} \|_2^2 \quad (7)
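To make Equations (4)–(7) concrete, the following is a minimal PyTorch sketch of a stacked recurrent-convolutional network in the spirit of the DRCNN. It is an illustration, not the authors' exact architecture: the tanh activation, kernel size, channel counts, and depth are assumptions.

import torch
import torch.nn as nn

class RecurrentConvCell(nn.Module):
    """One layer of Eq. (4): S_t = sigma(W_TS * T_t + W_SS * S_{t-1} + B_s)."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2  # preserve the temporal length
        self.w_ts = nn.Conv1d(in_channels, hidden_channels, kernel_size, padding=pad)
        self.w_ss = nn.Conv1d(hidden_channels, hidden_channels, kernel_size,
                              padding=pad, bias=False)

    def forward(self, t_t, s_prev):
        # sigma is taken to be tanh here (an assumption; the paper leaves it generic)
        return torch.tanh(self.w_ts(t_t) + self.w_ss(s_prev))

class DRCNN(nn.Module):
    """Stacked recurrent-convolutional layers with a linear head, Eq. (6)."""

    def __init__(self, in_channels, hidden_channels=16, depth=2):
        super().__init__()
        widths = [in_channels] + [hidden_channels] * depth
        self.cells = nn.ModuleList([RecurrentConvCell(a, b)
                                    for a, b in zip(widths[:-1], widths[1:])])
        self.head = nn.Linear(hidden_channels, 1)

    def forward(self, x):
        # x: (batch, channels, time); the recurrence runs over the time axis
        batch, _, steps = x.shape
        states = [torch.zeros(batch, c.w_ss.in_channels, 1, device=x.device)
                  for c in self.cells]
        for t in range(steps):
            inp = x[:, :, t:t + 1]
            for i, cell in enumerate(self.cells):
                states[i] = cell(inp, states[i])
                inp = states[i]
        return self.head(states[-1].squeeze(-1))  # \hat{r}: one value per series

Training this module with torch.nn.MSELoss (Equation (7) up to a constant factor) and torch.optim.SGD matches the optimization procedure described above.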

2.2. Deep Neural Decision Trees (DNDT)


Deep neural decision trees (DNDTs) are decision tree (DT) models realized by deep neural
networks, where a particular weight configuration of the network corresponds to a specific decision
tree and can therefore be interpreted [21]. Stochastic gradient descent (SGD) is used to optimize all
parameters simultaneously; this partitions the learning process into mini-batches, and a DNDT can be
attached to a larger standard neural network (NN) model for end-to-end learning with backpropagation.
By contrast, standard DTs learn through a greedy and recursive splitting of factors, which can make
feature selection more efficient [22]. The method starts by performing a soft binning function for each
node, making it possible to make differentiable split decisions in DNDTs [23]. The input of a binning
function is a real scalar x, which produces an index of the bin to which x belongs.
The activation function of the DNDT algorithm is carried out based on the NN represented in
Equation (8):

\pi = f_{w,b,\tau}(x) = \mathrm{softmax}((wx + b)/\tau) \quad (8)

where w is a constant with value w = [1, 2, \ldots, n+1], \tau > 0 is a temperature factor, and b is
defined in Equation (9) in terms of the cut points \beta_1, \ldots, \beta_n:

b = [0, -\beta_1, -\beta_1 - \beta_2, \ldots, -\beta_1 - \beta_2 - \cdots - \beta_n] \quad (9)

The coding of the binning function of x is given by the NN according to the expression of
Equation (8) [24]. The key idea is to build the DT with the Kronecker product applied to the binning
functions defined above. Connecting every feature x_d with its NN f_d(x_d), we can determine all the
final nodes of the DT as in Equation (10):

z = f_1(x_1) \otimes f_2(x_2) \otimes \cdots \otimes f_D(x_D) \quad (10)

where z expresses, in vector form, the leaf node index reached by instance x. The complexity of the
model is determined by the number of cut points of each node; there may be inactive cut points,
since their values are usually not constrained.
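As an illustration of Equations (8)–(10), here is a small PyTorch sketch of soft binning and of the Kronecker-product construction of the leaf vector. The cut-point values and temperature are placeholders, and the helper names are ours, not from [21].

import torch

def soft_bin(x, cut_points, tau=0.1):
    """Soft binning of a scalar feature, Eq. (8): softmax((w*x + b)/tau),
    with w = [1, ..., n+1] and b = [0, -b1, -b1-b2, ...] as in Eq. (9)."""
    n = cut_points.numel()
    w = torch.arange(1, n + 2, dtype=x.dtype)              # constant weights
    b = torch.cat([torch.zeros(1, dtype=x.dtype),
                   -torch.cumsum(torch.sort(cut_points).values, 0)])
    return torch.softmax((x.unsqueeze(-1) * w + b) / tau, dim=-1)

def leaf_index(features, cuts, tau=0.1):
    """Eq. (10): Kronecker product of per-feature binnings gives the leaf vector z."""
    z = soft_bin(features[..., 0], cuts[0], tau)
    for d in range(1, features.shape[-1]):
        zd = soft_bin(features[..., d], cuts[d], tau)
        z = (z.unsqueeze(-1) * zd.unsqueeze(-2)).flatten(-2)  # outer product
    return z

# Example: two features with one cut point each -> 4 leaves
x = torch.tensor([[0.3, 0.8]])
cuts = [torch.tensor([0.5]), torch.tensor([0.5])]
print(leaf_index(x, cuts).shape)  # torch.Size([1, 4])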

2.3. Deep Learning Linear Support Vector Machines (DSVR)


Support vector machines (SVMs) were created for binary classification. Training data and labels are
denoted by (x_n, t_n), n = 1, \ldots, N, with x_n \in \mathbb{R}^D and t_n \in \{-1, +1\}; SVMs are
optimized according to Equation (11):

\min_{w,\, \xi_n} \; \frac{1}{2} w^T w + C \sum_{n=1}^{N} \xi_n \quad \text{s.t.} \quad w^T x_n t_n \geq 1 - \xi_n, \;\; \xi_n \geq 0 \;\; \forall n \quad (11)

where the slack variables \xi_n penalize observations that do not meet the margin requirement [25].
The equivalent unconstrained optimization problem is defined as in Equation (12):

\min_{w} \; \frac{1}{2} w^T w + C \sum_{n=1}^{N} \max(1 - w^T x_n t_n,\, 0) \quad (12)

Usually, the softmax or 1-of-K encoding method is applied in the classification task of deep
learning algorithms. When working with 10 classes, the softmax layer is composed of 10 nodes
expressed by p_i, i = 1, \ldots, 10; p_i specifies a discrete probability distribution, with
\sum_{i=1}^{10} p_i = 1. Let h denote the activation of the penultimate-layer nodes and W the weight
connecting the penultimate layer to the softmax layer; the total input a_i into a softmax node and
the resulting probability p_i are then:

a_i = \sum_{k} h_k W_{ki} \quad (13)

p_i = \frac{\exp(a_i)}{\sum_{j=1}^{10} \exp(a_j)} \quad (14)

The predicted class \hat{i} would then be as in Equation (15):

\hat{i} = \arg\max_{i} p_i = \arg\max_{i} a_i \quad (15)

Since the linear SVM objective is not differentiable, a popular variation, known as the DSVR,
minimizes the squared hinge loss instead, as indicated in Equation (16):

\min_{w} \; \frac{1}{2} w^T w + C \sum_{n=1}^{N} \max(1 - w^T x_n t_n,\, 0)^2 \quad (16)

The target of the DSVR is to train deep neural networks for prediction [24,25]. Equation (17)
expresses the differentiation of the hinge loss \ell(w) with respect to the activation of the
penultimate layer, changing the input x for the activation h:

\frac{\partial \ell(w)}{\partial h_n} = -C t_n w \, \mathbb{I}\{1 > w^T h_n t_n\} \quad (17)

where \mathbb{I}\{\cdot\} is the indicator function. Likewise, for the squared hinge loss of the DSVR,
we have Equation (18):

\frac{\partial \ell(w)}{\partial h_n} = -2 C t_n w \, \max(1 - w^T h_n t_n,\, 0) \quad (18)
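The squared hinge loss of Equation (16) is straightforward to implement on top of a network's penultimate layer; the sketch below is a minimal PyTorch version, with the regularization applied to the top-layer weights (the function and variable names are illustrative).

import torch

def l2svm_loss(scores, targets, top_weights, C=1.0):
    """Eq. (16): 0.5*w^T w + C * sum_n max(1 - w^T x_n t_n, 0)^2, targets in {-1,+1}."""
    margins = torch.clamp(1.0 - targets * scores, min=0.0)
    return 0.5 * (top_weights ** 2).sum() + C * (margins ** 2).sum()

# Example: scores from a linear top layer w^T h
w = torch.randn(8, requires_grad=True)
h = torch.randn(4, 8)                    # penultimate-layer activations, 4 samples
t = torch.tensor([1., -1., 1., -1.])
loss = l2svm_loss(h @ w, t, w)
loss.backward()                          # gradients w.r.t. h follow Eq. (18)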

2.4. Sensitivity Analysis


Data mining methods have the virtue of offering a great deal of explanatory power for the problem
under study. To quantify this, a sensitivity analysis is performed, which measures the relative
importance of the independent variables with respect to the dependent variable [26,27]. To do this,
the set of initial variables is progressively reduced, leaving only the most significant ones,
following the variance criterion: a variable is significant if it increases the explained variance
relative to the rest of the variables as a whole. The Sobol method [16] is applied to decompose the
variance of the total output V(Y), as expressed in Equation (19):
V(Y) = \sum_{i} V_i + \sum_{i} \sum_{j>i} V_{ij} + \ldots + V_{1,2,\ldots,k} \quad (19)

where V_i = V(E(Y \mid X_i)) and V_{ij} = V(E(Y \mid X_i, X_j)) - V_i - V_j.


S_i = V_i / V and S_{ij} = V_{ij} / V define the sensitivity indices, with S_{ij} being the effect of the
interaction between two variables. The Sobol decomposition also allows the estimation of a total
sensitivity index, S_{Ti}, which measures the sum of all the sensitivity effects involving independent
variable X_i.
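As a sketch of how the Sobol indices above could be computed in practice, the example below uses the SALib package (one common choice; the paper does not name an implementation). The three-variable subset, the unit bounds, and the `predict_price` stand-in for a trained Bitcoin-price model are all hypothetical.

import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 3,
    "names": ["block_size", "cost_per_transaction", "difficulty"],  # illustrative subset
    "bounds": [[0.0, 1.0]] * 3,          # inputs rescaled to [0, 1]
}

def predict_price(X):
    """Stand-in for a trained Bitcoin-price model (hypothetical)."""
    return X[:, 0] + 2.0 * X[:, 1] * X[:, 2]

X = saltelli.sample(problem, 1024)       # Saltelli sampling for Sobol estimation
Y = predict_price(X)
Si = sobol.analyze(problem, Y)
print(Si["S1"])                          # first-order indices S_i = V_i / V
print(Si["ST"])                          # total indices S_Ti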

3. Data and Variables


The sample period selected runs from 2011 to 2019, with a quarterly data frequency. The independent
variables were obtained from the IMF's International Financial Statistics (IFS), the World Bank,
FRED St. Louis, Google Trends, Quandl, and Blockchain.info.
The dependent variable used in this study is the Bitcoin price, defined as the value of Bitcoin
in USD. In addition, we used 29 independent variables, classified into demand and supply variables,
attractiveness variables, and macroeconomic and financial variables, as possible predictors of the
future Bitcoin price (Table 1). These variables have been used throughout the previous literature [1,3,4,14].

Table 1. Independent variables.

Variables                Description
(a) Demand and Supply
Transaction value        Value of daily transactions
Number of Bitcoins       Number of mined Bitcoins currently circulating on the network
Bitcoin addresses        Number of unique Bitcoin addresses used per day
Transaction volume       Number of transactions per day
Unspent transactions     Number of valid unspent transactions
Blockchain transactions  Number of transactions on the blockchain
Blockchain addresses     Number of unique addresses used in the blockchain
Block size               Average block size expressed in megabytes
Miners reward            Block rewards paid to miners
Mining commissions       Average transaction fees (in USD)
Cost per transaction     Miners' income divided by the number of transactions
Difficulty               Difficulty of mining a new blockchain block
Hash                     Times a hash function can be calculated per second
Halving                  Process of reducing the emission rate of new units
(b) Attractiveness
Forum posts              New posts in online Bitcoin forums
Forum members            Number of new members in online Bitcoin forums
(c) Macroeconomic and Financial
Texas oil                Oil price (West Texas)
Brent oil                Oil price (Brent, London)
Dollar exchange rate     Exchange rate between the US dollar and the euro
Dow Jones                Dow Jones Index of the New York Stock Exchange
Gold                     Gold price in US dollars per troy ounce

The sample is divided into three mutually exclusive parts: one for training (70% of the data),
one for validation (10% of the data), and a third for testing (20% of the data). The training data
are used to build the intended models, while the validation data are used to assess whether the models
are overtrained. The test data serve to evaluate the built model and measure its predictive capacity:
the percentage of correctly classified cases gives the accuracy results, and the RMSE measures the
level of errors made. Furthermore, for the distribution of the sample data across these three
phases, 10-fold cross-validation with 500 iterations was used [28,29].
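A minimal sketch of the 70/10/20 partition and of the two error measures reported below; a chronological split is assumed here because the observations form a time series, although the paper combines the split with cross-validation.

import numpy as np

def split_70_10_20(X, y):
    """Chronological 70% train / 10% validation / 20% test split."""
    n = len(X)
    i, j = int(0.7 * n), int(0.8 * n)
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))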

4. Results

4.1. Descriptive Statistics


Table 2 shows a statistical summary of the independent variables for predicting the Bitcoin price.
No variable has a standard deviation higher than its mean value, so the data show initial stability.
On the other hand, there are large differences between the minimum and maximum values. Variables like
mining commissions and cost per transaction show very small minimum values compared to their mean
values, and the same occurs with the hash variable. Despite these extremes, they do not affect the
values of the standard deviations of the respective variables.

Table 2. Summary statistics.

Variables Obs Mean SD Min Max


Transaction value 112 342,460,106,866,711.0000 143,084,554,727,531.0000 59,238,547,391,199.6000 735,905,260,141,564.0000
Number of bitcoins 112 13,634,297.4824 3,709,010.0736 5,235,454.5455 18,311,982.5000
Bitcoins addresses 112 285,034.2515 219,406.3874 1576.8333 849,668.1000
Transaction volume 112 154,548.8041 117,104.3686 1105.5000 373,845.6000
Unspent transactions 112 28,581,914.9054 22,987,595.3012 78,469.7273 66,688,779.9000
Blockchain transactions 112 156,444,312.9120 161,252,448.1997 237,174.8889 520,792,976.5000
Blockchain addresses 112 4,812,692.05 13,735,245.35 14,437,299.03 117,863,226.2
Block size 112 0.4956 0.3638 0.0022 0.9875
Miners reward 112 420,160,582,581,028.0000 174,396,895,338,462.0000 101,244,436,734,897.0000 796,533,076,376,536.0000
Mining commissions 112 9,581,973,325,205.4400 42,699,799,790,392.8000 0.2591 315,387,506,596,395.0000
Cost per transaction 112 155,354,364,458,705.0000 156,696,788,525,225.0000 0.1179 757,049,771,708,905.0000
Difficulty 112 187,513,499,336,866.0000 195,421,886,528,251.0000 212,295,141,771.2000 836,728,509,520,663.0000
Hash 112 110,434,372.2765 154,717,725.3881 0.5705 516,395,703.4338
Halving 112 279,853,454,485,387.0000 162,806,469,642,875.0000 6,473,142,955,255.1700 804,437,327,302,638.0000
Forum posts 112 9279.8844 8585.0583 455.0000 53132.0000
Forum members 112 2432.2545 3394.4635 30.6364 14,833.3409
Texas Oil 112 72.4878 23.7311 21.1230 135.6700
Brent Oil 112 78.4964 26.5819 19.1900 139.3800
Dollar exchange rate 112 1.3767 0.9604 1.0494 8.7912
Dow Jones 112 15,926.7161 3324.8875 11,602.5212 22,044.8627
Gold 112 1329.4008 244.4099 739.1500 1846.7500

4.2. Empirical Results


Table 3 and Figures 1–3 show the level of accuracy, the root mean square error (RMSE), and the
mean absolute percentage error (MAPE). In all models, the level of accuracy always exceeds 92.61%
for the testing data, and the RMSE and MAPE levels are adequate. The model with the highest
accuracy is that of the deep recurrent convolutional neural network (DRCNN), with 97.34%, followed by
the deep neural decision trees (DNDT) model with 96.94% on average. Taken together, these results
provide a level of accuracy far superior to that of previous studies: the work of Ji and
co-workers [15] reports an accuracy of around 60%, while McNally and co-workers [8] and Rizwan,
Narejo, and Javed [11] report accuracies close to 52%. Finally, Table 4 shows the most significant
variables for each method after applying the Sobol method for the sensitivity analysis.

Table 3. Results of accuracy evaluation: classification (%).

Sample       DRCNN                  DNDT                   DSVR
           Acc. (%)  RMSE  MAPE   Acc. (%)  RMSE  MAPE   Acc. (%)  RMSE  MAPE
Training     97.34   0.66  0.29     95.86   0.70  0.33     94.49   0.75  0.38
Validation   96.18   0.71  0.34     95.07   0.74  0.37     93.18   0.81  0.43
Testing      95.27   0.77  0.40     94.42   0.79  0.42     92.61   0.84  0.47

DRCNN: deep recurrent convolutional neural network; DNDT: deep neural decision trees; DSVR: deep
learning linear support vector machines; Acc: accuracy; RMSE: root mean square error; MAPE: mean
absolute percentage error.
Table 4. Results of accuracy evaluation: greater sensitivity variables.

DRCNN                   DNDT                      DSVR
Transaction value       Transaction volume        Transaction value
Transaction volume      Block size                Block size
Block size              Blockchain transactions   Blockchain transactions
Cost per transaction    Cost per transaction      Cost per transaction
Difficulty              Difficulty                Difficulty
Dollar exchange rate    Forum posts               Forum posts
Dow Jones               Dow Jones                 Dollar exchange rate
Gold                    Gold                      Dow Jones
                                                  Gold

Figure 1. Results of accuracy evaluation: classification (%).

Figure 2. Results of accuracy evaluation: RMSE.
Figure 3. Results of accuracy evaluation: MAPE.

Table 4 shows additional information on the significant variables. Block size, cost per transaction,
and difficulty were significant in the three models for each method applied. This demonstrates the
importance of the cost of carrying out a Bitcoin transaction, of the block of Bitcoins to buy, and of
the difficulty miners face in finding new Bitcoins as the main factors in the task of determining the
price of Bitcoin. This contrasts with the results shown in previous studies, where these variables are
either not significant or not included in the initial set of variables [5,7,8]. The best results were
obtained by the DRCNN method, where, in addition to the aforementioned variables, the transaction
value, transaction volume, dollar exchange rate, Dow Jones, and gold were also significant. This shows
that the demand and supply variables of the Bitcoin market are essential to predicting its price,
something that has been shown by some previous works [1,30]. Yet significant macroeconomic and
financial variables have not been observed as important factors by other recent works [30,31], since
they appeared there as variables that did not influence Bitcoin price fluctuations. In our results,
the macroeconomic variables of Dow Jones and gold have been significant in all methods.

On the other hand, the models built by the DNDT and DSVR methods show high levels of precision,
although lower than those obtained by the DRCNN. Furthermore, these methods show some different
significant variables. Such is the case of forum posts, a variable popularly used as a proxy for the
level of future demand that Bitcoin could have, although previous works diverge on its significance
for predicting the price of Bitcoin, with some showing that this variable is not significant [11,14].
Finally, these methods show another macroeconomic variable as more significant, in this case the
dollar exchange rate. This represents the importance of changes in the USD price of Bitcoin, which can
be decisive in estimating the possible demand and, therefore, a change in price. This variable, like
the rest of the macroeconomic variables, has not previously been shown to be significant [5,31].

This set of variables observed as significant represents a group of novel factors that determine the
price of Bitcoin and, therefore, differs from that shown in the previous literature.
4.3. Post-Estimations
In this section, we try to perform estimations of models to generate forecasts in a future horizon.
For this, we used the framework of multiple-step ahead prediction, applying the iterative strategy
whereby models built to predict one step forward are trained [32]. At time t, a prediction is made for
moment t + 1, and this prediction is used to predict for moment t + 2 and so on. This means that the
predicted data for t + 1 are considered real data and are added to the end of the available data [33].
Table 5 and Figures 4–6 show the accuracy and error results for the t + 1 and t + 2 forecasting
horizons. For t + 1, the range of precision for the three methods is 88.34–94.19%, with the highest
accuracy achieved by the DRCNN (94.19%). For t + 2, the range of precision is 85.76–91.37%, with the
DRCNN once again highest (91.37%). These results show the high precision and great robustness of
the models.
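The iterative strategy can be sketched as follows: a one-step-ahead model is applied repeatedly, and each prediction is appended to the history before the next step is forecast. The `model.predict` interface and the fixed look-back window are assumptions.

import numpy as np

def iterative_forecast(model, history, steps, window):
    """Multiple-step-ahead prediction: predict t+1, treat it as observed, repeat."""
    data = list(history)
    preds = []
    for _ in range(steps):
        x = np.asarray(data[-window:]).reshape(1, -1)  # last `window` observations
        y_hat = float(model.predict(x))                # one-step-ahead forecast
        preds.append(y_hat)
        data.append(y_hat)                             # predicted value becomes "real"
    return np.asarray(preds)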
Table 5. Multiple-step ahead forecasts in forecast horizon = t + 1 and t + 2.

Horizon      DRCNN                  DNDT                   DSVR
           Acc. (%)  RMSE  MAPE   Acc. (%)  RMSE  MAPE   Acc. (%)  RMSE  MAPE
t + 1        94.19   0.81  0.52     92.35   0.87  0.59     88.34   0.97  0.65
t + 2        91.37   0.92  0.63     89.41   1.03  0.67     85.76   1.10  0.78

Acc: accuracy.

Figure 4. Multiple-step ahead forecasts in forecast horizon: accuracy.

Figure 5. Multiple-step ahead forecasts in forecast horizon: RMSE.

Figure 6. Multiple-step ahead forecasts in forecast horizon: MAPE.

5. Conclusions
This study developed a comparison of methodologies to predict the Bitcoin price and, therefore,
a new model was created to forecast this price. The period selected was from 2011 to 2019. We applied
different deep learning methods in the construction of the Bitcoin price prediction model to achieve
a robust model, namely deep recurrent convolutional neural networks, deep neural decision trees, and
deep support vector machines. The DRCNN model obtained the highest levels of precision. We propose
to increase the level of performance of the models to predict the price of Bitcoin compared to the
previous literature. This research has shown significantly higher precision results than those shown
in previous works, achieving a precision hit range of 92.61–95.27%. Likewise, it was possible to
identify a new set of significant variables for the prediction of the price of Bitcoin, offering great
stability in the models developed when predicting over the future horizons of one and two years.

This research allows us to increase the results and conclusions on the price of Bitcoin concerning
previous works, both in matters of precision and error, but also on significant variables. A set of
significant variables for each methodology applied has been selected by analyzing our results, and
some of these variables are recurrent in the three methods. This supposes an important addition to the field
of cryptocurrency pricing. The conclusions are relevant to central bankers, investors, asset managers,
private forecasters, and business professionals for the cryptocurrencies market, who are generally
interested in knowing which indicators provide reliable, accurate, and potential forecasts of price
changes. Our study suggests new and significant explanatory variables to allow these agents to predict
the Bitcoin price phenomenon. These results have provided a new Bitcoin price forecasting model
developed using three methods, with the DRCNN model as the most accurate, thus contributing to
existing knowledge in the field of machine learning, and especially, deep learning. This new model
can be used as a reference for setting asset pricing and improved investment decision-making.
In summary, this study provides a significant opportunity to contribute to the field of finance,
since the results obtained have significant implications for the future decisions of asset managers,
making it possible to avoid big change events of the price and the potential associated costs. It also
helps these agents send warning signals to financial markets and avoid massive losses derived from an
increase of volatility in the price.
Opportunities for further research in this field include developing predictive models that consider
the volatility correlation of other new alternative assets, as well as safe-haven assets such as gold
or stable currencies, and that evaluate the different scenarios of portfolio choice and optimization.

Author Contributions: Conceptualization, P.L.-F., D.A., P.L.-L. and M.A.F.-G.; Data curation, D.A. and M.A.F.-G.;
Formal analysis, P.L.-F., D.A. and P.L.-L.; Funding acquisition, P.L.-F., P.L.-L. and M.A.F.-G.; Investigation, D.A.
and M.A.F.-G.; Methodology, D.A.; Project administration, P.L.-F. and M.A.F.-G.; Resources, P.L.-F. and M.A.F.-G.;
Software, D.A.; Supervision, D.A.; Validation, D.A. and P.L.-L.; Visualization, P.L.-F. and D.A.; Writing—original
draft, P.L.-F. and D.A.; Writing—review & editing, P.L.-F., D.A., P.L.-L. and M.A.F.-G. All authors have read and
agreed to the published version of the manuscript.
Funding: This research was funded by Cátedra de Economía y Finanzas Sostenibles, University of Malaga, Spain.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Kristoufek, L. What Are the Main Drivers of the Bitcoin Price? Evidence from Wavelet Coherence Analysis.
PLoS ONE 2015, 10, e0123923. [CrossRef]
2. Wamba, S.F.; Kamdjoug, J.R.K.; Bawack, R.E.; Keogh, J.G. Bitcoin, Blockchain and Fintech: A systematic
review and case studies in the supply chain. Prod. Plan. Control Manag. Oper. 2019, 31, 115–142. [CrossRef]
3. Chen, W.; Zheng, Z.; Ma, M.; Wu, J.; Zhou, Y.; Yao, J. Dependence structure between bitcoin price and its
influence factors. Int. J. Comput. Sci. Eng. 2020, 21, 334–345. [CrossRef]
4. Balcilar, M.; Bouri, E.; Gupta, R.; Roubaud, D. Can volume predict bitcoin returns and volatility?
A quantiles-based approach. Econ. Model. 2017, 64, 74–81. [CrossRef]
5. Ciaian, P.; Rajcaniova, M.; Kancs, d'A. The economics of BitCoin price formation. Appl. Econ. 2016,
48, 1799–1815. [CrossRef]
6. Schmidt, R.; Möhring, M.; Glück, D.; Haerting, R.; Keller, B.; Reichstein, C. Benefits from Using Bitcoin:
Empirical Evidence from a European Country. Int. J. Serv. Sci. Manag. Eng. Technol. 2016, 7, 48–62. [CrossRef]
7. Velankar, S.; Valecha, S.; Maji, S. Bitcoin Price Prediction using Machine Learning. In Proceedings of the
20th International Conference on Advanced Communications Technology (ICACT), Chuncheon-si, Korea,
11–14 February 2018.
8. McNally, S.; Roche, J.; Caton, S. Predicting the Price of Bitcoin Using Machine Learning. In Proceedings
of the 26th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing,
Cambridge, UK, 21–23 March 2018.
9. Yogeshwaran, S.; Kaur, M.J.; Maheshwari, P. Project Based Learning: Predicting Bitcoin Prices using Deep
Learning. In Proceedings of the 2019 IEEE Global Engineering Education Conference (EDUCON), Dubai,
UAE, 9–11 April 2019.
10. Demir, A.; Akılotu, B.N.; Kadiroğlu, Z.; Şengür, A. Bitcoin Price Prediction Using Machine Learning Methods.
In Proceedings of the 2019 1st International Informatics and Software Engineering Conference (UBMYK),
Ankara, Turkey, 6–7 November 2019.
11. Rizwan, M.; Narejo, S.; Javed, M. Bitcoin price prediction using Deep Learning Algorithm. In Proceedings
of the 13th International Conference on Mathematics, Actuarial Science, Computer Science and Statistics
(MACS), Karachi, Pakistan, 14–15 December 2019.
12. Linardatos, P.; Kotsiantis, S. Bitcoin Price Prediction Combining Data and Text Mining. In Advances in
Integrations of Intelligent Methods. Smart Innovation, Systems and Technologies; Hatzilygeroudis, I., Perikos, I.,
Grivokostopoulou, F., Eds.; Springer: Singapore, 2020.
13. Felizardo, L.; Oliveira, R.; Del-Moral-Hernández, E.; Cozman, F. Comparative study of Bitcoin price prediction
using WaveNets, Recurrent Neural Networks and other Machine Learning Methods. In Proceedings of the
6th International Conference on Behavioral, Economic and Socio-Cultural Computing (BESC), Beijing, China,
28–30 October 2019.
14. Dutta, A.; Kumar, S.; Basu, M. A Gated Recurrent Unit Approach to Bitcoin Price Prediction. J. Risk
Financ. Manag. 2020, 13, 23. [CrossRef]
15. Ji, S.; Kim, J.; Im, H. A Comparative Study of Bitcoin Price Prediction Using Deep Learning. Mathematics
2019, 7, 898. [CrossRef]
16. Saltelli, A. Making best use of model evaluations to compute sensitivity indices. Comput. Phys. Commun.
2002, 145, 280–297. [CrossRef]
17. Wang, S.; Chen, X.; Tong, C.; Zhao, Z. Matching Synchrosqueezing Wavelet Transform and Application to
Aeroengine Vibration Monitoring. IEEE Trans. Instrum. Meas. 2017, 66, 360–372. [CrossRef]

18. Huang, C.-W.; Narayanan, S.S. Deep convolutional recurrent neural network with attention mechanism for
robust speech emotion recognition. In Proceedings of the 2017 IEEE International Conference on Multimedia
and Expo, Hong Kong, China, 10–14 July 2017; pp. 583–588.
19. Ran, X.; Xue, L.; Zhang, Y.; Liu, Z.; Sang, X.; Xe, J. Rock Classification from Field Image Patches Analyzed
Using a Deep Convolutional Neural Network. Mathematics 2019, 7, 755. [CrossRef]
20. Ma, M.; Mao, Z. Deep Recurrent Convolutional Neural Network for Remaining Useful Life Prediction.
In Proceedings of the 2019 IEEE International Conference on Prognostics and Health Management (ICPHM),
San Francisco, CA, USA, 17–20 June 2019; pp. 1–4.
21. Yang, Y.; Garcia-Morillo, I.; Hospedales, T.M. Deep Neural Decision Trees. In Proceedings of the 2018 ICML
Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden, 14 July 2018.
22. Norouzi, M.; Collins, M.D.; Johnson, M.; Fleet, D.J.; Kohli, P. Efficient non-greedy optimization of decision
trees. In Proceedings of the 28th International Conference on Neural Information Processing Systems,
Montreal, QC, Canada, 8–13 December 2015; pp. 1729–1737.
23. Dougherty, J.; Kohavi, R.; Sahami, M. Supervised and unsupervised discretization of continuous features.
In Proceedings of the 12th International Conference on Machine Learning (ICML), Tahoe City, CA, USA,
9–12 July 1995.
24. Jang, E.; Gu, S.; Poole, B. Categorical reparameterization with Gumbel-Softmax. arXiv 2017, arXiv:1611.01144.
25. Tang, Y. Deep Learning using Linear Support Vector Machines. arXiv 2013, arXiv:1306.0239.
26. Delen, D.; Kuzey, C.; Uyar, A. Measuring firm performance using financial ratios: A decision tree approach.
Expert Syst. Appl. 2013, 40, 3970–3983. [CrossRef]
27. Efimov, D.; Sulieman, H. Sobol Sensitivity: A Strategy for Feature Selection. In Mathematics Across
Contemporary Sciences. AUS-ICMS 2015; Springer Proceedings in Mathematics & Statistics: Cham, Switzerland,
2017; Volume 190.
28. Alaminos, D.; Fernández, S.M.; García, F.; Fernández, M.A. Data Mining for Municipal Financial Distress
Prediction, Advances in Data Mining, Applications and Theoretical Aspects. Lect. Notes Comput. Sci. 2018,
10933, 296–308.
29. Zhang, G.P.; Qi, M. Neural network forecasting for seasonal and trend time series. Eur. J. Oper. Res. 2005,
160, 501–514. [CrossRef]
30. Polasik, M.; Piotrowska, A.I.; Wisniewski, T.P.; Kotkowski, R.; Lightfoot, G. Price fluctuations and the use of
Bitcoin: An empirical inquiry. Int. J. Electron. Commer. 2015, 20, 9–49. [CrossRef]
31. Al-Khazali, O.; Bouri, E.; Roubaud, D. The impact of positive and negative macroeconomic news surprises:
Gold versus Bitcoin. Econ. Bull. 2018, 38, 373–382.
32. Koprinska, I.; Rana, M.; Rahman, A. Dynamic ensemble using previous and predicted future performance for
Multi-step-ahead solar power forecasting. In Proceedings of the ICANN 2019: Artificial Neural Networks
and Machine Learning, Munich, Germany, 17–19 September 2019; pp. 436–449.
33. Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. Statistical and Machine Learning forecasting methods:
Concerns and ways forward. PLoS ONE 2018, 13, e0194889. [CrossRef] [PubMed]

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
PT. TRI DIGITAL PERKASA
Equity Tower, 37th floor, units D&H, Jl. Jend. Sudirman Kav. 52-53 (SCBD),
South Jakarta - 12190
Tel. 081973008899

RECEIPT
001/TT/TDP/IX/2022

Received from: PT. Tri Digital Perkasa

In the form of: Installment dated 30/07/2022 in the amount of Rp 15,000,000
              : Installment dated 12/08/2022 in the amount of Rp 17,600,000
Received by: Alif Karnadi (Account No. 5315180918)
Purpose: Salary debt on behalf of Alif Karnadi

Jakarta, 02 September 2022

Recipient,                Sender,

Alif Karnadi              Dian Primasanti


PT. TRI DIGITAL PERKASA
Equity Tower, 37th floor, units D&H, Jl. Jend. Sudirman Kav. 52-53 (SCBD), South Jakarta - 12190
Tel. 081973008899

LETTER OF DEBT
0002.SUP/02.09.TDP/2022

The undersigned:

Name: Alif Karnadi Yulvianto

ID card (KTP) No.: 3173070607970004
Address: Jl. Nusa Indah No. 13, 011/005, Cipete Selatan, Cilandak, South Jakarta
Hereinafter referred to as the first party

Name: Gayuh Tri Satria

Position: Director
Hereinafter referred to as the second party

It is hereby declared that the second party owes salary debt to the first party in the amount of
Rp 30,000,000, to be repaid in installments.

Jakarta, 02 September 2022

Recipient,                Director,

Alif Karnadi              Gayuh Tri Satria


Signal Processing 183 (2021) 107994

Contents lists available at ScienceDirect

Signal Processing
journal homepage: www.elsevier.com/locate/sigpro

A jump-diffusion particle filter for price prediction

Myrsini Ntemi, Constantine Kotropoulos
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, 54124, Greece

Article info

Article history: Received 29 August 2020; Revised 17 December 2020; Accepted 16 January 2021;
Available online 19 January 2021

Keywords: Particle filters; Price diffusion; Price jumps; Stock price prediction; Flight price prediction

Abstract

In stock and flight price time series, diffusion and jumps govern price evolution over time. A
jump-diffusion dyadic particle filter is proposed for price prediction. In stock price prediction,
the dyad comprises a latent vector modeling each stock and a latent vector modeling the group of
companies in the same category. In flight price prediction, the dyad consists of a departure latent
vector and an arrival latent vector, respectively. A particle coefficient is introduced to encode
both diffusion and jumps. The diffusion process is assumed to be a geometric Brownian motion whose
dynamics are modeled by a Kalman filter. The negative log-likelihood of the posterior distribution
is approximated by a Taylor expansion around the previously observed drift parameter. Efficient
approximations of the first and second-order derivatives of the negative log-likelihood with respect
to the previously observed drift parameter are derived. To infer sudden price jumps, a reversible
jump Markov chain Monte-Carlo framework is used. Experiments have demonstrated that price jump and
diffusion inference mechanisms lead to more accurate predictions compared to state-of-the-art
techniques. Performance gains are attested to be statistically significant.

© 2021 Elsevier B.V. All rights reserved.

1. Introduction

Financial time series are characterized by small price shifts through time, known as volatility [1].
These shifts are modeled by diffusion processes [2]. A diffusion process is a continuous-time Markov
process having almost surely continuous sample paths [3]. A widely known example of a diffusion
process is the Brownian motion [4]. However, the most challenging task in financial forecasting is
the modeling of sudden price jumps that occur through time. Jumps, i.e., sharp peaks and crashes,
are rare events, occurring at random intervals. A jump process is usually modeled by a compound
Poisson process [5,6]. In the latter process, the arrival and the size of jumps are random variables.
Stock and flight price time series exhibit both diffusion and jumps [7]. Their evolution can be
represented by jump-diffusion models [8]. The most popular example of such processes is the Merton
jump-diffusion model, where the jumps are normally distributed [9]. The Kou model is a widely applied
model, which assumes that the jump sizes are asymmetric and exponentially distributed [8]. The
memoryless property of the exponential random variables allows for analytical calculations of
expectations.

However, these processes suffer from several weaknesses. Specifically, since the capital structure
of the market is complex, unrealistic assumptions are invoked [10]. In addition, when significant
values are assigned to volatility, these models tend to underestimate the observed prices. These facts
are evidenced by the failure of the Merton model on US corporate bonds [11]. Accurate price prediction
is a difficult task given only the historical prices. The most critical factor for efficient financial
time series forecasting is the construction of a robust latent space. In addition, an effective
filtering technique should be applied in order to conduct accurate predictions. One widely applied
filtering technique for time series is particle filtering (PF) [12]. In PF, a weighted set of samples
of the states is utilized to infer tractably the posterior distribution of the states of a system [13].
These samples are known as particles, and the state-estimation efficacy depends on their population
and quality. A large number of high-quality particles leads to an optimal posterior distribution
approximation [14]. Particle generation is conducted through systematic importance sampling (SIS).
In this procedure, insignificant weights may be assigned to the majority of particles. This is known
as degeneracy [15]. Resampling of particles with significant weights is a solution to this problem.
This technique is called systematic importance resampling (SIR).

Acknowledgement: This research has been co-financed by the European Union and Greek national funds
through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call
RESEARCH CREATE INNOVATE (project code: T1EDK-02474).
Corresponding author: M. Ntemi. E-mail addresses: [email protected] (M. Ntemi),
[email protected] (C. Kotropoulos). C. Kotropoulos is a EURASIP member.
https://doi.org/10.1016/j.sigpro.2021.107994
0165-1684/© 2021 Elsevier B.V. All rights reserved.
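As background for the SIS/SIR pipeline described in the Introduction, below is a minimal bootstrap particle filter with systematic resampling and roughening for a scalar random-walk state. This is the generic scheme, not the JDPF proposed in the paper; all parameter values are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def particle_filter(observations, n_particles=500, q=0.1, r=0.5, rough=0.01):
    """Bootstrap PF: propagate, weight, systematically resample, then roughen."""
    particles = rng.normal(0.0, 1.0, n_particles)
    estimates = []
    for z in observations:
        particles += rng.normal(0.0, q, n_particles)           # random-walk proposal
        w = np.exp(-0.5 * ((z - particles) / r) ** 2)          # Gaussian likelihood
        w /= w.sum()
        estimates.append(np.dot(w, particles))                 # posterior mean
        # systematic resampling counters weight degeneracy
        positions = (rng.random() + np.arange(n_particles)) / n_particles
        particles = particles[np.searchsorted(np.cumsum(w), positions)]
        particles += rng.normal(0.0, rough, n_particles)       # roughening noise
    return np.asarray(estimates)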

portance resampling (SIR). To ensure particle variability, noise is Table 1 summarizes the notation used throughout the paper.
added to particles, which is known as roughening [16]. Section 2 reviews related work. Section 3.1 discusses the con-
In this paper, a pair of latent vectors (i.e., a dyad) constructs struction of the latent space of dyadic particle filter. The diffu-
a latent space. These latent vectors interact through time as in sion and jump processes are elaborated in Sections 3.2 and 3.3,
Collaborative Kalman filter (CKF)[17]. In stock price prediction, a respectively. The particle coefficient vector generation is detailed
dyad is formed by a stock latent vector modeling the stock evo- in Section 3.4 and PF is described in Section 3.5. Experimental re-
lution and a market segment latent vector modeling the evolution of the market segment where the respective stock belongs to. For example, the stock latent vector models BP stock price evolution through time and the segment latent vector models the evolution of oil companies through time. In flight price prediction, a dyad is constructed by a departure latent vector, i.e., Amsterdam, and an arrival latent vector, i.e., Athens. This latent space is utilized in a dyadic PF (DPF) [18]. Instead of assuming these vectors fixed as in [18], here they are treated as dynamically evolving entities. Specifically, the stock latent vector is assumed to evolve through time according to a geometric Brownian motion log drift parameter. This parameter records the diffusion process. It is computed through a second-order Taylor expansion of the negative log-likelihood of the posterior distribution of the latent vectors about the previous value of the log drift parameter. The posterior distribution is obtained by Kalman filtering. A Newton-type update of the drift parameter is employed as in [17]. Extending [17], efficient approximations of the first and second-order derivatives of the negative log-likelihood with respect to (w.r.t.) the previous value of the log drift parameter are derived from first principles. The jump process is inferred through a reversible-jump Markov chain Monte Carlo (rjMCMC) scheme, since both the arrival rate of a jump and the times of jump occurrences are unknown [19], [20]. Here, the jump arrivals follow a Poisson process, whose rate is locally constant within a window of 3 price observations. The model parameters, i.e., jump times and jump rate, are sampled from their full conditional distribution via a Gibbs sampler. Both diffusion and jump components are attached to the main diagonal of the posterior covariance matrix of the latent vectors at the previous time step, forming the prior covariance matrix at the current time step. Then, the product of the diagonal elements of the prior covariance matrices of the latent vectors of the dyad is computed. A particle coefficient vector is introduced, whose components encode this product, yielding a jump-diffusion particle filter (JDPF). Particles are generated according to these particle coefficient vectors, which strongly influence their efficacy. As a result, a small number of particles leads to an accurate approximation of the posterior distribution within a Bayesian context. In contrast to [21], the tracking ability of JDPF does not rely on the assumption of a heavy-tailed distributed observation error variance; it relies on the jump-diffusion scheme, leading to a totally different approximation of the posterior mean vectors, the posterior covariance matrices, and the particles. By doing so, the efficacy of the particle filter is strongly enhanced, as demonstrated by the experiments. Indeed, JDPF closely tracks both the volatility and the price jumps through time. JDPF predictions are compared to those of the state-of-the-art methods in [17], [20], [22], and [23] as well as to our previous results [18] and [21]. In stock price prediction, JDPF outperforms all methods in all cases. In flight price prediction, JDPF performs best in those routes where jumps are present. Summing up, three novel contributions are made in the paper:

• A new analytical approximation is derived for the log drift parameter.
• Jumps are added as a concept to the particle filter, allowing it to track consistently the overall price evolution through time.
• Statistical hypothesis testing demonstrates that there is strong evidence to reject the null hypothesis of equal mean squared prediction error at the 5% significance level.

Experimental results are presented in Section 4 and conclusions are summarized in Section 5.

2. Related work

Jump-diffusion models have gained much attention in price time series forecasting. The Merton jump-diffusion model is a popular model to describe price dynamics. In [24], a Stochastic Grid Bundling Method is applied to a Merton jump-diffusion model for pricing multidimensional Bermudan options, which reduces the uncertainty of price estimates by control variates. Other works utilize radial basis function interpolation techniques in the Merton model to compute option prices, which allow the efficient implementation of the boundary conditions [25,26]. In [27], a tridiagonal system of linear equations is proposed to solve a partial integro-differential equation in Merton and Kou jump-diffusion models. Some works combine the Markov-switching approach and Lévy processes to detect jumps and regime switches in a price time series [28].

Many non-linear filters (e.g., the extended Kalman filter and the unscented Kalman filter) have been proposed to approximate the non-Gaussian dynamics governing price evolution through time. An extended Kalman filter is applied to calibrate the Schwartz model and estimate the price of WTI crude oil [29]. An H-extended Kalman filter is proposed in [30] to bound the influence of uncertainties. Other works combine a square root unscented Kalman filter and control theory to ensure positive semi-definiteness of the state covariances [31], [32]. PFs also deal with the non-linear dynamics in time series. In [33], an iterated auxiliary PF is proposed to approximate an optimal sequence of positive functions. In [34], a PF with non-Gaussian transition densities is proposed for high-frequency financial data estimation. Although these filters track price volatility through time, they fail to capture any sudden jumps.

3. The jump-diffusion particle filter

The jump-diffusion particle filter builds on the DPF [18]. Here, the dynamic evolution of the prior covariance matrix of the latent state vectors is governed by a jump-diffusion model.

3.1. The latent state of the dyadic particle filter

The latent state of the DPF is constructed through a probabilistic approach, by utilizing KF. That is, the latent state consists of a dyad i, j of two latent vectors s_i[t] ∈ R^n and m_j[t] ∈ R^n, which interact through time [17]. At every time step, the prior probability distributions of these latent vectors are computed, and when an observation occurs, their posterior probability distributions should be inferred. The prior distributions of the latent vectors at time t form the state model

$s_i[t] \sim \mathcal{N}\big(\mu_{s_i}[t], \Sigma_{s_i}[t]\big), \qquad m_j[t] \sim \mathcal{N}\big(\mu_{m_j}[t], \Sigma_{m_j}[t]\big)$   (1)

which are multivariate Gaussians with prior mean vectors µ_si, µ_mj ∈ R^n and prior covariance matrices Σ_si, Σ_mj ∈ R^{n×n}. In Eq. (1), instead of assuming that the prior covariance matrix Σ_si ∈ R^{n×n} is the posterior one at t − 1, i.e., Σ_si[t] = Σ′_si[t − 1] as in [18], the prior covariance matrix is allowed to evolve according to a jump-diffusion framework. Specifically, the


Table 1
Notation.

a[t]            log drift value
α               geometric Brownian motion
b               parameter vector used in first and second-order derivatives computation
c               particle coefficient vector
d               number of particles
G(ζ[t], ξ[t])   Gamma distribution with parameters ζ[t] and ξ[t]
H               sliding window over the time series
I               identity matrix
J[t]            price jump at time t
k               realizations k = 1, 2, ..., d
L               evidence lower bound function
L_T             length of Markov chain
m               item/segment/arrival latent state vector
M               matrix equal to the product of the transpose matrix of eigenvectors, the posterior covariance matrix, and the matrix of eigenvectors
N               Gaussian distribution
N_J             Poisson process counting jump arrivals
n               latent state vector dimension (i.e., size)
o               particle vector
P_b, P_d, P_m   probabilities of acceptance of proposals
p               probability density function
q               approximate posterior distribution
r               realizations r = 1, 2, ..., d
s               stock or departure latent state vector
t               time step
T_k             set of jump times
T_k′            proposals (i.e., new states of the Markov chain)
TN              truncated normal distribution
tr              trace
U               diagonal matrix in coefficient vector computation
u               zero-mean Gaussian random variable
v               truncated Gaussian random variable
W               matrix of eigenvectors
x_n             parameter used in first and second-order derivatives computation
y               observation vector
ȳ               average across the observations
y               real price
ŷ               predicted price
z               latent variable describing price prediction
Z               jump size
β               significance level of the F test
γ_J             jump arrival rate
δ               particle weight
Δ[t]            time elapsed since the previous observation
ε               extra drift parameter
η               parameter used in the calculation of acceptance probabilities
θ[t]            set of parameters of rjMCMC at time t
κ               noise vector in roughening
Λ               matrix of eigenvalues of the prior covariance matrix
λ_n             nth eigenvalue
Λ̃               diagonal matrix where each element is the sum of an eigenvalue and the geometric Brownian motion multiplied by Δ[t]
λ̃_n             nth diagonal element of Λ̃
µ               prior mean vector
µ′              posterior mean vector
ρ               uniform random variables used in systematic resampling
σ_δ             zero-mean truncated Gaussian random variable
σ_ij²           price observation variance
σ_J             standard deviation of jump size
σ_κ             variance of vector used in roughening
Σ               prior covariance matrix
Σ′              posterior covariance matrix
Σ′′             updated posterior covariance matrix
ς               number of resampling loops
τ_i             time instant of the ith jump on the Markov chain
ϕ               resampling threshold
ψ               systematic resampling parameter
‖·‖_F           Frobenius norm
∗               Hadamard product

prior mean vector and the prior covariance matrix are defined as

$\mu_{s_i}[t] = \mu'_{s_i}[t-1], \qquad \Sigma_{s_i}[t] = \Sigma'_{s_i}[t-1] + \big(\alpha_{s_i}[t] + J[t]\big) I$   (2)

where µ′[t − 1] denotes the posterior mean vector at the previous time step, Σ′[t − 1] is the posterior covariance matrix, and I ∈ R^{n×n} is the identity matrix. In Eq. (2), α_si[t] is the drift value of the Brownian motion of s_i and J[t] is the jump value at time t. These entities represent the diffusion and jump components of JDPF.
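Although no reference implementation accompanies the paper, the covariance propagation in Eq. (2) is simple enough to sketch numerically. The Python fragment below is a minimal illustration under our own naming and toy values of α_si[t] and J[t]; it is not the authors' code.

```python
import numpy as np

def propagate_prior(mu_post, Sigma_post, alpha, J):
    """Eq. (2): the posterior moments at t-1 become the prior moments at t,
    with the diffusion term alpha and the jump term J inflating the
    covariance along its main diagonal."""
    n = Sigma_post.shape[0]
    mu_prior = mu_post.copy()                          # mu_si[t] = mu'_si[t-1]
    Sigma_prior = Sigma_post + (alpha + J) * np.eye(n)
    return mu_prior, Sigma_prior

# Toy example: n = 5 latent dimensions; alpha and J are illustrative values.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
mu_prev, Sigma_prev = rng.standard_normal(5), A @ A.T  # any SPD matrix
mu_t, Sigma_t = propagate_prior(mu_prev, Sigma_prev, alpha=np.exp(-6.5), J=0.1)
```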


Let the state equation be s_i[t] = s′_i[t − Δ[t]] + g[t]. Assuming that the posterior latent vector at t − Δ[t] and the noise vector g are Gaussian, the covariance matrix of the prior latent vector at t in Eq. (2) results from

$\Sigma_{s_i}[t] = \Sigma'_{s_i}[t-\Delta[t]] + \Big( \int_{t-\Delta[t]}^{t} e^{a[\tilde t]}\,d\tilde t + J[t] \Big) I$   (3)

where the stochastic integral is associated to the geometric Brownian motion. Since the paths of a Brownian motion are typically nowhere differentiable, a solution can be provided by extending the differential calculus to Itô calculus. By doing so, the stochastic integral of Brownian motion is approximated as $\int_{t-\Delta[t]}^{t} e^{a[\tilde t]}\,d\tilde t \approx e^{a[t]}\,\Delta[t] = \alpha[t]\,\Delta[t]$, with Δ[t] denoting the number of time steps that passed since the last observation of s_i (here, Δ[t] = 1) [17]. Accordingly, the main diagonal of the prior covariance matrix of s_i[t] encodes the jump-diffusion model to be exploited by the particle coefficient vector to improve price estimation. The drift is responsible for the constant motion of s_i, capturing the price volatility through time, while the jump component captures discontinuities, i.e., price jumps. Eq. (2) is also applied to compute the parameters of the prior distribution of the latent vector m_j[t], with the difference that there is no drift component. The evolution of m_j[t] relies on the motion of s_i[t], a procedure accomplished through the posterior probability density function (pdf) calculation, which will be discussed next.

The posterior pdfs of the latent vectors, taking into account the price observation y[t], are also multivariate Gaussians

$s'_i[t] \sim \mathcal{N}\big(\mu'_{s_i}[t], \Sigma'_{s_i}[t]\big), \qquad m'_j[t] \sim \mathcal{N}\big(\mu'_{m_j}[t], \Sigma'_{m_j}[t]\big)$   (4)

with µ′ and Σ′ denoting the posterior mean vectors and the posterior covariance matrices, respectively. Since the calculation of the posterior mean vector and covariance matrix is an intractable problem, a variational inference technique is utilized to obtain approximate solutions [35]. That is, a factorized distribution is assumed to approximate the true posterior distribution p(s_i[t], m_j[t]) at time t, i.e.,

$q\big(s_i[t], m_j[t]\big) \approx q\big(s_i[t]\big)\, q\big(m_j[t]\big)$   (5)

where q(·) is the approximate distribution. The price observation at time t is normally distributed with mean value equal to the expectation of the inner product of s_i[t] and m_j[t] w.r.t. the approximate posterior distribution and fixed variance σ_ij², so that

$y[t] \sim \mathcal{N}\big( \mathbb{E}_q\{ s_i^T[t]\, m_j[t] \}, \sigma_{ij}^2 \big).$   (6)

E_q{s_i^T[t] m_j[t]} is the price prediction according to the CKF [17]. To predict a price, instead of evaluating the aforementioned expected value, PF is employed, as will be discussed next.

The optimal approximate distribution is found through the minimization of the Kullback-Leibler (KL) divergence $\mathrm{KL}(q\|p) = \mathbb{E}_q\big[\log \frac{q}{p}\big]$ between q(·) and p(·). Equivalently, the evidence lower bound (elbo) function

$\mathcal{L} = \mathbb{E}_q\big[\log p(s_i[t], m_j[t])\big] - \mathbb{E}_q\big[\log q\big]$   (7)

is derived through the application of Jensen's inequality to log pdfs and is maximized. A simple solution is provided by choosing the mean-field variational distribution family for q(·) [36].

In order to find the optimal approximate distributions q∗(s_i[t]) and q∗(m_j[t]), the respective optimal posterior mean vectors µ′_si[t], µ′_mj[t] and optimal posterior covariance matrices Σ′_si[t], Σ′_mj[t] should be calculated. This is accomplished through a coordinate ascent update [17]:

$\Sigma'_{s_i}[t] = \Big( \frac{\mu'_{m_j}[t]\,\mu'^{T}_{m_j}[t] + \Sigma'_{m_j}[t]}{\sigma_{ij}^2} + \Sigma_{s_i}^{-1}[t] \Big)^{-1}, \qquad \mu'_{s_i}[t] = \Sigma'_{s_i}[t] \Big( \frac{y[t-1]\,\mu'_{m_j}[t]}{\sigma_{ij}^2} + \Sigma_{s_i}^{-1}[t]\, \mu_{s_i}[t] \Big)$   (8)

with y[t − 1] being the price at the previous time step. The optimal parameter updates for the approximate distribution of m_j are obtained in a similar manner. As can be seen in Eq. (8), both the posterior mean vector µ′_si[t] and the posterior covariance matrix Σ′_si[t] contain information about m_j[t]. In the corresponding computation of µ′_mj[t] and Σ′_mj[t], the dynamically evolving s_i[t] drifts m_j in its path in the joint latent space. The posterior parameters at time t constitute the prior parameters at time t + 1. Next, the drift value α_si[t] in Eq. (2) is calculated.

3.2. Diffusion process

The diffusion dynamics of the model lie in the conditional pdf of the prior latent vector s_i[t] given the posterior latent vector at the previous time step

$p\big(s_i[t] \mid s'_i[t-1]\big) \sim \mathcal{N}\big( \mu'_{s_i}[t-1], \Sigma'_{s_i}[t-1] + (\alpha_{s_i}[t] + J[t])\, I \big)$   (9)

where α is a geometric Brownian motion, $\alpha_{s_i}[t] = e^{a_{s_i}[t]}$, with a_si[t] representing the Brownian log drift value distributed as

$a_{s_i}[t] \sim \mathcal{N}\big( a_{s_i}[t-1], \varepsilon\, \Delta_a[t] \big)$   (10)

where Δ_a[t] is the time elapsed since the last observation of a_si, assuming values equal to 1 or 2 (days), and ε is an extra drift parameter [17]. The geometric Brownian motion is also inferred at every time step via variational inference. That is, to compute the Brownian log drift value, a second-order Taylor expansion about its previous value at t − 1, a_si[t − 1], is applied [17]:

$f(a_{s_i}[t]) \approx f(a_{s_i}[t-1]) + (a_{s_i}[t] - a_{s_i}[t-1])\, f'(a_{s_i}[t-1]) + \frac{1}{2}\,(a_{s_i}[t] - a_{s_i}[t-1])^2 f''(a_{s_i}[t-1])$   (11)

where f(·) = −ln p(s_i[t], a_si[t]) is the negative log-likelihood and f′(·) and f″(·) denote its first-order and second-order derivatives w.r.t. a_si[t − 1], respectively. The optimal solution w.r.t. a_si[t] is obtained by solving $f'(a_{s_i}[t]) = \frac{d f(a_{s_i}[t])}{d a_{s_i}[t]} = 0$, i.e.,

$f'(a_{s_i}[t-1]) + (a_{s_i}[t] - a_{s_i}[t-1])\, f''(a_{s_i}[t-1]) = 0 \;\Leftrightarrow\; a_{s_i}[t] = a_{s_i}[t-1] - \frac{f'(a_{s_i}[t-1])}{f''(a_{s_i}[t-1])}.$   (12)

f(·) should be minimized w.r.t. a_si[t]. The analytic calculation of the first-order and second-order derivatives is given in Appendix A. Let $\Sigma'_{s_i}[t-1] = W \Lambda W^T$, where Λ = diag(λ_n) is the diagonal matrix of the eigenvalues of the prior covariance matrix at t − 1 and W ∈ R^{n×n} is the matrix having as columns the corresponding eigenvectors. If $M = W^T \Sigma'_{s_i}[t]\, W$, the derivatives appearing in Eq. (12) are given by

$f'(a_{s_i}[t-1]) = -\frac{1}{\varepsilon \Delta_a[t]}\,(a_{s_i}[t]-a_{s_i}[t-1]) + \frac{1}{2}\sum_n x_n \Big[1 - \frac{b_n^2}{\tilde\lambda_n}\Big(1 + \frac{2M_{nn}}{M_{nn}-\tilde\lambda_n}\Big)\Big]$   (13)

and

$f''(a_{s_i}[t-1]) = \frac{1}{\varepsilon \Delta_a[t]} + \frac{1}{2}\sum_n x_n(1-x_n) + \frac{1}{2}\sum_n \Big[1 - \frac{b_n^2 x_n}{\tilde\lambda_n}\, 2(1-x_n)\,\frac{M_{nn}}{M_{nn}-\tilde\lambda_n}\Big] + \sum_n x_n b_n \frac{M_{nn}}{M_{nn}-\tilde\lambda_n}\Big(1 - \frac{x_n M_{nn}}{M_{nn}-\tilde\lambda_n}\Big).$   (14)
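The Newton-type update of Eqs. (12)-(15) can be sketched numerically as follows. This is a minimal illustration under our own naming, not the authors' implementation, and the assembly of Eqs. (13)-(14), reconstructed from a damaged source, should be treated as an assumption.

```python
import numpy as np

def newton_drift_update(a_t, a_prev, Sigma_post_prev, Sigma_post_t,
                        mu_post_prev, eps, delta_a):
    """One Newton-type step for the Brownian log drift, Eq. (12), with f'
    and f'' approximated as in Eqs. (13)-(14), under the eigendecomposition
    Sigma'_{si}[t-1] = W Lambda W^T.  Assumes M_nn != tilde-lambda_n."""
    lam, W = np.linalg.eigh(Sigma_post_prev)              # Lambda = diag(lam)
    lam_t = lam + np.exp(a_t) * delta_a                   # tilde-lambda_n
    x = np.exp(a_prev) * delta_a / (lam + np.exp(a_prev) * delta_a)  # Eq. (15)
    M = W.T @ Sigma_post_t @ W
    Mnn = np.diag(M)
    b = (M @ np.diag(1.0 / lam_t) - np.eye(len(lam))) @ (W.T @ mu_post_prev)
    ratio = Mnn / (Mnn - lam_t)
    # Eq. (13)
    f1 = -(a_t - a_prev) / (eps * delta_a) \
         + 0.5 * np.sum(x * (1.0 - (b**2 / lam_t) * (1.0 + 2.0 * ratio)))
    # Eq. (14)
    f2 = 1.0 / (eps * delta_a) + 0.5 * np.sum(x * (1.0 - x)) \
         + 0.5 * np.sum(1.0 - (b**2 * x / lam_t) * 2.0 * (1.0 - x) * ratio) \
         + np.sum(x * b * ratio * (1.0 - x * ratio))
    return a_t - f1 / f2                                  # Eq. (12)
```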


Table 2
First and second-order derivatives here and in [17].

here:
  $f'(a_{s_i}[t-1]) = -\frac{1}{\varepsilon \Delta_a[t]}(a_{s_i}[t]-a_{s_i}[t-1]) + \frac{1}{2}\sum_n x_n \big[1 - \frac{b_n^2}{\tilde\lambda_n}\big(1 + \frac{2M_{nn}}{M_{nn}-\tilde\lambda_n}\big)\big]$
  $f''(a_{s_i}[t-1]) = \frac{1}{\varepsilon \Delta_a[t]} + \frac{1}{2}\sum_n x_n(1-x_n) + \frac{1}{2}\sum_n \big[1 - \frac{b_n^2 x_n}{\tilde\lambda_n}\,2(1-x_n)\frac{M_{nn}}{M_{nn}-\tilde\lambda_n}\big] + \sum_n x_n b_n \frac{M_{nn}}{M_{nn}-\tilde\lambda_n}\big(1 - \frac{x_n M_{nn}}{M_{nn}-\tilde\lambda_n}\big)$

[17]:
  $f'(a_{s_i}[t-1]) = -\frac{1}{\varepsilon \Delta_a[t]}(a_{s_i}[t]-a_{s_i}[t-1]) - \frac{1}{2}\sum_n x_n \big[1 - \frac{\tilde b_n^2 + M_{nn}}{\tilde\lambda_n}\big]$
  $f''(a_{s_i}[t-1]) = -\frac{1}{\varepsilon \Delta_a[t]} - \frac{1}{2}\sum_n x_n(1-x_n) + \frac{1}{2}\sum_n x_n \big(1 - 2x_n \frac{\tilde b_n^2 + M_{nn}}{\tilde\lambda_n}\big)$

where $\tilde\lambda_n = \lambda_n + e^{a_{s_i}[t]} \Delta[t]$, $\tilde\Lambda = \mathrm{diag}(\tilde\lambda_n)$, b_n is the nth element of $b = (M\tilde\Lambda^{-1} - I)\, W^T \mu'_{s_i}[t-1]$, and

$x_n = \frac{e^{a_{s_i}[t-1]} \Delta_a[t]}{\lambda_n + e^{a_{s_i}[t-1]} \Delta_a[t]}.$   (15)

Derivations (13) and (14) are variants of the corresponding expressions in [17]. Let $\tilde b_n$ be the nth element of $\tilde b = (\mu'_{s_i}[t] - \mu_{s_i}[t])^T W$. Table 2 summarizes the approximations of the first and second-order derivatives derived here and in [17].

Each stock latent vector s_i[t] is assigned a unique drift parameter. It enables s_i[t] to capture price volatility, which is then provided to the particle coefficient vector. This procedure enriches JDPF with dynamic tracking abilities, reinforcing price prediction.

3.3. Jump process

The jump component J[t] in Eq. (2) is a homogeneous compound Poisson process [37,38] in the window of the three observations $y_w[t] = (y[t], y[t-1], y[t-2])^T$, defined as

$J[t] = \sum_{\ell=1}^{N_J[t]} Z_\ell$   (16)

where the jump sizes Z_ℓ are independent and identically distributed Gaussian random variables with mean value 0 and variance σ_J²[t] (i.e., Z_ℓ ∼ N(0, σ_J²[t])), independent of N_J[t]. The homogeneous Poisson process N_J[t] counts the jump arrivals, i.e.,

$N_J[t] = \sum_{k \ge 1} \mathbf{1}_{[\bar T_k, \infty]}[t]$   (17)

with $\bar T_k = \sum_{i=1}^{k} \tau_i$ denoting the sum of the times when jumps occur in the observation window. In Eq. (17),

$\mathbf{1}_{[\bar T_k, \infty]}[t] = \begin{cases} 1 & \text{if } t \ge \bar T_k \\ 0 & \text{if } 0 \le t < \bar T_k. \end{cases}$   (18)

Let $T_k = \{\tau_i\}_{i=1}^{k}$ be the set of times when jumps occur. Jumps arrive with constant rate γ_J[t] within the observation window. The idle time between two consecutive jumps is distributed as τ_i − τ_{i−1} ∼ exponential(γ_J[t]). It is also possible to simulate a non-homogeneous Poisson process with the thinning algorithm for a suitable bound γ_J∗, such that a time-varying rate within the observation window satisfies γ_J[t] ≤ γ_J∗ [39,40]. The ith jump occurs at time τ_i on the jump time sequence T_k. Since these times are unknown, an rjMCMC scheme is applied to infer them [20]. The jump times T_k are represented by a Markov chain with length L_T = 20, which should be reversible. According to this method, new states of the Markov chain are proposed, which are called proposals, and a new sequence is formed as T_k′. The acceptance of these proposals relies on a probability derived from the consideration of a transition of the Markov chain and its inverse [19]. Three kinds of proposals are introduced: the move proposal, where a jump time on the chain is shifted; the birth proposal, where a new jump time is generated; and the death proposal, where a jump time is discarded [20], [41].

In a birth proposal, a new jump time is derived according to τ_i∗ ∼ U(t_0, t_k), where U(·) is the uniform distribution and t_0, t_k are the first and the last jump time of the Markov chain, respectively. This new jump time is embedded into the existing set of jumps, i.e., T_k′ = T_k ∪ τ_i∗. Let θ[t] = {σ_J[t], γ_J[t]} be the set of model parameters at time t, with σ_J[t] denoting the standard deviation of the jump size. The probability of acceptance of this proposal is P_b = min(1, η_b) [20], where

$\eta_b = \frac{p(y_w[t] \mid T'_k, \theta[t])\; p(T'_k \mid \theta[t])\; (t_k - t_0)}{p(y_w[t] \mid T_k, \theta[t])\; p(T_k \mid \theta[t])\; (N_J[t] + 1)}$   (19)

with y_w[t] denoting the window of three observations. In a death proposal, an existing jump time in T_k is discarded according to τ_i∗ ∼ U(t_0, t_k) and the proposal becomes T_k′ = T_k \ τ_i∗, which is accepted with probability P_d = min(1, η_d), where

$\eta_d = \frac{p(y_w[t] \mid T'_k, \theta[t])\; p(T'_k \mid \theta[t])\; N_J[t]}{p(y_w[t] \mid T_k, \theta[t])\; p(T_k \mid \theta[t])\; (t_k - t_0)}.$   (20)

For move proposals, a randomly picked uniform jump time τ_i from T_k is assigned a new position in the sequence of jumps and becomes τ_i′, forming the proposal T_k′ = (T_k \ τ_i) ∪ τ_i′. This new position is normally distributed as τ_i′ ∼ N(τ_i, σ_m²), with mean value the previous position of τ_i and variance σ_m² ∼ N(0, 1). The acceptance probability is P_m = min(1, η_m), where

$\eta_m = \frac{p(y_w[t] \mid T'_k, \theta[t])\; p(T'_k \mid \theta[t])}{p(y_w[t] \mid T_k, \theta[t])\; p(T_k \mid \theta[t])}.$   (21)

This model follows a Gaussian distribution conditional on the jump times and depends linearly on the previous state of the jump sequence T_k. The prior probability distribution of the jump times T_k at time step t is

$p(T_k \mid \theta[t]) = \begin{cases} \big[1 - \exp(t_k - \tau_{N_J[t]} \mid \gamma_J[t])\big] \cdot \prod_{i=1}^{N_J[t]} \exp(\tau_i - \tau_{i-1} \mid \gamma_J[t]), & \text{if } \tau_i \in [t_0, t_k] \\ 0, & \text{otherwise} \end{cases}$   (22)

where τ_1 < τ_2 < · · · < τ_{N_J[t]} denotes the ordered sequence of N_J[t] jump times in T_k at time t [20].

In order to infer jumps at time t, the parameters of the compound Poisson process should be defined. That is, the parameters θ_l[t] ∈ {σ_J[t], γ_J[t]} are sampled from their full conditional distribution via a Metropolis-within-Gibbs method [20]:

$p(\theta_l[t] \mid \theta_{l^-}[t], T_k, y_w) \propto p(y_w \mid \theta[t], T_k)\; p(T_k \mid \theta[t])\; p(\theta_l[t] \mid \theta_{l^-}[t])$   (23)

where θ_{l−}[t] = θ[t] \ θ_l[t]. To sample the parameter θ_l[t], a proposal density q(θ_l∗[t] | θ_l′[t]) is introduced, where θ_l∗[t] is the proposal value for θ_l[t] and θ_l′[t] is the value of the current sample θ_l[t]. The acceptance probability is P_{θ_l}[t] = min(1, η_{θ_l}[t]), where

$\eta_{\theta_l}[t] = \frac{p(y_w \mid \theta_{l^-}[t], \theta_l^*[t], T_k)\; p(T_k \mid \theta_{l^-}[t], \theta_l^*[t])\; p(\theta_l^*[t] \mid \theta_{l^-}[t])\; q(\theta_l'[t] \mid \theta_l^*[t])}{p(y_w \mid \theta_{l^-}[t], \theta_l'[t], T_k)\; p(T_k \mid \theta_{l^-}[t], \theta_l'[t])\; p(\theta_l'[t] \mid \theta_{l^-}[t])\; q(\theta_l^*[t] \mid \theta_l'[t])}.$   (24)

To sample the jump rate γ_J[t] at time step t, a suitable conjugate prior should be utilized, given that the conjugate prior of an exponentially distributed random variable is the Gamma distribution [42]:

$\gamma_J[t] \sim \mathcal{G}(\zeta[t], \xi[t])$   (25)
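To make the proposal mechanics of Eqs. (19)-(21) concrete, here is a compact Python sketch of one rjMCMC step over the jump-time set. The likelihood p(y_w | T_k, θ) and the prior p(T_k | θ) of Eq. (22) are model-specific, so they are passed in as callables; all names are ours, and the sketch omits the Metropolis-within-Gibbs sampling of θ in Eqs. (23)-(25).

```python
import numpy as np
rng = np.random.default_rng(1)

def rjmcmc_step(T_k, t0, tk, lik, prior, sigma_m=1.0):
    """One reversible-jump move over the jump times (Eqs. (19)-(21)).
    lik(T) and prior(T) stand for p(y_w | T, theta) and p(T | theta)."""
    T_k = sorted(T_k)
    kind = rng.choice(["birth", "death", "move"])
    if kind == "birth":
        T_prop = sorted(T_k + [rng.uniform(t0, tk)])
        eta = (lik(T_prop) * prior(T_prop) * (tk - t0)) / \
              (lik(T_k) * prior(T_k) * (len(T_k) + 1))            # Eq. (19)
    elif kind == "death" and T_k:
        tau_del = T_k[rng.integers(len(T_k))]
        T_prop = [tau for tau in T_k if tau != tau_del]
        eta = (lik(T_prop) * prior(T_prop) * len(T_k)) / \
              (lik(T_k) * prior(T_k) * (tk - t0))                 # Eq. (20)
    elif kind == "move" and T_k:
        i = rng.integers(len(T_k))
        T_prop = sorted(T_k[:i] + [rng.normal(T_k[i], sigma_m)] + T_k[i+1:])
        eta = (lik(T_prop) * prior(T_prop)) / (lik(T_k) * prior(T_k))  # Eq. (21)
    else:
        return T_k
    return T_prop if rng.uniform() < min(1.0, eta) else T_k
```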


with parameters given by

$\zeta[t] = \zeta[t-1] + |H|$   (26)

$\xi[t] = \xi[t-1] + \sum_{t \in H} y[t]$   (27)

where |H| = 3 is the length of a window containing 3 price observations, which shifts over the price time series. This window size provides the optimal trade-off between computational complexity and prediction performance. The prior distributions of the parameters, p(θ_l[t] | θ_{l−}[t]), are assumed to be Gamma distributions [20].

3.4. Particle coefficient vector

PF is applied to estimate the distribution in Eq. (6) at time t. Let c^{(r)}[t] = s^{(r)}[t] ∗ m^{(r)}[t], where ∗ is the Hadamard product [21]. The particles o^{(r)} ∈ R^n are generated according to

$o^{(r)}[t] = o^{(r)}[t-1] + c^{(r)}[t-1], \quad r = 1, \ldots, d.$   (28)

Let σ²_{si,r}[t] denote the rth diagonal element of the prior covariance matrix Σ_si[t] ≈ diag(σ²_{si,1}[t], σ²_{si,2}[t], ..., σ²_{si,n}[t]). Similarly, let σ²_{mj,r}[t] denote the rth diagonal element of the prior covariance matrix Σ_mj[t]. The elements of the coefficient vector c^{(r)}[t] ∈ R^n follow a Gaussian distribution N(0, σ²_{si,l}[t] σ²_{mj,l}[t]) [21]. Specifically, they are computed through

$c_l^{(r)}[t] = \sigma^2_{s_i,l}[t] \cdot \sigma^2_{m_j,l}[t] \cdot u^{(r)}[t], \quad l = 1, 2, \ldots, n$   (29)

with u^{(r)}[t] ∼ N(0, 1).

The prior covariance matrix Σ_si[t] of the latent vector s_i[t] contains the diffusion and jump modeling factors α_si[t] and J[t], respectively. The particle coefficient generation c_l^{(r)}[t] relies also on Σ_si[t]. As a result, the dynamic information regarding diffusion and jump modeling is embedded into the particle coefficients through PF.

3.5. Particle filtering

At time t, a predefined number of particles d is sampled from an importance distribution [43]. Here, the prior distribution p(o^{(r)}[t] | o^{(r)}[t−1]) is the importance distribution. This is the SIS step. The prior covariance matrices Σ_si[t], Σ_mj[t] are utilized to compute the coefficient vector c^{(r)}[t], as explained in Section 3.4. Particle generation relies on Eq. (28). The observation model of JDPF is given by

$y^{(r)}[t] = o^{(r)}[t] + c^{(r)}[t], \quad r = 1, \ldots, d$   (30)

where y^{(r)}[t] ∈ R^n, r = 1, ..., d, are observation vectors. This is the update step in PF. In order to eliminate degeneracy, systematic resampling is applied on the particles [44]. For this reason, d uniform random variables are generated,

$\rho^{(r)} = \frac{(r-1) + \tilde\rho}{d}, \quad \tilde\rho \sim \mathcal{U}(0,1), \quad r = 1, \ldots, d$   (31)

where ρ^{(r)} ∈ [(r−1)/d, r/d). These d random variables can be considered as a "comb" of d regularly spaced points [45]. Then, particle weights are drawn from a Gaussian distribution, i.e.,

$\delta^{(r)}[t] \sim \mathcal{N}\big(y[t-1], v^{(r)}[t]\big), \quad r = 1, \ldots, d$   (32)

with v^{(r)}[t] denoting a truncated Gaussian random variable distributed as v^{(r)}[t] ∼ TN(ȳ^{(r)}, σ_δ²), where

$\bar y^{(r)}[t] = \frac{1}{n}\,\mathbf{1}^T_{n \times 1}\, y^{(r)}[t]$   (33)

is the average of the components of the rth observation vector defined in Eq. (30) and σ_δ² ∼ TN(0, 0.03). Let us introduce ψ^{(r)} = exp(−δ^{(r)}/max_l{|δ^{(l)}|}), which undergoes normalization, yielding $\tilde\psi^{(r)} = \psi^{(r)} / \sum_{l=1}^{d} \psi^{(l)}$. The threshold of the systematic resampling is $\phi = \sum_{r=1}^{d} \tilde\psi^{(r)}$. If ρ^{(r)} < ϕ, a new observation vector arises according to Eq. (30); otherwise, the current observation vector y^{(r)}[t] is utilized. Systematic resampling is applied iteratively, within ς resampling loops. This process ensures that only the particles with significant weights are maintained.

Gaussian noise is attached to the observation vectors of Eq. (30) to deal with impoverishment. This mechanism, known as roughening, ensures their variability, and it is conducted according to

$y^{(r)}[t] \leftarrow y^{(r)}[t] + \kappa[t]$   (34)

where κ[t] ∼ N(0, σ_κ² I). Eventually, the price estimate at time step t is derived through the computation of the average across the d observation vectors y^{(r)}[t] and their n components at every resampling loop, i.e.,

$\hat y[t] = \frac{1}{n \cdot d}\sum_{l=1}^{n}\sum_{r=1}^{d} y_l^{(r)}[t].$   (35)
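As an illustration of the particle-filtering pass in Eqs. (28)-(35), consider the following Python sketch. It follows the update, resampling, roughening, and averaging steps in order, but it simplifies the truncated-Gaussian weight variances of Eqs. (32)-(33) and uses a standard systematic-resampling comb in place of the exact threshold rule; all names are ours.

```python
import numpy as np
rng = np.random.default_rng(2)

def jdpf_pass(o_prev, c_prev, var_s, var_m, y_prev, sigma_kappa2=0.01):
    """One simplified pass over Eqs. (28)-(35) for d particles of dimension n.
    var_s and var_m are the diagonals of the prior covariance matrices, so the
    jump-diffusion information of Eq. (2) enters through them."""
    d, n = o_prev.shape
    o = o_prev + c_prev                                   # Eq. (28)
    c = var_s * var_m * rng.standard_normal((d, n))       # Eq. (29)
    y = o + c                                             # Eq. (30)
    rho = (np.arange(d) + rng.uniform()) / d              # Eq. (31): the "comb"
    # Eqs. (32)-(33), simplified: weights drawn around the last observed price
    delta = rng.normal(y_prev, np.abs(y.mean(axis=1)) + 1e-9)
    psi = np.exp(-delta / np.max(np.abs(delta)))
    psi /= psi.sum()
    redraw = rho < np.cumsum(psi)                         # systematic resampling
    c_new = var_s * var_m * rng.standard_normal((d, n))
    y[redraw] = o[redraw] + c_new[redraw]                 # fresh draw per Eq. (30)
    y += rng.normal(0.0, np.sqrt(sigma_kappa2), (d, n))   # roughening, Eq. (34)
    return y.mean(), o, c                                 # price estimate, Eq. (35)
```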

4. Experimental results

4.1. Stock price prediction

Experiments are conducted on the opening prices of Pepsi, Coca-Cola, Pfizer, Roche, Novartis, Shell, BP, and Posco stocks in order to compare the prediction performance with that of [17], [18], [21] as well as [20] and [23], which also model jumps and diffusion in financial time series. A time series is constructed for each stock containing the first 1,000 daily historical prices from 1961 to 2003, which are selected as being the most erratic ones. Augmented Dickey-Fuller (ADF) tests are applied to these time series in order to check stationarity [46]. The null hypothesis H0 assumes the presence of a unit root in the time series, meaning that the series is non-stationary. H0 is rejected if the ADF statistic is less than the critical value β_1% = −3.437, β_5% = −2.864, or β_10% = −2.568 for a significance level of 99%, 95%, or 90%, respectively. From the results of the test gathered in Table 3, it is attested that there is no significant evidence against H0 at any level of significance. As a consequence, all stock price time series are non-stationary.

Table 3
Augmented Dickey-Fuller test.

Stock       ADF statistic   Non-stationary
BP          -1.055574       yes
Shell       -1.095411       yes
Coca-Cola   -0.442442       yes
Pepsi       -1.530511       yes
Pfizer      -1.554863       yes
Novartis    -2.414171       yes
Roche       -1.640908       yes
Posco       -1.637748       yes

JDPF is an on-line system and predicts the opening stock price of the next day based on the current price. The latent vector dimension is n = 5, since prediction performance did not improve further with n = 10, 15, and 30. Multiple trials showed that the optimal number of particles is d = 20. The number of resampling loops is ς = 100, the variance of the random vector κ[t] used in roughening is σ_κ² = 0.01, the extra drift parameter is ε = 8 × 10⁻³, and the drift value of the Brownian motion of s_i is a_si[t0] = −6.5, in order to be compatible with [21]. The variance of the jump size is σ_J² = 0.05. The initial parameters of the Gamma distribution of the jump arrivals γ_J are set to ζ[t0] = 1 and ξ[t0] = 10. The length of the Markov chain is set to L_T = 20. The probabilities of acceptance are P_b = 0.8, P_d = 0.1, and P_m = 0.1, as in [20].

The historical daily prices are illustrated in Figs. 1, 3, 5, 7, 9, 11, and 13 in black, while Figs. 2, 4, 6, 8, 10, 12, and 14 depict the respective predictions in red. Figs. 15 and 16 illustrate a zoom into 180 Novartis and Shell stock prices, respectively; there, the historical prices are those in black and the predicted prices are those in red. Clearly, JDPF conducts remarkable price predictions: the plots of the predicted and the historical prices are considerably similar. The evaluation of the prediction performance is conducted w.r.t. the root-mean-square error (RMSE) in USD. The results are summarized in Table 4, whose first column indicates the stock; the second column refers to the performance of the proposed JDPF; the third column shows the performance of the DDPF [21]; the fourth column represents the performance of the DPF [18]; the fifth column summarizes the performance of the Collaborative Kalman Filter (CKF) [17]; the sixth column refers to [20]; the seventh column refers to [23]; and the price range of each stock in USD is summarized in the eighth column.

Fig. 1. Real Novartis prices.
Fig. 2. Novartis predictions.
Fig. 3. Real Roche prices.
Fig. 4. Roche predictions.
Fig. 5. Real Pfizer prices.
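The stationarity check of Table 3 is straightforward to reproduce with standard tooling. The sketch below uses the statsmodels implementation of the ADF test on a synthetic random walk, since the exact price series is not distributed with the paper.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Illustrative stand-in series; the paper uses 1,000 daily opening prices
# per stock.  A random walk is non-stationary by construction.
prices = 100 + np.cumsum(np.random.default_rng(3).standard_normal(1000))

adf_stat, p_value, _, _, crit, _ = adfuller(prices)
print(f"ADF statistic: {adf_stat:.6f}")
print("non-stationary" if adf_stat > crit["5%"] else "stationary",
      "at the 5% level")
```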


Table 4
Stock prices prediction performance (RMSE in USD).

Stock       JDPF     DDPF [21]   DPF [18]   CKF [17]   [20]     [23]     Price range (USD)
BP          0.2345   1.0066      1.5694     7.4542     1.1995   1.0897   3.6563 - 21.9375
Shell       1.0202   1.2115      1.6991     3.2059     1.2389   1.2848   36.75 - 87.56
Coca-Cola   0.2009   1.0492      1.6388     2.4013     1.0288   1.1311   0.1927 - 1.5573
Pepsi       0.1995   1.0032      1.6141     2.6129     0.9791   1.0065   0.5648 - 2.7153
Pfizer      0.1098   1.1098      1.6276     4.9927     0.8922   0.8809   0.4505 - 1.8177
Novartis    0.6197   0.9918      1.2615     3.1994     1.1097   1.2041   27.6875 - 60.89
Roche       0.4858   1.2433      2.4954     6.0009     1.1935   1.3136   9.6125 - 38.6
Posco       0.7944   1.1997      2.2491     3.9528     1.3271   1.2899   10.375 - 75.28
Average     0.5078   1.1019      1.5728     7.56       1.1211   1.1538   0.1927 - 87.56

Fig. 6. Pfizer predictions.
Fig. 7. Real Pepsi prices.
Fig. 8. Pepsi predictions.
Fig. 9. Real Coca-Cola prices.

The proposed JDPF performs best among the methods in [17], [18], [20], [21], and [23]. As can be seen in Figs. 15 and 16, the jump-diffusion modeling is successful. Despite the fact that this part of the time series exhibits high volatility and sharp jumps and is non-stationary, JDPF does not lose price tracking. Specifically, the drift and jump parameters enable the latent vectors to capture this dynamic information through a point-wise estimation procedure. This vital information is embedded into the coefficient vectors and then introduced to the particles. As a result, only 20 particles can efficiently approximate the posterior distribution. The jump parameter, along with the introduced Brownian drift parameter computation, plays a principal role in the tracking success, giving JDPF an advantage over the other methods.

Fig. 10. Coca-Cola predictions.
Fig. 11. Real Shell prices.
Fig. 12. Shell predictions.
Fig. 13. Real BP prices.
Fig. 14. BP predictions.
Fig. 15. Zoom in 180 Novartis predicted and real prices.

The prediction performance is further examined. F-tests are applied to evaluate the statistical significance of the MSE differences. The F statistic is defined as $F_{12} = \frac{\pi_1^2}{\pi_2^2}$, $F_{13} = \frac{\pi_1^2}{\pi_3^2}$, $F_{14} = \frac{\pi_1^2}{\pi_4^2}$, $F_{15} = \frac{\pi_1^2}{\pi_5^2}$, and $F_{16} = \frac{\pi_1^2}{\pi_6^2}$, where the subscripts 1, 2, 3, 4, 5, 6 refer to the proposed JDPF, the DDPF, the DPF, the CKF, the method in [20], and the method in [23], respectively. For each algorithm, $\pi^2 = \frac{1}{N-1}\sum_{l=1}^{N}(\hat y_l - \bar y)^2$ is defined, where ŷ denotes the predicted price, $\bar y = \frac{1}{N}\sum_{l=1}^{N}\hat y_l$ is the mean of the predicted prices, and N is the number of observations. The significance level of the F test is assumed to be β = 5%. The null hypothesis H0 is defined between JDPF and any other model; specifically, for JDPF and DDPF, H0: π1² = π2². The null hypothesis is rejected if F12 < F_{β/2} or F12 > F_{1−β/2}, where F_{β/2} = F(β/2, N−1, N−1) and F_{1−β/2} = F(1−β/2, N−1, N−1) are the critical values of the F distribution with significance level equal to the subscript and N − 1 degrees of freedom. Similar null hypotheses are defined for JDPF and DPF, H0: π1² = π3², using the test statistic F13; JDPF and CKF, H0: π1² = π4², using F14; JDPF and [20], H0: π1² = π5², using F15; and JDPF and [23], H0: π1² = π6², using F16. The F statistics and the critical values are summarized in Table 5.

Table 5
F-test of JDPF against DDPF, DPF, CKF, [20], [23] (N = 1,000, F_{β/2} = 0.9011, F_{1−β/2} = 1.1097, β = 5%).

Stock       F12      F13      F14      F15      F16
BP          0.5601   0.3021   0.0008   0.4327   0.4082
Coca-Cola   0.6639   0.3947   0.0235   0.5024   0.5380
Novartis    0.6842   0.5601   0.0054   0.6099   0.6991
Pfizer      0.2998   0.1093   0.0014   0.4986   0.4894
Pepsi       0.7230   0.4922   0.0109   0.5928   0.6013
Posco       0.6987   0.2191   0.0015   0.4651   0.4429
Shell       0.5999   0.1922   0.0145   0.5767   0.5806
Roche       0.6063   0.4005   0.0081   0.5982   0.6102

JDPF is also compared with a Long Short-Term Memory (LSTM) deep neural network [22]. To compare the two models on the same basis, 70% of the stock prices are used for training, 30% are used for testing, and the number of epochs is set to 100 [22]. JDPF is applied to the same test set used for the LSTM in order to guarantee a fair comparison. Again, a statistical significance test is applied. The F statistic is $F_{17} = \frac{\pi_1^2}{\pi_7^2}$. The null hypothesis H0: π1² = π7² is rejected if F17 < F_{β/2} or if F17 > F_{1−β/2}. The results are listed in Table 6. The second column of Table 6 summarizes the performance of the LSTM deep neural network [22], i.e., the train/test RMSE, while the third column summarizes the performance of the proposed JDPF on the test set used for the LSTM [22]. JDPF performs best. According to the F tests, there is strong evidence to reject the null hypothesis H0 of equal MSE for JDPF against DDPF, DPF, CKF, [20], [23], or LSTM at the 0.05 significance level for all stocks.

Table 6
JDPF-LSTM comparison of the prediction performance (N = 1,000, F_{0.975} = 0.9011, F_{0.025} = 1.1097).

Stock       LSTM [22] train/test RMSE (USD)   JDPF test RMSE (USD)   F17
BP          0.91/1.25                         0.3091                 0.4003
Coca-Cola   1.38/2.09                         0.7998                 0.04521
Novartis    1.2/4.75                          0.7006                 0.0399
Pfizer      1.49/1.25                         0.2192                 0.3095
Pepsi       1.31/2.78                         0.6711                 0.0113
Posco       2.94/2.63                         0.9153                 0.0106
Shell       1.92/3.07                         0.9041                 0.0981
Roche       0.78/1.81                         0.5069                 0.4432

Fig. 16. Zoom in 180 Shell predicted and real prices.
Fig. 17. Berlin-Thessaloniki route.
Fig. 18. Brussels-Thessaloniki route.
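A sketch of the F test used throughout Tables 5, 6, and 8 is given below; `f_test` is our name, and the critical values quoted in the comments are the ones reported in the paper for N = 1,000.

```python
import numpy as np
from scipy.stats import f as f_dist

def f_test(pred_a, pred_b, beta=0.05):
    """Two-sided F test of equal variance of the predicted prices around
    their mean, mirroring pi^2 and F_1i as defined in Section 4.1."""
    N = len(pred_a)
    pi2_a = np.sum((pred_a - pred_a.mean()) ** 2) / (N - 1)
    pi2_b = np.sum((pred_b - pred_b.mean()) ** 2) / (N - 1)
    F = pi2_a / pi2_b
    lo = f_dist.ppf(beta / 2, N - 1, N - 1)      # paper reports 0.9011 for N = 1,000
    hi = f_dist.ppf(1 - beta / 2, N - 1, N - 1)  # paper reports 1.1097 for N = 1,000
    return F, bool(F < lo or F > hi)             # (statistic, reject H0?)
```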


Table 7
Flight ticket price prediction performance (RMSE in EUR).

Departure   Arrival       LSTM train/test   DPF        DDPF       [20]       [23]       JDPF      Price range (EUR)
Amsterdam   Athens        185.37/83.67      48.5826    47.4356    49.1134    51.0031    47.2006   64 - 646
Amsterdam   Thessaloniki  128.83/58.82      56.7311    28.1487    32.8597    31.4864    27.6051   109 - 711
Berlin      Athens        57.24/56.76       16.8384    12.8372    15.0561    16.0118    8.1201    59 - 258
Berlin      Thessaloniki  10.34/9.85        20.4419    2.1951     24.3284    22.1639    2.2143    89 - 227
Brussels    Athens        111.90/182.42     41.0444    40.1591    48.6628    50.1339    38.9566   88 - 638
Brussels    Thessaloniki  95.18/79.19       91.9626    42.6099    39.7812    41.6271    20.1681   120 - 1358
Eindhoven   Athens        60.37/25.77       14.446     12.8718    13.1991    13.9932    11.9272   163 - 431
Eindhoven   Thessaloniki  233.44/67.68      91.9626    50.5403    54.4557    53.1087    41.9925   120 - 1358
Frankfurt   Athens        107.84/361.45     43.3365    42.3067    51.2522    52.1939    41.1193   77 - 658
Frankfurt   Thessaloniki  123.08/43.85      115.2531   33.5208    130.3652   100.9986   40.4662   198 - 977
Geneva      Athens        86.03/42.55       25.4913    23.0955    24.3644    26.1724    19.9116   53 - 419
Geneva      Thessaloniki  14.06/75.41       8.7674     4.5604     6.3966     7.8013     4.0199    133 - 239
London      Athens        48.83/157.92      25.0637    22.6694    31.0811    32.1791    22.6546   65 - 393
London      Thessaloniki  31.11/11.32       10.84      9.5415     12.7893    14.9965    4.7865    211 - 313
Milan       Athens        64.84/75.16       20.7424    17.6877    19.9322    22.1006    17.3024   59 - 323
Milan       Thessaloniki  13.80/4.50        7.7521     3.6736     9.0446     12.8012    2.0751    156 - 197
Munich      Athens        183.32/438.02     81.3257    80.645     90.8981    93.3648    76.9139   76 - 1250
Munich      Thessaloniki  113.32/63.40      55.1415    24.9361    26.3841    30.0424    22.1319   49 - 612
Paris       Athens        71.82/151.05      26.209     24.4464    25.5191    29.2012    19.3779   70 - 508
Paris       Thessaloniki  84.15/28.13       23.6168    16.4376    20.2914    21.1986    10.3622   194 - 444
Prague      Athens        225.88/39.81      55.6095    54.9637    53.0887    47.0748    49.1495   90 - 1104
Prague      Thessaloniki  12.20/17.63       8.3879     2.8646     3.0776     3.5901     2.0197    95 - 151
Stockholm   Athens        64.40/63.42       18.4782    16.2179    22.6697    20.3299    12.7036   44 - 296
Stockholm   Thessaloniki  9.95/17.26        22.4197    4.8552     36.1517    38.0429    4.8562    79 - 287
Zurich      Athens        94.79/63.79       28.8506    26.8774    25.0998    27.9432    20.6916   60 - 466
Zurich      Thessaloniki  40.31/64.89       15.4131    10.1105    12.1726    14.1071    6.3143    180 - 332

Table 8
F-tests of JDPF against LSTM, DPF, DDPF, [20], and [23].

Departure   Arrival       N    F_{β/2}   F_{1−β/2}   F12      F13      F14      F15      F16
Amsterdam   Athens        28   0.5250    1.9048      0.0928   0.6014   0.6133   0.5307   0.5112
Amsterdam   Thessaloniki  31   0.5432    1.8409      0.0973   0.0934   0.5411   0.5129   0.5206
Berlin      Athens        29   0.5313    1.8821      0.0087   0.0962   0.1221   0.0996   0.1181
Berlin      Thessaloniki  34   0.5593    1.7878      0.2055   0.0042   1.7669   0.0038   0.0145
Brussels    Athens        29   0.5313    1.8821      0.0036   1.7173   1.6981   0.4228   0.3043
Brussels    Thessaloniki  17   0.4285    2.3335      0.0041   0.0016   0.1071   0.0932   0.1938
Eindhoven   Athens        28   0.5250    1.9048      0.1644   0.4861   0.5307   0.5299   0.5404
Eindhoven   Thessaloniki  25   0.5041    1.9838      0.2262   0.0836   0.2497   0.1955   0.1839
Frankfurt   Athens        29   0.5313    1.8821      0.0012   0.5018   0.5644   0.2807   0.2698
Frankfurt   Thessaloniki  10   0.3146    3.1789      0.3097   0.0027   1.1119   0.0558   0.1067
Geneva      Athens        30   0.5374    1.8608      0.0761   0.3021   0.4981   0.4156   0.3982
Geneva      Thessaloniki  29   0.5313    1.8821      0.0944   0.4997   1.0685   0.5398   0.5269
London      Athens        30   0.5374    1.8608      0.0401   0.4916   0.5998   0.4698   0.4304
London      Thessaloniki  10   0.3146    3.1789      0.0993   0.1001   0.1509   0.0867   0.0699
Milan       Athens        30   0.5374    1.8608      0.1041   0.4916   1.0439   0.5015   0.4893
Milan       Thessaloniki  29   0.5313    1.8821      0.5202   0.4014   0.5439   0.3907   0.3163
Munich      Athens        30   0.5374    1.8608      0.0881   0.5082   0.5443   0.4006   0.3497
Munich      Thessaloniki  33   0.5542    1.8045      0.1736   0.3141   1.0034   0.5500   0.5015
Paris       Athens        29   0.5313    1.8821      0.1218   0.3677   0.5936   0.4942   0.4133
Paris       Thessaloniki  10   0.3146    3.1789      0.1091   0.1354   0.2808   0.2015   0.1972
Prague      Athens        29   0.5313    1.8821      1.9921   0.3811   0.4012   0.4996   0.4802
Prague      Thessaloniki  28   0.5255    1.9048      0.2117   0.3286   0.5925   0.5201   0.5176
Stockholm   Athens        29   0.5313    1.8821      0.0930   0.4124   0.5004   0.1198   0.0982
Stockholm   Thessaloniki  33   0.5542    1.8045      0.2416   0.1067   1.0211   0.0910   0.0103
Zurich      Athens        29   0.5313    1.8821      0.0981   0.3301   0.4909   0.5179   0.4967
Zurich      Thessaloniki  10   0.3146    3.1789      0.0041   0.0974   0.2919   0.1893   0.0961

4.2. Flight price prediction

JDPF is also applied to flight price prediction. The dataset consists of 26 routes, containing the flight route and the price of the flight ticket in Euros. All routes depart from several European cities. Athens, Greece, constitutes the arrival city in 13 of them, and Thessaloniki, Greece, constitutes the arrival city in the remaining ones. Here, the departure airport corresponds to the latent state vector and the arrival corresponds to the segment latent state vector. In Table 7, the performance of JDPF is compared to that of the LSTM [22], the DPF, the DDPF, [20], and [23]. The RMSE is again utilized as the figure of merit. F-tests are applied, similarly to Section 4.1. The F statistic is $F_{1i} = \frac{\pi_1^2}{\pi_i^2}$, where the subscripts 1, 2, 3, 4, 5, 6 correspond to JDPF, LSTM, DPF, DDPF, [20], and [23], respectively. The null hypothesis H0: π1² = πi² is rejected if F1i < F_{β/2} or if F1i > F_{1−β/2}. Table 8 presents the results.


As can be seen in Table 7, JDPF outperforms the DDPF in 23 out of the 26 routes. The flight prices of the Berlin-Thessaloniki, Frankfurt-Thessaloniki, and Stockholm-Thessaloniki routes are almost fixed for several time steps, without exhibiting any price jumps. The real prices of the route Berlin-Thessaloniki are plotted in Fig. 17. JDPF loses price tracking in the plateaus of fixed prices at time steps 4-12, 14-19, and 20-31, where only a slight diffusion exists given the price range of the route. As a consequence, the JDPF predicted prices, which include jumps, lost price tracking. On the contrary, when large price jumps were present w.r.t. the mean flight price, such as in the routes Brussels-Thessaloniki, London-Thessaloniki, and Zurich-Thessaloniki, JDPF performs best. The prices of the route Brussels-Thessaloniki are plotted in Fig. 18. The RMSE can be almost half of that delivered by the other methods. The combination of jumps and drift enhanced the robustness of the particle coefficients. As a result, more accurate flight price predictions are disclosed.

5. Conclusion

A jump-diffusion particle filter (JDPF) has been proposed. Both the Brownian motion log drift parameter and the jump parameter enable the latent vectors to capture efficiently the price volatility as well as most of the vast price jumps. The particle coefficients constitute the carriers of this dynamic information, since they are drawn from the latent vector prior distributions. As a consequence, a small number of particles, which are generated w.r.t. these coefficients, can efficiently approximate the posterior distributions and ensure an effective price prediction performance. In stock price prediction, JDPF outperforms the state-of-the-art methods. In flight price prediction, JDPF performs best whenever the time series exhibits price jumps.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A

In Eq. (11), f(t) is the negative log-likelihood, i.e.,

$f(t) = -\ln p(s_i[t], a_{s_i}[t]) = \underbrace{-\ln p(s_i[t] \mid a_{s_i}[t-1])}_{f_I(\cdot)} \; \underbrace{-\, \ln p(a_{s_i}[t])}_{f_{II}(\cdot)}$   (A.1)

Let us omit the subscript s_i from Δ_a for notation simplicity. The second term f_II(·) in Eq. (A.1) can be rewritten as

$-\ln p(a_{s_i}[t]) = -\ln\Big[\frac{1}{\sqrt{2\pi\varepsilon\Delta_a[t]}}\exp\Big\{-\frac{1}{2\varepsilon\Delta_a[t]}(a_{s_i}[t]-a_{s_i}[t-1])^2\Big\}\Big] = \frac{1}{2}\ln(2\pi\varepsilon\Delta_a[t]) + \frac{1}{2\varepsilon\Delta_a[t]}(a_{s_i}[t]-a_{s_i}[t-1])^2 \propto \frac{1}{2\varepsilon\Delta_a[t]}(a_{s_i}[t]-a_{s_i}[t-1])^2$   (A.2)

The first and second-order derivatives of f_II(·) w.r.t. a_si[t − 1] in Eq. (A.1) are

$f'_{II}(a_{s_i}[t-1]) = -\frac{1}{\varepsilon\Delta_a[t]}(a_{s_i}[t]-a_{s_i}[t-1]), \qquad f''_{II}(a_{s_i}[t-1]) = \frac{1}{\varepsilon\Delta_a[t]}.$   (A.3)

For the first term in Eq. (A.1), the elbo function is utilized by applying Jensen's inequality for concave functions [47]:

$f_I(\cdot) = -\ln p(s_i[t] \mid a_{s_i}[t-1]) \le -\mathbb{E}_q[\ln p(s_i[t] \mid a_{s_i}[t-1])] + \mathrm{KL}\{q\|p\}$   (A.4)

where

$-\mathbb{E}_q[\ln p(s_i[t] \mid a_{s_i}[t-1])] = -\mathbb{E}_q\Big[\ln \frac{1}{\sqrt{\det(2\pi\Sigma_{s_i}[t])}} \exp\Big\{-\frac{1}{2}(s_i[t]-\mu'_{s_i}[t-1])^T \Sigma_{s_i}^{-1}[t] (s_i[t]-\mu'_{s_i}[t-1])\Big\}\Big]$   (A.5)

can be used to approximate f_I(·). Accordingly, Eq. (A.5) can be rewritten as

$f_I(\cdot) = \frac{1}{2}\ln\det(2\pi\Sigma_{s_i}[t]) \int \frac{1}{\sqrt{\det(2\pi\Sigma'_{s_i}[t])}}\exp\Big\{-\frac{1}{2}(s_i[t]-\mu'_{s_i}[t])^T(\Sigma'_{s_i}[t])^{-1}(s_i[t]-\mu'_{s_i}[t])\Big\}\,ds_i[t]$
$\qquad +\, \int \frac{1}{2}(s_i[t]-\mu'_{s_i}[t-1])^T\Sigma_{s_i}^{-1}[t](s_i[t]-\mu'_{s_i}[t-1])\,\frac{1}{\sqrt{\det(2\pi\Sigma'_{s_i}[t])}}\exp\Big\{-\frac{1}{2}(s_i[t]-\mu'_{s_i}[t])^T(\Sigma'_{s_i}[t])^{-1}(s_i[t]-\mu'_{s_i}[t])\Big\}\,ds_i[t].$   (A.6)

In Eq. (A.6), the first term is equal to $\frac{1}{2}\ln\det(2\pi\Sigma_{s_i}[t])$. Now let

$r = (\Sigma'_{s_i}[t])^{-1/2}(s_i[t]-\mu'_{s_i}[t]) \;\Leftrightarrow\; s_i[t] = \mu'_{s_i}[t] + (\Sigma'_{s_i}[t])^{1/2} r.$   (A.7)

Then

$dr = \det\big((\Sigma'_{s_i}[t])^{-1/2}\big)\,ds_i[t] \;\Leftrightarrow\; ds_i[t] = \sqrt{\det(\Sigma'_{s_i}[t])}\,dr.$   (A.8)

The combination of Eqs. (A.7) and (A.8) leads to

$\int \underbrace{\big(\mu'_{s_i}[t]-\mu'_{s_i}[t-1]+(\Sigma'_{s_i}[t])^{1/2}r\big)^T \Sigma_{s_i}^{-1}[t]\, \big(\mu'_{s_i}[t]-\mu'_{s_i}[t-1]+(\Sigma'_{s_i}[t])^{1/2}r\big)}_{y} \cdot \frac{1}{\sqrt{\det(2\pi I)}}\exp\Big\{-\frac{1}{2}r^T r\Big\}\,dr$   (A.9)

Four terms compose the term y in the integrand of Eq. (A.9), i.e.,

$(\mu'_{s_i}[t]-\mu'_{s_i}[t-1])^T\Sigma_{s_i}^{-1}[t](\mu'_{s_i}[t]-\mu'_{s_i}[t-1]) + r^T(\Sigma'_{s_i}[t])^{1/2}\Sigma_{s_i}^{-1}[t](\mu'_{s_i}[t]-\mu'_{s_i}[t-1]) + (\mu'_{s_i}[t]-\mu'_{s_i}[t-1])^T\Sigma_{s_i}^{-1}[t](\Sigma'_{s_i}[t])^{1/2}r + r^T(\Sigma'_{s_i}[t])^{1/2}\Sigma_{s_i}^{-1}[t](\Sigma'_{s_i}[t])^{1/2}r.$   (A.10)

It can be shown that the integral of the first term in Eq. (A.10) yields

$\int (\mu'_{s_i}[t]-\mu'_{s_i}[t-1])^T\Sigma_{s_i}^{-1}[t](\mu'_{s_i}[t]-\mu'_{s_i}[t-1])\,\frac{1}{\sqrt{\det(2\pi I)}}\exp\Big\{-\frac{1}{2}r^Tr\Big\}\,dr = \mathrm{tr}\big(\Sigma_{s_i}^{-1}[t]\,(\mu'_{s_i}[t]-\mu'_{s_i}[t-1])(\mu'_{s_i}[t]-\mu'_{s_i}[t-1])^T\big).$   (A.11)

The integral of the fourth term in Eq. (A.10) reads

$\int r^T(\Sigma'_{s_i}[t])^{1/2}\Sigma_{s_i}^{-1}[t](\Sigma'_{s_i}[t])^{1/2}r\,\frac{1}{\sqrt{\det(2\pi I)}}\exp\Big\{-\frac{1}{2}r^Tr\Big\}\,dr = \mathrm{tr}\big((\Sigma'_{s_i}[t])^{-1}\Sigma_{s_i}[t]\big).$   (A.12)

The integration of the second and the third term of Eq. (A.10) yields zero. Summing up, f_I(·) in Eq. (A.1) becomes

$f_I(\cdot) = \frac{1}{2}\ln\det(2\pi\Sigma_{s_i}[t]) + \frac{1}{2}\,\mathrm{tr}\big((\Sigma'_{s_i}[t])^{-1}\Sigma_{s_i}[t]\big) + \frac{1}{2}\,\mathrm{tr}\big(\Sigma_{s_i}^{-1}[t]\,(\mu'_{s_i}[t]-\mu'_{s_i}[t-1])(\mu'_{s_i}[t]-\mu'_{s_i}[t-1])^T\big).$   (A.13)
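The Gaussian integrals in Eqs. (A.11) and (A.12) rest on the identity E[r^T A r] = tr(A) for r ∼ N(0, I), which is easy to confirm numerically. The snippet below is a Monte Carlo sanity check of that identity, offered as an aid to the reader and not as part of the derivation.

```python
import numpy as np
rng = np.random.default_rng(4)

# Check of the identity behind Eqs. (A.11)-(A.12):
# for r ~ N(0, I), E[r^T A r] = tr(A).
n = 5
B = rng.standard_normal((n, n))
A = B @ B.T                                    # any symmetric matrix
r = rng.standard_normal((200_000, n))
mc = np.einsum("ij,jk,ik->i", r, A, r).mean()  # sample mean of r^T A r
print(mc, np.trace(A))                         # the two numbers agree closely
```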


In Eq. (A.13), the terms which depend on a_si[t − 1] should be identified. As can be seen in Eq. (1), the prior covariance matrix Σ_si[t] clearly depends on a_si[t − 1]. Let the eigendecomposition of the posterior covariance matrix at the previous time step be $\Sigma'_{s_i}[t-1] = W\Lambda W^T$, where Λ = diag(λ_n), with λ_n being the nth eigenvalue of the posterior covariance matrix at t − 1. Then, the eigendecomposition of the a priori covariance matrix Σ_si[t], taking the drift into account, reads

$\Sigma_{s_i}[t] = W\tilde\Lambda W^T$   (A.14)

where $\tilde\Lambda = \mathrm{diag}(\tilde\lambda_n)$ with $\tilde\lambda_n = \lambda_n + e^{a_{s_i}[t-1]}\Delta_a[t]$. Let us elaborate Eq. (8):

$(\Sigma'_{s_i}[t])^{-1} = \Sigma_{s_i}^{-1}[t] + \underbrace{\frac{\mu'_{m_j}[t]\,\mu'^T_{m_j}[t] + \Sigma'_{m_j}[t]}{\sigma_{ij}^2}}_{X} = W\tilde\Lambda^{-1}W^T + X.$   (A.15)

The second term in Eq. (A.15) is a constant matrix w.r.t. a_si[t − 1]. Then, it can be shown that

$\frac{\partial}{\partial a_{s_i}[t-1]} \ln\det(2\pi\Sigma_{s_i}[t]) = \sum_n \underbrace{\frac{e^{a_{s_i}[t-1]}\Delta_a[t]}{\lambda_n + e^{a_{s_i}[t-1]}\Delta_a[t]}}_{x_n} = \sum_n x_n.$   (A.16)

Let $b = W^T(\mu'_{s_i}[t] - \mu'_{s_i}[t-1])$. The third term in Eq. (A.13) is rewritten as

$\frac{\partial}{\partial a_{s_i}[t-1]}\,\mathrm{tr}\big(W\tilde\Lambda^{-1}W^T(\mu'_{s_i}[t]-\mu'_{s_i}[t-1])(\mu'_{s_i}[t]-\mu'_{s_i}[t-1])^T\big) = \frac{\partial}{\partial a_{s_i}[t-1]}\, b^T\tilde\Lambda^{-1}b = \frac{\partial}{\partial a_{s_i}[t-1]} \sum_n \frac{b_n^2}{\lambda_n + e^{a_{s_i}[t-1]}\Delta_a[t]}.$   (A.17)

Using Eq. (A.17), the term of b which depends on a_si[t − 1] can be approximated as

$b \approx \big(\underbrace{W^T\Sigma'_{s_i}[t]W}_{M}\,\tilde\Lambda^{-1}W^T - W^T\big)\mu'_{s_i}[t-1] = (M\tilde\Lambda^{-1} - I)\underbrace{W^T\mu'_{s_i}[t-1]}_{\varphi} = (M\tilde\Lambda^{-1}-I)\varphi.$   (A.18)

The nth element of b is

$b_n \approx [(M\tilde\Lambda^{-1}-I)\varphi]_n = \Big(\frac{M_{nn}}{\tilde\lambda_n}-1\Big)\varphi_n + \sum_{l\ne n}\frac{M_{nl}}{\tilde\lambda_l}\varphi_l$   (A.19)

which depends on a_si[t − 1] as well. Let us elaborate Eq. (A.17):

$\frac{\partial}{\partial a_{s_i}[t-1]}\sum_n\frac{b_n^2}{\lambda_n+e^{a_{s_i}[t-1]}\Delta_a[t]} = \sum_n\frac{\partial}{\partial a_{s_i}[t-1]}\frac{b_n^2}{\tilde\lambda_n} = \sum_n\frac{b_n}{\tilde\lambda_n}\Big(2\frac{\partial b_n}{\partial a_{s_i}[t-1]} - b_n x_n\Big).$   (A.20)

By utilizing Eq. (A.19), it can be shown that

$\frac{\partial b_n}{\partial a_{s_i}[t-1]} = -\frac{M_{nn}\,e^{a_{s_i}[t-1]}\Delta_a[t]}{\tilde\lambda_n^2}\varphi_n - \sum_{l\ne n}\frac{M_{nl}\,\varphi_l\,e^{a_{s_i}[t-1]}\Delta_a[t]}{\tilde\lambda_l^2} = -\Big[\frac{x_n}{\tilde\lambda_n}M_{nn}\varphi_n + \sum_{l\ne n}M_{nl}\frac{x_l}{\tilde\lambda_l}\varphi_l\Big] = -\sum_{l=1}^{n}M_{nl}\frac{x_l}{\tilde\lambda_l}\varphi_l.$   (A.21)

The substitution of Eq. (A.21) into Eq. (A.20) leads to

$\frac{\partial}{\partial a_{s_i}[t-1]}\sum_n\frac{b_n^2}{\tilde\lambda_n} = -\sum_n\frac{b_n}{\tilde\lambda_n}\Big(2\sum_{l=1}^{n}M_{nl}\frac{x_l}{\tilde\lambda_l}\varphi_l + b_n x_n\Big).$   (A.22)

The differentiation of the second term in (A.13) gives

$\frac{\partial}{\partial a_{s_i}[t-1]}\,\mathrm{tr}\big[(\Sigma'_{s_i}[t])^{-1}\Sigma_{s_i}[t]\big] = \frac{\partial}{\partial a_{s_i}[t-1]}\,\mathrm{tr}\big[(\Sigma_{s_i}^{-1}[t]+X)\Sigma_{s_i}[t]\big] \approx \frac{\partial}{\partial a_{s_i}[t-1]}\,\mathrm{tr}(I) = 0.$   (A.23)

The combination of Eqs. (A.13), (A.17), and (A.22) leads to

$\frac{\partial}{\partial a_{s_i}[t-1]} f_I(\cdot) = \frac{1}{2}\sum_n\Big[x_n - \frac{b_n}{\tilde\lambda_n}\Big(2\sum_{l=1}^{n}M_{nl}\frac{x_l}{\tilde\lambda_l}\varphi_l + b_n x_n\Big)\Big] = \frac{1}{2}\sum_n\Big(x_n - \frac{b_n^2}{\tilde\lambda_n}x_n\Big) - \sum_n\frac{b_n}{\tilde\lambda_n}\sum_{l=1}^{n}M_{nl}\frac{x_l}{\tilde\lambda_l}\varphi_l.$   (A.24)

If we omit the second-order terms w.r.t. M_nl in the second term of Eq. (A.24) and retain only the term for l = n, the first-order derivative in Eq. (11) reads

$f' = \frac{\partial f}{\partial a_{s_i}[t-1]} = -\frac{1}{\varepsilon\Delta_a[t]}(a_{s_i}[t]-a_{s_i}[t-1]) + \frac{1}{2}\sum_n x_n\Big[1-\frac{b_n^2}{\tilde\lambda_n}\Big(1+\frac{2M_{nn}}{M_{nn}-\tilde\lambda_n}\Big)\Big]$   (A.25)

where $b_n \approx \big(\frac{M_{nn}}{\tilde\lambda_n}-1\big)\varphi_n \Leftrightarrow \varphi_n \approx \frac{b_n \tilde\lambda_n}{M_{nn}-\tilde\lambda_n}$. Similarly, the second-order derivative in Eq. (11) reads

$f'' = \frac{\partial f'}{\partial a_{s_i}[t-1]} = \frac{1}{\varepsilon\Delta_a[t]} + \frac{1}{2}\sum_n x_n(1-x_n) + \frac{1}{2}\sum_n\Big[1-\frac{b_n^2 x_n}{\tilde\lambda_n}\,2(1-x_n)\frac{M_{nn}}{M_{nn}-\tilde\lambda_n}\Big] + \sum_n x_n b_n\frac{M_{nn}}{M_{nn}-\tilde\lambda_n}\Big(1-\frac{x_n M_{nn}}{M_{nn}-\tilde\lambda_n}\Big).$   (A.26)

CRediT authorship contribution statement

Myrsini Ntemi: Methodology, Validation, Formal analysis, Writing - review & editing, Conceptualization, Software, Investigation, Resources, Data curation, Writing - original draft, Visualization. Constantine Kotropoulos: Methodology, Validation, Formal analysis, Writing - review & editing, Supervision, Project administration, Funding acquisition.

References

[1] T. Bollerslev, P.E. Rossi, Introduction: Modelling stock market volatility-bridging the gap to continuous time, in: P.E. Rossi (Ed.), Modelling Stock Market Volatility, Academic Press, San Diego, CA, 1996.
[2] R.S. Tsay, Financial time series, Wiley StatsRef: Statistics Reference Online (2014) 1-23.
[3] P.W. Glynn, Diffusion approximations, Handbooks in Operations Research and Management Science 2 (1990) 145-198.
[4] W. Paul, J. Baschnagel, Stochastic Processes, 1, Springer, 2013.
[5] P. Tankov, Financial Modelling With Jump Processes, Chapman and Hall/CRC, 2003.
[6] D. Hainaut, N. Leonenko, Option pricing in illiquid markets: a fractional jump-diffusion approach, J Comput Appl Math 381 (2020) 112995.
[7] K.F. Mina, G.H. Cheang, C. Chiarella, Approximate hedging of options under jump-diffusion processes, Int J Theo Appl Financ 18 (04) (2015) 155-224.
[8] S.G. Kou, A jump-diffusion model for option pricing, Manage Sci 48 (8) (2002) 1086-1101.
[9] R.C. Merton, Option pricing when underlying stock returns are discontinuous, J Financ Econ 3 (1-2) (1976) 125-144.
[10] R. Anderson, S. Sundaresan, A comparative study of structural models of corporate bond yields: an exploratory investigation, J Bank Financ 24 (1-2) (2000) 255-269.


[11] E.P. Jones, S.P. Mason, E. Rosenfeld, Contingent claims analysis of corporate capital structures: an empirical investigation, J Finance 39 (3) (1984) 611-625.
[12] A. Doucet, N. De Freitas, N. Gordon, An introduction to sequential Monte Carlo methods, in: Sequential Monte Carlo Methods in Practice, Springer, 2001, pp. 3-14.
[13] P.M. Djuric, J.H. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M.F. Bugallo, J. Miguez, Particle filtering, IEEE Signal Process Mag 20 (5) (2003) 19-38.
[14] M.S. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Trans. Signal Processing 50 (2) (2002) 174-188.
[15] R. Van Der Merwe, A. Doucet, N. De Freitas, E.A. Wan, The unscented particle filter, in: Advances in Neural Information Processing Systems, 2001, pp. 584-590.
[16] N.J. Gordon, D.J. Salmond, A.F.M. Smith, Novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEE Proceedings F - Radar and Signal Processing 140 (2) (1993) 107-113.
[17] S. Gultekin, J. Paisley, A collaborative Kalman filter for time-evolving dyadic processes, in: Proc. 2014 IEEE Int. Conf. Data Mining, 2014, pp. 30-37.
[18] M. Ntemi, C. Kotropoulos, A dyadic particle filter for price prediction, in: Proc. 27th European Signal Processing Conf. (EUSIPCO), 2019.
[19] P.J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82 (4) (1995) 711-732.
[20] J. Murphy, S. Godsill, Bayesian parameter estimation of jump-Langevin systems for trend following in finance, in: Proc. 40th IEEE Int. Conf. Acoustics, Speech and Signal Processing, 2015, pp. 4125-4129.
[21] M. Ntemi, C. Kotropoulos, A dynamic dyadic particle filter for price prediction, Signal Processing 167 (2020), doi:10.1016/j.sigpro.2019.107334.
[22] S. Siami-Namini, A.S. Namin, Forecasting economics and financial time series: ARIMA vs. LSTM, (2018), arXiv preprint arXiv:1803.06386.
[23] T. Wang, Z. Huang, et al., The relationship between volatility and trading volume in the Chinese stock market: a volatility decomposition perspective, Annals of Economics and Finance 13 (1) (2012) 211-236.
[24] F. Cong, C.W. Oosterlee, Pricing Bermudan options under Merton jump-diffusion asset dynamics, Int J Comput Math 92 (12) (2015) 2406-2432.
[25] R.T.L. Chan, Adaptive radial basis function methods for pricing options under jump-diffusion models, Computational Economics 47 (4) (2016) 623-643.
[26] A.A.E.-F. Saib, D.Y. Tangman, M. Bhuruth, A new radial basis functions method for pricing American options under Merton's jump-diffusion model, Int J Comput Math 89 (9) (2012) 1164-1185.
[27] K.S. Patel, M. Mehra, Fourth-order compact scheme for option pricing under the Merton's and Kou's jump-diffusion models, Int J Theo Appl Financ 21 (04) (2018) 1079-1098.
[28] J. Chevallier, S. Goutte, Detecting jumps and regime switches in international stock markets returns, Appl Econ Lett 22 (13) (2015) 1011-1019.
[29] C.-O. Ewald, A. Zhang, Z. Zong, On the calibration of the Schwartz two-factor model to WTI crude oil options and the extended Kalman filter, Ann Oper Res 282 (1-2) (2019) 119-130.
[30] J. Zhao, Dynamic state estimation with model uncertainties using H-extended Kalman filter, IEEE Trans. Power Syst 33 (1) (2018) 1099-1100.
[31] J. Qi, K. Sun, J. Wang, H. Liu, Dynamic state estimation for multi-machine power system by unscented Kalman filter with enhanced numerical stability, IEEE Trans. Smart Grid 9 (2) (2018) 1184-1196.
[32] E. Bradford, L. Imsland, Economic stochastic model predictive control using the unscented Kalman filter, IFAC Proceedings Volumes 51 (18) (2018) 417-422.
[33] P. Guarniero, A.M. Johansen, A. Lee, The iterated auxiliary particle filter, J Am Stat Assoc 112 (520) (2017) 1636-1647.
[34] P. Wang, L. Li, S.J. Godsill, Particle filtering and inference for limit order books in high frequency finance, in: Proc. 43rd IEEE Int. Conf. Acoustics, Speech and Signal Processing, 2018, pp. 4264-4268.
[35] M.J. Wainwright, M.I. Jordan, Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning 1 (1-2) (2008) 1-305.
[36] D. Blei, A. Kucukelbir, J.D. McAuliffe, Variational inference: a review for statisticians, J Am Stat Assoc 112 (518) (2017) 859-877.
[37] P. Tankov, E. Voltchkova, Jump-diffusion models: a practitioner's guide, Banque et Marchés 99 (1) (2009) 24.
[38] G. Franciszek, Nonhomogeneous compound Poisson process application to modeling of random processes related to accidents in the Baltic Sea waters and ports, J Polish Safe Reliab Assoc Summer Safe Reliab Semin 9 (2018) 21-29.
[39] K. Sigmann, Notes on Poisson processes and compound Poisson processes, 2007. Course notes, IEOR 4706, Columbia.
[40] O.C. Ibe, Markov Processes for Stochastic Modeling, 2nd ed., Elsevier, 2013.
[41] H.L. Christensen, J. Murphy, S.J. Godsill, Forecasting high-frequency futures returns using online Langevin dynamics, IEEE J Sel Top Signal Process 6 (4) (2012) 366-380.
[42] P. Diaconis, D. Ylvisaker, Conjugate priors for exponential families, The Annals of Statistics (1979) 269-281.
[43] A. Doucet, A.M. Johansen, A tutorial on particle filtering and smoothing: fifteen years later, Handbook of Nonlinear Filtering 12 (656-704) (2009) 3.
[44] V. Elvira, L. Martino, D. Luengo, M.F. Bugallo, Population Monte Carlo schemes with reduced path degeneracy, in: Proc. 7th IEEE Int. Workshop Computational Advances Multi-Sensor Adaptive Processing, 2017, pp. 1-5.
[45] D. Salmond, N. Gordon, An introduction to particle filters, State Space and Unobserved Component Models: Theory and Applications (2005) 1-19.
[46] Y.-W. Cheung, K.S. Lai, Lag order and critical values of the augmented Dickey-Fuller test, Journal of Business & Economic Statistics 13 (3) (1995) 277-280.
[47] D. Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, 2012.
