
Algorithmic Finance 6 (2017) 67–77
DOI: 10.3233/AF-170176
IOS Press

Classification-based financial markets prediction using deep neural networks

Matthew Dixon (a,*), Diego Klabjan (b) and Jin Hoon Bang (c)

(a) Stuart School of Business, Illinois Institute of Technology, Chicago, IL, USA
(b) Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, USA
(c) Department of Computer Science, Northwestern University, Evanston, IL, USA

* Corresponding author: Matthew Dixon, Stuart School of Business, Illinois Institute of Technology, 10 West 35th Street, Chicago, IL 60616, USA. E-mail: [email protected].

Abstract. Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition community (Krizhevsky et al., 2012) for their superior predictive properties including robustness to overfitting. However their application to algorithmic trading has not been previously researched, partly because of their computational complexity. This paper describes the application of DNNs to predicting financial market movement directions. In particular we describe the configuration and training approach and then demonstrate their application to backtesting a simple trading strategy over 43 different Commodity and FX future mid-prices at 5-minute intervals. All results in this paper are generated using a C++ implementation on the Intel Xeon Phi co-processor which is 11.4x faster than the serial version, and a Python strategy backtesting environment, both of which are available as open source code written by the authors.

1. Introduction

Many of the challenges facing methods of financial econometrics include non-stationarity, non-linearity or noisiness of the time series. While the application of artificial neural networks (ANNs) to time series methods is well documented (Faraway and Chatfield, 1998; Refenes, 1994; Trippi and DeSieno, 1992; Kaastra and Boyd, 1995), their proneness to over-fitting, convergence problems, and difficulty of implementation raised concerns. Moreover, their departure from the foundations of financial econometrics alienated the financial econometrics research community and finance practitioners.

However, algotrading firms employ computer scientists and mathematicians who are able to perceive ANNs as not merely black-boxes, but rather a semi-parametric approach to modeling based on minimizing an entropy function. As such, there has been a recent resurgence in the method, in part facilitated by advances in modern computer architecture (Chen et al., 2013; Niaki and Hoseinzade, 2013; Vanstone and Hahn, 2010).

A deep neural network (DNN) is an artificial neural network with multiple hidden layers of units between the input and output layers. They have been popularized in the artificial intelligence community for their successful use in image classification (Krizhevsky et al., 2012) and speech recognition. The field is referred to as "Deep Learning".

In this paper, we shall use DNNs to partially address some of the historical deficiencies of ANNs. Specifically, we model complex non-linear relationships between the independent variables and the dependent variable, and reduce the tendency to overfit. In order to do this we shall exploit advances in low cost many-core accelerator platforms to train and tune the parameters of our model.

For financial forecasting, especially in multivariate forecasting analysis, the feed-forward topology has gained much more attention and shall be the approach used here. Back-propagation and gradient descent have been the preferred method for training these structures due to the ease of implementation
and their tendency to converge to better local optima in comparison with other trained models. However, these methods can be computationally expensive, especially when used to train DNNs.

There are many training parameters to be considered with a DNN, such as the size (number of layers and number of units per layer), the learning rate and initial weights. Sweeping through the parameter space for optimal parameters is not feasible due to the cost in time and computational resources. We shall use mini-batching (computing the gradient on several training examples at once rather than individual examples) as one common approach to speeding up computation. We go further by expressing the back-propagation algorithm in a form that is amenable to fast performance on an Intel Xeon Phi co-processor (Jeffers and Reinders, 2013). General purpose hardware optimized implementations of the back-propagation algorithm are described by Shekhar and Amin (1994), however our approach is tailored for the Intel Xeon Phi co-processor.

The main contribution of this paper is to describe the application of deep neural networks to financial time series data in order to classify financial market movement directions. Traditionally, researchers will iteratively experiment with a handful of signals to train a level based method, such as vector autoregression, for each instrument (see for example Kaastra and Boyd (1995); Refenes (1994); Trippi and DeSieno (1992)). More recently, however, Leung et al. (2000) provide evidence that classification based methods outperform level based methods in the prediction of the direction of stock movement and trading returns maximization.

Using 5 minute interval prices from June 1989 to March 2013, our approach departs from the literature by using state-of-the-art parallel computing architecture to simultaneously train a single model from a large number of signals across multiple instruments, rather than using one model for each instrument. By aggregating the data across multiple instruments and signals, we enable the model to capture a richer set of information describing the time-varying co-movements across signals for each instrument price movement. Our results show that our model is able to predict the direction of instrument movement to, on average, 42% accuracy with a standard deviation across instruments of 11%. In some cases, we are able to predict as high as 68%. We further show how backtesting accuracy translates into the P&L for a simple long-only trading strategy and demonstrate sample mean Annualized Sharpe Ratios as high as 3.29 with a standard deviation of 1.12.

So in summary, our approach differs from other financial studies described in the literature in two distinct ways:

1. ANNs are applied to historical prices on an individual symbol and here 43 commodities and FX futures traded on the CME have been combined. Furthermore time series of lags, moving averages and moving correlations have been generated to capture memory and co-movements between symbols. Thus we have generated a richer dataset for the DNN to explore complex patterns.

2. ANNs are applied as a regression, whereas here the output is one of {−1, 0, 1} representing a negative, flat or positive price movement respectively. The threshold for determining the zero state is set to 1 × 10^{-3} (this is chosen to balance the class labels). The caveat is that restriction to a discrete set of output states may not replace a classical financial econometric technique, but it may be applicable for simple trading strategies which rely on the sign, and not the magnitude, of the forecasted price.

In the following section we introduce the back-propagation learning algorithm and use mini-batching to express the most compute intensive equations in matrix form. Once expressed in matrix form, hardware optimized numerical linear algebra routines are used to achieve an efficient mapping of the algorithm on to the Intel Xeon Phi co-processor. Section 3 describes the preparation of the data used to train the DNN. Section 4 describes the implementation of the DNN. Section 5 then presents results measuring the performance of a DNN. Finally in Section 6, we demonstrate the application of DNNs to backtesting using a walk forward methodology, and provide performance results for a simple buy-hold-sell strategy.

2. Deep neural network classifiers

We begin with mathematical preliminaries. Let D denote the historical dataset of M features and N observations. We draw a training subset D_train ⊂ D of N_train observations and a test subset D_test ⊂ D of N_test observations.

Denote the nth observation (feature vector) as x_n ∈ D_train. In an ANN, each element of the vector becomes a node in the input layer, as illustrated
in Fig. 1 below for the case when there are 7 input variables (features) per observation. In a fully connected feed-forward network, each node is connected to every node in the next layer. Although not shown in the figure, associated with each edge between the ith node in the previous layer and the jth node in the current layer l is a weight w_{ij}^{(l)}.

Fig. 1. An illustrative example of a feed-forward neural network with two hidden layers, seven features and two output states. Deep learning network classifiers typically have many more layers, use a large number of features and several output states or classes. The goal of learning is to find the weight on every edge that minimizes the out-of-sample error measure.

In order to find optimal weightings w := {w^{(l)}}_{l:=1→L} between nodes in a fully connected feed forward network with L layers, we seek to minimize a cross-entropy function¹ of the form

E(w) = - \sum_{n=1}^{N_{test}} e_n(w),    e_n(w) := \sum_{k=1}^{K} y_k^n \ln(\hat{y}_k^n).    (1)

¹ The use of entropy in econometrics research has been well established (see for example Golan et al. (1996); Wu and Perloff (2007)).

For clarity of exposition, we drop the subscript n. The binary target y and output variables ŷ have a 1-of-k_s encoding for each symbol, where y_k ∈ {0, 1}, \sum_{k=i}^{i+k_s} \hat{y}_k = 1 and i ∈ {1, 1 + k_s, 1 + 2k_s, . . . , K − k_s}, so that each output state associated with a symbol can be interpreted as a probabilistic weighting. To both ensure analytic gradient functions under the cross-entropy error measure and that the probabilities of each state sum to unity, the output layer is activated with a softmax function of the form

\hat{y}_k := \phi_{softmax}(s^{(L)}) = \frac{\exp(s_k^{(L)})}{\sum_{j=1}^{K} \exp(s_j^{(L)})},    (2)

where for a fully connected feed-forward network, s_j^{(l)} is the weighted sum of outputs from the previous layer l − 1 that connect to node j in layer l:

s_j^{(l)} = \sum_{i=1}^{n^{(l-1)}} w_{ij}^{(l)} x_i^{(l-1)} + bias_j^{(l)},    (3)

where n^{(l)} is the number of nodes in layer l. The gradient of the likelihood function w.r.t. s_k^{(L)} takes the simple form

\frac{\partial e(w)}{\partial s_k^{(L)}} = \hat{y}_k - y_k.    (4)

The recursion relation for the back propagation using conjugate gradients is

\delta_i^{(l-1)} = \sum_{j=1}^{n^{(l-1)}} \delta_j^{(l)} w_{ij}^{(l)} \sigma(s_i^{(l-1)}) (1 - \sigma(s_i^{(l-1)})),    (5)

where we have used the analytic form of the derivative of the sigmoid function

\sigma'(v) = \sigma(v)(1 - \sigma(v))    (6)

to activate all hidden layer nodes. So in summary, a trained feed-forward network can be used to predict the probability of an output state (or class) for each of the symbols concurrently, given any observation as an input, by recursively applying Equation 3. The description of how the network is trained now follows.

Stochastic Gradient Descent. Following Rojas (1996), we now revisit the backpropagation learning algorithm based on the method of stochastic gradient descent (SGD). Despite only being first order, SGD serves as the optimization method of choice for DNNs due to the highly non-convex form of the utility function (see for example Li et al. (2014)). After random sampling of an observation i, the SGD algorithm updates the parameter vector w^{(l)} for the lth layer using

w^{(l)} = w^{(l)} - \gamma \nabla E_i(w^{(l)}),    (7)

where γ is the learning rate. A high level description of the sequential version of the SGD algorithm is given in Algorithm 1. Note that for reasons of keeping the description simple, we have avoided some subtleties of the implementation.
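Before turning to the training procedure, the following minimal Python sketch illustrates Equations (1)-(4) for a single observation: a feed-forward pass with sigmoid hidden layers and a softmax output, followed by the cross-entropy term and its output-layer gradient. The layer sizes, the helper names (sigmoid, softmax, forward, cross_entropy) and the NumPy implementation are illustrative assumptions, not the authors' released code.

import numpy as np

def sigmoid(v):
    # sigma(v) = 1 / (1 + exp(-v)); its derivative is sigma(v)(1 - sigma(v)) (Equation 6)
    return 1.0 / (1.0 + np.exp(-v))

def softmax(s):
    # Equation (2): exponentiate and normalize so the output states sum to one
    e = np.exp(s - s.max())              # subtract the max for numerical stability
    return e / e.sum()

def forward(x, weights, biases):
    # Feed-forward pass: Equation (3) at every layer, softmax activation at the output layer
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = sigmoid(W.T @ a + b)         # s_j^(l) = sum_i w_ij^(l) x_i^(l-1) + bias_j^(l)
    return softmax(weights[-1].T @ a + biases[-1])

def cross_entropy(y, y_hat):
    # One term e_n(w) of Equation (1); y is the 1-of-k_s encoded target
    return -np.sum(y * np.log(y_hat + 1e-12))

# Illustrative dimensions only: 7 input features, one hidden layer of 5 units, 3 output states
rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.01, size=(7, 5)), rng.normal(0.0, 0.01, size=(5, 3))]
biases = [np.ones(5), np.ones(3)]
x, y = rng.normal(size=7), np.array([0.0, 1.0, 0.0])
y_hat = forward(x, weights, biases)
print(cross_entropy(y, y_hat), y_hat - y)  # y_hat - y is the output-layer gradient (Equation 4)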

Algorithm 1 Stochastic Gradient Descent
1: w ← r, r_i ∈ N(μ, σ), ∀i
2: E ← 0
3: for i = 0 to n − 1 do
4:    E ← E + E_i(w)
5: end for
6: while E ≥ τ do
7:    for t = 0 to n − 1 do
8:       i ← sample with replacement in [0, n − 1]
9:       w ← w − γ∇E_i(w)
10:   end for
11:   E ← 0
12:   for i = 0 to n − 1 do
13:      E ← E + E_i(w)
14:   end for
15: end while

2.1. Mini-batching

It is well known that mini-batching improves the computational performance of the feedforward and back-propagation computations (Shekhar and Amin, 1994; Li et al., 2014). We process b observations in one mini-batch. This results in a change to the SGD algorithm and the dimensions of the data-structures that are used to store variables. In particular, δ, x, s and E now have a batch dimension. Note however that the dimensions of w^{(l)} remain the same. The above equations can now be modified.

With slight abuse of notation, we redefine the dimensions δ^{(l)}, X^{(l)}, S^{(l)} ∈ R^{n_l × b}, ∀l, and E ∈ R^{n_L × b}, where b is the size of the mini-batch.

Crucially for the computational performance of the mini-batching, the computation of the sum in each layer of the feed-forward network can be expressed as a matrix-matrix product:

S^{(l)} = \left( X^{(l-1)} \right)^{T} w^{(l)}.    (8)

For the ith neuron in output layer L and the jth observation in the mini-batch,

\delta_{ij}^{(L)} = \sigma_{ij}^{(L)} (1 - \sigma_{ij}^{(L)}) E_{ij}.    (9)

For all intermediate layers l < L, the recursion relation for δ is

\delta_{ij}^{(l-1)} = \sigma_{ij}^{(l)} (1 - \sigma_{ij}^{(l)}) w_{ij}^{(l)} \delta_{ij}^{(l)}.    (10)

The weights are updated with matrix-matrix products for each layer:

\Delta w^{(l)} = \gamma X^{(l-1)} \left( \delta^{(l)} \right)^{T}.    (11)

3. The data

Our historical dataset contains 5 minute mid-prices for 43 CME listed commodity and FX futures from March 31st 1991 to September 30th, 2014. We use the most recent fifteen years of data because the previous period is less liquid for some of the symbols, resulting in long sections of 5 minute candles with no price movement. Each feature is normalized by subtracting the mean and dividing by the standard deviation. The training set consists of 25,000 consecutive observations and the test set consists of the next 12,500 observations. As described in Section 6, these sets are rolled forward ten times from the start of the liquid observation period, in 1000 observation period increments, until the final 37,500 observations from March 31st, 2005 until the end of the dataset.

The overall training dataset consists of the aggregate of the feature training sets for each of the symbols. The training set of each symbol consists of price differences and engineered features including lagged price differences from 1 to 100, moving price averages with window sizes from 5 to 100, and pair-wise correlations between the returns and the returns of all other symbols. The overall training set contains 9895 features. The motivation for including these features in the model is to capture memory in the historical data and co-movements between symbols.
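The feature and label construction just described can be sketched in a few lines of pandas. The helper name build_features, the 5-step grid of moving-average windows, the 100-observation correlation window and the application of the 1e-3 threshold to the forward price change are illustrative assumptions; the paper itself only specifies the lag range (1 to 100), the moving-average range (5 to 100), the pair-wise return correlations, the z-score normalization and the three-class labelling.

import pandas as pd

def build_features(mid: pd.Series, other_returns: pd.DataFrame,
                   lags=range(1, 101), windows=range(5, 101, 5), corr_window=100):
    # Engineer per-symbol features from the 5-minute mid-prices: lagged price
    # differences, moving price averages and pair-wise rolling return correlations.
    feats = {}
    diff, ret = mid.diff(), mid.pct_change()
    for k in lags:                                    # lagged price differences, lags 1..100
        feats[f"diff_lag_{k}"] = diff.shift(k)
    for w in windows:                                 # moving price averages, windows 5..100
        feats[f"ma_{w}"] = mid.rolling(w).mean()
    for sym in other_returns.columns:                 # rolling correlations with the other symbols' returns
        feats[f"corr_{sym}"] = ret.rolling(corr_window).corr(other_returns[sym])
    X = pd.DataFrame(feats)
    X = (X - X.mean()) / X.std()                      # normalize: subtract the mean, divide by the std

    # Three-class label for the next move using the 1e-3 threshold (assumed here to
    # act on the forward price change): +1 positive, 0 flat, -1 negative.
    fwd = mid.shift(-1) - mid
    y = pd.Series(0, index=mid.index)
    y[fwd > 1e-3] = 1
    y[fwd < -1e-3] = -1
    return X, y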

4. Implementation

The architecture of our network contains five learned fully connected layers. The first of the four hidden layers contains 1000 neurons, and each subsequent layer is tapered by 100. The final layer contains 129 output neurons - three values per symbol for each of the 43 futures contracts. The result of including a large number of features and multiple hidden layers is that there are 12,174,500 weights in total.

The weights are initialized with an Intel MKL VSL random number generator implementation that uses the Mersenne Twister (MT19937) routine. Gaussian random numbers are generated by transforming the uniform random numbers with an inverse Gaussian cumulative distribution function with zero mean and standard deviation of 0.01. We initialized the neuron biases in the hidden layers with the constant 1.

We used the same learning rate for all layers. The learning rate was adjusted according to a heuristic which is described in Algorithm 2 below and is similar to the approach taken by Krizhevsky et al. (2012), except that we use the cross entropy rather than the validation error. We sweep the parameter space of the learning rate from [0.1, 1] with increments of 0.1. We further divide the learning rate γ by 2 if the cross-entropy does not decrease between epochs.

Algorithm 2 Deep Learning Methodology
1: for γ := 0.1, 0.2, . . . , 1 do
2:    w_{i,j}^{(l)} ← r, r ∈ N(μ, σ), ∀i, j, l    ⊳ Initialize all weights
3:    for e = 1, . . . , N_e do    ⊳ Iterate over epochs
4:       Generate D_e
5:       for m = 1, . . . , M do    ⊳ Iterate over mini-batches
6:          Generate D_m
7:          for l = 2, . . . , L do
8:             Compute all x_j^{(l)}    ⊳ Feed-forward network construction
9:          end for
10:         for l = L, . . . , 2 do
11:            Compute all δ_j^{(l)} := ∇_{s_j^{(l)}} E    ⊳ Backpropagation
12:            Update the weights: w^{(l)} ← w^{(l)} − γ X^{(l−1)} (δ^{(l)})^T
13:         end for
14:      end for
15:   end for
16:   If cross entropy(e) ≤ cross entropy(e−1) then γ ← γ/2
17: end for
18: Return final weights w_{i,j}^{(l)}

In Algorithm 2, the subset of the training set used for each epoch is defined as

D_e := \{ x_{n_k} \in D_{train} \mid n_k \in U(1, N_{train}),\ k := 1, \ldots, N_{epoch} \}    (12)

and the mini-batch within each epoch set is defined as

D_m := \{ x_{n_k} \in D_e \mid n_k \in U(1, N_{epoch}),\ k := 1, \ldots, N_{mini-batch} \}.    (13)

As mentioned earlier, the mini-batching formulation of the algorithm facilitates efficient parallel implementation, the details and timings of which are described by Dixon et al. (2015). The overall time to train a DNN on an Intel Xeon Phi using the data described above is approximately 8 hours when factoring in time for the calculation of error measures on the test set, and thus the training can be run as an overnight batch job should daily retraining be necessary. This is 11.4x faster than running the serial version of the algorithm.
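As a rough, self-contained illustration of the configuration above, the sketch below builds the tapered layer sizes, draws N(0, 0.01) initial weights with unit biases, and applies one mini-batch update in the matrix form of Equations (8)-(11). The NumPy implementation, the batch-averaged gradients and all variable names are assumptions for exposition; the authors' actual implementation is the C++/Intel Xeon Phi code described in Dixon et al. (2015).

import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 9895 input features, four hidden layers tapered by 100 from 1000 neurons,
# and 129 output neurons (three states for each of the 43 symbols).
sizes = [9895, 1000, 900, 800, 700, 129]
weights = [rng.normal(0.0, 0.01, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.ones(n) for n in sizes[1:]]

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax_rows(S):
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def minibatch_step(X_batch, Y_batch, weights, biases, gamma):
    # Forward pass on a batch of b observations: each layer's sum is a matrix-matrix
    # product (Equation 8); hidden layers use the sigmoid, the output layer the softmax.
    acts = [X_batch]
    for l, (W, b) in enumerate(zip(weights, biases)):
        S = acts[-1] @ W + b
        acts.append(sigmoid(S) if l < len(weights) - 1 else softmax_rows(S))
    # The output-layer delta is y_hat - y for every observation in the batch (Equations 4 and 9).
    delta = acts[-1] - Y_batch
    for l in range(len(weights) - 1, -1, -1):
        grad_W = acts[l].T @ delta / len(X_batch)     # weight update as a matrix product (Equation 11)
        grad_b = delta.mean(axis=0)
        if l > 0:                                     # back-propagate delta to the previous layer (Equation 10)
            delta = (delta @ weights[l].T) * acts[l] * (1.0 - acts[l])
        weights[l] -= gamma * grad_W
        biases[l] -= gamma * grad_b

# One illustrative update on a random batch of 128 observations.
X_b = rng.normal(size=(128, sizes[0]))
Y_b = np.zeros((128, sizes[-1]))
Y_b[np.arange(128), rng.integers(0, sizes[-1], size=128)] = 1.0
minibatch_step(X_b, Y_b, weights, biases, gamma=0.1)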

5. Results

This section describes the backtesting of DNNs for a simple algo-trading strategy. The purpose is to tie together classification accuracy with strategy performance measurements, and is not intended to provide an exhaustive exploration of trading strategies or their performance. For each symbol, we calculate the classification accuracies for each 130 day moving test window. This is repeated to give a set of ten classification errors. Figure 2 shows a box plot of the classification accuracy of the DNN for all the 43 CME Commodity and FX futures. Each symbol is represented by a box and whisker vertical bar - the box represents the region between the lower and upper quartiles of the sample distribution of classification accuracies. The median of the sample distribution of classification accuracies is represented as a red horizontal line.

Fig. 2. This figure shows the classification accuracy of the DNN applied to 43 CME Commodity and FX futures. Each symbol is represented by a box and whisker vertical bar - the box represents the region between the lower and upper quartiles of the sample distribution of classification accuracies. The median of the sample distribution of classification accuracies is represented as a red horizontal line.

Figure 3 below shows the distribution of the average classification accuracy over 10 samples of the DNN across the 43 CME Commodity and FX futures. There is a heavier density around an accuracy of 0.35, which is slightly better than a random selection.

Fig. 3. This figure shows the distribution of the average classification accuracy of the DNN applied to 43 CME Commodity and FX futures.

Table 1 shows the top five instruments for which the sample mean of the classification rate was highest on average over the ten walk forward experiments. Also shown are the F1-scores ('harmonic means'), which are considered to be a more robust measure of performance due to less sensitivity to class imbalance than classification accuracies. The mean and standard deviation of the sample averaged classification accuracies and F1-scores over the 43 futures are also provided. Note that the worst five performing instruments performed no better or even worse than white noise on average over the ten experiments.

6. Strategy backtesting

The paper has thus far considered the predictive properties of the deep neural network. Using commodity futures historical data at 5 minute intervals over the period from March 31st 1991 to September 30th, 2014, this section describes the application of a walk forward optimization approach for backtesting a simple trading strategy.

Following the walk forward optimization approach described in Tomasini and Jaekle (2011), an initial optimization window of 25,000 5-minute observation periods or approximately 260 days (slightly more than a year) is chosen for training the model using all the symbol data and their engineered time series. The learning rate range is swept to find the model which gives the best out-of-sample prediction rate - the highest classification rate on the out-of-sample ('hold-out') set consisting of 12,500 consecutive and more recent observations.
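The window arithmetic just described (a 25,000-observation training window, a 12,500-observation hold-out, and ten experiments obtained by sliding forward in 1,000-observation increments) can be expressed as a small generator. The function name and the index-based slicing are illustrative assumptions.

def walk_forward_windows(n_obs, train_len=25_000, test_len=12_500, step=1_000, n_windows=10):
    # Yield (train, test) index slices for the walk forward optimization: each window
    # trains on train_len consecutive observations, evaluates on the next test_len,
    # and the whole window then slides forward by step observations.
    for k in range(n_windows):
        start = k * step
        if start + train_len + test_len > n_obs:
            break
        yield (slice(start, start + train_len),
               slice(start + train_len, start + train_len + test_len))

# Ten windows require 25,000 + 12,500 + 9 * 1,000 = 46,500 consecutive observations.
for train_idx, test_idx in walk_forward_windows(n_obs=46_500):
    pass  # fit the DNN on train_idx, sweep the learning rate, then score on test_idx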

Table 1
This table shows the top five instruments for which the sample mean of the classification rate was highest over the ten walk forward experiments. F1-scores are also provided. The mean and standard deviation of the sample mean classification accuracies and F1-scores over the 43 futures are also provided.

Symbol  Futures                                                      Classification Accuracy  F1-score
HG      Copper                                                       0.68                     0.59
ST      Transco Zone 6 Natural Gas (Platts Gas Daily) Swing          0.67                     0.54
ME      Gulf Coast Jet (Platts) Up-Down                              0.67                     0.54
TU      Gasoil 0.1 Cargoes CIF NWE (Platts) vs. Low Sulphur Gasoil   0.56                     0.52
MI      Michigan Hub 5 MW Off-Peak Calendar-Month Day-Ahead Swap     0.55                     0.5
mean    -                                                            0.42                     0.37
std     -                                                            0.11                     0.1

Using the optimized model, the expected P&L of the trading strategy is then evaluated over the out-of-sample period consisting of 12,500 consecutive 5-minute observation periods or approximately 130 days. Even though all symbols are trained together using one DNN model, the cumulative P&L is calculated independently for each symbol. As illustrated in Fig. 4, this step is repeated by sliding the training window forward by 1000 observation periods and repeating the out-of-sample error analysis and strategy performance measurement for ten windows.

Fig. 4. An illustration of the walk forward optimization method used for backtesting the strategy.

Fig. 5. This figure shows a box plot of the sample distribution of the time-averaged daily returns of the strategy applied separately to each of the 43 CME Commodity and FX futures over the 130 day trading horizons. The red square with a black border denotes the sample average for each symbol.
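Fig. 7 below reports the maximum drawdown of the strategy for each symbol. A minimal helper for that statistic, assuming a cumulative unrealized P&L curve as input, might look as follows.

import numpy as np

def max_drawdown(cum_pnl):
    # Largest peak-to-trough decline of a cumulative P&L curve, reported as a positive number.
    cum_pnl = np.asarray(cum_pnl, dtype=float)
    running_peak = np.maximum.accumulate(cum_pnl)
    return float(np.max(running_peak - cum_pnl))

# Example: max_drawdown([0.0, 1.5, 1.2, 2.0, 0.4, 0.9]) == 1.6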

Fig. 6. This figure shows the cumulative unrealized net dollar profit of a simple strategy. In order to quantify the impact of information loss, the profit under perfect forecasting information is denoted as 'perfect foresight' (green line) and the profit using the DNN prediction is denoted as 'predict' (blue line). The graph is shown for one 130 day trading horizon in front month Platinum (PL) futures.

Fig. 7. This figure shows a box plot of the maximum drawdown of a simple strategy applied over ten walk forward experiments for each symbol.

Fig. 8. This figure shows a box plot of the distribution of the annualized Sharpe ratios sampled over ten walk forward experiments of 12,500 observation points. Only the top five performing futures contracts have been considered. The simple trading strategy is described above. Key: PL: Platinum, NQ: E-mini NASDAQ 100 Futures, AD: Australian Dollar, BP: British Pound, ES: E-mini S&P 500 Futures.

6.1. Example trading strategy

In order to demonstrate the application of DNNs to algorithmic trading, a simple buy-hold-sell intraday trading strategy is chosen, contingent on whether the instrument price is likely to increase, be neutral, or decrease over the next time interval respectively. For simplicity, the strategy only places one lot market orders. The strategy closes out a short position and takes a long position if the label is 1, holds the position if the label is zero, and closes out the long position and takes a short position if the label is –1. In calculating the cumulative unrealized P&L, the following simplifying assumptions are made:

• the account is opened with $100k of USD;
• there is sufficient surplus cash available in order to always maintain the brokerage account margin, through realization of the profit or otherwise;
• there are no limits on the minimum or maximum holding period and positions can be held overnight;
• the margin account is assumed to accrue zero interest;
• transaction costs are ignored;
• no operational risk measures are deployed, such as placing stop-loss orders;
• the market is always sufficiently liquid that a market order gets filled immediately at the mid-price listed at 5 minute intervals and so slippage effects are ignored; and
• the placing of 1 lot market orders at 5 minute intervals has no significant impact on the market and thus the forecast does not account for limit order book dynamics in response to the trade execution.

These assumptions, especially those concerning trade execution and absence of live simulation in the backtesting environment, are of course inadequate to demonstrate alpha generation capabilities of the DNN based strategy but serve as a starting point for commercial application of this research.

Returns of the strategy are calculated by first aggregating intraday P&L changes to daily returns and then annualizing them.

Figure 5 shows a box plot of the sample distribution of the time-averaged daily returns of the strategy applied separately to each of the 43 CME front month Commodity and FX futures over the 130 day trading horizons.

Figure 6 compares the cumulative unrealized net dollar profit of the strategy for the case when perfect forecasting information is available ('perfect foresight') against using the DNN prediction ('predict'). The graph is shown for one 130 day trading horizon for front month Platinum (PL) futures.

Figure 7 shows a box plot of the maximum drawdown of a simple strategy applied over ten walk forward experiments for each symbol.

Table 2
This table shows the top five instruments for which the mean annualized Sharpe ratio was highest on average over the ten walk forward optimizations. The values in parentheses denote the standard deviation over the ten experiments. Also shown are the mean and standard deviation of the Capability ratios under the assumption of normality of returns.

Symbol  Futures                     Annualized Sharpe Ratio  Capability Ratio
PL      Platinum                    3.29 (1.12)              12.51 (4.27)
NQ      E-mini NASDAQ 100 Futures   2.07 (2.11)              7.89 (8.03)
AD      Australian Dollar           1.48 (1.09)              5.63 (4.13)
BP      British Pound               1.29 (0.90)              4.90 (3.44)
ES      E-mini S&P 500 Futures      1.11 (1.69)              4.22 (6.42)

Table 3
This table lists the initial margin, maintenance margin and contract size specified by the CME used to calculate the cumulative P&L and strategy performance for the top five performing futures positions.

Symbol  Initial margin  Maint. margin  Contract size
PL      2090            1900           50
NQ      5280            4800           50
AD      1980            1800           100000
BP      2035            1850           62500
ES      5225            4750           50

Table 4
This table shows the correlation of the daily returns of the strategy on each of the five most liquid instruments in the list of 43 CME futures with their relevant ETF benchmarks. The values represent the summary statistics of the correlations over the ten experiments. Key: NQ: E-mini NASDAQ 100 Futures, DJ: DJIA ($10) Futures, ES: E-mini S&P 500 Futures, YM: E-mini Dow ($5) Futures, EC: Euro FX Futures.

Symbol  Benchmark ETF                                  Mean Correlation  Std. Dev.  Max    Min
NQ      PowerShares QQQ ETF (QQQ)                      0.013             0.167      0.237  –0.282
DJ      SPDR Dow Jones Industrial Average ETF (DIA)    0.008             0.194      0.444  –0.257
ES      SPDR S&P 500 ETF (SPY)                         –0.111            0.110      0.057  –0.269
YM      SPDR Dow Jones Industrial Average ETF (DIA)    –0.141            0.146      0.142  –0.428
EC      CurrencyShares Euro ETF (FXE)                  –0.135            0.108      0.154  –0.229
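Under the simplifying assumptions above, the mapping from predicted labels to positions and the daily-return and Sharpe-ratio calculation can be sketched as follows. The function names, the per-bar P&L convention (prior position times the mid-price change times the contract size), the assumed DatetimeIndex of 5-minute bars and the 252-day annualization factor are illustrative assumptions rather than the authors' Python backtesting environment.

import numpy as np
import pandas as pd

def backtest_symbol(mid: pd.Series, labels: pd.Series, contract_size: float) -> pd.Series:
    # Buy-hold-sell strategy for one symbol: label 1 -> long one lot, -1 -> short one lot,
    # 0 -> hold the current position. Returns the cumulative unrealized P&L in dollars.
    position = labels.replace(0, np.nan).ffill().fillna(0.0)   # a zero label holds the position
    pnl = position.shift(1) * mid.diff() * contract_size       # P&L accrues over the next 5-minute bar
    return pnl.fillna(0.0).cumsum()

def annualized_sharpe(cum_pnl: pd.Series, capital: float = 100_000.0) -> float:
    # Aggregate intraday P&L changes to daily returns on the $100k account, then annualize.
    daily_pnl = cum_pnl.diff().groupby(cum_pnl.index.date).sum()
    daily_ret = daily_pnl / capital
    return float(np.sqrt(252) * daily_ret.mean() / daily_ret.std())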

Figure 8 shows the range of annualized Sharpe ratios measured over each moving period of 12,500 observation points for the top five performing futures contracts². This figure is also supplemented by Table 2, which shows the top five instruments for which the sample mean of the annualized Sharpe ratio was highest over the ten walk forward experiments. The values in parentheses denote the standard deviation over the ten experiments. Also shown are the sample mean and standard deviations of the Capability ratios (where n = 130) under the assumption of normality of returns.

² No benchmark has been used in the calculation of the Sharpe ratios.

Table 4 shows the correlation of the daily returns of the strategy on each of the five most liquid instruments in the list of 43 CME futures with their relevant ETF benchmarks. The values represent the summary statistics of the correlations over the ten experiments. When averaged over ten experiments, the strategy returns are observed to be weakly correlated with the benchmark returns and, in any given experiment, the absolute value of the correlations are all under 0.5.

Table 3 lists the initial margin, maintenance margin and contract size specified by the CME used to calculate the cumulative unrealized P&L and strategy performance for the top five performing futures positions.

7. Conclusion

Deep neural networks (DNNs) are a powerful type of artificial neural network (ANN) that use several hidden layers. In this paper we describe the implementation and training of DNNs. We observe, for a historical dataset of 5 minute mid-prices of multiple CME listed futures prices and other lags and filters, that DNNs have substantial predictive capabilities as classifiers if trained concurrently across several markets on labelled data. We further demonstrate the application of DNNs to backtesting a simple trading strategy and demonstrate the prediction accuracy and its relation to the strategy profitability. All results in this paper are generated using a C++ implementation on the Intel Xeon Phi co-processor which is 11.4x faster than the serial version, and a Python strategy backtesting environment, both of which are available as open source code written by the authors.

Acknowledgments

The authors gratefully acknowledge the support of Intel Corporation in funding this research and the anonymous reviewers for useful comments.

References

Chen, J., Diaz, J.F., Huang, Y.F., 2013. High technology ETF forecasting: Application of Grey Relational Analysis and Artificial Neural Networks, Frontiers in Finance and Economics 10(2), 129–155.
Dixon, M., Klabjan, D., Bang, J.H., 2015. Implementing deep neural networks for financial market prediction on the Intel Xeon Phi. In Proceedings of the 8th Workshop on High Performance Computational Finance, WHPCF '15, pp. 6:1–6:6, New York, NY, USA. ACM. ISBN 978-1-4503-4015-1. doi: 10.1145/2830556.2830562. URL http://doi.acm.org/10.1145/2830556.2830562
Faraway, J., Chatfield, C., 1998. Time series forecasting with neural networks: A comparative study using the air line data, Journal of the Royal Statistical Society: Series C (Applied Statistics) 47(2), 231–250.
Golan, A., Judge, G., Miller, D., 1996. Maximum Entropy Econometrics: Robust Estimation with Limited Data. John Wiley & Sons, March. ISBN 0471953113.
Jeffers, J., Reinders, J., 2013. Intel Xeon Phi Coprocessor High Performance Programming. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1st edition. ISBN 9780124104143, 9780124104945.
Kaastra, L., Boyd, M.S., 1995. Forecasting futures trading volume using neural networks, Journal of Futures Markets 15(8), 953–970. ISSN 1096-9934. doi: 10.1002/fut.3990150806. URL http://dx.doi.org/10.1002/fut.3990150806
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
Leung, M., Daouk, H., Chen, A., 2000. Forecasting stock indices: A comparison of classification and level estimation models, International Journal of Forecasting 16(2), 173–190.
Li, M., Zhang, T., Chen, Y., Smola, A.J., 2014. Efficient minibatch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, pp. 661–670, New York, NY, USA. ACM. ISBN 978-1-4503-2956-9. doi: 10.1145/2623330.2623612. URL http://doi.acm.org/10.1145/2623330.2623612
Niaki, S., Hoseinzade, S., 2013. Forecasting S&P 500 index using artificial neural networks and design of experiments, Journal of Industrial Engineering International 9(1), 1. ISSN 1735-5702.
Refenes, A.-P., 1994. Neural Networks in the Capital Markets. John Wiley & Sons, Inc., New York, NY, USA. ISBN 0471943649.
Rojas, R., 1996. Neural Networks: A Systematic Introduction. Springer-Verlag New York, Inc., New York, NY, USA. ISBN 3-540-60505-3.
Shekhar, V.K.S., Amin, M.B., 1994. A scalable parallel formulation of the backpropagation algorithm for hypercubes and related architectures, IEEE Transactions on Parallel and Distributed Systems 5, 1073–1090.
Tomasini, E., Jaekle, U., 2011. Trading Systems. Harriman House Limited. ISBN 9780857191496. URL https://books.google.com/books?id=xGIQSLujSmoC

Trippi, R.R., DeSieno, D., 1992. Trading equity index futures with a neural network, The Journal of Portfolio Management 19(1), 27–33.
Vanstone, B., Hahn, T., 2010. Designing Stock Market Trading Systems: With and Without Soft Computing. Harriman House. ISBN 1906659583, 9781906659585.
Wu, X., Perloff, J.M., 2007. GMM estimation of a maximum entropy distribution with interval data, Journal of Econometrics 138, 532–546.
