
Algorithmic Finance 6 (2017) 67–77
DOI: 10.3233/AF-170176
IOS Press

Classification-based financial markets prediction using deep neural networks

Matthew Dixon (a,*), Diego Klabjan (b) and Jin Hoon Bang (c)

(a) Stuart School of Business, Illinois Institute of Technology, Chicago, IL, USA
(b) Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, USA
(c) Department of Computer Science, Northwestern University, Evanston, IL, USA

* Corresponding author: Matthew Dixon, Stuart School of Business, Illinois Institute of Technology, 10 West 35th Street, Chicago, IL 60616, USA. E-mail: [email protected].

Abstract. Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition community (Krizhevsky et al., 2012) for their superior predictive properties including robustness to overfitting. However their application to algorithmic trading has not been previously researched, partly because of their computational complexity. This paper describes the application of DNNs to predicting financial market movement directions. In particular we describe the configuration and training approach and then demonstrate their application to backtesting a simple trading strategy over 43 different Commodity and FX future mid-prices at 5-minute intervals. All results in this paper are generated using a C++ implementation on the Intel Xeon Phi co-processor which is 11.4x faster than the serial version, and a Python strategy backtesting environment, both of which are available as open source code written by the authors.

1. Introduction

Many of the challenges facing methods of financial econometrics include non-stationarity, non-linearity or noisiness of the time series. While the application of artificial neural networks (ANNs) to time series methods is well documented (Faraway and Chatfield, 1998; Refenes, 1994; Trippi and DeSieno, 1992; Kaastra and Boyd, 1995), their proneness to over-fitting, convergence problems, and difficulty of implementation raised concerns. Moreover, their departure from the foundations of financial econometrics alienated the financial econometrics research community and finance practitioners.

However, algotrading firms employ computer scientists and mathematicians who are able to perceive ANNs as not merely black-boxes, but rather a semi-parametric approach to modeling based on minimizing an entropy function. As such, there has been a recent resurgence in the method, in part facilitated by advances in modern computer architecture (Chen et al., 2013; Niaki and Hoseinzade, 2013; Vanstone and Hahn, 2010).

A deep neural network (DNN) is an artificial neural network with multiple hidden layers of units between the input and output layers. They have been popularized in the artificial intelligence community for their successful use in image classification (Krizhevsky et al., 2012) and speech recognition. The field is referred to as "Deep Learning".

In this paper, we shall use DNNs to partially address some of the historical deficiencies of ANNs. Specifically, we model complex non-linear relationships between the independent variables and the dependent variable, and reduce the tendency to overfit. In order to do this we shall exploit advances in low cost many-core accelerator platforms to train and tune the parameters of our model.

For financial forecasting, especially in multivariate forecasting analysis, the feed-forward topology has gained much more attention and shall be the approach used here. Back-propagation and gradient descent have been the preferred method for training these structures due to the ease of implementation
and their tendency to converge to better local optima in comparison with other trained models. However, these methods can be computationally expensive, especially when used to train DNNs.

There are many training parameters to be considered with a DNN, such as the size (number of layers and number of units per layer), the learning rate and initial weights. Sweeping through the parameter space for optimal parameters is not feasible due to the cost in time and computational resources. We shall use mini-batching (computing the gradient on several training examples at once rather than individual examples) as one common approach to speeding up computation. We go further by expressing the back-propagation algorithm in a form that is amenable to fast performance on an Intel Xeon Phi co-processor (Jeffers and Reinders, 2013). General purpose hardware optimized implementations of the back-propagation algorithm are described by Shekhar and Amin (1994), however our approach is tailored for the Intel Xeon Phi co-processor.

The main contribution of this paper is to describe the application of deep neural networks to financial time series data in order to classify financial market movement directions. Traditionally, researchers will iteratively experiment with a handful of signals to train a level based method, such as vector autoregression, for each instrument (see for example Kaastra and Boyd (1995); Refenes (1994); Trippi and DeSieno (1992)). More recently, however, Leung et al. (2000) provide evidence that classification based methods outperform level based methods in the prediction of the direction of stock movement and trading returns maximization.

Using 5 minute interval prices from June 1989 to March 2013, our approach departs from the literature by using state-of-the-art parallel computing architecture to simultaneously train a single model from a large number of signals across multiple instruments, rather than using one model for each instrument. By aggregating the data across multiple instruments and signals, we enable the model to capture a richer set of information describing the time-varying co-movements across signals for each instrument price movement. Our results show that our model is able to predict the direction of instrument movement to, on average, 42% accuracy with a standard deviation across instruments of 11%. In some cases, we are able to predict as high as 68%. We further show how backtesting accuracy translates into the P&L for a simple long-only trading strategy and demonstrate sample mean Annualized Sharpe Ratios as high as 3.29 with a standard deviation of 1.12.

So in summary, our approach differs from other financial studies described in the literature in two distinct ways:

1. ANNs are applied to historical prices on an individual symbol and here 43 commodities and FX futures traded on the CME have been combined. Furthermore time series of lags, moving averages and moving correlations have been generated to capture memory and co-movements between symbols. Thus we have generated a richer dataset for the DNN to explore complex patterns.

2. ANNs are applied as a regression, whereas here the output is one of {−1, 0, 1} representing a negative, flat or positive price movement respectively. The threshold for determining the zero state is set to 1 × 10^{-3} (this is chosen to balance the class labels). The caveat is that restriction to a discrete set of output states may not replace a classical financial econometric technique, but it may be applicable for simple trading strategies which rely on the sign, and not the magnitude, of the forecasted price.

In the following section we introduce the back-propagation learning algorithm and use mini-batching to express the most compute intensive equations in matrix form. Once expressed in matrix form, hardware optimized numerical linear algebra routines are used to achieve an efficient mapping of the algorithm on to the Intel Xeon Phi co-processor. Section 3 describes the preparation of the data used to train the DNN. Section 4 describes the implementation of the DNN. Section 5 then presents results measuring the performance of a DNN. Finally in Section 6, we demonstrate the application of DNNs to backtesting using a walk forward methodology, and provide performance results for a simple buy-hold-sell strategy.

2. Deep neural network classifiers

We begin with mathematical preliminaries. Let D denote the historical dataset of M features and N observations. We draw a training subset D_train ⊂ D of N_train observations and a test subset D_test ⊂ D of N_test observations.

Denote the nth observation (feature vector) as x_n ∈ D_train. In an ANN, each element of the vector becomes a node in the input layer, as illustrated
in Fig. 1 below for the case when there are 7 input variables (features) per observation. In a fully connected feed-forward network, each node is connected to every node in the next layer. Although not shown in the figure, associated with each edge between the ith node in the previous layer and the jth node in the current layer l is a weight w_{ij}^{(l)}.

Fig. 1. An illustrative example of a feed-forward neural network with two hidden layers, seven features and two output states. Deep learning network classifiers typically have many more layers, use a large number of features and several output states or classes. The goal of learning is to find the weight on every edge that minimizes the out-of-sample error measure.

In order to find optimal weightings w := {w^{(l)}}_{l:=1→L} between nodes in a fully connected feed forward network with L layers, we seek to minimize a cross-entropy function¹ of the form

E(w) = - \sum_{n=1}^{N_{test}} e_n(w),    e_n(w) := \sum_{k=1}^{K} y_k^n \ln(\hat{y}_k^n).    (1)

¹ The use of entropy in econometrics research has been well established (see for example Golan et al. (1996); Wu and Perloff (2007)).

For clarity of exposition, we drop the subscript n. The binary target y and output variables ŷ have a 1-of-k_s encoding for each symbol, where y_k ∈ {0, 1}, \sum_{k=i}^{i+k_s} \hat{y}_k = 1 and i ∈ {1, 1 + k_s, 1 + 2k_s, . . . , K − k_s}, so that each output state associated with a symbol can be interpreted as a probabilistic weighting. To both ensure analytic gradient functions under the cross-entropy error measure and that the probabilities of each state sum to unity, the output layer is activated with a softmax function of the form

\hat{y}_k := \phi_{softmax}(s^{(L)}) = \frac{\exp(s_k^{(L)})}{\sum_{j=1}^{K} \exp(s_j^{(L)})},    (2)

where for a fully connected feed-forward network, s_j^{(l)} is the weighted sum of outputs from the previous layer l − 1 that connect to node j in layer l:

s_j^{(l)} = \sum_{i=1}^{n^{(l-1)}} w_{ij}^{(l)} x_i^{(l-1)} + bias_j^{(l)},    (3)

where n^{(l)} is the number of nodes in layer l. The gradient of the likelihood function w.r.t. s_k^{(L)} takes the simple form

\frac{\partial e(w)}{\partial s_k^{(L)}} = \hat{y}_k - y_k.    (4)

The recursion relation for the back propagation using conjugate gradients is

\delta_i^{(l-1)} = \sum_{j=1}^{n^{(l-1)}} \delta_j^{(l)} w_{ij}^{(l)} \sigma(s_i^{(l-1)}) (1 - \sigma(s_i^{(l-1)})),    (5)

where we have used the analytic form of the derivative of the sigmoid function

\sigma'(v) = \sigma(v)(1 - \sigma(v))    (6)

to activate all hidden layer nodes. So in summary, a trained feed-forward network can be used to predict the probability of an output state (or class) for each of the symbols concurrently, given any observation as an input, by recursively applying Equation 3. The description of how the network is trained now follows.

Stochastic Gradient Descent. Following Rojas (1996), we now revisit the backpropagation learning algorithm based on the method of stochastic gradient descent (SGD). Despite only being first order, SGD serves as the optimization method of choice for DNNs due to the highly non-convex form of the utility function (see for example Li et al. (2014)). After random sampling of an observation i, the SGD algorithm updates the parameter vector w^{(l)} for the lth layer using

w^{(l)} = w^{(l)} - \gamma \nabla E_i(w^{(l)}),    (7)

where γ is the learning rate. A high level description of the sequential version of the SGD algorithm is given in Algorithm 1. Note that for reasons of keeping the description simple, we have avoided some subtleties of the implementation.
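Before turning to the training procedure, the following minimal Python sketch illustrates Equations (1)-(4) for a single observation: a feed-forward pass with sigmoid hidden layers and a softmax output, followed by the cross-entropy term and its output-layer gradient. The layer sizes, the helper names (sigmoid, softmax, forward, cross_entropy) and the NumPy implementation are illustrative assumptions, not the authors' released code.

import numpy as np

def sigmoid(v):
    # sigma(v) = 1 / (1 + exp(-v)); its derivative is sigma(v)(1 - sigma(v)) (Equation 6)
    return 1.0 / (1.0 + np.exp(-v))

def softmax(s):
    # Equation (2): exponentiate and normalize so the output states sum to one
    e = np.exp(s - s.max())              # subtract the max for numerical stability
    return e / e.sum()

def forward(x, weights, biases):
    # Feed-forward pass: Equation (3) at every layer, softmax activation at the output layer
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = sigmoid(W.T @ a + b)         # s_j^(l) = sum_i w_ij^(l) x_i^(l-1) + bias_j^(l)
    return softmax(weights[-1].T @ a + biases[-1])

def cross_entropy(y, y_hat):
    # One term e_n(w) of Equation (1); y is the 1-of-k_s encoded target
    return -np.sum(y * np.log(y_hat + 1e-12))

# Illustrative dimensions only: 7 input features, one hidden layer of 5 units, 3 output states
rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.01, size=(7, 5)), rng.normal(0.0, 0.01, size=(5, 3))]
biases = [np.ones(5), np.ones(3)]
x, y = rng.normal(size=7), np.array([0.0, 1.0, 0.0])
y_hat = forward(x, weights, biases)
print(cross_entropy(y, y_hat), y_hat - y)  # y_hat - y is the output-layer gradient (Equation 4)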

Algorithm 1 Stochastic Gradient Descent
1: w ← r, r_i ∈ N(μ, σ), ∀i
2: E ← 0
3: for i = 0 to n − 1 do
4:    E ← E + E_i(w)
5: end for
6: while E ≥ τ do
7:    for t = 0 to n − 1 do
8:       i ← sample with replacement in [0, n − 1]
9:       w ← w − γ∇E_i(w)
10:   end for
11:   E ← 0
12:   for i = 0 to n − 1 do
13:      E ← E + E_i(w)
14:   end for
15: end while

2.1. Mini-batching

It is well known that mini-batching improves the computational performance of the feedforward and back-propagation computations (Shekhar and Amin, 1994; Li et al., 2014). We process b observations in one mini-batch. This results in a change to the SGD algorithm and the dimensions of the data-structures that are used to store variables. In particular, δ, x, s and E now have a batch dimension. Note however that the dimensions of w^{(l)} remain the same. The above equations can now be modified.

With slight abuse of notation, we redefine the dimensions δ^{(l)}, X^{(l)}, S^{(l)} ∈ R^{n_l × b}, ∀l, and E ∈ R^{n_L × b}, where b is the size of the mini-batch.

Crucially for the computational performance of the mini-batching, the computation of the sum in each layer of the feed-forward network can be expressed as a matrix-matrix product:

S^{(l)} = \left( X^{(l-1)} \right)^{T} w^{(l)}.    (8)

For the ith neuron in output layer L and the jth observation in the mini-batch,

\delta_{ij}^{(L)} = \sigma_{ij}^{(L)} (1 - \sigma_{ij}^{(L)}) E_{ij}.    (9)

For all intermediate layers l < L, the recursion relation for δ is

\delta_{ij}^{(l-1)} = \sigma_{ij}^{(l)} (1 - \sigma_{ij}^{(l)}) w_{ij}^{(l)} \delta_{ij}^{(l)}.    (10)

The weights are updated with matrix-matrix products for each layer:

\Delta w^{(l)} = \gamma X^{(l-1)} \left( \delta^{(l)} \right)^{T}.    (11)

3. The data

Our historical dataset contains 5 minute mid-prices for 43 CME listed commodity and FX futures from March 31st 1991 to September 30th, 2014. We use the most recent fifteen years of data because the previous period is less liquid for some of the symbols, resulting in long sections of 5 minute candles with no price movement. Each feature is normalized by subtracting the mean and dividing by the standard deviation. The training set consists of 25,000 consecutive observations and the test set consists of the next 12,500 observations. As described in Section 6, these sets are rolled forward ten times from the start of the liquid observation period, in 1000 observation period increments, until the final 37,500 observations from March 31st, 2005 until the end of the dataset.

The overall training dataset consists of the aggregate of the feature training sets for each of the symbols. The training set of each symbol consists of price differences and engineered features including lagged price differences from 1 to 100, moving price averages with window sizes from 5 to 100, and pair-wise correlations between the returns and the returns of all other symbols. The overall training set contains 9895 features. The motivation for including these features in the model is to capture memory in the historical data and co-movements between symbols.
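The feature and label construction just described can be sketched in a few lines of pandas. The helper name build_features, the 5-step grid of moving-average windows, the 100-observation correlation window and the application of the 1e-3 threshold to the forward price change are illustrative assumptions; the paper itself only specifies the lag range (1 to 100), the moving-average range (5 to 100), the pair-wise return correlations, the z-score normalization and the three-class labelling.

import pandas as pd

def build_features(mid: pd.Series, other_returns: pd.DataFrame,
                   lags=range(1, 101), windows=range(5, 101, 5), corr_window=100):
    # Engineer per-symbol features from the 5-minute mid-prices: lagged price
    # differences, moving price averages and pair-wise rolling return correlations.
    feats = {}
    diff, ret = mid.diff(), mid.pct_change()
    for k in lags:                                    # lagged price differences, lags 1..100
        feats[f"diff_lag_{k}"] = diff.shift(k)
    for w in windows:                                 # moving price averages, windows 5..100
        feats[f"ma_{w}"] = mid.rolling(w).mean()
    for sym in other_returns.columns:                 # rolling correlations with the other symbols' returns
        feats[f"corr_{sym}"] = ret.rolling(corr_window).corr(other_returns[sym])
    X = pd.DataFrame(feats)
    X = (X - X.mean()) / X.std()                      # normalize: subtract the mean, divide by the std

    # Three-class label for the next move using the 1e-3 threshold (assumed here to
    # act on the forward price change): +1 positive, 0 flat, -1 negative.
    fwd = mid.shift(-1) - mid
    y = pd.Series(0, index=mid.index)
    y[fwd > 1e-3] = 1
    y[fwd < -1e-3] = -1
    return X, y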

4. Implementation

The architecture of our network contains five learned fully connected layers. The first of the four hidden layers contains 1000 neurons, and each subsequent layer is tapered by 100. The final layer contains 129 output neurons - three values per symbol for each of the 43 futures contracts. The result of including a large number of features and multiple hidden layers is that there are 12,174,500 weights in total.

The weights are initialized with an Intel MKL VSL random number generator implementation that uses the Mersenne Twister (MT19937) routine. Gaussian random numbers are generated by transforming the uniform random numbers with an inverse Gaussian cumulative distribution function with zero mean and standard deviation of 0.01. We initialized the neuron biases in the hidden layers with the constant 1.

We used the same learning rate for all layers. The learning rate was adjusted according to a heuristic which is described in Algorithm 2 below and is similar to the approach taken by Krizhevsky et al. (2012), except that we use the cross entropy rather than the validation error. We sweep the parameter space of the learning rate from [0.1, 1] with increments of 0.1. We further divide the learning rate γ by 2 if the cross-entropy does not decrease between epochs.

Algorithm 2 Deep Learning Methodology
1: for γ := 0.1, 0.2, . . . , 1 do
2:    w_{i,j}^{(l)} ← r, r ∈ N(μ, σ), ∀i, j, l    ⊳ Initialize all weights
3:    for e = 1, . . . , N_e do    ⊳ Iterate over epochs
4:       Generate D_e
5:       for m = 1, . . . , M do    ⊳ Iterate over mini-batches
6:          Generate D_m
7:          for l = 2, . . . , L do
8:             Compute all x_j^{(l)}    ⊳ Feed-forward network construction
9:          end for
10:         for l = L, . . . , 2 do
11:            Compute all δ_j^{(l)} := ∇_{s_j^{(l)}} E    ⊳ Backpropagation
12:            Update the weights: w^{(l)} ← w^{(l)} − γ X^{(l−1)} (δ^{(l)})^T
13:         end for
14:      end for
15:   end for
16:   If cross entropy(e) ≤ cross entropy(e−1) then γ ← γ/2
17: end for
18: Return final weights w_{i,j}^{(l)}

In Algorithm 2, the subset of the training set used for each epoch is defined as

D_e := \{ x_{n_k} \in D_{train} \mid n_k \in U(1, N_{train}),\ k := 1, \ldots, N_{epoch} \}    (12)

and the mini-batch within each epoch set is defined as

D_m := \{ x_{n_k} \in D_e \mid n_k \in U(1, N_{epoch}),\ k := 1, \ldots, N_{mini-batch} \}.    (13)

As mentioned earlier, the mini-batching formulation of the algorithm facilitates efficient parallel implementation, the details and timings of which are described by Dixon et al. (2015). The overall time to train a DNN on an Intel Xeon Phi using the data described above is approximately 8 hours when factoring in time for the calculation of error measures on the test set, and thus the training can be run as an overnight batch job should daily retraining be necessary. This is 11.4x faster than running the serial version of the algorithm.
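As a rough, self-contained illustration of the configuration above, the sketch below builds the tapered layer sizes, draws N(0, 0.01) initial weights with unit biases, and applies one mini-batch update in the matrix form of Equations (8)-(11). The NumPy implementation, the batch-averaged gradients and all variable names are assumptions for exposition; the authors' actual implementation is the C++/Intel Xeon Phi code described in Dixon et al. (2015).

import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 9895 input features, four hidden layers tapered by 100 from 1000 neurons,
# and 129 output neurons (three states for each of the 43 symbols).
sizes = [9895, 1000, 900, 800, 700, 129]
weights = [rng.normal(0.0, 0.01, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.ones(n) for n in sizes[1:]]

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax_rows(S):
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def minibatch_step(X_batch, Y_batch, weights, biases, gamma):
    # Forward pass on a batch of b observations: each layer's sum is a matrix-matrix
    # product (Equation 8); hidden layers use the sigmoid, the output layer the softmax.
    acts = [X_batch]
    for l, (W, b) in enumerate(zip(weights, biases)):
        S = acts[-1] @ W + b
        acts.append(sigmoid(S) if l < len(weights) - 1 else softmax_rows(S))
    # The output-layer delta is y_hat - y for every observation in the batch (Equations 4 and 9).
    delta = acts[-1] - Y_batch
    for l in range(len(weights) - 1, -1, -1):
        grad_W = acts[l].T @ delta / len(X_batch)     # weight update as a matrix product (Equation 11)
        grad_b = delta.mean(axis=0)
        if l > 0:                                     # back-propagate delta to the previous layer (Equation 10)
            delta = (delta @ weights[l].T) * acts[l] * (1.0 - acts[l])
        weights[l] -= gamma * grad_W
        biases[l] -= gamma * grad_b

# One illustrative update on a random batch of 128 observations.
X_b = rng.normal(size=(128, sizes[0]))
Y_b = np.zeros((128, sizes[-1]))
Y_b[np.arange(128), rng.integers(0, sizes[-1], size=128)] = 1.0
minibatch_step(X_b, Y_b, weights, biases, gamma=0.1)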

5. Results

This section describes the backtesting of DNNs for a simple algo-trading strategy. The purpose is to tie together classification accuracy with strategy performance measurements, and is not intended to provide an exhaustive exploration of trading strategies or their performance. For each symbol, we calculate the classification accuracies for each 130 day moving test window. This is repeated to give a set of ten classification errors. Figure 2 shows a box plot of the classification accuracy of the DNN for all the 43 CME Commodity and FX futures. Each symbol is represented by a box and whisker vertical bar - the box represents the region between the lower and upper quartiles of the sample distribution of classification accuracies. The median of the sample distribution of classification accuracies is represented as a red horizontal line.

Fig. 2. This figure shows the classification accuracy of the DNN applied to 43 CME Commodity and FX futures. Each symbol is represented by a box and whisker vertical bar - the box represents the region between the lower and upper quartiles of the sample distribution of classification accuracies. The median of the sample distribution of classification accuracies is represented as a red horizontal line.

Figure 3 below shows the distribution of the average classification accuracy over 10 samples of the DNN across the 43 CME Commodity and FX futures. There is a heavier density around an accuracy of 0.35, which is slightly better than a random selection.

Fig. 3. This figure shows the distribution of the average classification accuracy of the DNN applied to 43 CME Commodity and FX futures.

Table 1 shows the top five instruments for which the sample mean of the classification rate was highest on average over the ten walk forward experiments. Also shown are the F1-scores ('harmonic means'), which are considered to be a more robust measure of performance due to less sensitivity to class imbalance than classification accuracies. The mean and standard deviation of the sample averaged classification accuracies and F1-scores over the 43 futures are also provided. Note that the worst five performing instruments performed no better or even worse than white noise on average over the ten experiments.

6. Strategy backtesting

The paper has thus far considered the predictive properties of the deep neural network. Using commodity futures historical data at 5 minute intervals over the period from March 31st 1991 to September 30th, 2014, this section describes the application of a walk forward optimization approach for backtesting a simple trading strategy.

Following the walk forward optimization approach described in Tomasini and Jaekle (2011), an initial optimization window of 25,000 5-minute observation periods or approximately 260 days (slightly more than a year) is chosen for training the model using all the symbol data and their engineered time series. The learning rate range is swept to find the model which gives the best out-of-sample prediction rate - the highest classification rate on the out-of-sample ('hold-out') set consisting of 12,500 consecutive and more recent observations.
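The window arithmetic just described (a 25,000-observation training window, a 12,500-observation hold-out, and ten experiments obtained by sliding forward in 1,000-observation increments) can be expressed as a small generator. The function name and the index-based slicing are illustrative assumptions.

def walk_forward_windows(n_obs, train_len=25_000, test_len=12_500, step=1_000, n_windows=10):
    # Yield (train, test) index slices for the walk forward optimization: each window
    # trains on train_len consecutive observations, evaluates on the next test_len,
    # and the whole window then slides forward by step observations.
    for k in range(n_windows):
        start = k * step
        if start + train_len + test_len > n_obs:
            break
        yield (slice(start, start + train_len),
               slice(start + train_len, start + train_len + test_len))

# Ten windows require 25,000 + 12,500 + 9 * 1,000 = 46,500 consecutive observations.
for train_idx, test_idx in walk_forward_windows(n_obs=46_500):
    pass  # fit the DNN on train_idx, sweep the learning rate, then score on test_idx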

Table 1
This table shows the top five instruments for which the sample mean of the classification rate was highest over the ten walk forward experiments. F1-scores are also provided. The mean and standard deviation of the sample mean classification accuracies and F1-scores over the 43 futures are also provided.

Symbol  Futures                                                      Classification Accuracy  F1-score
HG      Copper                                                       0.68                     0.59
ST      Transco Zone 6 Natural Gas (Platts Gas Daily) Swing          0.67                     0.54
ME      Gulf Coast Jet (Platts) Up-Down                              0.67                     0.54
TU      Gasoil 0.1 Cargoes CIF NWE (Platts) vs. Low Sulphur Gasoil   0.56                     0.52
MI      Michigan Hub 5 MW Off-Peak Calendar-Month Day-Ahead Swap     0.55                     0.5
mean    -                                                            0.42                     0.37
std     -                                                            0.11                     0.1

Using the optimized model, the expected P&L of the trading strategy is then evaluated over the out-of-sample period consisting of 12,500 consecutive 5-minute observation periods or approximately 130 days. Even though all symbols are trained together using one DNN model, the cumulative P&L is calculated independently for each symbol. As illustrated in Fig. 4, this step is repeated by sliding the training window forward by 1000 observation periods and repeating the out-of-sample error analysis and strategy performance measurement for ten windows.

Fig. 4. An illustration of the walk forward optimization method used for backtesting the strategy.

Fig. 5. This figure shows a box plot of the sample distribution of the time-averaged daily returns of the strategy applied separately to each of the 43 CME Commodity and FX futures over the 130 day trading horizons. The red square with a black border denotes the sample average for each symbol.
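Fig. 7 below reports the maximum drawdown of the strategy for each symbol. A minimal helper for that statistic, assuming a cumulative unrealized P&L curve as input, might look as follows.

import numpy as np

def max_drawdown(cum_pnl):
    # Largest peak-to-trough decline of a cumulative P&L curve, reported as a positive number.
    cum_pnl = np.asarray(cum_pnl, dtype=float)
    running_peak = np.maximum.accumulate(cum_pnl)
    return float(np.max(running_peak - cum_pnl))

# Example: max_drawdown([0.0, 1.5, 1.2, 2.0, 0.4, 0.9]) == 1.6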

Fig. 6. This figure shows the cumulative unrealized net dollar profit of a simple strategy. In order to quantify the impact of information loss, the profit under perfect forecasting information is denoted as 'perfect foresight' (green line) and the profit using the DNN prediction is denoted as 'predict' (blue line). The graph is shown for one 130 day trading horizon in front month Platinum (PL) futures.

Fig. 7. This figure shows a box plot of the maximum drawdown of a simple strategy applied over ten walk forward experiments for each symbol.

Fig. 8. This figure shows a box plot of the distribution of the annualized Sharpe ratios sampled over ten walk forward experiments of 12,500 observation points. Only the top five performing futures contracts have been considered. The simple trading strategy is described above. Key: PL: Platinum, NQ: E-mini NASDAQ 100 Futures, AD: Australian Dollar, BP: British Pound, ES: E-mini S&P 500 Futures.

6.1. Example trading strategy

In order to demonstrate the application of DNNs to algorithmic trading, a simple buy-hold-sell intraday trading strategy is chosen, contingent on whether the instrument price is likely to increase, be neutral, or decrease over the next time interval respectively. For simplicity, the strategy only places one lot market orders. The strategy closes out a short position and takes a long position if the label is 1, holds the position if the label is zero, and closes out the long position and takes a short position if the label is –1. In calculating the cumulative unrealized P&L, the following simplifying assumptions are made:

• the account is opened with $100k of USD;
• there is sufficient surplus cash available in order to always maintain the brokerage account margin, through realization of the profit or otherwise;
• there are no limits on the minimum or maximum holding period and positions can be held overnight;
• the margin account is assumed to accrue zero interest;
• transaction costs are ignored;
• no operational risk measures are deployed, such as placing stop-loss orders;
• the market is always sufficiently liquid that a market order gets filled immediately at the mid-price listed at 5 minute intervals and so slippage effects are ignored; and
• the placing of 1 lot market orders at 5 minute intervals has no significant impact on the market and thus the forecast does not account for limit order book dynamics in response to the trade execution.

These assumptions, especially those concerning trade execution and absence of live simulation in the backtesting environment, are of course inadequate to demonstrate alpha generation capabilities of the DNN based strategy but serve as a starting point for commercial application of this research.

Returns of the strategy are calculated by first aggregating intraday P&L changes to daily returns and then annualizing them.

Figure 5 shows a box plot of the sample distribution of the time-averaged daily returns of the strategy applied separately to each of the 43 CME front month Commodity and FX futures over the 130 day trading horizons.

Figure 6 compares the cumulative unrealized net dollar profit of the strategy for the case when perfect forecasting information is available ('perfect foresight') against using the DNN prediction ('predict'). The graph is shown for one 130 day trading horizon for front month Platinum (PL) futures.

Figure 7 shows a box plot of the maximum drawdown of a simple strategy applied over ten walk forward experiments for each symbol.

Table 2
This table shows the top five instruments for which the mean annualized Sharpe ratio was highest on average over the ten walk forward optimizations. The values in parentheses denote the standard deviation over the ten experiments. Also shown are the mean and standard deviation of the Capability ratios under the assumption of normality of returns.

Symbol  Futures                     Annualized Sharpe Ratio  Capability Ratio
PL      Platinum                    3.29 (1.12)              12.51 (4.27)
NQ      E-mini NASDAQ 100 Futures   2.07 (2.11)              7.89 (8.03)
AD      Australian Dollar           1.48 (1.09)              5.63 (4.13)
BP      British Pound               1.29 (0.90)              4.90 (3.44)
ES      E-mini S&P 500 Futures      1.11 (1.69)              4.22 (6.42)

Table 3
This table lists the initial margin, maintenance margin and contract size specified by the CME used to calculate the cumulative P&L and strategy performance for the top five performing futures positions.

Symbol  Initial margin  Maint. margin  Contract size
PL      2090            1900           50
NQ      5280            4800           50
AD      1980            1800           100000
BP      2035            1850           62500
ES      5225            4750           50

Table 4
This table shows the correlation of the daily returns of the strategy on each of the five most liquid instruments in the list of 43 CME futures with their relevant ETF benchmarks. The values represent the summary statistics of the correlations over the ten experiments. Key: NQ: E-mini NASDAQ 100 Futures, DJ: DJIA ($10) Futures, ES: E-mini S&P 500 Futures, YM: E-mini Dow ($5) Futures, EC: Euro FX Futures.

Symbol  Benchmark ETF                                  Mean Correlation  Std. Dev.  Max    Min
NQ      PowerShares QQQ ETF (QQQ)                      0.013             0.167      0.237  –0.282
DJ      SPDR Dow Jones Industrial Average ETF (DIA)    0.008             0.194      0.444  –0.257
ES      SPDR S&P 500 ETF (SPY)                         –0.111            0.110      0.057  –0.269
YM      SPDR Dow Jones Industrial Average ETF (DIA)    –0.141            0.146      0.142  –0.428
EC      CurrencyShares Euro ETF (FXE)                  –0.135            0.108      0.154  –0.229
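Under the simplifying assumptions above, the mapping from predicted labels to positions and the daily-return and Sharpe-ratio calculation can be sketched as follows. The function names, the per-bar P&L convention (prior position times the mid-price change times the contract size), the assumed DatetimeIndex of 5-minute bars and the 252-day annualization factor are illustrative assumptions rather than the authors' Python backtesting environment.

import numpy as np
import pandas as pd

def backtest_symbol(mid: pd.Series, labels: pd.Series, contract_size: float) -> pd.Series:
    # Buy-hold-sell strategy for one symbol: label 1 -> long one lot, -1 -> short one lot,
    # 0 -> hold the current position. Returns the cumulative unrealized P&L in dollars.
    position = labels.replace(0, np.nan).ffill().fillna(0.0)   # a zero label holds the position
    pnl = position.shift(1) * mid.diff() * contract_size       # P&L accrues over the next 5-minute bar
    return pnl.fillna(0.0).cumsum()

def annualized_sharpe(cum_pnl: pd.Series, capital: float = 100_000.0) -> float:
    # Aggregate intraday P&L changes to daily returns on the $100k account, then annualize.
    daily_pnl = cum_pnl.diff().groupby(cum_pnl.index.date).sum()
    daily_ret = daily_pnl / capital
    return float(np.sqrt(252) * daily_ret.mean() / daily_ret.std())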

Figure 8 shows the range of annualized Sharpe ratios measured over each moving period of 12,500 observation points for the top five performing futures contracts². This figure is also supplemented by Table 2, which shows the top five instruments for which the sample mean of the annualized Sharpe ratio was highest over the ten walk forward experiments. The values in parentheses denote the standard deviation over the ten experiments. Also shown are the sample mean and standard deviations of the Capability ratios (where n = 130) under the assumption of normality of returns.

² No benchmark has been used in the calculation of the Sharpe ratios.

Table 4 shows the correlation of the daily returns of the strategy on each of the five most liquid instruments in the list of 43 CME futures with their relevant ETF benchmarks. The values represent the summary statistics of the correlations over the ten experiments. When averaged over ten experiments, the strategy returns are observed to be weakly correlated with the benchmark returns and, in any given experiment, the absolute value of the correlations are all under 0.5.

Table 3 lists the initial margin, maintenance margin and contract size specified by the CME used to calculate the cumulative unrealized P&L and strategy performance for the top five performing futures positions.

7. Conclusion

Deep neural networks (DNNs) are a powerful type of artificial neural network (ANN) that use several hidden layers. In this paper we describe the implementation and training of DNNs. We observe, for a historical dataset of 5 minute mid-prices of multiple CME listed futures prices and other lags and filters, that DNNs have substantial predictive capabilities as classifiers if trained concurrently across several markets on labelled data. We further demonstrate the application of DNNs to backtesting a simple trading strategy and demonstrate the prediction accuracy and its relation to the strategy profitability. All results in this paper are generated using a C++ implementation on the Intel Xeon Phi co-processor which is 11.4x faster than the serial version, and a Python strategy backtesting environment, both of which are available as open source code written by the authors.

Acknowledgments

The authors gratefully acknowledge the support of Intel Corporation in funding this research and the anonymous reviewers for useful comments.

References

Chen, J., Diaz, J.F., Huang, Y.F., 2013. High technology ETF forecasting: Application of Grey Relational Analysis and Artificial Neural Networks, Frontiers in Finance and Economics 10(2), 129–155.
Dixon, M., Klabjan, D., Bang, J.H., 2015. Implementing deep neural networks for financial market prediction on the Intel Xeon Phi. In Proceedings of the 8th Workshop on High Performance Computational Finance, WHPCF '15, pp. 6:1–6:6, New York, NY, USA. ACM. ISBN 978-1-4503-4015-1. doi: 10.1145/2830556.2830562. URL http://doi.acm.org/10.1145/2830556.2830562
Faraway, J., Chatfield, C., 1998. Time series forecasting with neural networks: A comparative study using the air line data, Journal of the Royal Statistical Society: Series C (Applied Statistics) 47(2), 231–250.
Golan, A., Judge, G., Miller, D., 1996. Maximum Entropy Econometrics: Robust Estimation with Limited Data. John Wiley & Sons, March. ISBN 0471953113.
Jeffers, J., Reinders, J., 2013. Intel Xeon Phi Coprocessor High Performance Programming. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1st edition. ISBN 9780124104143, 9780124104945.
Kaastra, L., Boyd, M.S., 1995. Forecasting futures trading volume using neural networks, Journal of Futures Markets 15(8), 953–970. ISSN 1096-9934. doi: 10.1002/fut.3990150806. URL http://dx.doi.org/10.1002/fut.3990150806
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
Leung, M., Daouk, H., Chen, A., 2000. Forecasting stock indices: A comparison of classification and level estimation models, International Journal of Forecasting 16(2), 173–190.
Li, M., Zhang, T., Chen, Y., Smola, A.J., 2014. Efficient minibatch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, pp. 661–670, New York, NY, USA. ACM. ISBN 978-1-4503-2956-9. doi: 10.1145/2623330.2623612. URL http://doi.acm.org/10.1145/2623330.2623612
Niaki, S., Hoseinzade, S., 2013. Forecasting S&P 500 index using artificial neural networks and design of experiments, Journal of Industrial Engineering International 9(1), 1. ISSN 1735-5702.
Refenes, A.-P., 1994. Neural Networks in the Capital Markets. John Wiley & Sons, Inc., New York, NY, USA. ISBN 0471943649.
Rojas, R., 1996. Neural Networks: A Systematic Introduction. Springer-Verlag New York, Inc., New York, NY, USA. ISBN 3-540-60505-3.
Shekhar, V.K.S., Amin, M.B., 1994. A scalable parallel formulation of the backpropagation algorithm for hypercubes and related architectures, IEEE Transactions on Parallel and Distributed Systems 5, 1073–1090.
Tomasini, E., Jaekle, U., 2011. Trading Systems. Harriman House Limited. ISBN 9780857191496. URL https://books.google.com/books?id=xGIQSLujSmoC

Trippi, R.R., DeSieno, D., 1992. Trading equity index futures with a neural network, The Journal of Portfolio Management 19(1), 27–33.
Vanstone, B., Hahn, T., 2010. Designing Stock Market Trading Systems: With and Without Soft Computing. Harriman House. ISBN 1906659583, 9781906659585.
Wu, X., Perloff, J.M., 2007. GMM estimation of a maximum entropy distribution with interval data, Journal of Econometrics 138, 532–546.
