Classification-Based Financial Markets Prediction Using Deep Neural Networks
DOI:10.3233/AF-170176
IOS Press
Abstract. Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition communities (Krizhevsky et al., 2012) for their superior predictive properties, including robustness to overfitting. However, their application to algorithmic trading has not been previously researched, partly because of their computational complexity. This paper describes the application of DNNs to predicting financial market movement directions. In particular, we describe the configuration and training approach and then demonstrate their application to backtesting a simple trading strategy over 43 different commodity and FX futures mid-prices at 5-minute intervals. All results in this paper are generated using a C++ implementation on the Intel Xeon Phi co-processor, which is 11.4x faster than the serial version, and a Python strategy backtesting environment, both of which are available as open-source code written by the authors.
and their tendency to converge to better local optima in comparison with other trained models. However, these methods can be computationally expensive, especially when used to train DNNs.

There are many training parameters to be considered with a DNN, such as the size (number of layers and number of units per layer), the learning rate and the initial weights. Sweeping through the parameter space for optimal parameters is not feasible due to the cost in time and computational resources. We shall use mini-batching (computing the gradient on several training examples at once rather than on individual examples) as one common approach to speeding up computation. We go further by expressing the back-propagation algorithm in a form that is amenable to fast performance on an Intel Xeon Phi co-processor (Jeffers and Reinders, 2013). General-purpose hardware-optimized implementations of the back-propagation algorithm are described by Shekhar and Amin (1994); our approach, however, is tailored to the Intel Xeon Phi co-processor.

The main contribution of this paper is to describe the application of deep neural networks to financial time series data in order to classify financial market movement directions. Traditionally, researchers iteratively experiment with a handful of signals to train a level-based method, such as vector autoregression, for each instrument (see for example Kaastra and Boyd (1995); Refenes (1994); Trippi and DeSieno (1992)). More recently, however, Leung et al. (2000) provide evidence that classification-based methods outperform level-based methods in the prediction of the direction of stock movement and in trading returns maximization.

Using 5-minute interval prices from June 1989 to March 2013, our approach departs from the literature by using a state-of-the-art parallel computing architecture to simultaneously train a single model from a large number of signals across multiple instruments, rather than using one model for each instrument. By aggregating the data across multiple instruments and signals, we enable the model to capture a richer set of information describing the time-varying co-movements across signals for each instrument's price movement. Our results show that our model is able to predict the direction of instrument movement with, on average, 42% accuracy and a standard deviation across instruments of 11%. In some cases, we are able to predict as high as 68%. We further show how backtesting accuracy translates into the P&L of a simple long-only trading strategy and demonstrate sample mean annualized Sharpe ratios as high as 3.29 with a standard deviation of 1.12.

So, in summary, our approach differs from other financial studies described in the literature in two distinct ways:

1. In the literature, ANNs are applied to historical prices of an individual symbol; here, 43 commodity and FX futures traded on the CME have been combined. Furthermore, time series of lags, moving averages and moving correlations have been generated to capture memory and co-movements between symbols. We have thus generated a richer dataset for the DNN to explore for complex patterns.

2. In the literature, ANNs are applied as a regression, whereas here the output is one of {−1, 0, 1}, representing a negative, flat or positive price movement respectively. The threshold for determining the zero state is set to 1 × 10−3 (this is chosen to balance the class labels). The caveat is that restriction to a discrete set of output states may not replace a classical financial econometric technique, but it may be applicable to simple trading strategies which rely on the sign, and not the magnitude, of the forecasted price; a sketch of this encoding is given after this list.
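For concreteness, the ternary encoding in point 2 can be written down directly. The following is a minimal sketch, assuming NumPy; the function name, variable names and example returns are ours for illustration and are not taken from the authors' open-source code:

    import numpy as np

    def encode_labels(returns, threshold=1e-3):
        """Map mid-price returns to {-1, 0, 1}: negative, flat or positive movement.

        The flat band (+/- threshold) uses the 1e-3 value quoted in the text,
        chosen there to balance the class labels."""
        labels = np.zeros(len(returns), dtype=int)
        labels[returns > threshold] = 1
        labels[returns < -threshold] = -1
        return labels

    # Hypothetical 5-minute mid-price returns for one symbol
    r = np.array([0.0025, -0.0004, 0.0007, -0.0031])
    print(encode_labels(r))  # [ 1  0  0 -1]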
In the following section we introduce the back-propagation learning algorithm and use mini-batching to express the most compute-intensive equations in matrix form. Once expressed in matrix form, hardware-optimized numerical linear algebra routines are used to achieve an efficient mapping of the algorithm onto the Intel Xeon Phi co-processor. Section 3 describes the preparation of the data used to train the DNN. Section 4 describes the implementation of the DNN. Section 5 then presents results measuring the performance of a DNN. Finally, in Section 6, we demonstrate the application of DNNs to backtesting using a walk-forward methodology, and provide performance results for a simple buy-hold-sell strategy.
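To see why mini-batching maps so well onto optimized linear algebra routines, note that stacking a mini-batch of observations row-wise turns many matrix-vector products into a single matrix-matrix product. The sketch below makes this explicit; it assumes NumPy, whose matrix multiply dispatches to an optimized BLAS, and the batch and layer sizes are arbitrary placeholders:

    import numpy as np

    rng = np.random.default_rng(0)
    B, n_in, n_out = 256, 100, 50        # mini-batch size and layer widths (arbitrary)
    X = rng.standard_normal((B, n_in))   # one mini-batch, one observation per row
    W = rng.standard_normal((n_in, n_out))
    b = rng.standard_normal(n_out)

    # One example at a time: B separate matrix-vector products
    S_loop = np.stack([X[n] @ W + b for n in range(B)])

    # Mini-batched: a single matrix-matrix product handled by the BLAS
    S_batch = X @ W + b

    assert np.allclose(S_loop, S_batch)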
2. Deep neural network classifiers

We begin with mathematical preliminaries. Let D denote the historical dataset of M features and N observations. We draw a training subset D_train ⊂ D of N_train observations and a test subset D_test ⊂ D of N_test observations.

Denote the nth observation (feature vector) as x_n ∈ D_train. In an ANN, each element of the vector becomes a node in the input layer, as illustrated in Fig. 1 below for the case when there are 7 input variables (features) per observation.
In a fully connected feed-forward network, each node is connected to every node in the next layer. Although not shown in the figure, associated with each edge between the ith node in the previous layer and the jth node in the current layer l is a weight w_{ij}^{(l)}.

In order to find optimal weightings w := {w^{(l)}}_{l:=1→L} between nodes in a fully connected feed-forward network with L layers, we seek to minimize a cross-entropy function of the form

    E(w) = -\sum_{n=1}^{N_{test}} e_n(w), \qquad e_n(w) := \sum_{k=1}^{K} y_{kn} \ln(\hat{y}_{kn}).    (1)

For clarity of exposition, we drop the subscript n. The binary target y and the output variable ŷ have a 1-of-k_s encoding for each symbol, where y_k ∈ {0, 1}, \sum_{k=i}^{i+k_s} \hat{y}_k = 1 and i ∈ {1, 1 + k_s, 1 + 2k_s, ..., K − k_s}, so that each output state associated with a symbol can be interpreted as a probabilistic weighting. To both ensure analytic gradient functions under the cross-entropy error measure and that the probabilities of each state sum to unity, the output layer is activated with a softmax function of the form

    \hat{y}_k := \phi_{\text{softmax}}(s^{(L)}) = \frac{\exp(s_k^{(L)})}{\sum_{j=1}^{k_s} \exp(s_j^{(L)})},    (2)

where, for a fully connected feed-forward network, s_j^{(l)} is the weighted sum of outputs from the previous layer l − 1 that connect to node j in layer l:

    s_j^{(l)} = \sum_{i=1}^{n^{(l-1)}} w_{ij}^{(l)} x_i^{(l-1)} + \text{bias}_j^{(l)},    (3)

where n^{(l)} is the number of nodes in layer l.
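As a concrete illustration of Equations (2) and (3), the following sketch propagates a mini-batch through sigmoid hidden layers and a softmax output layer. It is a minimal sketch assuming NumPy; the network sizes and random placeholder weights are ours for illustration (a single symbol with k_s = 3 output states and 7 input features, matching Fig. 1), not the authors' trained configuration:

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def softmax(s):
        # Equation (2); subtracting the row-wise max improves numerical stability
        e = np.exp(s - s.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def forward(X, weights, biases):
        """Recursively apply Equation (3), one matrix product per layer."""
        x = X
        for W, b in zip(weights[:-1], biases[:-1]):
            x = sigmoid(x @ W + b)                    # hidden layers: sigmoid
        return softmax(x @ weights[-1] + biases[-1])  # output layer: softmax

    rng = np.random.default_rng(1)
    sizes = [7, 10, 10, 3]  # 7 input features; k_s = 3 output states for one symbol
    weights = [0.1 * rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]

    Y_hat = forward(rng.standard_normal((5, 7)), weights, biases)
    print(Y_hat.sum(axis=1))  # each row sums to one, as Equation (2) guarantees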
The gradient of the likelihood function with respect to s_k^{(L)} takes the simple form

    \frac{\partial e(w)}{\partial s_k^{(L)}} = \hat{y}_k - y_k.    (4)

The recursion relation for the back-propagation using conjugate gradients is

    \delta_i^{(l-1)} = \sum_{j=1}^{n^{(l)}} \delta_j^{(l)} \, w_{ij}^{(l)} \, \sigma(s_i^{(l-1)}) (1 - \sigma(s_i^{(l-1)})),    (5)

where we have used the analytic form of the derivative of the sigmoid function,

    \sigma'(v) = \sigma(v)(1 - \sigma(v)),    (6)

the sigmoid being the activation for all hidden layer nodes.
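Equations (4) and (5) can be written in the same batched style. The sketch below, under the same assumptions as the previous listing (NumPy, with our own variable names), computes the error terms layer by layer; the pre-activation sums of the hidden layers are assumed to have been cached during the forward pass:

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def backward_deltas(Y_hat, Y, weights, s_hidden):
        """Return the back-propagated error terms, output layer first.

        Y_hat, Y : (B, K) softmax outputs and 1-of-k_s encoded targets
        weights  : [W1, ..., WL], as used in the forward pass
        s_hidden : [s1, ..., s(L-1)], cached pre-activation sums of hidden layers
        """
        deltas = [Y_hat - Y]  # Equation (4): gradient w.r.t. the output-layer sums
        for W, s in zip(reversed(weights[1:]), reversed(s_hidden)):
            sig = sigmoid(s)
            # Equation (5): delta^(l-1) = (delta^(l) W^(l)^T) * sigma'(s^(l-1))
            deltas.append((deltas[-1] @ W.T) * sig * (1.0 - sig))
        return deltas

The weight gradients then follow as (x^{(l-1)})^T δ^{(l)} for each mini-batch, so every back-propagation step is again a dense matrix product of the kind the co-processor's linear algebra routines execute efficiently.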
So, in summary, a trained feed-forward network can be used to predict the probability of an output state (or class) for each of the symbols concurrently, given any observation as an input, by recursively applying Equation (3). The description of how the network is trained now follows.
Fig. 2. This figure shows the classification accuracy of the DNN applied to 43 CME Commodity and FX futures. Each symbol is represented by a box-and-whisker vertical bar: the box represents the region between the lower and upper quartiles of the sample distribution of classification accuracies. The median of the sample distribution of classification accuracies is represented as a red horizontal line.
6. Strategy backtesting
Table 1
This table shows the top five instruments for which the sample mean of the classification rate was highest over the ten walk-forward experiments. F1-scores are also provided. The mean and standard deviation of the sample mean classification accuracies and F1-scores over the 43 futures are also provided.

Symbol  Futures                                                     Classification Accuracy  F1-score
HG      Copper                                                      0.68                     0.59
ST      Transco Zone 6 Natural Gas (Platts Gas Daily) Swing         0.67                     0.54
ME      Gulf Coast Jet (Platts) Up-Down                             0.67                     0.54
TU      Gasoil 0.1 Cargoes CIF NWE (Platts) vs. Low Sulphur Gasoil  0.56                     0.52
MI      Michigan Hub 5 MW Off-Peak Calendar-Month Day-Ahead Swap    0.55                     0.50
mean    -                                                           0.42                     0.37
std     -                                                           0.11                     0.10
Fig. 5. This figure shows a box plot of the sample distribution of the time-averaged daily returns of the strategy applied separately to each of
the 43 CME Commodity and FX futures over the 130 day trading horizons. The red square with a black border denotes the sample average
for each symbol.
Fig. 7. This figure shows a box plot of the maximum drawdown of a simple strategy applied over ten walk forward experiments for each
symbol.
Table 2
This table shows the top five instruments for which the mean annualized Sharpe ratio was highest on average over the ten walk-forward optimizations. The values in parentheses denote the standard deviation over the ten experiments. Also shown are the means and standard deviations of the Capability ratios under the assumption of normality of returns.

Symbol  Futures                    Annualized Sharpe Ratio  Capability Ratio
PL      Platinum                   3.29 (1.12)              12.51 (4.27)
NQ      E-mini NASDAQ 100 Futures  2.07 (2.11)              7.89 (8.03)
AD      Australian Dollar          1.48 (1.09)              5.63 (4.13)
BP      British Pound              1.29 (0.90)              4.90 (3.44)
ES      E-mini S&P 500 Futures     1.11 (1.69)              4.22 (6.42)
Table 3
This table lists the initial margin, maintenance margin and contract size specified by the CME used to calculate the cumulative P&L and strategy performance for the top five performing futures positions.

Symbol  Initial margin  Maint. margin  Contract size
PL      2090            1900           50
NQ      5280            4800           50
AD      1980            1800           100000
BP      2035            1850           62500
ES      5225            4750           50

The strategy and the backtesting environment make the following assumptions:

• the account is opened with $100k of USD;
• there is sufficient surplus cash available in order to always maintain the brokerage account margin, through realization of the profit or otherwise;
• there are no limits on the minimum or maximum holding period and positions can be held overnight;
• the margin account is assumed to accrue zero interest;
• transaction costs are ignored;
• no operational risk measures are deployed, such as placing stop-loss orders;
• the market is always sufficiently liquid that a market order gets filled immediately at the mid-price listed at 5-minute intervals, and so slippage effects are ignored; and
• the placing of 1 lot market orders at 5-minute intervals has no significant impact on the market, and thus the forecast does not account for limit order book dynamics in response to the trade execution.

These assumptions, especially those concerning trade execution and the absence of live simulation in the backtesting environment, are of course inadequate to demonstrate the alpha-generation capabilities of the DNN-based strategy, but they serve as a starting point for commercial application of this research.

Returns of the strategy are calculated by first aggregating intraday P&L changes to daily returns and then annualizing them, as sketched below.
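A minimal sketch of this return aggregation and of the annualized Sharpe ratios reported in Table 2, assuming pandas and NumPy; the simulated P&L series is hypothetical, and the √252 annualization convention is standard practice rather than a detail taken from the authors' code, while the $100k account size comes from the assumptions above:

    import numpy as np
    import pandas as pd

    def annualized_sharpe(pnl, capital=100_000.0, trading_days=252):
        """Aggregate intraday P&L changes to daily returns, then annualize."""
        daily = pnl.groupby(pnl.index.date).sum() / capital  # daily account returns
        return np.sqrt(trading_days) * daily.mean() / daily.std()

    # Hypothetical example: a few days of simulated 5-minute P&L changes
    idx = pd.date_range("2013-03-01 09:00", periods=576, freq="5min")
    pnl = pd.Series(np.random.default_rng(2).normal(2.0, 50.0, len(idx)), index=idx)
    print(annualized_sharpe(pnl))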
Figure 5 shows a box plot of the sample distribution of the time-averaged daily returns of the strategy applied separately to each of the 43 CME front-month Commodity and FX futures over the 130-day trading horizons.

Figure 6 compares the cumulative unrealized net dollar profit of the strategy for the case when perfect forecasting information is available ('perfect foresight') against using the DNN prediction ('predict'). The graph is shown for one 130-day trading horizon for front-month Platinum (PL) futures.

Figure 7 shows a box plot of the maximum drawdown of a simple strategy applied over ten walk-forward experiments for each symbol.
Table 4
This table shows the correlation of the daily returns of the strategy on each of the five most liquid instruments in the list of 43 CME futures with their relevant ETF benchmarks. The values represent the summary statistics of the correlations over the ten experiments. Key: NQ: E-mini NASDAQ 100 Futures, DJ: DJIA ($10) Futures, ES: E-mini S&P 500 Futures, YM: E-mini Dow ($5) Futures, EC: Euro FX Futures.

Symbol  Benchmark ETF                                Mean Correlation  Std. Dev.  Max    Min
NQ      PowerShares QQQ ETF (QQQ)                    0.013             0.167      0.237  -0.282
DJ      SPDR Dow Jones Industrial Average ETF (DIA)  0.008             0.194      0.444  -0.257
ES      SPDR S&P 500 ETF (SPY)                       -0.111            0.110      0.057  -0.269
YM      SPDR Dow Jones Industrial Average ETF (DIA)  -0.141            0.146      0.142  -0.428
EC      CurrencyShares Euro ETF (FXE)                -0.135            0.108      0.154  -0.229
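Each entry in Table 4 is, in essence, the correlation between two daily return series over one walk-forward experiment. A minimal sketch, assuming pandas; the return values are hypothetical, and pandas' default Pearson estimator is our assumption, as the paper does not state which correlation is used:

    import pandas as pd

    # Hypothetical daily returns for one walk-forward experiment
    strategy = pd.Series([0.004, -0.002, 0.001, 0.003, -0.001])
    benchmark = pd.Series([0.002, -0.003, 0.002, 0.001, 0.000])  # e.g. SPY daily returns

    print(strategy.corr(benchmark))  # one correlation entry, summarized across ten runs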
Trippi, R.R., DeSieno, D., 1992. Trading equity index futures with a neural network, The Journal of Portfolio Management 19(1), 27–33.

Vanstone, B., Hahn, T., 2010. Designing Stock Market Trading Systems: With and Without Soft Computing. Harriman House. ISBN 1906659583, 9781906659585.

Wu, X., Perloff, J.M., 2007. GMM estimation of a maximum entropy distribution with interval data, Journal of Econometrics 138, 532–546.