Financial Markets Prediction With Deep Learning
Abstract—Financial markets are difficult to predict due to their complex system dynamics. Although some recent studies use machine learning techniques for financial markets prediction, they do not offer satisfactory performance in terms of financial returns. We propose a novel one-dimensional convolutional neural network (CNN) model to predict financial market movement. The customized one-dimensional convolutional layers scan financial trading data through time, while different types of data, such as prices and volume, share parameters (kernels) with each other. Our model automatically extracts features instead of using traditional technical indicators, and thus can avoid biases caused by the selection of technical indicators and the pre-defined coefficients in technical indicators. We evaluate the performance of our prediction model with strict backtesting on historical trading data of six futures from January 2010 to October 2017. The experiment results show that our CNN model can effectively extract more generalized and informative features than traditional technical indicators, and achieves more robust and profitable financial performance than previous machine learning approaches.

Index Terms—Deep learning, convolutional neural networks, Finance.

I. INTRODUCTION

The financial market is a very complex adaptive system. Its complexity mainly derives from the interaction between markets and market participants: the current market environment influences the strategies of market participants, while the overall behavior of market participants decides the trend of the financial market. According to the Adaptive Markets Hypothesis (AMH) [17], behavioral biases of market participants, such as loss aversion, overconfidence, and overreaction, always exist. Moreover, once the market environment changes, the heuristics of the old environment may no longer work and thus degenerate into behavioral biases. As a result, the financial market's trend, interwoven with market participants' biased strategies, makes the financial market very difficult to predict.

There have been many attempts to predict market movement using various approaches. Technical indicators/technical analysis [1] have traditionally been used in [2]–[4], [24], but these methods tend to lose predictive power after they are published. Recently, some studies have used machine learning techniques (e.g., feedforward neural networks, SVM, ensembles) for financial market prediction [6], [22]. They achieved good performance in terms of machine learning benchmarks but report no comparable results from the financial perspective.

In a nutshell, the essence of financial markets prediction is to find generalized and informative features from the joint distribution of prices and volumes [16]. We need to train a model to represent a generalized joint distribution based on the current market environment. In this paper, we propose and develop a deep convolutional neural network (CNN) model to automatically extract features from historical financial trading data and to predict the price movement. Through multi-layer customized 1-D convolutions, noise can be filtered out, while highly correlated underlying features emerge and are clustered into corresponding channels for further feature combinations in fully connected layers. The whole process is end-to-end, so the potentially negative influence of human interference, such as the selection of technical indicators and pre-defined coefficients in technical indicators, can be avoided.

To our knowledge, this is among the first efforts to use a deep CNN to predict financial markets, and we verify the performance of our model using strict backtests. We test our model with six futures from the Chicago Mercantile Exchange and the New York Mercantile Exchange. Backtest results show that our 1-D CNN model achieves a significantly higher average annual return and more robust performance (higher Sharpe ratio¹) than previous approaches based on Nearest Neighbor, SVM, and Deep Feedforward Networks.

We also observe that our 1-D CNN model without technical indicators as input achieves better results than the model that uses technical indicators. This shows that our 1-D CNN model can effectively extract more generalized and informative features than those represented by traditional technical indicators. Our results confirm the observation that common metrics in machine learning, such as accuracy and F1 score, are not suitable for financial markets prediction because different types of prediction errors have different impacts on financial performance [22]. In our study, we propose a modified version of the F-measure score, called Weighted-F-Score, to address this issue. Our experiments show that Weighted-F-Score highly correlates with average annual return and Sharpe ratio in our backtest results, with minimum cross-correlation coefficients of 0.79 and 0.84, respectively.

We summarize our contributions as follows:

¹ Sharpe ratio is the average return earned in excess of the risk-free rate per unit of volatility or total risk. It is widely used for calculating risk-adjusted return.
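The Sharpe ratio definition in footnote 1 can be made concrete with a minimal sketch. The paper does not state its return frequency, risk-free rate, or annualization convention, so the function below assumes monthly returns, a constant per-period risk-free rate, and the common √12 annualization; the name `sharpe_ratio` and these defaults are illustrative, not the authors' code.

```python
import numpy as np


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=12):
    """Annualized Sharpe ratio of periodic simple returns (illustrative sketch).

    returns: per-period returns as fractions (assumed monthly here).
    risk_free_rate: constant per-period risk-free rate (an assumption).
    periods_per_year: annualization factor; 12 assumes monthly data.
    """
    excess = np.asarray(returns, dtype=float) - risk_free_rate
    volatility = excess.std(ddof=1)  # sample standard deviation of excess returns
    if volatility == 0.0:
        raise ValueError("Sharpe ratio is undefined for zero-volatility returns")
    # Mean excess return per unit of volatility, scaled to an annual figure.
    return np.sqrt(periods_per_year) * excess.mean() / volatility


monthly = [0.02, -0.01, 0.03, 0.01]  # toy returns, not results from the paper
print(round(sharpe_ratio(monthly), 2))  # prints 2.54
```

Annualizing by the square root of the number of periods assumes independent, identically distributed returns; with daily data one would pass a different `periods_per_year`.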
Authorized licensed use limited to: PUC GO - Universidade Católica de Goiás. Downloaded on March 22,2024 at 17:33:45 UTC from IEEE Xplore. Restrictions apply.
[Figure 1 diagram. y-axis: Data Types (Open Price, High Price, Low Price, Close Price, Volume, ...); x-axis: Timeline. The input 2-D frame is followed by convolution outputs, max-pooling outputs, dense/fully connected (FC) layers, and a Softmax output.]
Fig. 1: Cross-Data-Type 1-D Convolution Architecture. The 1-D kernels (the red and blue ones) scan along the x-axis, and each of them visits every position of the input 2-D frame with a stride of one. C∗ and C∗∗ represent the output channels of the red and blue kernels, respectively. A max-pooling layer follows a convolutional layer, and only the max-pooling layers condense the dimensions of the x-axis. M∗ and M∗∗ represent the output channels of the max-pooling operations (the orange and green ones) with a stride of three.
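The scan strategy shown in Figure 1 can be sketched as a NumPy forward pass. This is illustrative only and assumes "valid" (unpadded) convolution, a pooling window of three with stride three, and no bias or activation; the function names are hypothetical and the code is not the authors' implementation. Each 1-D kernel slides along the time axis of every data-type row with the same weights, so all rows share parameters, but no single kernel position mixes values from different rows.

```python
import numpy as np


def cdt_conv1d(frame, kernels):
    """Cross-Data-Type 1-D convolution (hypothetical sketch, stride 1, no padding).

    frame:   (n_types, n_steps) array; each row is one data type
             (e.g. open, high, low, close, volume) along the timeline.
    kernels: (n_channels, k) array of 1-D kernels.
    Returns an (n_channels, n_types, n_steps - k + 1) array.
    """
    n_types, n_steps = frame.shape
    n_channels, k = kernels.shape
    out = np.empty((n_channels, n_types, n_steps - k + 1))
    for c in range(n_channels):
        for r in range(n_types):  # the same kernel visits every row (shared weights)
            for t in range(n_steps - k + 1):  # slide along the x-axis only
                out[c, r, t] = np.dot(kernels[c], frame[r, t:t + k])
    return out


def max_pool_time(x, size=3, stride=3):
    """Max-pool along the last (time) axis only; other axes are untouched."""
    n_out = (x.shape[-1] - size) // stride + 1
    return np.stack(
        [x[..., i * stride:i * stride + size].max(axis=-1) for i in range(n_out)],
        axis=-1,
    )


frame = np.arange(45, dtype=float).reshape(5, 9)  # 5 data types x 9 time steps (toy values)
kernels = np.array([[1.0, 0.0, -1.0], [1 / 3, 1 / 3, 1 / 3]])  # two hypothetical 1x3 kernels
features = cdt_conv1d(frame, kernels)
pooled = max_pool_time(features)
print(features.shape, pooled.shape)  # prints (2, 5, 7) (2, 5, 2)
```

Because the operation is linear and the weights are shared across rows, scaling all input rows by a common factor scales every output of a channel by that same factor, in line with the equal-scaling condition for Typical Price in Equation (2).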
The scan strategy of the CDT 1-D kernels is different from that of the regular 1-D or 2-D convolution: all the kernels scan only along the x-axis, while each of them visits every position of the 2-D frames. That is, once a kernel finishes scanning one row (along the x-axis), it moves to the start point of the next row, and so forth until the whole 2-D frame is scanned. As illustrated in Figure 1, the 1x3 red and blue kernels each scan the whole 2-D input frame, and they touch only three elements of one data type at a time. In other words, data of different types share parameters through this scan strategy. This is crucial because it matches the characteristics of the structure of financial data. Unlike computer vision datasets, whose 2-D frames contain only one type of data pixel, the 2-D frames in our datasets contain prices, volumes, and technical indicators (if any). However, according to [16], the joint distribution of these types of data has strong forecastability over the marginal distribution of each single one. Therefore, on one hand, we cannot directly convolve different types of data together; on the other hand, we want to preserve parameter sharing so that the model can represent the desirable joint distribution effectively. Although this is evident in computer vision applications, it needs more explanation in our study. Take one technical indicator, Typical Price (TP), for instance:

$$TP = \frac{P_{high} + P_{low} + P_{close}}{3} \tag{1}$$

Now assume that these three prices need to be scaled for training purposes and that the effect of the activation is ignored. We can then modify Equation (1) to:

$$TP' = \frac{P'_{high} + P'_{low} + P'_{close}}{3} \tag{2}$$

where $P'_{high} = \alpha_{high} P_{high}$, $P'_{low} = \alpha_{low} P_{low}$, and $P'_{close} = \alpha_{close} P_{close}$. To preserve the properties of TP after the scaling, we should have $\alpha_{high} = \alpha_{low} = \alpha_{close}$. Note that the parameter sharing in the CDT 1-D CNN guarantees that outputs from the same channel are scaled by the same value. In this way, the mutual relationships among different data types are preserved and inherited layer after layer, and thus underlying features emerge with training iterations.

B. Without technical indicators

Technical indicators may capture profitable patterns in financial markets, which is the main reason why they have been popular among market practitioners for over one hundred years. However, choosing indicators relies on the experience of practitioners and the popularity of indicators. Although feature selection methods such as Information Gain and correlation-based filters [22] can be helpful, the pool of indicator candidates still limits the model's performance. Also, some machine learning algorithms with technical indicators, such as Nearest-neighbour and SVM, are reported to achieve occasional rather than consistent success [26]. These facts may reveal a limitation of technical indicators/technical analysis: human interventions, such as choosing technical indicators and defining their parameters, bring negative effects. Inspired by the evolution of computer vision research from extracting hand-designed features such as SIFT [19] to autonomously learning features within the training process [14], [27], we argue that the CDT 1-D CNN model is more suitable than technical analysis for extracting profitable patterns, for the following three reasons:

1) The CDT 1-D CNN model itself is a nonlinear and autonomously trained variant of technical analysis. The fundamental elements of technical analysis, technical indicators, which are mathematical formulas based on historical data of prices and volumes, may be approximated by the convolutional and fully connected layers.
   Take another technical indicator, Moving Average Convergence Divergence (MACD), for instance:

   $$EMA_n[i] = \alpha P_{close}[i] + (1 - \alpha) EMA_n[i-1] \tag{3}$$

   $$MACD = EMA_a - EMA_b \tag{4}$$

   where n, a, and b denote time spans, and a < b, so that Equation (4) subtracts the longer-span EMA from the shorter-span one, as in the standard MACD definition. i denotes the i-th time step of a given time span, and i ∈ [1, n]. $P_{close}[i]$ denotes the close price at the i-th time step, $EMA_n[i]$ denotes the Exponential Moving Average, and $EMA_n[1] = P_{close}[1]$. α denotes a predefined coefficient. $EMA_n$ is, in fact, equivalent to an infinite impulse response (IIR) filter for $P_{close}$ over the period n, which can be implemented by convolutions. Thus, after a subtraction operation provided by the fully connected layers, the functionality of MACD can be realized.
2) More importantly, in previous studies the coefficients of technical indicators, such as α, n, a, and b in Equations (3) and (4), are predefined by users and thus not trainable. However, deep learning models need to train parameters to learn distributed representations with learning algorithms, such as a loss function and back propagation. Therefore, the non-trainable coefficients of technical indicators are inflexible for the training process, and the effectiveness of technical indicators relies on user experience and knowledge rather than on the joint distribution of the training data. Formally, our model computes a function as follows:

   $$\hat{y}_t = F(X_t, W) \tag{5}$$

   where $X_t$ is the t-th input vector, and W represents the trainable parameters (e.g., weights of convolution kernels and fully connected layers) in the model. Meanwhile, the model minimizes a loss function between the t-th prediction $\hat{y}_t$ and the t-th ground truth $y_t$:

   $$E_t = L(\hat{y}_t, y_t) \tag{6}$$

   In other words, the training process finds the value of W that minimizes $E_t$. Now assume that $X_t$ is a vector of technical indicators at the t-th time step:

   $$X_t = \bigcup_{a \in N} T_a(x_t, \Theta) \tag{7}$$

   where $x_t$ is the raw financial trading data vector at the t-th time step, which includes prices and volume only, $\bigcup_{a \in N} T_a(\cdot)$ represents a set of technical indicator functions, and Θ denotes a collection of predefined coefficients in those functions. Prices and volume in raw financial trading data reflect all available information [20], and all technical indicators are computed from raw financial trading data. Therefore, technical indicators do not provide additional information beyond the raw financial trading data, and because prediction models cannot adjust the predefined Θ, bias and overfitting issues may occur.

IV. EXPERIMENTAL SETUP

A. Data

The datasets used in our study are historical trading records of four commodity futures and two equity index futures: WTI Crude Oil (CL), Natural Gas (NG), Soybeans (S), Gold (GC), E-mini Nasdaq 100 (NQ), and E-mini S&P 500 (ES). Since no open datasets are available thus far, the six datasets in our study were collected from online brokers. The datasets of the six futures range from January 2010 to October 2017, and each dataset contains 330,000-400,000 5-minute trading records. Each record has the following seven attributes: date, time, open price, high price, low price, close price, and trading volume.

A large variety of technical indicators have been developed to predict future price levels, or simply the general price direction, of a security by looking at past patterns. We choose technical indicators according to their functionality and popularity. For lagging indicators, which are often used to identify and confirm the strength of a pattern or trend, we pick EMA, MACD Histogram, and Bollinger Bands. For leading indicators, which usually change before a trend or pattern and are thus used during periods of sideways or non-trending ranges, we pick Relative Strength Index (RSI), Commodity Channel Index (CCI), Volume Weighted Average Price (VWAP), On-balance Volume (OBV), Average Directional Index (ADX), Accumulation Distribution Line (ADL), and Chaikin Money Flow (CMF). We also include technical indicators regarded as neither lagging nor leading, such as Rate of Change (ROC), which measures the change of prices over time. Different time spans are used to generate EMA, MACD Histogram, Bollinger Bands, ROC, RSI, CCI, CMF, and ADX. Together with the original five attributes (open, high, low, and close prices, and trading volume), each data point has a total of 46 attributes.

Our model predicts the future price movement every two hours and makes trading decisions accordingly based on the prediction outcome. The reasons that our model works with a two-hour time interval are twofold:

1) Longer time intervals provide the necessary data sequence length for convolutions. Every 24 consecutive 5-minute data records (i.e., two hours) are organized to form a data frame, where 1-D convolution is applied along the timeline. For the sake of fair comparison between our MLP and 1-D CNN models, the datasets fed to the MLP model also have the same time interval.
2) The price change of a commodity over a shorter time interval is usually much smaller than that over a longer time interval. For example, the average price change of CL over a 5-minute time interval is $0.02, compared to $0.10 over a 2-hour time interval. The profitability of trading over short time intervals is thus diminished after slippage and transaction cost are applied.

Another vital part of our data pre-processing is labeling. The popular way of labeling stock or futures price movements is based on a fixed threshold for the log return of the price. For
[Figure 2 plot. Panel (a): Cumulative Return, with a second axis for the Close price; panel (b): monthly return; monthly timeline from Jan-2012 to Jul-2017.]
Fig. 2: Cumulative and monthly return of the CL futures traded by the CDT 1-D CNN without technical indicators. (a): The cumulative return of the CL futures stays positive over 71 months and eventually reaches 500%. (b): The monthly return of the CL futures over 71 months. 60 of the 71 months are profitable, and only two months have a return below -5%.
example, the data point at time t is marked as "going up" or "going down" if the log return of the price at time t+1 over t is above or below the pre-specified threshold. However, because this method does not thoroughly consider the distribution of the future price trend [18], we adopt the dynamic threshold method described in [25]:

$$l_{t+1} = \begin{cases} \text{up}, & \text{if } c_{t+1} \ge c_t (1 + \alpha v_t) \\ \text{down}, & \text{if } c_{t+1} \le c_t (1 - \alpha v_t) \\ \text{flat}, & \text{otherwise} \end{cases} \tag{8}$$

where $c_t$ and $v_t$ represent the close price and volatility of a commodity price at time t, and α is a parameter that can be adjusted to balance the three classes. In our study, the volatility is measured as the standard deviation of the past ten data points at any given time, and α is set to 0.55.

B. Criterion

1) Backtest (From Finance Viewpoint): Previous studies usually choose the buy-and-hold strategy as the baseline. However, we believe that this strategy is too simple to serve as a baseline. Take the CL futures for instance: if we had bought and held one contract since July 2014, our asset would thus far have shrunk by around 70%. The backtest strategy used in our research is more conservative than that in [6], [7]. Specifically, we use a high transaction cost to make our backtest result more conservative:

$$S = 5F \tag{9}$$

$$B' = 2B \tag{10}$$

$$T = n\left(M(B' + S) + C\right) \tag{11}$$

where S denotes slippage, the difference between the expected price of a trade and the price at which the trade is actually executed. F denotes the minimum price fluctuation, the smallest increment of price movement possible in trading a given contract. B denotes the bid-ask spread, the amount by which the ask price exceeds the bid price for an asset in the market. B' denotes the magnified bid-ask spread. M denotes the multiplier, the deliverable quantity of commodities and option contracts that are traded on an exchange. C denotes the commission per contract, the fixed fee paid to brokers. n denotes the number of contracts, and T denotes the transaction cost. Note that we intentionally raise S by three times and B by two times to make our backtest result conservative enough. Take CL for instance: the bid-ask spread is $0.01, the minimum price fluctuation is $0.01, the multiplier is 1,000, and the commission is $2.75 at Interactive Brokers². A trade with one contract therefore needs to gain no less than $145.50; otherwise, it will end up losing money.

The backtest strategy that we use follows a principle of minimum human intervention: at any time, depending on the predicted price movement over the next time interval, the trading strategy determines whether to enter or leave short/long trades. In detail, the first Up or Down predicted label puts the test into a long trade or a short trade, respectively. After that, whenever a turning point appears, for example, when the current trading status is a long trade but the next predicted label is Down, the trading strategy leaves the long trade and enters a short trade. Note that the term T in Equation (11) is calculated for each trade. In our experiments, we allocate $100,000 as initial capital and use Equation (11) for the transaction cost. The number of contracts is one, and all other terms in Equations (9), (10), and (11) follow the standard settings of futures markets.

² https://www.interactivebrokers.com

2) Weighted F Score (From Machine Learning Viewpoint): The commonly used metrics in machine learning, such as
accuracy and F1 score, do not correlate well with trading performance (e.g., profit return, Sharpe ratio) in this research.

[Figure 3 plot. Legend: SVM, MLP w/ TIs, regular 1-D CNN w/o TIs, CDT 1-D CNN w/ TIs, CDT 1-D CNN w/o TIs.]

³ https://www.tensorflow.org
⁴ http://scikit-learn.org
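Equations (9)-(11) and the position-switching rule of the backtest in Section IV-B1 can be sketched as follows. Two readings are assumptions rather than statements from the paper: T is charged once when a position is opened and once when it is closed (which reproduces the $145.50 round-trip figure for CL), and a "flat" prediction keeps the current position. The function names are hypothetical.

```python
def transaction_cost(n, multiplier, bid_ask_spread, min_fluctuation, commission):
    """Transaction cost T of one trade, following Equations (9)-(11)."""
    slippage = 5 * min_fluctuation         # S = 5F, Eq. (9)
    magnified_spread = 2 * bid_ask_spread  # B' = 2B, Eq. (10)
    return n * (multiplier * (magnified_spread + slippage) + commission)  # Eq. (11)


def count_trades(labels):
    """Number of trades (each charged T) under the always-in-market rule.

    The first "up"/"down" label opens a long/short position; a later
    opposite label closes it and opens the reverse position (two trades).
    Treating "flat" as "hold the current position" is an assumption.
    """
    position = None
    trades = 0
    for label in labels:
        if label == "flat":
            continue
        target = "long" if label == "up" else "short"
        if position is None:
            trades += 1  # initial entry
        elif position != target:
            trades += 2  # leave the current trade and enter the opposite one
        position = target
    return trades


# CL example from the text: B = $0.01, F = $0.01, M = 1,000, C = $2.75, n = 1.
per_trade = transaction_cost(1, 1000, 0.01, 0.01, 2.75)
print(round(per_trade, 2), round(2 * per_trade, 2))  # prints 72.75 145.5
```

Under these assumptions the total cost of a backtest is simply `count_trades(labels) * per_trade`, which makes it easy to see how often-reversing predictions erode returns.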
Model                        CL      NG      NQ      S       ES      GC
SVM                          50.2%   41.0%   45.4%   43.0%   41.1%   38.6%
MLP (w/ TIs)                 49.3%   51.1%   52.3%   48.1%   43.1%   41.9%
Regular 1-D CNN (w/o TIs)    50.4%   50.9%   54.8%   50.4%   50.8%   44.8%
CDT 1-D CNN (w/o TIs)        57.4%   52.5%   55.9%   51.7%   51.8%   47.3%
CDT 1-D CNN (w/ TIs)         54.1%   48.6%   49.2%   49.4%   47.7%   44.2%

TABLE I: Weighted F Score (WFS). The CDT 1-D CNN without technical indicators (TIs) outperforms the other models.
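The weighted F scores in Table I are computed against the three-class labels of Equation (8). A minimal sketch of that labeling rule follows. It assumes that $v_t$ is the population standard deviation of the last ten simple returns ending at time t (the text says only "the past ten data points", so the series used for $v_t$ is an assumption), and the function name is hypothetical.

```python
import numpy as np


def dynamic_threshold_labels(close, alpha=0.55, window=10):
    """Three-class labels l_{t+1} from Equation (8) (hypothetical sketch).

    close:  1-D sequence of close prices, one per two-hour data point.
    alpha:  class-balancing parameter (0.55 in the text).
    window: number of past returns used to estimate the volatility v_t.
    Returns a list aligned with close[1:]; entries are None where there is
    not yet enough history to estimate v_t.
    """
    close = np.asarray(close, dtype=float)
    returns = np.diff(close) / close[:-1]  # returns[t - 1] = c_t / c_{t-1} - 1
    labels = []
    for t in range(len(close) - 1):
        if t < window:
            labels.append(None)
            continue
        v_t = returns[t - window:t].std()  # uses only information up to time t
        if close[t + 1] >= close[t] * (1 + alpha * v_t):
            labels.append("up")
        elif close[t + 1] <= close[t] * (1 - alpha * v_t):
            labels.append("down")
        else:
            labels.append("flat")
    return labels


closes = [100, 101] * 5 + [100, 102]  # toy two-hour close prices
print(dynamic_threshold_labels(closes)[-1])  # prints up
```

Scaling the threshold by recent volatility means that the same absolute move can be "flat" in a turbulent period and "up" in a quiet one, which is the motivation given in the text for preferring it over a fixed log-return threshold.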
[Figure 4 plot: Sharpe ratio of SVM, MLP w/ TIs, regular 1-D CNN w/o TIs, CDT 1-D CNN w/ TIs, and CDT 1-D CNN w/o TIs for CL, NG, NQ, S, ES, and GC.]
[Figure 5 plot: cross-correlations AAR vs. AC, SR vs. AC, AAR vs. WFS, and SR vs. WFS for CL, NG, NQ, S, ES, and GC.]
Fig. 4: The Sharpe ratio of the six futures over 71 months. The CDT 1-D CNN without technical indicators (TIs) significantly outperforms the baselines.

Fig. 5: Cross-correlation between metrics in finance and machine learning. Weighted F score (WFS) highly correlates with average annual return (AAR)/Sharpe ratio (SR) in the experiments on all the futures.
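The text does not specify how the cross-correlations in Figure 5 are computed. One plausible reading, assumed here, is the zero-lag normalized cross-correlation (equivalently, the Pearson coefficient) between a machine learning metric and a finance metric across the compared approaches on one futures contract. The function name and the toy inputs below are hypothetical and are not results from the paper.

```python
import numpy as np


def metric_cross_correlation(ml_metric, fin_metric):
    """Zero-lag normalized cross-correlation (Pearson r) of two metric vectors.

    Each argument holds one value per approach, e.g. WFS and AAR of the
    compared models on a single futures contract.
    """
    x = np.asarray(ml_metric, dtype=float)
    y = np.asarray(fin_metric, dtype=float)
    x = (x - x.mean()) / x.std()  # standardize each metric (population std)
    y = (y - y.mean()) / y.std()
    return float(np.mean(x * y))  # average product of z-scores


ml_scores = [0.50, 0.52, 0.55, 0.57]   # toy machine learning metric values
fin_scores = [0.05, 0.10, 0.30, 0.70]  # toy finance metric values
print(round(metric_cross_correlation(ml_scores, fin_scores), 2))
```

With only five approaches per contract, such coefficients are computed from very few points, so values near the extremes should be read with care.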
the average annual return, Sharpe ratio, and weighted F score of the five approaches for the six futures are shown in Figure 3, Figure 4, and Table I, respectively. The cumulative return and monthly return of the WTI crude oil (CL) futures are plotted in Figure 2. We also calculate the cross-correlation between the finance and machine learning metrics, shown in Figure 5. Since Nearest-neighbour's results are similar to SVM's, we omit them from the aforementioned table and figures.

Our CDT 1-D CNN model achieves the best performance (57.4% weighted F score, 71.9% average annual return, and 2.72 Sharpe ratio) on the WTI crude oil (CL) futures. As plotted in Figure 2, the cumulative return of the CL futures stays positive during the nearly seven-year period and eventually goes up to 500%. We notice that our CDT 1-D CNN model is able to turn around quickly when drawdowns happen. Only two of the 13 negative-return months have a drawdown larger than 5%. In contrast, the most profitable month has a 29.5% positive return, and 40 months have a profit higher than 5%.

Although the regular 1-D CNN model sometimes achieves good results, such as 54.8% weighted F score (WFS), 10.8% average annual return (AAR), and 0.76 Sharpe ratio (SR) for the E-mini S&P 500 (ES) futures, its performance is not consistently robust: for the Natural Gas (NG), Soybeans (S), and Gold (GC) futures, it is as low as that of the baselines. Compared to our CDT 1-D CNN model, the maximum performance degradation of the regular 1-D CNN model is up to 7.0% for WFS, 40.4% for AAR, and 1.60 for SR on the CL futures, and the average degradation is up to 4.3% WFS, 25.5% AAR, and 1.41 SR. These experiments confirm the effectiveness of the scan strategy of our CDT 1-D CNN model.

As plotted in Figure 5, the cross-correlation between accuracy (AC) and average annual return (AAR)/Sharpe ratio (SR) varies from -0.01 for the AAR of the Gold (GC) futures to 0.91 for the AAR of the E-mini S&P 500 (ES) futures. In contrast, our proposed weighted F score (WFS) highly correlates with both AAR and SR, and has the highest cross-correlation for each of the futures. This confirms our argument in Section IV-B2 that the correlation between Weighted F Score and average annual return/Sharpe ratio is significantly larger than that between common machine learning metrics and average annual return/Sharpe ratio for most of the futures.

Technical indicators are effective for the MLP, but they have negative effects on the 1-D CNN. As shown in Figures 3 and 4 and Table I, for the CL, NQ, S, and GC futures, the CDT 1-D CNN with technical indicators degrades by 6.1%-53.0%, 53.0%-199.0%, and 2.3%-6.7% in average annual return (AAR), Sharpe ratio (SR), and weighted F score (WFS), respectively, compared to the CDT 1-D CNN without technical indicators. Besides the probable reason mentioned in Section III-B, another is that redundant data types make parameter sharing difficult to work. Parameters are updated by back propagation from each data type, but unfortunately, not all technical indicators correlate with price trends all the time. Therefore, some useful features may be counteracted by unnecessary parameter updates.

VI. DISCUSSION

Long-term features. Thus far, we only consider features within two hours through CDT 1-D convolutions, and the data inputs are treated as mutually independent. However, we believe that there may exist informative features within longer
time spans. In our future research, we will explore more complex model architectures, such as recurrent neural networks combined with convolutional kernels, in order to capture both short-term and long-term features.

Data quality. We notice that some big losses in our backtest experiments result from missing data in the raw financial trading data. The prices and volume of the current data point may correlate with the upcoming price trend but may be rarely informative for the trend one day later. Missing raw financial data also has a negative effect on the training process due to the mismatch between the label and the intrinsic information of the current data point. We will develop appropriate data preprocessing methods to solve this issue.

Labeling. Although our experiments verify that supervised training based on our three-class labeling strategy can achieve state-of-the-art performance on both finance and machine learning benchmarks, this labeling strategy cannot distinguish more precise classes, such as violent surge, moderate surge, crash, and edging down. We plan to investigate more fine-grained labeling strategies and study their impact on performance.

VII. CONCLUSION

In this paper, we propose and develop a deep convolutional neural network model to predict market movement. It uses a novel 1-D convolution, called Cross-Data-Type 1-D Convolution, to extract features directly from historical financial trading data. Backtest results show that our model can effectively extract features that are more generalized and informative than those represented by traditional technical indicators. Experiment results show that our model outperforms previous machine learning approaches by 12.4%-63.3% in average annual return and 99%-245% in Sharpe ratio. In addition, we propose a new measure, weighted F-score, which correlates better with financial returns than traditional machine learning performance metrics.

REFERENCES

[1] Achelis SB. Technical Analysis from A to Z. New York: McGraw Hill; 2001 Apr.
[2] Aiolfi M, Favero CA. Model uncertainty, thick modelling and the predictability of stock returns. Journal of Forecasting. 2005 Jul 1;24(4):233-54.
[3] Brock W, Lakonishok J, LeBaron B. Simple technical trading rules and the stochastic properties of stock returns. The Journal of Finance. 1992 Dec 1;47(5):1731-64.
[4] Dimson E, Marsh P. Murphy's law and market anomalies. The Journal of Portfolio Management. 1999;25(2):53-69.
[5] Ding X, Zhang Y, Liu T, Duan J. Deep learning for event-driven stock prediction. In IJCAI 2015 Jul 25 (pp. 2327-2333).
[6] Dixon M, Klabjan D, Bang JH. Classification-based financial markets prediction using deep neural networks. Algorithmic Finance. 2016 Jul 18 (Preprint):1-1.
[7] Fernández-Rodríguez F, Sosvilla-Rivero S, Andrada-Félix J. Technical analysis in foreign exchange markets: evidence from the EMS. Applied Financial Economics. 2003 Jan 1;13(2):113-22.
[8] Gencay R. The predictability of security returns with simple technical trading rules. Journal of Empirical Finance. 1998 Oct 1;5(4):347-59.
[9] Giles CL, Lawrence S, Tsoi AC. Noisy time series prediction using recurrent neural networks and grammatical inference. Machine Learning. 2001 Jul 1;44(1-2):161-83.
[10] Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics 2010 Mar 31 (pp. 249-256).
[11] Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics 2011 Jun 14 (pp. 315-323).
[12] Goodfellow I, Bengio Y, Courville A. Deep Learning. Cambridge: MIT Press; 2016 Nov 18.
[13] Kim KJ. Financial time series forecasting using support vector machines. Neurocomputing. 2003 Sep 1;55(1-2):307-19.
[14] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 2012 (pp. 1097-1105).
[15] Lo AW, MacKinlay AC. Stock market prices do not follow random walks: Evidence from a simple specification test. The Review of Financial Studies. 1988 Jan 1;1(1):41-66.
[16] Lo AW, Mamaysky H, Wang J. Foundations of technical analysis: Computational algorithms, statistical inference, and empirical implementation. The Journal of Finance. 2000 Aug 1;55(4):1705-65.
[17] Lo AW. The adaptive markets hypothesis. The Journal of Portfolio Management. 2004;30(5):15-29.
[18] López de Prado M. The 7 reasons most machine learning funds fail (presentation slides). 2017.
[19] Lowe DG. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision 1999 (Vol. 2, pp. 1150-1157). IEEE.
[20] Fama EF. Efficient capital markets: A review of theory and empirical work. The Journal of Finance. 1970 May 1;25(2):383-417.
[21] Park CH, Irwin SH. What do we know about the profitability of technical analysis? Journal of Economic Surveys. 2007 Sep 1;21(4):786-826.
[22] Rechenthin MD. Machine-learning classification techniques for the analysis and prediction of high-frequency stock direction. The University of Iowa; 2014.
[23] Samuelson PA. Proof that properly anticipated prices fluctuate randomly. IMR; Industrial Management Review (pre-1986). 1965 Apr 1;6(2):41.
[24] Sullivan R, Timmermann A, White H. Data-snooping, technical trading rule performance, and the bootstrap. The Journal of Finance. 1999 Oct 1;54(5):1647-91.
[25] Sun T, Wang J, Zhang P, Cao Y, Liu B, Wang D. Predicting stock price returns using microblog sentiment for Chinese stock market. In Big Data Computing and Communications (BIGCOM), 2017 3rd International Conference on 2017 Aug 10 (pp. 87-96). IEEE.
[26] Timmermann A, Granger CW. Efficient market hypothesis and forecasting. International Journal of Forecasting. 2004 Jan 1;20(1):15-27.
[27] Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In European Conference on Computer Vision 2014 Sep 6 (pp. 818-833). Springer, Cham.
[28] Zhang L, Aggarwal C, Qi GJ. Stock price prediction via discovering multi-frequency trading patterns. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2017 Aug 13 (pp. 2141-2149). ACM.
[29] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997 Nov 15;9(8):1735-80.
[30] Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014 Dec 22.