Hu 2018 Deep Stock

This document proposes using deep learning to measure stock similarity for investment decisions. Specifically, it suggests using a Convolutional AutoEncoder (CAE) to learn representations from candlestick charts of stock price data, in order to capture nonlinear stock dynamics and translation invariance better than raw time series. A novel portfolio construction strategy is then presented that uses the deep representations to cluster stocks and select those with high Sharpe ratios within each cluster, aiming to provide low-risk, high-return portfolios. Evaluation on FTSE 100 data shows the proposed strategy outperforms the index and other funds over 2000 trading days.

DEEP STOCK REPRESENTATION LEARNING: FROM CANDLESTICK CHARTS TO INVESTMENT DECISIONS
Guosheng Hu5,∗ Yuxin Hu1,∗ Kai Yang2 Zehao Yu3 Flood Sung4 Zhihong Zhang3 Fei Xie1,♦
Jianguo Liu1 Neil Robertson5 Timothy Hospedales6,♥ Qiangwei Miemie6,7,8,♥
1 Shanghai University of Finance and Economics; 2 University of Shanghai for Science and Technology; 3 Xiamen University; 4 Independent Researcher; 5 Queen's University Belfast; 6 The University of Edinburgh; 7 Yang's Accounting Consultancy Ltd; 8 ArrayStream Technologies Ltd

∗ These authors contributed equally to this work.
♦ Corresponding author: [email protected]
♥ Email: [email protected], [email protected]

ABSTRACT
We propose a novel investment decision strategy (IDS) based on deep learning. The performance of many IDSs is affected by stock similarity. Most existing stock similarity measurements have the following problems: (a) the linear nature of many measurements cannot capture nonlinear stock dynamics; (b) the estimation of many similarity metrics (e.g. covariance) needs historical data over a very long period (e.g. 3K days), which cannot represent the current market effectively; (c) they cannot capture translation invariance. To solve these problems, we apply a Convolutional AutoEncoder to learn a stock representation, based on which we propose a novel portfolio construction strategy that (i) uses the deeply learned representation and modularity optimisation to cluster stocks and identify diverse sectors, and (ii) picks stocks within each cluster according to their Sharpe ratio (Sharpe 1994). Overall, this strategy provides low-risk, high-return portfolios. We use Financial Times Stock Exchange 100 Index (FTSE 100) data for evaluation. Results show that our portfolio outperforms the FTSE 100 index and many well-known funds in terms of total return over 2000 trading days.

1 Introduction
Investment decision making is a classic research area in quantitative and behavioural finance. One of the most important decision problems is portfolio construction and optimisation [1, 2], which addresses the selection and weighting of assets to be held in a portfolio. Financial institutions try to construct and optimise portfolios in order to maximise investor returns while minimising investor risk.

Stock similarity is important for many investment decision strategies [3]. For example, the classical investment strategy, mean-variance theory [1], measures stock similarity using covariance. Most similarity measurements have the following problems. (a) Usually, the time series (a linear signal) is fed to a linear metric (e.g. covariance or Pearson correlation) to obtain similarity, and the linear nature of most similarities cannot capture the nonlinear dynamics of stocks. (b) In [4], it is claimed that n days/weeks of historical data, where n is the number of stocks in the market (e.g. 2,033 tradable stocks on the London Stock Exchange), are needed to estimate an accurate covariance. However, the past n days/weeks of data cannot represent the current market effectively. (c) Most similarity measurements do not consider translation (time) invariance, which is important for stock similarity. For example, the price of Apple stock may increase on one particular day, while the stock prices of Apple's suppliers might increase only 3 days later.

To solve the aforementioned problems, we propose to use deep learning (DL) features instead of raw time series for stock similarity measurement. Convolutional DL approaches such as the Convolutional AutoEncoder (CAE, unsupervised) [5] and the Convolutional Neural Network (CNN, supervised) [6] have achieved very impressive performance in analysing visual imagery. This has motivated researchers to convert raw input signals from other modalities into images to be processed by CNNs or CAEs, and good results have been achieved in diverse applications. For example, traditional speech recognition methods used the 1-D signal vector, e.g. the raw input waveform [7, 8]. In contrast, an alternative approach is to convert the 1-D signal to a spectrogram, i.e. an image, in order to leverage the strength of CNNs to achieve promising recognition performance [9]. As another well-known example, AlphaGo [10] represents the board position as a 19×19 image, which is fed into a CNN for feature learning. Computer vision techniques have also been applied to judge the quality of papers [11] and to calculate the rank of a matrix [12] from its appearance only. With similar motivation, we convert a 4-channel stock time series (lowest, highest, opening and closing price for the day) into candlestick charts using a synthesis technique, so that price history is presented as images. To avoid expensive annotation, we choose the unsupervised CAE for stock feature learning on the synthetic candlestick images.

Hence, the first novelty of this study is exploiting deep learning (i.e. a CAE) to encode stock time series. Compared with raw time series, deeply learned features can effectively capture (i) nonlinear stock dynamics and semantics and (ii) translation invariance. Similarity measurements based on deep features can therefore overcome the aforementioned weaknesses of most existing measurements. In addition, we contribute a valuable new signal, the deep feature, to the investment decision community, where new effective signals are important for risk hedging. Though some deep learning models, such as LSTM [13] and RNN [14], have been applied to portfolio optimisation, they use raw time series rather than charts as input.

[Figure: time series (e.g. 3.12, 3.37, ...) → draw → CAE → 512D → clustering → best → PORTFOLIO]
Fig. 1. Schematic illustration of our investment decision pipeline. The architecture of CAE is detailed in Fig. 2.

[Figure: VGG16-based encoder with average pooling producing "our feature" (512D); deconvolutional decoder starting from a 784D layer and reconstructing a 224 × 224 image]
Fig. 2. CAE overview. An encoder (top) and decoder (bottom) framework. The 512D feature following average pooling provides our representation for clustering and portfolio construction.

Second, motivated by the momentum effect [15], we construct a novel portfolio generation pipeline including: (1) deep feature learning by visual interpretation of price history, (2) clustering the stocks based on the similarity computed on deep features to provide a data-driven segmentation of the market, and (3) actual portfolio construction. For visual representation learning, we generate millions of training images (synthetic candlestick charts), which are fed to a deep CAE for feature learning. In the next clustering step, we aim to segment the market into diverse sectors in a data-driven way. This is important for risk reduction, because it allows a well-diversified portfolio to be selected [16, 17]. The similarity embedded in the clustering method is computed using deep features. Popular clustering methods such as K-means are not suitable here because they are non-deterministic and/or require a pre-defined number of clusters. In particular, non-deterministic methods are not acceptable to real financial users. To address this, we adapt the modularity optimisation method [18], originally designed for network community structure, to stock clustering. Finally, we perform portfolio construction by the simple yet effective approach of choosing the best stock within each cluster according to its Sharpe ratio [19]. As we will see in our evaluation, this portfolio selection strategy combines high returns with low risk.
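Problem (c) can be made concrete with a short NumPy illustration. It uses synthetic data and hypothetical function names, and it only demonstrates the weakness of a same-day linear metric; it is not the authors' CAE method. A "supplier" series that repeats a "leader" series three days later, as in the Apple example above, has a low same-day Pearson correlation, and the linear metric recovers the relationship only if the lag is searched for explicitly.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation of two equal-length 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def best_lagged_correlation(leader, follower, max_lag):
    """Return (correlation, lag) maximising Pearson correlation over lags 0..max_lag.

    A lag k compares leader[t] with follower[t + k], i.e. it tests whether
    `follower` repeats `leader`'s moves k days later.
    """
    best_corr, best_lag = pearson(leader, follower), 0
    for lag in range(1, max_lag + 1):
        corr = pearson(leader[:-lag], follower[lag:])
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return best_corr, best_lag


rng = np.random.default_rng(0)
# Hypothetical daily returns of a "leader" stock over 60 trading days.
leader = rng.normal(0.0, 0.01, size=60)
# Hypothetical "supplier" whose returns repeat the leader's 3 days later.
follower = np.concatenate([rng.normal(0.0, 0.01, size=3), leader[:-3]])

same_day = pearson(leader, follower)
corr, lag = best_lagged_correlation(leader, follower, max_lag=5)
print(f"same-day correlation: {same_day:.2f}")
print(f"best lag: {lag} days, correlation: {corr:.2f}")
```

The explicit lag search is exactly the hand-crafted step that the paper argues convolutional features make unnecessary, since convolution responds to a pattern wherever it appears in the window.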
2 Methodology
Our overall investment decision pipeline includes three main modules: deep feature learning, clustering, and portfolio construction. For deep feature learning, raw 4-channel time series data describing stock price history are converted to standard candlestick charts. These charts are fed into deep CAEs for visual feature learning. The learned features provide a vector embedding of a historical time series that captures key quantitative and semantic information. Next, we cluster the features in order to provide a data-driven segmentation of the market that underpins the subsequent selection of a diverse portfolio. Many common clustering methods are not suitable here because they are non-deterministic or require predefinition of the number of clusters. Thus we adapt modularity optimisation for this purpose. Note that the stock similarity embedded in our clustering method is computed using the nonlinear deep features. Finally, we perform portfolio construction by choosing the stocks with the best performance, measured by the Sharpe ratio [19], from each cluster. The overall pipeline is summarised schematically in Fig. 1. Each component is discussed in more detail in the following sections.

2.1 Deep Feature Learning with CAEs
Chart Encoding. To realise an algorithmic portfolio construction method based on visual interpretation of stock charts, we need to convert raw price history data to an image representation. Our raw data for each stock is a 4-channel time series (the lowest, the highest, opening and closing price for the day) over a 20-day sequence. We use computer graphics techniques to convert these to a candlestick chart represented as an RGB image, as shown in Figs. 1 and 2. The whisker plots describe the four raw channels, with colour coding describing whether the stock closed higher (green) or lower (red) than it opened. An encoded candlestick chart image thus provides the visual representation of one stock over a 20-day window for subsequent visual interpretation by our deep learning method.

Convolutional Autoencoder. Our CAE architecture is summarised in Fig. 2. It is based on the landmark VGG network [20], specifically VGG16, a highly successful architecture initially proposed for visual recognition. To adapt it for use as a CAE encoder, we remove the final 4096D fully connected (FC) layers from VGG16 and replace them with an average pooling layer to generate a single 512D feature. The decoder is a 7-layer deconvolutional network that starts with a 784D layer fully connected to the 512D embedding layer; the following 6 up-sampling deconvolution layers then reconstruct the input from the 512D feature. When trained with a reconstruction objective, the CAE learns to compress input images into the 512D bottleneck in a manner that preserves as much information as possible, so that the input can be reconstructed. Thus this single 512D vector encodes the 20-day 4-channel price history of the stock and provides the representation for further processing (clustering and portfolio construction).
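The chart-encoding step can be sketched as a small NumPy rasteriser. This is an illustration under stated assumptions, not the authors' synthesis code: the channel order [low, high, open, close], the black background, the linear price scaling over the window, the 60% body width and the name `render_candlestick` are all hypothetical choices. Only the 20-day window, the four channels, the green/red colour rule and the 224 × 224 RGB output size (Section 3.1) come from the paper.

```python
import numpy as np

# Hypothetical colour values: the paper only states green = closed higher, red = closed lower.
GREEN = np.array([0, 255, 0], dtype=np.uint8)
RED = np.array([255, 0, 0], dtype=np.uint8)


def render_candlestick(prices, size=224):
    """Rasterise an (n_days, 4) array of [low, high, open, close] prices.

    Returns a (size, size, 3) uint8 RGB image on a black background,
    with one candle per day from left to right.
    """
    prices = np.asarray(prices, dtype=float)
    n_days = prices.shape[0]
    p_min, p_max = prices[:, 0].min(), prices[:, 1].max()
    span = p_max - p_min if p_max > p_min else 1.0

    def to_row(price):
        # Linear price scale over the window: p_max -> row 0 (top), p_min -> bottom row.
        return int(round((p_max - price) / span * (size - 1)))

    img = np.zeros((size, size, 3), dtype=np.uint8)
    slot = size // n_days              # horizontal pixels allotted to each day
    body_w = max(1, int(slot * 0.6))   # candle body width (hypothetical 60% of slot)
    for day, (low, high, open_, close) in enumerate(prices):
        colour = GREEN if close > open_ else RED  # closed higher -> green, otherwise red
        x_left = day * slot + (slot - body_w) // 2
        x_centre = day * slot + slot // 2
        # Whisker: 1-pixel vertical line from the day's high down to its low.
        img[to_row(high):to_row(low) + 1, x_centre] = colour
        # Body: filled rectangle spanning the open and close prices.
        top, bottom = sorted((to_row(open_), to_row(close)))
        img[top:bottom + 1, x_left:x_left + body_w] = colour
    return img


# Usage with a synthetic 20-day price path (hypothetical data).
rng = np.random.default_rng(1)
close = 100 + np.cumsum(rng.normal(0.0, 1.0, size=20))
open_ = np.concatenate([[100.0], close[:-1]])  # open at the previous close
high = np.maximum(open_, close) + rng.uniform(0.0, 1.0, size=20)
low = np.minimum(open_, close) - rng.uniform(0.0, 1.0, size=20)
chart = render_candlestick(np.column_stack([low, high, open_, close]))
print(chart.shape, chart.dtype)  # (224, 224, 3) uint8
```

Scaling each chart to its own 20-day price range, as assumed here, makes the image depend on the shape of the price path rather than its absolute level, which is one plausible way to let similar patterns from differently priced stocks look alike.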
2.2 Clustering
We next aim to provide a clustering method for diversified, and hence low-risk, portfolio selection. As discussed, many existing clustering methods are non-deterministic or require pre-specification of the number of clusters, which makes them unsuited to our application. To solve these problems, we introduce the network modularity method [18] to find the cluster structure of the stocks: each stock is a node, and the link between each pair of stocks is weighted by the cosine similarity between their learned CAE features. Modularity is defined as the fraction of the links that fall within the given groups minus the expected fraction if links were distributed at random. Modularity optimisation [18], originally used for detecting community structure in networks, produces clusters as its output. Specifically, the optimisation operates on a graph (one 20-day history of the entire market in our case) and updates the grouping of stocks until maximum modularity is achieved, at which point it terminates. Thus it does not need a specified number of clusters and is not affected by initial node selection.

2.3 Portfolio Construction and Backtesting
Given the learned stock clustering (market segmentation), we construct a complete portfolio by picking diverse yet high-return stocks, and evaluate the result.

Stock Performance. Return (profit on an investment) is defined as $r_t = (V_f - V_i)/V_i$, where $V_f$ and $V_i$ are the final and initial values, respectively. For example, to compute the daily stock return, $V_f$ and $V_i$ are the closing prices of today and yesterday, respectively. We measure the performance of a particular stock over a period using the Sharpe ratio [19], $s = r/\sigma_r$, where $r$ is the mean return and $\sigma_r$ is the standard deviation of returns over that period. Thus the Sharpe ratio $s$ encodes a trade-off between return and stability. Maximum drawdown (MDD) measures the decline from a peak during a specific period of investment: $\mathrm{MDD} = (V_t - V_p)/V_p$, where $V_t$ and $V_p$ are the trough and peak values, respectively.

Training and Testing. For every 20 trading days, we cluster all the stocks. To construct a portfolio, we then choose the stock with the highest Sharpe ratio [19] within each cluster. We then hold the selected portfolio for 10 days. Over these 10 days, we evaluate the portfolio by computing the compound return of each selected stock. The overall return of a portfolio is the average compound return of all the selected stocks. We use a stride of 10 days. Portfolio selection and return computation are analogous to training and testing in machine learning, respectively.

Fund Allocation. Since our clustering method discovers the number of clusters in a data-driven way, we may obtain different numbers of clusters in different trading periods. Assume that we obtain $K_1$ clusters in one period and will select $K_2$ stocks to construct one portfolio. Let $Q$ and $R$ denote the quotient and remainder of $K_2/K_1$, i.e. $Q = \lfloor K_2/K_1 \rfloor$ and $R = K_2 - Q K_1$. The $K_2$ stocks are picked by taking (i) $Q$ stocks from each of the $K_1$ clusters and (ii) the remaining $R$ best-performing stocks across all $K_1$ clusters. We then allocate $1/K_2$ of the fund equally to each of the chosen stocks.

3 Experiments
We first introduce our dataset and experimental settings. We then analyse the outputs of feature learning. Finally, we compare our whole investment strategy (feature extraction, clustering, portfolio optimisation) with alternatives.

3.1 Dataset and Settings
For evaluation, we use the stock data of the Financial Times Stock Exchange 100 Index (FTSE 100), a share index of the 100 companies listed on the London Stock Exchange with the highest market capitalisation. We use all the stocks in the FTSE 100 from 4th Jan 2000 to 14th May 2017. Stock prices are adjusted for stock splits, dividends and distributions. Every 20-day 4-channel time series generates a standard candlestick chart; we generate 400K FTSE 100 charts in all. The training images for our CAE are candlestick charts rendered as 224 × 224 images to suit our VGG16 architecture [20]. During training, the batch size is 64 and the learning rate is set to 0.001; the learning rate is decreased by a factor of 0.1 once the network converges.

3.2 Qualitative Results
Visualising and Understanding Deep Features. The features of one year (2012) for a given stock are concatenated to form a new feature. These features of all the stocks are visualised in Fig. 3 using t-Distributed Stochastic Neighbour Embedding (t-SNE) [21]. Each colour indicates one industrial sector as defined by Bloomberg. Fig. 3 shows that stocks with similar semantics (industrial sector) are represented close to each other in the learned feature space; for example, Materials-related stocks are clustered together. This illustrates the efficacy of our CAE and learned feature for capturing semantic information about stocks.

3.3 Quantitative Results
For quantitative evaluation, we apply 7 measures: total return, daily Sharpe ratio, maximum drawdown, daily/monthly/yearly mean return, and win years. Win years indicates the percentage of years in which the portfolio makes a profit. The other measures are defined in Section 2.3 (Portfolio Construction and Backtesting). We choose $K_2 = 5$ stocks to construct all the portfolios compared.

Comparison with FTSE 100 Index. We perform backtesting to compare our full portfolio optimisation strategy against the market benchmark (FTSE 100 Index); the comparison is shown in Fig. 4. Fig. 4(a) shows the comparison over a long-term trading period (4K trading days, 31/01/2000 to 06/10/2016), demonstrating the overall effectiveness of our strategy. Note that it is very difficult for funds to consistently outperform the market index over an extended period of time due
to the complexity and diversity of market variations. In Fig. 4(b)-(d) we show specific shorter-term periods in which the market behaves very differently: down-up (b), flat (c) and bullish (d). The overall dynamic trends of our strategy reflect the conditions of the market (meaning that the stocks selected by our strategy are representative of the market), yet we outperform the market across a diverse range of conditions (b-d) and over a long time period (a).

Fig. 3. t-SNE visualisation of FTSE 100 CAE features. Each colour indicates one industrial sector. The stocks on the right are all from the Materials sector: RRS.L (Randgold Resources Limited), FRES.L (Fresnillo PLC), ANTO.L (Antofagasta PLC), BLT.L (BHP Billiton Ltd), AAL.L (Anglo American PLC), RIO.L (Rio Tinto Group), KAZ.L (KAZ Minerals), EVR.L (EVRAZ PLC).

Fig. 4. Our portfolio vs FTSE 100 Index.

Feature and Clustering. We evaluate features and clustering methods over a long-term period (4K trading days). From Table 1, the total return of our method (D-M: deep feature + modularity-based clustering) is higher than that of R-M (raw time series + modularity-based clustering): 283.5% vs 208.8%. This indicates that the deeply learned feature captures richer information, which is more effective for portfolio optimisation than raw time series. D-M also beats R-M on daily Sharpe ratio, on daily, monthly and yearly mean return and on win years, although R-M has a smaller maximum drawdown (-55.6% vs -60.5%). In terms of clustering method, our modularity optimisation works better than D-K (deep feature + k-means) in terms of total return and daily Sharpe ratio, showing the effectiveness of modularity-based clustering. As explained in the Introduction, k-means cannot be used for portfolio construction in practice: its results cannot be reproduced because of the randomness of the initial seed. Non-deterministic investment strategies are not acceptable to financial users in practice, as they add another source of uncertainty (risk) that is hard to quantify.

Table 1. Comparison of Features and Clustering Methods (↑: higher is better).

| Measure | R-M | D-M (Ours) | D-K |
|---|---|---|---|
| Total Ret. (↑) | 208.8% | 283.5% | 272.6% |
| Daily Sharpe (↑) | 0.44 | 0.50 | 0.49 |
| Max Drawdown (↑) | -55.6% | -60.5% | -59.0% |
| Daily Mean Ret. (↑) | 9.7% | 11.1% | 10.9% |
| Monthly Mean Ret. (↑) | 8.7% | 10.0% | 9.9% |
| Yearly Mean Ret. (↑) | 9.6% | 10.0% | 11.19% |
| Win Years (↑) | 64.71% | 69.52% | 66.31% |

Comparison with Funds. To further analyse the effectiveness of our strategy, we compare it with well-known public funds in the stock market in Table 2. Specifically, we select 2 big funds (CCA and VXX) and the top 3 best-performing funds (IEO, PXE, PXI) recommended by Yahoo (https://finance.yahoo.com/etfs). Note that the ranking of funds changes over time. The fund data are obtained from Yahoo Finance. Because VXX data start on 20/01/2009, this evaluation is computed over 2K trading days (20/01/2009 to 09/01/2017). From Table 2, our portfolio achieves the highest returns over these 2000 trading days: total (215.4%), daily (16.7%), monthly (16.6%) and yearly (11.8%), showing the strong profitability of our strategy. We also achieve the highest daily Sharpe ratio (0.8), meaning that we effectively balance profitability and variance. We achieve the second-lowest maximum drawdown (-30.9%, behind only CCA at -22.2%), meaning that our method can effectively manage investment risk. Our portfolio makes a profit in most years (62.5%); only PXI has a higher proportion of profitable years (75.0%). This shows the stability of our strategy.

Table 2. Comparison with Well-known Funds

| Measure | CCA | VXX | IEO | PXE | PXI | Ours |
|---|---|---|---|---|---|---|
| Total Ret. | 117.0% | -99.9% | 89.9% | 101.6% | 152.2% | 215.4% |
| Daily Sharpe | 0.7 | -1.1 | 0.4 | 0.4 | 0.6 | 0.8 |
| Max Drawdown | -22.2% | -99.9% | -56.8% | -57.6% | -59.3% | -30.9% |
| Daily Mean Ret. | 10.9% | -67.7% | 12.7% | 13.4% | 15.9% | 16.7% |
| Monthly Mean Ret. | 10.7% | -66.4% | 11.5% | 12.4% | 14.9% | 16.6% |
| Yearly Mean Ret. | 6.5% | -44.6% | 5.0% | 8.2% | 9.4% | 11.8% |
| Win Years | 62.5% | 0.0% | 62.5% | 50.0% | 75.0% | 62.5% |

4 Conclusions
We propose a deep-learning-based investment strategy, which includes: (1) novel stock representation learning by deep CAE encoding of candlestick charts, (2) diversification through modularity-optimisation-based clustering, and (3) portfolio construction by selecting the stock with the best Sharpe ratio in each cluster. Experimental results show that (a) our learned stock feature captures semantic information and (b) our portfolio outperforms the FTSE 100 index and many well-known funds in terms of total return.

Acknowledgements. This work was supported by EPSRC (EP/R026173/1), the European Union's Horizon 2020 research and innovation programme under grant agreement No 640891, and the National Natural Science Foundation of China No. 61773248.
5 References
[1] Harry Markowitz, "Portfolio selection," The Journal of Finance, vol. 7, no. 1, pp. 77–91, 1952.
[2] John L Kelly, "A new interpretation of information rate," Bell Labs Technical Journal, 1956.
[3] Olivier Ledoit and Michael Wolf, "Improved estimation of the covariance matrix of stock returns with an application to portfolio selection," Journal of Empirical Finance, vol. 10, no. 5, pp. 603–621, 2003.
[4] Gary Chamberlain and Michael Rothschild, "Arbitrage, factor structure, and mean-variance analysis on large asset markets," 1982.
[5] Jonathan Masci, Ueli Meier, Dan Cireşan, and Jürgen Schmidhuber, "Stacked convolutional auto-encoders for hierarchical feature extraction," Artificial Neural Networks and Machine Learning–ICANN 2011, pp. 52–59, 2011.
[6] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, 1998.
[7] Lawrence Rabiner and Biing-Hwang Juang, Fundamentals of Speech Recognition, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993.
[8] Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, et al., "The Kaldi speech recognition toolkit," in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, IEEE Signal Processing Society, 2011.
[9] Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, et al., "Deep Speech 2: End-to-end speech recognition in English and Mandarin," in ICML, 2016, pp. 173–182.
[10] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al., "Mastering the game of Go with deep neural networks and tree search," Nature, 2016.
[11] Carven von Bearnensquash, "Paper gestalt," in Secret Proceedings of Computer Vision and Pattern Recognition (CVPR), 2010.
[12] David F. Fouhey, Daniel Maturana, and Rufus von Woofles, "Visually identifying rank," in ACH Special Interest Group on Harry Quetzcoatl Bovik (SIGBOVIK), 2015.
[13] Thomas Fischer and Christopher Krauss, "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, 2017.
[14] Ritika Singh and Shashi Srivastava, "Stock prediction using deep learning," Multimedia Tools and Applications, vol. 76, no. 18, pp. 18569–18584, 2017.
[15] Narasimhan Jegadeesh and Sheridan Titman, "Returns to buying winners and selling losers: Implications for stock market efficiency," The Journal of Finance, vol. 48, no. 1, pp. 65–91, 1993.
[16] SR Nanda, Biswajit Mahanty, and MK Tiwari, "Clustering Indian stock market data for portfolio management," Expert Systems with Applications, 2010.
[17] Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N Mantegna, "Cluster analysis for portfolio optimization," Journal of Economic Dynamics and Control, 2008.
[18] Mark EJ Newman, "Modularity and community structure in networks," Proceedings of the National Academy of Sciences, 2006.
[19] William F Sharpe, "The Sharpe ratio," The Journal of Portfolio Management, 1994.
[20] Karen Simonyan and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[21] Laurens van der Maaten and Geoffrey Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, pp. 2579–2605, 2008.
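The backtesting arithmetic of Section 2.3 (simple return, Sharpe ratio $s = r/\sigma_r$, maximum drawdown and the $Q$/$R$ fund-allocation rule) can be sketched in NumPy as follows. This is a hedged illustration, not the authors' code: the function names and tickers are hypothetical, the standard deviation uses NumPy's default population form, no risk-free rate or annualisation is applied, and both the per-cluster picks and the leftover picks are assumed to be ranked by Sharpe ratio.

```python
import numpy as np


def simple_returns(values):
    """r_t = (V_f - V_i) / V_i for each pair of consecutive values (e.g. daily closes)."""
    v = np.asarray(values, dtype=float)
    return (v[1:] - v[:-1]) / v[:-1]


def sharpe_ratio(returns):
    """s = mean return / standard deviation of returns (population std, no risk-free rate)."""
    r = np.asarray(returns, dtype=float)
    return float(r.mean() / r.std())


def max_drawdown(values):
    """Most negative (V_t - V_p) / V_p, where V_p is the running peak up to the trough V_t."""
    v = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(v)
    return float(((v - running_peak) / running_peak).min())


def allocate(clusters, sharpe, k2):
    """Equal-weight K2 stocks: Q from each of K1 clusters, then the R best leftovers.

    clusters: list of lists of (hypothetical) tickers; sharpe: ticker -> Sharpe ratio.
    Assumes every cluster has at least Q members.
    """
    q, r = divmod(k2, len(clusters))          # quotient and remainder of K2 / K1
    chosen = []
    for members in clusters:
        ranked = sorted(members, key=sharpe.get, reverse=True)
        chosen.extend(ranked[:q])             # Q highest-Sharpe stocks per cluster
    leftovers = [t for members in clusters for t in members if t not in chosen]
    chosen.extend(sorted(leftovers, key=sharpe.get, reverse=True)[:r])
    return {t: 1.0 / k2 for t in chosen}      # 1/K2 of the fund to each stock


prices = [100.0, 110.0, 99.0, 121.0, 108.9]
print(simple_returns(prices))                 # daily returns of a hypothetical stock
print(round(max_drawdown(prices), 3))         # -0.1

clusters = [["AAA", "BBB"], ["CCC"], ["DDD", "EEE", "FFF"]]
sharpe = {"AAA": 0.9, "BBB": 0.4, "CCC": 0.2, "DDD": 0.7, "EEE": 0.8, "FFF": 0.1}
weights = allocate(clusters, sharpe, k2=5)
print(sorted(weights))                        # ['AAA', 'BBB', 'CCC', 'DDD', 'EEE']
```

With $K_1 = 3$ clusters and $K_2 = 5$, the rule gives $Q = 1$ and $R = 2$: one stock per cluster plus the two best remaining stocks. Maximum drawdown is returned as a negative fraction, matching the sign convention of Tables 1 and 2, where a larger (less negative) value is better.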
