
Expert Systems with Applications 158 (2020) 113490


Enhancing a Pairs Trading strategy with the application of Machine Learning

Simão Moraes Sarmento*, Nuno Horta
Instituto de Telecomunicações, Instituto Superior Técnico, Torre Norte, Av. Rovisco Pais, 1, 1049-001 Lisboa, Portugal

Article history:
Received 6 November 2019
Revised 10 March 2020
Accepted 27 April 2020
Available online 4 May 2020

Keywords:
Pairs trading
Market neutral
Machine Learning
Deep learning
Unsupervised learning

Abstract

Pairs Trading is one of the most valuable market-neutral strategies used by hedge funds. It is particularly interesting because it overcomes the arduous process of valuing securities by focusing on relative pricing. By buying a relatively undervalued security and selling a relatively overvalued one, a profit can be made upon the pair's price convergence. However, with the growing availability of data, it became increasingly harder to find rewarding pairs. In this work, we address two problems: (i) how to find profitable pairs while constraining the search space and (ii) how to avoid long decline periods due to prolonged divergent pairs. To manage these difficulties, the application of promising Machine Learning techniques is investigated in detail. We propose the integration of an Unsupervised Learning algorithm, OPTICS, to handle problem (i). The results obtained demonstrate the suggested technique can outperform the common pairs' search methods, achieving an average portfolio Sharpe ratio of 3.79, in comparison with 3.58 and 2.59 obtained by standard approaches. For problem (ii), we introduce a forecasting-based trading model, capable of reducing the periods of portfolio decline by 75%. Yet, this comes at the expense of decreasing overall profitability. The proposed strategy is tested using an ARMA model, an LSTM and an LSTM Encoder-Decoder. This work's results are simulated during varying periods between January 2009 and December 2018, using 5-min price data from a group of 208 commodity-linked ETFs, and accounting for transaction costs.

© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

Pairs Trading is a popular trading strategy widely used by hedge funds and investment banks. It is capable of obtaining profits irrespective of the market direction.

This is accomplished with a two-step procedure. First, a pair of assets whose prices have historically moved together is detected. Then, assuming the equilibrium relationship should persist in the future, the spread between the prices of the two assets is continuously monitored. In case it deviates from its historical mean, the investor shorts the overvalued asset and buys the undervalued one. Both positions are closed upon price convergence.

However, with the growing availability of data, it became increasingly harder to find robust pairs. In this work, we address two problems in specific: (i) how to find profitable pairs while constraining the search space and (ii) how to avoid long decline periods due to prolonged divergent pairs.

The main contributions of this paper are: a novel framework based on the application of Principal Component Analysis (PCA) followed by the OPTICS clustering algorithm to find promising pair combinations; a novel forecasting-based trading model experimented with an ARMA, an LSTM and an LSTM Encoder-Decoder; and empirical evidence of the suitability of ETFs traded in a 5-min setting for Pairs Trading.

The remainder of this document is organized as follows: in Section 2 we introduce the main concepts of Pairs Trading while describing the associated research work. In Section 3, we suggest a new pairs selection framework to address the first problem motivating this research work. In Section 4, we propose a new trading model in response to the second problem introduced. In Section 5, we design the simulation environment to test the proposed approaches. At last, the results obtained are presented in Section 6.

2. Background and related work

Each stage composing a Pairs Trading strategy is described in detail, along with the most relevant related work.

* Corresponding author.
E-mail addresses: [email protected] (S.M. Sarmento), [email protected] (N. Horta).

https://doi.org/10.1016/j.eswa.2020.113490
0957-4174/© 2020 Elsevier Ltd. All rights reserved.

2.1. Pairs selection

The pairs selection stage encompasses finding the appropriate candidate pairs and selecting the most promising ones.

Starting with the quest for appropriate candidates, the investor should first select the securities of interest (e.g. stocks, ETFs, etc.) and search for possible combinations. In the literature, two methodologies are typically suggested for this stage: performing an exhaustive search for all possible combinations among the selected securities (Krauss, Do, & Huck, 2017; Caldeira & Moura, 2013), or arranging them in meaningful groups, usually by sector, and constraining the combinations to pairs formed by securities within the same group (Do & Faff, 2010; Dunis, Giorgioni, Laws, & Rudy, 2010). While the former may find more interesting and unusual pairs, the latter reduces the likelihood of finding spurious relations.

From the candidates found, the investor must define how a pair is deemed eligible for trading. The most common approaches are the distance, correlation, and cointegration approaches.

The distance approach, suggested by Gatev, Goetzmann, and Rouwenhorst (2006), selects pairs which minimize the historic sum of squared distances between the two assets' price series. This method is widely used, but according to Krauss (2017) it is analytically sub-optimal. If p_{i,t} is a realization of the normalized price process P_i = (P_{i,t})_{t∈T} of an asset i, the average sum of squared distances ssd_{P_i,P_j} in the formation period¹ of a pair formed by assets i and j is given by

ssd_{P_i,P_j} = (1/T) Σ_{t=1}^{T} (p_{i,t} − p_{j,t})².  (1)

Thus, an optimal pair would be one that minimizes Eq. (1). However, this implies a zero-spread pair is considered optimal, which logically it may not be, as it would not provide trade opportunities.

The application of Pearson correlation as a selection metric is analyzed by Chen, Chen, Chen, and Li (2017). The authors examine its application on return series with the same data sample used by Gatev et al. (2006) and find that correlation shows better performance, with a reported monthly average of 1.70% raw returns, almost twice as high as the one obtained using the distance approach. Nevertheless, this criterion is not foolproof, as two return-level correlated securities might not share an equilibrium relationship, and divergence reversions cannot be explained theoretically.

At last, the cointegration approach entails selecting pairs for which the two constituents, Y_t and X_t, are found to be cointegrated. If that is the case, the series constructed as

S_t = Y_t − βX_t,  (2)

where β is the cointegration factor, must be stationary, by definition. Defining the spread series in this way is particularly convenient since under these conditions it is expected to be mean-reverting, meaning that every spread divergence is expected to be followed by convergence. Hence, this approach finds econometrically more sound equilibrium relationships. Vidyamurthy (2004) proposes a set of heuristics for cointegration-based strategies. Furthermore, Huck and Afawubo (2015) perform a comparison study between the cointegration approach and the distance approach and find that the former significantly outperforms the latter.

2.2. Trading models

The most common trading model follows from Gatev et al. (2006), and can be described as indicated below:

i. Calculate the pair's spread (defined by the authors as S_t = Y_t − X_t) mean, μ_s, and standard deviation, σ_s, during the formation period.
ii. Define the model thresholds: the threshold that triggers a long position, α_L, the threshold that triggers a short position, α_S, and the exit threshold, α_exit, that defines the level at which a position should be exited.
iii. Monitor the evolution of the spread, S_t, and control if any threshold is crossed.
iv. In case α_L is crossed, go long the spread by buying Y and selling X. If α_S is crossed, short the spread by selling Y and buying X. Exit the position when α_exit is crossed.

The simplicity of this model is particularly appealing, motivating its frequent application in the field. Nonetheless, the entry points defined may not be optimal, since no information concerning the subsequent spread direction is incorporated in the trading decision.

Some efforts have emerged trying to propose more robust trading models. Techniques from different fields, such as stochastic control theory, statistical process modelling, and Machine Learning, have been studied.

In particular, the results obtained by Machine Learning approaches have proved very promising. Dunis, Laws, and Evans (2006), Dunis, Laws, and Evans (2009) and Dunis, Laws, Middleton, and Karathanasopoulos (2015) explore the application of Artificial Neural Networks (ANN) to forecast the spread change for three famous spreads. Thomaidis, Kondakis, and Dounias (2006) propose an experimental statistical arbitrage system based on Neural Network Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models for modelling the mispricing-correction mechanism between relative prices composing a pair. Huck (2009) and Huck (2010) use Recurrent Neural Networks (RNN) to generate a one-week-ahead forecast, from which the predicted returns are calculated. Krauss et al. (2017) analyze the effectiveness of deep neural networks, gradient-boosted trees and random forests in the context of statistical arbitrage using S&P 500 stocks. More recently, Kim and Kim (2019) explored the potential of applying Deep Reinforcement Learning in a Pairs Trading setup, and obtained satisfactory results in comparison with more traditional methods.

Nevertheless, Machine Learning techniques still remain fairly unexplored in this field, and the results obtained indicate this is a promising direction for future research.

3. Proposed pairs selection framework

At this research stage, we aim to explore how an investor may find promising pairs without being exposed to the adversities of the common pairs searching techniques. On the one hand, if the investor limits his search to securities within the same sector, he is less likely to find pairs not yet being traded in large volumes, leaving a small margin for profit. But on the other hand, if the investor does not impose any limitation on the search space, he might have to explore excessive combinations and possibly find spurious relations.

We intend to reach an equilibrium with the application of an Unsupervised Learning algorithm, on the expectation that it will infer meaningful clusters of assets from which to select the pairs.

3.1. Dimensionality reduction

The first step in this direction consists in finding a compact representation for each asset, starting from its price series. The application of PCA is proposed. PCA is a statistical procedure

¹ The formation period corresponds to the period in which securities are being analyzed to form potential pairs.

that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables, the principal components. Each component can be seen as representing a risk factor.

We suggest the application of PCA to the normalized return series, defined as

R_{i,t} = (P_{i,t} − P_{i,t−1}) / P_{i,t−1},  (3)

where P_{i,t} is the price series of asset i. Using the price series directly may not be appropriate due to the underlying time trends. The number of principal components used defines the number of features in each asset representation. Considering that an Unsupervised Learning algorithm will be applied to these data, the number of features should not be large. High data dimensionality presents a dual problem. The first is that in the presence of more attributes, the likelihood of finding irrelevant features increases. Additionally, there is the problem of the curse of dimensionality, caused by the exponential increase in volume associated with adding extra dimensions to the space. According to Berkhin (2006), this effect starts to be severe for dimensions greater than 15. Taking this into consideration, the number of dimensions is upper bounded at 15 and chosen empirically, as indicated in Section 6.1.

3.2. Unsupervised learning

Having constructed a compact representation for each asset, a clustering technique may be applied. To decide which algorithm is most appropriate, some problem-specific requisites are first defined:

• No need to specify the number of clusters in advance.
• No need to group all securities.
• Strict assignment that accounts for outliers.
• No assumptions regarding the clusters' shape.

By making the number of clusters data-driven, we introduce as little bias as possible. Furthermore, outliers should not be incorporated in the clusters, and therefore grouping all assets should not be enforced. In addition, the assignment should be strict, otherwise the number of possible pair combinations increases, which conflicts with the initial goal. Finally, due to the nonexistence of prior information indicating that the clusters should be regularly shaped, the selected algorithm should not assume this.

Taking into consideration the previously described requirements, a density-based clustering algorithm seems an appropriate choice. It forms clusters with arbitrary shapes, and thus no Gaussianity assumptions need to be adopted. It is naturally robust to outliers, as it does not group every point in the data set. Furthermore, it requires no specification of the number of clusters.

The DBSCAN algorithm is the most influential in this category. Briefly, DBSCAN detects clusters of points based on their density. To accomplish that, two parameters need to be defined: ε, which specifies how close points should be to each other to be considered "neighbours", and minPts, the minimum number of points required to form a cluster. From these two parameters, in conjugation with some concepts that we omit here², clusters of neighbouring points are formed. Points falling in regions with fewer than minPts points within a circle of radius ε are classified as outliers, hence not affecting the results. Nevertheless, DBSCAN still carries one drawback: the algorithm is appropriate under the assumption that clusters are evenly dense. However, if regions in space have different densities, a fixed ε may be well adapted to one given cluster density, but it might be unrealistic for another, as depicted in Fig. 1. It is evident that clusters A, B, and C may be found using the same ε, but A1 and A2 would not be distinguished.

The OPTICS algorithm addresses this problem. OPTICS is based on DBSCAN, with the introduction of some notions that accomplish a varying-ε implementation. In this enhanced setting, the investor is only required to specify the parameter minPts, as the algorithm is capable of detecting the most appropriate ε′ for each cluster³. Therefore, we propose using OPTICS not just to account for varying cluster densities but also to facilitate the investor's task.

3.3. Pairs selection criteria

Having generated the clusters of assets from which to find the candidate pairs, it is still necessary to define a set of rules to select those eligible for trading. It is critical that the pairs' equilibrium persists. To enforce this, we propose the unification of methods applied in separate research work. According to the proposed criteria, a pair is selected if it complies with the four conditions described next.

First, a pair is only selected if the two securities forming the pair are cointegrated. To test this condition, we propose the application of the Engle-Granger test (Engle & Granger, 1987), due to its simplicity. To deal with the test's reliance on the dependent variable, we propose that the test is run for both possible selections of the dependent variable and that the combination generating the lowest t-statistic is selected.

Secondly, to provide more confidence in the mean-reversion character of the spread, an extra validation step is suggested. We resort to the concept of the Hurst exponent, H, which quantifies the relative tendency of a time series either to regress strongly to the mean or to follow a trend (Kleinow, 2002). If H lies in the range 0-0.5, the time series is mean-reverting. Hence, we require that a pair's spread has a Hurst exponent smaller than 0.5.

In third place, we intend to discard stationary pairs with unsuitable timings. A mean-reverting spread by itself does not necessarily generate profits. There must be coherence between the mean-reversion time and the trading period. The half-life of mean reversion is an indicator of how long it takes for a time series to mean-revert (Chan, 2013). Therefore, we propose filtering out pairs for which the half-life takes extreme values: less than one day or more than one year.

Lastly, we suggest enforcing that every spread crosses its mean at least twelve times per year, enforcing one trade per month, on average.

3.4. Framework diagram

The three building blocks of the proposed framework have been described. Fig. 2 illustrates how they connect. As we may observe, the initial state should comprise the price series for all the possible pairs' constituents. Then, by reducing the data dimensionality, each security may be described not just by its price series but also by the compact representation emerging from the application of PCA (State 1). Using this simplified representation, the OPTICS algorithm is capable of organizing the securities into clusters (State 2). Finally, we may search for pair combinations within the clusters and select those that comply with the conditions imposed (State 3).

² Interested readers may refer to Ester, Kriegel, Sander, and Xu (1996).
³ This description is very simplified. We suggest the interested readers refer to Ankerst, Breunig, Kriegel, and Sander (1999).
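Two of the filters above, the half-life of mean reversion and the minimum number of mean crossings, can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: function names are ours, and the half-life estimator uses the standard lag-1 regression described in Chan (2013).

```python
import math

def half_life(spread):
    """Estimate the half-life of mean reversion (Chan, 2013):
    regress the one-step change of the spread on its lagged level,
    then convert the (negative) slope into a half-life."""
    lagged = spread[:-1]
    delta = [b - a for a, b in zip(spread[:-1], spread[1:])]
    mx = sum(lagged) / len(lagged)
    my = sum(delta) / len(delta)
    slope = sum((x - mx) * (y - my) for x, y in zip(lagged, delta)) \
        / sum((x - mx) ** 2 for x in lagged)
    return -math.log(2) / slope  # slope < 0 for a mean-reverting spread

def enough_mean_crossings(spread, min_crossings=12):
    """Check that the spread crosses its mean at least `min_crossings`
    times (the criterion above asks for twelve crossings per year)."""
    mu = sum(spread) / len(spread)
    centered = [s - mu for s in spread]
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return crossings >= min_crossings
```

For a spread that halves every step (e.g. 16, 8, 4, 2, 1), the fitted slope is −0.5 and the estimated half-life is ln 2 / 0.5 ≈ 1.39 periods.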

Fig. 1. Clusters with varying density. Adapted from: Ankerst et al. (1999).

4. Proposed trading model

We proceed to address the second problem this work intends to explore: handling long decline periods due to prolonged divergent pairs.

4.1. Trading model

A potential alternative to continuously monitoring the spread and tracking deviations consists of modelling the spread directly. This way, a prediction can be made regarding how the spread will vary in the future, and a position is only entered if the predicted conditions are favourable. By taking advantage of a time-series forecasting algorithm to predict the spread at the next time instant, we may calculate the expected spread percentage change at time t+1 as

Δ_{t+1} = (Ŝ_{t+1} − S_t) / S_t × 100,  (4)

where S and Ŝ correspond to the real and the predicted spread, respectively. When the absolute value of the predicted change is larger than a predefined threshold, a position may be entered, on the expectation that the spread will suffer an abrupt movement from which the investor can benefit. Assuming the investor is not holding a position yet, the next position, P_{t+1}, may be described according to

P_{t+1} = { Go long, if Δ_{t+1} ≥ α_L;  Go short, if Δ_{t+1} ≤ α_S;  Wait, otherwise.  (5)

Once a position is entered, it is maintained while the predicted spread direction does not change. When it switches, the position is exited.

This strategy defines the basis of the proposed trading model. It is left to describe how the thresholds (α_L, α_S) should be calculated. A possible approach consists of framing an optimization problem and trying to find the profit-maximizing values. However, this approach is rejected due to its risk of data-snooping and unnecessary added complexity. We propose a simpler, non-iterative, data-driven approach. We start by obtaining f(x), the spread percentage change distribution during the formation period, given that the spread percentage change at time t is defined as

x_t = (S_{t+1} − S_t) / S_t × 100.

From f(x), the set of negative percentage changes, f⁻(x), and positive percentage changes, f⁺(x), are considered separately. Since the proposed model targets abrupt changes but also requires that they occur frequently enough, looking for the extreme quantiles seems an adequate solution. Therefore, we recommend using the top decile and quintile from f⁺(x) as candidates for defining α_L, and the bottom ones, from f⁻(x), for defining α_S. The quintile-based and decile-based thresholds are both tested in the validation set, and the most optimistic combination is adopted. Formally,

{α_S, α_L} = argmax_q R^val(q),  q ∈ { {Q_{f⁻(x)}(0.20), Q_{f⁺(x)}(0.80)}, {Q_{f⁻(x)}(0.10), Q_{f⁺(x)}(0.90)} },  (6)

where R^val is the return obtained in the validation set.

To summarize, the model construction follows the diagram illustrated in Fig. 3. For each pair, the investor starts by training the forecasting algorithms to predict the spread. Furthermore, the decile-based and quintile-based thresholds are collected to integrate the trading model. Having fitted the forecasting algorithms and obtained the two combinations for the thresholds

Fig. 2. Pairs selection diagram.
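The percentage-change definition and the quantile-based threshold candidates of Eq. (6) can be sketched as below. This is an illustrative sketch under our own naming: the nearest-rank quantile is a simplification, and in the actual model the two candidate pairs would be compared on validation-set return.

```python
def pct_changes(spread):
    """Spread percentage changes, x_t = (S_{t+1} - S_t) / S_t * 100."""
    return [(b - a) / a * 100 for a, b in zip(spread[:-1], spread[1:])]

def quantile(values, q):
    """Nearest-rank quantile of a list (a simplification for the sketch)."""
    xs = sorted(values)
    return xs[min(int(q * len(xs)), len(xs) - 1)]

def candidate_thresholds(spread):
    """Candidate (alpha_S, alpha_L) pairs from the quintiles/deciles of the
    negative and positive parts of the change distribution (Eq. 6)."""
    xs = pct_changes(spread)
    neg = [x for x in xs if x < 0]
    pos = [x for x in xs if x > 0]
    return [
        (quantile(neg, 0.20), quantile(pos, 0.80)),  # quintile-based
        (quantile(neg, 0.10), quantile(pos, 0.90)),  # decile-based
    ]

def next_position(predicted_change, alpha_l, alpha_s):
    """Entry rule of Eq. (5) when no position is currently held."""
    if predicted_change >= alpha_l:
        return "long"
    if predicted_change <= alpha_s:
        return "short"
    return "wait"
```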



Fig. 3. Proposed model construction diagram.

(State 1), the model is applied to the validation set. From the validation performance, the best threshold combination is selected (State 2). At this point, the model is ready to be applied to unseen data.

An application example is illustrated in Fig. 4. For the sake of illustration, the forecasting has perfect accuracy, meaning the positions can be set in optimal conditions.

4.2. Applied forecasting algorithms

Forecasting algorithms commonly applied in the literature can be divided into two major classes: parametric and non-parametric models. The former assume that the underlying process can be described using a small number of parameters. The latter make no structural assumptions about the underlying process. We propose the application of a benchmark parametric approach, the autoregressive moving average (ARMA), and two non-parametric models, the Long Short-Term Memory (LSTM) and the LSTM Encoder-Decoder. This will allow inferring to what extent the strategy's profitability depends on the complexity of the time-series forecasting algorithm. The justification for the choices adopted is described next.

Although financial time series are very complex in nature (Si & Yin, 2013), the ones under analysis are by construction stationary, as they correspond to a linear combination of cointegrated price series. Thus, it is fair to ask if an ARMA model may succeed at forecasting these series. This model describes a stationary stochastic process as the composition of two polynomials, the autoregression AR(p) and the moving average MA(q), as

X_t = c + ε_t + Σ_{i=1}^{p} φ_i X_{t−i} + Σ_{i=1}^{q} θ_i ε_{t−i},  (7)

where p and q represent the orders of the polynomials.

However, there is still an underlying motivation for applying more complex models, such as Artificial Neural Networks (ANN). First, ANNs have been an object of attention in many different fields, which makes their application in this context an interesting case study. Furthermore, ANN-based models have shown promising results in predicting financial time series data in general (Cavalcante, Brasileiro, Souza, Nobrega, & Oliveira, 2016). From the vast amount of existing ANN configurations, the LSTM architecture (Hochreiter & Schmidhuber, 1997) is deemed appropriate due to its capability of learning non-linear representations of the data while memorizing long sequences. LSTMs assume the existence of a sequential dependency among inputs, and previous states might affect the decision of the neural network at a different point in time.

Furthermore, from a trading perspective, it might be beneficial to collect information regarding the prediction not just of the next instant of a time series but also of later time steps. An LSTM Encoder-Decoder architecture, as presented by Sutskever, Vinyals, and Le (2014), is naturally fitted to such scenarios. This architecture comprises two LSTMs: one for encoding the input sequence into a fixed-length vector, the encoder, and a second for decoding the fixed-length vector and outputting the predicted sequence, the decoder, as illustrated in Fig. 5.

In this multi-step forecasting scenario, the trading rules are adapted by simply calculating the prediction change N steps in advance. Likewise, the thresholds α_L and α_S should be calculated with respect to the distribution of the percentage change between x(t+N) and x(t).

Given the limited computational resources, the neural network models' tuning is constrained to a set of the most relevant variables: data sequence length, the number of hidden layers and the number of nodes in each hidden layer. Also, early stopping and dropout are applied as regularization techniques.

5. Research design

The research design contemplates two stages, corresponding to each problem being addressed.

5.1. Research Stage 1 – Pairs selection

First, we intend to examine how the three different pairs' search techniques (no grouping, grouping by category, and grouping with OPTICS) compare to each other. For this purpose, the three methodologies are implemented. The proposed pairs selection rules are also constructed and applied for each search technique.

As for the trading setup, since the focus lies on comparing the search techniques relative to each other, we do not worry about meticulously optimizing the trading conditions. Therefore, we apply the standard threshold-based model proposed by Gatev et al. (2006), described in Section 2.2, with the parameters specified in Table 1. The spread's standard deviation, σ_s, and mean, μ_s, are calculated with respect to the entire formation period.

To test the performance of the selected pairs, we implement three different test portfolios, resembling probable trading scenarios. Portfolio 1 considers all the pairs identified in the formation period. Portfolio 2 takes advantage of the feedback collected from running the strategy in the validation set by selecting only the pairs that had a positive return. Lastly, Portfolio 3 simulates the situation in which the investor is limited to investing in a fixed number of k pairs. In such a case, we suggest selecting the top-k pairs

Fig. 4. Example of the proposed forecasting-based strategy.

Fig. 5. LSTM Encoder-Decoder.
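To make Eq. (7) concrete, the sketch below computes a one-step ARMA(p, q) forecast for given coefficients. It is a didactic sketch with our own function name: the coefficients are taken as known (in practice they would be estimated, e.g. by maximum likelihood), and pre-sample errors are assumed to be zero.

```python
def arma_one_step(series, c, phi, theta):
    """One-step-ahead ARMA(p, q) forecast (Eq. 7) with known coefficients.

    phi:   AR coefficients [phi_1, ..., phi_p]
    theta: MA coefficients [theta_1, ..., theta_q]
    Residuals are reconstructed recursively, assuming zero initial errors.
    """
    eps = []
    for t in range(len(series)):
        ar = sum(phi[i] * series[t - 1 - i]
                 for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[i] * eps[t - 1 - i]
                 for i in range(len(theta)) if t - 1 - i >= 0)
        eps.append(series[t] - (c + ar + ma))
    n = len(series)
    ar = sum(phi[i] * series[n - 1 - i]
             for i in range(len(phi)) if n - 1 - i >= 0)
    ma = sum(theta[i] * eps[n - 1 - i]
             for i in range(len(theta)) if n - 1 - i >= 0)
    return c + ar + ma
```

For a pure AR(1) with φ_1 = 0.5 and no noise, the forecast after the series 1, 0.5, 0.25 is 0.125.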

Table 1
Threshold-based model parameters.

Parameters       Values
Long Threshold   μ_s − 2σ_s
Short Threshold  μ_s + 2σ_s
Exit Threshold   μ_s

according to the return obtained in the validation set. We consider k = 10, as it stands in between the choices of Gatev et al. (2006), which uses k = 5 and k = 20. By testing different portfolio constructions, we may not only find the optimal clustering procedure but also evaluate its best conditions of application.

5.2. Research Stage 2 – Trading model

For the second research stage, we aim to compare the robustness of the standard threshold-based model with that of the proposed forecasting-based model, simulated using an ARMA, an LSTM and an LSTM Encoder-Decoder with an output of length two. We suggest first evaluating the forecasting performance of each algorithm. As a benchmark, a naive baseline is considered, which simply outputs Ŷ_{t+1} = Y_t. Then, we evaluate the trading strategy itself, using the pairs search technique that proved most appealing according to the results obtained in the previous research stage. As for the test portfolio, we consider using Portfolio 2.

5.3. Dataset

Trading ETFs is considered adequate since they are easy to trade, as they trade like stocks, and because their dynamics are expected to change much more slowly than those of a single stock. Adding to that, research in the field (Chen et al., 2017; Perlin, 2007) obtained more robust mean-reverting time series by using a linear combination of stocks to form each component of the spread. We presume using ETFs may be a proxy to accomplish that more efficiently.

This work fixates on a subset of ETFs which track single commodities, commodity-linked indexes or companies focused on exploring a commodity. This reduces the number of possible pairs, thus making the strategy computationally more efficient and leaving space for careful analysis of the selected pairs.

A total of 208 commodity-linked ETFs are available for trading in January 2019, for which five categories may be identified based on their composition (Agriculture, Broad Market, Energy, Industrial Metals and Precious Metals)⁴. It should be noted that survivorship bias is induced by considering just the ETFs active throughout the entire period. Nevertheless, no survivorship-bias-free dataset containing the intended ETFs was available.

We considered price series data with 5-min frequency. The motivation for using intraday data is threefold. First, with finer granularity, the entry and exit points can be defined with more precision, providing opportunities for higher profit margins.

⁴ This information is collected from ETF.com (2019).
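Under the Table 1 parameters, the threshold-based model reduces to a small state machine. The sketch below is our own illustration of those rules (entry at μ_s ± 2σ_s, exit when the spread reverts to μ_s), not code from the paper.

```python
import statistics

def make_threshold_trader(formation_spread, k=2.0):
    """Build a decision function from the formation-period spread,
    using the Table 1 thresholds: long at mu - k*sigma, short at
    mu + k*sigma, exit when the spread reverts to mu."""
    mu = statistics.mean(formation_spread)
    sigma = statistics.stdev(formation_spread)
    long_thr, short_thr, exit_thr = mu - k * sigma, mu + k * sigma, mu

    def decide(spread_t, position=None):
        if position is None:
            if spread_t <= long_thr:
                return "long"    # buy Y, sell X
            if spread_t >= short_thr:
                return "short"   # sell Y, buy X
            return None
        if (position == "long" and spread_t >= exit_thr) or \
           (position == "short" and spread_t <= exit_thr):
            return "exit"
        return position          # hold until mean reversion

    return decide
```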
Secondly, we may detect intraday mean-reversion patterns that would not be found otherwise. Lastly, it provides more data samples, allowing us to train complex forecasting models with less risk of overfitting.

The periods considered for simulating each research stage are illustrated in Fig. 6. There are essentially two possible configurations: (i) the 3-year-long formation periods, and (ii) the 9-year-long formation period. In both cases, the second-to-last year is used for validating the performance before running the strategy on the test set. We define a 1-year-long trading period, based on the findings of Do and Faff (2010), who claim that profitability can be increased if the initial 6-month trading period proposed in Gatev et al. (2006) is extended to 1 year.

Configuration (i) is adopted when using the threshold-based trading model (described in Table 1). A formation period of three years seems appropriate. Although this period is slightly longer than what is commonly found in the literature⁵, we decide to proceed on the basis that a longer period may identify more robust pairs. Configuration (ii) is used for simulating the forecasting-based trading model, thus providing more formation data to fit the forecasting algorithms. In this case, the first 8 years are used for training, as indicated in Fig. 6. Each series contains approximately two hundred thousand data points.

For the first research stage, we propose using three different periods to gather more statistical evidence on the results obtained. In the second research stage, this is not feasible due to the computational burden of training the forecasting algorithms. Hence, we consider one period using configuration (i), and a second period using configuration (ii), to evaluate how the standard model could have performed in the same test period.

As preprocessing steps, we start by removing all ETFs with missing values throughout the period being considered. Then, we remove ETFs that do not meet the minimum liquidity requisites, to ensure the considered transaction costs associated with the bid-ask spread are realistic⁶. The minimum liquidity requisites follow the criterion adopted in Gatev et al. (2006) and Do and Faff (2012), which discards all ETFs not traded during at least one day.

5.4. Trading simulation

Concerning the portfolio construction, we impose that all pairs are equally weighted in the portfolio. In addition, we must define the capital allocation within each pair. We consider that the capital resulting from the short position is immediately applied in the long position. This type of leverage is adopted by most hedge funds. On this basis, we construct a framework which ensures that every trade is set with just $1. This is accomplished by imposing that

max(asset_1, asset_2) = $1,

where asset_1 and asset_2 represent the capital invested in each pair's constituent, as illustrated in Fig. 7. Although the gross exposure is higher, a $1 initial investment is always sufficient. As trading progresses, we consider that all the capital earned by a pair during the trading period is reinvested in that pair's next trade.

All the results presented in this work account for transaction costs. The transaction costs considered are based on estimates from Do and Faff (2012). The authors perform an in-depth study on the impact of transaction costs in Pairs Trading. The costs include three components: commissions (8 bps), market impact (20 bps) and short-selling constraints (1% per annum). Besides, commission and market impact costs are adapted to account for both assets in the pair.

As a trading system cannot act instantaneously, there might be a small deviation in the entry price, inherent to the delay of entering a position. To account for this factor and make sure the strategy is viable in practice, we assume a conservative one-period (5-min) delay for entering a position.

This work does not comprehend an implementation of a stop-loss system, under any circumstances. This means a position is only exited if the pair converges or the trading period ends.

5.5. Evaluation metrics

Regarding the trading evaluation, we propose analyzing the Return on Investment (ROI), Sharpe Ratio (SR) and portfolio Maximum Drawdown (MDD).

The ROI is calculated as the net profit divided by the initial capital, which we enforced to be $1. Hence, it is equal to the net profit.

The portfolio SR is calculated as

SR_year = ((R_port − R_f) / σ_port) × annualization factor,   (8)

where R_port represents the daily portfolio returns and R_f the risk-free rate⁷. The portfolio volatility, σ_port, is calculated as

σ_port = sqrt( Σ_{i=1}^{N} Σ_{j=1}^{N} w_i cov(i, j) w_j ),   (9)

where w_i is the relative weight of asset i in the portfolio. Lastly, the annualization factor is set according to the methodology proposed by Lo (2002) (Table 2 in Lo (2002)), to prevent imprecise approximations.

The Maximum Drawdown is calculated with respect to the account balance during the trading period. More formally, the MDD is calculated as

MDD(T) = max_{s∈(0,T)} max_{t∈(0,s)} [ (X(t) − X(s)) / X(t) ],   (10)

where X(t), with t ≥ 0, represents the total account balance.

5.6. Implementation environment

All the code developed in this work is built from scratch using Python. Some libraries are particularly useful. First, scikit-learn proves helpful in the implementation of PCA and the OPTICS algorithm. Second, statsmodels provides an already implemented version of the ADF test, useful for testing cointegration. Lastly, we make use of Keras to build the proposed neural networks.

Concerning the test environment, most of the simulation code is run on a CPU (Intel Core i7 @ 3 GHz), except for the training of the LSTM models. They involve a huge number of matrix multiplications which result in long processing times. These operations are sped up by taking advantage of the parallelization capabilities of a GPU (NVIDIA Tesla T4).

6. Results

The results obtained at each research stage are presented next.

⁵ Gatev et al. (2006), Do and Faff (2010) and Rad, Low, and Faff (2016) use a 1-year-long formation period. Dunis et al. (2010) makes use of a 3-month formation period.
⁶ Trading illiquid ETFs would result in higher bid-ask spreads which could dramatically impact the profit margins.
⁷ We consider the average 3-month treasury bill rate, taken from US (2019), during the corresponding test period and converted to a daily basis for consistency with the formula.
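The three metrics of Eqs. (8)–(10) can be sketched in Python (the language used throughout this work). The balance series below is synthetic, and the √252 annualization shown is the common i.i.d. shortcut rather than the autocorrelation-adjusted factor from Lo (2002) that the paper adopts:

```python
import numpy as np

def sharpe_ratio(r_port, r_f=0.0, annualization=np.sqrt(252)):
    """Eq. (8): annualized SR from daily portfolio returns.
    NOTE: sqrt(252) is the usual i.i.d. approximation; the paper instead
    uses the adjusted annualization factor of Lo (2002)."""
    return (np.mean(r_port) - r_f) / np.std(r_port) * annualization

def portfolio_volatility(weights, cov):
    """Eq. (9): sigma_port = sqrt(w' C w) for weights w and covariance C."""
    w = np.asarray(weights)
    return float(np.sqrt(w @ cov @ w))

def max_drawdown(balance):
    """Eq. (10): largest relative drop of the account balance X(t) from its
    running maximum (maximizing over the earlier peak t for each time s)."""
    x = np.asarray(balance, dtype=float)
    running_peak = np.maximum.accumulate(x)
    return float(np.max((running_peak - x) / running_peak))

balance = np.array([1.00, 1.02, 0.99, 1.05, 1.01, 1.08])
print(max_drawdown(balance))  # ≈ 0.0381: the drop from the 1.05 peak to 1.01
```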
Fig. 6. Trading periods.

Fig. 7. Market position definition.
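The $1 position sizing of Section 5.4 (illustrated in Fig. 7) can be sketched as follows. The helper name and the use of a hedge ratio β relating the two legs are our illustrative assumptions, not the authors' code:

```python
def position_sizes(beta: float) -> tuple[float, float]:
    """Scale a 1 : beta pair position so the larger leg uses exactly $1,
    enforcing max(asset_1, asset_2) = $1 as in Section 5.4.

    Returns (leg_1, leg_2) dollar amounts. The short-sale proceeds are
    assumed to fund part of the long leg, so $1 of initial capital is
    always sufficient even though gross exposure is higher.
    """
    leg_1, leg_2 = 1.0, abs(beta)           # notionals before scaling
    scale = 1.0 / max(leg_1, leg_2)         # largest leg becomes $1
    return leg_1 * scale, leg_2 * scale

# e.g. a hedge ratio of 2 puts $1 in the second leg against $0.50 in the first
print(position_sizes(2.0))  # (0.5, 1.0)
```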
Table 2
Selected pairs using different search methods.

Formation Period        2012–2015   2013–2016   2014–2017

No Clustering
Number of clusters      1           1           1
Possible combinations   4465        5460        6670
Pairs selected          101         247         150

By Category
Number of clusters      5           5           5
Possible combinations   2612        3318        4190
Pairs selected          59          51          51

OPTICS
Number of clusters      9           13          12
Possible combinations   185         140         129
Pairs selected          39          40          18

6.1. Analysis of the pairs selection framework

We start by presenting, in Table 2, some relevant statistics concerning the number of pairs found for the three different pairs' search techniques being compared at this stage.

As expected, when no restrictions are imposed on the search space, a broader set of ETFs emerges and consequently more pairs are selected. Contrarily, when grouping ETFs in five partitions (according to the categories described in Section 5.3), there is a reduction in the number of possible pair combinations. This is not more evident due to the underlying imbalance across the categories considered. Because energy-linked ETFs represent close to half of all ETFs, the combinations within this sector are still vast. Lastly, the number of possible pair combinations when using OPTICS is remarkably lower. Although the number of clusters is higher than when grouping by category, their smaller size results in fewer combinations. We proceed to analyze in more detail the results attained with this algorithm.

The results concerning the OPTICS application are obtained using five principal components to describe each ETF. We empirically verified that up to the 15-dimension boundary (motivated in Section 3.1) the results are not significantly affected. We adopt 5 dimensions since we find it adequate to settle the ETFs' representation in a lower dimension, provided that there is no evidence favouring higher dimensions.

To validate the clusters formed and get an insight into their composition, we examine the results obtained in the period of Jan 2014 to Dec 2017⁸. To represent the clusters in a 2-D setting, the data must be reduced from 5 dimensions. We consider the application of t-SNE (Maaten & Hinton, 2008) for this purpose. Fig. 8 illustrates the clusters formed. The ETFs which are not clustered are represented by the smaller circles, which were not labelled to facilitate the visualization.

To evaluate the integrity of the clusters, we propose analyzing the composing price series. Therefore, we select two clusters and represent the logarithmic ETFs' price series⁹. Fig. 9a illustrates a

⁸ This period is chosen arbitrarily because an extensive analysis covering all periods does not fit this report.
⁹ Before taking the logarithm, the mean is subtracted from the price series to facilitate the visualization.
Fig. 8. Application of t-SNE to the clusters generated by OPTICS.
cluster in which the ETFs identified do not just belong to the same category but are also part of the same segment, the Equity US: MLPs. This evinces that the OPTICS approach is capable of detecting a specific segment just from the time series data. Fig. 9b demonstrates that the OPTICS clustering capabilities extend beyond selecting ETFs within the same segment, as we may observe ETFs from distinct categories, such as Agriculture (CGW, FIW, PHO, and PIO), Industrial Metals (LIT and REMX) and Energy (YMLI). There is a visible relation among the identified price series, even though they do not all belong to the same category.

We confirm the generated clusters accomplished their intent of combining the desired traits from the two standard techniques, namely a tendency to group subsets of ETFs from the same category while not preventing clusters containing ETFs from different categories.

With respect to the trading performance, Table 3 unveils the test results obtained with each clustering type using the three different portfolios introduced in Section 5.1. To aggregate the information in a more concise way, the average across all years and portfolios is described in the rightmost column. Note also that the three evaluation metrics described in Section 5.5 are accentuated, to differentiate them from the remaining, less critical, descriptive statistics.

We can confirm the profitability in all the environments tested, which corroborates the idea that the pairs selection rules are robust.

Comparing the different clustering techniques, if an investor is only focused on obtaining the highest ROI, regardless of the incurred risk, performing no clustering is particularly appealing. However, when risk is taken into the equation, the OPTICS-based strategy proves more auspicious. It is capable of generating the highest average portfolio Sharpe ratio of 3.79, in comparison with 3.58 obtained when performing no clustering or 2.59 when grouping by category. Also, it shows more consistency with respect to the portion of profitable pairs in the portfolio, with an average of 86% profitable pairs, against 80% when grouping by category and 79% when performing no clustering at all. At last, it achieves steadier portfolio drawdowns, with the lowest average MDD. It is capable of maintaining the MDD values within an acceptable range even when the other two techniques display considerable deviations, as in 2017.

On a different note, the results indicate that there is no indisputable best way of constructing the test portfolio. Nonetheless, evaluating in terms of ROI, we may infer that using information from the validation performance for constructing the test portfolio is a useful heuristic, as portfolios 2 and 3 outperform portfolio 1 in every case.

6.2. Evaluation of the forecasting-based trading model

At this phase we intend to analyze how the proposed forecasting-based trading model performs when compared with
Fig. 9. Price series composition of some clusters.
the standard model used so far. We start by selecting pairs using the OPTICS clustering, due to its demonstrated ability. Under these conditions, we find 5 pairs during the formation period of Jan 2009 to Dec 2017 and 19 pairs during Jan 2015 to Dec 2017 (periods defined in Fig. 6). Not surprisingly, the number of pairs found for the former is significantly smaller, as ETFs that remain actively cointegrated throughout this longer interval are scarcer. Nevertheless, as training the Deep Learning forecasting models is computationally expensive, having fewer pairs is convenient. The corresponding five spreads are illustrated in Fig. 10. The spreads look indeed stationary. There is an evident difference in their volatility, which further supports the importance of enforcing data-driven trading thresholds.

Each spread in Fig. 10 is fitted by the forecasting algorithms. The forecasting score is obtained by averaging the mean-squared error (MSE) across the five spreads. A total of 31 forecasting model architectures are implemented in this work, to find the one with the most trading potential, meaning 155 models are trained (31 architectures × 5 spreads). We experiment with increasingly complex configurations until signs of overfitting are evident. Of the 31 implemented architectures, 9 correspond to configurations experimented with the ARMA model, 8 with the LSTM-based model and 14 with the LSTM Encoder-Decoder.

For the ARMA model, we tuned the polynomial orders (p and q). However, due to the limited available resources and the complexity of training the LSTM-based models, there are severe constraints on the number of variables and combinations which can be tuned in this case. For this reason, the only adjustable parameters in the LSTM model are the number of input nodes (i_n), number of hidden layers (h_l) and hidden nodes (h_n). As for the LSTM Encoder-Decoder, we also define the number of input nodes, encoder nodes (e_n) and decoder nodes (d_n). The parameters are adjusted progressively based on the results obtained, thus avoiding an exhaustive search.

Regarding the activation units, the sigmoid is selected for the gate activations, and tanh for the final activation.

The Xavier initialization (Glorot & Bengio, 2010) is adopted. It not only reduces the chances of running into gradient problems but also brings faster convergence.

The training stops when the results evidence signs of overfitting, which were detected using early stopping. The forecasting results obtained for the top 3 configurations of each algorithm, ordered by their complexity, are described in Table 4.
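The scoring procedure above (a naive benchmark forecasting Y_{t+1} = Y_t, with errors averaged across the five spreads) can be sketched as follows; the random-walk spreads are stand-ins for the real series:

```python
import numpy as np

def forecast_scores(spreads, forecasts):
    """Average MSE / RMSE / MAE of one-step forecasts across several spreads,
    where forecasts[k][t] is the prediction for spreads[k][t + 1]."""
    mse, rmse, mae = [], [], []
    for y, y_hat in zip(spreads, forecasts):
        err = y[1:] - y_hat[:-1]
        mse.append(np.mean(err ** 2))
        rmse.append(np.sqrt(np.mean(err ** 2)))
        mae.append(np.mean(np.abs(err)))
    return float(np.mean(mse)), float(np.mean(rmse)), float(np.mean(mae))

rng = np.random.default_rng(1)
spreads = [np.cumsum(rng.normal(scale=0.01, size=500)) for _ in range(5)]

# Naive benchmark: the forecast for t+1 is simply the last observed value.
naive = [s.copy() for s in spreads]
mse, rmse, mae = forecast_scores(spreads, naive)
print(f"naive benchmark: MSE={mse:.2e} RMSE={rmse:.2e} MAE={mae:.2e}")
```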
Table 3
Trading performance for each pairs search technique.

Test Period           2015              2016              2017              AVG.
Test Portfolio        1     2     3     1     2     3     1     2     3     –
No Clustering
SR 3.53 4.12 3.32 3.96 4.51 3.56 4.08 4.05 1.11 3.58
ROI (%) 10.4% 12.4% 17.4% 24.8% 26.3% 26.0% 11.9% 12.4% 11.5% 17.0%
MDD (%) 1.42% 0.97% 2.59% 2.05% 1.98% 2.65% 1.33% 1.38% 9.28% 2.63%
Total pairs 101 77 10 247 223 10 150 141 10 108
Profitable pairs (%) 70% 80% 70% 86% 86% 90% 69% 70% 90% 79%
Total trades 229 173 17 411 361 15 212 195 14 181
Profitable trades 180 147 15 369 329 15 172 162 12 156
Unprofitable trades 49 26 2 42 32 0 40 33 2 25
By Category
SR 1.56 2.39 3.75 3.48 3.82 3.09 2.17 2.14 0.89 2.59
ROI (%) 5.52% 9.38% 17.8% 13.6% 13.9% 20.4% 7.86% 8.42% 8.31% 11.7%
MDD (%) 1.77% 1.82% 2.09% 2.06% 2.26% 4.56% 2.47% 2.67% 8.91% 3.18%
Total pairs 59 40 10 51 44 10 51 47 10 36
Profitable pairs (%) 64% 85% 90% 86% 86% 90% 65% 64% 90% 80%
Total trades 154 108 39 107 83 20 64 54 13 71
Profitable trades 112 89 36 92 73 19 49 43 12 58
Unprofitable trades 42 19 3 15 10 1 15 11 1 13
OPTICS
SR 4.05 3.84 5.08 4.72 4.79 3.80 2.75 2.83 2.27 3.79
ROI (%) 12.5% 13.5% 23.5% 10.5% 11.9% 15.2% 7.36% 8.38% 9.98% 12.5%
MDD (%) 1.37% 1.66% 1.30% 0.80% 0.83% 1.46% 1.21% 1.35% 2.35% 1.37%
Total pairs 39 34 10 40 35 10 18 16 10 24
Profitable pairs (%) 82% 82% 100% 80% 83% 90% 78% 81% 100% 86%
Total trades 161 147 68 87 78 30 24 22 17 70
Profitable trades 140 128 67 72 66 27 21 20 17 62
Unprofitable trades 21 19 1 15 12 3 3 2 0 8
Fig. 10. Pairs identified in Jan 2009–Dec 2017.
We may verify that all the implemented models are capable of outperforming the naive implementation during the validation period. Curiously, we note that the LSTM-based models do not manage to surpass the ARMA model, at least with respect to the chosen metrics. Also, the results obtained in the test set indicate signs of overfitting in spite of the efforts taken in that regard, as the LSTM-based models are no longer superior to the naive performance. The incapability of the LSTM-based models to outperform the simpler approaches is in accordance with the findings of Gers, Eck, and Schmidhuber (2002), who assert that time-series problems found in the literature are often conceptually simpler than many tasks already solved by LSTMs, and that, more often than not, all relevant information about the next event is conveyed by a few recent events. We suspect this is the case in this work. At last, we analyze the performance obtained by the integration of these forecasting algorithms in the proposed trading model scheme. The best configuration is chosen for each model (emphasized in bold). Based on the validation records, the quintile-based thresholds are used with ARMA and the decile-based ones with the LSTMs. The test results in these conditions are illustrated in Table 5.

The results indicate that if robustness is evaluated by the number of days in which the portfolio value does not decline (accentuated in Table 5), then the proposed trading model does provide an improvement. The forecasting-based models display a total of 2
Table 4
Forecasting results comparison.

                                                          Validation              Test
Model            Parameters                  Time-step    MSE     RMSE    MAE     MSE     RMSE    MAE
                                                          (E-03)  (E-02)  (E-02)  (E-03)  (E-02)  (E-02)
Naive            Y_{t+1} = Y_t               (t + 1)      1.87    3.69    1.50    2.60    3.89    1.68
Naive            Y_{t+2} = Y_t               (t + 2)      3.34    4.94    2.24    4.47    5.14    2.61
ARMA             p: 5, q: 2                  (t + 1)      1.511   3.006   1.781   2.271   3.343   1.967
ARMA             p: 8, q: 3                  (t + 1)      1.508   3.004   1.780   2.264   3.339   1.964
ARMA             p: 12, q: 4                 (t + 1)      1.509   3.004   1.780   2.264   3.338   1.964
LSTM             i_n: 12, h_l: 1, h_n: 10    (t + 1)      2.73    3.73    2.65    4.99    4.74    3.63
LSTM             i_n: 24, h_l: 1, h_n: 50    (t + 1)      1.69    3.28    2.04    3.35    4.30    3.08
LSTM             i_n: 24, h_l: 1, h_n: 60    (t + 1)      1.91    3.43    2.03    3.54    4.61    3.36
LSTM Enc.-Dec.   i_n: 12, e_n: 30, d_n: 30   (t + 1)      2.03    3.60    2.18    5.72    5.75    4.31
                                             (t + 2)      2.43    3.94    2.51    8.45    6.91    5.52
LSTM Enc.-Dec.   i_n: 24, e_n: 15, d_n: 15   (t + 1)      1.71    3.32    2.13    4.31    5.21    3.94
                                             (t + 2)      2.05    3.60    2.45    9.03    7.50    5.91
LSTM Enc.-Dec.   i_n: 24, e_n: 30, d_n: 30   (t + 1)      1.96    3.56    2.21    6.06    5.95    4.50
                                             (t + 2)      2.42    3.92    2.51    8.73    7.04    5.57
Table 5
Trading results comparison using a 9-year-long formation period.

Trading Model                Standard            ARMA-based           LSTM-based           LSTM Encoder-Decoder-
                             threshold-based     model                model                based model
Parameters                   *see Table 1        α_S = Q_f − (0.20)   α_S = Q_f − (0.10)   α_S = Q_f − (0.10)
                                                 α_L = Q_f + (0.80)   α_L = Q_f + (0.90)   α_L = Q_f + (0.90)
SR                           1.85                1.22                 0.50                 0.98
ROI                          6.27%               5.57%                2.93%                4.17%
MDD                          1.43%               0.73%                0.47%                1.19%
Days of portfolio decline    87                  11                   2                    21
Trades (Positive–Negative)   149 (89–60)         34 (22–12)           8 (6–2)              17 (14–3)
Profitable pairs             3                   3                    2                    2
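A minimal sketch of the quantile-based thresholds (α_S, α_L) referenced in Table 5, under the assumption that they are the lower/upper empirical quantiles of the forecasted spread changes observed during the formation period; the variable names and synthetic forecasts are ours:

```python
import numpy as np

def quantile_thresholds(predicted_changes, lower=0.20, upper=0.80):
    """Data-driven entry thresholds from the empirical distribution of
    forecasted spread changes: quintile-based cut-offs by default
    (0.20 / 0.80), decile-based with lower=0.10, upper=0.90."""
    alpha_short = np.quantile(predicted_changes, lower)
    alpha_long = np.quantile(predicted_changes, upper)
    return alpha_short, alpha_long

rng = np.random.default_rng(2)
changes = rng.normal(size=10_000)                     # stand-in for forecasts
a_s, a_l = quantile_thresholds(changes)               # quintile-based
d_s, d_l = quantile_thresholds(changes, 0.10, 0.90)   # decile-based

# Decile cut-offs are strictly more extreme, so they trigger fewer trades.
assert d_s < a_s < a_l < d_l
```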
(LSTM), 11 (ARMA) and 22 (LSTM Encoder-Decoder) days of portfolio decline, in comparison with the considerably more numerous 87 days obtained when using the standard threshold-based model. This finding suggests the forecasting-based model is capable of defining more precise entry points and hence reducing the number of unprofitable days. However, that comes at the expense of a reduction in both portfolio SR and ROI, questioning the benefits provided by the proposed model, after all. We suspect the long required formation period is also responsible for this profitability decline. Therefore, we proceed to analyze the profitability of the standard trading model when using the 3-year-long period, presented in Table 6.

Table 6
Trading results for standard trading model using a 3-year-long formation period.

Trading Model                Standard
SR                           3.41
ROI                          11.3%
MDD                          1.12%
Days of portfolio decline    89
Trades (Positive–Negative)   30 (26–4)
Profitable pairs             13

By comparison, the performance in the 10-year-long period seems greatly affected by the long required duration, suggesting the less satisfactory returns emerge not merely from the trading model itself, but also from the underlying time settings. Following this line of reasoning, if the forecasting-based model's performance increases in the same proportion as the standard trading model when reducing the formation period, for example by increasing the data frequency, the results obtained could be much more satisfactory.

The results presented in this section have some limitations which should be pointed out. The first one results from the observation made in the previous paragraph. We find evidence that the 10-year period required to apply the forecasting-based model significantly harms the performance, as it limits the pairs available. A more rigorous test should consider approximately the same number of pairs considered in the remaining simulations. Concerning the implementation of the LSTM-based models, from a computational point of view, GRU units (Cho et al., 2014) would have been slightly faster to train than the LSTM units given their simpler gating system. However, they were not considered in this work.

At last, not having experimented with the proposed frameworks on stocks naturally leaves us wondering how the suggested frameworks would have performed in such a context.

7. Conclusions and future work

We explored how Pairs Trading could be enhanced with the integration of Machine Learning. First, we proposed a new approach to search for pairs based on the application of PCA followed by the OPTICS algorithm. The suggested method achieved better risk-adjusted returns than standard ones: searching within sectors, or considering all possible pair combinations. Secondly, we introduced a forecasting-based model aiming to reduce decline periods associated with ill-timed market positions and prolonged divergent pairs. We demonstrated that the proposed model is capable of reducing the average decline period by more than 75%, although that comes at the expense of declining profitability under the conditions studied. In addition, this work also contributes empirical evidence of the suitability of ETFs traded in a 5-min setting.
Distinct directions could be taken as a follow-up of this work. It may be interesting to continue exploring the proposed forecasting-based trading model, where the following procedures could be analyzed:

• Add more features to predict the price variations rather than constraining the features to lagged prices.
• Increase the data frequency (e.g. 1-min frequency) to enable a reduction in the required formation period and consequently find more pairs. It should be noted that this will be more demanding from a computational point of view.
• Attempt to train the Artificial Neural Networks in a classification setting. Although regression is seemingly the most obvious solution, given that the output being predicted can take any real value, classification might bring other advantages. The reason is that the MSE loss is much harder to optimize than a more stable loss such as Softmax, and it is less robust since outliers can introduce huge gradients.

Alternatively, other aspects which could be investigated may be the following:

• Construct a metric to rank pairs in the portfolio, instead of applying an equal weighting scheme. For example, based on a similarity measure between the securities' features resulting from the application of PCA, or a more empirical heuristic, such as the relative performance in the validation period.
• Explore the market conditions in which the proposed strategy generates more returns, and investigate whether the investor can benefit from predicting such scenarios in advance.
• Combine commodity-linked ETFs and other security types (e.g. futures) in the same pair and see if the investor can benefit from the additional expected volatility.

CRediT authorship contribution statement

Simão Moraes Sarmento: Conceptualization, Methodology, Software, Validation, Investigation, Writing - original draft, Writing - review & editing. Nuno Horta: Conceptualization, Resources, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References

Ankerst, M., Breunig, M. M., Kriegel, H.-P., & Sander, J. (1999). OPTICS: Ordering points to identify the clustering structure. ACM SIGMOD Record, 28(2), 49–60.
Berkhin, P. (2006). A survey of clustering data mining techniques. In Grouping multidimensional data (pp. 25–71). Springer.
Board of Governors of the Federal Reserve System (US) (2019). 3-Month Treasury Bill: Secondary Market Rate. https://fred.stlouisfed.org/series/TB3MS. Accessed: 2019-07-11.
Caldeira, J., & Moura, G. V. (2013). Selection of a portfolio of pairs based on cointegration: A statistical arbitrage strategy. Available at SSRN 2196391.
Cavalcante, R. C., Brasileiro, R. C., Souza, V. L., Nobrega, J. P., & Oliveira, A. L. (2016). Computational intelligence and financial markets: A survey and future directions. Expert Systems with Applications, 55, 194–211.
Chan, E. (2013). Algorithmic trading: Winning strategies and their rationale (Vol. 625). John Wiley & Sons.
Chen, H., Chen, S., Chen, Z., & Li, F. (2017). Empirical investigation of an equity pairs trading strategy. Management Science.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
Do, B., & Faff, R. (2010). Does simple pairs trading still work? Financial Analysts Journal, 66(4), 83–95.
Do, B., & Faff, R. (2012). Are pairs trading profits robust to trading costs? Journal of Financial Research, 35(2), 261–287.
Dunis, C. L., Giorgioni, G., Laws, J., & Rudy, J. (2010). Statistical arbitrage and high-frequency data with an application to Eurostoxx 50 equities. Working paper, Liverpool Business School.
Dunis, C. L., Laws, J., & Evans, B. (2006). Modelling and trading the gasoline crack spread: A non-linear story. Derivatives Use, Trading & Regulation, 12(1–2), 126–145.
Dunis, C. L., Laws, J., & Evans, B. (2009). Modelling and trading the soybean-oil crush spread with recurrent and higher order networks: A comparative analysis. In Artificial Higher Order Neural Networks for Economics and Business (pp. 348–366). IGI Global.
Dunis, C. L., Laws, J., Middleton, P. W., & Karathanasopoulos, A. (2015). Trading and hedging the corn/ethanol crush spread using time-varying leverage and nonlinear models. The European Journal of Finance, 21(4), 352–375.
Engle, R. F., & Granger, C. W. (1987). Co-integration and error correction: Representation, estimation, and testing. Econometrica: Journal of the Econometric Society, 251–276.
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise.
ETF.com (2019). Find the Right ETF - Tools, Ratings, News. https://www.etf.com/etfanalytics/etf-finder. Accessed: 2019-06-30.
Gatev, E., Goetzmann, W. N., & Rouwenhorst, K. G. (2006). Pairs trading: Performance of a relative-value arbitrage rule. The Review of Financial Studies, 19(3), 797–827.
Gers, F. A., Eck, D., & Schmidhuber, J. (2002). Applying LSTM to time series predictable through time-window approaches. In Neural Nets WIRN Vietri-01. Springer.
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 249–256).
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Huck, N. (2009). Pairs selection and outranking: An application to the S&P 100 index. European Journal of Operational Research, 196(2), 819–825.
Huck, N. (2010). Pairs trading and outranking: The multi-step-ahead forecasting case. European Journal of Operational Research, 207(3), 1702–1716.
Huck, N., & Afawubo, K. (2015). Pairs trading and selection methods: Is cointegration superior? Applied Economics, 47(6), 599–613.
Kim, T., & Kim, H. Y. (2019). Optimizing the pairs-trading strategy using deep reinforcement learning with trading and stop-loss boundaries. Complexity, 2019.
Kleinow, T. (2002). Testing continuous time models in financial markets.
Krauss, C. (2017). Statistical arbitrage pairs trading strategies: Review and outlook. Journal of Economic Surveys, 31(2), 513–545.
Krauss, C., Do, X. A., & Huck, N. (2017). Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500. European Journal of Operational Research, 259(2), 689–702.
Lo, A. W. (2002). The statistics of Sharpe ratios. Financial Analysts Journal, 58(4), 36–52.
Maaten, L. v. d., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579–2605.
Perlin, M. (2007). M of a kind: A multivariate approach at pairs trading.
Rad, H., Low, R. K. Y., & Faff, R. (2016). The profitability of pairs trading strategies: Distance, cointegration and copula methods. Quantitative Finance, 16(10), 1541–1558.
Si, Y.-W., & Yin, J. (2013). OBST-based segmentation approach to financial time series. Engineering Applications of Artificial Intelligence, 26(10), 2581–2596.
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. CoRR, abs/1409.3215.
Thomaidis, N. S., Kondakis, N., & Dounias, G. (2006). An intelligent statistical arbitrage trading system. In SETN.
Vidyamurthy, G. (2004). Pairs Trading: Quantitative methods and analysis (Vol. 217). John Wiley & Sons.