Deep Learning Statistical Arbitrage

Jorge Guijarro-Ordonez, Markus Pelger, and Greg Zanotti

Stanford University

Motivation

Intuition: Pairs trading (simplest statistical arbitrage)

• Identify two “similar” stocks: e.g. GM and Ford
• Assumption: prices are on average similar
• Exploit temporal price differences between similar assets

[Figure: two panels, Jul 2011 to Jul 2012. Left: "Prices of Similar Stocks" (General Motors and Ford; y-axis: Price). Right: "Differences between Prices" (y-axis: Price).]

Three components of statistical arbitrage:

1. Construct long-short portfolio identifying mispricing: $\epsilon_t = R_t^{GM} - R_t^{Ford}$
2. Extract trading signals by statistically modeling $\epsilon_t$
3. Find optimal trading policy given signals: $\max E[\text{payoff}_T]$
Fundamental Problem

Key elements of statistical arbitrage:

1. Arbitrage portfolios: How to generate long-short portfolios of similar assets?
2. Arbitrage signal: What are time-series patterns for temporary price deviations?
3. Arbitrage allocation: How to trade given the arbitrage signal?

Challenges:
1. Large number of assets with unknown similarities
2. Complex time-series patterns in price deviations
3. Optimal trading rules are complicated and depend on trading objective

Can machine learning help?

• Machine learning methods very flexible and deal with big data, but ...
• Important to set up the estimation problem correctly: Not a prediction problem!
• We use a trading objective function on residuals of asset pricing models

Key questions:
1. What is the “best solution” for the three key elements?
2. What matters for statistical arbitrage?
3. How much realistic arbitrage is in the market?

Contribution: Methodology

Our novel method: Deep learning statistical arbitrage

1. Statistical factor model including characteristics to get arbitrage portfolios
2. Convolutional neural network + Transformer to extract arbitrage signal:
Flexible data driven time-series filter to learn complex time-series patterns
3. Neural network to map signals into allocations:
Generalization of conventional “optimal stopping rules” for investment.
⇒ We integrate and optimize them for global economic objective:
Maximize risk-adjusted return under constraints.
⇒ Uses the most advanced AI architectures from NLP for time-series pattern detection

Novel conceptual framework:

• Provide unified framework to compare different statistical arbitrage methods:
(1) portfolio generation, (2) signal extraction, (3) allocation decision
• Study each component and compare with conventional models
• Unifying time-series filter perspective for arbitrage signal

Contribution: Empirical

Comprehensive out-of-sample study on U.S. equities

• Daily returns for 19 years of 500 largest liquid stocks
• Consider most important risk factor models
• Comparisons include parametric and non-parametric mean-reversion models
Excellent out-of-sample performance:
• Empirically substantially outperforms all benchmark approaches out-of-sample
• Our arbitrage strategies achieve annual Sharpe ratios above 4
• Annual returns of around 20% with less than 6% volatility
• Uncorrelated with conventional risk factors and market movements
• Survives realistic transaction and holding costs
• Stable over time and robust to tuning parameters
What matters for arbitrage trading?
• Robust to risk factors to identify similar assets
• Most important is time-series signal; flexible allocation model insufficient
• 4x better than parametric models, 2x better than non-parametric
• Global objective: extract time-series model for trading
Insight into the structure of arbitrage trading:
• “Smooth” trend and mean-reversion patterns
• Asymmetric policies: fast reaction on downtrends, cautious trading on uptrends
Literature (partial list)

Classical approaches to statistical arbitrage (parametric models)

• PCA + mean-reversion: Avellaneda and Lee (2010), Yeo and Papanicolaou (2017)
• Cointegration: Rad, Low and Faff (2016), Vidyamurthy (2004)
• Stochastic control: Cartea and Jaimungal (2016), Leung and Li (2015)
• Simple pairs trading: Gatev, Goetzmann and Rouwenhorst (2006)
• Intractable parametric models with ML: Mulvey, Sun, Wang, and Ye (2020)

Machine learning for asset pricing (explain risk premium not arbitrage)
• Pricing kernel: Chen, Pelger, Zhu (2019), Bryzgalova, Pelger, and Zhu (2019)
• Return prediction: Gu, Kelly and Xiu (2020)
• Factor models: Lettau and Pelger (2020), Kelly, Pruitt and Su (2019)

Machine learning for time-series (no trading objective)

• Time-series prediction: Lim and Zohren (2020), Krauss, Do, and Huck (2017).

Model
Arbitrage portfolios

Excess returns of stocks follow a conditional factor model:

$$R_{n,t} = \beta_{n,t-1}^\top F_t + \epsilon_{n,t}, \qquad t = 1, \dots, T \text{ and } n = 1, \dots, N_t$$

• K factors $F_t$ capture systematic risk.

• Loadings $\beta_{t-1} \in \mathbb{R}^{N_t \times K}$ are general functions of information at time $t-1$.

Factor models identify similar assets by similar exposures to risk factors

• Define arbitrage portfolios as residual portfolios:

$$\epsilon_{n,t} = R_{n,t} - \beta_{n,t-1}^\top F_t$$

• Arbitrage portfolios are only weakly cross-sectionally dependent.

• Arbitrage Pricing Theory implies $E[\epsilon_{n,t}] = 0$.
• $\beta_{n,t-1}^\top F_t$ is the “fair price” of $R_{n,t}$ and $\epsilon_{n,t}$ captures temporary mispricing

Arbitrage portfolios

Residuals with the empirically most important families of factor models:

1. Observed fundamental factors: Fama-French factors.

2. Statistical factors that explain correlations: PCA factors.
3. Conditional statistical factors where loadings are functions of firm
characteristics: Instrumented PCA factors (Kelly, Pruitt and Su (2019)).

Factors are projections on returns without loss of generality:

$$F_t = (w_{t-1}^F)^\top R_t.$$

Residuals are traded portfolios for factor implied matrix $\Phi_{t-1} \in \mathbb{R}^{N_t \times N_t}$:

$$\epsilon_t = R_t - \beta_{t-1} F_t = R_t - \beta_{t-1} (w_{t-1}^F)^\top R_t = \underbrace{\left(I_{N_t} - \beta_{t-1} (w_{t-1}^F)^\top\right)}_{\Phi_{t-1}} R_t.$$

⇒ Arbitrage portfolios are traded, factor-neutral, weakly correlated and mean-reverting portfolios of all stocks.
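
The mapping from returns to residuals can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes PCA factor weights taken from the sample covariance of a rolling window and OLS loadings without intercept, and the function name `pca_residual_matrix` and the simulated data are hypothetical.

```python
import numpy as np

def pca_residual_matrix(returns_window, n_factors):
    """Return (Phi, beta, w_f) estimated from a T x N window of returns.

    Assumptions (not from the slides): factor weights are the top
    eigenvectors of the sample covariance; loadings are OLS betas.
    """
    demeaned = returns_window - returns_window.mean(axis=0)
    cov = demeaned.T @ demeaned / len(demeaned)
    _, eigvecs = np.linalg.eigh(cov)                 # eigenvalues in ascending order
    w_f = eigvecs[:, ::-1][:, :n_factors]            # N x K factor portfolio weights
    factors = returns_window @ w_f                   # T x K, F_t = w_f^T R_t
    # OLS loadings of every stock on the factors (no intercept, for brevity)
    beta = np.linalg.lstsq(factors, returns_window, rcond=None)[0].T  # N x K
    phi = np.eye(returns_window.shape[1]) - beta @ w_f.T              # N x N
    return phi, beta, w_f

# Simulated two-factor returns (illustrative data only)
rng = np.random.default_rng(0)
T, N, K = 60, 8, 2
true_factors = rng.normal(size=(T, K))
true_beta = rng.normal(size=(N, K))
returns = true_factors @ true_beta.T + 0.1 * rng.normal(size=(T, N))

phi, beta, w_f = pca_residual_matrix(returns, K)
next_return = rng.normal(size=K) @ true_beta.T + 0.1 * rng.normal(size=N)
residual = phi @ next_return                         # epsilon_t = Phi_{t-1} R_t
```

Because the residuals are a fixed linear map of the returns, each residual is itself a traded portfolio of all stocks.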

Arbitrage Signal and Allocation

Arbitrage trading has 2 steps given a cumulative residual

$$x := \epsilon^L_t := \left(\epsilon_{n,t-L},\ \sum_{l=1}^{2} \epsilon_{n,t-L-1+l},\ \cdots,\ \sum_{l=1}^{L} \epsilon_{n,t-L-1+l}\right)$$

1. The arbitrage signal function
$$\theta \in \Theta : \epsilon^L_{n,t-1} \mapsto \theta_{n,t-1}$$
models the time series structure using last L cumulative residuals and estimates a sufficient statistic for trading.
2. The arbitrage allocation function
$$w^\epsilon \in W : \theta_{n,t-1} \mapsto w^\epsilon_{n,t-1}$$
assigns investment weights on residuals using only the estimated signal.
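
As a small illustration of the input to the signal function, the cumulative residual vector is a running sum over the last L residuals. The helper name `cumulative_residual_window` and the 0-based indexing convention are assumptions made for this sketch.

```python
import numpy as np

def cumulative_residual_window(residuals, t, lookback):
    """Cumulative sums of the `lookback` residuals observed before time t.

    With 0-based indexing, residuals[t - lookback : t] are the values
    epsilon_{t-L}, ..., epsilon_{t-1}; entry k of the result is their
    running sum up to and including the (k+1)-th value.
    """
    if t < lookback:
        raise ValueError("not enough history for the requested lookback")
    return np.cumsum(residuals[t - lookback:t])

eps = np.array([0.01, -0.02, 0.03, 0.01, -0.01])
x = cumulative_residual_window(eps, t=5, lookback=3)
# x is the running sum of [0.03, 0.01, -0.01]
```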
Estimation Problem

Estimation: For a given class of models maximize risk-adjusted return:

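$$\max_{w^\epsilon \in W,\ \theta \in \Theta} \frac{E\left[(w^R_{t-1})^\top R_t\right]}{\sqrt{\mathrm{Var}\left((w^R_{t-1})^\top R_t\right)}}$$

$$\text{s.t.}\quad w^R_{t-1} = \frac{(w^\epsilon_{t-1})^\top \Phi_{t-1}}{\left\|(w^\epsilon_{t-1})^\top \Phi_{t-1}\right\|_1} \quad\text{and}\quad w^\epsilon_{t-1} = w^\epsilon\left(\theta(\epsilon^L_{t-1})\right).$$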

• Main objective: Sharpe ratio, but we also consider mean-variance objective

• Extension includes trading costs
• Stock weights $w^R_{t-1}$ add up to 1 in absolute value ⇒ implicit leverage constraint
• Many models have separate objective for signal estimation
We consider 3 key model classes for signal θ and allocation w :
1. Parametric model: mean-reversion model with thresholding rule
2. Pre-specified time-series filters and non-parametric allocation
3. Deep-learning arbitrage: data-driven time-series filter and allocation
⇒ We show which elements are key for profitable arbitrage
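
The objective and its constraint can be evaluated numerically as follows. This is a sketch under stated assumptions: the residual weights are random placeholders standing in for $w^\epsilon(\theta(\cdot))$, the matrix `phi` is a toy projection that removes the equal-weighted portfolio, and the function names are hypothetical.

```python
import numpy as np

def stock_weights(w_eps, phi):
    """Map residual weights to stock weights with unit L1 norm."""
    raw = w_eps @ phi                      # (w_eps)^T Phi
    return raw / np.abs(raw).sum()

def sharpe_ratio(portfolio_returns):
    """Per-period Sharpe ratio: mean over standard deviation."""
    return portfolio_returns.mean() / portfolio_returns.std()

def mean_variance(portfolio_returns, gamma=1.0):
    """Alternative objective: mean minus gamma times variance."""
    return portfolio_returns.mean() - gamma * portfolio_returns.var()

rng = np.random.default_rng(1)
T, N = 250, 5
returns = 0.01 * rng.normal(size=(T, N))          # row t holds R_t
phi = np.eye(N) - np.full((N, N), 1.0 / N)        # toy Phi (assumption)
w_eps = rng.normal(size=(T, N))                   # row t holds the weights chosen at t-1
w_r = np.array([stock_weights(w, phi) for w in w_eps])
pnl = (w_r * returns).sum(axis=1)                 # (w^R_{t-1})^T R_t for every t

objective_sharpe = sharpe_ratio(pnl)
objective_mv = mean_variance(pnl, gamma=1.0)
```

In an actual estimation the placeholder weights would be the output of the signal and allocation functions, and the chosen objective would be maximized over their parameters.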
First class: Parametric models

Classical mean reversion trading: (Avellaneda and Lee (2010))

• Each residual is modeled as an Ornstein-Uhlenbeck (OU) process

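$$dX_t = \kappa(\mu - X_t)\,dt + \sigma\,dB_t$$

• The allocation is a threshold rule on the ratio $\frac{X_t - \mu}{\sigma/\sqrt{2\kappa}}$.
• In our framework, this corresponds to

$$\theta^{OU}(x) = (\hat\kappa, \hat\mu, \hat\sigma, x_L), \qquad w^{\epsilon|OU}\left(\theta^{OU}\right) = \begin{cases} -1 & \text{if } \frac{x_L - \hat\mu}{\hat\sigma/\sqrt{2\hat\kappa}} > c_{\text{thres}} \\ 1 & \text{if } \frac{x_L - \hat\mu}{\hat\sigma/\sqrt{2\hat\kappa}} < -c_{\text{thres}} \\ 0 & \text{otherwise} \end{cases}$$

where $c_{\text{thres}}$ is chosen optimally.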
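
A common way to obtain $(\hat\kappa, \hat\mu, \hat\sigma)$ is to fit an AR(1) regression to the cumulative residual, which is the exact discretization of the OU process. The sketch below assumes that estimation route and a unit time step; the function names and the default threshold of 1.25 are illustrative choices, not necessarily the authors' procedure.

```python
import numpy as np

def fit_ou(x, dt=1.0):
    """Estimate (kappa, mu, sigma) from x[l+1] = a + b * x[l] + noise."""
    b, a = np.polyfit(x[:-1], x[1:], 1)          # slope first, then intercept
    if not 0.0 < b < 1.0:
        return None                              # no mean reversion detected
    noise = x[1:] - (a + b * x[:-1])
    kappa = -np.log(b) / dt
    mu = a / (1.0 - b)
    sigma = np.sqrt(noise.var() * 2.0 * kappa / (1.0 - b ** 2))
    return kappa, mu, sigma

def ou_threshold_weight(x, c_thres=1.25):
    """Threshold rule on the standardized deviation of the last value x_L."""
    params = fit_ou(x)
    if params is None:
        return 0.0
    kappa, mu, sigma = params
    score = (x[-1] - mu) / (sigma / np.sqrt(2.0 * kappa))
    if score > c_thres:
        return -1.0                              # residual too high: short
    if score < -c_thres:
        return 1.0                               # residual too low: long
    return 0.0

# Simulated mean-reverting path (illustrative data only)
rng = np.random.default_rng(0)
true_mu, true_b = 0.5, 0.8
sim = np.empty(5000)
sim[0] = true_mu
for l in range(1, len(sim)):
    sim[l] = true_mu * (1 - true_b) + true_b * sim[l - 1] + 0.1 * rng.normal()
kappa_hat, mu_hat, sigma_hat = fit_ou(sim)
```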

Limitations: Parametric model might be misspecified (e.g. trends, multiple mean reversion frequencies, etc.), restrictive allocation function.

Second class: Pre-specified filter with neural network

Signal θ: General time-series model

• Pre-specified linear filter $\theta_l = \sum_{j=1}^{L} W_j^{\text{filter}} x_j$ (given matrix $W^{\text{filter}} \in \mathbb{R}^{L \times L}$)

• Includes ARMA models, discretized OU, etc.

• Frequency filters are the most relevant filters for mean reversion patterns:
• We use Fast Fourier Transform (FFT) for a frequency decomposition:

$$x_l = a_0 + \sum_{j=1}^{L/2-1} \left( a_j \cos\left(\frac{2\pi j}{L} l\right) + b_j \sin\left(\frac{2\pi j}{L} l\right) \right) + a_{L/2} \cos(\pi l).$$

• Signals are the “loadings” on long and short-term reversal patterns:

$$\theta^{FFT}(x) = (a_0, \dots, a_{L/2}, b_1, \dots, b_{L/2-1})$$

Allocation $w^\epsilon$: Flexible non-parametric function with regularization

• $g^{FFN}$ is estimated with feedforward neural network (FFN)

$$w^{\epsilon|FFT}\left(\theta^{FFT}\right) = g^{FFN}\left(\theta^{FFT}\right).$$

Limitation: Choice of pre-specified filter limits the time-series patterns.
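
The Fourier coefficients in the decomposition can be read off a real FFT. The sketch below assumes an even window length and positions indexed $l = 0, \dots, L-1$; the function names are hypothetical, and the reconstruction helper is included only to check that the coefficients reproduce the input.

```python
import numpy as np

def fourier_signal(x):
    """Return (a, b) with a = (a_0, ..., a_{L/2}) and b = (b_1, ..., b_{L/2-1})."""
    L = len(x)
    if L % 2:
        raise ValueError("this sketch assumes an even window length")
    spectrum = np.fft.rfft(x)                 # L/2 + 1 complex coefficients
    a = 2.0 * spectrum.real / L
    a[0] /= 2.0                               # constant term is not doubled
    a[-1] /= 2.0                              # neither is the Nyquist term
    b = -2.0 * spectrum.imag[1:-1] / L
    return a, b

def reconstruct(a, b):
    """Evaluate the cosine/sine expansion at l = 0, ..., L-1."""
    L = 2 * (len(a) - 1)
    l = np.arange(L)
    j = np.arange(1, L // 2)
    angles = 2.0 * np.pi * np.outer(l, j) / L
    return (a[0] + np.cos(angles) @ a[1:-1] + np.sin(angles) @ b
            + a[-1] * np.cos(np.pi * l))

rng = np.random.default_rng(0)
x = np.cumsum(0.01 * rng.normal(size=30))     # toy cumulative residual, L = 30
a, b = fourier_signal(x)
theta_fft = np.concatenate([a, b])            # L coefficients in total
```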

Third class: Convolutional Network with Transformer

Our novel model: Data driven time-series filter based on most advanced deep
learning tools for pattern detection

• Convolutional neural networks (CNN) are data-driven non-linear local filters
• Transformers learn global dependency patterns between local filters
• CNN+Transformer is a flexible non-linear filter that can learn any time-series pattern
• Examples of global “pattern factors”
• Mean-reversion: cyclical combination of local curvature patterns
• Trend: Monotonic combination of local drifts
• Signal $\theta^{CNN+Trans}(x)$ is the “exposure” to pattern factors
• Allocation function $w^\epsilon$ is a flexible FFN:

$$w^{\epsilon|CNN+Trans}\left(\theta^{CNN+Trans}\right) = g^{FFN}\left(\theta^{CNN+Trans}\right).$$

• Joint estimation of signal and allocation function with trading objective

Convolutional Network Intuition

• The network applies to the residual time series $\epsilon^L_t$ a combination of local estimated linear filters $W^{\text{local}}$ followed by non-linear transformations:

$$y_l^{(0)} = \sum_{m=1}^{D_{\text{size}}} W_m^{\text{local}} x_{l-m+1}$$

⇒ Represent time-series $x \in \mathbb{R}^L$ in terms of D local patterns $\tilde{x} \in \mathbb{R}^{L \times D}$

[Figure: example local patterns. (a) Upward trend, (b) Downward trend, (c) Up reversal, (d) Down reversal]
Transformer Network Intuition

Transformer captures temporal dependencies between local patterns

$$h_i = \sum_{l=1}^{L} \alpha_{i,l}\, \tilde{x}_l \quad\text{with}\quad \alpha_{i,l} = \alpha_i(\tilde{x}_L, \tilde{x}_l) \quad\text{for } l = 1, \dots, L \text{ and } i = 1, \dots, H$$

• H global patterns specified by “attention weights” $\alpha_i \in \mathbb{R}^L$.

• Attention heads $h_i$ are “loadings” for a specific “pattern factor” $\alpha_i$
• Transformer estimates flexible attention weight functions
Empirical Analysis
Data

Out-of-sample analysis on U.S. equity data:

• 19 years of large cap U.S. daily stock returns from Jan 1998 to Dec 2016
• Only stocks with prior month market cap > 0.01% of total market cap
⇒ ∼ 550 large cap stocks/month ≈ S&P 500
Most liquid stocks to avoid trading frictions
• For IPCA, supplement with 46 monthly firm characteristics for each stock
and month (starting in 1978).

Implementation:
• All results are out-of-sample
• We use L = 30 days lookback windows of returns as input for signal.
• We retrain functions every half year using rolling windows of 4 years.
• Factor models are estimated out-of-sample daily on a rolling window of 60 days
• Main analysis with Sharpe ratio objective
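
The retraining scheme can be written as a schedule of index ranges. This sketch assumes 252 trading days per year, so that 4 years are 1008 days and half a year is 126 days; the function name and the end-exclusive index convention are hypothetical.

```python
def retraining_schedule(n_days, train_days=4 * 252, step_days=126):
    """Yield (train_start, train_end, test_end) index triples.

    Models are fitted on days [train_start, train_end) and applied
    out-of-sample to days [train_end, test_end); end indices are exclusive.
    """
    train_end = train_days
    while train_end < n_days:
        test_end = min(train_end + step_days, n_days)
        yield train_end - train_days, train_end, test_end
        train_end = test_end

# 19 years of daily data under the 252-day assumption
schedule = list(retraining_schedule(n_days=19 * 252))
```

Under these assumptions the first four years serve only as training data, which leaves 30 half-year out-of-sample periods.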

Arbitrage Portfolios

Residuals with the empirically most important families of factor models:

1. Fama-French factors for 1, 3, 5, 8 factors.
market, size, value, investment, profitability, momentum, short-term and
long-term reversal
2. PCA factors for 1, 3, 5, 8, 10, 15 factors.
3. IPCA model of Kelly, Pruitt, and Su (2019), for 1, 3, 5, 8, 10, 15 factors.
4. “0-factor model”: original stocks instead of residuals.

Given the residuals, we estimate arbitrage signals and allocations for

1. Ornstein-Uhlenbeck estimation with threshold rule (OU+Thres).
2. Fast Fourier Transform with feedforward network (Fourier+FFN).
3. Convolutional network with transformer (CNN+Trans).

and, for completeness,

4. OU estimation with feedforward network (OU+FFN).
5. Just a feedforward network without time-series filter (FFN)

OOS Annualized Performance

Factors Fama-French PCA IPCA
Model K SR µ σ SR µ σ SR µ σ
CNN+Trans 0 1.64 13.7% 8.4% 1.64 13.7% 8.4% 1.64 13.7% 8.4%
CNN+Trans 5 3.21 4.6% 1.4% 3.36 14.3% 4.2% 4.16 8.7% 2.1%
FFT+FFN 0 0.36 4.9% 13.6% 0.36 4.9% 13.6% 0.36 4.9% 13.6%
FFT+FFN 5 1.66 3.1% 1.8% 1.98 12.4% 6.3% 1.90 7.7% 4.1%
OU+Thres 0 -0.18 -2.4% 13.3% -0.18 -2.4% 13.3% -0.18 -2.4% 13.3%
OU+Thres 5 0.38 0.9% 2.3% 0.73 4.4% 6.1% 0.97 3.8% 4.0%

• Arbitrage trading has to be applied to residuals and not returns

• Regressing out more than 5 factors does not substantially improve results
• CNN+Transformer strongly dominates all models
• Average return µ is high in spite of leverage constraint
• Arbitrage trading qualitatively robust to choice of factor model
• Fourier+FFN reaches only 50% of CNN+Trans performance ⇒ flexible time-series filter crucial!
• Conventional OU+Thres reaches only 25% of CNN+Trans performance
⇒ Too restrictive model!
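
The three reported statistics can be computed from a daily return series as sketched below. The 252-day annualization and the use of raw strategy returns in the Sharpe ratio are assumptions of this sketch; the function name is hypothetical.

```python
import numpy as np

def annualized_performance(daily_returns, periods_per_year=252):
    """Return (SR, mu, sigma) annualized from daily strategy returns."""
    mu = daily_returns.mean() * periods_per_year
    sigma = daily_returns.std(ddof=1) * np.sqrt(periods_per_year)
    return mu / sigma, mu, sigma

daily = np.array([0.01, -0.01, 0.02, 0.0])    # toy data only
sr, mu, sigma = annualized_performance(daily)
```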
OOS Annualized Performance

Factors Fama-French PCA IPCA
Model K SR µ σ SR µ σ SR µ σ
CNN+Trans 0 1.64 13.7% 8.4% 1.64 13.7% 8.4% 1.64 13.7% 8.4%
CNN+Trans 1 3.68 7.2% 2.0% 2.74 15.2% 5.5% 3.22 8.7% 2.7%
CNN+Trans 3 3.13 5.5% 1.8% 3.56 16.0% 4.5% 3.93 8.6% 2.2%
CNN+Trans 5 3.21 4.6% 1.4% 3.36 14.3% 4.2% 4.16 8.7% 2.1%
CNN+Trans 8 2.49 3.4% 1.4% 3.02 12.2% 4.0% 3.95 8.2% 2.1%
CNN+Trans 10 - - - 2.81 10.7% 3.8% 3.97 8.0% 2.0%
CNN+Trans 15 - - - 2.30 7.6% 3.3% 4.17 8.4% 2.0%
Fourier+FFN 0 0.36 4.9% 13.6% 0.36 4.9% 13.6% 0.36 4.9% 13.6%
Fourier+FFN 1 0.89 3.2% 3.5% 0.80 8.4% 10.6% 1.24 6.3% 5.0%
Fourier+FFN 3 1.32 3.5% 2.7% 1.66 11.2% 6.7% 1.77 7.8% 4.4%
Fourier+FFN 5 1.66 3.1% 1.8% 1.98 12.4% 6.3% 1.90 7.7% 4.1%
Fourier+FFN 8 1.90 3.1% 1.6% 1.95 10.1% 5.2% 1.94 7.8% 4.0%
Fourier+FFN 10 - - - 1.71 8.2% 4.8% 1.93 7.6% 3.9%
Fourier+FFN 15 - - - 1.14 4.8% 4.2% 2.06 7.9% 3.8%
OU+Thresh 0 -0.18 -2.4% 13.3% -0.18 -2.4% 13.3% -0.18 -2.4% 13.3%
OU+Thresh 1 0.16 0.6% 3.8% 0.21 2.1% 10.4% 0.60 3.0% 5.1%
OU+Thresh 3 0.54 1.6% 3.0% 0.77 5.2% 6.8% 0.88 3.8% 4.3%
OU+Thresh 5 0.38 0.9% 2.3% 0.73 4.4% 6.1% 0.97 3.8% 4.0%
OU+Thresh 8 1.16 2.8% 2.4% 0.87 4.4% 5.1% 0.91 3.5% 3.8%
OU+Thresh 10 - - - 0.63 2.9% 4.6% 0.86 3.1% 3.6%
OU+Thresh 15 - - - 0.62 2.4% 3.8% 0.93 3.2% 3.5%
Cumulative OOS Returns of Different Arbitrage Strategies

(a) CNN+Trans, Fama-French 5 (b) CNN+Trans, PCA 5 (c) CNN+Trans, IPCA 5

(d) FFT+FFN, Fama-French 5 (e) FFT+FFN, PCA 5 (f) FFT+FFN, IPCA 5

(g) OU+Thresh, Fama-French 5 (h) OU+Thresh, PCA 5 (i) OU+Thresh, IPCA 5
Significance of Arbitrage Alphas

CNN+Trans model
          Fama-French          PCA                  IPCA
K         0         5          0         5          0         5
α         11.6%     4.5%       11.6%     14.1%      11.6%     8.3%
µ         13.7%     4.6%       13.7%     14.3%      13.7%     8.7%
tα        6.4∗∗∗    12∗∗∗      6.4∗∗∗    13∗∗∗      6.4∗∗∗    16∗∗∗
tµ        6.3∗∗∗    12∗∗∗      6.3∗∗∗    13∗∗∗      6.3∗∗∗    16∗∗∗
R²        30.3%     2.3%       30.3%     1.3%       30.3%     3.9%

• Time-series regression on 8 asset pricing factors:
Fama-French 5 + momentum, short-term and long-term reversal factors
Pricing errors α, t-statistics tα and R² of regression
• Mean µ and corresponding t-statistics tµ of arbitrage strategies
• CNN+Transformer arbitrage is statistically significant and not subsumed
by conventional risk factors
• Arbitrage strategies orthogonal to conventional risk factors
• Conventional mean-reversion trading explained by conventional risk factors.
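
The pricing error, its t-statistic and the regression R² can be obtained from an ordinary least squares fit of strategy returns on a constant and the factor returns. This sketch assumes classical homoskedastic standard errors, which the slides do not specify, and returns a per-period alpha that would still need annualizing; the function name and simulated data are hypothetical.

```python
import numpy as np

def alpha_regression(strategy_returns, factor_returns):
    """OLS of strategy returns on a constant and factors.

    Returns (alpha, t_alpha, r2) with classical standard errors.
    """
    T = len(strategy_returns)
    X = np.column_stack([np.ones(T), factor_returns])
    coef, *_ = np.linalg.lstsq(X, strategy_returns, rcond=None)
    resid = strategy_returns - X @ coef
    dof = T - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    t_alpha = coef[0] / np.sqrt(cov[0, 0])
    total = np.sum((strategy_returns - strategy_returns.mean()) ** 2)
    r2 = 1.0 - (resid @ resid) / total
    return coef[0], t_alpha, r2

# Simulated strategy with a true per-period alpha of 0.001 (toy data)
rng = np.random.default_rng(0)
T = 2000
factors = 0.01 * rng.normal(size=(T, 8))
strategy = 0.001 + factors @ np.full(8, 0.1) + 0.005 * rng.normal(size=T)
alpha, t_alpha, r2 = alpha_regression(strategy, factors)
```

A low R² together with a large t-statistic on alpha is what the slides describe as a strategy that is not subsumed by conventional risk factors.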

Significance of Arbitrage Alphas

CNN+Trans model
Fama-French PCA IPCA
K α tα R2 µ tµ α tα R2 µ tµ α tα R2 µ tµ
0 11.6% 6.4∗∗∗ 30.3% 13.7% 6.3∗∗∗ 11.6% 6.4∗∗∗ 30.3% 13.7% 6.3∗∗∗ 11.6% 6.4∗∗∗ 30.3% 13.7% 6.3∗∗∗
1 7.0% 14∗∗∗ 2.4% 7.2% 14∗∗∗ 14.9% 10∗∗∗ 0.6% 15.2% 11∗∗∗ 8.1% 12∗∗∗ 9.5% 8.7% 12∗∗∗
3 5.5% 12∗∗∗ 1.2% 5.5% 12∗∗∗ 15.8% 14∗∗∗ 1.7% 16.0% 14∗∗∗ 8.2% 15∗∗∗ 6.0% 8.6% 15∗∗∗
5 4.5% 12∗∗∗ 2.3% 4.6% 12∗∗∗ 14.1% 13∗∗∗ 1.3% 14.3% 13∗∗∗ 8.3% 16∗∗∗ 3.9% 8.7% 16∗∗∗
8 3.3% 9.4∗∗∗ 2.1% 3.4% 9.6∗∗∗ 12.0% 12∗∗∗ 0.9% 12.2% 12∗∗∗ 7.8% 15∗∗∗ 5.0% 8.2% 15∗∗∗
10 - - - - - 10.5% 11∗∗∗ 0.7% 10.7% 11∗∗∗ 7.7% 15∗∗∗ 4.0% 8.0% 15∗∗∗
15 - - - - - 7.5% 8.8∗∗∗ 0.5% 7.6% 8.9∗∗∗ 8.1% 16∗∗∗ 4.2% 8.4% 16∗∗∗

Mean-Variance Objective

CNN+Trans model, mean-variance objective function

Fama-French PCA IPCA
K SR µ σ SR µ σ SR µ σ
0 0.83 9.5% 11.4% 0.83 9.5% 11.4% 0.83 9.5% 11.4%
1 3.15 10.5% 3.3% 2.21 27.3% 12.3% 2.83 15.9% 5.6%
3 2.95 7.8% 2.6% 2.38 22.6% 9.5% 3.13 17.9% 5.7%
5 3.03 5.9% 2.0% 2.75 19.6% 7.1% 3.21 18.2% 5.7%
8 2.96 4.2% 1.4% 2.68 16.6% 6.2% 3.18 17.0% 5.4%
10 - - - 2.67 15.3% 5.7% 3.21 16.6% 5.2%
15 - - - 2.20 8.7% 4.0% 3.34 16.3% 4.9%

Alternative mean-variance objective function:

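$$\max_{w^\epsilon \in W,\ \theta \in \Theta} E\left[(w^R_{t-1})^\top R_t\right] - \gamma\, \mathrm{Var}\left((w^R_{t-1})^\top R_t\right)$$

$$\text{s.t.}\quad w^R_{t-1} = \frac{(w^\epsilon_{t-1})^\top \Phi_{t-1}}{\left\|(w^\epsilon_{t-1})^\top \Phi_{t-1}\right\|_1} \quad\text{and}\quad w^\epsilon_{t-1} = w^\epsilon\left(\theta(\epsilon^L_{t-1})\right).$$

• Increase mean return while maintaining leverage constraint of $\|w^R_{t-1}\|_1 = 1$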
• Here we set risk aversion to γ = 1
• Annual returns up to 20% while volatility is only half of market.
• Slightly lower Sharpe ratios
Importance of Time-Series Signal

Factors Fama-French PCA IPCA
Model K SR µ σ SR µ σ SR µ σ
FFN 0 0.57 8.8% 15.3% 0.57 8.8% 15.3% 0.57 8.8% 15.3%
FFN 1 0.60 2.0% 3.3% 0.53 6.2% 11.7% 1.07 6.5% 6.1%
FFN 3 1.02 2.6% 2.6% 1.15 8.2% 7.2% 1.50 7.6% 5.0%
FFN 5 1.32 2.3% 1.7% 1.42 9.8% 6.9% 1.55 7.3% 4.7%
FFN 8 1.31 2.1% 1.6% 0.84 5.1% 6.1% 1.56 7.2% 4.6%
FFN 10 - - - 0.70 3.5% 5.0% 1.48 7.0% 4.7%
FFN 15 - - - 0.51 2.4% 4.8% 1.68 7.5% 4.5%

Is a time-series signal function actually needed?

• Apply flexible FFN directly to residuals without time-series model

• Results are substantially worse than Fourier+FFN
• FFN is not efficient enough to learn complex dependencies with limited data

Additional Results

Stability over time:

• Results are robust to length of local rolling window
Essentially identical results for L = 60 rolling lookback window
• Constant signal and allocation function capture most arbitrage information
30% decrease of performance for constant model (Ttrain = 4 or 8 years)
Constant CNN+Transformer still substantially outperforms re-estimated
benchmark models
Robustness to tuning parameters:
• Results very robust to all tuning parameters
• General structure of the problem important, but not number of layers, etc.
Dependency between strategies:
• Strategies from different factor model families are only weakly correlated (0.2 to 0.45)
• Strategies within the same factor model family are highly correlated (0.4 to 0.85)
Unconditional means without allocation:
• Equally weighted residuals have mean returns < 1%
• Need to apply signal and trading policy to residuals for profitable trading
Trading Frictions and Transaction Costs

IPCA factor model

Sharpe ratio Mean-variance
K SR µ σ SR µ σ
0 0.52 8.5% 16.3% 0.22 2.6% 11.9%
1 0.85 5.9% 6.9% 0.86 5.5% 6.4%
3 1.24 6.6% 5.4% 1.16 6.9% 5.9%
5 1.11 5.5% 5.0% 1.02 5.3% 5.3%
10 0.98 5.1% 5.2% 1.04 5.4% 5.2%
15 0.94 4.8% 5.1% 1.02 5.1% 5.0%

• Include trading costs for high turnover and large short-selling positions:
$$\text{cost}(w^R_{t-1}, w^R_{t-2}) = 0.0005\,\|w^R_{t-1} - w^R_{t-2}\|_{L_1} + 0.0001\,\|\min(w^R_{t-1}, 0)\|_{L_1}$$
5 basis points per transaction and 1 basis point per short position
• No market impact as we only trade in the largest most liquid stocks
• Lower bound on profitability: less turnover with sparse factors, etc.
⇒ Arbitrage trading retains economic significance in presence of trading costs
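
The cost term is straightforward to evaluate for a pair of consecutive weight vectors. The function name and the example weights below are hypothetical; the two cost rates are the ones stated on the slide.

```python
import numpy as np

def trading_cost(w_now, w_prev, turnover_cost=0.0005, short_cost=0.0001):
    """5 bp on L1 turnover plus 1 bp on the L1 norm of short positions."""
    turnover = np.abs(w_now - w_prev).sum()
    short = np.abs(np.minimum(w_now, 0.0)).sum()
    return turnover_cost * turnover + short_cost * short

w_prev = np.array([0.5, -0.5, 0.0])           # both vectors have unit L1 norm
w_now = np.array([0.25, -0.25, -0.5])
cost = trading_cost(w_now, w_prev)            # turnover 1.0, short exposure 0.75
```

Subtracting this cost from the portfolio return before evaluating the objective gives the net-of-cost version of the strategy.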

Turnover and Short Selling

Turnover with and without trading friction objective:

(a) No Trading Friction Objective (b) With Trading Friction Objective

Proportion of short allocation weights:

(a) No Trading Friction Objective (b) With Trading Friction Objective

⇒ The effect of trading frictions is time-varying and our model can exploit particularly profitable arbitrage time periods by increasing trading and short positions.
Estimated Structure: Dissecting the
CNN+Transformer Model with IPCA-5
Examples of Allocation and Returns of CNN+Transformer Strategy

(a) Example 1: Mean-reversion

(b) Example 2: Trend

Sample of representative residuals with out-of-sample arbitrage trading

• Positive allocations for positive changes and vice versa
• CNN detects global and local trend and reversion patterns
CNN: Local Basic Patterns

(a) Basic pattern 1 (b) Basic pattern 2 (c) Basic pattern 3 (d) Basic pattern 4

(e) Basic pattern 5 (f) Basic pattern 6 (g) Basic pattern 7 (h) Basic pattern 8

Local filters estimated by CNN to capture relative local patterns

• Basic patterns are “building blocks” for the global patterns
• Visualizations of non-linear 3-dimensional local filters into orthogonal
two-dimensional local linear filters
• Sufficient to construct any smooth trend and mean-reversion patterns.
Example Attention Weights for Sinusoidal Residual Inputs

(a) Input residual and attention head weights for $x_l = \sin\left(2\pi \frac{l}{30}\right)$

(b) Input residual and attention head weights for $x_l = \sin\left(2\pi \frac{l+15}{30}\right)$

• Attention head weight 4: negative reversal factor

• Attention head weight 3: early reversal factor
• Attention head weight 1: low-frequency downturn factor
CNN+Transformer Model Structure for Representative Residual

(a) Cumulative residual (b) Attention weights per head (c) Average attention weights

(d) 1st CNN activation (e) 2nd CNN activation (f) 3rd CNN activation (g) 4th CNN activation

(h) 5th CNN activation (i) 6th CNN activation (j) 7th CNN activation (k) 8th CNN activation

CNN+Transformer Model Structure for Representative Residual Over Time

(a) Cumulative residuals (b) Average attention weights (c) Allocation weights

(d) Attention weights for head 1 (e) Attention weights for head 2 (f) Attention weights for head 3 (g) Attention weights for head 4

• Attention head weights 4 highest for down-times in 2009, 2014, middle 2016.
Focuses uniformly on last 10 days in 30-day window
• Attention head weights 3 highest for up-patterns in 2007, 2010, 2012.
Focuses uniformly on first 20 days in 30-day window
• Asymmetric response of Transformer:
act swiftly during downtrends, stay cautious during uptrends
Variable Importance for Allocation Weight

(a) Importance of Local Basic Patterns (b) Importance of Residual Days

• Measure importance with average absolute gradient of allocation weight

• Most important basic patterns are trends or local curvature.
Flat basic pattern 2 is negligible.
• All previous days matter, but on average the most recent 14 days get more
attention for trading decisions.

Conclusion

Methodology:
• Unifying conceptual framework to compare different approaches:
(1) portfolio generation, (2) signal extraction, (3) allocation decision
• Novel deep learning statistical arbitrage:
1. Conditional latent factors to generate arbitrage portfolios
2. CNN+Transformer signal: global dependency pattern with local filters
3. FFN allocation and global trading objective for estimation

Empirical results:
• Comprehensive out-of-sample study on U.S. equities
• CNN+Transformer substantially outperforms benchmark approaches
• Unspanned by conventional risk factors
• Survives realistic transaction and holding costs
• Insights into trading policies: asymmetric trend and reversion patterns
• Trading signal extraction is the most challenging and separating element

Appendix
Firm specific characteristics

Past Returns: Momentum, Short-term Reversal, Long-term Reversal, Return 2-1, Return 12-2, Return 36-13
Investment: Investment, Net operating assets, Change in prop. to assets, Net Share Issues
Profitability: Operating profitability, Profitability, Sales over assets, Capital turnover, Fixed costs to sales, Profit margin, Return on net assets, Return on assets, Return on equity, Expenses to sales, Capital intensity
Intangibles: Accrual, Operating accruals, Operating leverage, Price to cost margin
Value: Book to Market Ratio, Assets to market cap, Cash to assets, Cash flow to book value, Cashflow to price, Dividend to price, Earnings to price, Tobin’s Q, Sales to price, Leverage
Trading Frictions: Size, Turnover, Idiosyncratic Volatility, CAPM Beta, Residual Variance, Total assets, Market Beta, Close to High, Spread, Unexplained Volume, Variance

46 firm-specific monthly characteristics sorted into six categories.

Significance of Arbitrage Alphas

Fourier+FFN model
Fama-French PCA IPCA
K α tα R2 µ tµ α tα R2 µ tµ α tα R2 µ tµ
0 2.7% 0.8 8.6% 4.9% 1.4 2.7% 0.8 8.6% 4.9% 1.4 2.7% 0.8 8.6% 4.9% 1.4
1 3.0% 3.3∗∗ 3.3% 3.2% 3.5∗∗∗ 7.4% 2.7∗∗ 3.3% 8.4% 3.1∗∗ 4.8% 4.0∗∗∗ 16.4% 6.3% 4.8∗∗∗
3 3.2% 4.7∗∗∗ 4.2% 3.5% 5.1∗∗∗ 10.9% 6.3∗∗∗ 2.2% 11.2% 6.4∗∗∗ 6.8% 6.4∗∗∗ 13.0% 7.8% 6.9∗∗∗
5 2.9% 6.1∗∗∗ 3.5% 3.1% 6.4∗∗∗ 12.1% 7.5∗∗∗ 1.5% 12.4% 7.6∗∗∗ 6.7% 6.9∗∗∗ 13.3% 7.7% 7.4∗∗∗
8 3.0% 7.2∗∗∗ 3.2% 3.1% 7.4∗∗∗ 10.0% 7.5∗∗∗ 0.9% 10.1% 7.6∗∗∗ 6.8% 7.0∗∗∗ 13.3% 7.8% 7.5∗∗∗
10 - - - - - 8.0% 6.5∗∗∗ 1.0% 8.2% 6.6∗∗∗ 6.8% 7.1∗∗∗ 12.7% 7.6% 7.5∗∗∗
15 - - - - - 4.7% 4.3∗∗∗ 0.4% 4.8% 4.4∗∗∗ 7.1% 7.6∗∗∗ 12.2% 7.9% 8.0∗∗∗

OU+Thresh model
Fama-French PCA IPCA
K α tα R2 µ tµ α tα R2 µ tµ α tα R2 µ tµ
0 -4.5% -1.4 13.4% -2.4% -0.7 -4.5% -1.4 13.4% -2.4% -0.7 -4.5% -1.4 13.4% -2.4% -0.7
1 -0.2% -0.2 13.5% 0.6% 0.6 0.7% 0.3 6.3% 2.1% 0.8 1.7% 1.4 18.9% 3.0% 2.3∗
3 0.9% 1.2 10.4% 1.6% 2.1∗ 4.3% 2.5∗ 4.3% 5.2% 3.0∗∗ 2.6% 2.6∗∗ 18.8% 3.8% 3.4∗∗∗
5 0.5% 0.9 6.8% 0.9% 1.5 3.7% 2.4∗ 3.2% 4.4% 2.8∗∗ 2.8% 3.0∗∗ 17.7% 3.8% 3.8∗∗∗
8 0.6% 1.2 5.5% 1.0% 1.9 3.9% 3.0∗∗ 1.9% 4.4% 3.4∗∗∗ 2.3% 2.6∗∗ 17.6% 3.5% 3.6∗∗∗
10 - - - - - 2.6% 2.2∗ 1.4% 2.9% 2.4∗ 2.1% 2.5∗ 17.6% 3.1% 3.3∗∗∗
15 - - - - - 2.1% 2.1∗ 0.7% 2.4% 2.4∗ 2.3% 2.8∗∗ 18.1% 3.2% 3.6∗∗∗
Significance of Arbitrage Alphas with Mean-Variance Objective

CNN+Trans model
Fama-French PCA IPCA
K α tα R2 µ tµ α tα R2 µ tµ α tα R2 µ tµ
0 5.8% 2.2∗ 19.6% 9.5% 3.2∗∗ 5.8% 2.2∗ 19.6% 9.5% 3.2∗∗ 5.8% 2.2∗ 19.6% 9.5% 3.2∗∗
1 9.9% 12∗∗∗ 7.1% 10.5% 12∗∗∗ 26.3% 8.3∗∗∗ 1.6% 27.3% 8.6∗∗∗ 14.0% 11∗∗∗ 23.5% 15.9% 11∗∗∗
3 7.5% 11∗∗∗ 5.3% 7.8% 11∗∗∗ 22.1% 9.1∗∗∗ 2.2% 22.6% 9.2∗∗∗ 16.6% 12∗∗∗ 17.6% 17.9% 12∗∗∗
5 5.7% 11∗∗∗ 5.3% 5.9% 12∗∗∗ 19.0% 10∗∗∗ 3.2% 19.6% 11∗∗∗ 16.7% 12∗∗∗ 16.0% 18.2% 12∗∗∗
8 4.4% 9.8∗∗∗ 3.6% 4.6% 10∗∗∗ 16.3% 10∗∗∗ 1.6% 16.6% 10∗∗∗ 15.5% 12∗∗∗ 18.3% 17.0% 12∗∗∗
10 - - - - - 14.8% 10∗∗∗ 1.7% 15.3% 10∗∗∗ 15.2% 13∗∗∗ 20.6% 16.6% 12∗∗∗
15 - - - - - 8.5% 8.4∗∗∗ 0.9% 8.7% 8.5∗∗∗ 14.8% 13∗∗∗ 21.6% 16.3% 13∗∗∗

Fourier+FFN model
Fama-French PCA IPCA
K α tα R2 µ tµ α tα R2 µ tµ α tα R2 µ tµ
0 3.2% 0.7 8.4% 5.5% 1.1 3.2% 0.7 8.4% 5.5% 1.1 3.2% 0.7 8.4% 5.5% 1.1
1 2.8% 1.6 1.8% 2.5% 1.5 15.4% 1.7 1.3% 16.6% 1.9 7.9% 1.8 2.6% 9.7% 2.2∗
3 4.1% 4.4∗∗∗ 3.4% 4.3% 4.5∗∗∗ 30.3% 1.3 0.1% 32.1% 1.3 17.4% 4.1∗∗∗ 1.9% 17.6% 4.1∗∗∗
5 2.9% 4.8∗∗∗ 3.1% 3.1% 5.0∗∗∗ 21.0% 1.3 0.1% 22.5% 1.4 15.9% 4.3∗∗∗ 2.6% 17.0% 4.5∗∗∗
8 3.5% 6.8∗∗∗ 2.3% 3.6% 7.0∗∗∗ 17.4% 2.6∗∗ 0.3% 17.2% 2.6∗∗ 12.9% 4.3∗∗∗ 4.4% 14.4% 4.7∗∗∗
10 - - - - - 7.1% 1.7 0.3% 7.4% 1.8 11.7% 3.9∗∗∗ 3.5% 12.6% 4.1∗∗∗
15 - - - - - 5.5% 2.1∗ 0.1% 5.7% 2.2∗ 11.3% 4.3∗∗∗ 4.0% 12.1% 4.5∗∗∗

Dependency between Arbitrage Strategies

Table 1: Correlations between the Returns of the CNN+Transformer Arbitrage Strategies

FF 3 PCA 3 IPCA 3 FF 5 PCA 5 IPCA 5 PCA 10 IPCA 10

FF 3 1.00 0.32 0.44 0.62 0.25 0.43 0.21 0.44
PCA 3 0.32 1.00 0.32 0.34 0.62 0.35 0.41 0.36
IPCA 3 0.44 0.32 1.00 0.37 0.28 0.81 0.21 0.75
FF 5 0.62 0.34 0.37 1.00 0.28 0.39 0.23 0.40
PCA 5 0.25 0.62 0.28 0.28 1.00 0.29 0.47 0.31
IPCA 5 0.43 0.35 0.81 0.39 0.29 1.00 0.23 0.84
PCA 10 0.21 0.41 0.21 0.23 0.47 0.23 1.00 0.25
IPCA 10 0.44 0.36 0.75 0.40 0.31 0.84 0.25 1.00

Strategies from different factor models have low inter-family correlations

• Inter-family correlations range from 0.21 to 0.44.

• Intra-family correlations range between 0.41 and 0.84.

Importance of Time-Series Signal

Factors Fama-French PCA IPCA
Model K SR µ σ SR µ σ SR µ σ
OU+FFN 0 0.50 10.6% 21.3% 0.50 10.6% 21.3% 0.50 10.6% 21.3%
OU+FFN 1 0.34 0.8% 2.3% 0.05 0.7% 11.9% 0.60 4.8% 8.0%
OU+FFN 3 0.16 0.2% 1.4% 0.44 3.4% 7.8% 0.70 4.6% 6.6%
OU+FFN 5 0.17 0.2% 1.2% 0.68 4.7% 7.0% 0.66 4.2% 6.3%
OU+FFN 8 -0.34 -0.3% 1.0% 0.31 2.3% 6.9% 0.61 3.9% 6.2%
OU+FFN 10 - - - 0.26 1.3% 5.0% 0.56 3.5% 6.2%
OU+FFN 15 - - - 0.31 1.4% 4.3% 0.54 3.3% 6.1%


Robustness to Rolling Window Size

Table 2: OOS Annualized Performance of CNN+Trans for 60 Days Lookback Window

Fama-French PCA IPCA

K SR µ σ SR µ σ SR µ σ
0 1.50 13.5% 9.0% 1.50 13.5% 9.0% 1.50 13.5% 9.0%
1 2.95 9.6% 3.2% 2.68 15.8% 5.9% 3.14 8.8% 2.8%
3 3.21 8.7% 2.7% 3.49 16.8% 4.8% 3.84 9.6% 2.5%
5 3.23 6.8% 2.1% 3.54 16.0% 4.5% 3.90 9.2% 2.4%
8 2.96 4.2% 1.4% 3.02 12.5% 4.2% 3.93 8.7% 2.2%
10 - - - 2.67 9.9% 3.7% 3.98 9.2% 2.3%
15 - - - 2.36 8.1% 3.4% 4.24 9.6% 2.3%

Robustness to Rolling Window Size

Table 3: Significance of Arbitrage Alphas for 60 Days Lookback Window

CNN+Trans model, Sharpe objective function, L = 60 days lookback window

Fama-French PCA IPCA
K α tα R2 µ tµ α tα R2 µ tµ α tα R2 µ tµ
0 11.8% 5.6∗∗∗ 19.5% 13.5% 5.8∗∗∗ 11.8% 5.6∗∗∗ 19.5% 13.5% 5.8∗∗∗ 11.8% 5.6∗∗∗ 19.5% 13.5% 5.8∗∗∗
1 9.1% 11∗∗∗ 7.2% 9.6% 11∗∗∗ 15.5% 10∗∗∗ 1.2% 15.8% 10∗∗∗ 8.2% 12∗∗∗ 10.1% 8.8% 12∗∗∗
3 8.3% 12∗∗∗ 7.1% 8.7% 12∗∗∗ 16.5% 13∗∗∗ 2.5% 16.8% 14∗∗∗ 9.2% 15∗∗∗ 9.3% 9.6% 15∗∗∗
5 6.5% 12∗∗∗ 6.0% 6.8% 13∗∗∗ 15.6% 13∗∗∗ 2.2% 16.0% 14∗∗∗ 8.8% 15∗∗∗ 10.3% 9.2% 15∗∗∗
8 4.1% 11∗∗∗ 3.2% 4.2% 11∗∗∗ 12.2% 11∗∗∗ 1.6% 12.5% 12∗∗∗ 8.3% 15∗∗∗ 8.9% 8.7% 15∗∗∗
10 - - - - - 9.7% 10∗∗∗ 1.0% 9.9% 10∗∗∗ 8.8% 15∗∗∗ 8.3% 9.2% 15∗∗∗
15 - - - - - 8.1% 9.1∗∗∗ 0.7% 8.1% 9.1∗∗∗ 9.2% 16∗∗∗ 9.3% 9.6% 16∗∗∗

Mean-Variance Objective

CNN+Trans model, mean-variance objective function

Fourier+FFN model, mean-variance objective function

Fama-French PCA IPCA
K SR µ σ SR µ σ SR µ σ
0 0.28 5.5% 19.3% 0.28 5.5% 19.3% 0.28 5.5% 19.3%
1 0.38 2.5% 6.7% 0.48 16.6% 34.8% 0.56 9.7% 17.2%
3 1.16 4.3% 3.7% 0.34 32.1% 93.1% 1.06 17.6% 16.7%
5 1.30 3.1% 2.4% 0.37 22.5% 61.2% 1.17 17.0% 14.5%
8 1.73 3.6% 2.0% 0.67 17.4% 25.9% 1.21 14.4% 11.9%
10 - - - 0.45 7.4% 16.4% 1.06 12.6% 11.9%
15 - - - 0.56 5.7% 10.2% 1.17 12.1% 10.4%

Constant Model without Re-estimation

Table 4: OOS Annualized Performance of CNN+Trans for Constant Model

Ttrain = 4 years
Fama-French PCA IPCA
K SR µ σ SR µ σ SR µ σ
0 1.10 8.5% 7.8% 1.10 8.5% 7.8% 1.10 8.5% 7.8%
1 1.90 4.5% 2.3% 0.44 3.0% 6.9% 0.94 3.1% 3.3%
3 1.60 3.6% 2.2% 1.65 8.7% 5.3% 1.82 5.3% 2.9%
5 1.81 3.0% 1.7% 1.93 9.8% 5.1% 2.09 5.4% 2.6%
8 1.70 2.5% 1.5% 2.04 9.6% 4.7% 1.89 5.0% 2.6%
10 - - - 2.06 9.1% 4.4% 1.77 4.7% 2.7%
15 - - - 1.82 7.0% 3.9% 2.09 5.5% 2.7%

Ttrain = 8 years
Fama-French PCA IPCA
K SR µ σ SR µ σ SR µ σ
0 1.33 12.0% 9.0% 1.33 12.0% 9.0% 1.33 12.0% 9.0%
1 2.06 5.0% 2.4% 1.81 15.2% 8.4% 2.02 8.5% 4.2%
3 2.46 5.3% 2.2% 2.04 13.1% 6.4% 2.47 7.5% 3.0%
5 1.82 3.2% 1.8% 1.91 11.9% 6.2% 2.64 7.6% 2.9%
8 1.48 2.5% 1.7% 1.89 10.8% 5.7% 2.71 8.3% 3.1%
10 - - - 1.82 10.0% 5.5% 2.68 8.2% 3.1%
15 - - - 1.38 6.2% 4.5% 2.70 7.8% 2.9%
Constant Model without Re-estimation

Table 5: Significance of Arbitrage Alphas for Constant Model

CNN+Trans model, Sharpe objective function, Ttrain = 4 years

Fama-French PCA IPCA
K α tα R2 µ tµ α tα R2 µ tµ α tα R2 µ tµ
0 8.4% 4.2∗∗∗ 3.0% 8.5% 4.3∗∗∗ 8.4% 4.2∗∗∗ 3.0% 8.5% 4.3∗∗∗ 8.4% 4.2∗∗∗ 3.0% 8.5% 4.3∗∗∗
1 4.0% 6.8∗∗∗ 5.9% 4.5% 7.3∗∗∗ 4.1% 2.0∗ 4.5% 5.2% 2.5∗ 3.1% 3.7∗∗∗ 1.6% 3.1% 3.6∗∗∗
3 3.2% 5.7∗∗∗ 4.9% 3.6% 6.2∗∗∗ 8.2% 6.1∗∗∗ 2.7% 8.7% 6.4∗∗∗ 5.3% 7.4∗∗∗ 11.7% 5.3% 7.0∗∗∗
5 2.8% 6.6∗∗∗ 4.3% 3.0% 7.0∗∗∗ 9.3% 7.1∗∗∗ 1.8% 9.8% 7.5∗∗∗ 5.5% 8.6∗∗∗ 8.3% 5.4% 8.1∗∗∗
8 2.3% 6.1∗∗∗ 5.1% 2.5% 6.6∗∗∗ 9.0% 7.5∗∗∗ 2.2% 9.6% 7.9∗∗∗ 5.0% 7.7∗∗∗ 8.2% 5.0% 7.3∗∗∗
10 - - - - - 8.6% 7.5∗∗∗ 1.9% 9.1% 8.0∗∗∗ 5.1% 8.0∗∗∗ 16.6% 4.7% 6.9∗∗∗
15 - - - - - 6.8% 6.8∗∗∗ 1.0% 7.0% 7.1∗∗∗ 5.8% 9.3∗∗∗ 17.6% 5.5% 8.1∗∗∗

CNN+Trans model, Sharpe objective function, Ttrain = 8 years

Fama-French PCA IPCA
K α tα R2 µ tµ α tα R2 µ tµ α tα R2 µ tµ
0 10.1% 4.1∗∗∗ 18.1% 12.0% 4.4∗∗∗ 10.1% 4.1∗∗∗ 18.1% 12.0% 4.4∗∗∗ 10.1% 4.1∗∗∗ 18.1% 12.0% 4.4∗∗∗
1 4.4% 6.5∗∗∗ 14.3% 5.0% 6.8∗∗∗ 14.5% 5.8∗∗∗ 2.5% 15.2% 6.0∗∗∗ 7.0% 6.6∗∗∗ 30.6% 8.5% 6.7∗∗∗
3 4.9% 7.9∗∗∗ 11.6% 5.3% 8.2∗∗∗ 12.8% 6.7∗∗∗ 2.7% 13.1% 6.8∗∗∗ 7.0% 7.9∗∗∗ 8.2% 7.5% 8.2∗∗∗
5 2.9% 5.8∗∗∗ 12.3% 3.2% 6.0∗∗∗ 11.6% 6.2∗∗∗ 1.6% 11.9% 6.3∗∗∗ 7.1% 8.7∗∗∗ 12.1% 7.6% 8.7∗∗∗
8 2.3% 4.7∗∗∗ 5.4% 2.5% 4.9∗∗∗ 10.2% 6.0∗∗∗ 3.1% 10.8% 6.3∗∗∗ 7.7% 9.0∗∗∗ 14.6% 8.3% 9.0∗∗∗
10 - - - - - 9.4% 5.7∗∗∗ 2.6% 10.0% 6.0∗∗∗ 7.7% 8.9∗∗∗ 11.3% 8.2% 8.9∗∗∗
15 - - - - - 6.0% 4.4∗∗∗ 0.9% 6.2% 4.6∗∗∗ 7.4% 8.9∗∗∗ 11.2% 7.8% 8.9∗∗∗

43
Empirical example: (1) OU+Threshold signals & allocation weights

44
Empirical example: (2) Fourier+FFN signals & allocation weights

45
Simulation example: (3) CNN+Transformer signals & allocation weights

46
Fourier+FFN architecture

FFN equations:

$$x^{(l)} = \mathrm{ReLU}\left(W^{(l-1)} x^{(l-1)} + b^{(l-1)}\right)$$

$$w = W^{(L)} x^{(L)} + b^{(L)}$$
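
A minimal NumPy sketch of the feedforward equations above. The hidden sizes follow the [16, 8, 4] configuration listed on the hyperparameter slide; the 30-dimensional input, the scalar output, and the random weights are assumptions made only for illustration, and `ffn_forward` is a hypothetical helper name.

```python
import numpy as np

def ffn_forward(x0, weights, biases):
    """Apply x^(l) = ReLU(W^(l-1) x^(l-1) + b^(l-1)) for the hidden layers,
    then the linear output layer w = W^(L) x^(L) + b^(L)."""
    x = x0
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(W @ x + b, 0.0)  # ReLU
    return weights[-1] @ x + biases[-1]  # no activation on the output

# Assumed dimensions: 30 inputs, hidden layers [16, 8, 4], one output weight.
rng = np.random.default_rng(0)
sizes = [30, 16, 8, 4, 1]
weights = [rng.normal(scale=0.1, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

w = ffn_forward(rng.normal(size=30), weights, biases)
```

The list layout (one weight matrix and one bias vector per layer) mirrors the indexing of the equations, so the last entry plays the role of $W^{(L)}$ and $b^{(L)}$.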

47
Convolutional network equations

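Given the per-channel mean and standard deviation over the $L$ time steps,

$$\mu_k^{(i)} = \frac{1}{L} \sum_{l=1}^{L} y_{l,k}^{(i)}, \qquad \sigma_k^{(i)} = \sqrt{\frac{1}{L} \sum_{l=1}^{L} \left( y_{l,k}^{(i)} - \mu_k^{(i)} \right)^2}.$$

The input time series $x^{(0)} \in \mathbb{R}^{L}$ is passed through $k = 1, \dots, F$ convolutional filters of size $f_{size}$, normalization, and ReLU:

$$y_{l,k}^{(0)} = b_k^{(1)} + \sum_{m=1}^{f_{size}} W_{k,m}^{(1)} x_{l-m+1}^{(0)}, \qquad x_{l,k}^{(1)} = \mathrm{ReLU}\left( \frac{y_{l,k}^{(0)} - \mu_k^{(0)}}{\sigma_k^{(0)}} \right).$$

The output $x^{(1)} \in \mathbb{R}^{L \times F}$ passes through $k = 1, \dots, F$ convolutional filters of size $f_{size} \times F$, normalization, and ReLU:

$$y_{l,k}^{(1)} = b_k^{(2)} + \sum_{m=1}^{f_{size}} \sum_{k'=1}^{F} W_{k,m,k'}^{(2)} x_{l-m+1,k'}^{(1)}, \qquad x_{l,k}^{(2)} = \mathrm{ReLU}\left( \frac{y_{l,k}^{(1)} - \mu_k^{(1)}}{\sigma_k^{(1)}} \right).$$

Finally, the input $x^{(0)}$ is added back to the output $x^{(2)} \in \mathbb{R}^{L \times F}$ via a residual connection to compute the features $\tilde{x} \in \mathbb{R}^{L \times F}$:

$$\tilde{x}_{l,k} = x_{l,k}^{(2)} + x_{l}^{(0)}.$$

⇒ All $b^{(i)}$ and $W^{(i)}$ are parameters; all convolutions are left-padded with 0.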
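
The convolutional block above can be sketched in NumPy as follows. This is an illustration under assumptions, not the authors' implementation: the helper names, the random weights, and the small `eps` added to the standard deviation to avoid division by zero are not in the slides. The sizes L = 30, F = 8, and filter size 2 are taken from the hyperparameter slide.

```python
import numpy as np

def instance_norm(y, eps=1e-8):
    """Normalize each channel k over time l with its mean and (1/L) standard deviation."""
    mu = y.mean(axis=0, keepdims=True)
    sigma = y.std(axis=0, keepdims=True)  # population std (ddof=0), as in the slide
    return (y - mu) / (sigma + eps)       # eps is an assumption for numerical safety

def causal_conv(x, W, b):
    """y[l, k] = b[k] + sum_m sum_c W[k, m, c] * x[l - m, c], left-padded with zeros.
    x: (L, C_in), W: (F, f_size, C_in), b: (F,). Index m is 0-based here."""
    L, c_in = x.shape
    n_filters, f_size, _ = W.shape
    x_pad = np.vstack([np.zeros((f_size - 1, c_in)), x])
    y = np.empty((L, n_filters))
    for l in range(L):
        window = x_pad[l:l + f_size][::-1]  # row m holds x[l - m]
        y[l] = b + np.einsum("kmc,mc->k", W, window)
    return y

def cnn_features(x0, W1, b1, W2, b2):
    """Two conv layers with normalization and ReLU, plus the residual connection."""
    x1 = np.maximum(instance_norm(causal_conv(x0[:, None], W1, b1)), 0.0)
    x2 = np.maximum(instance_norm(causal_conv(x1, W2, b2)), 0.0)
    return x2 + x0[:, None]  # x0 is broadcast across the F channels

rng = np.random.default_rng(0)
L, F, f_size = 30, 8, 2
x0 = rng.normal(size=L)
W1, b1 = rng.normal(size=(F, f_size, 1)), np.zeros(F)
W2, b2 = rng.normal(size=(F, f_size, F)), np.zeros(F)
features = cnn_features(x0, W1, b1, W2, b2)  # shape (L, F)
```

Left-padding keeps each convolution output at time l dependent only on inputs at times up to l, although the normalization statistics are computed over the whole window.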
48
Transformer equations

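• Features $\tilde{x} \in \mathbb{R}^{L \times F}$ are projected onto $i = 1, \dots, h$ subspaces of dimension $F/h$ (“heads”):

$$V_i = \tilde{x} W_i^V + b_i^V, \quad K_i = \tilde{x} W_i^K + b_i^K, \quad Q_i = \tilde{x} W_i^Q + b_i^Q \in \mathbb{R}^{L \times F/h}.$$

• Projections $V_i$ are aggregated temporally, obtaining hidden states $y_i \in \mathbb{R}^{L \times F/h}$, for

$$y_{i,l} = \sum_{j=1}^{L} w_{l,j}^{(i)} V_{i,j} \in \mathbb{R}^{F/h}, \qquad w_{l,j}^{(i)} = \frac{\exp(K_{i,l} \cdot Q_{i,j})}{\sum_{m=1}^{L} \exp(K_{i,l} \cdot Q_{i,m})} \in [0, 1].$$

• Final output is $\mathrm{Concat}(y_1, \dots, y_h) W^O + b^O \in \mathbb{R}^{L \times F}$.

• This is passed through time-wise feedforward networks.

• $W_i^V, W_i^K, W_i^Q \in \mathbb{R}^{F \times F/h}$, $b_i^V, b_i^K, b_i^Q \in \mathbb{R}^{F/h}$, $W^O \in \mathbb{R}^{F \times F}$, $b^O \in \mathbb{R}^{F}$
⇒ parameters to estimate.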
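
The attention step above can be sketched in NumPy as follows, following the slide's indexing (weights proportional to exp(K_l · Q_j), normalized over j). The function and dictionary key names and the random parameters are assumptions for illustration; the time-wise feedforward network and dropout are omitted. F = 8 and h = 4 match the chosen values on the hyperparameter slide.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, heads, W_O, b_O):
    """x: (L, F). heads: list of h dicts with WV, WK, WQ of shape (F, F/h)
    and bV, bK, bQ of shape (F/h,). Returns an (L, F) array."""
    outputs = []
    for p in heads:
        V = x @ p["WV"] + p["bV"]
        K = x @ p["WK"] + p["bK"]
        Q = x @ p["WQ"] + p["bQ"]
        w = softmax(K @ Q.T, axis=1)  # w[l, j] proportional to exp(K_l . Q_j); rows sum to 1
        outputs.append(w @ V)         # y_{i,l} = sum_j w[l, j] V_{i,j}
    return np.concatenate(outputs, axis=1) @ W_O + b_O

rng = np.random.default_rng(0)
L, F, h = 30, 8, 4
d = F // h
heads = [{"WV": rng.normal(size=(F, d)), "bV": np.zeros(d),
          "WK": rng.normal(size=(F, d)), "bK": np.zeros(d),
          "WQ": rng.normal(size=(F, d)), "bQ": np.zeros(d)} for _ in range(h)]
W_O, b_O = rng.normal(size=(F, F)), np.zeros(F)
out = multi_head_attention(rng.normal(size=(L, F)), heads, W_O, b_O)  # shape (L, F)
```

When all key and query parameters are zero, every attention weight equals 1/L and each hidden state reduces to the time average of the value projections, which is a convenient sanity check.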

49
Hyperparameter information

Notation | Model / Hyperparameter | Initial | Candidates | Chosen

OU+Thres
R2T | R² filter threshold | 0.5 | 0.25, 0.5, 0.75 | 0.25
ST | Signal threshold to long/short | 1.25 | 1, 1.25, 1.5 | 1.25
LKB | Number of days in residual lookback window | 30 | 30 | 30

DFT+FFN
HLC | Hidden layer configuration | [16,8,4] | [16,8,4] | [16,8,4]
DRPH | Dropout rate (% removed) in hidden layers | 0.25 | 0.25 | 0.25
LKB | Number of days in residual lookback window | 30 | 30 | 30
WDW | Number of days in rolling training window | 1000 | 1000 | 1000
RTFQ | Number of days of retraining frequency | 125 | 125 | 125

CNN+Trans
D | Number of filter channels in CNN | 4 | 4, 8 | 8
ATT | Number of attention heads | 4 | 2, 4 | 4
HDN | Number of hidden units in transformer’s linear layer | 2F | 2F, 3F | 2F
DRPA | Dropout rate (% removed) in the transformer | 0.25 | 0.5, 0.25 | 0.25
Dsize | Filter size in CNN | 2 | 2 | 2
LKB | Number of days in residual lookback window | 30 | 30 | 30
WDW | Number of days in rolling training window | 1000 | 1000 | 1000
RTFQ | Number of days of retraining frequency | 125 | 125 | 125
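
To illustrate how the WDW and RTFQ settings interact, the sketch below lists rolling train/trade index ranges for a 1000-day training window re-estimated every 125 days. The function name `rolling_schedule` and the half-open index convention are assumptions for illustration; the slides do not specify how the windows are aligned.

```python
def rolling_schedule(n_days, window=1000, retrain_every=125):
    """Return (train_start, train_end, trade_end) tuples:
    fit on days [train_start, train_end), trade on [train_end, trade_end)."""
    schedule = []
    train_end = window
    while train_end < n_days:
        trade_end = min(train_end + retrain_every, n_days)
        schedule.append((train_end - window, train_end, trade_end))
        train_end = trade_end  # slide the window forward by the retraining step
    return schedule

# Example: 1300 days of data give three training windows.
for train_start, train_end, trade_end in rolling_schedule(1300):
    print(train_start, train_end, trade_end)
```

Each model is only ever applied to days after its training window, so every trading period is out-of-sample with respect to the parameters used on it.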

