Supervised Autoencoder MLP

The document proposes using machine learning techniques like supervised autoencoders and noise augmentation to improve financial time series forecasting and trading strategy performance. It evaluates these techniques on S&P500, EUR/USD, and BTC/USD data. Key research questions are whether noise augmentation and a novel triple barrier labeling technique can improve performance as measured by information ratio. Feature engineering techniques like fractional differentiation of time series are also considered. The aim is to determine whether these machine learning approaches can outperform traditional methods for algorithmic trading.

Supervised Autoencoder MLP for Financial Time Series Forecasting

Bartosz Bieganowski, Robert Ślepaczuk

University of Warsaw, Faculty of Economic Sciences


Department of Quantitative Finance and Machine Learning, Quantitative Finance Research Group

March 4, 2024

Table of Contents

1 Problem Statement & Data

2 Feature Engineering

3 Novelty 1 - Triple Barrier Labeling & Optimal Metric

4 Novelty 2 - Supervised Autoencoder & Noise Augmentation

5 Approach Comparison & Results

6 Sensitivity Analysis

7 Conclusion

Aim

The aim of this study is to verify whether the following machine-learning-related techniques
can improve trading strategy performance:
Noise augmentation - originally developed for Computer Vision problems, where it has been
observed that adding noise to the input data helps generalization on image classification tasks.
Supervised Autoencoder - originally developed for Natural Language Processing problems;
we test whether the SAE-MLP architecture can be applied in algorithmic trading strategies.
Triple Barrier Labeling - although already present in the literature, we expand on this
labeling method by developing an optimization metric that better reflects the strategy return.

Research Questions

RQ1: Does noise augmentation used with the SAE-MLP architecture improve
strategy performance as expressed by the Information Ratio?

RQ2: Does triple barrier labeling with the correct optimization metric improve
strategy performance as expressed by the Information Ratio?

RQ3: Does hyperparameter tuning improve strategy performance with
SAE-MLP architectures as expressed by the Information Ratio?

Literature Review

Efficient Market Hypothesis (EMH) suggests stock prices reflect all available information,
making them unpredictable.
Studies by Fama (1970) and Malkiel (2005) support EMH, while Barberis and Thaler
(2002) suggest market inefficiencies.
Machine learning (ML) techniques like LSTM outperform traditional methods in stock
price prediction (Kryńska and Ślepaczuk, 2022).
LSTM models show promise in forecasting, but challenges remain in handling
non-stationary data and parameter sensitivity.
Hybrid models combining LSTM and GRU demonstrate improved performance in
forecasting financial assets (Baranochnikov and Ślepaczuk, 2022).
ML models, particularly deep learning, excel in predicting Bitcoin prices, indicating their
relevance in cryptocurrency trading (Michanków et al., 2022).

Data

S&P500 - Low volatility compared to individual stocks, correlated with economic growth
indicators, right-skewed return distribution.

EUR/USD - Moderate volatility, driven by the monetary policy of the EU and the Fed and by
indicators from both regions; returns close to normal with leptokurtosis.

BTC/USD - High volatility, driven by speculation and technological developments. Low
correlation with traditional financial assets. Returns skewed and highly leptokurtic.

Training timeframe: 2010-01-01 - 2019-12-31

Testing timeframe: 2020-01-01 - 2022-04-30

Toolset

Hardware: GeForce RTX 2080 SUPER, Intel Core i7-9700K, Patriot 32GB RAM

Software: Python 3.10, Tensorflow, Pandas, Matplotlib, Scikit-learn

Computation Time: on average 4 minutes per hyperparameter combination

Features - ICSA, Oil, Gas

Features - Corn, Gold, Copper

Features - Presumed Impact on Economy

Feature | Presumed Increase Impact | Presumed Decrease Impact
ICSA    | Negative: indicates rising unemployment, potential economic slowdown | Positive: suggests decreasing unemployment, potential economic growth
Oil     | Mixed: benefits oil exporters, increases costs for importers and consumers | Mixed: lowers costs for importers and consumers, but may harm oil-exporting economies
Gas     | Negative: increases energy costs, affects consumer spending and production costs | Positive: decreases energy costs, boosts consumer spending and lowers production costs
Corn    | Negative: raises food and feed prices, impacts food industry and inflation | Positive: lowers food and feed prices, beneficial for food industry and inflation control
Gold    | Mixed: often seen as a safe haven, an increase may indicate economic uncertainty | Mixed: a decrease may reflect investor confidence, but could impact gold-producing economies
Copper  | Positive: suggests industrial growth and demand, often a positive economic indicator | Negative: may indicate reduced industrial activity and economic slowdown

Feature Engineering Question

What do we do with our features before we put them into the machine learning model?

Should we differentiate the time series (d = 1, losing the memory aspect)?

Should we input them as-is (d = 0, even though the data is not stationary)?

Fractionally differentiated features

We can apply ARFIMA (Granger and Joyeux, 1980) assumptions to machine learning features. We
consider the backshift operator $B$ applied to a time series of a feature $\{X_t\}$ such that
$B^k X_t = X_{t-k}$.

It follows that the difference between the current and the previous feature value can be expressed as
$(1 - B)X_t$. For example, $(1 - B)^2 = 1 - 2B + B^2$, where $B^2 X_t = X_{t-2}$, so that
$(1 - B)^2 X_t = X_t - 2X_{t-1} + X_{t-2}$.

For any positive integer $n$, it also holds that:

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k \tag{1}$$

Fractionally differentiated features

On the other hand, for any real number $d$:

$$(1 + x)^d = \sum_{k=0}^{\infty} \binom{d}{k} x^k \tag{2}$$

is the binomial series. In a model where $d$ is allowed to be a real number, the binomial series
can be expanded into a series of weights which can be applied to feature values:

$$\omega = \left\{ 1,\ -d,\ \frac{d(d-1)}{2!},\ -\frac{d(d-1)(d-2)}{3!},\ \ldots,\ (-1)^k \frac{1}{k!} \prod_{i=0}^{k-1} (d - i) \right\} \tag{3}$$
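As an illustration, here is a minimal Python sketch of how the weights in Eq. (3) can be generated recursively and applied to a feature series; the truncation threshold and the function names are our own choices for the sketch, not the paper's implementation.

import numpy as np
import pandas as pd

def fracdiff_weights(d, threshold=1e-4, max_terms=1000):
    """Weights from Eq. (3): w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k.
    Truncated once |w_k| falls below `threshold` (illustrative choice)."""
    w = [1.0]
    for k in range(1, max_terms):
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break
        w.append(w_k)
    return np.array(w)

def fracdiff(series: pd.Series, d: float, threshold=1e-4) -> pd.Series:
    """Apply the fractional-differencing weights to a feature series."""
    w = fracdiff_weights(d, threshold)
    width = len(w)
    values = series.to_numpy(dtype=float)
    out = np.full(len(values), np.nan)
    for t in range(width - 1, len(values)):
        # newest observation gets weight w_0, older ones w_1, w_2, ...
        window = values[t - width + 1 : t + 1][::-1]
        out[t] = np.dot(w, window)
    return pd.Series(out, index=series.index)

The recursion w_k = -w_{k-1}(d - k + 1)/k reproduces the terms of Eq. (3) without computing factorials explicitly.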

Optimal Differentiation Order

Optimal Differentiation Order

Algorithm 1 Fractional Feature Differentiation in Walk-Forward Validation


1: Set a range of possible values for d (e.g., from 0 to 1)
2: Set significance level for ADF test (e.g., 1%)
3: Initialize a dictionary associating each feature with its optimal d.
4: for each segment pair (train, test) do
5: for each feature do
6: Apply fractional differencing to train segment of feature at discrete intervals
7: Calculate ADF test statistic and p-value for each d for a feature
8: Choose the lowest d such that p-value < significance level.
9: Save feature name and associated optimal d to dictionary
10: Apply optimal d differencing to both train and test set of the feature
11: end for
12: Train the model on train segment, evaluate on test set
13: end for
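A sketch of the per-feature step of Algorithm 1 for a single train/test pair, assuming the fracdiff helper from the previous sketch and the ADF test from statsmodels; the grid of d values and the variable names are illustrative.

import numpy as np
from statsmodels.tsa.stattools import adfuller

def optimal_d(train_feature, d_grid=np.arange(0.0, 1.05, 0.05), alpha=0.01):
    """Return the lowest d on the grid whose fractionally differenced
    series passes the ADF stationarity test at significance `alpha`."""
    for d in d_grid:
        diffed = fracdiff(train_feature, d).dropna()
        p_value = adfuller(diffed, autolag="AIC")[1]
        if p_value < alpha:
            return d
    return d_grid[-1]  # fall back to the largest d if none passes

# inside the walk-forward loop (one train/test segment pair):
# optimal_ds = {}
# for name in features.columns:
#     d = optimal_d(train[name])
#     optimal_ds[name] = d
#     train[name] = fracdiff(train[name], d)
#     test[name] = fracdiff(test[name], d)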

Triple Barrier Labeling

Whenever we try to express a trading problem as a machine learning problem, we have to think
long and hard about what we really want our model to predict (Y).

Regression on the price in x time? (non-stationary, ignores the path, uninformative error metrics)

Regression on the return over x time? (maybe stationary, ignores the path, no directional
sensitivity unless using custom loss metrics like MADL)

Classification on movement direction? (better, but still ignores the path, high noise-to-signal ratio)

Triple Barrier Labeling

Path-dependent classification, which is effectively an ML interpretation of the concepts of
stop-loss, take-profit, and timed exit.

Figure: Exemplary labels in triple barrier labeling

Triple Barrier Labeling


1,
 if max(St , ..., St+n ) ≥ St · (1 + λ)
Pt = −1, if min(St , ..., St+n ) ≤ St · (1 − λ) (4)

0, otherwise

λ - window size in (%)

(Idea: λ was a constant for this study, but it might work well to base it on an estimate of future volatility)
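A sketch of Eq. (4) applied to a price series with a fixed λ and an n-bar vertical barrier. Eq. (4) only checks whether each horizontal barrier is reached; the sketch additionally resolves the rare case where both barriers are hit inside the window by taking whichever is touched first. Function and parameter names are our own.

import numpy as np
import pandas as pd

def triple_barrier_labels(prices: pd.Series, lam: float, n: int) -> pd.Series:
    """Label each bar per Eq. (4): +1 if the upper barrier S_t*(1+lam) is
    reached within the next n bars, -1 if the lower barrier S_t*(1-lam)
    is reached, 0 if neither (timed exit). The last n bars stay unlabeled (0)."""
    labels = np.zeros(len(prices), dtype=int)
    values = prices.to_numpy(dtype=float)
    for t in range(len(values) - n):
        window = values[t : t + n + 1]
        upper_hits = np.where(window >= values[t] * (1 + lam))[0]
        lower_hits = np.where(window <= values[t] * (1 - lam))[0]
        # whichever barrier is touched first determines the label
        first_up = upper_hits[0] if upper_hits.size else np.inf
        first_down = lower_hits[0] if lower_hits.size else np.inf
        if first_up < first_down:
            labels[t] = 1
        elif first_down < first_up:
            labels[t] = -1
    return pd.Series(labels, index=prices.index)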

Payoff Table

Table 2. Return on a trade given classification result.

Pred \ True |  1  |    0    |  -1
 1          |  λ  | (−λ, λ) |  −λ
 0          |  0  |    0    |   0
-1          | −λ  | (−λ, λ) |   λ

Source: Own Elaboration

Derived Optimization Metric

We define the directly correct count as the number of times the model entered the correct position,
which resulted in a return of λ. We can similarly define the directly incorrect count as the number
of times the model entered an incorrect position:

$$DCC = \left| \{ (Y_{pred}, Y_{true}) \in S \mid Y_{pred} \neq 0 \text{ and } Y_{pred} = Y_{true} \} \right| \tag{5}$$

$$DIC = \left| \{ (Y_{pred}, Y_{true}) \in S \mid Y_{pred} \neq 0 \text{ and } Y_{pred} \neq Y_{true} \} \right| \tag{6}$$

where |S| denotes the cardinality of the set S.

Derived Optimization Metric

Basic optimization metric Φ:

$$\Phi = \prod_{1}^{DCC} (1 + \lambda) \cdot \prod_{1}^{DIC} (1 - \lambda) = (1 + \lambda)^{DCC} \cdot (1 - \lambda)^{DIC} \tag{7}$$

Optimization metric with δ dictating error preference strength:

$$\Phi_\delta = (1 + \lambda)^{DCC} \cdot (1 - \lambda)^{DIC} \cdot \left( 1 - \frac{\lambda}{\delta} \right)^{TEC} \tag{8}$$

where δ > λ and TEC is the timed-exit count. In our study, we set δ arbitrarily to 20, indicating that
twenty timed exits are considered as undesirable as one directly incorrect classification.

(Note: Accurate prediction of zeros could also be taken advantage of with an option butterfly)
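A sketch of the optimization metric in Eqs. (5)-(8). Eq. (6) counts every non-matching trade as DIC, so splitting timed exits (true label 0) out into TEC is our reading of Eq. (8); treat the bookkeeping below as an assumption rather than the paper's exact implementation.

import numpy as np

def phi_delta(y_pred, y_true, lam: float, delta: float = 20.0) -> float:
    """Optimization metric from Eq. (8).
    DCC: position taken and correct barrier hit  -> factor (1 + lam)
    DIC: position taken, opposite barrier hit    -> factor (1 - lam)
    TEC: position taken, timed exit (true == 0)  -> factor (1 - lam / delta)
    (Assumed split of the 'incorrect' cases; delta > lam is required.)"""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    traded = y_pred != 0
    dcc = np.sum(traded & (y_pred == y_true))
    dic = np.sum(traded & (y_true == -y_pred))
    tec = np.sum(traded & (y_true == 0))
    return (1 + lam) ** dcc * (1 - lam) ** dic * (1 - lam / delta) ** tec

# e.g. phi_delta(pred_labels, true_labels, lam=0.005, delta=20.0)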

Data Augmentation in CV

Can we apply the concept to financial time series?

Supervised Autoencoder

Supervised Autoencoder

Enhanced Feature Representation: Supervised autoencoders can learn more relevant and discriminative
features for the task at hand because they are trained to not only reconstruct the input data but also to
optimize for an additional task-specific loss (like classification or regression).

Regularization Effect: Incorporating the reconstruction objective alongside the task-specific objective
(like classification accuracy) can act as a form of regularization. This helps in preventing overfitting to the
training data by ensuring that the learned representations maintain information about the input data,
leading to more generalized models.

Efficiency in Data Use: By leveraging unlabeled data for the reconstruction part and labeled data for the
task-specific part, supervised autoencoders can make efficient use of datasets where obtaining labeled data
is expensive or time-consuming. This can be particularly beneficial in semi-supervised learning scenarios,
where the model can learn general features from a large pool of unlabeled data and fine-tune the
representations for the task with a smaller set of labeled examples.
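A minimal TensorFlow/Keras sketch of an SAE-MLP of the kind discussed above, combining Gaussian noise augmentation, a reconstruction head, and a supervised classification head; the layer sizes, noise level, and loss weights are illustrative placeholders rather than the paper's tuned hyperparameters.

from tensorflow.keras import layers, Model

def build_sae_mlp(n_features: int, n_classes: int = 3,
                  bottleneck: int = 16, noise_std: float = 0.05) -> Model:
    inputs = layers.Input(shape=(n_features,))
    noisy = layers.GaussianNoise(noise_std)(inputs)   # noise augmentation, active at train time only

    # encoder -> bottleneck
    x = layers.Dense(64, activation="relu")(noisy)
    encoded = layers.Dense(bottleneck, activation="relu")(x)

    # decoder head: reconstruct the (clean) inputs
    x = layers.Dense(64, activation="relu")(encoded)
    reconstruction = layers.Dense(n_features, name="reconstruction")(x)

    # supervised MLP head on the encoded representation
    x = layers.Dense(32, activation="relu")(encoded)
    label = layers.Dense(n_classes, activation="softmax", name="label")(x)

    model = Model(inputs, [reconstruction, label])
    model.compile(
        optimizer="adam",
        loss={"reconstruction": "mse", "label": "sparse_categorical_crossentropy"},
        loss_weights={"reconstruction": 1.0, "label": 1.0},
    )
    return model

# model.fit(X_train, {"reconstruction": X_train, "label": y_train}, ...)
# with triple-barrier labels {-1, 0, 1} mapped to {0, 1, 2} for the classifier head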

Approaches Comparison

Drawdown-adjusted information ratio

As our main metric we use the drawdown-adjusted information ratio, originally proposed by
Kość et al. (2019), which is a modification of the Information Ratio measure. It also takes into
account the sign of the portfolio's rate of return and the maximum drawdown:

$$IR^{**} = \frac{ARC^2 \cdot \text{sign}(ARC)}{ASD \cdot MDD} \tag{9}$$

ARC - Annualized Return Compounded
ASD - Annualized Standard Deviation
MDD - Maximum Drawdown
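For concreteness, a sketch of Eq. (9) computed from a series of per-bar strategy returns; the annualization factor depends on the bar length and the market's trading hours and is therefore left as an input here.

import numpy as np

def ir_star_star(returns, periods_per_year: int) -> float:
    """Drawdown-adjusted information ratio, Eq. (9)."""
    returns = np.asarray(returns, dtype=float)
    equity = np.cumprod(1.0 + returns)
    years = len(returns) / periods_per_year

    arc = equity[-1] ** (1.0 / years) - 1.0                      # Annualized Return Compounded
    asd = np.std(returns, ddof=1) * np.sqrt(periods_per_year)    # Annualized Standard Deviation
    mdd = np.max(1.0 - equity / np.maximum.accumulate(equity))   # Maximum Drawdown

    if asd == 0.0 or mdd == 0.0:
        return 0.0
    return arc ** 2 * np.sign(arc) / (asd * mdd)

# periods_per_year depends on the bar length, e.g. roughly 26 fifteen-minute
# bars per 6.5-hour equity session, more for FX and crypto.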

Results - Eq. Weight Portfolio of Strategies - IR**

Sensitivity Analysis - Triple Barrier Labelling
Y - window height λ; X - window length (minutes)

Sensitivity Analysis - Supervised Autoencoder
Y - Gaussian noise rate; X - bottleneck size (% of feature count)

Sensitivity Analysis - Supervised Autoencoder

Y - encoder hidden layer count; X - decoder hidden layer count

Research Question Findings

RQ1: Impact of Data Augmentation and Denoising
Data augmentation (Gaussian noise) and denoising (autoencoders) significantly improve
strategy performance.
Approach 3 excels over Approaches 1 & 2 in Information Ratio for all bar lengths.
Optimal noise level and autoencoder size are critical; relationship is non-linear, requiring
careful calibration.
RQ2: Efficacy of Triple Barrier Labelling
Triple barrier labelling surpasses simple direction classification, enhancing market noise
handling and optimization metrics.
Approach 4 outperforms others in 15 and 30-minute bars but falls short in high-frequency
(5-minute bars) trading scenarios.
RQ3: Role of Hyperparameter Tuning
Crucial for superior investment strategy performance; optimal results with specific noise
levels and autoencoder sizes.

Further elaboration ideas

Dynamic lambda - setting lambda (TBL window size) to be a dynamic estimate of future
volatility.

Dynamic window length - setting the window length dynamically based on an estimate of market activity.

Zero-classifications - Accurate predictions of the price staying the same can be taken advantage
of with options (theta decay).

Other architectures - More elaborate models than MLP can be stacked on top of the SAE (Random
Forest, AdaBoost, CatBoost).

Feature engineering - more elaborate feature engineering to see how the SAE reacts to a greater
number of features.

Conclusion

Thank you!

Q&A

