Improving The Prediction of Asset Returns With Machine Learning by Using A Custom Loss
Jean Dessain
IESEG School of Management, Department of Finance, 3 rue de la Digue, 59000 Lille, France
Email: [email protected]
Abstract
Not all errors from models predicting asset returns are equal in terms of impact on the
efficiency of the algorithm: some errors induce poor investment decisions while other errors
have no financial consequences. This asymmetry is critical for the performance metric used to
assess the ability of algorithms to predict returns. It should also guide the choice of the
most efficient loss function, a key element in training machine learning algorithms.
Mean Squared Error (MSE) is the most popular loss function for regression algorithms, but
it is not the most efficient one for algorithms predicting asset returns. In this article, (a) we
develop custom loss functions that account for the asymmetry in the predictive purpose of the
algorithm; (b) we compare the efficiency of these custom loss functions with MSE; and (c) we
present an efficient custom loss function that significantly improves the prediction of asset
returns, and which we confirm to be robust.
Keywords: Time series forecasting; Stock return predictability; Investment efficiency; Machine
learning; Deep learning; Loss function
JEL: C45, C53, G11, G17, N2.
1. Introduction
The finance industry has systematically looked for ways to predict future asset returns, and
more generally to predict financial time series data. Regression and classification algorithms,
as well as reinforcement learning, have been developed to predict the effective return of assets,
either for very short periods of time or for longer horizons. But the task is undoubtedly difficult
as financial markets are volatile and noisy environments, with short-term and long-term
fluctuations and huge shifts in volatilities.
1.2. Contribution
This paper analyses how a custom loss function can be designed to capture the asymmetry
of the objective function, and how it can be implemented in deep neural networks.
The rest of the paper is structured as follows. Section two reviews the literature on custom
loss functions for deep learning, covering models predicting asset returns as well as models
from other fields of computer science. Section three presents the framework and various
custom loss functions as alternatives to MSE. Section four presents the methodology for
testing the various loss functions. Section five presents the results of the analyses and
demonstrates that a custom loss function significantly outperforms MSE. Section six
concludes and draws some perspectives.
1 See Appendix 1, Loss functions applied by author. Custom loss functions that the authors do not describe are out of our scope; for those we simply refer to the bibliography of Dessain (2022).
2 MLP = Multi-Layer Perceptron; CNN = Convolutional Neural Network (in this case a WaveNet); LSTM = Long Short-Term Memory neural network, a form of Recurrent Neural Network (RNN).
3 GAN = Generative Adversarial Network, here with an LSTM generator and a CNN discriminator.
4 We do not use the traditional split between training set, validation set, and test set, as we only need the algorithms to produce series of returns.
5 This is a recurrent issue among the 190 articles surveyed in section 2.1: many papers apply a test set that is insufficiently long, sometimes equal to or shorter than 2 years, thereby producing unreliable and unexploitable results.
Finally, we tested the LinEx loss function made popular by Patton and Timmermann
(2007): $\mathrm{LinEx} = e^{a(\hat{y} - y)} - a(\hat{y} - y) - 1$, where $y$ is the realised
return and $\hat{y}$ the predicted one. It produced irrelevant results for the purpose of our
analysis.
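As an illustration, a direct PyTorch implementation of this loss (the function name linex_loss and the reduction to the mean are our choices, not the paper's) could read:

    import torch

    def linex_loss(y_pred, y_true, a=1.0):
        # LinEx penalises errors exponentially on one side and roughly
        # linearly on the other; `a` sets the direction and strength
        # of the asymmetry.
        e = a * (y_pred - y_true)
        return torch.mean(torch.exp(e) - e - 1.0)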
Graph 2. Loss level per loss function with a true return of -1.00%
6 See Appendix 2, List of stocks. The data set has been published on Mendeley Data, doi:10.17632/nbwhzctrjp.2
Min-max normalisation is applied to X_train. The min and max values derived from X_train
are applied to X_test for the min-max normalisation of the test set.
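As a sketch (scikit-learn and the array shapes are our assumptions; any equivalent manual computation works), this train-only fitting looks like:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    X_train = np.random.randn(1000, 61)  # placeholder features, 61 inputs as for US stocks
    X_test = np.random.randn(250, 61)

    scaler = MinMaxScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # min and max learned on X_train only
    X_test_scaled = scaler.transform(X_test)        # same min/max reused, no refitting on test data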
4.2. Model
To compare the various loss functions, we tested several straightforward MLP models with
fully connected layers and a single output neuron for the regression. We present here the
results of one architecture with 8 hidden layers. Other tested architectures, from 2 to 16
hidden layers, delivered very similar results in terms of the relative efficiency of the various
loss functions.
The activation function is the standard Rectified Linear Unit (ReLU). With 61 inputs for
US stocks and 8 hidden layers, we obtain 188,893 learnable parameters. The number of
parameters increases to 193,776 with the 80 inputs for EU stocks.
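A minimal PyTorch sketch of such an architecture follows; the hidden-layer width (128) and the dropout rate are illustrative assumptions, so the parameter count will not match the figures above exactly:

    import torch.nn as nn

    class MLP(nn.Module):
        # Fully connected regression network: n_inputs -> 8 hidden layers -> 1 output.
        def __init__(self, n_inputs=61, hidden=128, n_hidden_layers=8, dropout=0.2):
            super().__init__()
            layers, in_features = [], n_inputs
            for _ in range(n_hidden_layers):
                layers += [nn.Linear(in_features, hidden), nn.ReLU(), nn.Dropout(dropout)]
                in_features = hidden
            layers.append(nn.Linear(in_features, 1))  # single output neuron for the regression
            self.net = nn.Sequential(*layers)

        def forward(self, x):
            return self.net(x)

    model = MLP(n_inputs=61)  # 61 inputs for US stocks; use n_inputs=80 for EU stocks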
We run the model several times on several computers with various hyper-parameters to
verify the reproducibility of the results and their robustness to the choice of hyper-parameters.
7 To ensure reproducibility and replicability, we fix the seed for Python, NumPy, and PyTorch. We also force PyTorch to work with deterministic cuDNN. Details about hardware and software are provided in the appendix.
8 We tested the model with learning rates between 0.0001 and 0.001, dropout values between 0.15 and 0.30, and between 100 and 300 epochs. We varied the number of hidden layers between 6 and 10.
9 We therefore compute 630 series of 1,260 daily returns generated by the algorithms, and 105 series of daily returns for the buy & hold strategy.
10 The D ratio follows the same principle as the Sharpe ratio, but its risk measure (CF-VaR) is more robust than the standard deviation used by Sharpe, as it does not assume a normal distribution of returns, an assumption that is rarely verified in practice.
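As a hedged sketch of this risk measure (the paper's exact D ratio definition is given in its methodology section; here we merely assume a Sharpe-like ratio whose denominator is the Cornish-Fisher VaR):

    import numpy as np
    from scipy.stats import norm, skew, kurtosis

    def cornish_fisher_var(returns, alpha=0.05):
        # Modified VaR: the Gaussian quantile is adjusted for the skewness
        # and excess kurtosis of the empirical return distribution.
        z = norm.ppf(alpha)
        s, k = skew(returns), kurtosis(returns)  # kurtosis() returns excess kurtosis
        z_cf = (z + (z**2 - 1) * s / 6
                  + (z**3 - 3 * z) * k / 24
                  - (2 * z**3 - 5 * z) * s**2 / 36)
        return -(np.mean(returns) + z_cf * np.std(returns))

    def d_ratio(returns, alpha=0.05):
        # Assumed form, by analogy with the Sharpe ratio: mean return over CF-VaR.
        return np.mean(returns) / cornish_fisher_var(returns, alpha)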
We conclude that the D ratios obtained with the various custom loss functions are
statistically different from the D ratios obtained when the algorithm runs with the MSE loss
function.
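The test itself is not restated here; as an illustration only, a paired non-parametric comparison such as Wilcoxon's signed-rank test could be run on per-stock D ratios:

    import numpy as np
    from scipy.stats import wilcoxon

    rng = np.random.default_rng(0)
    d_custom = rng.normal(0.8, 0.2, 105)  # hypothetical D ratios per stock, custom loss
    d_mse = rng.normal(0.6, 0.2, 105)     # hypothetical D ratios per stock, MSE

    stat, p_value = wilcoxon(d_custom, d_mse)  # paired test, no normality assumption
    print(f"Wilcoxon statistic = {stat:.1f}, p-value = {p_value:.4f}")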
Table 3. Average D ratio, D-Return and D-VaR per loss function (the higher the better)
6.1. Conclusion
MSE is the most common loss function applied in ML and DL. While it is easy to apply,
MSE delivers sub-optimal results when compared with asymmetric custom loss functions for
algorithms predicting asset returns, because errors in prediction do not have symmetrical
consequences in terms of effective realised return.
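For illustration only (this is not the paper's AdjLoss2, which is defined in section 3), one simple way to encode that asymmetry is to up-weight errors where the predicted and realised returns disagree in sign, since only those trigger a wrong investment decision:

    import torch

    def asymmetric_mse(y_pred, y_true, penalty=2.0):
        # Hypothetical sketch: squared error, scaled up by `penalty` whenever
        # the prediction and the realised return have opposite signs.
        err2 = (y_pred - y_true) ** 2
        wrong_side = (y_pred * y_true < 0).float()
        return torch.mean(err2 * (1.0 + (penalty - 1.0) * wrong_side))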
Customising the loss function to reflect the asymmetric consequences of prediction errors
is an easy way to significantly improve the results of simple algorithms aiming at predicting
returns. Not only do most custom loss functions perform much better than the same algorithms
trained with MSE, they also manage to achieve better results than a buy & hold strategy, a
result that MSE never achieved with our MLP algorithm. Depending on the risk aversion of
the investor and on the benchmark strategy used for reference, various customisations can be
contemplated. The loss function AdjLoss2 appears to be among the best performers, not only
in terms of the risk-adjusted return metric (D ratio) but also in terms of balance between the
effective return (D-return) and the effective risk reduction (D-VaR). Finally, this loss
6.2. Perspectives
The loss function is a key component of any ML algorithm and a key input for computing
the gradient descent. We demonstrate the advantage of tailoring the loss function with a
simple deep learning model. Three possible next steps would generalise our results: (i) testing
the algorithm with other types of assets (bonds, ETFs, commodities, crypto-currencies, …);
(ii) testing the superiority of custom loss functions with more complex algorithms (LSTM,
CNN and ResNet) performing the same task of predicting asset returns; and (iii) generalising
the principle of the custom loss function, which could possibly be applied efficiently, mutatis
mutandis, to some Reinforcement Learning (RL) algorithms (like an on-policy actor-critic
PPO model).
Funding
This research did not receive any specific grant from funding agencies in the public,
commercial, or not-for-profit sectors.
Warning
This research is for scientific purposes only and is not intended to support any model of
investment or trading.
References
Abe, M., Nakayama, H., 2018. Deep learning for forecasting stock returns in the cross-section. Lect. Notes Comput. Sci. 10937 LNAI, 273–284. https://doi.org/10.1007/978-3-319-93034-3_22
Abroyan, N., 2017. Neural networks for financial market risk classification. Front. Signal Process. 1, 62–66. https://doi.org/10.22606/fsp.2017.12002
Ahmed, S., Hassan, S.-U., Aljohani, N.R., Nawaz, R., 2020. FLF-LSTM: A novel prediction
11 The Huber loss function is a mix of MSE and MAE.
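In PyTorch, this corresponds to torch.nn.HuberLoss, quadratic below the delta threshold and linear above it:

    import torch

    huber = torch.nn.HuberLoss(delta=1.0)  # quadratic for |error| < delta, linear beyond
    loss = huber(torch.tensor([0.5, 2.0]), torch.tensor([0.0, 0.0]))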
Software:
The two computers run similar software, with occasional marginal differences in releases.

SOFTWARE   PC1              PC2
OS         Windows 10 Pro   Windows 10 Pro
Anaconda   1.9.12           1.9.12
Spyder     4.1.5            4.1.5
Python     3.8              3.7
To secure reproducibility to the largest extent possible, we applied several strategies
(illustrated in the sketch below):
- Seed defined for Python, NumPy, TensorFlow and/or PyTorch (both CPU and GPU)
- Deterministic backend forced for cuDNN
- Debug environment variable CUBLAS_WORKSPACE_CONFIG set to ":4096:8"