Transformers in Finance
Abstract
In traditional quantitative trading practice, navigating the complicated and dynamic financial market presents a persistent challenge. Fully capturing various market variables, including long-term information, as well as the essential signals that may lead to profit, remains a difficult task for learning algorithms. To tackle this challenge, this paper introduces quantformer, an enhanced neural network architecture based on transformers, to build investment factors. Through transfer learning from sentiment analysis, quantformer not only exploits the transformer's inherent advantages in capturing long-range dependencies and modeling complex data relationships, but can also solve tasks with numerical inputs and accurately forecast future returns over a given period. This work collects more than 5,000,000 rolling data samples of 4,601 stocks in the Chinese capital market from 2010 to 2019. The results of this study demonstrate the model's superior performance in predicting stock trends compared with 100 other factor-based quantitative strategies. Notably, the model's innovative use of a transformer-like model to establish factors, in conjunction with market sentiment information, has been shown to significantly enhance the accuracy of trading signals, offering promising implications for the future of quantitative trading strategies. The implementation details and code are available at https://github.com/QuantFormer.
1 Introduction
The goal of stock trading is to optimize the return on investment in the capital market by buying or selling the shares of one or more companies. Traders profit when fluctuations in the stock price produce a positive difference between the selling and buying prices.
However, stocks are influenced by a large number of factors, which constitute a complex
system and make it difficult for people to make a profit. The assessment of a stock’s
evolving trend is inherently challenging due to the highly volatile and interconnected
nature of the market, which sets it apart from typical time series modeling [1]. As a
result, many strategies and tools have been built, in parallel with the development
of capital markets, and quantitative strategies have been playing an important role
among them.
Some traditional quantitative tools, such as the Markowitz portfolio theory [2] and
the Capital Asset Pricing Model (CAPM) [3], focus mainly on static fundamental
analysis. In other words, these strategies aim to make a profit by simple calculation
and analysis. Since then, along with the development of computer science, more quan-
titative methods and tools have been introduced. Within these methods, factor-based
strategies have attracted much attention. In 1993, [4] introduced their Fama-French
Three Factor Model (FF3), which has become an influential model in quantitative
trading. In 2015, Fama and French extended their model to the Five-Factor Asset Pricing Model (FF5) [5]. Beyond this classical theory, numerous trading strategies have been published over the decades. For example, [6] built an optimal portfolio that captures future investment opportunities (FIO) using multi-factor models.
Quantitative trading with factors typically follows two primary approaches, which
are shown in Figure 1. The first approach involves the computation of stock factor
values. Based on these calculated values, stocks are ranked to establish a pool. Once
this pool is established, assets are held for a predetermined period. Adjustments to
the portfolio are then made at specific time intervals, ensuring alignment with evolv-
ing market conditions and factor readings. The second method employs a fixed pool
of stocks, wherein factors guide the derivation of long/short signals. Traders can exe-
cute corresponding actions when they receive the signals from factors, allowing for a
dynamic response to market fluctuations based on factor insights.
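The two approaches can be contrasted in a short Python sketch. It is only an illustration under simplifying assumptions: the function names, the `top_fraction` parameter and the `upper`/`lower` thresholds are hypothetical and not taken from any specific platform.

```python
# Hypothetical sketch of the two factor-based approaches (not a platform API).


def rank_and_select(factor_values, top_fraction):
    """Approach 1: rank stocks by factor value and keep the top fraction as the pool."""
    ranked = sorted(factor_values, key=factor_values.get, reverse=True)
    n_selected = max(1, int(len(ranked) * top_fraction))
    return ranked[:n_selected]


def long_short_signals(factor_values, upper, lower):
    """Approach 2: map each stock of a fixed pool to +1 (long), -1 (short) or 0 (no action)."""
    signals = {}
    for ticker, value in factor_values.items():
        if value >= upper:
            signals[ticker] = 1
        elif value <= lower:
            signals[ticker] = -1
        else:
            signals[ticker] = 0
    return signals


factors = {"A": 0.9, "B": 0.1, "C": 0.5, "D": -0.4, "E": 0.7}
pool = rank_and_select(factors, 0.4)          # held until the next rebalance
signals = long_short_signals(factors, 0.6, -0.2)
```

In the first approach the pool is rebuilt at each rebalancing date; in the second the pool stays fixed and only the signals change.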
In recent years, Machine Learning (ML) has become an instrumental tool for
enhancing trading algorithms and decision-making processes. Its core principle,
enabling systems to learn from and make decisions based on data, lends itself partic-
ularly well to the vast and dynamic landscapes of stock markets. [7] exploits machine
learning algorithms based on manual indicators, but the underlying random walk
hypothesis may hamper the task of understanding inherently non-stationary series.
With the rise of different architectures, [8] proposed novel approaches to capture market sentiment from trading data, opening a new field in quantitative trading.
Although several previous studies have used ML factors to capture market sentiment in quantitative finance, this line of research still faces two difficulties. First, in sentiment analysis, a branch of Natural Language Processing (NLP, a field of computer science that aims to understand, interpret, and generate human language), models convert the words of a text into word vectors through word embeddings, which then serve as inputs. Financial datasets, however, contain categorical data such as industry types instead of words, as well as quantitative data such as price fluctuations, turnover rates, and financial indicators. If the input comprises only categorical data, the time series can be treated as a sentence [9]. In most cases, however, the input involves numerical data, which cannot be transformed via word embeddings.
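The difference between the two kinds of input can be sketched in NumPy. This is an illustration only: the table size, the token width `d_model` and the feature values are hypothetical. Categorical IDs index an embedding table, whereas numerical features have no vocabulary to index and are instead mapped by a learnable linear projection, which is the kind of replacement for the word embedding layer that the Conclusion describes for quantformer.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 4  # hypothetical token width

# Categorical input (e.g. an industry code): each ID selects a row of an embedding
# table, just like a word ID in NLP.
n_industries = 10
embedding_table = rng.normal(size=(n_industries, d_model))
industry_ids = np.array([3, 3, 7])                    # one ID per time step
categorical_tokens = embedding_table[industry_ids]     # shape (3, d_model)

# Numerical input (e.g. profit rate and turnover rate): there is no vocabulary to
# index, so each time step's feature vector goes through a learnable linear map.
numeric_features = np.array([[0.012, 0.8],
                             [-0.004, 1.1],
                             [0.020, 0.9]])           # shape (3, 2): steps x features
W = rng.normal(size=(2, d_model))
b = np.zeros(d_model)
numeric_tokens = numeric_features @ W + b              # shape (3, d_model)
```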
Second, most NLP tasks, such as machine translation, dialogue systems, and speech recognition, can be framed as sequence-to-sequence (seq2seq) problems. For example, the transformer architecture is based on the seq2seq architecture [10]. To make use of previously generated outputs, the transformer decoder produces its outputs sequentially and applies masking to the input sequences during training. In stock prediction, however, the aim is often to accurately forecast future returns over a period, and the transformer model is rarely used for such tasks.
To address these problems, we propose quantformer, a modified transformer architecture adapted to quantitative data and used as an investment factor. Quantformer takes numerical data directly as input, using an approach similar to sentiment analysis. The paper is structured as follows. Section 2 discusses previous work in quantitative finance, mainly based on machine learning. Sections 3 and 4 introduce quantformer. A factor based on quantformer is then trained and backtested. For the practical backtest, we collected data on more than 4,600 stocks over the past 14 years (from 2010 to 2023) from data-collection platforms. To test the factor comprehensively, we divided the data by frequency (Section 4.2) and trained under different training scales (Section 5.3). Finally, the backtest results, including the comparison between the quantformer factor and 100 other factors and the insights gained from this comparative analysis, are discussed in Sections 4 and 5.
Figure 2: Example of stock prediction
2 Preliminaries
This section briefly introduces related work on stock prediction with market sentiment and the development of quantitative financial trading methods, especially those based on Machine Learning (ML).
been methods that have been increasingly applied to stock data due to their ability to
process vast amounts of data and make predictions based on them, offering a potential
advantage in the financial markets. At the same time, the finance sector, particularly quantitative trading, has become aware of the potential of deep learning models for predicting stock movements, optimizing portfolios, and managing risk.
models, accentuating the adaptability and robustness of GRUs. [28] focused on LSTM networks and provided comparative insights into GRUs. In their analysis, however, LSTMs showed a slight edge in prediction accuracy, indicating that while GRUs are powerful, the choice between them and LSTMs may come down to specific use cases and computational constraints. Beyond that, [29] presented a model fusing GRUs with Convolutional Neural Networks (CNN) [30]; combining CNNs and GRUs also shows advantages for stock price prediction. [31] introduced the Attention-GRU model, which combines attention and a GRU to establish a new factor based on a CVaR [32, 33] portfolio. The Attention-GRU model was fitted to the market return and the returns of 28 Dow Jones Industrial Average (DJIA) stocks, and it outperformed other models on 8 metrics such as annual return, standard deviation, and information ratio.
LSTM. The model was trained on S&P 500 stocks from 2004 to 2021 and performed better than other models such as VaR and ARIMA.
Similar to previous work, [1] modified the structure of the transformer, proposing the Adaptive Long-Short Pattern Transformer (ALSP-TF). Their model is designed for hierarchical representation and interaction of stock price series at different context scales. Using a learnable function, they make the self-attention aware of the weighted time intervals between patterns, so that it can adaptively adjust their dependencies beyond similarity matching. As a result, they obtained an average annual return of more than 10%.
However, transformers have some disadvantages. [45] noted that the global self-attention module focuses on point-wise token similarities without contextual insight. Because stock fluctuations are conditioned on composite signals over multiple periods, the lack of pattern-wise interaction hinders adequate discrimination of stock tendencies and makes the model susceptible to noise points. In addition, [46] argued that the basic query-key matching paradigm is position agnostic. Although position embeddings are added to the sequential inputs, they may not be optimal because they cannot reveal precise distances.
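The position-agnostic behaviour noted in [46] can be checked numerically. The sketch below assumes single-head scaled dot-product attention as in [10], with no positional encoding and random illustrative weights: shuffling the time steps of the input only shuffles the rows of the output in the same way, so the order of the sequence carries no information by itself.

```python
import numpy as np


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention without positional encoding."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability only
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V


rng = np.random.default_rng(42)
X = rng.normal(size=(5, 2))                        # 5 time steps, 2 features
Wq, Wk, Wv = (rng.normal(size=(2, 3)) for _ in range(3))

perm = np.array([4, 2, 0, 3, 1])                   # shuffle the time order
out = self_attention(X, Wq, Wk, Wv)
out_shuffled = self_attention(X[perm], Wq, Wk, Wv)

# Permuting the inputs merely permutes the outputs (permutation equivariance).
same_up_to_order = np.allclose(out[perm], out_shuffled)
```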
3 Methodology
This section introduces the overall framework of the work and the steps for building the model, including data processing, quantformer construction, and prediction.
$$ \chi_k^t = [x_k^{t-20}, x_k^{t-19}, \ldots, x_k^{t}] \in \mathbb{R}^{20 \times 2} $$
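Each $x_k^t$ contains two features. The first is the accumulated daily profit rate during the time step, where $p_i$ and $p_{i-1}$ are the closing prices on the $i$-th and $(i-1)$-th days, respectively, and $\sigma_i$ is the turnover rate on day $i$:
$$ r^t = \sum_i r_i = \sum_i \frac{p_i - p_{i-1}}{p_{i-1}}, \qquad \sigma^t = \sum_i \sigma_i \tag{1} $$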
Figure 3: Overview of the work
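The other feature is the accumulated daily turnover rate $\sigma^t$ during the time step, calculated as shown in equation (1). Each $\chi_k^t$ can therefore be represented as:
$$ \chi_k^t = \begin{bmatrix} r^{t-20} & r^{t-19} & \cdots & r^{t} \\ \sigma^{t-20} & \sigma^{t-19} & \cdots & \sigma^{t} \end{bmatrix}^{\top} \in \mathbb{R}^{20 \times 2} $$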
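Equation (1) and the stacking of the per-step features into an input window can be sketched in plain Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the first close passed to `accumulate_step` is assumed to be the last close of the previous step, and the window takes the 20 steps ending at step t so that the result matches the stated 20 × 2 shape (the index range t−20, ..., t written above would contain 21 steps).

```python
def accumulate_step(closes, turnovers):
    """Equation (1) for one time step.

    closes    -- [p_0, p_1, ..., p_n]; p_0 is assumed to be the previous step's last close
    turnovers -- [sigma_1, ..., sigma_n], the daily turnover rates within the step
    Returns (r^t, sigma^t): summed daily profit rates and summed turnover rates.
    """
    r = sum((p - p_prev) / p_prev for p_prev, p in zip(closes, closes[1:]))
    sigma = sum(turnovers)
    return r, sigma


def build_window(step_features, t, length=20):
    """Stack the (r, sigma) pairs of the `length` steps ending at step t (length x 2)."""
    start = t - length + 1
    if start < 0 or t >= len(step_features):
        raise ValueError("not enough history for a full window")
    return [list(pair) for pair in step_features[start : t + 1]]


# Usage with made-up numbers: two trading days inside one time step.
r, sigma = accumulate_step([10.0, 10.5, 10.29], [0.01, 0.02])
```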
Outputs
For a stock set $S^t$ at trading timestamp $t$, each stock $s_i^t$ has a profit rate $r_i^{t+1}$ at trading timestamp $t+1$, calculated in the same way as $r^t$ in equation (1). The next-timestamp profits of the N stocks at $t+1$ can then be represented as:
This list is then sorted and partitioned into $q$ equal parts, where $q$ is set to 3 or more. The dimension $d_f$ of the outputs is the number of quantiles selected for training; $d_f$ should be at least 3 and no larger than $q$.
The training targets and the predicted outputs have the same dimension, namely the $d_f$ set at the beginning. For example, with $q = 5$, the outputs for the stocks $s_k$ whose profit values rank in the bottom 20%, middle 20%, and top 20% are recorded for $d_f = 3$, respectively, as
$$ y_k^t = [1, 0, 0]^{\top}, \quad y_k^t = [0, 1, 0]^{\top}, \quad y_k^t = [0, 0, 1]^{\top} $$
The output dimension can also be 5: the output dimension always equals $d_f$, so it is 3 when $d_f = 3$ and 5 when $d_f = 5$.
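The labelling step can be sketched in plain Python. The helper name and its parameters are hypothetical; the sketch follows the example above, with the bottom quantile mapped to the first output position, and assumes that stocks falling in quantiles that are not selected receive an all-zero vector, which is how we read the "zero-output" samples mentioned in Section 4.2.

```python
def quantile_labels(returns, q=5, selected=(0, 2, 4)):
    """One-hot labels from next-timestamp profit rates (hypothetical helper).

    returns  -- dict ticker -> r^{t+1}
    q        -- number of equal parts the sorted list is split into
    selected -- quantile indices kept as classes (0 = bottom); d_f = len(selected)
    Stocks in unselected quantiles get an all-zero vector (assumed "zero-output").
    """
    ranked = sorted(returns, key=returns.get)   # ascending: bottom quantile first
    n = len(ranked)
    d_f = len(selected)
    labels = {}
    for position, ticker in enumerate(ranked):
        part = position * q // n                # quantile index 0 .. q-1
        vector = [0] * d_f
        if part in selected:
            vector[selected.index(part)] = 1
        labels[ticker] = vector
    return labels


labels = quantile_labels({f"s{i}": i / 100 for i in range(10)})
```

With ten stocks and q = 5, "s0" (bottom 20%) maps to [1, 0, 0], "s5" (middle 20%) to [0, 1, 0], "s9" (top 20%) to [0, 0, 1], and "s3" (second quintile) to [0, 0, 0]. Passing `selected=(0, 1, 2, 3, 4)` gives the d_f = 5 variant.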
$$ \tilde{x}_{k,i}^t = \frac{x_{k,i}^t - \mathbb{E}[x_i^t]}{\operatorname{std}[x_i^t]} \tag{2} $$
In this way, the input sequences are normalized to zero mean and unit variance, which aims to reduce the influence of outlying time points and to make different features comparable with each other [47].
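Equation (2) is a z-score. The NumPy sketch below assumes that the mean and standard deviation are taken across stocks at each timestamp and for each feature (a cross-sectional reading of $\mathbb{E}[x_i^t]$, since the stock index k is absent); the array layout (N stocks, T timestamps, F features) and the guard for constant features are our assumptions.

```python
import numpy as np


def normalize(x):
    """Equation (2), assuming statistics are taken over stocks (axis 0).

    x -- array of shape (N stocks, T timestamps, F features)
    """
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)
    return (x - mean) / np.where(std > 0, std, 1.0)   # avoid dividing by zero


x = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(50, 20, 2))
z = normalize(x)
```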
$$ Q_{i,h} = P_i W_h^Q, \quad K_{i,h} = P_i W_h^K, \quad V_{i,h} = P_i W_h^V \tag{3} $$
where $h = 1, \ldots, H$ is the head index and $W_h^Q, W_h^K, W_h^V \in \mathbb{R}^{2 \times d_f}$ are trainable weights. The final layer is then represented by the concatenation of all attention heads.
$$ F_i = \operatorname{Multihead}(Q_{i,h}, K_{i,h}, V_{i,h}) = \Big( \big\Vert_{h=1}^{H} \operatorname{Attention}(Q_{i,h}, K_{i,h}, V_{i,h}) \Big) W^O \tag{4} $$
where $\Vert$ denotes the concatenation operator and $d_f$ is the dimension of the projected feature space. Each head computes its attention output separately, so the outputs of the heads must be recombined into the original input dimension; $W^O \in \mathbb{R}^{H d_f \times 2}$ in equation (4) performs this recombination. The attention head is shown below:
In the end, all input sequences are represented in the form of
$$ F = [F_1; F_2; \ldots; F_N] \in \mathbb{R}^{N \times 20 \times H d_f} $$
which is fed into feed-forward layers. From Section 3.2, the embedding output is denoted as:
$$ Y^t = \{y_1^t, y_2^t, \ldots, y_N^t\} \in \mathbb{R}^{N \times 1 \times d_f} $$
$$ Z^t = \{z_1^t, z_2^t, \ldots, z_N^t\} \in \mathbb{R}^{N \times 1 \times d_f} \tag{6} $$
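Equations (3) and (4) can be sketched in NumPy for a single stock. The per-head attention formula is not reproduced in this text, so the sketch assumes the standard scaled dot-product attention of [10]; H = 4 and d_f = 8 are illustrative values, not the paper's settings, and the weights are random. Following the stated shape of $W^O$ ($H d_f \times 2$), each stock's output keeps the 20 × 2 input shape.

```python
import numpy as np


def softmax(s, axis=-1):
    s = s - s.max(axis=axis, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)


def multihead(P, Wq, Wk, Wv, Wo):
    """Equations (3)-(4) for one stock (sketch).

    P          -- input sequence, shape (20, 2)
    Wq, Wk, Wv -- per-head projections, shape (H, 2, d_f)
    Wo         -- output projection, shape (H * d_f, 2)
    Each head uses scaled dot-product attention, assumed as in [10].
    """
    heads = []
    for h in range(Wq.shape[0]):
        Q, K, V = P @ Wq[h], P @ Wk[h], P @ Wv[h]     # equation (3)
        A = softmax(Q @ K.T / np.sqrt(Q.shape[1]))     # (20, 20) attention weights
        heads.append(A @ V)                            # (20, d_f)
    return np.concatenate(heads, axis=1) @ Wo          # equation (4): concat, then W^O


rng = np.random.default_rng(1)
H, d_f = 4, 8
P = rng.normal(size=(20, 2))
Wq, Wk, Wv = (rng.normal(size=(H, 2, d_f)) for _ in range(3))
Wo = rng.normal(size=(H * d_f, 2))
F = multihead(P, Wq, Wk, Wv, Wo)
```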
The output layer uses the softmax activation function, where each $z_i^t$ can be written as
$$ z_i^t = [z_{i,1}^t, \ldots, z_{i,m}^t, \ldots, z_{i,d_f}^t] \in \mathbb{R}^{1 \times d_f} $$
with $m = 1, 2, \ldots, d_f$ indexing the output components. The output probabilities can be represented as:
$$ Y_o^t = [y_{o,1}^t, \ldots, y_{o,m}^t, \ldots, y_{o,d_f}^t]^{\top} = \left[ \frac{e^{z_{i,1}^t}}{\sum_{j=1}^{d_f} e^{z_{i,j}^t}}, \ldots, \frac{e^{z_{i,m}^t}}{\sum_{j=1}^{d_f} e^{z_{i,j}^t}}, \ldots, \frac{e^{z_{i,d_f}^t}}{\sum_{j=1}^{d_f} e^{z_{i,j}^t}} \right]^{\top} \tag{7} $$
In equation (7), each $y_{o,m}^t$ takes a value between 0 and 1 and the values sum to 1, so they can be interpreted as the probabilities that the stock's performance falls in slice $m$ at the next timestamp.
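Equation (7) applied row by row to a batch of logits can be sketched as follows; the logits are made-up numbers. Subtracting each row's maximum before exponentiating does not change the result but avoids overflow.

```python
import numpy as np


def output_probabilities(z):
    """Equation (7): softmax over the d_f logits of each stock.

    z -- array of shape (N, d_f)
    """
    z = z - z.max(axis=1, keepdims=True)   # a constant shift leaves softmax unchanged
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


z = np.array([[2.0, 0.5, -1.0],
              [0.0, 0.0, 0.0]])
p = output_probabilities(z)
```

Each row of `p` lies in (0, 1) and sums to 1; equal logits, as in the second row, give equal probabilities of 1/3.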
3.6 Prediction
The Mean Squared Error loss (MSELoss) is used to quantify the loss. MSE loss is a widely used metric for assessing the discrepancy between a model's predictions and the actual values. It is defined as the average of the squared differences between the predicted and actual values:
$$ \mathrm{MSELoss} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \tag{8} $$
In equation (8), $y_i$ is the actual value of the $i$-th observation, $\hat{y}_i$ is the corresponding model prediction, and $N$ is the total number of input samples. In each epoch, the MSE loss is computed to guide the optimization process, which minimizes this loss over successive iterations. A lower MSE indicates closer alignment between the model's predictions and the actual targets, signifying an improvement in the model's learning and its ability to generalize from the training data.
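A minimal sketch of equation (8) applied to softmax outputs and one-hot targets, with made-up values. Here the mean is taken over every element of the N × d_f arrays, which is what a typical framework loss does (for example, PyTorch's `nn.MSELoss` with its default mean reduction); averaging per observation as written in equation (8) would differ by a factor of d_f.

```python
import numpy as np


def mse_loss(y_true, y_pred):
    """Equation (8), averaged over all N x d_f entries."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)


labels = [[0, 0, 1], [1, 0, 0]]             # one-hot targets
probs = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]  # softmax outputs
loss = mse_loss(labels, probs)
```

The squared errors are 0.14 for the first row and 0.26 for the second, so `loss` is 0.40 / 6 ≈ 0.0667.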
4 Experiments
Building on the methodology in Section 3, this section introduces the experiments, including the data sources, implementation details, trading strategy, and metrics.
4.1 Dataset
The training data of quantformer come from the Chinese stock market. The dataset is collected from the Shanghai Stock Exchange (SHSE) and the Shenzhen Stock Exchange (SZSE) and contains 4,601 stocks that were listed at some point between January 2010 and May 2023, including stocks that have since been delisted. The data come from AKShare1 and Tushare2, which are quantitative finance terminals. The training period runs from January 2010 to December 2019, and the testing period starts in January 2020.
Adjusted closing prices are used as the training prices [48]. Closing price adjustments are essential in stock market analysis, particularly when analyzing historical data for long-term trends and patterns. This price takes into account factors
such as dividends, stock splits, and other corporate actions that can affect a stock’s
price over time [49]. By adjusting for these events, the adjusted closing price provides
a more accurate picture of a stock’s value and performance, which is more appropriate
for financial backtesting.
4.2 Timestamp
Data frequency concerns the number of data points within a specific unit of time,
which may reflect different market characteristics [50]. To better test the performance
of the factor, three frequencies are considered: monthly, weekly, and daily stock data.
In the first set of experiments, one timestamp was set to one month. Thus, for a stock $s_i$ in the stock set $S^t$, each item of its feature sequence contains the accumulated profit and the accumulated turnover rate for one month; both are the sums of the daily values over the trading days in that month. If a stock was newly listed or resumed trading during a month, the available data may not cover all of that month's trading days, but its data for that month are still recorded. However, if the stock did not trade at all for a whole month (for example, because trading was suspended or the stock was not yet listed), that month's data are recorded as "NaN". A sequence $\chi_k^t$ that contains missing values is not used to train the model.

1 https://akshare.akfamily.xyz/
2 https://www.tushare.pro/

Table 1: Details of the experiments
Strategy   Frequency   Output dim   Training samples   Sections   Zero-output
Month 1    Monthly     3            85,490             100        w/o
Month 2    Monthly     3            142,409            100        w/
Month 3    Monthly     5            142,409            100        w/
Week 1     Weekly      3            455,157            466        w/o
Week 2     Weekly      3            758,300            466        w/
Week 3     Weekly      5            758,300            466        w/
Day 1      Daily       3            3,586,435          2,420      w/o
Day 2      Daily       3            5,140,279          2,420      w/
Day 3      Daily       5            5,140,279          2,420      w/
Within the first set of experiments, three sub-experiments were set up. The sub-experiments share the same inputs. For the outputs, either different dimensions or the same dimension with different numbers of samples are used. The values n = 5 and n = 7 are used in these experiments. In the n = 5 group, the first sub-experiment removes the samples whose output is $[0, 0, 0]^{\top}$, and the second uses the original dataset including them (zero-output).
Outliers in accumulated profit and accumulated turnover rate are not removed from the dataset, because such situations, although rare, may happen again in the future and the model is expected to predict them.
In the second and third sets of experiments, the inputs are at weekly and daily frequency, respectively, and the sub-experiments in each group are similar to those of the first. The number of training samples, the number of trained sections (such as 100 months from 2010 to 2019), and the output dimension of each experiment are shown in Table 1.
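The monthly aggregation and the missing-value rule described in this section can be sketched with the Python standard library. The row format `(date, daily_profit_rate, daily_turnover_rate)` and both function names are hypothetical. A month with no trading rows is reported as NaN, and only windows free of NaN are kept for training.

```python
import math
from collections import defaultdict
from datetime import date


def monthly_features(daily_rows, months):
    """Sum daily profit and turnover rates per calendar month.

    daily_rows -- iterable of (date, daily_profit_rate, daily_turnover_rate)
    months     -- ordered list of (year, month) keys to report
    Months without any trading day are reported as (NaN, NaN).
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for day, profit, turnover in daily_rows:
        key = (day.year, day.month)
        sums[key][0] += profit
        sums[key][1] += turnover
    return [tuple(sums[m]) if m in sums else (math.nan, math.nan) for m in months]


def usable_windows(features, length=20):
    """Yield the end index of every window that contains no missing values."""
    for end in range(length - 1, len(features)):
        window = features[end - length + 1 : end + 1]
        if not any(math.isnan(v) for pair in window for v in pair):
            yield end


rows = [(date(2019, 1, 2), 0.01, 0.5),
        (date(2019, 1, 3), -0.02, 0.4),
        (date(2019, 3, 1), 0.03, 0.2)]   # no rows at all in February
feats = monthly_features(rows, [(2019, 1), (2019, 2), (2019, 3)])
```

The same functions apply to weekly data by keying on ISO weeks instead of calendar months.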
Algorithm 1 Trading Strategy
Require: the sequence x_{i,k}^t for each stock s_i in the set S^t, and the amount of cash.
run_monthly(trade, monthday=1, time='open')
function obtain_stockpool(x_{i,k}^t):
    stockpool = sort(model(x_{i,k}^t), key=lambda x: x[1], reverse=True)
    return stockpool
end function
function trade():
    stockpool = obtain_stockpool(x_{i,k}^t)
    for stock in previous_stockpool do
        if stock not in stockpool then
            order(stock, 0)    ▷ sell out
        end if
    end for
    for stock in stockpool do
        if stock not in previous_stockpool then
            order(stock, amount / len(stockpool))    ▷ buy
        end if
    end for
end function
stock set $S^t$ is fed into the model to obtain the list of outputs. The stocks are then ranked by the first element of the output, and the top $1/q$ of the stocks are added to the stock pool. A stock that was already in the pool at the previous timestamp is held; a stock that is in the predicted pool but not in the previous pool is bought with the same proportion of the whole account; and stocks that are not in the predicted pool are sold. The same procedure is repeated in each subsequent period. The backtest starts in January 2020; that is, the model outputs for the sequences from May 2018 to December 2019 form the first stock pool to trade.
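One rebalancing step of the strategy can be sketched in plain Python, independently of any trading platform. The function `rebalance` and its arguments are hypothetical; following the prose description, the ranking uses the first element of each stock's output and keeps the top 1/q of the stocks, and each new stock receives `cash / len(pool)`, mirroring `amount / len(stockpool)` in Algorithm 1. Fees, lot sizes and price limits are ignored.

```python
def rebalance(scores, previous_pool, q, cash):
    """One rebalancing step (simplified, hypothetical sketch).

    scores        -- dict ticker -> model output vector; ranking uses element 0
    previous_pool -- set of tickers held since the last timestamp
    q             -- number of quantiles; the top 1/q of stocks form the new pool
    cash          -- amount available for new purchases
    """
    ranked = sorted(scores, key=lambda ticker: scores[ticker][0], reverse=True)
    pool = set(ranked[: max(1, len(ranked) // q)])
    to_sell = sorted(previous_pool - pool)    # not in the predicted pool: sell out
    to_buy = sorted(pool - previous_pool)     # newly selected: buy
    # Stocks in both pools are simply held; each new stock gets an equal share.
    orders = {ticker: cash / len(pool) for ticker in to_buy}
    return pool, to_sell, orders


scores = {"A": [0.9, 0.1], "B": [0.2, 0.8], "C": [0.7, 0.3],
          "D": [0.1, 0.9], "E": [0.5, 0.5]}
pool, to_sell, orders = rebalance(scores, {"A", "D"}, q=2, cash=1000.0)
```

Here "A" is held, "D" is sold, and 500.0 is allocated to the newly selected "C".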
4.5 Metrics
The Sharpe Ratio (SR) and alpha (α) are used to evaluate the performance of the strategy. The SR [3] is a measure of risk-adjusted return that describes the additional return an investor receives per unit of standard deviation, as shown in equation (9), where $R_p$ is the return of the portfolio and $R_f$ is the risk-free rate. The risk-free rate is taken as the London Interbank Offered Rate (LIBOR), which is calculated by averaging estimates submitted by banks in London; in the backtest, it is set to the average LIBOR over the test period.
$$ SR = \frac{\mathbb{E}[R_p] - R_f}{\operatorname{std}[R_p]} \tag{9} $$
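Equation (9) can be computed from a series of per-period portfolio returns as sketched below. The sketch assumes that the risk-free rate has already been converted to the same period (for example, an annual LIBOR average divided by an assumed number of trading days), uses the sample standard deviation, and applies no annualization; these are our assumptions, not conventions stated in the text.

```python
import statistics


def sharpe_ratio(portfolio_returns, risk_free_rate):
    """Equation (9): (E[R_p] - R_f) / std[R_p], on per-period returns (sketch)."""
    excess = statistics.fmean(portfolio_returns) - risk_free_rate
    return excess / statistics.stdev(portfolio_returns)   # sample standard deviation


sr = sharpe_ratio([0.01, 0.02, -0.01, 0.02], 0.0)
```

For these made-up returns the mean is 0.01 and the sample standard deviation is about 0.0141, so `sr` is about 0.707.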
Alpha represents the excess return of an investment portfolio relative to its benchmark and reflects the stock-selection skill behind the investment. A positive alpha indicates that the portfolio has achieved a higher return than its benchmark after risk adjustment. Specifically, alpha is the excess of the actual portfolio return over its expected theoretical return. Equation (10) shows how alpha is calculated, where $E(R_m)$ is the market return and $\beta$ measures the portfolio's sensitivity to systematic (market) risk.
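Equation (10) itself is not reproduced in this text. The sketch below therefore assumes the standard CAPM (Jensen's) alpha, which matches the description above: beta is estimated as Cov(R_p, R_m) / Var(R_m) from per-period returns, and alpha is the mean portfolio return minus the CAPM-expected return. The function name and inputs are hypothetical.

```python
from statistics import fmean, variance


def capm_alpha(portfolio_returns, market_returns, risk_free_rate):
    """Jensen's alpha, assumed as the form of equation (10).

    beta  = Cov(R_p, R_m) / Var(R_m)
    alpha = mean(R_p) - [R_f + beta * (mean(R_m) - R_f)]
    """
    mean_p, mean_m = fmean(portfolio_returns), fmean(market_returns)
    n = len(market_returns)
    cov = sum((p - mean_p) * (m - mean_m)
              for p, m in zip(portfolio_returns, market_returns)) / (n - 1)
    beta = cov / variance(market_returns)       # sample (n - 1) estimators in both
    expected = risk_free_rate + beta * (mean_m - risk_free_rate)
    return mean_p - expected, beta


market = [0.01, -0.02, 0.03, 0.0]
alpha, beta = capm_alpha([2 * m + 0.001 for m in market], market, 0.0)
```

A portfolio built as twice the market return plus 0.001 per period yields a beta of 2 and an alpha of 0.001 when the risk-free rate is zero.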
Besides SR and alpha, the annual return (AR), the annual excess return (AER), the average turnover rate (TR) of the portfolio, and the win rate (WR) are also reported. Excess return is the return achieved above that of a benchmark; the CSI 300 index is used as the benchmark when calculating excess returns. The turnover rate is the average percentage of the portfolio, by market value, that is adjusted each day. The win rate is the percentage of trading days on which the portfolio has a positive return.
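AR, AER and WR can be sketched from daily return series as below. The text does not give its annualization convention, so the compounding over 252 trading days per year and the definition of AER as the difference of the two annual returns are assumptions.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumption: the annualization convention is not stated


def annual_return(daily_returns):
    """Compound the daily returns and scale the growth to one year."""
    growth = math.prod(1 + r for r in daily_returns)
    return growth ** (TRADING_DAYS_PER_YEAR / len(daily_returns)) - 1


def annual_excess_return(portfolio_daily, benchmark_daily):
    """Annual return above the benchmark (the CSI 300 index in the text)."""
    return annual_return(portfolio_daily) - annual_return(benchmark_daily)


def win_rate(daily_returns):
    """Share of trading days with a strictly positive portfolio return."""
    return sum(r > 0 for r in daily_returns) / len(daily_returns)
```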
Value at risk (VaR) is a method to summarize the total risk in a portfolio [54]. It is calculated as
$$ VaR = \mu + \sigma N^{-1}(X) \tag{11} $$
where $\mu$ is the mean and $\sigma$ is the standard deviation of the portfolio, $X$ is the confidence level, and $N^{-1}(X)$ is the inverse of the cumulative standard normal distribution. To evaluate the portfolio, the 99% VaR is used to estimate the maximum loss over the period at a 99% confidence level.
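Equation (11) is the parametric (normal) VaR. The sketch assumes that μ and σ describe the portfolio's loss, i.e. the negated returns, so that a positive VaR is a loss; the helper names are hypothetical. At 99% confidence, $N^{-1}(0.99) \approx 2.326$.

```python
from statistics import NormalDist, fmean, stdev


def parametric_var(mu, sigma, confidence=0.99):
    """Equation (11): VaR = mu + sigma * N^{-1}(X), with mu and sigma of the loss."""
    return mu + sigma * NormalDist().inv_cdf(confidence)


def var_from_returns(returns, confidence=0.99):
    """Apply equation (11) to losses, i.e. negated portfolio returns (an assumption)."""
    losses = [-r for r in returns]
    return parametric_var(fmean(losses), stdev(losses), confidence)
```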
Table 2: Results of the experiments
Strategy   AR       AER      TR       WR      SR       Alpha    VaR
Month 1    17.35%   19.43%   26.09%   57.8%   0.915    0.162    2.81%
Month 2    9.91%    13.86%   51.69%   49.3%   0.289    0.102    3.61%
Month 3    7.37%    9.91%    32.33%   51.6%   0.246    0.064    2.3%
Week 1     -0.83%   1.31%    7.13%    46.4%   -0.236   -0.030   3.05%
Week 2     7.49%    10.81%   1.18%    49.2%   0.160    0.085    3.73%
Week 3     12.3%    12.73%   1.39%    54.4%   0.372    0.116    3.77%
Day 1      7.89%    11.4%    6.71%    43.1%   0.181    0.090    3.92%
Day 2      10.23%   10.94%   6.51%    44.4%   0.279    0.097    3.91%
Day 3      9.81%    10.03%   5.57%    44.6%   0.281    0.092    4.02%
CSI300     1.77%    \        \        \       -0.015   \        3.19%
In the weekly strategy results, Week 3 achieved an annual return of 12.3% and an annual excess return of 12.73%, while Week 1 had a negative annual return. Among the daily strategies, Day 2 had an annual return of 10.23% and an annual excess return of 10.94%. The turnover rate varied less across the daily strategies than across the monthly or weekly ones, indicating a more consistent trading frequency. The win rate of the daily strategies remained fairly stable at around 43% to 45%. Among the weekly and daily strategies, the Sharpe Ratio and alpha are positive except for Week 1, which has negative values for both, reflecting the fluctuating efficacy of shorter-term trading.
To demonstrate the advantages of the quantformer factor over traditional factors, 100 price-volume factors from JoinQuant3 (one of China's largest quantitative finance platforms) were backtested under the same trading strategy (Algorithm 1). Price-volume factors were chosen because they are calculated from stocks' prices, volumes, and turnover rates, which resemble the training data of the quantformer factor (stock turnover rates and price data). The detailed performance of the factors and the benchmark is shown in Appendix A. On average, these factors achieved an annual return of -3.78%, an average excess return of -2.15%, and a Sharpe ratio of -0.36, while the benchmark return was -1.77%. The average maximum drawdown of these strategies was 44.88%; the maximum drawdown is the largest peak-to-trough decline in portfolio value and therefore indicates the worst-case loss experienced during the period. The average volatility of the strategies (σp) was 0.26, compared with 0.197 for the benchmark.
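The maximum drawdown quoted above can be computed from a portfolio's net-value curve by tracking the running peak. The sketch below uses a made-up curve; the function name is hypothetical.

```python
def max_drawdown(net_values):
    """Largest peak-to-trough decline of a net-value curve, as a fraction of the peak."""
    peak = net_values[0]
    worst = 0.0
    for value in net_values:
        peak = max(peak, value)                      # running maximum so far
        worst = max(worst, (peak - value) / peak)    # drawdown from that peak
    return worst


mdd = max_drawdown([1.0, 1.2, 0.9, 1.1, 0.6, 1.3])
```

The deepest fall in this example is from the peak of 1.2 to the trough of 0.6, a drawdown of 50%, even though the curve later recovers to a new high.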
Among these factor-based strategies, the quantformer factor performed well. Except for the win rate, which was slightly lower than that of a few strategies, it ranked best among the 101 factors in annual return, annual excess return, σp, Sharpe Ratio, and Sortino ratio. The 99% VaR of the quantformer factor ranks within the best 10% of all factors. Figure 5 illustrates the return curves of the quantformer factor (QF Month 1) alongside the benchmark and some traditional factors. "25 week rank" represents the current price's position over the past 25 weeks; "EMAC20" and "MAC20" are the stock's 20-day exponential moving average and 20-day moving average, respectively; "DAVOL20" is the ratio of the 20-day average turnover rate to the 120-day average turnover rate, and "VOL20" is the stock's average turnover rate over 20 days; "ROC20" is the price rate of change over 20 days; "MAWVAD" is the difference between the closing and opening prices, divided by the range between the highest and lowest prices and multiplied by the volume, accumulated over six days; and "Variance20" is the variance of the stock's 20-day annualized returns. Most of these factors use a window of 20 timestamps, matching the 20 timestamps on which the quantformer factor was trained. Compared with these factors, the quantformer factor (blue line) performed significantly better in terms of returns than the benchmark (orange line) and the other factor strategies, indicating an improvement over traditional price-volume factors.

3 https://www.joinquant.com/

Figure 5: Backtest results based on different factors
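Two of the comparison factors can be sketched directly from the descriptions above. These are illustrative implementations of the textual definitions, not JoinQuant's own code: ROC20 is expressed in percent (an assumption), and days on which the high equals the low are skipped in MAWVAD to avoid division by zero (also an assumption).

```python
def roc(closes, window=20):
    """Price rate of change over `window` periods, in percent (assumed scaling)."""
    return (closes[-1] / closes[-1 - window] - 1) * 100


def mawvad(opens, highs, lows, closes, volumes, days=6):
    """Sum over the last `days` days of (close - open) / (high - low) * volume."""
    total = 0.0
    bars = list(zip(opens, highs, lows, closes, volumes))[-days:]
    for open_, high, low, close, volume in bars:
        if high > low:                   # skip days with no price range (assumption)
            total += (close - open_) / (high - low) * volume
    return total
```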
Lastly, Table 3 shows the result of the backtest. From March 2020 to April 2023,
the strategy’s total return decreased to 56.41%, with an annual return of 15.63%
and an excess return of 52.96%. The Sharpe Ratio also declined to 0.739 but still indicates reasonably efficient returns relative to their volatility. The alpha fell to 0.135, which, although lower than in the other periods, still indicates that the strategy delivered returns above those predicted by its risk profile.
Comparing the results of the monthly, weekly, and daily strategies provides further insights. The monthly strategies can deliver the highest returns, suggesting that longer-term signals capture market trends more effectively. The weekly strategies underperform the monthly ones but outperform the daily frequency overall. This indicates that weekly data strike a balance between filtering noise and responding swiftly to emerging patterns.
The daily strategies deliver lower annual returns and higher risk than the others. This highlights the challenge of trading profitably at higher frequencies: noise and volatility make reliable signals harder to obtain. The distribution of returns also shows increased dispersion at the daily level. However, the daily strategies achieve more consistent turnover rates of around 5% to 7%, which allows portfolio adjustments while avoiding excessive trading costs. In contrast, the monthly strategies have turnover rates above 50% in some cases. There is likely an optimal rebalancing frequency between one week and one month that best matches market regularities.
Table 4: Results of factors under different scales
Strategy   AR       AER      σp      WR      SR       Alpha   VaR
QF 20%     17.35%   19.43%   0.162   57.8%   0.915    0.162   2.81%
QF 10%     13.12%   17.73%   0.159   61.8%   0.574    0.128   2.023%
QF 5%      12.59%   16.02%   0.136   61%     0.63     0.117   2.015%
QF 1%      24.71%   35.74%   0.214   53.3%   0.967    0.249   3.048%
CSI300     1.77%    \        0.197   \       -0.015   \       3.19%
6 Conclusion
This study proposed a new model, quantformer, for quantitative stock prediction and
trading. We addressed the need for handling numerical input data rather than text and
adapting the model for forecasting tasks rather than sequence-to-sequence problems
common in NLP. To enable direct processing of numerical time series data, we replaced
the word embedding layer with a standard linear layer and removed the output mask-
ing operations. We also simplified the decoder to produce a probability distribution
over future price movements rather than autoregressively generating token sequences.
Finally, we tested the factor in realistic backtests on the financial market and compared it with 100 other price-volume factors. Our experimental results demonstrate the promise of this approach. The model-based trading strategies were able to deliver substantial excess returns over the benchmark over a longer time period, with Sharpe Ratios indicating sound risk-adjusted performance. The positive alphas affirm the strategy's ability to outperform expected returns after accounting for risk factors.
There is further scope to enhance the framework, for instance by incorporating additional signals such as news and fundamentals as auxiliary inputs. The self-attention mechanism could likely be improved to encode more temporal context. Alternatively, transformer-based models such as GPT-4 [56] and Claude 3 [57] could be used to build the strategy, as such models are fine-tuned and may perform better on quantitative finance tasks.
Overall, our work illustrates the viability of quantformer for financial data modeling. It provides a flexible framework for modeling the market from multiple perspectives of financial time series and for developing profitable trading strategies that consider different variables. With further research to address the above limitations, we believe such ML-based quantitative methods hold rich potential as a foundation for better understanding markets and developing profitable trading strategies.
The implementation code of quantformer is available at https://github.com/zhangmordred/QuantFormer.
UIC research grant (R0400001-22; UICR0400008-21; R72021114); Guangdong College
Enhancement and Innovation Program (2021ZDZX1046).
The authors have no competing interests to declare that are relevant to the content
of this article.
References
[1] Wang, H., Wang, T., Li, S., Zheng, J., Guan, S., Chen, W.: Adaptive long-short
pattern transformer for stock investment selection. In: Proceedings of the Thirty-
First International Joint Conference on Artificial Intelligence, pp. 3970–3977
(2022)
[2] Markowitz, H.: Portfolio selection. Journal of Finance 7(1), 77–91 (1952)
[3] Sharpe, W.: Capital asset prices: a theory of market equilibrium under conditions
of risk. Journal of Finance 19, 425–442 (1964)
[4] Fama, E.F., French, K.R.: Common risk factors in the returns on stocks and
bonds. Journal of Financial Economics 33(1), 3–56 (1993)
[5] Fama, E.F., French, K.R.: A five-factor asset pricing model. Journal of Financial
Economics 116(1), 1–22 (2015)
[6] Shi, Y., Kong, L., Yang, L., Li, D., Cui, X.: Dynamic mean-variance portfolio
selection under factor models. Journal of Economic Dynamics and Control 167,
104923 (2024)
[7] Nayak, R.K., Mishra, D., Rath, A.K.: A naïve SVM-KNN based stock market
trend reversal analysis for Indian benchmark indices. Applied Soft Computing
35, 670–680 (2015)
[8] Feng, F., He, X., Wang, X., Luo, C., Liu, Y., Chua, T.-S.: Temporal relational
ranking for stock prediction. ACM Transactions on Information Systems (TOIS)
37(2), 1–30 (2019)
[9] Gorishniy, Y., Rubachev, I., Babenko, A.: On embeddings for numerical features
in tabular deep learning. Advances in Neural Information Processing Systems 35,
24991–25004 (2022)
[10] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N.,
Kaiser, L., Polosukhin, I.: Attention is all you need. In: Proceedings of the 31st
International Conference on Neural Information Processing Systems. NIPS’17,
pp. 6000–6010 (2017)
[11] Asness, C.S.: The power of past stock returns to explain future stock returns.
SSRN:2865769 (1995)
[12] Chen, Y., Zhao, H., Li, Z., Lu, J.: A dynamic analysis of the relationship between
investor sentiment and stock market realized volatility: evidence from China.
PLOS One 15(12), e0243080 (2020)
[13] PH, H., Rishad, A.: An empirical examination of investor sentiment and stock
market volatility: evidence from India. Financial Innovation 6(1), 1–15 (2020)
[14] Naseem, S., Mohsin, M., Hui, W., Liyan, G., Penglai, K.: The investor psychology
and stock market behavior during the initial era of COVID-19: a study of China,
Japan, and the United States. Frontiers in Psychology 12, 626934 (2021)
[15] Ding, W., Mazouz, K., Ap Gwilym, O., Wang, Q.: Technical analysis as a
sentiment barometer and the cross-section of stock returns. Quantitative Finance
23(11), 1617–1636 (2023)
[16] Kim, K.-j.: Financial time series forecasting using support vector machines.
Neurocomputing 55(1-2), 307–319 (2003)
[17] Huang, W., Nakamori, Y., Wang, S.-Y.: Forecasting stock market movement
direction with support vector machine. Computers and Operations Research
32(10), 2513–2522 (2005)
[18] Cavalcante, R.C., Brasileiro, R.C., Souza, V.L., Nobrega, J.P., Oliveira, A.L.:
Computational intelligence and financial markets: a survey and future directions.
Expert Systems with Applications 55, 194–211 (2016)
[19] Huck, N.: Pairs selection and outranking: an application to the S&P 100 index.
European Journal of Operational Research 196(2), 819–825 (2009)
[20] Kercheval, A.N., Zhang, Y.: Modelling high-frequency limit order book dynamics
with support vector machines. Quantitative Finance 15(8), 1315–1329 (2015)
[21] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation
9(8), 1735–1780 (1997)
[22] Orsel, O.E., Yamada, S.S.: Comparative study of machine learning models for
stock price prediction. arXiv:2202.03156 (2022)
[23] Bao, W., Yue, J., Rao, Y.: A deep learning framework for financial time series
using stacked autoencoders and long-short term memory. PLOS One 12(7),
e0180944 (2017)
[24] Fischer, T., Krauss, C.: Deep learning with long short-term memory networks for
financial market predictions. European Journal of Operational Research 270(2),
654–669 (2018)
[25] Sezer, O.B., Ozbayoglu, A.M.: Algorithmic financial trading with deep
convolutional neural networks: time series to image conversion approach. Applied
Soft Computing 70, 525–538 (2018)
[26] Zhang, L., Aggarwal, C., Qi, G.-J.: Stock price prediction via discovering multi-
frequency trading patterns. In: Proceedings of the 23rd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp. 2141–2149
(2017)
[27] Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated
recurrent neural networks on sequence modeling. In: NIPS 2014 Workshop on Deep
Learning (2014)
[28] Fischer, T., Krauss, C.: Deep learning with long short-term memory networks for
financial market predictions. European Journal of Operational Research 270(2),
654–669 (2018)
[29] Pak, U., Kim, C., Ryu, U., Sok, K., Pak, S.: A hybrid model based on
convolutional neural networks and long short-term memory for ozone concentration
prediction. Air Quality, Atmosphere and Health 11, 883–895 (2018)
[31] Sun, C., Wu, Q., Yan, X.: Dynamic CVaR portfolio construction with attention-
powered generative factor learning. Journal of Economic Dynamics and Control
160, 104821 (2024)
[32] Zhu, S., Fukushima, M.: Worst-case conditional value-at-risk with application to
robust portfolio management. Operations Research 57(5), 1155–1168 (2009)
[33] Ban, G.-Y., El Karoui, N., Lim, A.E.: Machine learning and portfolio
optimization. Management Science 64(3), 1136–1154 (2018)
[34] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.:
Language models are unsupervised multitask learners.
https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf (2019)
[35] Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep
bidirectional transformers for language understanding. arXiv:1810.04805 (2018)
[36] Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P.,
Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A.,
Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J.,
Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark,
J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language
models are few-shot learners. In: Proceedings of the 34th International Conference
on Neural Information Processing Systems. NIPS’20 (2020)
[37] Araci, D.: FinBERT: financial sentiment analysis with pre-trained language
models. arXiv:1908.10063 (2019)
[38] Yang, Y., Uy, M., Huang, A.: FinBERT: a pretrained language model for financial
communications. arXiv:2006.08097 (2020)
[39] Wu, S., Irsoy, O., Lu, S., Dabravolski, V., Dredze, M., Gehrmann, S., Kambadur,
P., Rosenberg, D., Mann, G.: BloombergGPT: a large language model for finance.
arXiv:2303.17564 (2023)
[40] Wang, C., Chen, Y., Zhang, S., Zhang, Q.: Stock market index prediction using
deep Transformer model. Expert Systems with Applications 208, 118128 (2022)
[41] Ding, Q., Wu, S., Sun, H., Guo, J., Guo, J.: Hierarchical multi-scale Gaussian
transformer for stock movement prediction. In: IJCAI, pp. 4640–4646 (2020)
[42] Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., Zhang, W.: Informer:
beyond efficient transformer for long sequence time-series forecasting. In:
Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11106–11115
(2021)
[43] Zeng, Z., Kaur, R., Siddagangappa, S., Rahimi, S., Balch, T., Veloso, M.:
Financial time series forecasting using CNN and transformer. arXiv:2304.04912
(2023)
[44] Kim, S., Yun, S.-B., Bae, H.-O., Lee, M., Hong, Y.: Physics-informed
convolutional transformer for predicting volatility surface. Quantitative Finance 24(2),
203–220 (2024)
[45] Xu, K., Zhang, Y., Ye, D., Zhao, P., Tan, M.: Relation-aware transformer for
portfolio policy learning. In: Proceedings of the Twenty-Ninth International
Conference on International Joint Conferences on Artificial Intelligence, pp.
4647–4653 (2021)
[46] Wu, C., Wu, F., Huang, Y.: DA-transformer: distance-aware transformer.
arXiv:2010.06925 (2020)
[47] Klambauer, G., Unterthiner, T., Mayr, A., Hochreiter, S.: Self-normalizing neural
networks. Advances in Neural Information Processing Systems 30 (2017)
[48] Diamond, P.A.: A model of price adjustment. Journal of Economic Theory 3(2),
156–168 (1971)
[49] Wei, J., Xu, Q., He, C.: Deep learning of predicting closing price through historical
adjustment closing price. Procedia Computer Science 202, 379–384 (2022)
[50] De Prado, M.L.: Advances in Financial Machine Learning. John Wiley & Sons,
New York (2018)
[51] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen,
T., Lin, Z., Gimelshein, N., Antiga, L., et al.: PyTorch: an imperative style, high-
performance deep learning library. Advances in Neural Information Processing
Systems 32 (2019)
[52] Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization.
Journal of Machine Learning Research 13(2) (2012)
[53] Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Bengio,
Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations,
ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings
(2015)
[54] Hull, J.: Risk Management and Financial Institutions vol. 733. John Wiley &
Sons, Hoboken (2012)
[55] Olorunnimbe, K., Viktor, H.: Deep learning in the stock market—a systematic
survey of practice, backtesting, and applications. Artificial Intelligence Review
56(3), 2057–2109 (2023)
[57] Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y.,
Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al.: Llama 2: open foundation
and fine-tuned chat models. arXiv:2307.09288 (2023)
[58] Bacon, C.R.: Practical Portfolio Performance Measurement and Attribution. John
Wiley & Sons, Chennai (2023)
Appendix A Detailed backtest results between 100 factors and the quantformer factor
All factor data come from JoinQuant⁴, one of the most widely used quantitative finance
platforms. Detailed descriptions of the following factors are available at JoinQuant
Factors.
Ten metrics are used to evaluate the performance of each factor:
• Annual Return (AR) and Annual Excess Return (AER) directly reflect the return of the
portfolio. Excess return is the return achieved above that of a benchmark; the CSI300
index is used as the benchmark when calculating excess returns.
• Win Rate (WR) is the fraction of actions (trades) that make a profit out of all
actions.
• Sharpe Ratio (SR) is a measure of risk-adjusted return: the excess return an investor
receives per unit of standard deviation of the portfolio returns, as given in equation (9).
• Alpha measures the ability of a portfolio to generate returns above the market
benchmark. A positive alpha indicates that the portfolio has achieved a higher return
than its benchmark after risk adjustment, as given in equation (10).
• Beta measures the exposure of the portfolio to systematic (market) risk, reflecting
the sensitivity of the strategy to changes in the market. For the daily return of the
strategy, $D_p$, and the daily return of the benchmark, $D_m$, beta is
$$\beta_p = \frac{\mathrm{Cov}(D_p, D_m)}{\mathrm{Var}(D_m)}$$
• Max Drawdown (MD) is the worst-case loss from a previous peak, i.e. the largest
peak-to-trough decline in portfolio value over the backtest period. For the peak value
of the portfolio, $V_{\mathrm{peak}}$, and the subsequent trough value, $V_{\mathrm{trough}}$,
MD is calculated as
$$\mathrm{MD} = \frac{V_{\mathrm{peak}} - V_{\mathrm{trough}}}{V_{\mathrm{peak}}}$$
• Portfolio volatility ($\sigma_p$) is the standard deviation of the portfolio returns.
Because deviations from the mean are squared, $\sigma_p$ avoids the problem of negative
deviations canceling positive ones; it is a kind of weighted average deviation in which
large deviations carry more weight [58].
⁴ https://www.joinquant.com/
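• Value at Risk (VaR) is the variance-covariance (parametric) VaR at the 99% confidence
level: under the normality assumption of this method, the portfolio loss is not expected
to exceed the VaR with 99% probability.
• Sortino Ratio (STN) distinguishes harmful volatility from total volatility by using the
standard deviation of negative portfolio returns, so it measures the performance of the
investment relative to downside risk.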
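The metric definitions above can be made concrete with a short sketch over a series of daily returns. It is a minimal illustration, not the procedure used to produce the table: it assumes 252 trading days per year, a risk-free rate that defaults to zero, Jensen's (CAPM) form for Alpha in place of equation (10), which is not reproduced here, and a zero-target downside deviation for the Sortino ratio. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumption: trading days per year used to annualize


def annual_return(daily):
    """AR: geometric annualized return from a list of daily simple returns."""
    growth = math.prod(1.0 + r for r in daily)
    return growth ** (TRADING_DAYS / len(daily)) - 1.0


def win_rate(trade_pnls):
    """WR: fraction of actions (trades) with a strictly positive profit."""
    return sum(p > 0 for p in trade_pnls) / len(trade_pnls)


def volatility(daily):
    """sigma_p: annualized sample standard deviation of daily returns."""
    return stdev(daily) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(daily, rf_annual=0.0):
    """SR: annualized mean daily excess return divided by its standard deviation."""
    excess = [r - rf_annual / TRADING_DAYS for r in daily]
    return mean(excess) / stdev(excess) * math.sqrt(TRADING_DAYS)


def sortino_ratio(daily, rf_annual=0.0):
    """STN: like SR, but divides by the downside deviation (zero target),
    i.e. the root mean square of the negative excess returns over all periods."""
    excess = [r - rf_annual / TRADING_DAYS for r in daily]
    downside = math.sqrt(sum(min(x, 0.0) ** 2 for x in excess) / len(excess))
    return mean(excess) / downside * math.sqrt(TRADING_DAYS)


def beta(strategy, benchmark):
    """Beta: Cov(D_p, D_m) / Var(D_m); the 1/(n-1) factors cancel."""
    mp, mm = mean(strategy), mean(benchmark)
    cov = sum((p - mp) * (m - mm) for p, m in zip(strategy, benchmark))
    var = sum((m - mm) ** 2 for m in benchmark)
    return cov / var


def jensen_alpha(strategy, benchmark, rf_annual=0.0):
    """Alpha in Jensen's (CAPM) form, computed from annualized returns."""
    rp, rm = annual_return(strategy), annual_return(benchmark)
    return rp - (rf_annual + beta(strategy, benchmark) * (rm - rf_annual))


def max_drawdown(daily):
    """MD: largest (V_peak - V_trough) / V_peak along the compounded equity curve."""
    value = peak = 1.0
    worst = 0.0
    for r in daily:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical returns: +10%, then -20%, then +5%; the drawdown is measured
# from the 1.10 peak to the 0.88 trough.
print(round(max_drawdown([0.10, -0.20, 0.05]), 4))  # 0.2
```

Computing SR and STN from the same excess-return series makes the difference between the two explicit: only the denominator changes, so a strategy whose volatility is mostly on the upside scores noticeably higher on STN than on SR.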
Factor AR (%) AER (%) WR (%) SR Alpha Beta MD (%) σp VaR (%) STN
ARBR -1.47 0.32 54.7 -0.251 -0.002 0.914 34.46 0.221 3.74 -0.375
AR -5.87 -4.37 53.7 -0.406 -0.048 0.981 47.84 0.260 3.94 0.612
ATR14 -2.12 -0.37 62.7 -0.250 0.006 1.160 46.55 0.250 4.51 -0.35
ATR6 -3.94 -2.31 61.2 -0.328 -0.014 1.175 50.11 0.253 4.48 -0.455
BBIC 1.17 3.14 50.9 -0.109 0.033 1.038 32.64 0.258 4.16 -0.155
BR -6.91 -5.47 53.6 -0.467 -0.064 0.934 51.72 0.254 3.80 -0.718
CCI10 -18.65 -17.99 43.3 -1.402 -0.261 0.812 72.04 0.220 2.79 -2.026
CCI15 -17.70 -16.98 44.2 -1.293 -0.240 0.832 71.43 0.223 2.90 -1.866
CCI20 -15.52 -14.66 46.3 -1.051 -0.196 0.868 69.12 0.235 3.28 -1.519
CR20 -4.27 -2.67 54.8 -0.329 -0.032 0.927 49.92 0.263 4.26 -0.511
DAVOL10 -7.44 -6.04 54.1 -0.504 -0.071 0.920 45.45 0.248 3.38 -0.812
DAVOL20 -10.95 -9.78 54.4 -0.702 -0.119 0.919 48.96 0.246 2.86 -1.079
DAVOL5 -9.54 -8.28 51.8 -0.612 -0.097 0.950 46.41 0.250 3.17 -1.019
EMA5 2.77 4.85 48.3 -0.047 0.049 1.041 30.70 0.259 4.66 -0.068
EMAC10 0.27 2.17 50.6 -0.144 0.024 1.037 31.42 0.258 4.19 -0.203
EMAC12 1.58 3.57 50.6 -0.093 0.037 1.032 31.76 0.258 4.17 -0.131
EMAC20 6.01 8.29 54.1 0.069 0.079 1.044 32.43 0.260 4.32 0.102
EMAC26 3.83 5.97 54.1 -0.008 0.059 1.046 34.87 0.261 4.29 -0.012
EMAC120 1.74 3.75 48.3 -0.088 0.034 0.960 35.81 0.254 3.77 -0.145
Kurtosis20 -2.96 -1.27 52.5 -0.395 -0.027 0.772 27.71 0.182 2.74 -0.6
Kurtosis60 -0.28 1.59 53.9 -0.240 -0.001 0.716 22.07 0.179 2.64 -0.367
Kurtosis120 -4.12 -2.51 49.0 -0.468 -0.043 0.708 28.28 0.182 2.44 -0.684
MAC5 0.54 2.46 46.9 -0.134 0.026 1.035 30.03 0.257 4.29 -0.195
MAC10 -1.20 0.61 49.6 -0.206 0.008 1.039 34.15 0.256 4.19 -0.286
MAC20 2.47 4.52 53.5 -0.059 0.046 1.043 34.58 0.257 4.09 -0.083
MAC60 11.10 13.72 53.8 0.243 0.120 0.987 28.61 0.254 4.34 0.399
MAC120 3.11 5.21 47.7 -0.035 0.049 0.987 35.40 0.259 3.99 -0.058
MACDC -11.12 -9.97 52.4 -0.674 -0.117 1.003 46.48 0.261 3.54 -1.046
MASS -10.55 -9.35 51.0 -0.673 -0.115 0.886 48.34 0.249 6.19 -1.032
MAWVAD 7.29 9.67 55.9 0.119 0.082 0.891 36.54 0.248 4.19 0.184
MFI14 -6.21 -4.73 51.5 -0.456 -0.056 0.918 45.65 0.241 3.61 -0.699
PLRC6 -14.83 -13.92 47.8 -0.905 -0.178 0.960 66.24 0.259 3.85 -1.368
PLRC12 -7.28 -5.87 53.3 -0.452 -0.064 1.012 52.30 0.273 4.20 -0.682
PLRC24 -5.77 -4.26 58.1 -0.361 -0.044 1.034 53.28 0.289 4.88 -0.549
PSY -9.11 -7.83 50.6 -0.660 -0.096 0.880 49.79 0.223 3.36 -0.992
Price1M -13.55 -12.55 53.0 -0.760 -0.156 0.977 64.85 0.280 4.21 -1.145
Price3M -6.69 -5.25 58.4 -0.380 -0.056 1.023 59.14 0.305 5.44 -0.555
Price1Y 1.55 3.54 66.8 -0.075 0.041 1.116 60.25 0.326 7.28 -0.105
ROC6 -16.13 -15.30 47.4 -0.982 -0.200 0.978 68.14 0.262 3.56 -1.741
ROC20 -11.63 -10.51 54.7 -0.628 -0.123 1.014 60.28 0.291 4.51 -0.971
ROC60 -6.86 -5.43 59.6 -0.382 -0.056 1.062 62.16 0.309 5.28 -0.529
ROC120 2.36 4.41 66.9 -0.051 0.049 1.117 56.36 0.319 6.79 -0.071
Skewness20 -5.91 -4.41 50.8 -0.538 -0.058 0.820 40.65 0.197 3.04 0.839
Skewness60 -6.47 -5.01 51.9 -0.595 -0.068 0.765 38.02 0.190 2.58 -0.917
Skewness120 -4.37 -2.77 51.1 -0.459 0.044 0.743 36.01 0.192 2.69 -0.684
TRIX5 -10.72 -9.54 52.5 -0.607 -0.110 1.009 56.07 0.280 4.16 -0.923
TRIX10 -3.51 -1.85 58.4 -0.268 -0.019 1.008 52.92 0.292 5.14 -0.4
TVMA20 -1.92 -0.16 64.9 -0.209 0.014 1.268 44.66 0.288 4.63 -0.31
TVMA6 -0.75 1.09 62.4 -0.166 0.026 1.259 39.46 0.288 4.76 -0.246
TVSTD20 -1.83 -0.06 58.5 -0.214 0.012 1.218 35.89 0.277 4.03 -0.318
TVSTD6 -3.77 -2.13 53.4 -0.297 -0.010 1.202 46.57 0.272 4.40 -0.449
VDEA -3.05 -1.37 57.6 -0.305 -0.023 0.857 37.01 0.239 3.06 -0.484
VDIFF -7.93 -6.57 0.2 -0.533 -0.079 0.895 39.79 0.247 3.08 -0.842
VEMA5 -0.41 1.46 57.9 -0.186 0.006 0.850 37.31 0.237 3.33 -0.277
VEMA10 0.95 2.91 58.9 -0.128 0.019 0.844 35.51 0.236 3.36 -0.19
VEMA12 1.01 2.96 59.3 -0.126 0.020 0.840 35.17 0.235 3.35 -0.186
VEMA26 2.41 4.46 59.3 -0.068 0.033 0.838 31.03 0.231 3.37 -0.1
VMACD -1.03 0.79 50.5 -0.216 0.001 0.875 33.96 0.235 3.38 -0.345
VOL5 -8.12 -6.76 55.9 -0.452 -0.066 1.155 53.86 0.296 4.48 -0.656
VOL10 -7.07 -5.64 58.3 -0.404 -0.053 1.145 50.65 0.298 4.35 -0.59
VOL20 -5.43 -3.89 58.6 -0.337 -0.032 1.156 50.19 0.298 4.33 -0.481
VOL60 0.42 2.34 58.1 -0.125 0.032 1.145 38.35 0.286 4.71 -0.175
VOL120 -1.88 -0.12 53.0 -0.213 0.007 1.136 42.09 0.281 4.64 -0.3
VOL240 -5.51 -3.98 47.0 -0.382 -0.038 1.075 43.53 0.265 3.99 -0.541
VOSC -6.35 -4.88 51.2 -0.453 -0.056 0.944 40.40 0.246 3.44 -0.739
VROC6 -16.30 -15.49 42.7 -1.179 -0.208 0.897 69.03 0.221 2.91 -1.778
VROC12 -11.47 -10.34 46.7 -0.793 -0.127 0.915 51.62 0.228 2.74 -1.241
VR -8.43 -7.10 52.8 -0.566 -0.083 0.932 48.57 0.244 3.72 -0.834
VSTD10 -0.77 1.07 56.3 -0.205 0.001 0.843 37.75 0.234 3.19 -0.309
VSTD20 -1.60 0.19 53.7 -0.245 -0.007 0.844 37.31 0.232 3.01 -0.366
Variance20 -1.94 -0.18 59.9 -0.201 0.005 1.124 46.38 0.302 4.84 -0.301
Variance60 9.96 12.51 65.2 0.168 0.122 1.178 30.63 0.313 5.91 0.241
Variance120 5.40 7.65 60.8 0.040 0.083 1.193 39.54 0.312 5.93 0.058
Volume1M -15.54 -14.68 20.7 -0.867 -0.189 0.991 63.68 0.285 3.61 -1.339
WVAD -0.71 1.13 52.3 -0.199 0.002 0.836 42.38 0.238 3.67 -0.317
arron down 25 -7.35 -5.94 48.3 -0.593 -0.072 0.880 42.63 0.209 2.84 -0.826
arron up 25 -15.51 -14.65 46.4 -1.049 -0.195 0.870 66.73 0.235 3.19 -1.604
bear power -13.31 -12.30 50.8 -0.800 -0.154 0.934 59.58 0.261 3.61 -1.182
beta 10.30 12.87 62.4 0.168 0.134 1.344 35.96 0.328 6.76 0.252
book to price 1.87 3.86 58.5 -0.117 0.031 0.889 26.73 0.181 2.91 -0.162
bull power -14.81 -13.91 51.1 -0.844 -0.176 0.984 65.99 0.278 3.83 -1.285
earnings yield -2.41 -0.68 52.6 -0.348 -0.029 0.627 25.87 0.188 2.55 -0.55
growth 2.90 4.99 59.9 -0.045 0.052 1.068 37.35 0.244 4.50 -0.067
leverage 2.05 4.07 53.1 -0.089 0.023 0.719 22.57 0.217 3.22 -0.144
liquidity -2.71 -1.00 56.7 -0.243 -0.002 1.137 41.28 0.284 4.56 -0.35
momentum 10.93 13.54 65.8 0.195 0.129 1.159 43.19 0.310 6.82 0.271
money flow 20 -1.72 0.05 65.5 -0.202 0.016 1.269 44.10 0.288 4.63 -0.298
price no fq 6.83 9.17 76.1 0.095 0.092 1.132 44.86 0.267 6.20 0.131
pull-up 2.19 4.23 54.5 -0.068 0.044 1.062 33.09 0.265 3.92 -0.093
pull-down -2.28 -0.54 47.6 -0.317 -0.015 0.831 34.46 0.203 3.87 -0.439
residual volatility 4.13 6.30 62.3 0.003 0.062 1.048 41.34 0.269 5.25 0.004
sharpe ratio 20 -11.88 -10.78 54.0 -0.663 -0.129 0.979 57.66 0.282 4.09 -1.019
sharpe ratio 60 -7.42 -6.02 58.8 -0.417 -0.065 1.029 65.19 0.300 5.17 -0.583
sharpe ratio 120 -0.32 1.55 65.7 -0.143 0.019 1.062 60.52 0.303 6.10 -0.2
size -6.36 -4.89 51.8 -0.571 -0.058 0.906 36.00 0.196 2.64 -0.82
turnover volatility -10.80 -9.62 51.8 -0.624 -0.108 1.075 57.05 0.274 3.70 -0.905
1day VPT -16.45 -15.65 44.8 -1.062 -0.211 0.904 70.43 0.248 3.09 -1.553
1day VPT 6 -6.36 -4.89 53.0 -0.416 -0.053 0.995 47.06 0.269 3.81 -0.82
1day VPT 12 -0.07 1.82 57.9 -0.146 0.020 1.033 46.51 0.278 4.39 -0.22
25week close -11.51 -10.39 51.0 -0.859 -0.133 0.817 47.79 0.211 2.44 -0.684
QF Month 1 17.35 19.43 57.8 0.915 0.162 0.588 18.35 0.161 2.81 0.946
QF Month 2 9.91 13.53 49.3 0.289 0.102 0.684 26.77 0.205 3.61 0.372
QF Month 3 7.37 10.4 51.6 0.246 0.064 0.483 17.33 0.137 2.3 0.309
QF Week 1 -0.83 1.41 46.4 -0.236 -0.003 0.726 29.31 0.204 3.05 -0.291
QF Week 2 7.49 10.55 49.2 0.16 0.085 0.792 29.08 0.219 3.73 0.206
QF Week 3 12.3 12.4 54.4 0.372 0.116 0.793 29.86 0.223 3.77 0.48
QF Day 1 7.89 11.4 43.1 0.181 0.09 0.834 29.15 0.196 3.92 0.245
QF Day 2 10.23 10.68 44.4 0.279 0.097 0.819 30.02 0.223 3.91 0.361
QF Day 3 9.81 10.03 44.6 0.284 0.098 0.801 30.1 0.223 1.02 0.367
QF 10% 13.12 17.73 61.8 0.754 0.128 0.594 18.32 0.159 2.02 0.714
QF 5% 12.59 16.02 61 0.63 0.117 0.504 13.3 0.136 2.01 0.796
QF 1% 24.71 35.34 53.3 0.967 0.249 0.667 23.28 0.214 3.35 1.299
Average -2.92 -1.25 53.28 -0.32 -0.02 0.96 43.61 0.25 3.99 -0.467
CSI300 1.77 \ \ 0.009 \ \ \ 0.197 3.19 \
better returns. Regarding risk, as measured by the variance-covariance VaR at the 99%
level, the QF Month 1 factor's data points are positioned in a moderately high range,
indicating a balanced risk profile. The size and color intensity of the QF Month 1
points, which reflect higher Alpha values, indicate strong excess returns relative to the
benchmark. Taken together, these observations suggest that the QF Month 1 factor offers a
compelling investment opportunity, delivering robust returns at a moderate level of risk,
which is attractive for strategies that aim to optimize the trade-off between risk and
return.
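The variance-covariance VaR used to position the factors can be illustrated with a small sketch. It assumes the standard parametric form: asset returns are jointly normal, the portfolio standard deviation comes from the weights and the asset covariance matrix, and VaR is the loss not exceeded with 99% confidence over one period. The one-period horizon, the sign convention (loss reported as a positive number), and the function names are assumptions for illustration, not details confirmed by the paper.

```python
import math
from statistics import NormalDist


def portfolio_volatility(weights, cov):
    """sigma_p = sqrt(w^T Sigma w), with the covariance matrix as nested lists."""
    n = len(weights)
    variance = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)


def variance_covariance_var(weights, mean_returns, cov, confidence=0.99, value=1.0):
    """Parametric VaR over one period, assuming jointly normal asset returns.

    The result is reported as a positive loss: value * (z * sigma_p - mu_p),
    where z is the standard normal quantile at the chosen confidence level.
    """
    mu_p = sum(w * m for w, m in zip(weights, mean_returns))
    sigma_p = portfolio_volatility(weights, cov)
    z = NormalDist().inv_cdf(confidence)  # about 2.326 at the 99% level
    return value * (z * sigma_p - mu_p)


# Hypothetical two-asset, equal-weight portfolio: 2% daily volatility per asset,
# correlation 0.3, zero expected daily return.
cov = [[0.0004, 0.00012], [0.00012, 0.0004]]
print(round(variance_covariance_var([0.5, 0.5], [0.0, 0.0], cov), 4))  # 0.0375
```

Because the method summarizes risk by a single standard deviation, two factors with similar volatility receive similar VaR values even if their return distributions have very different tails; this is the main limitation of the variance-covariance approach.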
In scatter plot B1b, most of the quantformer-based factors outperform the other 100
factors. The quantformer factors trained at different scales generally show an ability
both to generate profit and to manage risk.
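The claim that most quantformer factors outperform the 100 baselines can be checked against the SR column of the table above. The sketch below uses hand-transcribed Sharpe ratios and ranks on that single metric; the choice of SR as the only criterion and the helper name are illustrative assumptions, not the comparison method behind the scatter plots.

```python
# Sharpe ratios (SR column) of the 100 baseline factors, transcribed from the
# table above in row order, from ARBR through 25week close.
BASELINE_SR = [
    -0.251, -0.406, -0.250, -0.328, -0.109, -0.467, -1.402, -1.293, -1.051, -0.329,
    -0.504, -0.702, -0.612, -0.047, -0.144, -0.093, 0.069, -0.008, -0.088, -0.395,
    -0.240, -0.468, -0.134, -0.206, -0.059, 0.243, -0.035, -0.674, -0.673, 0.119,
    -0.456, -0.905, -0.452, -0.361, -0.660, -0.760, -0.380, -0.075, -0.982, -0.628,
    -0.382, -0.051, -0.538, -0.595, -0.459, -0.607, -0.268, -0.209, -0.166, -0.214,
    -0.297, -0.305, -0.533, -0.186, -0.128, -0.126, -0.068, -0.216, -0.452, -0.404,
    -0.337, -0.125, -0.213, -0.382, -0.453, -1.179, -0.793, -0.566, -0.205, -0.245,
    -0.201, 0.168, 0.040, -0.867, -0.199, -0.593, -1.049, -0.800, 0.168, -0.117,
    -0.844, -0.348, -0.045, -0.089, -0.243, 0.195, -0.202, 0.095, -0.068, -0.317,
    0.003, -0.663, -0.417, -0.143, -0.571, -0.624, -1.062, -0.416, -0.146, -0.859,
]

# Sharpe ratios of the quantformer (QF) factor variants from the same table.
QF_SR = {
    "QF Month 1": 0.915, "QF Month 2": 0.289, "QF Month 3": 0.246,
    "QF Week 1": -0.236, "QF Week 2": 0.160, "QF Week 3": 0.372,
    "QF Day 1": 0.181, "QF Day 2": 0.279, "QF Day 3": 0.284,
    "QF 10%": 0.754, "QF 5%": 0.630, "QF 1%": 0.967,
}


def share_outperformed(sr, baseline=BASELINE_SR):
    """Fraction of baseline factors whose Sharpe ratio is strictly below sr."""
    return sum(b < sr for b in baseline) / len(baseline)


best_baseline = max(BASELINE_SR)  # highest SR among the 100 baselines
for name, sr in QF_SR.items():
    print(f"{name:<11} SR={sr:+.3f}  beats {share_outperformed(sr):4.0%} of baselines"
          f"  above best baseline: {sr > best_baseline}")
```

On this single criterion, nine of the twelve QF variants exceed the best baseline Sharpe ratio (MAC60, 0.243); QF Week 1, QF Week 2, and QF Day 1 do not.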