Quantitative Trading Using Deep Q Learning
Soumyadip Sarkar
[email protected]
1 Introduction
Quantitative trading, also known as algorithmic trading, is the use of com-
puter programs to execute trades in financial markets. In recent years, quan-
titative trading has become increasingly popular due to its ability to process
large amounts of data and make trades at high speeds. However, the success of
quantitative trading depends on the development of effective trading strategies
that can accurately predict future price movements and generate profits.
Traditional trading strategies rely on fundamental analysis and technical
analysis to make trading decisions. Fundamental analysis involves analyzing fi-
nancial statements, economic indicators, and other relevant data to identify un-
dervalued or overvalued stocks. Technical analysis involves analyzing past price
and volume data to identify patterns and trends that can be used to predict
future price movements.
However, these strategies have limitations. Fundamental analysis requires
significant expertise and resources, and can be time-consuming and subjective.
Technical analysis can be influenced by noise and is subject to overfitting.
Reinforcement learning offers a promising alternative, but it has its own limitations, including the need for large amounts of historical data and the potential for overfitting. Further research is needed to
address these limitations and to explore the potential of reinforcement learning
in quantitative trading.
2 Background
Traditional rule-based trading strategies are typically designed around particular market
conditions and may fail when market conditions change. Reinforcement learning
algorithms, however, can learn from experience and adapt their trading strategy
to changing market conditions.
Another advantage of reinforcement learning in quantitative trading is its
ability to handle non-stationary environments. The financial markets are con-
stantly changing, and traditional rule-based systems may fail to adapt to these
changes. Reinforcement learning algorithms, on the other hand, can continue to
update their policies as new market data arrives.
Despite the potential advantages of reinforcement learning in quantitative
trading, there are also challenges that must be addressed. One of the challenges is
the need for large amounts of historical data to train the reinforcement learning
algorithms. Another challenge is the need to ensure that the algorithms are
robust and do not overfit to historical data.
Overall, reinforcement learning has the potential to revolutionize quantitative
trading by allowing trading algorithms to learn from experience and adapt to
changing market conditions. The goal of this research paper is to explore the use
of reinforcement learning in quantitative trading and evaluate its effectiveness
in generating profits.
3 Related Work
4 Methodology
In this study, we propose a reinforcement learning-based trading strategy for the
stock market. Our approach consists of the following steps.
First, we preprocess the data: daily prices for each stock in the Nifty 50 index
are converted to daily returns, which are then normalized using min-max scaling,

x′ = (x − min(x)) / (max(x) − min(x))

where x′ is the normalized value, x is the original value, min(x) is the minimum value, and max(x) is the maximum value.
After preprocessing the data, we had a dataset of daily normalized returns
for each stock in the Nifty 50 index for the period from January 1, 2010, to
December 31, 2020. This dataset was used as the basis for training and testing
our trading strategy using reinforcement learning.
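As an illustration, the min-max normalization step can be sketched as follows, assuming the daily returns are held in a pandas DataFrame with one column per stock (the file and variable names are illustrative):

import pandas as pd

def min_max_normalize(returns: pd.DataFrame) -> pd.DataFrame:
    # Scale each stock's daily returns to the [0, 1] range, column by column.
    col_min = returns.min()
    col_max = returns.max()
    return (returns - col_min) / (col_max - col_min)

# Illustrative usage: one column of daily returns per Nifty 50 constituent,
# indexed by trading date (the CSV file is hypothetical).
# returns = pd.read_csv("nifty50_daily_returns.csv", index_col=0, parse_dates=True)
# normalized = min_max_normalize(returns)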
The trained algorithm was then compared to a benchmark strategy, which in-
volved buying and holding the Nifty 50 index for the test period. The benchmark
strategy was evaluated based on the cumulative return on investment (ROI) for
the test period. The results were analyzed to determine the effectiveness of the
reinforcement learning algorithm in generating profitable trading strategies.
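For reference, the buy-and-hold benchmark can be evaluated with a simple helper, assuming a series of Nifty 50 closing levels over the test period (names are illustrative):

import pandas as pd

def buy_and_hold_roi(index_prices: pd.Series) -> float:
    # ROI of buying at the first close of the test period and holding to the last.
    return index_prices.iloc[-1] / index_prices.iloc[0] - 1.0

# Illustrative usage with a hypothetical price series:
# nifty_close = pd.Series(...)  # daily closing levels of the Nifty 50 index
# print(f"Benchmark ROI: {buy_and_hold_roi(nifty_close):.2%}")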
The trading strategy employed in this research involves the use of the DQN
agent to learn the optimal action to take given the current market state. The
agent’s actions are either to buy or sell a stock, with the amount of shares to be
bought or sold determined by the agent’s output. The agent’s output is scaled
to the available cash of the agent at the time of decision.
At the start of each episode, the agent is given a certain amount of cash and
a fixed number of stocks. The agent observes the current state of the market,
which includes the stock prices, technical indicators, and any other relevant data.
The agent then uses its neural network to determine the optimal action to take
based on its current state.
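The paper does not specify the network architecture or the exact action encoding, so the following PyTorch sketch only illustrates the idea: a small fully connected network maps the state vector (prices, technical indicators, cash, holdings) to one Q-value per action, with buy and sell as the two actions described above; the layer sizes are assumptions.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Maps a market-state vector to one Q-value per action (here: buy, sell).
    def __init__(self, state_dim: int, n_actions: int = 2, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)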
If the agent decides to buy a stock, the amount of cash required is subtracted
from the agent’s total cash, and the corresponding number of shares is added
to the agent’s total number of stocks. If the agent decides to sell a stock, the
corresponding number of shares is subtracted from the agent’s total number of
stocks, and the cash earned is added to the agent’s total cash.
At the end of each episode, the agent’s total wealth is calculated as the sum
of the agent’s total cash and the current market value of the agent’s remaining
stocks. The agent’s reward for each time step is calculated as the difference
between the current and previous total wealth.
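The bookkeeping above can be summarized in a small simulation step for a single stock. The sketch below assumes the agent's scaled output is interpreted as the fraction of available cash (when buying) or holdings (when selling) to trade, and it ignores transaction costs, which the text does not address:

def step(portfolio, action, price_t, price_next):
    # portfolio: dict with keys "cash" and "shares".
    # action: (side, fraction) where side is "buy" or "sell" and fraction is
    # the share of available cash or holdings to commit (an assumption).
    side, fraction = action
    prev_wealth = portfolio["cash"] + portfolio["shares"] * price_t

    if side == "buy":
        shares = int((portfolio["cash"] * fraction) // price_t)
        portfolio["cash"] -= shares * price_t
        portfolio["shares"] += shares
    else:  # sell
        shares = int(portfolio["shares"] * fraction)
        portfolio["shares"] -= shares
        portfolio["cash"] += shares * price_t

    # Reward is the change in total wealth once the next price is observed.
    wealth = portfolio["cash"] + portfolio["shares"] * price_next
    return portfolio, wealth - prev_wealth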
The training process of the DQN agent involves repeatedly running through
episodes of the trading simulation, where the agent learns from its experiences
and updates its Q-values accordingly. The agent’s Q-values represent the ex-
pected cumulative reward for each possible action given the current state.
During the training process, the agent’s experience is stored in a replay buffer,
which is used to sample experiences for updating the agent’s Q-values. The
agent’s Q-values are updated using a variant of the Bellman equation, which
takes into account the discounted future rewards of taking each possible action.
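A minimal sketch of this update is given below, using the QNetwork sketched earlier. The batch size and discount factor are assumptions, and for brevity the online network also produces the Bellman targets, whereas many DQN implementations maintain a separate target network:

import random
from collections import deque

import numpy as np
import torch
import torch.nn.functional as F

replay_buffer = deque(maxlen=10_000)  # holds (state, action, reward, next_state, done)
gamma = 0.99                          # assumed discount factor

def train_step(q_net, optimizer, batch_size=32):
    # One DQN update: regress Q(s, a) onto r + gamma * max_a' Q(s', a').
    if len(replay_buffer) < batch_size:
        return
    states, actions, rewards, next_states, dones = zip(*random.sample(replay_buffer, batch_size))
    states = torch.as_tensor(np.array(states), dtype=torch.float32)
    next_states = torch.as_tensor(np.array(next_states), dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        targets = rewards + gamma * q_net(next_states).max(dim=1).values * (1.0 - dones)

    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()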
Once the training process is complete, the trained DQN agent can be used
to make trading decisions in a live market.
Cumulative Return It measures the total return of a trading strategy over the entire evaluation period, combining the percentage returns over each period of time with compounding
taken into account.
Mathematically, the cumulative return can be expressed as:
CR = (1 + R1) × (1 + R2) × ... × (1 + Rn) − 1
where CR is the cumulative return, R1 , R2 , ..., Rn are the percentage returns
over each period, and n is the total number of periods.
For example, if a trading strategy generates a return of 5% in the first period,
10% in the second period, and −3% in the third period, the cumulative return
over the three periods would be (1.05)(1.10)(0.97) − 1 ≈ 0.1204, or approximately 12.04%.
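The same calculation as a small helper, with each period's return expressed as a decimal fraction (the function name is illustrative):

def cumulative_return(period_returns):
    # Compound a sequence of per-period returns: (1 + R1)(1 + R2)...(1 + Rn) - 1.
    cr = 1.0
    for r in period_returns:
        cr *= 1.0 + r
    return cr - 1.0

# Worked example from above: 5%, 10%, -3%.
print(f"{cumulative_return([0.05, 0.10, -0.03]):.2%}")  # approximately 12.04%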
Sharpe Ratio It measures the excess return per unit of risk of an investment
or portfolio, and is calculated by dividing the excess return by the standard
deviation of the returns.
The mathematical equation for the Sharpe ratio is:

Sharpe Ratio = (Rp − Rf) / σp

where:
Rp = average return of the portfolio
Rf = risk-free rate of return (such as the yield on a U.S. Treasury bond)
σp = standard deviation of the portfolio's excess returns
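As a sketch, the ratio can be computed from a series of portfolio returns; here the risk-free rate is assumed to be quoted on the same per-period basis, and no annualization is applied:

import numpy as np

def sharpe_ratio(portfolio_returns, risk_free_rate=0.0):
    # Excess return per unit of risk: (Rp - Rf) / sigma_p.
    excess = np.asarray(portfolio_returns) - risk_free_rate
    return excess.mean() / excess.std(ddof=1)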
Maximum Drawdown It measures the largest decline from a peak to a trough in the value of a portfolio during the test period. The mathematical equation for Maximum Drawdown is:

MaxDrawdown = (P − Q) / P

where P is the peak value of the portfolio and Q is the minimum value of the
portfolio during the drawdown period.
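The following sketch computes this quantity from a series of portfolio values by tracking the running peak (the function name is illustrative):

import numpy as np

def max_drawdown(portfolio_values):
    # Largest peak-to-trough decline, (P - Q) / P, over the value series.
    values = np.asarray(portfolio_values, dtype=float)
    running_peak = np.maximum.accumulate(values)
    drawdowns = (running_peak - values) / running_peak
    return drawdowns.max()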
Average Daily Return It measures the average daily profit or loss generated
by a trading strategy, expressed as a percentage of the initial investment. The
mathematical equation for Average Daily Return is:
ADR = ((Pf − Pi) / Pi) / N

where ADR is the Average Daily Return, Pf is the final portfolio value, Pi
is the initial portfolio value, and N is the number of trading days.
This formula calculates the daily percentage return by taking the difference
between the final and initial portfolio values, dividing it by the initial value,
and then dividing by the number of trading days. The resulting value represents
the average daily percentage return generated by the trading strategy over the
specified time period.
The Average Daily Return metric is useful because it allows traders to com-
pare the performance of different trading strategies on a daily basis, regardless
of the size of the initial investment. A higher ADR indicates a more profitable
trading strategy, while a lower ADR indicates a less profitable strategy.
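A direct translation of the formula (the function name is illustrative):

def average_daily_return(initial_value, final_value, n_days):
    # ADR = ((Pf - Pi) / Pi) / N, as defined above.
    return ((final_value - initial_value) / initial_value) / n_days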
Average Holding Period It measures the average number of days an investor holds a position before selling it. For example, if an investor closes ten positions after holding them for 10, 20, 30, 15, 25, 10, 20, 15, 30, and 25 days respectively, the average holding period is:

AHP = (10 + 20 + 30 + 15 + 25 + 10 + 20 + 15 + 30 + 25) / 10 = 20 days

This means that on average, the investor holds their investments for around
20 days before selling them. The AHP can be useful in evaluating an investor's
trading strategy, as a shorter holding period may indicate a more active trading
approach, while a longer holding period may indicate a more passive approach.
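A one-line helper reproducing this calculation (the function name is illustrative):

def average_holding_period(holding_days):
    # Mean number of days positions were held before being sold.
    return sum(holding_days) / len(holding_days)

# Worked example from above:
print(average_holding_period([10, 20, 30, 15, 25, 10, 20, 15, 30, 25]))  # -> 20.0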
5 Future Work
While the proposed quantitative trading system using reinforcement learning
has shown promising results, there are several avenues for future research and
improvement. Some potential areas for future work include:
– Integration with portfolio optimization: while the proposed system has focused on trading individual stocks, integrating it with portfolio optimization techniques could further enhance its performance. By considering the correlation between different stocks and diversifying the portfolio, it may be possible to reduce overall risk and increase returns.
6 Conclusion
Declarations
– Funding
Not Applicable.
– Competing Interests
The author declares that there are no competing interests.
– Availability of data and materials
Data is available on request.
– Authors’ contributions
There is only one author, who has contributed all of this work.