Quantitative Trading Using Deep Q Learning
Quantitative Trading Using Deep Q Learning
https://doi.org/10.22214/ijraset.2023.50170
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
Abstract: Reinforcement learning (RL) is a branch of machine learning that has been used in a variety of applications such as
robotics, game playing, and autonomous systems. In recent years, there has been growing interest in applying RL to quantitative
trading, where the goal is to make profitable trades in financial markets. This paper explores the use of RL in quantitative
trading and presents a case study of a RLbased trading algorithm. The results show that RL can be a powerful tool for
quantitative trading, and that it has the potential to outperform traditional trading algorithms. The use of reinforcement learning
in quantitative trading represents a promising area of research that can potentially lead to the development of more sophisticated
and effective trading systems. Future work could explore the use of alternative reinforcement learning algorithms, incorporate
additional data sources, and test the system on different asset classes. Overall, our research demonstrates the potential of using
reinforcement learning in quantitative trading and highlights the importance of continued research and development in this
area. By developing more sophisticated and effective trading systems, we can potentially improve the efficiency of financial
markets and generate greater returns for investors.
Keywords: Reinforcement Learning · Quantitative Trading · Financial Markets
I. INTRODUCTION
Quantitative trading, also known as algorithmic trading, is the use of computer programs to execute trades in financial markets. In
recent years, quantitative trading has become increasingly popular due to its ability to process large amounts of data and make trades
at high speeds. However, the success of quantitative trading depends on the development of effective trading strategies that can
accurately predict future price movements and generate profits.
Traditional trading strategies rely on fundamental analysis and technical analysis to make trading decisions. Fundamental analysis
involves analyzing financial statements, economic indicators, and other relevant data to identify undervalued or overvalued stocks.
Technical analysis involves analyzing past price and volume data to identify patterns and trends that can be used to predict future
price movements. However, these strategies have limitations. Fundamental analysis requires significant expertise and resources, and
can be time-consuming and subjective. Technical analysis can be influenced by noise and is subject to overfitting.
Reinforcement learning is a subfield of machine learning that has shown promise in developing automated trading strategies. In
reinforcement learning, an agent learns an optimal trading policy by interacting with a trading environment and receiving feedback
in the form of rewards or penalties.
In this paper, we present a reinforcement learning-based approach to quantitative trading that uses a deep Q-network (DQN) to learn
an optimal trading policy. We evaluate the performance of our algorithm on the historical stock price data of a single stock and
compare it to traditional trading strategies and benchmarks. Our results demonstrate the potential of reinforcement learning as a
powerful tool for developing automated trading strategies and highlight the importance of evaluating the performance of trading
strategies using robust performance metrics.
We begin by discussing the basics of reinforcement learning and its application to quantitative trading. Reinforcement learning
involves an agent taking actions in an environment to maximize cumulative reward. The agent learns a policy that maps states to
actions, and the objective is to find the policy that maximizes the expected cumulative reward over time.
In quantitative trading, the environment is the financial market, and the agent’s actions are buying, selling, or holding a stock. The
state of the environment includes the current stock price, historical price data, economic indicators, and other relevant data. The
reward is a function of the profit or loss from the trade.
We then introduce the deep Q-network (DQN) algorithm, a reinforcement learning technique that uses a neural network to
approximate the optimal actionvalue function. The DQN algorithm has been shown to be effective in a range of applications,
including playing Atari games, and has potential in quantitative trading.
We describe our methodology for training and evaluating our DQN-based trading algorithm. We use historical stock price data of a
single stock as our training and testing data. We preprocess the data by computing technical indicators, such as moving averages and
relative strength index (RSI), which serve as inputs to the DQN.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 731
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
We evaluate the performance of our algorithm using a range of performance metrics, including the Sharpe ratio, cumulative return,
maximum drawdown, and win rate. We compare our results to a buy-and-hold strategy and a simple moving average strategy.
Our results show that our DQN-based trading algorithm outperforms both the buy-and-hold strategy and the simple moving average
strategy in terms of cumulative return, Sharpe ratio, and maximum drawdown. We also observe that our algorithm outperforms the
benchmarks in terms of win rate.
We conclude by discussing the implications of our results and the limitations of our approach. Our results demonstrate the potential
of reinforcement learning in developing automated trading strategies and highlight the importance of using robust performance
metrics to evaluate the performance of trading algorithms. However, our approach has limitations, including the need for large
amounts of historical data and the potential for overfitting. Further research is needed to address these limitations and to explore the
potential of reinforcement learning in quantitative trading.
II. BACKGROUND
Quantitative trading is a field that combines finance, mathematics, and computer science to develop automated trading strategies.
The objective of quantitative trading is to exploit market inefficiencies to generate profits. Quantitative traders use a range of
techniques, including statistical arbitrage, algorithmic trading, and machine learning, to analyze market data and make trading
decisions.
Reinforcement learning is a type of machine learning that has been shown to be effective in a range of applications, including
playing games and robotics. In reinforcement learning, an agent takes actions in an environment to maximize cumulative reward.
The agent learns a policy that maps states to actions, and the objective is to find the policy that maximizes the expected cumulative
reward over time.
The use of reinforcement learning in quantitative trading is a relatively new area of research. Traditional quantitative trading
strategies typically involve rulebased systems that rely on technical indicators, such as moving averages and RSI, to make trading
decisions. These systems are often designed by human experts and are limited in their ability to adapt to changing market conditions.
Reinforcement learning has the potential to overcome these limitations by allowing trading algorithms to learn from experience and
adapt to changing market conditions. Reinforcement learning algorithms can learn from historical market data and use this
knowledge to make trading decisions in real-time. This approach has the potential to be more flexible and adaptable than traditional
rule-based systems.
Recent research has shown that reinforcement learning algorithms can be effective in developing automated trading strategies. For
example, a study by Moody and Saffell [3] used reinforcement learning to develop a trading algorithm for the S&P 500 futures
contract. The algorithm outperformed a buy-and-hold strategy and a moving average strategy.
More recent studies have focused on using deep reinforcement learning, which involves using deep neural networks to approximate
the optimal action-value function. These studies have shown promising results in a range of applications, including playing games
and robotics, and have potential in quantitative trading.
One of the advantages of reinforcement learning in quantitative trading is its ability to handle complex, high-dimensional data.
Traditional rule-based systems often rely on a small number of features, such as moving averages and technical indicators, to make
trading decisions. Reinforcement learning algorithms, on the other hand, can learn directly from raw market data, such as price and
volume, without the need for feature engineering.
Reinforcement learning algorithms can also adapt to changing market conditions. Traditional rule-based systems are designed to
work under specific market conditions and may fail when market conditions change. Reinforcement learning algorithms, however,
can learn from experience and adapt their trading strategy to changing market conditions.
Another advantage of reinforcement learning in quantitative trading is its ability to handle non-stationary environments. The
financial markets are constantly changing, and traditional rule-based systems may fail to adapt to these changes. Reinforcement
learning algorithms, on the other hand, can learn from experience and adapt to changing market conditions.
Despite the potential advantages of reinforcement learning in quantitative trading, there are also challenges that must be addressed.
One of the challenges is the need for large amounts of historical data to train the reinforcement learning algorithms. Another
challenge is the need to ensure that the algorithms are robust and do not overfit to historical data.
Overall, reinforcement learning has the potential to revolutionize quantitative trading by allowing trading algorithms to learn from
experience and adapt to changing market conditions. The goal of this research paper is to explore the use of reinforcement learning
in quantitative trading and evaluate its effectiveness in generating profits.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 732
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
IV. METHODOLOGY
In this study, we propose a reinforcement learning-based trading strategy for the stock market. Our approach consists of the
following steps:
A. Data Preprocessing
The first step in our methodology was to collect and preprocess the data. We obtained daily historical stock price data for the Nifty
50 index from Yahoo Finance for the period from January 1, 2010, to December 31, 2020. The data consisted of the daily open,
high, low, and close prices for each stock in the index.
To preprocess the data, we calculated the daily returns for each stock using the close price data. The daily return for a given stock on
day t was calculated as:
where pt is the closing price of the stock on day t and pt−1 is the closing price on day t-1.
We then normalized the returns using the Min-Max scaling method to ensure that the returns were in the range of [-1, 1]. The Min-
Max scaling method scales the data to a fixed range by subtracting the minimum value and dividing by the range:
where x′ is the normalized value, x is the original value, min(x) is the minimum value, and max(x) is the maximum value.
After preprocessing the data, we had a dataset of daily normalized returns for each stock in the Nifty 50 index for the period from
January 1, 2010, to December 31, 2020. This dataset was used as the basis for training and testing our trading strategy using
reinforcement learning.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 733
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
C. Trading Strategy
The trading strategy employed in this research involves the use of the DQN agent to learn the optimal action to take given the
current market state. The agent’s actions are either to buy or sell a stock, with the amount of shares to be bought or sold determined
by the agent’s output. The agent’s output is scaled to the available cash of the agent at the time of decision.
At the start of each episode, the agent is given a certain amount of cash and a fixed number of stocks. The agent observes the current
state of the market, which includes the stock prices, technical indicators, and any other relevant data. The agent then uses its neural
network to determine the optimal action to take based on its current state.
If the agent decides to buy a stock, the amount of cash required is subtracted from the agent’s total cash, and the corresponding
number of shares is added to the agent’s total number of stocks. If the agent decides to sell a stock, the corresponding number of
shares is subtracted from the agent’s total number of stocks, and the cash earned is added to the agent’s total cash.
At the end of each episode, the agent’s total wealth is calculated as the sum of the agent’s total cash and the current market value of
the agent’s remaining stocks. The agent’s reward for each time step is calculated as the difference between the current and previous
total wealth.
The training process of the DQN agent involves repeatedly running through episodes of the trading simulation, where the agent
learns from its experiences and updates its Q-values accordingly. The agent’s Q-values represent the expected cumulative reward for
each possible action given the current state.
During the training process, the agent’s experience is stored in a replay buffer, which is used to sample experiences for updating the
agent’s Q-values. The agent’s Q-values are updated using a variant of the Bellman equation, which takes into account the discounted
future rewards of taking each possible action.
Once the training process is complete, the trained DQN agent can be used to make trading decisions in a live market.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 734
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
D. Evaluation Metrics
The performance of the proposed quantitative trading system is evaluated using several metrics. The metrics used in this research are
as follows:
Cumulative Return Cumulative return is a measure of the total profit or loss generated by a trading strategy over a specific period of
time. It is calculated as the sum of the percentage returns over each period of time, with compounding taken into account.
Mathematically, the cumulative return can be expressed as:
CR = (1 + R1) ∗ (1 + R2) ∗ ... ∗ (1 + Rn) − 1
where CR is the cumulative return, R1, R2, ..., Rn are the percentage returns
over each period, and n is the total number of periods.
For example, if a trading strategy generates a return of 5% in the first period, 10% in the second period, and -3% in the third period,
the cumulative return over the three periods would be:
CR = (1 + 0.05) ∗ (1 + 0.10) ∗ (1 − 0.03) − 1
CR = 1.1175 − 1
CR = 0.1175 or 11.75%
This means that the trading strategy generated a total return of 11.75% over the three periods, taking into account compounding.
Sharpe Ratio It measures the excess return per unit of risk of an investment or portfolio, and is calculated by dividing the excess
return by the standard deviation of the returns.
where:
Rp = average return of the portfolio
Rf = risk-free rate of return (such as the yield on a U.S. Treasury bond) δp = standard deviation of the portfolio’s excess returns
The Sharpe ratio provides a way to compare the risk-adjusted returns of different investments or portfolios, with higher values
indicating better riskadjusted returns.
Maximum Drawdown It measures the largest percentage decline in a portfolio’s value from its peak to its trough. It is an important
measure for assessing the risk of an investment strategy, as it represents the potential loss that an investor could face at any given
point in time.
where P is the peak value of the portfolio and Q is the minimum value of the portfolio during the drawdown period.
For example, suppose an investor’s portfolio peaks at $100,000 and subsequently falls to a minimum value of $70,000 during a
market downturn. The maximum drawdown for this portfolio would be:
30%
This means that the portfolio experienced a 30% decline from its peak value to its lowest point during the drawdown period.
Average Daily Return It measures the average daily profit or loss generated by a trading strategy, expressed as a percentage of the
initial investment. The mathematical equation for Average Daily Return is:
Where ADR is the Average Daily Return, Pf is the final portfolio value, Pi is the initial portfolio value, and N is the number of
trading days.
This formula calculates the daily percentage return by taking the difference between the final and initial portfolio values, dividing it
by the initial value, and then dividing by the number of trading days. The resulting value represents the average daily percentage
return generated by the trading strategy over the specified time period.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 735
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
The Average Daily Return metric is useful because it allows traders to compare the performance of different trading strategies on a
daily basis, regardless of the size of the initial investment. A higher ADR indicates a more profitable trading strategy, while a lower
ADR indicates a less profitable strategy.
Average Daily Trading Volume It measures the average number of shares or contracts traded per day over a specific period of time.
Mathematically, it can be calculated as follows:
where the total trading volume is the sum of the trading volume over a specific period of time (e.g., 1 year) and the number of
trading days is the number of days in which trading occurred during that period.
For example, if the total trading volume over the past year was 10 million shares and there were 250 trading days during that period,
the ADTV would be:
This means that on average, 40,000 shares were traded per day over the past year. ADTV is a useful metric for investors and traders
to assess the liquidity of a particular security, as securities with higher ADTVs generally have more market liquidity and may be
easier to buy or sell.
Profit Factor It measures the profitability of trades relative to the losses. It is calculated by dividing the total profit of winning trades
by the total loss of losing trades. The formula for calculating the Profit Factor is as follows:
A Profit Factor greater than 1 indicates that the strategy is profitable, while a Profit Factor less than 1 indicates that the strategy is
unprofitable. For example, a Profit Factor of 1.5 indicates that for every dollar lost in losing trades, the strategy generated $1.50 in
winning trades.
Winning Percentage It measures the ratio of successful outcomes to the total number of outcomes. It is calculated using the
following mathematical equation:
100%
For example, if a trader made 100 trades and 60 of them were successful, the winning percentage would be calculated as follows:
100% = 60%
A higher winning percentage indicates a greater proportion of successful outcomes and is generally desirable in trading.
Average Holding Period It measures the average length of time that an investor holds a particular investment. It is calculated by
taking the sum of the holding periods for each trade and dividing it by the total number of trades. The mathematical equation for
calculating AHP is:
where:
P
denotes the sum of the holding periods for all trades Exit Date is the date when the investment is sold Entry Date is the date when
the investment is bought Number of Trades is the total number of trades made For example, if an investor makes 10 trades over a
given period of time, and the holding periods for those trades are 10, 20, 30, 15, 25, 10, 20, 15, 30, and 25 days respectively, the
AHP would be:
10 + 20 + 30 + 15 + 25 + 10 + 20 + 15 + 30 + 25
AHP = = 21.5 days
10
This means that on average, the investor holds their investments for around 21.5 days before selling them. The AHP can be useful in
evaluating an investor’s trading strategy, as a shorter holding period may indicate a more active trading approach, while a longer
holding period may indicate a more passive approach.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 736
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
These evaluation metrics provide a comprehensive assessment of the performance of the proposed quantitative trading system. The
cumulative return and Sharpe ratio measure the overall profitability and risk-adjusted return of the system, respectively. The
maximum drawdown provides an indication of the system’s downside risk, while the average daily return and trading volume
provide insights into the system’s daily performance. The profit factor, winning percentage, and average holding period provide
insights into the trading strategy employed by the system.
V. FUTURE WORK
While the proposed quantitative trading system using reinforcement learning has shown promising results, there are several avenues
for future research and improvement. Some potential areas for future work include:
VI. CONCLUSION
The use of reinforcement learning in quantitative trading represents a promising area of research that can potentially lead to the
development of more sophisticated and effective trading systems.
The ability of the system to learn from market data and adapt to changing market conditions could enable it to generate superior
returns while reducing risk.
While the proposed system has shown promising results, there are still many areas for improvement and further research. Future
work could explore the use of alternative reinforcement learning algorithms, incorporate additional data sources, and test the system
on different asset classes. Additionally, the integration of portfolio optimization techniques could further enhance the performance
of the system.
Overall, our research has demonstrated the potential of using reinforcement learning in quantitative trading and highlights the
importance of continued research and development in this area. By developing more sophisticated and effective trading systems, we
can potentially improve the efficiency of financial markets and generate greater returns for investors.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 737
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
REFERENCES
[1] Bertoluzzo, M., Carta, S., & Duci, A. (2018). Deep reinforcement learning for forex trading. Expert Systems with Applications, 107, 1-9.
[2] Jiang, Z., Xu, C., & Li, B. (2017). Stock trading with cycles: A financial application of a recurrent reinforcement learning algorithm. Journal of Economic
Dynamics and Control, 83, 54-76.
[3] Moody, J., & Saffell, M. (2001). Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks, 12(4), 875-889.
[4] Bertoluzzo, M., & De Nicolao, G. (2006). Reinforcement learning for optimal trading in stocks. IEEE Transactions on Neural Networks, 17(1), 212-222.
[5] Chen, Q., Li, S., Peng, Y., Li, Z., Li, B., & Li, X. (2019). A deep reinforcement learning framework for the financial portfolio management problem. IEEE
Access, 7, 163663-163674.
[6] Wang, R., Zhang, X., Li, T., & Li, B. (2019). Deep reinforcement learning for automated stock trading: An ensemble strategy. Expert Systems with
Applications, 127, 163-180.
[7] Xiong, Z., Zhou, F., Zhang, Y., & Yang, Z. (2020). Multi-agent deep reinforcement learning for portfolio optimization. Expert Systems with Applications, 144,
113056.
[8] Guo, X., Cheng, X., & Zhang, Y. (2020). Deep reinforcement learning for bitcoin trading. IEEE Access, 8, 169069-169076.
[9] Zhu, Y., Jiang, Z., & Li, B. (2017). Deep reinforcement learning for portfolio management. In Proceedings of the International Conference on Machine
Learning (ICML), Sydney, Australia.
[10] Gu, S., Wang, X., Chen, J., & Dai, X. (2021). Reinforcement learning for portfolio optimization in the presence of transaction costs. Journal of Intelligent &
Fuzzy Systems, 41(3), 3853-3865.
[11] Kwon, O., & Moon, K. (2019). A credit risk assessment model using machine learning and feature selection. Sustainability, 11(20), 5799.
[12] Li, Y., Xue, W., Zhu, X., Guo, L., & Qin, J. (2021). Fraud Detection for Online Advertising Networks Using Machine Learning: A Comprehensive Review.
IEEE Access, 9, 47733-47747.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 738