International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; Volume 11, Issue IV, April 2023
DOI: https://doi.org/10.22214/ijraset.2023.50170

Quantitative Trading using Deep Q Learning


Soumyadip Sarkar
Deloitte Consulting LLP

Abstract: Reinforcement learning (RL) is a branch of machine learning that has been used in a variety of applications such as
robotics, game playing, and autonomous systems. In recent years, there has been growing interest in applying RL to quantitative
trading, where the goal is to make profitable trades in financial markets. This paper explores the use of RL in quantitative
trading and presents a case study of an RL-based trading algorithm. The results show that RL can be a powerful tool for
quantitative trading, and that it has the potential to outperform traditional trading algorithms. The use of reinforcement learning
in quantitative trading represents a promising area of research that can potentially lead to the development of more sophisticated
and effective trading systems. Future work could explore the use of alternative reinforcement learning algorithms, incorporate
additional data sources, and test the system on different asset classes. Overall, our research demonstrates the potential of using
reinforcement learning in quantitative trading and highlights the importance of continued research and development in this
area. By developing more sophisticated and effective trading systems, we can potentially improve the efficiency of financial
markets and generate greater returns for investors.
Keywords: Reinforcement Learning · Quantitative Trading · Financial Markets

I. INTRODUCTION
Quantitative trading, also known as algorithmic trading, is the use of computer programs to execute trades in financial markets. In
recent years, quantitative trading has become increasingly popular due to its ability to process large amounts of data and make trades
at high speeds. However, the success of quantitative trading depends on the development of effective trading strategies that can
accurately predict future price movements and generate profits.
Traditional trading strategies rely on fundamental analysis and technical analysis to make trading decisions. Fundamental analysis
involves analyzing financial statements, economic indicators, and other relevant data to identify undervalued or overvalued stocks.
Technical analysis involves analyzing past price and volume data to identify patterns and trends that can be used to predict future
price movements. However, these strategies have limitations. Fundamental analysis requires significant expertise and resources, and
can be time-consuming and subjective. Technical analysis can be influenced by noise and is subject to overfitting.
Reinforcement learning is a subfield of machine learning that has shown promise in developing automated trading strategies. In
reinforcement learning, an agent learns an optimal trading policy by interacting with a trading environment and receiving feedback
in the form of rewards or penalties.
In this paper, we present a reinforcement learning-based approach to quantitative trading that uses a deep Q-network (DQN) to learn
an optimal trading policy. We evaluate the performance of our algorithm on the historical stock price data of a single stock and
compare it to traditional trading strategies and benchmarks. Our results demonstrate the potential of reinforcement learning as a
powerful tool for developing automated trading strategies and highlight the importance of evaluating the performance of trading
strategies using robust performance metrics.
We begin by discussing the basics of reinforcement learning and its application to quantitative trading. Reinforcement learning
involves an agent taking actions in an environment to maximize cumulative reward. The agent learns a policy that maps states to
actions, and the objective is to find the policy that maximizes the expected cumulative reward over time.
In quantitative trading, the environment is the financial market, and the agent’s actions are buying, selling, or holding a stock. The
state of the environment includes the current stock price, historical price data, economic indicators, and other relevant data. The
reward is a function of the profit or loss from the trade.
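To make this description concrete, the following is a minimal sketch of such a trading environment in Python. The class name, the one-share-per-step action rule, and the wealth-change reward are our own illustrative simplifications, not the exact environment used in this paper.

import numpy as np

class TradingEnv:
    # Minimal single-stock environment: the state is the window of recent
    # normalized returns, the actions are {0: hold, 1: buy, 2: sell}, and the
    # reward is the change in total wealth (cash plus stock value).
    def __init__(self, prices, window=10, initial_cash=10_000.0):
        self.prices = np.asarray(prices, dtype=float)
        self.window = window
        self.initial_cash = initial_cash
        self.reset()

    def reset(self):
        self.t = self.window
        self.cash = self.initial_cash
        self.shares = 0
        self.prev_wealth = self.cash
        return self._state()

    def _state(self):
        past = self.prices[self.t - self.window : self.t + 1]
        return np.diff(past) / past[:-1]   # last `window` daily returns

    def step(self, action):
        price = self.prices[self.t]
        if action == 1 and self.cash >= price:      # buy one share
            self.shares += 1
            self.cash -= price
        elif action == 2 and self.shares > 0:       # sell one share
            self.shares -= 1
            self.cash += price
        self.t += 1
        wealth = self.cash + self.shares * self.prices[self.t]
        reward = wealth - self.prev_wealth          # profit or loss this step
        self.prev_wealth = wealth
        done = self.t >= len(self.prices) - 1
        return self._state(), reward, done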
We then introduce the deep Q-network (DQN) algorithm, a reinforcement learning technique that uses a neural network to
approximate the optimal action-value function. The DQN algorithm has been shown to be effective in a range of applications,
including playing Atari games, and has potential in quantitative trading.
We describe our methodology for training and evaluating our DQN-based trading algorithm. We use historical stock price data of a
single stock as our training and testing data. We preprocess the data by computing technical indicators, such as moving averages and
relative strength index (RSI), which serve as inputs to the DQN.
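As an illustration of this step, the indicators can be computed from the closing-price series roughly as follows. This is a sketch using pandas; the column name "Close", the 20- and 50-day moving-average windows, and the 14-day RSI period are our assumptions rather than details given in the paper.

import pandas as pd

def add_indicators(df: pd.DataFrame, rsi_period: int = 14) -> pd.DataFrame:
    # Append simple technical indicators to a DataFrame with a 'Close' column.
    out = df.copy()
    out["SMA_20"] = out["Close"].rolling(20).mean()   # 20-day simple moving average
    out["SMA_50"] = out["Close"].rolling(50).mean()   # 50-day simple moving average

    # RSI: ratio of average gains to average losses over the lookback period.
    delta = out["Close"].diff()
    gain = delta.clip(lower=0).rolling(rsi_period).mean()
    loss = (-delta.clip(upper=0)).rolling(rsi_period).mean()
    out["RSI"] = 100 - 100 / (1 + gain / loss)
    return out.dropna()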


We evaluate the performance of our algorithm using a range of performance metrics, including the Sharpe ratio, cumulative return,
maximum drawdown, and win rate. We compare our results to a buy-and-hold strategy and a simple moving average strategy.
Our results show that our DQN-based trading algorithm outperforms both the buy-and-hold strategy and the simple moving average
strategy in terms of cumulative return, Sharpe ratio, and maximum drawdown. We also observe that our algorithm outperforms the
benchmarks in terms of win rate.
We conclude by discussing the implications of our results and the limitations of our approach. Our results demonstrate the potential
of reinforcement learning in developing automated trading strategies and highlight the importance of using robust performance
metrics to evaluate the performance of trading algorithms. However, our approach has limitations, including the need for large
amounts of historical data and the potential for overfitting. Further research is needed to address these limitations and to explore the
potential of reinforcement learning in quantitative trading.

II. BACKGROUND
Quantitative trading is a field that combines finance, mathematics, and computer science to develop automated trading strategies.
The objective of quantitative trading is to exploit market inefficiencies to generate profits. Quantitative traders use a range of
techniques, including statistical arbitrage, algorithmic trading, and machine learning, to analyze market data and make trading
decisions.
Reinforcement learning is a type of machine learning that has been shown to be effective in a range of applications, including
playing games and robotics. In reinforcement learning, an agent takes actions in an environment to maximize cumulative reward.
The agent learns a policy that maps states to actions, and the objective is to find the policy that maximizes the expected cumulative
reward over time.
The use of reinforcement learning in quantitative trading is a relatively new area of research. Traditional quantitative trading
strategies typically involve rule-based systems that rely on technical indicators, such as moving averages and RSI, to make trading
decisions. These systems are often designed by human experts and are limited in their ability to adapt to changing market conditions.
Reinforcement learning has the potential to overcome these limitations by allowing trading algorithms to learn from experience and
adapt to changing market conditions. Reinforcement learning algorithms can learn from historical market data and use this
knowledge to make trading decisions in real-time. This approach has the potential to be more flexible and adaptable than traditional
rule-based systems.
Recent research has shown that reinforcement learning algorithms can be effective in developing automated trading strategies. For
example, a study by Moody and Saffell [3] used reinforcement learning to develop a trading algorithm for the S&P 500 futures
contract. The algorithm outperformed a buy-and-hold strategy and a moving average strategy.
More recent studies have focused on using deep reinforcement learning, which involves using deep neural networks to approximate
the optimal action-value function. These studies have shown promising results in a range of applications, including playing games
and robotics, and have potential in quantitative trading.
One of the advantages of reinforcement learning in quantitative trading is its ability to handle complex, high-dimensional data.
Traditional rule-based systems often rely on a small number of features, such as moving averages and technical indicators, to make
trading decisions. Reinforcement learning algorithms, on the other hand, can learn directly from raw market data, such as price and
volume, without the need for feature engineering.
Reinforcement learning algorithms can also adapt to changing market conditions. Traditional rule-based systems are designed to
work under specific market conditions and may fail when market conditions change. Reinforcement learning algorithms, however,
can learn from experience and adapt their trading strategy to changing market conditions.
Another advantage of reinforcement learning in quantitative trading is its ability to handle non-stationary environments. The
financial markets are constantly changing, and traditional rule-based systems may fail to adapt to these changes. Reinforcement
learning algorithms, on the other hand, can learn from experience and adapt to changing market conditions.
Despite the potential advantages of reinforcement learning in quantitative trading, there are also challenges that must be addressed.
One of the challenges is the need for large amounts of historical data to train the reinforcement learning algorithms. Another
challenge is the need to ensure that the algorithms are robust and do not overfit to historical data.
Overall, reinforcement learning has the potential to revolutionize quantitative trading by allowing trading algorithms to learn from
experience and adapt to changing market conditions. The goal of this research paper is to explore the use of reinforcement learning
in quantitative trading and evaluate its effectiveness in generating profits.


III. RELATED WORK


Reinforcement learning has gained significant attention in the field of quantitative finance in recent years. Several studies have
explored the use of reinforcement learning algorithms in trading and portfolio optimization.
In a study by Moody and Saffell [3], a reinforcement learning algorithm was used to learn a trading strategy for a simulated market.
The results showed that the reinforcement learning algorithm was able to outperform a buy-and-hold strategy and a moving average
crossover strategy.
Another study by Bertoluzzo and De Nicolao [4] used a reinforcement learning algorithm to optimize a portfolio of stocks. The
results showed that the algorithm was able to outperform traditional portfolio optimization methods.
More recently, a study by Chen et al. [5] used a deep reinforcement learning algorithm to trade stocks. The results showed that the
deep reinforcement learning algorithm was able to outperform traditional trading strategies and achieved higher profits.
Overall, the literature suggests that reinforcement learning has the potential to improve trading and portfolio optimization in finance.
However, further research is needed to evaluate the effectiveness of reinforcement learning algorithms in real-world trading
environments.
In addition to the studies mentioned above, there have been several recent developments in the field of reinforcement learning for
finance. For example, Guo et al. [8] proposed a deep reinforcement learning algorithm for trading in Bitcoin futures markets. The
algorithm was able to achieve higher profits than traditional trading strategies and other deep reinforcement learning algorithms.
Another recent study by Gu et al. [10] proposed a reinforcement learning algorithm for portfolio optimization in the presence of
transaction costs. The algorithm was able to achieve higher risk-adjusted returns than traditional portfolio optimization methods.
In addition to using reinforcement learning algorithms for trading and portfolio optimization, there have also been studies exploring
the use of reinforcement learning for other tasks in finance, such as credit risk assessment [11] and fraud detection [12].
Despite the promising results of these studies, there are still challenges to using reinforcement learning in finance. One of the main
challenges is the need for large amounts of data, which can be expensive and difficult to obtain in finance. Another challenge is the
need for robustness, as reinforcement learning algorithms can be sensitive to changes in the training data.
Overall, the literature suggests that reinforcement learning has the potential to revolutionize finance by allowing trading algorithms
to learn from experience and adapt to changing market conditions. However, further research is needed to evaluate the effectiveness
of these algorithms in real-world trading environments and to address the challenges of using reinforcement learning in finance.

IV. METHODOLOGY
In this study, we propose a reinforcement learning-based trading strategy for the stock market. Our approach consists of the
following steps:

A. Data Preprocessing
The first step in our methodology was to collect and preprocess the data. We obtained daily historical stock price data for the Nifty
50 index from Yahoo Finance for the period from January 1, 2010, to December 31, 2020. The data consisted of the daily open,
high, low, and close prices for each stock in the index.
To preprocess the data, we calculated the daily returns for each stock using the close price data. The daily return for a given stock on day t was calculated as:

rt = (pt − pt−1) / pt−1

where pt is the closing price of the stock on day t and pt−1 is the closing price on day t−1.

We then normalized the returns using the Min-Max scaling method so that the scaled returns lie in the range [0, 1]. The Min-Max scaling method maps the data to a fixed range by subtracting the minimum value and dividing by the range:

x′ = (x − min(x)) / (max(x) − min(x))

where x′ is the normalized value, x is the original value, min(x) is the minimum value, and max(x) is the maximum value.
After preprocessing the data, we had a dataset of daily normalized returns for each stock in the Nifty 50 index for the period from
January 1, 2010, to December 31, 2020. This dataset was used as the basis for training and testing our trading strategy using
reinforcement learning.
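A compact sketch of this preprocessing step is shown below, assuming the closing prices are held in a pandas DataFrame with one column per stock (for example, as downloaded from Yahoo Finance):

import pandas as pd

def preprocess(close: pd.DataFrame) -> pd.DataFrame:
    # Daily return: rt = (pt - pt-1) / pt-1, computed column-wise per stock.
    returns = close.pct_change().dropna()

    # Min-Max scaling: x' = (x - min(x)) / (max(x) - min(x)), per stock.
    scaled = (returns - returns.min()) / (returns.max() - returns.min())
    return scaled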


B. Reinforcement Learning Algorithm


We implemented a reinforcement learning algorithm to learn the optimal trading policy using the preprocessed stock price data. The
reinforcement learning algorithm involves an agent interacting with an environment to learn the optimal actions to take in different
states of the environment. In our case, the agent is the trading algorithm and the environment is the stock market.
Our reinforcement learning algorithm was based on the Q-learning algorithm, which is a model-free, off-policy reinforcement
learning algorithm that seeks to learn the optimal action-value function for a given state-action pair. The action-value function,
denoted as Q(s,a), represents the expected discounted reward for taking action a in state s and following the optimal policy
thereafter.
The Q-learning algorithm updates the Q-value for each state-action pair based on the observed rewards and the updated estimates of
the Q-values for the next state-action pair. The update rule for the Q-value is as follows:
Q(st, at) ← Q(st, at) + α [rt + γ max_a Q(st+1, a) − Q(st, at)]

where st is the current state, at is the current action, rt is the observed reward, α is the learning rate, and γ is the discount factor.
To apply the Q-learning algorithm to the stock trading problem, we defined the state as a vector of the normalized returns for the last
n days, and the action as the decision to buy, sell or hold a stock. The reward was defined as the daily percentage return on the
portfolio value, which is calculated as the sum of the product of the number of shares of each stock and its closing price for that day.
We used an ϵ-greedy exploration strategy to balance exploration and exploitation during the learning process. The ϵ-greedy strategy
involves selecting a random action with probability ϵ and selecting the action with the highest Q-value with probability 1 − ϵ.
The algorithm was trained on the preprocessed stock price data using a sliding window approach, where the window size was set to
n days. The algorithm was trained for a total of 10,000 episodes, with each episode representing a trading day. The learning rate and
discount factor were set to 0.001 and 0.99, respectively.
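A schematic version of this training loop is sketched below, using the learning rate, discount factor, and episode count stated above. The agent object and its q_values/update interface, as well as the exploration rate of 0.1, are placeholders of our own choosing rather than details reported in the paper.

import numpy as np

ALPHA, GAMMA = 0.001, 0.99     # learning rate and discount factor (as above)
EPISODES = 10_000
EPSILON = 0.1                  # exploration rate (illustrative value)
N_ACTIONS = 3                  # buy, sell, hold

def epsilon_greedy(q_values, epsilon=EPSILON):
    # With probability epsilon take a random action, otherwise the greedy one.
    if np.random.rand() < epsilon:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(q_values))

def train(agent, env, episodes=EPISODES):
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = epsilon_greedy(agent.q_values(state))
            next_state, reward, done = env.step(action)
            # Q-learning target: rt + gamma * max_a Q(st+1, a)
            agent.update(state, action, reward, next_state, done,
                         alpha=ALPHA, gamma=GAMMA)
            state = next_state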
After training, the algorithm was tested on a separate test set consisting of daily stock price data for the year 2020. The algorithm
was evaluated based on the cumulative return on investment (ROI) for the test period, which was calculated as the final portfolio
value divided by the initial portfolio value.
The trained algorithm was then compared to a benchmark strategy, which involved buying and holding the Nifty 50 index for the
test period. The benchmark strategy was evaluated based on the cumulative return on investment (ROI) for the test period. The
results were analyzed to determine the effectiveness of the reinforcement learning algorithm in generating profitable trading
strategies.

C. Trading Strategy
The trading strategy employed in this research involves the use of the DQN agent to learn the optimal action to take given the
current market state. The agent’s actions are either to buy or sell a stock, with the amount of shares to be bought or sold determined
by the agent’s output. The agent’s output is scaled to the available cash of the agent at the time of decision.
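One simple way to realize this scaling is sketched below; the rule of committing a fixed fraction of available cash, given by the agent's output in [0, 1], is our own illustration rather than the exact rule used here.

def position_size(action_strength: float, cash: float, price: float) -> int:
    # Convert a network output in [0, 1] into a whole number of shares to buy,
    # capped by the cash currently available.
    budget = action_strength * cash
    return int(budget // price)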
At the start of each episode, the agent is given a certain amount of cash and a fixed number of stocks. The agent observes the current
state of the market, which includes the stock prices, technical indicators, and any other relevant data. The agent then uses its neural
network to determine the optimal action to take based on its current state.
If the agent decides to buy a stock, the amount of cash required is subtracted from the agent’s total cash, and the corresponding
number of shares is added to the agent’s total number of stocks. If the agent decides to sell a stock, the corresponding number of
shares is subtracted from the agent’s total number of stocks, and the cash earned is added to the agent’s total cash.
At the end of each episode, the agent’s total wealth is calculated as the sum of the agent’s total cash and the current market value of
the agent’s remaining stocks. The agent’s reward for each time step is calculated as the difference between the current and previous
total wealth.
The training process of the DQN agent involves repeatedly running through episodes of the trading simulation, where the agent
learns from its experiences and updates its Q-values accordingly. The agent’s Q-values represent the expected cumulative reward for
each possible action given the current state.
During the training process, the agent’s experience is stored in a replay buffer, which is used to sample experiences for updating the
agent’s Q-values. The agent’s Q-values are updated using a variant of the Bellman equation, which takes into account the discounted
future rewards of taking each possible action.
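One such update step, with an experience replay buffer and the Bellman target, could look roughly like the PyTorch sketch below. The network architecture, buffer capacity, batch size, and use of a separate target network are our assumptions, since the paper does not specify them.

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, state_dim, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

buffer = deque(maxlen=100_000)   # replay buffer of (s, a, r, s', done) tuples

def train_step(policy_net, target_net, optimizer, gamma=0.99, batch_size=64):
    # Sample a minibatch and regress Q(s, a) onto the Bellman target
    # r + gamma * max_a Q_target(s', a).
    if len(buffer) < batch_size:
        return
    batch = random.sample(buffer, batch_size)
    s, a, r, s2, done = (torch.as_tensor(np.array(x), dtype=torch.float32)
                         for x in zip(*batch))
    q = policy_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s2).max(dim=1).values * (1 - done)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()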
Once the training process is complete, the trained DQN agent can be used to make trading decisions in a live market.


D. Evaluation Metrics
The performance of the proposed quantitative trading system is evaluated using several metrics. The metrics used in this research are
as follows:
Cumulative Return Cumulative return is a measure of the total profit or loss generated by a trading strategy over a specific period of
time. It is calculated as the sum of the percentage returns over each period of time, with compounding taken into account.
Mathematically, the cumulative return can be expressed as:
CR = (1 + R1) ∗ (1 + R2) ∗ ... ∗ (1 + Rn) − 1
where CR is the cumulative return, R1, R2, ..., Rn are the percentage returns
over each period, and n is the total number of periods.
For example, if a trading strategy generates a return of 5% in the first period, 10% in the second period, and -3% in the third period,
the cumulative return over the three periods would be:
CR = (1 + 0.05) ∗ (1 + 0.10) ∗ (1 − 0.03) − 1
CR = 1.12035 − 1
CR = 0.12035, or about 12.04%
This means that the trading strategy generated a total return of about 12.04% over the three periods, taking into account compounding.
Sharpe Ratio It measures the excess return per unit of risk of an investment or portfolio, and is calculated by dividing the excess
return by the standard deviation of the returns.

The mathematical equation for the Sharpe ratio is:

Sharpe Ratio = (Rp − Rf) / σp

where:
Rp = average return of the portfolio
Rf = risk-free rate of return (such as the yield on a U.S. Treasury bond)
σp = standard deviation of the portfolio’s excess returns

The Sharpe ratio provides a way to compare the risk-adjusted returns of different investments or portfolios, with higher values
indicating better risk-adjusted returns.
Maximum Drawdown It measures the largest percentage decline in a portfolio’s value from its peak to its trough. It is an important
measure for assessing the risk of an investment strategy, as it represents the potential loss that an investor could face at any given
point in time.

The mathematical equation for maximum drawdown is as follows:

MDD = (P − Q) / P

where P is the peak value of the portfolio and Q is the minimum (trough) value of the portfolio during the drawdown period.
For example, suppose an investor’s portfolio peaks at $100,000 and subsequently falls to a minimum value of $70,000 during a market downturn. The maximum drawdown for this portfolio would be:

MDD = (100,000 − 70,000) / 100,000 = 0.30, or 30%

This means that the portfolio experienced a 30% decline from its peak value to its lowest point during the drawdown period.
Average Daily Return It measures the average daily profit or loss generated by a trading strategy, expressed as a percentage of the
initial investment. The mathematical equation for Average Daily Return is:

ADR = ((Pf − Pi) / Pi) / N

where ADR is the Average Daily Return, Pf is the final portfolio value, Pi is the initial portfolio value, and N is the number of trading days.
This formula calculates the daily percentage return by taking the difference between the final and initial portfolio values, dividing it
by the initial value, and then dividing by the number of trading days. The resulting value represents the average daily percentage
return generated by the trading strategy over the specified time period.


The Average Daily Return metric is useful because it allows traders to compare the performance of different trading strategies on a
daily basis, regardless of the size of the initial investment. A higher ADR indicates a more profitable trading strategy, while a lower
ADR indicates a less profitable strategy.
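A sketch of how these portfolio-level metrics can be computed from a series of daily portfolio values is given below; the 252-day annualization of the Sharpe ratio and the zero risk-free rate are our assumptions.

import numpy as np

def portfolio_metrics(values, risk_free_rate=0.0, periods_per_year=252):
    # Cumulative return, Sharpe ratio, maximum drawdown, and average daily
    # return, all computed from a series of daily portfolio values.
    values = np.asarray(values, dtype=float)
    daily_returns = np.diff(values) / values[:-1]

    cumulative_return = values[-1] / values[0] - 1
    excess = daily_returns - risk_free_rate / periods_per_year
    sharpe = np.sqrt(periods_per_year) * excess.mean() / excess.std()

    running_peak = np.maximum.accumulate(values)
    max_drawdown = np.max((running_peak - values) / running_peak)

    avg_daily_return = cumulative_return / len(daily_returns)   # ((Pf - Pi) / Pi) / N
    return {
        "cumulative_return": cumulative_return,
        "sharpe_ratio": sharpe,
        "max_drawdown": max_drawdown,
        "avg_daily_return": avg_daily_return,
    }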
Average Daily Trading Volume It measures the average number of shares or contracts traded per day over a specific period of time.
Mathematically, it can be calculated as follows:

ADTV = Total Trading Volume / Number of Trading Days

where the total trading volume is the sum of the trading volume over a specific period of time (e.g., one year) and the number of trading days is the number of days on which trading occurred during that period.
For example, if the total trading volume over the past year was 10 million shares and there were 250 trading days during that period, the ADTV would be:

ADTV = 10,000,000 / 250 = 40,000 shares per day
This means that on average, 40,000 shares were traded per day over the past year. ADTV is a useful metric for investors and traders
to assess the liquidity of a particular security, as securities with higher ADTVs generally have more market liquidity and may be
easier to buy or sell.
Profit Factor It measures the profitability of trades relative to the losses. It is calculated by dividing the total profit of winning trades
by the total loss of losing trades. The formula for calculating the Profit Factor is as follows:

Profit Factor = Total Profit of Winning Trades / Total Loss of Losing Trades
A Profit Factor greater than 1 indicates that the strategy is profitable, while a Profit Factor less than 1 indicates that the strategy is
unprofitable. For example, a Profit Factor of 1.5 indicates that for every dollar lost in losing trades, the strategy generated $1.50 in
winning trades.
Winning Percentage It measures the ratio of successful outcomes to the total number of outcomes. It is calculated using the
following mathematical equation:

Winning Percentage = (Number of Winning Trades / Total Number of Trades) × 100%

For example, if a trader made 100 trades and 60 of them were successful, the winning percentage would be calculated as follows:

Winning Percentage = (60 / 100) × 100% = 60%
A higher winning percentage indicates a greater proportion of successful outcomes and is generally desirable in trading.
Average Holding Period It measures the average length of time that an investor holds a particular investment. It is calculated by
taking the sum of the holding periods for each trade and dividing it by the total number of trades. The mathematical equation for
calculating AHP is:

AHP = Σ (Exit Date − Entry Date) / Number of Trades

where Σ (Exit Date − Entry Date) denotes the sum of the holding periods for all trades, Exit Date is the date when the investment is sold, Entry Date is the date when the investment is bought, and Number of Trades is the total number of trades made.
For example, if an investor makes 10 trades over a given period of time, and the holding periods for those trades are 10, 20, 30, 15, 25, 10, 20, 15, 30, and 25 days respectively, the AHP would be:

AHP = (10 + 20 + 30 + 15 + 25 + 10 + 20 + 15 + 30 + 25) / 10 = 20 days

This means that on average, the investor holds their investments for around 20 days before selling them. The AHP can be useful in
evaluating an investor’s trading strategy, as a shorter holding period may indicate a more active trading approach, while a longer
holding period may indicate a more passive approach.
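These trade-level metrics follow directly from a log of completed trades; the sketch below assumes each trade record holds its realized profit and its entry and exit dates (the Trade record type is our own illustration).

from dataclasses import dataclass
from datetime import date

@dataclass
class Trade:
    entry: date
    exit: date
    profit: float   # positive for a winning trade, negative for a losing one

def trade_metrics(trades):
    # Profit factor, winning percentage, and average holding period in days.
    gross_profit = sum(t.profit for t in trades if t.profit > 0)
    gross_loss = -sum(t.profit for t in trades if t.profit < 0)
    profit_factor = gross_profit / gross_loss if gross_loss else float("inf")

    winning_pct = 100.0 * sum(t.profit > 0 for t in trades) / len(trades)
    avg_holding = sum((t.exit - t.entry).days for t in trades) / len(trades)
    return profit_factor, winning_pct, avg_holding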


These evaluation metrics provide a comprehensive assessment of the performance of the proposed quantitative trading system. The
cumulative return and Sharpe ratio measure the overall profitability and risk-adjusted return of the system, respectively. The
maximum drawdown provides an indication of the system’s downside risk, while the average daily return and trading volume
provide insights into the system’s daily performance. The profit factor, winning percentage, and average holding period provide
insights into the trading strategy employed by the system.

V. FUTURE WORK
While the proposed quantitative trading system using reinforcement learning has shown promising results, there are several avenues
for future research and improvement. Some potential areas for future work include:

A. Incorporating More Data Sources


In this research, we have used only stock price data as input to the trading system. However, incorporating additional data sources
such as news articles, financial reports, and social media sentiment could improve the accuracy of the system’s predictions and
enhance its performance.

B. Exploring Alternative Reinforcement Learning Algorithms


While the DQN algorithm used in this research has shown good results, other reinforcement learning algorithms such as Proximal Policy Optimization (PPO), Asynchronous Advantage Actor-Critic (A3C), and Soft Actor-Critic (SAC) could be explored to determine whether they offer better performance.

C. Adapting to Changing Market Conditions


The proposed system has been evaluated on a single dataset covering a specific time period. However, the performance of the
system could be affected by changes in market conditions, such as shifts in market volatility or changes in trading patterns.
Developing methods to adapt the trading strategy to changing market conditions could improve the system’s overall performance.

D. Testing on Different Asset Classes


In this research, we have focused on trading individual stocks. However, the proposed system could be tested on different asset
classes such as commodities, currencies, or cryptocurrencies, to determine its applicability to different markets.

E. Integration with Portfolio Optimization Techniques


While the proposed system has focused on trading individual stocks, the integration with portfolio optimization techniques could
help to further enhance the performance of the trading system. By considering the correlation between different stocks and
diversifying the portfolio, it may be possible to reduce overall risk and increase returns.
Overall, the proposed quantitative trading system using reinforcement learning shows great potential for improving the performance
of automated trading systems. Further research and development in this area could lead to the creation of more sophisticated and
effective trading systems that can generate higher returns while reducing risk.

VI. CONCLUSION
The use of reinforcement learning in quantitative trading represents a promising area of research that can potentially lead to the
development of more sophisticated and effective trading systems.
The ability of the system to learn from market data and adapt to changing market conditions could enable it to generate superior
returns while reducing risk.
While the proposed system has shown promising results, there are still many areas for improvement and further research. Future
work could explore the use of alternative reinforcement learning algorithms, incorporate additional data sources, and test the system
on different asset classes. Additionally, the integration of portfolio optimization techniques could further enhance the performance
of the system.
Overall, our research has demonstrated the potential of using reinforcement learning in quantitative trading and highlighted the
importance of continued research and development in this area. By developing more sophisticated and effective trading systems, we
can potentially improve the efficiency of financial markets and generate greater returns for investors.


REFERENCES
[1] Bertoluzzo, M., Carta, S., & Duci, A. (2018). Deep reinforcement learning for forex trading. Expert Systems with Applications, 107, 1-9.
[2] Jiang, Z., Xu, C., & Li, B. (2017). Stock trading with cycles: A financial application of a recurrent reinforcement learning algorithm. Journal of Economic
Dynamics and Control, 83, 54-76.
[3] Moody, J., & Saffell, M. (2001). Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks, 12(4), 875-889.
[4] Bertoluzzo, M., & De Nicolao, G. (2006). Reinforcement learning for optimal trading in stocks. IEEE Transactions on Neural Networks, 17(1), 212-222.
[5] Chen, Q., Li, S., Peng, Y., Li, Z., Li, B., & Li, X. (2019). A deep reinforcement learning framework for the financial portfolio management problem. IEEE
Access, 7, 163663-163674.
[6] Wang, R., Zhang, X., Li, T., & Li, B. (2019). Deep reinforcement learning for automated stock trading: An ensemble strategy. Expert Systems with
Applications, 127, 163-180.
[7] Xiong, Z., Zhou, F., Zhang, Y., & Yang, Z. (2020). Multi-agent deep reinforcement learning for portfolio optimization. Expert Systems with Applications, 144,
113056.
[8] Guo, X., Cheng, X., & Zhang, Y. (2020). Deep reinforcement learning for bitcoin trading. IEEE Access, 8, 169069-169076.
[9] Zhu, Y., Jiang, Z., & Li, B. (2017). Deep reinforcement learning for portfolio management. In Proceedings of the International Conference on Machine
Learning (ICML), Sydney, Australia.
[10] Gu, S., Wang, X., Chen, J., & Dai, X. (2021). Reinforcement learning for portfolio optimization in the presence of transaction costs. Journal of Intelligent &
Fuzzy Systems, 41(3), 3853-3865.
[11] Kwon, O., & Moon, K. (2019). A credit risk assessment model using machine learning and feature selection. Sustainability, 11(20), 5799.
[12] Li, Y., Xue, W., Zhu, X., Guo, L., & Qin, J. (2021). Fraud Detection for Online Advertising Networks Using Machine Learning: A Comprehensive Review.
IEEE Access, 9, 47733-47747.
