
International Journal of Modeling and Optimization, Vol. 10, No. 5, October 2020

Deep Reinforcement Learning for Stock Portfolio Optimization

Le Trung Hieu

Abstract—Stock portfolio optimization is the process of constantly re-distributing money over a pool of various stocks. In this paper, we formulate the problem so that Reinforcement Learning can be applied to the task properly. To keep the market assumptions realistic, we incorporate transaction cost and a risk factor into the state as well. On top of that, we apply several state-of-the-art Deep Reinforcement Learning algorithms for comparison. Since the action space is continuous, the realistic formulation was tested under a family of state-of-the-art continuous policy gradient algorithms: Deep Deterministic Policy Gradient (DDPG), Generalized Deterministic Policy Gradient (GDPG) and Proximal Policy Optimization (PPO), where the former two perform much better than the last one. We then present an end-to-end solution for the task, with Minimum Variance Portfolio Theory for stock subset selection and the Wavelet Transform for extracting multi-frequency data patterns. Observations and hypotheses about the results are discussed, as well as possible future research directions.

Index Terms—Reinforcement learning, stock trading, deep learning, deterministic policy gradient, proximal policy optimization, stock portfolio optimization.

Manuscript received December 18, 2019; revised July 20, 2020. Le Trung Hieu is with the National University of Singapore, Singapore (e-mail: [email protected]). DOI: 10.7763/IJMO.2020.V10.761

I. INTRODUCTION

In this project, we explore the task of stock trading using reinforcement learning. To be specific, we work on the task of portfolio optimization, where the stock weight distribution of the portfolio is adjusted at the beginning of each day to maximize profit while constraining certain risks [1].

The current main application of machine learning to stock trading is a network that predicts the next market price state. As a supervised regression problem, this idea is straightforward to implement. Unfortunately, the network prediction is not equal to the actions that the trading agent should take. Translating from price prediction to agent action usually involves a hard-coded logic layer, which is neither extensible nor general. Therefore, reinforcement learning was applied so that the trading agent can use the price prediction model to devise optimal action plans.

The first wave of research on applying reinforcement learning to financial markets dates back to 1997 [2]. There are existing works on portfolio management using reinforcement learning [3]. However, they test on the crypto-currency market, which might not generalize well to the stock market, since crypto-currency is more volatile and stochastic and has shown a strong overall increasing trend given the recent hype. Besides, they test only the baseline Deep Reinforcement Learning algorithm. We therefore extend the work to the stock market, which fluctuates in both directions, in contrast with the increasing trend of cryptocurrency. In addition, in contrast with the basic assumptions about the stock market made in [4]-[6], we incorporate transaction cost and a risk factor into our state and reward system. We explore different state-of-the-art schemes of Deep Reinforcement Learning for this task.

Next, we use Minimum Variance Portfolio Theory to select a subset of stocks to construct the portfolio, since this yields a lower-risk portfolio. If we chose all the stocks to construct the portfolio, the portfolio would rely heavily on the overall market trend, making it hard to profit in a bear market. We also perform price data denoising using the Wavelet Transform, so our agent can exploit both the high-frequency patterns in the original data (which contains all the noise from high-frequency trading) and the low-frequency patterns in the denoised data (where the noise is removed and the underlying low-frequency pattern is uncovered).

After that, we discuss the algorithms we implement for this task. First, we go through the common Deep Deterministic Policy Gradient (DDPG) used by many existing works. Then, we discuss two newer variants of DDPG: GDPG and PPO. Finally, we present our results and observations. Overview pseudo-code of each deep reinforcement learning implementation is attached in the appendix.

Our main contributions in this paper are summarized as follows:
- Extend the work to the stock market, which is more realistic than the cryptocurrency market, and propose a better problem formulation for more realistic simulations.
- Explore newer state-of-the-art deep reinforcement learning algorithms for the task.
- Provide better end-to-end optimization with Minimum Variance Portfolio Theory and price data denoising using the Wavelet Transform.

II. PROBLEM FORMULATION

Given a period of time, for example one year, the investor will invest in a portfolio of stocks. To decrease portfolio risk, as is commonly done, we maintain a portfolio of m+1 assets: one risk-free asset (cash) and m risky stock assets.

After we train the agent, we back-test it on a test dataset to assess its performance. To ease back-testing, we make two assumptions about the market. Note that these assumptions are realistic for markets with high transaction volume:
1) Zero Slippage: The market's liquidity is high enough that a trade can be transacted at exactly the quoted price when the order is placed.


2) Zero Market Impact: The agent's investment is insignificant, so it does not affect the market at all.

Here is the daily process for updating a portfolio. The portfolio at the beginning of the day changes during the day due to the price fluctuation of each individual stock. At the end of the day, we reallocate the weights of each stock, which results in a new portfolio that remains unchanged until the next day's market open. The same process then repeats. Note that we assume the closing price of the current day equals the open price of the next day, which we believe is reasonable.

We can see that the actions are actually the portfolio weights; therefore, this task is a continuous-action-space reinforcement learning task. Next, we define the states, actions and rewards of this agent. Before that, we go through some terminology:

1) Price Vector: $v_t$ of period $t$ stores the closing prices of all assets in period $t$: $v_{i,t}$ is the closing price of asset $i$ in period $t$. Note that the price of the first asset is always constant since it is risk-free cash.

$$v_t = \left(v_{0,t},\, v_{1,t},\, v_{2,t},\, \ldots,\, v_{m,t}\right) \tag{1}$$

2) Price Relative Vector: $y_t$ is defined as the element-wise division of $v_t$ by $v_{t-1}$:

$$y_t = \left(1,\, \frac{v_{1,t}}{v_{1,t-1}},\, \frac{v_{2,t}}{v_{2,t-1}},\, \ldots,\, \frac{v_{m,t}}{v_{m,t-1}}\right) \tag{2}$$

3) Portfolio weights and values after market movement:

$$w_t' = \frac{y_t \odot w_{t-1}}{y_t \cdot w_{t-1}} \tag{3}$$

where $\odot$ is element-wise multiplication and $\cdot$ is the dot product between the two column vectors $y_t$ and $w_{t-1}$, and

$$p_t' = \left(y_t \cdot w_{t-1}\right) p_{t-1} \tag{4}$$

Fig. 1. During day t, market movement (represented by the Price Relative Vector $y_t$) transforms the portfolio weights and portfolio value from $w_{t-1}$ and $p_{t-1}$ to $w_t'$ and $p_t'$. Then, at the end of the day, we adjust the portfolio weights from $w_t'$ to $w_t$, which incurs transaction cost and shrinks the portfolio value from $p_t'$ to $p_t$.
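To make the bookkeeping of Fig. 1 concrete, the following is a minimal NumPy sketch of one trading period. The variable names and the commission rate `mu = 0.0025` are our own illustrative assumptions, not values taken from the paper.

```python
import numpy as np

mu = 0.0025  # assumed commission rate per unit of traded value (illustrative)

def step_portfolio(prices_prev, prices_now, w_prev, p_prev, w_target):
    """One period of the process in Fig. 1: market movement, then re-allocation."""
    y = prices_now / prices_prev               # price relative vector, Eq. (2); y[0] == 1 for cash
    p_move = np.dot(y, w_prev) * p_prev        # portfolio value after market movement, Eq. (4)
    w_move = (y * w_prev) / np.dot(y, w_prev)  # weights after market movement, Eq. (3)
    cost = mu * np.sum(np.abs(w_target - w_move)) * p_move  # cost of shifting the weights
    return w_target, p_move - cost

# toy example: cash + 2 stocks
prices_prev = np.array([1.0, 100.0, 50.0])
prices_now  = np.array([1.0, 103.0, 49.0])
w_prev      = np.array([0.2, 0.5, 0.3])    # weights at the start of the day
w_target    = np.array([0.2, 0.4, 0.4])    # weights chosen by the agent at the day's end
w_new, p_new = step_portfolio(prices_prev, prices_now, w_prev, 1.0, w_target)
print(w_new, p_new)
```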
A. State
The state stores the history of prices of each stock in the portfolio over a window of time. Therefore, the shape of the state is (batch size, number of assets, window size, number of features).

B. Action
The action is the weight distribution over the portfolio at each period's end, after the effect of the market movement during the day. Therefore, the action space is continuous, and we need a continuous-action-space policy gradient method to tackle this task.

C. Reward
A simple reward for each action is the change in portfolio value during the market movement. However, this reward is not realistic because it misses two important factors. First, it lacks the transaction cost incurred by re-allocating the portfolio at the end of the day. Second, it does not take into account the risk or volatility of the assets. We encode this information inside our reward function.

We observe that for a normal Markov Decision Process the return takes the form of a discounted sum of rewards $\sum_t \gamma^t r(s_t, a_t)$; however, for portfolio management, the wealth at period $t$ depends on the wealth at period $t-1$ and the reward in the form of a product instead of a sum: $\text{New\_Wealth} = \text{Old\_Wealth} \times \text{Reward}$. Therefore, a slight modification of taking the logarithm of the reward is used to transform the product form into the usual summation form.

Therefore, Reward = log(wealth change − transaction cost) + (a Sharpe-ratio-like term that represents the volatility factor):

$$r(s_t, w_t) = \log\!\left(y_t \cdot w_{t-1} - \mu \sum_i \left|w_{t,i} - w_{t,i}'\right|\right) + \beta A_t \tag{5}$$

where

$$A_t = \sum_i w_{t,i}\, \frac{(v_{i,t} - v_{i,t-1})/v_{i,t-1}}{\operatorname{std}\!\left(\dfrac{v_{i,1} - v_{i,0}}{v_{i,0}},\, \ldots,\, \dfrac{v_{i,t} - v_{i,t-1}}{v_{i,t-1}}\right)}$$
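As an illustration of Eq. (5), the snippet below is a rough sketch of the reward. It assumes `mu` is the commission rate, `beta` the risk weight, and `returns_history` a (days × assets) array of past per-asset returns used for the Sharpe-like term; these names and values are ours, not the paper's.

```python
import numpy as np

def reward(y, w_prev, w_new, returns_history, mu=0.0025, beta=0.1):
    """Log portfolio return net of transaction cost, plus a Sharpe-like risk term (Eq. 5)."""
    growth = np.dot(y, w_prev)                    # wealth multiplier from market movement
    w_drift = (y * w_prev) / growth               # weights after market movement, Eq. (3)
    cost = mu * np.sum(np.abs(w_new - w_drift))   # transaction-cost penalty
    vol = returns_history.std(axis=0) + 1e-8      # per-asset volatility of past returns
    sharpe_like = np.sum(w_new * returns_history[-1] / vol)
    return np.log(growth - cost) + beta * sharpe_like

# toy call: 3 assets, 30 days of per-asset returns
y = np.array([1.0, 1.02, 0.99])
hist = np.random.default_rng(0).normal(0.0005, 0.01, size=(30, 3))
print(reward(y, np.array([0.2, 0.5, 0.3]), np.array([0.3, 0.4, 0.3]), hist))
```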
D. Stock Selection for Portfolio
To reduce the vast search space of the portfolio state, we reduce the number of stocks in the portfolio. We find a minimum variance portfolio of 6 stocks from the overall stock list [7]. The empirical covariance matrix $C$ for each pair of stocks is obtained using historical data from the training set. For every combination of 6 out of 50 stocks, we compute its optimal weight

$$w^* = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^{\top} C^{-1}\mathbf{1}}$$

which produces the minimal variance

$$\sigma^2 = \frac{1}{\mathbf{1}^{\top} C^{-1}\mathbf{1}}$$
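The subset search described above can be written down directly. The sketch below assumes `returns` is a (days × stocks) array of training-set returns and simply enumerates every k-of-n combination; the function names and the brute-force loop are ours and are only meant to illustrate the formulas.

```python
import numpy as np
from itertools import combinations

def min_variance_weights(cov):
    """w* = C^{-1} 1 / (1^T C^{-1} 1); the resulting portfolio variance is 1 / (1^T C^{-1} 1)."""
    ones = np.ones(cov.shape[0])
    c_inv_1 = np.linalg.solve(cov, ones)
    denom = ones @ c_inv_1
    return c_inv_1 / denom, 1.0 / denom

def best_subset(returns, k=6):
    """Enumerate k-stock subsets and keep the one whose minimum-variance portfolio has the smallest variance."""
    cov = np.cov(returns, rowvar=False)
    best = None
    for subset in combinations(range(returns.shape[1]), k):
        _, var = min_variance_weights(cov[np.ix_(subset, subset)])
        if best is None or var < best[1]:
            best = (subset, var)
    return best

# toy data: 250 days x 10 stocks (6 out of 50 as in the paper gives ~15.9M combinations)
rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.02, size=(250, 10))
print(best_subset(returns, k=6))
```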

E. Data Denoising
The time series data of a stock usually oscillates frequently. To understand this, we may consider two kinds of trading participants: one takes rational actions of buying or selling, and this is represented by the main tendency of the data. The other takes random actions because of other considerations (e.g., needing money for an emergency), and this is represented by the oscillations (noise) in the data. Denoising is necessary to help us understand the rational strategy and then develop a good policy [8].

We use the discrete wavelet transform to denoise the 1-D data because it is applicable to non-stationary series [9], meaning the frequency content can change over time. The Wavelet Transform has been frequently applied to the financial market as well [10], [11]. It first decomposes the original data into approximation and detail coefficients. The approximation coefficients capture the tendency of the data with less oscillation, and the detail coefficients capture the frequency of the oscillation around the approximation. Fig. 2 shows an example of the two kinds of coefficients generated from the original data.

Fig. 2. Original data and coefficients.

This decomposition is reversible, meaning that we can reconstruct the original data from these coefficients. To denoise, we should remove some of the detail coefficients. Therefore, we use a threshold T to filter out small noise, using the formula below, adapted from [12]:

$$T = \frac{\sqrt{2 \ln N}\, \cdot\, \operatorname{median}(|D|)}{0.6745} \tag{6}$$

where $N$ is the size of the original data and $D$ is the vector of detail coefficients. Next, for each $d$ in the detail coefficients, apply:

$$d \leftarrow \begin{cases} 0 & \text{if } |d| \le T \\ \operatorname{sgn}(d)\,(|d| - T) & \text{if } |d| > T \end{cases} \tag{7}$$

When we then reconstruct using the filtered detail coefficients, the result tends toward the approximation and oscillates less, because the zeroed detail coefficients prevent oscillation around the approximation data. Fig. 3, using the same example as Fig. 2, shows the effect of denoising.

Fig. 3. Original data and denoised data.
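A small sketch of this denoising step using the PyWavelets library is shown below. The wavelet choice ("db4") and decomposition level are our assumptions, since the paper does not state them; the threshold follows Eq. (6) and soft thresholding implements Eq. (7).

```python
import numpy as np
import pywt

def wavelet_denoise(prices, wavelet="db4", level=2):
    """Denoise a 1-D price series: decompose, soft-threshold the detail coefficients, reconstruct."""
    coeffs = pywt.wavedec(prices, wavelet, level=level)  # [approx, detail_level, ..., detail_1]
    details = np.concatenate(coeffs[1:])
    threshold = np.sqrt(2.0 * np.log(len(prices))) * np.median(np.abs(details)) / 0.6745  # Eq. (6)
    filtered = [coeffs[0]] + [pywt.threshold(d, threshold, mode="soft") for d in coeffs[1:]]  # Eq. (7)
    return pywt.waverec(filtered, wavelet)[: len(prices)]

# toy example: a noisy random-walk "price" series
noisy = np.cumsum(np.random.default_rng(0).normal(0, 1, 256)) + 50
print(wavelet_denoise(noisy)[:5])
```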
III. METHODS

A. Deep Deterministic Policy Gradient
To tackle this problem, we need a Reinforcement Learning paradigm that can deal with a continuous action space. Recall that Deep-Q Learning takes in a state $s$ and returns a vector $a = [a_1, a_2, \ldots, a_n]$, where $a_i$ represents the probability of action $i$. Naively extending this scheme to a continuous action space means extending the size of the vector $a$ to a very large number, which does not work well.

Fig. 4. Actor-critic architecture.

DDPG solves this issue by following the actor-critic architecture in Fig. 4. An actor is used to output a vector that represents the expected action, which can be seen as a policy gradient method. A critic is then used to evaluate the effectiveness of the actor's output and produces a Q-value that measures the efficiency of the actor. The critic loss can be used to update the actor.

DDPG is based on the actor-critic architecture, with some modifications. Firstly, the actor and critic are approximated using deep neural networks ($\theta^\mu$ for the actor and $\theta^Q$ for the critic). Next, we use separate target networks for both the actor and the critic, similar to Deep-Q Learning, in order to stabilize learning. The networks are also randomly noised as a scheme to balance the exploration-exploitation issue in Reinforcement Learning. More information can be found in [13].

The question now is how to update the actor policy. We need to calculate the gradient of the policy loss with respect to the actor parameters, $\nabla_{\theta^\mu} J$. According to the Deterministic Policy Gradient Theorem in the original paper:

$$\nabla_{\theta^\mu} J \approx \mathbb{E}_{s \sim \rho}\!\left[\nabla_{\theta^\mu} Q\!\left(s, a \mid \theta^Q\right)\big|_{a=\mu(s \mid \theta^\mu)}\right] = \mathbb{E}_{s \sim \rho}\!\left[\nabla_a Q\!\left(s, a \mid \theta^Q\right)\big|_{a=\mu(s)}\, \nabla_{\theta^\mu}\, \mu\!\left(s \mid \theta^\mu\right)\right]$$

Fig. 5. DDPG actor-critic architecture. Note that the actor and critic deep neural networks take in both the current state and the previous portfolio weights. This is because the agent needs to learn not to diverge too much from the previous weights, in order to prevent high transaction costs.
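To make the deterministic policy gradient concrete, here is a heavily simplified PyTorch sketch of one DDPG update step (plain MLPs on a flattened state, with no target networks, replay buffer or exploration noise). The layer sizes are assumptions; this illustrates the gradient flow rather than the paper's implementation.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 32, 7   # assumed sizes: flattened price window, m+1 portfolio weights

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Softmax(dim=-1))   # weights sum to 1
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s_next, gamma=0.99):
    # critic: regress Q(s, a) toward r + gamma * Q(s', mu(s'))  (no target networks in this sketch)
    with torch.no_grad():
        target = r + gamma * critic(torch.cat([s_next, actor(s_next)], dim=-1))
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=-1)), target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # actor: follow grad_a Q(s, a) * grad_theta mu(s), i.e. minimise -Q(s, mu(s))
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# dummy batch
s = torch.randn(8, state_dim); a = torch.softmax(torch.randn(8, action_dim), -1)
r = torch.randn(8, 1); s_next = torch.randn(8, state_dim)
ddpg_update(s, a, r, s_next)
```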

B. Generalized Deterministic Policy Gradient
One of the problems with DDPG is that it assumes stochastic state transitions. In fact, for most planning problems, such as autonomous driving, the state transition may be a combination of stochastic transitions (when the dynamics dominate) and deterministic transitions (when the noise is weak). However, the gradient of DDPG under such an assumption is not well-defined and can frequently lead to strange behavior. The main problem is that model-free DDPG is known to have high sampling complexity, which makes learning difficult. Transforming DDPG into a completely model-based method can reduce the sampling complexity. Unfortunately, purely model-based reinforcement learning can lead to a slow convergence rate (or sometimes large divergence) if the environment is highly dynamic, which is especially true for the stock market.

One idea is to combine the model-free and model-based approaches in a meaningful way. With the above insights, a new variation of DDPG was proposed, called Generalized Deterministic Policy Gradient [14]. The intuition of GDPG is to maximize the long-term reward of an augmented MDP (which is approximated by a model-based network) to reduce sample complexity, while constraining it to be no greater than the long-term reward of the original model-free MDP:

$$\max_{\theta^\mu} J^*(\theta^\mu) \quad \text{s.t.} \quad J^*(\theta^\mu) \le J(\theta^\mu)$$

Using the Lagrangian dual theorem, the objective is transformed into:

$$\min_{\alpha} \max_{\theta^\mu}\; J^*(\theta^\mu) + \alpha\!\left(J(\theta^\mu) - J^*(\theta^\mu)\right)$$

To update the actor policy, we take the gradient of $J^*(\theta^\mu) + \alpha\!\left(J(\theta^\mu) - J^*(\theta^\mu)\right)$:

$$\nabla_{\theta^\mu} J(\theta^\mu) = \sum_{s}\Big[(1-\alpha)\, \nabla_{\theta^\mu}\mu(s \mid \theta^\mu)\, \nabla_a Q^*(s, a \mid \theta^{Q^*}) + \alpha\, \nabla_{\theta^\mu}\mu(s \mid \theta^\mu)\, \nabla_a Q(s, a \mid \theta^{Q})\Big] \tag{8}$$

The main difference between DDPG and GDPG is that GDPG maintains a prediction neural network model, which can predict the next market state given the current state. This prediction network is used to build an augmented critic network, as shown in Fig. 6. The actor is updated based on a combination of gradients from both the original model-free critic network and the augmented model-based critic network.

Fig. 6. GDPG augmented critic network.
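As a rough illustration of Eq. (8), the actor update below mixes gradients from a model-free critic and an augmented (model-based) critic with weights α and 1−α. The networks are stand-in MLPs, the sizes and the assignment of the (1−α) weight to the augmented critic follow our reading of Eq. (8), and the sketch shows only the mixing, not GDPG itself.

```python
import torch
import torch.nn as nn

state_dim, action_dim, alpha = 32, 7, 0.5   # alpha trades off the model-based and model-free terms

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Softmax(dim=-1))
critic_mf = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
critic_aug = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def gdpg_actor_update(s):
    a = actor(s)
    q_aug = critic_aug(torch.cat([s, a], dim=-1))  # critic built on the predicted next state
    q_mf = critic_mf(torch.cat([s, a], dim=-1))    # ordinary model-free critic
    actor_loss = -((1 - alpha) * q_aug + alpha * q_mf).mean()  # Eq. (8), up to sign
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

gdpg_actor_update(torch.randn(8, state_dim))
```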

C. Proximal Policy Optimization
Proximal Policy Optimization (PPO) is another variant of DDPG, which aims to improve the way the actor policy is updated.

Fig. 7. Policy loss function.

Recalling the original policy gradient objective function in Fig. 7, it is appealing to perform multiple steps of optimization on this loss using the same trajectory. Doing so is not well-justified, however, and empirically it often leads to destructively large policy updates [15].

To tackle this problem, PPO makes use of a surrogate objective function for the original policy loss. Instead of using the log-probability to trace the impact of an action, we use the ratio between the probability of the action under the current policy and its probability under the previous policy. The ratio is formally defined as:

$$r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}$$

With this new definition, the objective function becomes:

$$L(\theta) = \hat{\mathbb{E}}_t\!\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}\, \hat{A}_t\right] = \hat{\mathbb{E}}_t\!\left[r_t(\theta)\, \hat{A}_t\right]$$

However, without any constraint, this objective still leads to excessively large policy updates. PPO therefore clips the objective function to penalize changes in the policy that move the ratio $r_t(\theta)$ far away from 1: the ratio is clipped to the range $[1-\varepsilon,\, 1+\varepsilon]$. This clipped surrogate objective constrains the update step in a much simpler manner, and experiments in the PPO paper show that it outperforms the original objective in terms of sample complexity.

(Note that $\hat{A}_t$ is the advantage value, defined as $A^\pi(s, a) = Q^\pi(s, a) - V^\pi(s)$, which shows how good an action is compared to the average of the other actions at that state. In PPO, we estimate the advantage value as $\sum_{t' \ge t} \gamma^{t'-t} r_{t'} - V(s_t)$.)
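The clipped surrogate can be written in a few lines. Below is a minimal PyTorch sketch of the loss itself, assuming `log_prob_new`, `log_prob_old` and `advantages` are precomputed tensors for a batch of (state, action) pairs; it is an illustration of the objective, not of the full PPO training loop.

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantages, eps=0.2):
    """Clipped surrogate objective: penalise ratios that drift outside [1 - eps, 1 + eps]."""
    ratio = torch.exp(log_prob_new - log_prob_old)             # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()               # minimise the negative objective

# dummy batch
lp_new, lp_old, adv = torch.randn(16), torch.randn(16), torch.randn(16)
print(ppo_clip_loss(lp_new, lp_old, adv))
```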

IV. RESULTS

We experimented with stocks in a training dataset covering 01/09/2012 to 31/12/2016, and then back-tested our agents from 01/01/2017 to 01/09/2017. We use three features (close price, high price, and close price after the wavelet transform). The networks used for the actor and critic are Convolutional Neural Networks, and the network used to model the state transition in GDPG is a Long Short-Term Memory network. Since the focus of this project is on Reinforcement Learning, we do not go into the details of these networks.

Baselines: To compare with DDPG, GDPG and PPO, we use three baselines (a short sketch of all three follows the list):
1) Uniform constant rebalanced portfolio (Benchmark): At the end of each day, the portfolio is adjusted so that the weights are the same for all stocks. This is the common benchmark used in portfolio management research.
2) Follow-the-winner: Shift all the portfolio weight to the stock that had the highest return yesterday. This is based on the belief that the trend will continue today.
3) Follow-the-loser: Shift all the portfolio weight to the stock that had the lowest return yesterday. This is based on the belief that it has the highest chance to improve today.
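The three baselines are simple rebalancing rules; a compact NumPy sketch is given below. The function names are ours and are only meant to spell out the rules.

```python
import numpy as np

def ucrp_weights(n_assets):
    """Uniform constant rebalanced portfolio: equal weight on every asset each day."""
    return np.full(n_assets, 1.0 / n_assets)

def follow_the_winner(prev_returns):
    """Put all weight on yesterday's best-returning asset."""
    w = np.zeros_like(prev_returns)
    w[np.argmax(prev_returns)] = 1.0
    return w

def follow_the_loser(prev_returns):
    """Put all weight on yesterday's worst-returning asset."""
    w = np.zeros_like(prev_returns)
    w[np.argmin(prev_returns)] = 1.0
    return w

# toy example with yesterday's returns of 4 assets
prev = np.array([0.01, -0.02, 0.004, 0.03])
print(ucrp_weights(4), follow_the_winner(prev), follow_the_loser(prev))
```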
Firstly, we constructed our portfolio from the K stocks that form the minimum variance portfolio among all possible combinations of K stocks, as described in Section II.D. Unfortunately, the result is not promising, as shown in Fig. 8. Our hypothesis is that choosing the combination of stocks with the lowest risk results in a lower-risk portfolio, but it also means the potential profit cannot be high either.

Fig. 8. Result on the safe portfolio with K = 6.

Instead, we next chose a portfolio of AAPL, PG, BSAC and XOM from different industries to slightly diversify the portfolio. The result is illustrated in Fig. 9.

Fig. 9. Final result.

A. Observations
1) Selecting the "best" (minimum-variance) stocks for the initial portfolio, as presented in Section II.D, is not a good idea. It gives a very low-risk portfolio with also very low potential profit.
2) Follow-the-winner and Follow-the-loser perform poorly due to the transaction cost incurred. This is why existing works that do not take transaction costs into account can produce very misleading results.
3) DDPG has the best performance. However, in theory GDPG can reach a better performance than DDPG by reducing sample complexity. Our hypothesis for this discrepancy is that GDPG is very sensitive to the accuracy of the model-based state-transition model [14], and in this case we use an LSTM, which is not a very good model. Therefore, the next-price-state prediction model is a very important component of GDPG performance, and it is potential future work to explore how the accuracy of the price prediction model affects GDPG.
4) PPO performs much worse than the other agents, despite its promising mathematical characteristics. Our observation is that if the PPO actor network takes in the previous portfolio weights, it follows a path similar to UCRP. If we remove the previous portfolio weights from the actor network, it follows the path shown in Fig. 9. The reason remains unclear to us; it could be that the stock market is not suitable for PPO.

B. Conclusion and Future Work
In this project, we have explored the task of portfolio management with reinforcement learning and obtained some insights from the results. There are many future directions to continue from here. For example, we could include all stocks in the portfolio and let the agent learn to put a weight of 0 on most stocks except a few. However, this setup easily gets stuck in local minima, and the transaction cost prevents large shifts of weights between days. Another direction is to provide better networks for the actor, the critic and especially the state-transition network of GDPG.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS
Author Le Trung Hieu is the sole author of this paper.

REFERENCES
[1] R. A. Haugen, Modern Investment Theory, 5th ed., Upper Saddle River, NJ: Prentice Hall, 2001.

[2] J. Moody, L. Z. Wu, Y. S. Liao, and M. Saffell, "Performance functions and reinforcement learning for trading systems and portfolios," Journal of Forecasting, vol. 17, pp. 441–470, 1998.
[3] Z. Y. Jiang, D. X. Xu, and J. J. Liang, "A deep reinforcement learning framework for the financial portfolio management problem," 2017.
[4] J. Carapuco, R. Neves, and N. Horta, "Reinforcement learning applied to forex trading," Applied Soft Computing, vol. 7, pp. 783–794, 2018.
[5] G. Jeong and H. Y. Kim, "Improving financial trading decisions using deep Q-learning: Predicting the number of shares, action strategies, and transfer learning," Expert Systems with Applications, vol. 117, pp. 125–138, 2019.
[6] J. Zhang and D. Maringer, "Indicator selection for daily equity trading with recurrent reinforcement learning," in Proc. 15th Annual Conference Companion on Genetic and Evolutionary Computation, 2013, pp. 1757–1758.
[7] R. Clarke, H. D. Silva, and S. Thorley, "Minimum-variance portfolio composition," The Journal of Portfolio Management, vol. 37, no. 2, pp. 31–45, 2011.
[8] K. K. Lai and J. Huang, The Application of Wavelet Transform in Stock Market, JAIST Press, 2007.
[9] M. Rhif, A. B. Abbes, I. R. Farah, B. Martínez, and Y. F. Sang, "Wavelet transform application for/in non-stationary time-series analysis: A review," Applied Sciences, vol. 9, no. 7, p. 1345, 2019.
[10] Z. X. Liu, "Analysis of financial fluctuation based on wavelet transform," Francis Academic Press, 2019.
[11] J. Nobre and R. F. Neves, "Combining principal component analysis, discrete wavelet transforms and XGBoost to trade in the financial markets," Expert Systems with Applications, vol. 125, pp. 181–194, 2019.
[12] D. B. Percival and A. T. Walden, "Wavelet-based signal estimation," in Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, 2000, pp. 393–456.
[13] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. M. O. Heess, T. Erez, Y. Tassa, D. Silver, and D. P. Wierstra, "Continuous control with deep reinforcement learning," US Patent, 2017.
[14] Q. P. Cai, L. Pan, and P. Z. Tang, "Generalized deterministic policy gradient algorithms," 2018.
[15] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," 2017.

Copyright © 2020 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

Le Trung Hieu was born in Vietnam in 1998. He is in the final year of his honors bachelor degree at the National University of Singapore, majoring in computer science. He is pursuing the artificial intelligence focus in his career and study path. His relevant experience includes a research attachment at the NUS-Tsinghua lab, and computer vision engineering internships at Microsoft in Singapore and at a startup in Israel. Before that, he worked in software engineering roles at Goldman Sachs and Sea Group in Singapore.
