Dynamic Asset Allocation
2 Models
2.1 Outline
We consider the traditional capital allocation problem that is based on a portfolio of two
assets: a risk-free asset (fixed-income security like a treasury bond) and a risky asset (such
as an individual stock). We define the portfolio by two weights wB and wS that represent
the proportion of the portfolio invested in the bond and in the stock, respectively. We further
assume that all of the portfolio's money is invested, i.e., wB + wS = 1.
In this framework, our objective is to dynamically determine the weight wS that
maximizes the overall return of the portfolio.
In the MDP framework, we consider states st for 1 ≤ t ≤ T, where T represents our final
investment date, that combine both the stock return rS(t) and the bond return rB(t). We have
decided to consider two different models that gradually make our problem representation
more complex but also more realistic. We consider one simple model with small discrete state
and action spaces and a more advanced model with a larger number of states and actions.
We describe these two models more formally in the next subsections.
2.2 First model
• States: st = r̃S(t). The stock returns are discretized into negative and positive returns,
i.e., ∀t, r̃S(t) = −1 if rS(t) ≤ 0 and r̃S(t) = 1 if rS(t) > 0 (a short sketch of this
discretization is given after this list). Since the return of the bond is constant over time,
it does not bring much information, which is why it is not included in the state.
• Actions: at is the weight wS that the investor assigns to the stock. The actions are discretized
such that at ∈ {0, 1}. Hence, in this model, the capital is either fully invested in the stock or
fully invested in the bond.
• IsEnd(st) = 1{t = T}: investments are done until the final investment date T.
• Discount factor: γ = 1
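As a concrete illustration, here is a minimal Python sketch of this discretization and of the resulting action set; the example return values and array names are made up for the illustration.

import numpy as np

def discretize_sign(r_s):
    """Discretize raw stock returns into the two states of the first model:
    -1 for non-positive returns, +1 for strictly positive returns."""
    return np.where(r_s > 0, 1, -1)

# Hypothetical daily stock returns; in practice these come from the data described in Section 3.
r_s = np.array([0.004, -0.012, 0.0, 0.021])
states = discretize_sign(r_s)   # -> array([ 1, -1, -1,  1])
actions = [0, 1]                # 0: everything in the bond, 1: everything in the stock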
2.3 Second model
• States: st = (r̃S(t − mem), ..., r̃S(t − 1), r̃S(t)), where mem is the memory length. The stock
returns are discretized into three buckets based on the two terciles q33% and q66% that divide
the stock return values into three groups of equal size (a sketch of this state construction is
given below).
Although the transition probabilities are unknown, we assume they satisfy the following
relation:
T (st , at , st+1 ) = P(st+1 |st , at ) = P(st+1 |st )
which means that the investor’s action has no impact on the market.
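A possible Python sketch of this richer state construction is given below; the bucket labels {−1, 0, 1}, the variable names and the use of the training returns to estimate the terciles are assumptions made for the illustration.

import numpy as np

def discretize_terciles(r_s, q33, q66):
    """Map raw returns to three buckets labelled {-1, 0, 1} using the two terciles."""
    return np.digitize(r_s, [q33, q66]) - 1   # -1: lowest tercile, 0: middle, 1: highest

def build_state(r_tilde, t, mem):
    """State at date t: the last (mem + 1) discretized returns, as in the definition above."""
    return tuple(r_tilde[t - mem : t + 1])

# Terciles estimated on (illustrative, randomly generated) training returns.
r_train = np.random.default_rng(0).normal(0.0, 0.01, 1000)
q33, q66 = np.quantile(r_train, [1 / 3, 2 / 3])

r_tilde = discretize_terciles(r_train, q33, q66)
s_t = build_state(r_tilde, t=10, mem=3)   # a tuple of mem + 1 bucket labels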
2.4 Baseline and Oracle
2.4.1 Baseline
The baseline chosen here is simply the bond performance, which represents a 2% annual return.
The baseline policy is to keep the whole portfolio invested in the bond at each time period.
The goal of this project is then to find a policy that beats the risk-free asset's performance
by including a risky asset with higher returns.
2.4.2 Oracle
Our oracle makes use of future returns data, one week ahead. It is computed as follows: for
each week, we compute the average return of the stock and of the bond over that week, and the
oracle allocates 100% of the portfolio to the better-performing asset during that week.
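A minimal sketch of this look-ahead rule is given below; grouping the daily returns into consecutive 5-day weeks is an assumption made for the illustration.

import numpy as np

def oracle_weights(r_s, r_b, week_len=5):
    """For each week, put 100% of the portfolio on the asset with the higher
    average return over that (future) week. Returns the weight on the stock;
    the bond weight is 1 - w_s."""
    w_s = np.zeros(len(r_s))
    for start in range(0, len(r_s), week_len):
        end = min(start + week_len, len(r_s))
        stock_is_better = r_s[start:end].mean() > r_b[start:end].mean()
        w_s[start:end] = 1.0 if stock_is_better else 0.0
    return w_s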
3 Data
The data used in this project are the daily returns of a bond and a stock. We decided to
use a virtual bond with a 2% annual return and the Walmart stock. Walmart was chosen
among other stocks because it satisfies the following criteria: it is a very volatile stock that
contrasts well with the risk-free bond, it does not follow any upward or downward trend
over the period considered, and its final cumulative return is roughly the same as the bond's.
Hence, a strategy that outperforms the baseline would really be able to benefit from
Walmart's extra return during the upward periods.
The returns are computed from the daily closing prices downloaded from Google Finance.
The training period spans 2000 to 2008 inclusive and the testing period spans
2009 to November 2016.
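For reference, here is a sketch of how the daily returns and the train/test split can be obtained with pandas; the CSV file name, its Close column and the 252-trading-days convention are assumptions made for the illustration.

import pandas as pd

# Hypothetical file of Walmart daily closing prices, indexed by date, with a "Close" column.
prices = pd.read_csv("walmart_close.csv", index_col=0, parse_dates=True)

r_s = prices["Close"].pct_change().dropna()              # daily stock returns
r_b = pd.Series(1.02 ** (1 / 252) - 1, index=r_s.index)  # constant daily return of the 2% annual bond

# Train on 2000-2008 inclusive, test on 2009 to November 2016.
r_s_train, r_s_test = r_s.loc["2000":"2008"], r_s.loc["2009":"2016-11"]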
The cumulative returns of the baseline and Walmart stock over the training and testing
period can be seen in Figure 1.
4 Algorithms
4.1 First model implementation
Let us recall that our input data are the daily returns of the Walmart stock and of the bond
described in Section 3, stored in arrays rS and rB of size T. For the simple model we
implement an online model-based reinforcement learning algorithm. As explained before, we
only have two states, {−1, 1}, and two actions, {0, 1}. The model-based
reinforcement learning idea is that given a state st and time t, we define the best action as:
āt = argmax_a [ R̂(st, a) + Σ_{s′} T(st, a, s′) V̂opt(s′) ] = argmax_a [ R̂(st, a) + Σ_{s′} P(s′|st) V̂opt(s′) ] = argmax_a R̂(st, a)    (1)
In the above equation, R̂(st, a) is an estimate of the average reward obtained when action
a is taken from the state st, and V̂opt(s′) is an estimate of the expected reward of the state s′
under the optimal policy. Since we assume that P(st+1|st, at) = P(st+1|st), the sum
Σ_{s′} P(s′|st) V̂opt(s′) does not depend on the action a and can be dropped from the argmax,
which gives the final simplification in equation (1).
Figure 1: Cumulative returns of the bond and Walmart stock throughout the training and
testing period
The algorithm is implemented by maintaining two global lists N (s, a) and ρ(s, a), for
all possible pairs of states s and actions a. At any time t, N (s, a) stores the count of the
number of times the action a was taken from the state s, while ρ(s, a) stores the cumulative
sum of the previous rewards obtained every time the action a was taken from the state
s. Hence, at time t, for any state action pair (st , a), we estimate the average reward as
R̂(st , a) = ρ(st , a)/N (st , a). Finally, the best action āt can now be computed using equation
(1). We adopt an ε-greedy strategy to get the final action at taken at time t, where
0 ≤ ε ≤ 1. In the case of the simple model, we choose a fixed value of ε = 0.001. We generate a
random number q from a uniform distribution in [0, 1]. If q ≥ ε, we set at = āt; otherwise
we choose at ∈ {0, 1} uniformly at random.
The final step is to update the global lists N (s, a) and ρ(s, a). N (s, a) is updated as
N (st , at ) ← N (st , at ) + 1, and ρ(s, a) is updated as ρ(st , at ) ← ρ(st , at ) + rt . The algorithm
then proceeds to the next time t + 1 and the process is repeated.
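A condensed Python sketch of this update loop is shown below; the reward passed to update is the portfolio return obtained at date t, and the variable and function names are illustrative.

import random
from collections import defaultdict

ACTIONS = [0, 1]             # 0: everything in the bond, 1: everything in the stock
EPS = 0.001                  # fixed exploration rate of the simple model
N = defaultdict(int)         # N[(s, a)]: number of times action a was taken in state s
rho = defaultdict(float)     # rho[(s, a)]: cumulative reward collected for (s, a)

def r_hat(s, a):
    """Estimated average reward of taking action a in state s."""
    return rho[(s, a)] / N[(s, a)] if N[(s, a)] > 0 else 0.0

def choose_action(s):
    """Epsilon-greedy choice around the greedy action argmax_a R_hat(s, a)."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: r_hat(s, a))

def update(s, a, reward):
    """Update the running statistics after observing the reward of (s, a)."""
    N[(s, a)] += 1
    rho[(s, a)] += reward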
4.2 Second model implementation
For the second model, the state includes a memory of past discretized returns: with mem = 3,
for instance, the agent learns at date t using the returns at times t, t − 1, t − 2 and t − 3.
For the Q-learning implementation, we used the Watkins and Dayan algorithm [9]. On each
observed transition (st, a, r, st+1):
Q̂opt(st, a) ← (1 − η) Q̂opt(st, a) + η (r + γ V̂opt(st+1))
where
V̂opt(st+1) = max_{a ∈ Actions(st+1)} Q̂opt(st+1, a)
Additionally, an ε-greedy policy is used for the sake of exploration, with ε = 0.4 during training
and ε = 0.01 during testing, as we expect a quasi-deterministic policy at test time.
The hyperparameters ε and mem (the memory length) have been set by a grid search
on a validation set.
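A minimal tabular Python version of this update might look as follows; the discretized grid of stock weights used as the action set and the learning rate value are illustrative assumptions.

import random
from collections import defaultdict

ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]   # assumed discretization of the stock weight wS
Q = defaultdict(float)                   # Q[(s, a)]: estimated optimal Q-values
ETA, GAMMA = 0.1, 1.0                    # learning rate (illustrative) and discount factor

def v_opt(s):
    """V_opt(s) = max over actions of Q_opt(s, a)."""
    return max(Q[(s, a)] for a in ACTIONS)

def q_update(s, a, r, s_next):
    """Watkins and Dayan update on the observed transition (s, a, r, s_next)."""
    Q[(s, a)] = (1 - ETA) * Q[(s, a)] + ETA * (r + GAMMA * v_opt(s_next))

def epsilon_greedy(s, eps):
    """eps = 0.4 during training and 0.01 during testing, as described above."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])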
Table 1: Annualized returns, volatility and Sharpe ratio of the baseline, the oracle, Walmart
stock and the two models
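The metrics reported in Table 1 can be computed from a series of daily portfolio returns roughly as follows; the 252-trading-days convention and the zero risk-free rate in the Sharpe ratio are assumptions that may differ from the exact convention used for the table.

import numpy as np

def performance_metrics(daily_returns, periods_per_year=252):
    """Annualized return, annualized volatility and Sharpe ratio of a series of daily returns."""
    mean, std = np.mean(daily_returns), np.std(daily_returns)
    ann_return = (1 + mean) ** periods_per_year - 1
    ann_vol = std * np.sqrt(periods_per_year)
    sharpe = ann_return / ann_vol if ann_vol > 0 else float("nan")
    return ann_return, ann_vol, sharpe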
It can be seen that our two models manage to capture local trends of the stock and thereby
learn an efficient investment strategy. It is interesting to notice that the two methods
lead to different kinds of investment strategies:
• the model-based learning method significantly outperforms both the baseline and the
stock in terms of cumulative returns, but with a rather high volatility;
• in comparison, the Q-learning algorithm is very efficient at decreasing the volatility of
the portfolio while keeping satisfactory returns. This is most likely due to its larger action
space, which allows more flexibility in the investment strategy.
Figure 2: Comparison of cumulative returns of the baseline, the oracle, Walmart stock and
the first model
Figure 3: Comparison of cumulative returns of the baseline, the oracle, Walmart stock and
the second model
It is worth mentioning that even though both methods use an ε-greedy policy during the
training phase, they do not present the same variance in their results. The model-based
learning is very consistent across different simulations but the Q-learning results show much
more variance between runs.
We would also like to mention that we implemented a Q-learning algorithm with function
approximation. Indeed, a continuous state space is a better model for continuous return
values. We considered a linear function and a neural network for the approximation.
However, in both cases, the weights corresponding to the model parameters failed to
converge, so no satisfying results could be obtained. We experimented with a wide range of
learning rates to control the evolution of the parameters, but in all cases the weights
diverged. This may be due to a lack of normalization of the states.
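For illustration, a sketch of the linear-function-approximation variant is given below; the feature map, learning rate and function names are purely illustrative, and, as noted above, this approach did not converge in our experiments.

import numpy as np

def features(state_returns, action):
    """Illustrative feature map: the recent raw returns in the state, the action and a bias term."""
    return np.concatenate([np.asarray(state_returns, dtype=float), [action, 1.0]])

def q_value(w, state_returns, action):
    """Linear approximation of Q(s, a) with weight vector w."""
    return float(np.dot(w, features(state_returns, action)))

def q_fa_update(w, s, a, r, s_next, actions, eta=1e-3, gamma=1.0):
    """One gradient-style Q-learning update with the linear approximator.
    Normalizing the features (not done here) may be needed for convergence."""
    v_next = max(q_value(w, s_next, a2) for a2 in actions)
    td_error = (r + gamma * v_next) - q_value(w, s, a)
    return w + eta * td_error * features(s, a)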
6 Next steps
Here are some further research directions that could be interesting to consider as next steps:
• One assumption we made is that the investor can reallocate weights across assets without
any trading fees. To make the model more realistic, we would like to take trading fees into
account; we can then expect a decrease in performance and fewer reallocations over time.
• We would like to think more about the trade-off between the exploration of the states
and the exploitation of a deterministic investment strategy. Indeed, in this project, we
decided to first train the model with a large exploration rate and then exploit it with
almost no randomness. However, an online-learning strategy could be more relevant, as it
would allow the investment strategy to remain flexible and react to market changes.
References
[1] David Barber. Bayesian reasoning and machine learning. Cambridge University Press,
2012.
[2] Matthew Hausknecht and Peter Stone. Deep reinforcement learning in parameterized
action space. arXiv preprint arXiv:1511.04143, 2015.
[3] D Kuvayev and Richard S Sutton. Model-based reinforcement learning. Technical report,
Citeseer, 1997.
[4] Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.
[5] John Moody and Matthew Saffell. Learning to trade via direct reinforcement. IEEE
Transactions on Neural Networks, 12(4):875–889, 2001.
[6] John Moody, Lizhong Wu, Yuansong Liao, and Matthew Saffell. Performance functions
and reinforcement learning for trading systems and portfolios. Journal of Forecasting,
17(5–6):441–470, 1998.
[7] Ralph Neuneier et al. Enhancing Q-learning for optimal asset allocation. In NIPS, pages
936–942, 1997.
[8] Jessica Wachter. Asset allocation. Technical report, National Bureau of Economic Re-
search, 2010.
[9] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279–
292, 1992.