
Finance 3

Assignment Two
Due Date: October 28, 2024

1 Theoretical Problems
1. Suppose that two assets have jointly normally distributed returns R1
and R2. R1 has mean µ1 and standard deviation σ1, R2 has mean µ2 and
standard deviation σ2, and the correlation between R1 and R2 is ρ.
Assume µ1 > µ2. Investors observe N i.i.d. observations of (R1, R2),
estimate µ̂1 and µ̂2 by the sample means, and declare that asset one has
the higher expected return if µ̂1 > µ̂2, and that asset two does otherwise.
(a) Determine the probability that investors will be correct.
(b) For a prescribed confidence level α, determine the number of ob-
servations required to make the probability that investors are cor-
rect equal to α.
(c) Write a computer program to implement the formulas you derived
in the previous two parts (you may want to experiment with this
program to determine how long an observation period is required
for what you consider to be “reasonable” values of the parame-
ters).
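For part (c), one possible sketch in Python (it assumes your answers to (a)
and (b) take the usual form based on the distribution of the difference of
sample means; adapt it to whatever formulas you derive):

import numpy as np
from scipy.stats import norm

# Sketch for part (c).  Assumption: mu1_hat - mu2_hat is normal with mean
# mu1 - mu2 and variance (sigma1^2 + sigma2^2 - 2*rho*sigma1*sigma2) / N.
def prob_correct(mu1, mu2, sigma1, sigma2, rho, N):
    s = np.sqrt(sigma1**2 + sigma2**2 - 2 * rho * sigma1 * sigma2)
    return norm.cdf((mu1 - mu2) * np.sqrt(N) / s)

def required_N(mu1, mu2, sigma1, sigma2, rho, alpha):
    s = np.sqrt(sigma1**2 + sigma2**2 - 2 * rho * sigma1 * sigma2)
    return int(np.ceil((norm.ppf(alpha) * s / (mu1 - mu2)) ** 2))

# Example with annual parameters: a 2% gap in means and 20% volatilities
# already requires a long observation period for 95% confidence.
print(required_N(0.08, 0.06, 0.20, 0.20, 0.5, 0.95))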
2. Consider the modified search for a stochastic policy in reinforcement
learning, with an entropy regularization:
\[
\max_{\pi}\;\sum_{k=1}^{K} \pi_k\, Q(s_t, A_k) \;-\; \frac{1}{\beta}\sum_{k=1}^{K} \pi_k \log\frac{\pi_k}{\omega_k} \tag{1}
\]
\[
0 \le \pi_k \le 1, \qquad \sum_{k=1}^{K} \pi_k = 1, \tag{2}
\]

where β is a regularization parameter controlling the relative importance
of the two terms that enforce, respectively, maximization of the action-value
function and a preference for a previous reference policy ω with probabilities
ωk. When β < ∞, this produces a stochastic rather than deterministic optimal
policy π∗(a|s). Find the optimal policy from the entropy regularized functional.

3. Consider the Boltzmann weighted average of a function h(i) defined on
a binary set I = {1, 2}:
\[
\mathrm{Boltz}_\beta h \;=\; \sum_{i \in I} h(i)\,\frac{\exp(\beta h(i))}{\sum_{j \in I}\exp(\beta h(j))}. \tag{3}
\]
(a) Verify that this functional smoothly interpolates between the max
and the mean of h(i), which are obtained in the limits β → ∞ and
β → 0, respectively.
(b) A functional B mapping real-valued functions on I to R is called
a non-expansion if min_i h(i) ≤ Bh ≤ max_i h(i) and |Bh − Bh′| ≤
max_i |h(i) − h′(i)|. By taking β = 1, h(1) = 100, h(2) = 1,
h′(1) = 1, h′(2) = 0, show that Boltzβ is not a non-expansion.
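A quick numerical check of part (b) can be scripted; the sketch below (the
helper name boltz is ours) simply evaluates equation (3) at the stated values
and compares the two sides of the non-expansion inequality:

import numpy as np

def boltz(h, beta):
    # Boltzmann weighted average of the values in h, as in equation (3)
    h = np.asarray(h, dtype=float)
    w = np.exp(beta * h)
    return float(np.dot(h, w / w.sum()))

beta = 1.0
h  = np.array([100.0, 1.0])
hp = np.array([1.0, 0.0])

lhs = abs(boltz(h, beta) - boltz(hp, beta))   # |Boltz_beta h - Boltz_beta h'|
rhs = np.max(np.abs(h - hp))                  # max_i |h(i) - h'(i)|
print(lhs, rhs, lhs <= rhs)                   # lhs exceeds rhs: not a non-expansion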
4. Consider the infinite horizon finite state Markov decision process prob-
lem presented in class, and suppose that π, π ′ : S → A are two deter-
ministic policies such that:
\[
Q^{\pi}(s, \pi'(s)) \;\ge\; V^{\pi}(s) \qquad \forall\, s \in S.
\]

Show that $V^{\pi'}(s) \ge V^{\pi}(s)$ for all $s \in S$.
5. Consider the infinite horizon finite state Markov decision process pre-
sented in class, with discount factor γ ∈ (0, 1). Suppose that R(s, a, s′ ) =
R(s, a) and let π : S → A be a deterministic policy. Let M be the
dynamic programming max operator:
\[
MV(s) \;=\; \max_{a}\!\left( R(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, V(s') \right). \tag{4}
\]
Let:
\[
\bar{c} \;=\; \max_{s}\bigl( MV^{\pi}(s) - V^{\pi}(s) \bigr), \tag{5}
\]
and define a policy π′ using:
\[
\pi'(s) \;=\; \arg\max_{a}\!\left( R(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, V^{\pi}(s') \right) \;=\; \arg\max_{a} Q^{\pi}(s,a). \tag{6}
\]

Let $V^{\pi'}$ be the value of this policy. Show that
\[
V^{\pi'}(s) \;\le\; V^{\pi}(s) + \frac{\bar{c}}{1-\gamma} \tag{7}
\]
for all s ∈ S.

2 Computational Problems
1. Using operators that are not non-expansions can lead to a loss of a
solution in a generalized Bellman equation. To illustrate such a phe-
nomenon, consider the MDP problem on the set I = {1, 2} with two
actions a and b and the following specification:

\[
p(1 \mid 1, a) = 0.66, \quad p(2 \mid 1, a) = 0.34, \quad r(1, a) = 0.122, \tag{8}
\]
\[
p(1 \mid 1, b) = 0.99, \quad p(2 \mid 1, b) = 0.01, \quad r(1, b) = 0.033. \tag{9}
\]

The second state is absorbing with p(1|2) = 0, p(2|2) = 1. The discount
factor is γ = 0.98. Assume we use the Boltzmann policy:

\[
\pi(a \mid s) \;=\; \frac{\exp(\beta \hat{Q}(s, a))}{\sum_{a'} \exp(\beta \hat{Q}(s, a'))}. \tag{10}
\]

Write a computer program to demonstrate empirically that the SARSA
algorithm:
\[
\hat{Q}(s, a) \;\leftarrow\; \hat{Q}(s, a) + \alpha\left[ r(s, a) + \gamma \hat{Q}(s', a') - \hat{Q}(s, a) \right], \tag{11}
\]

where a, a′ are drawn from the Boltzmann policy with β = 16.55 and
α = 0.1, leads to oscillating values of Q̂(1, a) and Q̂(1, b) that do not
settle to stable values as the number of iterations increases.
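One possible way to set up this experiment (a sketch rather than a prescribed
solution; it assumes that reaching the absorbing state ends an episode and
that the next episode restarts from state 1, and it uses NumPy):

import numpy as np

# Sketch of the SARSA experiment in equations (8)-(11).
# Assumption (not stated above): reaching the absorbing state 2 ends an
# episode and the next episode restarts from state 1; Q-hat(2, .) stays 0.
rng = np.random.default_rng(0)
gamma, beta, alpha = 0.98, 16.55, 0.1
p_stay = {0: 0.66, 1: 0.99}        # p(1 | 1, action): action 0 = a, action 1 = b
reward = {0: 0.122, 1: 0.033}      # r(1, action)

Q = np.zeros(2)                    # Q-hat(1, a) and Q-hat(1, b)

def boltzmann_action(q):
    w = np.exp(beta * (q - q.max()))          # subtract max for numerical stability
    return rng.choice(2, p=w / w.sum())

history = []
for episode in range(20_000):
    a = boltzmann_action(Q)
    while True:
        if rng.random() < p_stay[a]:          # transition 1 -> 1
            a_next = boltzmann_action(Q)
            Q[a] += alpha * (reward[a] + gamma * Q[a_next] - Q[a])   # equation (11)
            a = a_next
        else:                                 # transition 1 -> 2 (absorbing), target value 0
            Q[a] += alpha * (reward[a] + gamma * 0.0 - Q[a])
            break
    history.append(Q.copy())

history = np.array(history)
# Plotting history[:, 0] and history[:, 1] against the episode index should
# show the two action values oscillating rather than converging.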

2. Write a function that takes the following inputs: a mean vector µ, a
variance-covariance matrix Σ, and the number of return scenarios to be
simulated, N. Your function should then simulate N i.i.d. return vec-
tors Xi, i = 1, . . . , N from a normal distribution with the given mean
and covariance matrix. Based on this random sample, it should calcu-
late parameter estimates µ̂ and Σ̂ of µ and Σ (you can use whatever
estimation method you prefer). Your function should then plot (or
return to the user for plotting) the efficient frontiers calculated using
both (µ, Σ) and (µ̂, Σ̂) (you may use software packages with built-in
routines for calculating efficient frontiers). Your function should also
report the global minimum variance portfolios calculated using both
(µ, Σ) and (µ̂, Σ̂). You may want to conduct your own experiments
comparing the true, actual, and estimated frontiers, as we did in class.
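A possible skeleton for this function (a sketch under several assumptions not
fixed by the problem: an unconstrained mean-variance problem, sample moments
as the estimators, and matplotlib for the plots; the name frontier_experiment
is ours):

import numpy as np
import matplotlib.pyplot as plt

def frontier_experiment(mu, Sigma, N, n_grid=100, seed=None):
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(mu, Sigma, size=N)   # N i.i.d. return vectors
    mu_hat = X.mean(axis=0)                          # sample mean estimate of mu
    Sigma_hat = np.cov(X, rowvar=False)              # sample covariance estimate of Sigma

    def gmv_weights(S):
        # global minimum variance portfolio: w = S^{-1} 1 / (1' S^{-1} 1)
        ones = np.ones(S.shape[0])
        w = np.linalg.solve(S, ones)
        return w / w.sum()

    def frontier_sd(m, S, targets):
        # closed-form frontier standard deviation for each target mean (no constraints)
        Si = np.linalg.inv(S)
        ones = np.ones(len(m))
        A, B, C = ones @ Si @ m, m @ Si @ m, ones @ Si @ ones
        D = B * C - A ** 2
        return np.sqrt((C * targets ** 2 - 2 * A * targets + B) / D)

    targets = np.linspace(min(mu.min(), mu_hat.min()),
                          max(mu.max(), mu_hat.max()), n_grid)
    plt.plot(frontier_sd(mu, Sigma, targets), targets, label="true (mu, Sigma)")
    plt.plot(frontier_sd(mu_hat, Sigma_hat, targets), targets, label="estimated")
    plt.xlabel("portfolio standard deviation")
    plt.ylabel("portfolio mean")
    plt.legend()
    plt.show()
    return gmv_weights(Sigma), gmv_weights(Sigma_hat)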
