Homework - 06 - 223 - Spring 2024
Homework 6
Due by 11 p.m. on Monday, 18 March 2024.
The homework should be submitted as a scanned PDF file to ananth at berkeley dot edu.
Please retain a copy of your submitted solution for self-grading.
1. This was the last problem on Homework 5, postponed to this homework set.
(b) Find the best ρ > 0 that you can such that we have
Also, we have written $X_0^{k-1}$ for the sequence $(X_0, \ldots, X_{k-1})$, interpreted to be the empty sequence when $k = 0$, and similarly for $U_0^{k-1}$ and $Y_0^{k-1}$, etc.
The problem we wish to solve is the partially observed discounted control problem
\[
\text{Minimize}_{g} \; E^g \Big[ \sum_{k=0}^{\infty} \beta^k c(X_k, U_k) \Big],
\]
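For intuition, here is a minimal Monte Carlo sketch of this objective in Python, truncating the infinite sum at a finite horizon. The names `simulate_step` and `policy` are hypothetical stand-ins for the model and the strategy $g$, and the sketch glosses over partial observability (in the actual problem, $g$ may depend only on past observations and controls):

```python
import numpy as np

def discounted_cost(simulate_step, policy, x0, beta, horizon=500, runs=1000, seed=0):
    """Estimate E^g[sum_{k>=0} beta^k c(X_k, U_k)] by truncated simulation.
    The discarded tail is at most beta**horizon * max|c| / (1 - beta)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(runs):
        x, acc, disc = x0, 0.0, 1.0
        for _ in range(horizon):
            u = policy(x)                       # control chosen by the strategy
            x, cost = simulate_step(x, u, rng)  # next state and one-step cost
            acc += disc * cost
            disc *= beta
        total += acc
    return total / runs
```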
3. Consider the controlled Markov chain model with state space $\mathcal{X} := \{1, 2, 3\}$, action space $\mathcal{U} := \{a, b\}$, and transition probability matrices
\[
P_{ij}(a) = \begin{pmatrix} 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ 1 & 0 & 0 \\ \tfrac{1}{2} & \tfrac{1}{2} & 0 \end{pmatrix},
\qquad
P_{ij}(b) = \begin{pmatrix} 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 0 & 1 \\ \tfrac{1}{2} & \tfrac{1}{2} & 0 \end{pmatrix}.
\]
Note that the transition probabilities from states 1 and 3 do not depend on
the control choice. Suppose the one-step costs are given by:
Let $0 < \beta < 1$ be the discount factor. In the value-iteration algorithm for finding the optimal control strategy, we start with an initial function $V^{(0)}$ on the state space and form the sequence of iterates $(V^{(n)}, n \ge 0)$ by letting $V^{(n+1)} = T V^{(n)}$, where
\[
(T V)(i) := \min_{u \in \mathcal{U}} \Big( c(i, u) + \beta \sum_{j} P_{ij}(u) V(j) \Big).
\]
Let $\mu^{(n)}$ denote a minimizer at the $n$-th step of value iteration, i.e., for each $i \in \mathcal{X}$, $\mu^{(n)}(i)$ satisfies
\[
V^{(n+1)}(i) = c(i, \mu^{(n)}(i)) + \beta \sum_{j} P_{ij}(\mu^{(n)}(i)) V^{(n)}(j).
\]
We proved in class that, in value iteration from an arbitrary initial function for a finite-state, finite-action discounted-cost optimal control problem, there is a finite $N$ such that $\mu^{(n)}$ is an optimal control strategy for all $n \ge N$. In this example, show that if $V^{(0)}$ is such that $V^{(0)}(1) \ne V^{(0)}(3)$, then the sequence $(\mu^{(n)}, n \ge 0)$ will not converge. Thus, even though value iteration eliminates all non-optimal stationary Markov strategies in finitely many steps, the sequence of stationary Markov control strategies it proposes need not converge in general.
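As a sanity check, here is a minimal numerical sketch of this value iteration in Python. The one-step cost table did not survive in the statement above, so the costs `c` below are placeholders (constant cost 1 for every state-action pair); only the transition matrices are taken from the problem:

```python
import numpy as np

# Transition matrices P(a), P(b) from the problem statement
# (states 1, 2, 3 are indexed 0, 1, 2).
P = {"a": np.array([[0.0, 0.5, 0.5],
                    [1.0, 0.0, 0.0],
                    [0.5, 0.5, 0.0]]),
     "b": np.array([[0.0, 0.5, 0.5],
                    [0.0, 0.0, 1.0],
                    [0.5, 0.5, 0.0]])}

# Placeholder one-step costs c(i, u): the actual cost table is not
# reproduced above, so we take c identically 1 for illustration.
c = {"a": np.ones(3), "b": np.ones(3)}
beta = 0.9

def bellman_step(V):
    """One application of the operator T; returns (TV, a minimizer mu)."""
    Q = np.stack([c[u] + beta * P[u] @ V for u in ("a", "b")])
    return Q.min(axis=0), Q.argmin(axis=0)  # 0 -> action a, 1 -> action b

# Start from V(0) with V(0)(1) != V(0)(3) and watch the minimizers.
V = np.array([1.0, 0.0, 0.0])
for n in range(8):
    V, mu = bellman_step(V)
    print(n, ["ab"[int(u)] for u in mu])
```

With these placeholder costs the printed minimizer at state 2 alternates between $a$ and $b$: only state 2's transitions depend on the action, and the sign of $V^{(n)}(1) - V^{(n)}(3)$ flips at every iteration.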
4. Let $\mu_1$ and $\mu_2$ define stationary Markov policies in a finite-state, finite-control-space discounted dynamic programming problem with one-step costs $c(i, u)$ and state transition probabilities $p_{ij}(u)$. Thus $\mu_1$ and $\mu_2$ are functions from the state space $\mathcal{X}$ to the set of controls $\mathcal{U}$. We denote the discount factor by $\beta$, with $0 < \beta < 1$.
(a) Let $\mu_3$ denote a stationary Markov policy that, when in state $i$, chooses the action $u$ to minimize
\[
c(i, u) + \beta \sum_{j} p_{ij}(u) \min\{ W_\infty^{\mu_1}(j), W_\infty^{\mu_2}(j) \},
\]
where $W_\infty^{\mu}$ denotes the overall discounted cost when the stationary Markov control strategy $\mu$ is in effect. Show that
\[
W_\infty^{\mu_3} \le \min\{ W_\infty^{\mu_1}, W_\infty^{\mu_2} \}.
\]
(b) Let $\mu_4$ be defined by
\[
\mu_4(i) = \begin{cases} \mu_1(i) & \text{if } W_\infty^{\mu_1}(i) \le W_\infty^{\mu_2}(i), \\ \mu_2(i) & \text{if } W_\infty^{\mu_2}(i) < W_\infty^{\mu_1}(i). \end{cases}
\]
Show that
\[
W_\infty^{\mu_4} \le \min\{ W_\infty^{\mu_1}, W_\infty^{\mu_2} \}.
\]
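A small sketch of how these quantities can be computed numerically, assuming transition matrices `P[u]` and one-step costs `c[u]` stored as arrays indexed as in problem 3 (hypothetical names): $W_\infty^\mu$ is the solution of the linear system $(I - \beta P_\mu) W = c_\mu$.

```python
import numpy as np

def policy_cost(mu, P, c, beta):
    """Discounted cost W_inf^mu of a stationary Markov policy mu, where
    mu[i] is the action in state i: solve (I - beta * P_mu) W = c_mu."""
    n = len(mu)
    P_mu = np.array([P[mu[i]][i] for i in range(n)])  # row i of P(mu(i))
    c_mu = np.array([c[mu[i]][i] for i in range(n)])
    return np.linalg.solve(np.eye(n) - beta * P_mu, c_mu)

def combine_policies(mu1, mu2, P, c, beta):
    """Form mu4 by picking, state by state, whichever of mu1, mu2 is cheaper."""
    W1 = policy_cost(mu1, P, c, beta)
    W2 = policy_cost(mu2, P, c, beta)
    return [mu1[i] if W1[i] <= W2[i] else mu2[i] for i in range(len(mu1))]
```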
5. Consider a controlled Markov chain with state space the set of nonnegative integers $\mathcal{X} = \{0, 1, 2, \ldots\}$ and action space $\mathcal{U} = \{0, 1\}$. When action $u = 1$ is taken the state moves from the current state $i$ to $i + 1$, for all $i \ge 0$, and the cost incurred is $1$. When action $u = 0$ is taken the state stays at the current state $i$, for all $i \ge 0$, and the cost $\frac{1}{1+i}$ is incurred.
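A direct encoding of these dynamics and costs (a sketch only; the remainder of the problem statement is not reproduced above):

```python
def step(i, u):
    """One transition of the chain: u = 1 moves i -> i + 1 at cost 1;
    u = 0 stays at i at cost 1 / (1 + i)."""
    if u == 1:
        return i + 1, 1.0
    return i, 1.0 / (1 + i)
```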