
EE 223 Spring 2024

Homework 7
Due by 11 p.m. on Thursday, 11 April 2024.
The homework should be submitted as a scanned PDF file to ananth at berkeley dot edu.
Please retain a copy of your submitted solution for self-grading.

1. This was the last problem on Homework 6, postponed to this homework set.
Consider a controlled Markov chain with state space the set of nonnegative
integers X = {0, 1, 2, . . .} and action space U = {0, 1}. When action u = 1 is
taken the state moves from the current state i to i + 1, for all i ≥ 0, and the
cost incurred is 1. When action u = 0 is taken the state stays at the current
state i, for all i ≥ 0, and the cost incurred is 1/(1 + i).

(a) Consider the problem of choosing a control strategy to minimize the
long term average cost. Show that the optimal long term average cost
is 0.
(b) Show that for every discount factor 0 < β < 1, if i is large enough then
the optimal action to take in state i for the purpose of minimizing the
overall β-discounted cost is the action u = 0.
(c) Show that for every state i, if the discount factor 0 < β < 1 is suffi-
ciently close to 1, then the optimal action to take in state i for the pur-
pose of minimizing the overall β-discounted cost is the action u = 1.
(d) Conclude that there is no Blackwell optimal strategy in this problem.
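
If you want a quick numerical sanity check for parts (b) and (c) before writing the proofs, the following Python sketch runs value iteration for the β-discounted Bellman equation V(i) = min{1 + βV(i+1), 1/(1+i) + βV(i)} on a truncated state space {0, ..., N}. The truncation level N, the tolerance, and all names here are illustrative assumptions, not part of the problem; results near the truncation boundary are distorted and should be ignored.

    # Value iteration for the beta-discounted problem, truncated at state N.
    # Purely a sanity check: the arguments in parts (b)/(c) must be analytic.
    def greedy_actions(beta, N=200, tol=1e-10):
        V = [0.0] * (N + 1)
        while True:
            Vnew = [min(1.0 + beta * V[min(i + 1, N)],       # u = 1: move
                        1.0 / (1 + i) + beta * V[i])         # u = 0: stay
                    for i in range(N + 1)]
            if max(abs(a - b) for a, b in zip(V, Vnew)) < tol:
                V = Vnew
                break
            V = Vnew
        # greedy action in each state: 1 when moving is strictly cheaper
        return [1 if 1.0 + beta * V[min(i + 1, N)] < 1.0 / (1 + i) + beta * V[i]
                else 0 for i in range(N + 1)]

    for beta in (0.5, 0.9, 0.99):
        acts = greedy_actions(beta)
        first0 = acts.index(0) if 0 in acts else None
        print(beta, "-> first state where u = 0 is greedy:", first0)

Consistent with parts (b) and (c), the printed threshold should be finite for each β and should grow as β approaches 1.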

2. Home, office, umbrella


A person alternates between her home and office. She has only one um-
brella, which is either with her in the location she starts from or at the
destination. Whenever she moves from her initial location to the destina-
tion it rains with probability p, independently from move to move. She can
see whether it is raining or not before she makes her move. Assume that
0 < p < 1.
If the umbrella is at her initial location before she moves, then, when it
rains, she takes the umbrella with her. If the umbrella is not at her initial
location, then, when it rains, she incurs a cost of W while making the move,
because she is walking in the rain without an umbrella.

If it does not rain when she moves and the umbrella is at her initial location,
she has the option of taking it with her, which incurs a cost of V , because
of the inconvenience of carrying an umbrella with her even though it is not
raining.
Find the optimal strategy for whether she should take the umbrella when it
does not rain (if it happens to be at her initial location) so as to minimize
her long term average cost.
Hint: First argue that the problem can be modeled by a 4-state controlled
Markov chain with state space X = {(1, r), (1, n), (0, r), (0, n)} and action
space U = {0, 1}. Here the state (1, r) means that the umbrella is at her
current location and it is raining; (1, n) means that the umbrella is at her
current location and it is not raining; (0, r) means that the umbrella is not at
her current location and it is raining; and (0, n) means that the umbrella is
not at her current location and it is not raining. The action u = 1 corresponds
to the decision to take the umbrella if the umbrella is at the current location,
and u = 0 corresponds to the decision to not take the umbrella even though
the umbrella is at the current location. Then observe that the control action
that is taken in states (0, r) and (0, n) is irrelevant. Also observe that even
though it looks as if, for any stationary Markov policy, one has to compute
a stationary distribution on a set of size 4, this stationary distribution can
be determined in terms of one parameter in [0, 1] (and the probability of
raining, i.e. p).
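
Once you have the 4-state model from the hint, the one-parameter observation makes the candidate policies easy to compare numerically. In the sketch below (an illustration only; the balance equation x = 1 − x(1 − p)(1 − q) for the stationary probability x that the umbrella is at the current location is an assumption you should verify yourself), q is the probability of taking the umbrella in state (1, n), so q ∈ {0, 1} gives the two deterministic choices.

    # Long-run average cost of the stationary policy "take the umbrella in
    # state (1, n) with probability q".  Sample numbers below are arbitrary.
    def avg_cost(p, q, V, W):
        # stationary P(umbrella at current location), from x = 1 - x(1-p)(1-q)
        x = 1.0 / (1.0 + (1.0 - p) * (1.0 - q))
        rain_cost = (1.0 - x) * p * W        # in state (0, r), pay W
        carry_cost = x * (1.0 - p) * q * V   # in state (1, n), take it: pay V
        return rain_cost + carry_cost

    p, V, W = 0.3, 1.0, 5.0                  # assumed sample values
    for q in (0.0, 1.0):
        print(f"q = {q}: average cost = {avg_cost(p, q, V, W):.4f}")

Comparing the two printed values for various (p, V, W) suggests where the threshold between the two deterministic policies lies.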

3. Consider the average cost optimal control problem for the following con-
trolled Markov chain model with countable state space and finite action
space. The state space is the set of positive integers, {1, 2, . . .}. There are
two possible actions u1 and u2 . The transition probabilities under action u1
are given by
P_{i,i+1}(u1) = 1 , i ≥ 1 ,
and under action u2 are given by

P_{i,i}(u2) = 1 , i ≥ 1 .

The one-step costs under action u1 are given by

c(i, u1 ) = 1 , i ≥ 1 ,

and under action u2 they are given by
c(i, u2) = 1/i , i ≥ 1 .
(a) Argue that, starting from any initial probability distribution, the opti-
mal long term average cost is 0.
(b) Consider the Bellman equation for the average cost control problem,
which in general reads

J∗ + h(i) = min_u { c(i, u) + ∑_j P_{i,j}(u) h(j) } .

Write down the Bellman equation for this problem. Can you find a
solution (J ∗ , (h(i), i ≥ 1)) for the Bellman equation?
(c) Can you find a solution (J∗, (h(i), i ≥ 1)) for the Bellman equation
for which h is a bounded function (i.e. there is a finite constant K < ∞
such that |h(i)| ≤ K for all i ≥ 1)?
(d) Show that, starting from any initial distribution, the following nonsta-
tionary control strategy is optimal for the long term average cost:
If we are in state i for the first time, we use the control action u2 for i
successive times, and then use the control action u1 once.
(This will move us to state i+1 for the first time, after which we repeat
the above prescription, and so on.)
(e) Show that there is no stationary optimal Markovian control strategy
for this control problem.
Note: Our terminology in this course is that a Markovian strategy is
given by a deterministic function from the state space to the space of
actions. Further, the strategy is said to be optimal if it is optimal from
every initial distribution.
(f) Find a stationary randomized Markovian control strategy for this con-
trol problem which is optimal for the long term average cost, i.e.
achieves the optimal long term average cost of 0.
Note: A randomized Markovian control strategy is a function from the
state space to probability distributions on the set of actions.
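
As a numerical companion to part (d), note that the block the strategy spends at state i always costs i · (1/i) + 1 = 2 and lasts i + 1 steps, so the running average can be tabulated exactly. The short sketch below (details are my own) does this and shows the average decaying toward 0:

    # Running average cost of the nonstationary strategy in part (d):
    # at state i, use u2 for i steps (cost 1/i each), then u1 once (cost 1).
    total_cost, total_steps = 0.0, 0
    for i in range(1, 10001):
        total_cost += i * (1.0 / i) + 1.0    # each block costs exactly 2
        total_steps += i + 1                 # each block lasts i + 1 steps
        if i % 2000 == 0:
            print(f"after leaving state {i}: "
                  f"running average = {total_cost / total_steps:.6f}")

Summing the blocks gives a running average of 4/(n + 3) after leaving state n, which indeed tends to 0.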

4. Long run average cost MC with identification

We wish to control a finite state Markov chain to minimize the long run
average cost. We can observe the state of the Markov chain and base our
control action at each time k ≥ 0 on the entire state sequence up to and
including the state at time k. However, we are not sure what the transition
probability matrix is.
More precisely, let X ∶= {1, 2} denote the state space and let U ∶= {a, b} de-
note the set of control actions. Let Θ ∶= {θ1 , θ2 }. The underlying controlled
transition probability matrices are modeled as being P(u, θ) := [p_{ij}(u, θ)],
where i, j ∈ X , u ∈ U, and θ is either θ1 or θ2 , but we are not sure which.
We adopt a Bayesian viewpoint with our prior probability being that the two
possibilities for θ are equiprobable.
Assume that the cost we incur when in state i and taking action u does not
depend on the underlying θ and, for concreteness, is given by
c(1, a) = 1, c(2, a) = 5, c(1, b) = 0, c(2, b) = 6.
Also, for concreteness, assume that
P(a, θ1) = [ 0.5 0.5 ; 1 0 ] ,   P(a, θ2) = [ 0.9 0.1 ; 1 0 ] ,
P(b, θ1) = [ 0.8 0.2 ; 1 0 ] ,   P(b, θ2) = [ 0.2 0.8 ; 1 0 ] .

The long term average cost to be minimized is, as usual,


lim sup_{N→∞} (1/N) ∑_{k=0}^{N−1} E^g[c(X_k, U_k)],
where the minimization is over policies g and the expectation is also over
our prior distribution on the underlying parameter θ.
Explain in detail how you would solve this problem. Please write down
your answer for the specific problem at hand (i.e. with the given numerical
values). You do not need to actually find the optimal long term average
cost; just set up the specific equations (for the given numerical values) that
would need to be solved in order to find this cost.
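
Whichever setup you choose, a natural first step is to replace the unknown θ by the posterior belief ρ_k = P(θ = θ1 | history up to time k), which then becomes part of an enlarged state. As a sketch (the function and variable names are my own, and this is only the belief-update ingredient, not the full solution), the Bayes update after observing a transition i → j under action u is:

    # Bayes update of rho = P(theta = theta1 | history) after observing one
    # transition i -> j under action u, using the matrices of this problem.
    P = {  # P[(u, theta)][i][j]; states 1, 2 are indexed 0, 1 here
        ("a", 1): [[0.5, 0.5], [1.0, 0.0]],
        ("a", 2): [[0.9, 0.1], [1.0, 0.0]],
        ("b", 1): [[0.8, 0.2], [1.0, 0.0]],
        ("b", 2): [[0.2, 0.8], [1.0, 0.0]],
    }

    def update(rho, i, u, j):
        num = rho * P[(u, 1)][i][j]
        den = num + (1.0 - rho) * P[(u, 2)][i][j]
        return num / den

    # Both rows out of state 2 equal [1, 0] for every (u, theta), so only
    # transitions out of state 1 carry information about theta.
    print(update(0.5, 0, "a", 0))   # seeing 1 -> 1 under a: 0.5 -> 0.25/0.7

Note from the given matrices that transitions out of state 2 are identical under θ1 and θ2, so identification happens only through state 1; the equations you set up should reflect this.
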
5. Let X ∶= {1, 2} and U ∶= {a, b, c}. Consider the family of controlled transi-
tion probability matrices (P(u) := [p_{ij}(u)], i, j ∈ X , u ∈ U) given by
P(a) = [ 0 1 ; 1 0 ] ,   P(b) = [ 1/2 1/2 ; 1/2 1/2 ] ,   P(c) = [ 1/2 1/2 ; 1 0 ] .

We are also given a cost function c ∶ X × U → R, where c(1, u) = 1 and
c(2, u) = 0 for all u ∈ U.

(a) Determine the corresponding polytope of stationary occupation mea-
sures (recall that this is a polytope of probability distributions on X × U,
and its extreme points are precisely the stationary occupation measures
corresponding to deterministic Markov strategies).
(b) For the problem of minimizing the long term average cost with the
given cost function and controlled transition probability matrices, find
the optimal Markov control strategy using the ergodic control approach,
namely, by solving the linear program defined by the cost function on the
polytope of stationary occupation measures. (This optimal strategy
will turn out to be uniquely defined in this example.)
(c) Solve the same long term average cost problem using the correspond-
ing Bellman equation.
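
For part (b), the linear program is small enough to hand to a generic solver as a cross-check of the by-hand solution. A sketch using scipy.optimize.linprog (the use of scipy here is my own choice, not part of the problem), with variables μ(i, u) ordered as (1,a), (1,b), (1,c), (2,a), (2,b), (2,c):

    # LP over stationary occupation measures: minimize sum mu(i,u) c(i,u)
    # subject to nonnegativity, normalization, and the balance constraints
    # sum_u mu(j,u) = sum_{i,u} mu(i,u) P_ij(u) for each state j.
    import numpy as np
    from scipy.optimize import linprog

    P = {"a": np.array([[0.0, 1.0], [1.0, 0.0]]),
         "b": np.array([[0.5, 0.5], [0.5, 0.5]]),
         "c": np.array([[0.5, 0.5], [1.0, 0.0]])}
    pairs = [(0, "a"), (0, "b"), (0, "c"), (1, "a"), (1, "b"), (1, "c")]
    cost = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])   # c(1,u)=1, c(2,u)=0

    A_eq, b_eq = [], []
    for j in range(2):                     # balance constraint for state j
        A_eq.append([(1.0 if i == j else 0.0) - P[u][i, j] for i, u in pairs])
        b_eq.append(0.0)
    A_eq.append([1.0] * 6)                 # normalization: mu sums to 1
    b_eq.append(1.0)

    res = linprog(cost, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, None)] * 6)
    print(res.fun)   # optimal long term average cost
    print(res.x)     # optimal occupation measure; read off the strategy

The optimal measure returned by the solver should sit at an extreme point of the polytope, i.e. correspond to a deterministic Markov strategy, matching part (a)'s description of the extreme points.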

6. Signalling in distributed control


Consider the following finite horizon control problem, with horizon N = 2.
The state space is X = {1, 2}, action space is U = {a, b}, and observation
space is Y = {1, 2, ∗}.
The initial distribution is given by P(X0 = 1) = P(X0 = 2) = 1/2.
At time 0 the controlled transition probabilities are given by

P (X1 = 1∣X0 = 1, U0 = a) = P (X1 = 2∣X0 = 2, U0 = a) = 1,

and

P (X1 = 1∣X0 = 1, U0 = b) = P (X1 = 1∣X0 = 2, U0 = b) = 1.

At time 1 the controlled transition probabilities are given by

P (X2 = 1∣X1 = 1, U1 = a) = P (X2 = 2∣X1 = 2, U1 = a) = 1,

and

P (X2 = 2∣X1 = 1, U1 = b) = P (X2 = 1∣X1 = 2, U1 = b) = 1.

The observation at time 0 is given by Y0 = X0 , and the observation at time
1 is given by Y1 = ∗.
There is a cost L > 0 for using action b at time 0, zero cost for using the
control action a at time 0, and zero cost, whatever the control action, at time
1.
The terminal cost of ending up with X2 = 1 is K > 0, and the terminal cost
of ending up with X2 = 2 is 0.

(a) Find the optimal policy to minimize the overall expected cost (from the
given initial condition) over all policies of the type U0 = g0 (Y0 ), U1 =
g1 (Y0 , Y1 ).
(b) Find the optimal policy to minimize the overall expected cost (from the
given initial condition) over all policies of the type U0 = g0 (Y0 ), U1 =
g1 (Y1 ). This can be considered a distributed control problem, since
the controller at time 1 does not have access to the observation at time
0.
(c) Is there a signalling aspect to the optimal control in the second case
(i.e. the case of distributed control)? If so, explain what it is, in your
own words.
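
Since both policy classes are finite, a brute-force enumeration gives the exact optimal costs to check your reasoning against. The sketch below (names and the sample values of K and L are my own assumptions) enumerates class (b), where U1 can depend only on Y1 = ∗ and is therefore a constant; class (a) is the same computation with g1 allowed to depend on Y0 as well.

    from itertools import product

    K, L = 2.0, 1.0                        # assumed sample values

    def step0(x, u):                       # time-0 dynamics from the problem
        return x if u == "a" else 1        # b sends both states to 1
    def step1(x, u):                       # time-1 dynamics from the problem
        return x if u == "a" else 3 - x    # b swaps states 1 and 2

    def expected_cost(g0, u1):
        # g0 maps Y0 = X0 to an action; u1 is the constant action at time 1
        total = 0.0
        for x0 in (1, 2):                  # X0 is uniform on {1, 2}
            u0 = g0[x0]
            x2 = step1(step0(x0, u0), u1)
            total += 0.5 * ((L if u0 == "b" else 0.0)
                            + (K if x2 == 1 else 0.0))
        return total

    best = min((expected_cost({1: a1, 2: a2}, a3), (a1, a2, a3))
               for a1, a2, a3 in product("ab", repeat=3))
    print(best)   # minimal cost and the achieving (g0(1), g0(2), U1)

Varying K and L in this sketch is a quick way to see when the time-0 controller finds it worthwhile to pay for signalling.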
