Homework - 07 - 223 - Spring 2024
Homework 7
Due by 11 p.m. on Thursday, 11 April 2024.
The homework should be submitted as a scanned pdf file to ananth at berkeley
dot edu
Please retain a copy of your submitted solution for self-grading.
1. This was the last problem on Homework 6, postponed to this homework set.
Consider a controlled Markov chain with state space the set of nonnegative
integers X = {0, 1, 2, . . .} and action space U = {0, 1}. When action u = 1 is
taken the state moves from the current state i to i + 1, for all i ≥ 0, and the
cost incurred is 1. When action u = 0 is taken the state stays at the current
state i, for all i ≥ 0, and the cost 1/(1+i) is incurred.
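The following is a minimal Python sketch of these dynamics (the step function mirrors the description above; the always-stay run at the end is only an illustration, not a claim about the optimal policy):

def step(i, u):
    """One transition of the controlled chain: u = 1 moves i to i + 1 at
    cost 1; u = 0 keeps the state at i at cost 1/(1+i)."""
    if u == 1:
        return i + 1, 1.0
    return i, 1.0 / (1 + i)

# Example: always staying (u = 0) from state i0 gives average cost 1/(1+i0),
# since the state never moves.
i, total, T = 0, 0.0, 10000
for _ in range(T):
    i, c = step(i, 0)
    total += c
print(total / T)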
If it does not rain when she moves and the umbrella is at her initial location,
she has the option of taking it with her, which incurs a cost of V , because
of the inconvenience of carrying an umbrella with her even though it is not
raining.
Find the optimal strategy for whether she should take the umbrella when it
does not rain (if it happens to be at her initial location) so as to minimize
her long term average cost.
Hint: First argue that the problem can be modeled by a 4-state controlled
Markov chain with state space X = {(1, r), (1, n), (0, r), (0, n)} and action
space U = {0, 1}. Here the state (1, r) means that the umbrella is at her
current location and it is raining; (1, n) means that the umbrella is at her
current location and it is not raining; (0, r) means that the umbrella is not at
her current location and it is raining; and (0, n) means that the umbrella is
not at her current location and it is not raining. The action u = 1 corresponds
to the decision to take the umbrella if the umbrella is at the current location,
and u = 0 corresponds to the decision to not take the umbrella even though
the umbrella is at the current location. Then observe that the control action
that is taken in states (0, r) and (0, n) is irrelevant. Also observe that even
though it looks as if, for any stationary Markov policy, one has to compute
a stationary distribution on a set of size 4, this stationary distribution can
be determined in terms of one parameter in [0, 1] (and the probability of
raining, i.e. p).
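Following the hint, here is a sketch of the average-cost computation for the one-parameter family of stationary randomized policies, where q ∈ [0, 1] is the probability of taking the umbrella in state (1, n). It assumes she always takes the umbrella when it is raining and it is available, and it uses a symbol W as a stand-in for the cost, defined earlier in the problem, of making a move in the rain without the umbrella; both the parameterization by q and the symbol W are our shorthand, not part of the hint itself.

def avg_cost(q, p, V, W):
    """Long-run average cost of the policy that takes the umbrella with
    probability q in state (1, n) and always takes it in state (1, r).
    p = probability of rain on a move, V = cost of carrying the umbrella
    when it is not raining, W = stand-in for the cost of a rainy move
    without the umbrella."""
    # Whether the umbrella is at her current location is itself a 2-state
    # chain: from "here" it stays "here" w.p. p + (1-p)*q and becomes
    # "elsewhere" w.p. (1-p)*(1-q); from "elsewhere" she walks to where it
    # is, so it is "here" again w.p. 1. Solving the balance equations:
    pi1 = 1.0 / (1.0 + (1 - p) * (1 - q))  # stationary P(umbrella is with her)
    pi0 = 1.0 - pi1                        # stationary P(umbrella elsewhere)
    # V is paid when she carries it needlessly; W when she is caught out.
    return pi1 * (1 - p) * q * V + pi0 * p * W

# Comparing the two deterministic choices for some illustrative numbers:
p, V, W = 0.3, 1.0, 5.0
print(avg_cost(0.0, p, V, W))  # q = 0: never take it when it is not raining
print(avg_cost(1.0, p, V, W))  # q = 1: always take it

Note that the resulting expression is a ratio of two affine functions of q, hence monotone on [0, 1], which suggests why comparing the two endpoint policies suffices.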
3. Consider the average cost optimal control problem for the following con-
trolled Markov chain model with countable state space and finite action
space. The state space is the set of positive integers, {1, 2, . . .}. There are
two possible actions u1 and u2 . The transition probabilities under action u1
are given by
P_{i,i+1}(u1) = 1, i ≥ 1,
and under action u2 are given by
P_{i,i}(u2) = 1, i ≥ 1.
The one-step costs under action u1 are given by
c(i, u1) = 1, i ≥ 1,
and under action u2 they are given by
c(i, u2) = 1/i, i ≥ 1.
(a) Argue that, starting from any initial probability distribution, the opti-
mal long term average cost is 0.
(b) Consider the Bellman equation for the average cost control problem,
which in general reads
J^* + h(i) = min_u [ c(i, u) + Σ_j P_{ij}(u) h(j) ], for all states i.
Write down the Bellman equation for this problem. Can you find a
solution (J^*, (h(i), i ≥ 1)) for the Bellman equation?
(c) Can you find a solution (J^*, (h(i), i ≥ 1)) for the Bellman equation
for which h is a bounded function (i.e. there is a finite constant K < ∞
such that |h(i)| ≤ K for all i ≥ 1)?
(d) Show that, starting from any initial distribution, the following nonsta-
tionary control strategy is optimal for the long term average cost:
If we are in state i for the first time, we use the control action u2 for i
successive times, and then use the control action u1 once.
(This will move us to state i+1 for the first time, after which we repeat
the above prescription, and so on. A simulation sketch of this strategy is
given after part (f).)
(e) Show that there is no stationary optimal Markovian control strategy
for this control problem.
Note: Our terminology in this course is that a Markovian strategy is
given by a deterministic function from the state space to the space of
actions. Further, the strategy is said to be optimal if it is optimal from
every initial distribution.
(f) Find a stationary randomized Markovian control strategy for this con-
trol problem which is optimal for the long term average cost, i.e.
achieves the long term average optimal cost 0.
Note: A randomized Markovian control strategy is a function from the
state space to probability distributions on the set of actions.
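As a numerical sanity check (not a substitute for the argument asked for in part (d)), the nonstationary strategy can be simulated; the running average cost visibly decays toward 0 as the horizon grows:

def simulate(T):
    """Average cost over T steps of the part-(d) strategy: on reaching
    state i for the first time, play u2 (cost 1/i) i times, then play u1
    (cost 1) once, which moves the chain to i + 1."""
    i, remaining_u2 = 1, 1  # start in state 1, with u2 to be played once
    total = 0.0
    for _ in range(T):
        if remaining_u2 > 0:
            total += 1.0 / i   # action u2: stay at i
            remaining_u2 -= 1
        else:
            total += 1.0       # action u1: move to i + 1
            i += 1
            remaining_u2 = i   # play u2 i times at the new state
    return total / T

for T in (10**3, 10**4, 10**5):
    print(T, simulate(T))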
4. We wish to control a finite state Markov chain to minimize the long run
average cost. We can observe the state of the Markov chain and base our
control action at each time k ≥ 0 on the entire state sequence up to and
including the state at time k. However, we are not sure what the transition
probability matrix is.
More precisely, let X ∶= {1, 2} denote the state space and let U ∶= {a, b} de-
note the set of control actions. Let Θ ∶= {θ1 , θ2 }. The underlying controlled
transition probability matrices are modeled as being P(u, θ) := [p_{ij}(u, θ)],
where i, j ∈ X , u ∈ U, and θ is either θ1 or θ2 , but we are not sure which.
We adopt a Bayesian viewpoint with our prior probability being that the two
possibilities for θ are equiprobable.
Assume that the cost we incur when in state i and taking action u does not
depend on the underlying θ and, for concreteness, is given by
c(1, a) = 1, c(2, a) = 5, c(1, b) = 0, c(2, b) = 6.
Also, for concreteness, assume that
P(a, θ1) = [ 0.5 0.5 ; 1 0 ],   P(a, θ2) = [ 0.9 0.1 ; 1 0 ],
P(b, θ1) = [ 0.8 0.2 ; 1 0 ],   P(b, θ2) = [ 0.2 0.8 ; 1 0 ],
where each 2 × 2 matrix is written row by row, with the rows separated by a
semicolon.
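To make the Bayesian viewpoint concrete, the posterior on θ after each observed transition updates by Bayes' rule. A minimal sketch using the matrices above (the function belief_update and the examples at the end are ours, for illustration only):

# Transition matrices from the problem data, indexed by (action, theta).
P = {
    ('a', 1): [[0.5, 0.5], [1.0, 0.0]],
    ('a', 2): [[0.9, 0.1], [1.0, 0.0]],
    ('b', 1): [[0.8, 0.2], [1.0, 0.0]],
    ('b', 2): [[0.2, 0.8], [1.0, 0.0]],
}

def belief_update(belief, i, u, j):
    """Posterior probability of theta1 after observing the transition
    i -> j under action u, starting from prior P(theta = theta1) = belief.
    States are 1 and 2; matrix rows and columns are 0-indexed."""
    l1 = P[(u, 1)][i - 1][j - 1] * belief        # weight of theta1
    l2 = P[(u, 2)][i - 1][j - 1] * (1 - belief)  # weight of theta2
    return l1 / (l1 + l2)

# From the equiprobable prior, seeing 1 -> 1 under action b favors theta1:
print(belief_update(0.5, 1, 'b', 1))  # 0.8 * 0.5 / (0.8 * 0.5 + 0.2 * 0.5) = 0.8
# Transitions out of state 2 are identical under both theta, so they are
# uninformative and the belief is unchanged:
print(belief_update(0.5, 2, 'a', 1))  # stays 0.5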
We are also given a cost function c ∶ X × U → R, where c(1, u) = 1 and
c(2, u) = 0 for all u ∈ U.
The observation at time 0 is given by Y0 = X0 , and the observation at time
1 is given by Y1 = ∗.
There is a cost L > 0 for using action b at time 0, zero cost for using the
control action a at time 0, and zero cost, whatever the control action, at time
1.
The terminal cost of ending up with X2 = 1 is K > 0, and the terminal cost
of ending up with X2 = 2 is 0.
(a) Find the optimal policy to minimize the overall expected cost (from the
given initial condition) over all policies of the type U0 = g0 (Y0 ), U1 =
g1 (Y0 , Y1 ).
(b) Find the optimal policy to minimize the overall expected cost (from the
given initial condition) over all policies of the type U0 = g0 (Y0 ), U1 =
g1 (Y1 ). This can be considered a distributed control problem, since
the controller at time 1 does not have access to the observation at time
0.
(c) Is there a signalling aspect to the optimal control in the second case
(i.e. the case of distributed control)? If so, explain what it is, in your
own words.
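For a problem of this size, the two policy classes in (a) and (b) can be compared by brute-force enumeration. The sketch below uses hypothetical placeholders p0, P1, P2 for the initial law and the two controlled transition kernels (substitute the ones specified in the problem) and illustrative numbers for L and K; it exploits the fact that Y1 = ∗ carries no information, so in case (a) the time-1 action may depend on Y0 = X0, while in case (b) it must be a constant.

import itertools

STATES, ACTIONS = (1, 2), ('a', 'b')

# Hypothetical placeholders -- substitute the data from the problem:
p0 = {1: 0.5, 2: 0.5}                   # P(X0 = i), placeholder
def P1(i, u): return {1: 0.5, 2: 0.5}   # P(X1 = . | X0 = i, U0 = u), placeholder
def P2(i, u): return {1: 0.5, 2: 0.5}   # P(X2 = . | X1 = i, U1 = u), placeholder
L, K = 1.0, 3.0                         # illustrative values only

def cost(g0, g1):
    """Expected cost of U0 = g0[X0], U1 = g1[X0]: the cost L is paid when
    U0 = b, and the terminal cost K is paid when X2 = 1."""
    total = 0.0
    for x0 in STATES:
        u0 = g0[x0]
        stage = L if u0 == 'b' else 0.0
        for x1, q1 in P1(x0, u0).items():
            for x2, q2 in P2(x1, g1[x0]).items():
                total += p0[x0] * q1 * q2 * (stage + (K if x2 == 1 else 0.0))
    return total

maps = [dict(zip(STATES, c)) for c in itertools.product(ACTIONS, repeat=2)]
consts = [{1: u, 2: u} for u in ACTIONS]       # case (b): g1 ignores X0
best_a = min(cost(g0, g1) for g0 in maps for g1 in maps)
best_b = min(cost(g0, g1) for g0 in maps for g1 in consts)
print(best_a, best_b)  # case (b) can never beat case (a)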