EE 223 Spring 2024

Homework 6
Due by 11 p.m. on Monday, 18 March 2024.
The homework should be submitted as a scanned pdf file to ananth at berkeley
dot edu
Please retain a copy of your submitted solution for self-grading.

1. This was the last problem on Homework 5, postponed to this homework set.

Let X ∶= {1, . . . , d} be a finite state space, U a finite set of control actions, and ([p_ij(u)], u ∈ U) a family of controlled transition probability matrices on X.
Let 0 < β < 1 be a given discount factor. We consider the discounted dy-
namic programming problem of minimizing the overall expected β-discounted
cost starting from each initial state, where the one-step cost incurred when
being in state i and using control action u is c(i, u), for some given collec-
tion of real numbers (c(i, u), i ∈ X , u ∈ U).
Let V∗ ∶ X → R be the corresponding optimal β-discounted costs, which we can think of as a vector of length d, indexed by the initial state. Thus V∗ is the fixed point of the Bellman equation associated to the β-discounted control problem.
Consider the following variant of the value iteration algorithm. Given a
function V ∶ X → R, define the function S(V ) ∶ X → R via

S(V)(i) ∶= min_u [ c(i, u) + β ∑_{j≠i} p_ij(u) V(j) ] / ( 1 − β p_ii(u) ),   for all i ∈ X.

We then consider the sequence of iterates (S^k(V), k ≥ 0), where S^0(V) ∶= V and S^k(V) ∶= S(S^{k−1}(V)) for k ≥ 1.
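For concreteness, here is a minimal numerical sketch of how one might implement the operator S and run the iteration. The array layout (P[u, i, j] = p_ij(u), costs c[i, u]) and the small randomly generated test model are assumptions made purely for illustration; they are not part of the problem.

```python
import numpy as np

def S(V, P, c, beta):
    """One application of the modified value-iteration operator S.

    P : shape (num_actions, d, d), with P[u, i, j] = p_ij(u)
    c : shape (d, num_actions),    with c[i, u] the one-step cost
    V : shape (d,), the current value estimate
    """
    num_actions, d, _ = P.shape
    SV = np.empty(d)
    for i in range(d):
        candidates = []
        for u in range(num_actions):
            off_diag = P[u, i] @ V - P[u, i, i] * V[i]  # sum over j != i of p_ij(u) V(j)
            candidates.append((c[i, u] + beta * off_diag) / (1.0 - beta * P[u, i, i]))
        SV[i] = min(candidates)
    return SV

# Iterate S on a small randomly generated model, just to watch the iterates settle.
rng = np.random.default_rng(0)
d, num_actions, beta = 4, 2, 0.9
P = rng.random((num_actions, d, d))
P /= P.sum(axis=2, keepdims=True)   # normalize each row into a probability vector
c = rng.random((d, num_actions))
V = np.zeros(d)
for _ in range(200):
    V = S(V, P, c, beta)
print(V)  # after many iterations, V is numerically a fixed point of S
```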

(a) Show that

lim_{k→∞} ∥S^k(V) − V∗∥_∞ = 0,

for all V ∶ X → R. Here ∥⋅∥_∞ denotes the L∞ norm on real-valued functions on X.

(b) Find the best ρ > 0 that you can such that we have

∥S^k(V) − V∗∥_∞ ≤ ρ^k ∥V − V∗∥_∞   for all V ∶ X → R and all k ≥ 0.

Here ρ will depend on β.

2. Let 0 < β < 1 be a discount factor.


Let X = {0, 1, 2}, U = {a, b} and Y = {y0 , y1 } be the state space, the action
space, and the space of observations respectively.
Consider the controlled transition probability matrices
P(a) = [P_ij(a)] =
    [ 1/2  1/2   0  ]
    [  0   1/2  1/2 ]
    [ 1/2   0   1/2 ] ,

P(b) = [P_ij(b)] =
    [ 0  1  0 ]
    [ 0  0  1 ]
    [ 1  0  0 ] ,

where the rows and columns of each matrix are enumerated by states in the
order 0, 1, 2.
Let c(i, u), for i ∈ X and u ∈ U, denote the cost of taking action u when in state i.
The observation at each time is a noisy function of the current state, given
by
p(y0 | 0) = 1,   p(y1 | 0) = 0,   p(y0 | 1) = 1/2,   p(y1 | 1) = 1/2,   p(y0 | 2) = 1/4,   p(y1 | 2) = 3/4.
To be completely precise, the framework is that of our usual control prob-
lem. Namely, (Xk , k ≥ 0) denotes the state process, (Uk , k ≥ 0) the control
process, and (Yk , k ≥ 0) the observation process, with the evolution given
by
P(X_{k+1} = j | X_k = i, U_k = u, X_0^{k−1}, U_0^{k−1}, Y_0^k) = P_ij(u),
and the observation given by

P(Y_k = y | X_k = i, X_0^{k−1}, U_0^{k−1}, Y_0^{k−1}) = p(y | i).

Also, we have written X_0^{k−1} for the sequence (X_0, . . . , X_{k−1}), interpreted to be the empty sequence when k = 0, and similarly for U_0^{k−1}, Y_0^{k−1}, etc.

The problem we wish to solve is the partially observed discounted control
problem

Minimize_g  E^g [ ∑_{k=0}^∞ β^k c(X_k, U_k) ],

where the minimization is over all strategies g = (g_k, k ≥ 0) with U_k = g_k(Y_0, . . . , Y_k), and, as usual, E^g denotes that we are taking expectations when the strategy g is in effect.
Write down the Bellman equation that allows you to solve this problem.
Your answer should be sufficiently explicit to make it clear how one could
incorporate the specific form of the controlled transition probability matri-
ces and the probability of the observation at each time given the current
state. The discount factor β and the per-step costs c(i, u) can be treated as
unspecified variables.
Your answer should also explain briefly the intuitive reasoning behind the
Bellman equation you wrote down. There is no need to give a formal proof.
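One ingredient you may find useful when making the answer explicit is the conditional distribution of the current state given the observations and actions so far, and how it is updated by one action/observation pair. The sketch below only shows how the specific matrices P(a), P(b) and the observation probabilities p(y|i) would enter such an update; the choice to work with this conditional distribution, and the function name, are assumptions of this sketch rather than part of the problem statement.

```python
import numpy as np

# Transition matrices and observation likelihoods from the problem statement,
# with states ordered 0, 1, 2 and observations ordered (y0, y1).
P = {
    "a": np.array([[0.5, 0.5, 0.0],
                   [0.0, 0.5, 0.5],
                   [0.5, 0.0, 0.5]]),
    "b": np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]]),
}
obs = np.array([[1.0, 0.0],      # p(y0|0), p(y1|0)
                [0.5, 0.5],      # p(y0|1), p(y1|1)
                [0.25, 0.75]])   # p(y0|2), p(y1|2)

def belief_update(pi, u, y):
    """Conditional distribution of the next state after taking action u and seeing y.

    pi : length-3 array, current conditional distribution of the state.
    u  : "a" or "b".
    y  : 0 for y0, 1 for y1.
    """
    predicted = pi @ P[u]                 # distribution of the next state before observing
    unnormalized = predicted * obs[:, y]  # multiply by the observation likelihood p(y | j)
    return unnormalized / unnormalized.sum()

# Example: start from a uniform belief, take action a, observe y1.
print(belief_update(np.ones(3) / 3, "a", 1))
```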

3. Consider the controlled Markov chain model with state space X ∶= {1, 2, 3},
action space U ∶= {a, b}, and transition probability matrices
[P_ij(a)] =
    [  0   1/2  1/2 ]
    [  1    0    0  ]
    [ 1/2  1/2   0  ] ,

[P_ij(b)] =
    [  0   1/2  1/2 ]
    [  0    0    1  ]
    [ 1/2  1/2   0  ] .
Note that the transition probabilities from states 1 and 3 do not depend on
the control choice. Suppose the one-step costs are given by:

c(1, a) = c(1, b) = 10,
c(2, a) = c(2, b) = 0,
c(3, a) = c(3, b) = 10.

Let 0 < β < 1 be the discount factor. In the value-iteration algorithm for
finding the optimal control strategy, we start with an initial function V^(0) on the state space and form the sequence of iterates (V^(n), n ≥ 0) by letting V^(n+1) = T V^(n), where

T V(i) ∶= min_u { c(i, u) + β ∑_j P_ij(u) V(j) }.

Let µ^(n) denote a minimizer at the n-th step of value iteration, i.e. for each i ∈ X, µ^(n)(i) satisfies

V^(n+1)(i) = c(i, µ^(n)(i)) + β ∑_j P_ij(µ^(n)(i)) V^(n)(j).

We proved in class that, during value iteration from an arbitrary initial function for a finite-state, finite-action-space discounted-cost optimal control problem, there is a finite N such that for all n ≥ N the strategy µ^(n) is an optimal control strategy for the problem. In this example, show that if V^(0) is such that V^(0)(1) ≠ V^(0)(3), then the sequence (µ^(n), n ≥ 0) will not converge. Thus, even though value iteration eliminates all non-optimal stationary Markov strategies in finitely many steps, the sequence of stationary Markov control strategies it proposes need not converge in general.
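To experiment with this numerically before writing the argument, one can run a few steps of value iteration and record a minimizer at each step. The sketch below does this for the matrices and costs above; the particular value of β and the initial function V^(0) are arbitrary choices for illustration, not specified by the problem.

```python
import numpy as np

# Transition matrices and one-step costs from the problem statement
# (states 1, 2, 3 stored at indices 0, 1, 2; actions a, b at indices 0, 1).
P = np.array([[[0.0, 0.5, 0.5],
               [1.0, 0.0, 0.0],
               [0.5, 0.5, 0.0]],   # P(a)
              [[0.0, 0.5, 0.5],
               [0.0, 0.0, 1.0],
               [0.5, 0.5, 0.0]]])  # P(b)
c = np.array([[10.0, 10.0],
              [0.0, 0.0],
              [10.0, 10.0]])
beta = 0.9                          # any fixed discount factor in (0, 1)

V = np.array([5.0, 0.0, 0.0])       # an initial function with V(1) != V(3)
for n in range(10):
    Q = c + beta * np.einsum("uij,j->iu", P, V)  # Q[i, u] = c(i, u) + beta * sum_j P_ij(u) V(j)
    mu = Q.argmin(axis=1)                        # a minimizer mu^(n) at this step
    V = Q.min(axis=1)                            # V^(n+1) = T V^(n)
    print(n, mu)                                 # watch whether the chosen actions settle down
```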

4. Let µ1 and µ2 define stationary Markov policies in a finite-state, finite-control-space discounted dynamic programming problem with one-step costs c(i, u) and state transition probabilities p_ij(u). Thus µ1 and µ2 are functions from the state space X to the set of controls U. We denote the discount factor by 0 < β < 1.

(a) Let µ3 denote a stationary Markov policy that, when in state i, chooses
the action u to minimize

c(i, u) + β ∑_j p_ij(u) min{ W_∞^µ1(j), W_∞^µ2(j) },

where W_∞^µ denotes the overall expected discounted cost when the stationary Markov control strategy µ is in effect. Show that

W_∞^µ3 ≤ min{ W_∞^µ1, W_∞^µ2 },

(the inequality is meant to hold coordinatewise, i.e. state by state, as usual).

(b) Let µ4 be defined by

µ4(i) = { µ1(i)   if W_∞^µ1(i) ≤ W_∞^µ2(i),
        { µ2(i)   if W_∞^µ2(i) < W_∞^µ1(i).

Show that
W_∞^µ4 ≤ min{ W_∞^µ1, W_∞^µ2 }.
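Numerically, W_∞^µ for a fixed stationary Markov policy µ is the solution of a linear system, since it satisfies W_∞^µ = c_µ + β P_µ W_∞^µ, where c_µ(i) = c(i, µ(i)) and P_µ is the transition matrix under µ. Below is a minimal sketch of that computation, assuming the same array layout as in the earlier sketches (P[u, i, j] = p_ij(u), costs c[i, u]); the helper name policy_cost is just a made-up label.

```python
import numpy as np

def policy_cost(mu, P, c, beta):
    """Discounted cost vector W_inf^mu of a stationary Markov policy mu.

    mu : length-d integer array, mu[i] = index of the action used in state i.
    P  : shape (num_actions, d, d), P[u, i, j] = p_ij(u).
    c  : shape (d, num_actions), one-step costs c[i, u].
    Solves the linear system W = c_mu + beta * P_mu W.
    """
    d = len(mu)
    idx = np.arange(d)
    P_mu = P[mu, idx]   # row i is the transition row under action mu[i]
    c_mu = c[idx, mu]   # one-step cost in state i under action mu[i]
    return np.linalg.solve(np.eye(d) - beta * P_mu, c_mu)
```

With this in hand, one can tabulate W_∞^µ1 and W_∞^µ2 on small examples and sanity-check the inequalities claimed for µ3 and µ4 before writing the proofs.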

5. Consider a controlled Markov chain with state space the set of nonnegative
integers X = {0, 1, 2, . . .} and action space U = {0, 1}. When action u = 1 is
taken the state moves from the current state i to i + 1, for all i ≥ 0, and the
cost incurred is 1. When action u = 0 is taken the state stays at the current
state i, for all i ≥ 0, and the cost incurred is 1/(1 + i).
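Before attempting the parts below, it can help to compare a couple of simple strategies numerically. The sketch that follows evaluates "stay forever at i" against "move m steps and then stay"; these two comparators and the particular values of β, i and m are only illustrative choices, and neither strategy is claimed to be optimal.

```python
def stay_forever(i, beta):
    # Discounted cost of taking u = 0 forever from state i: sum_k beta^k / (1 + i).
    return 1.0 / ((1 + i) * (1 - beta))

def move_then_stay(i, m, beta):
    # Pay cost 1 for m steps while moving from i to i + m, then stay there forever.
    moving = sum(beta ** k for k in range(m))
    return moving + beta ** m * stay_forever(i + m, beta)

for beta in (0.5, 0.9, 0.99):
    i = 3
    print(beta,
          round(stay_forever(i, beta), 3),
          [round(move_then_stay(i, m, beta), 3) for m in (1, 5, 20)])
```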

(a) Consider the problem of choosing a control strategy to minimize the long term average cost. Show that the optimal long term average cost is 0.
(b) Show that for every discount factor 0 < β < 1, if i is large enough then
the optimal action to take in state i for the purpose of minimizing the
overall β-discounted cost is the action u = 0.
(c) Show that for every state i, if the discount factor 0 < β < 1 is suffi-
ciently close to 1, then the optimal action to take in state i for the pur-
pose of minimizing the overall β-discounted cost is the action u = 1.
(d) Conclude that there is no Blackwell optimal strategy in this problem.
