
Reinforcement Learning (E1 277)

- Assignment 02 -

1. Consider the MDP M ≡ (S, A, P, r, γ) with |S| = S and |A| = A. Suppose µ is a stochastic
policy and Φ ∈ R^{S×d} a feature matrix for some d ≥ 1. Let Pµ be the S × S matrix given by

        Pµ(s′|s) = Σ_a µ(a|s) P(s′|s, a).

This is the transition matrix of the Markov chain (S, Pµ) induced by µ.
Suppose this Markov chain is ergodic, so that it has a unique stationary distribution, which
we denote by dµ. Let Dµ be the S × S diagonal matrix whose diagonal is dµ, and let

        A = Φ⊤ Dµ (I − γPµ) Φ   and   b = Φ⊤ Dµ rµ,

where rµ(s) = Σ_a µ(a|s) r(s, a). Let Π : R^S → R^S be given by ΠJ = Φ(Φ⊤ Dµ Φ)⁻¹ Φ⊤ Dµ J.
Show that θ∗ := A⁻¹ b is the fixed point of the projected Bellman operator, i.e.,
Π Tµ Φθ∗ = Φθ∗, where Tµ : R^S → R^S is the Bellman operator satisfying Tµ J = rµ + γPµ J. [05]
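
For intuition, the fixed-point identity can be checked numerically. The sketch below (a minimal Python illustration, using a randomly generated MDP, policy, and feature matrix, none of which are part of the assignment) forms A, b, Π, and Tµ as defined above and verifies that Π Tµ Φθ∗ and Φθ∗ agree up to floating-point error.

```python
# Numerical sanity check for Question 1 (a sketch, not a proof): for a small,
# hypothetical randomly generated MDP, policy, and feature matrix, verify that
# theta* = A^{-1} b satisfies Pi T_mu (Phi theta*) = Phi theta*.
import numpy as np

rng = np.random.default_rng(0)
S, A, d, gamma = 6, 3, 2, 0.9

P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)    # P(s'|s,a)
r = rng.random((S, A))                                          # r(s,a)
mu = rng.random((S, A)); mu /= mu.sum(axis=1, keepdims=True)    # mu(a|s)
Phi = rng.random((S, d))                                        # feature matrix

# Induced chain, induced reward, and stationary distribution of (S, P_mu).
P_mu = np.einsum('sa,sat->st', mu, P)
r_mu = (mu * r).sum(axis=1)
evals, evecs = np.linalg.eig(P_mu.T)
d_mu = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
d_mu = d_mu / d_mu.sum()
D_mu = np.diag(d_mu)

A_mat = Phi.T @ D_mu @ (np.eye(S) - gamma * P_mu) @ Phi
b_vec = Phi.T @ D_mu @ r_mu
theta_star = np.linalg.solve(A_mat, b_vec)

Proj = Phi @ np.linalg.solve(Phi.T @ D_mu @ Phi, Phi.T @ D_mu)  # projection Pi

def T_mu(J):
    """Bellman operator T_mu J = r_mu + gamma * P_mu J."""
    return r_mu + gamma * P_mu @ J

lhs = Proj @ T_mu(Phi @ theta_star)
rhs = Phi @ theta_star
print("max |Pi T_mu Phi theta* - Phi theta*| =", np.max(np.abs(lhs - rhs)))
```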

2. Let Tµ be the Bellman operator defined in the above question. Show that Tµ is a γ-contraction
with respect to the weighted norm ∥ · ∥_{Dµ}. [05]
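
As a quick sanity check (not a proof), the sketch below draws random vectors J1, J2 for an assumed random MDP and policy and compares ∥Tµ J1 − Tµ J2∥_{Dµ} with γ∥J1 − J2∥_{Dµ}, where ∥x∥_{Dµ} = (x⊤ Dµ x)^{1/2}.

```python
# Minimal numerical illustration of the gamma-contraction in Question 2,
# under an assumed random MDP and policy (not specified in the assignment).
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma = 6, 3, 0.9

P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)    # P(s'|s,a)
r = rng.random((S, A))                                          # r(s,a)
mu = rng.random((S, A)); mu /= mu.sum(axis=1, keepdims=True)    # mu(a|s)

P_mu = np.einsum('sa,sat->st', mu, P)
r_mu = (mu * r).sum(axis=1)
evals, evecs = np.linalg.eig(P_mu.T)
d_mu = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
d_mu = d_mu / d_mu.sum()                                        # stationary distribution

def weighted_norm(x):
    """||x||_{D_mu} = sqrt(x^T D_mu x)."""
    return np.sqrt(np.sum(d_mu * x ** 2))

def T_mu(J):
    return r_mu + gamma * P_mu @ J

for _ in range(5):
    J1, J2 = rng.normal(size=S), rng.normal(size=S)
    lhs = weighted_norm(T_mu(J1) - T_mu(J2))
    rhs = gamma * weighted_norm(J1 - J2)
    print(f"{lhs:.6f} <= {rhs:.6f} : {lhs <= rhs + 1e-12}")
```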

3. We have so far discussed infinite-horizon reinforcement learning with exponential discounting.


In practice, one is also interested in quasi-hyperbolic discounting. In this case, we have two
parameters σ, γ ∈ [0, 1), and the value function Jµ ∈ R^{|S|} of a stationary (possibly stochastic)
policy µ is given by
        Jµ(s) = E[ Σ_{n=0}^{∞} dn g(sn, an, sn+1) | s0 = s ],

where
        dn = 1       if n = 0,
        dn = σγ^n    if n ≥ 1;
further, an ∼ µ(·|sn ) and sn+1 ∼ P(·|sn , an ) for n ≥ 0.
Answer the following questions.

(a) Does there exist a Bellman-type relation for Jµ ? [05]


(b) From the above relation, can you identify the Bellman operator Tµ ? Is this operator a
contraction? [02]

(c) Suppose the transition matrix P is unknown. Can you design a model-free algorithm to
estimate Jµ? You can presume that you can sample from the invariant distribution of
the Markov chain induced by µ. Discuss the almost sure convergence of this algorithm. You
may directly use the results that were discussed in class. [08]
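
To make the quasi-hyperbolic return concrete, the sketch below estimates Jµ(s) for an assumed small random MDP, policy, and reward g by averaging truncated sample returns Σ_n dn g(sn, an, sn+1). It merely illustrates the definition in Question 3; it is not the model-free algorithm or convergence analysis asked for in part (c).

```python
# Sketch: Monte Carlo evaluation of the quasi-hyperbolic return, with an
# assumed random MDP, policy mu, and reward g (none specified in the
# assignment), using d_0 = 1 and d_n = sigma * gamma**n for n >= 1.
import numpy as np

rng = np.random.default_rng(2)
S, A = 4, 2
sigma, gamma = 0.7, 0.9
horizon, episodes = 100, 2000          # truncation length and sample size

P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)   # P(.|s,a)
mu = rng.random((S, A)); mu /= mu.sum(axis=1, keepdims=True)   # mu(.|s)
g = rng.random((S, A, S))                                      # g(s,a,s')

def sample_return(s0):
    """One truncated sample of sum_n d_n g(s_n, a_n, s_{n+1}) from s_0 = s0."""
    s, total = s0, 0.0
    for n in range(horizon):
        a = rng.choice(A, p=mu[s])
        s_next = rng.choice(S, p=P[s, a])
        d_n = 1.0 if n == 0 else sigma * gamma ** n
        total += d_n * g[s, a, s_next]
        s = s_next
    return total

s0 = 0
estimate = np.mean([sample_return(s0) for _ in range(episodes)])
print(f"Monte Carlo estimate of J_mu({s0}) ~= {estimate:.4f}")
```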

4. As in the proof of [1, Theorem 2], let

        f̄(y) = (γ D P Π_{π_y} − D) y + D r,

        f(z) = (γ D P Π_{π_{Q∗}} − D) z + D r.

Show that f̄ is quasi-monotone increasing. Additionally, show that f(y) ≤ f̄(y) for all y.
Finally, show that the origin is the globally asymptotically stable equilibrium for the ODE

        ż(t) = f(z(t)).

Use this to conclude that the solution trajectory of the noiseless Q-learning ODE is
asymptotically lower bounded by the zero vector. [05]
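
The sketch below builds the matrices D, P, and Π_{π_{Q∗}} for a small randomly generated MDP, following the construction in [1] (with an assumed positive state-action visit distribution on the diagonal of D), and checks numerically that γ D P Π_{π_{Q∗}} − D is Hurwitz, which is the linear-algebra fact behind the asymptotic stability argument for ż = f(z). It is an illustration under these assumptions, not the requested proof.

```python
# Sketch related to Question 4: for an assumed random MDP, build D, P, and
# Pi_{pi_Q*} as in [1] and check that gamma * D P Pi_{pi_Q*} - D is Hurwitz
# (all eigenvalues have negative real part).
import numpy as np

rng = np.random.default_rng(3)
nS, nA, gamma = 4, 3, 0.9

P3 = rng.random((nS, nA, nS)); P3 /= P3.sum(axis=2, keepdims=True)  # P(s'|s,a)
r = rng.random((nS, nA))                                            # r(s,a)

# Q* via value iteration, then the greedy policy pi_{Q*}.
Q = np.zeros((nS, nA))
for _ in range(2000):
    Q = r + gamma * P3 @ Q.max(axis=1)
pi_star = Q.argmax(axis=1)

# Stack state-action pairs with index (s, a) -> s * nA + a.
P = P3.reshape(nS * nA, nS)                     # |S||A| x |S| transition matrix
d = rng.random(nS * nA); d /= d.sum()           # assumed positive visit distribution
D = np.diag(d)
Pi = np.zeros((nS, nS * nA))                    # (Pi q)(s) = q(s, pi_Q*(s))
Pi[np.arange(nS), np.arange(nS) * nA + pi_star] = 1.0

M = gamma * D @ P @ Pi - D
print("max real part of eigenvalues:", np.max(np.real(np.linalg.eigvals(M))))
```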

References

[1] Donghwan Lee and Niao He. A unified switching system perspective and ODE analysis of
    Q-learning algorithms. arXiv preprint arXiv:1912.02270, 2019.
