HW 2
• Questions 1 to 4 are practice problems that should help you get familiar with the content in the course.
• Only Questions 5 and 6 (coding problems) will be graded. You will be required to submit the .ipynb file (IPython Notebook) or .py file(s) via Moodle. The submission deadline is 11:59 PM on 25th September.
1. Consider the MDP M = ⟨S, A, r, p, γ⟩, where S, A denote the state and action spaces, respectively, r : S × A → R is the reward function, p(s′|s, a) is the transition probability, and γ ∈ (0, 1) is the discount factor. The objective is to determine a policy π : S → A that maximizes the value function V_π(s), i.e.,

   max_π V_π(s) = E_{p_π} [ Σ_{t=0}^{∞} γ^t r(s_t, a_t) | s_0 = s ].

Write a linear programming formulation to determine the optimal value function V∗(s) for all s that solves the above optimization problem. Prove that this linear program indeed determines V∗(s) ∀ s ∈ S.
2. Consider the MDP M = ⟨S, A, c, p⟩, where S, A denote the state and action spaces, respectively, c : S × A → R is the cost function, and p(s′|s, a) is the transition probability. Consider the case where the state space contains a zero-cost termination state δ, i.e., c(δ, ·) = 0 and p(δ|δ, ·) = 1. The corresponding optimal value function V∗(s) is given as

   V∗(s) = min_π E_{p_π} [ Σ_{t=0}^{∞} c(s_t, π(s_t)) | s_0 = s ],

where the associated recursive Bellman equation that V∗(s) satisfies is given by

   V∗(s) = min_a [ c(s, a) + Σ_{s′} p(s′|s, a) V∗(s′) ],

where the bracketed quantity is defined as Q∗(s, a).
From the above definition, it is clear that Q∗(s, a) satisfies the following recursive Bellman equation

   Q∗(s, a) = c(s, a) + Σ_{s′} p(s′|s, a) min_{a′} Q∗(s′, a′) =: (H(Q∗))(s, a),

i.e., Q∗ is a fixed point of the map H defined by (H(Q))(s, a) := c(s, a) + Σ_{s′} p(s′|s, a) min_{a′} Q(s′, a′).
Show that the map H is a contraction map in a ξ-weighted norm ∥·∥_ξ, where ∥Q∥_ξ := max_{s,a} |Q(s, a)| / ξ_s for some ξ ∈ R^{|S|} with ξ > 0 componentwise. Recall that we did such a proof in class to show that the map corresponding to the value function is a contraction.
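(As a reminder of what needs to be established: H is a contraction in ∥·∥_ξ if there exists a modulus β ∈ [0, 1) such that ∥H(Q_1) − H(Q_2)∥_ξ ≤ β ∥Q_1 − Q_2∥_ξ for all Q_1, Q_2.)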
3. What is the difference between the following?
1. On-policy and off-policy algorithms.
2. Model-based and model-free RL algorithms.
3. TD learning and Q-learning.
Give examples of each type of algorithm above.
4. Consider the scenario where the action a_{t+1} is sampled from an ϵ-greedy policy. Are SARSA and Q-learning exactly the same in this case? If not, for which choice of policy for a_{t+1} are SARSA and Q-learning identical?
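For reference, the standard tabular updates (with step size α) are

   SARSA:       Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_t + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t) ],
   Q-learning:  Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_t + γ max_{a′} Q(s_{t+1}, a′) − Q(s_t, a_t) ],

where SARSA bootstraps with the action a_{t+1} actually selected by the behaviour policy, while Q-learning always bootstraps with the greedy action.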
5. (10 points) Coding Problem: Consider the Gridworld problem from Homework 1. Write code that uses the linear programming formulation to solve for the optimal value function. Print the optimal value function that you obtain and compare it with the solution given by the value iteration method from Homework 1.
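A minimal sketch of one way to set this up with scipy.optimize.linprog is given below. The arrays P (shape |S| × |A| × |S|), R (shape |S| × |A|), and the discount gamma are placeholders that you would build from your own Homework 1 Gridworld; the constraints encode the standard LP relaxation of the Bellman optimality equations.

    # Sketch only: P, R, and gamma are placeholders for the transition tensor,
    # reward table, and discount factor of your Homework 1 Gridworld.
    import numpy as np
    from scipy.optimize import linprog

    def lp_optimal_values(P, R, gamma):
        """P: (S, A, S) transition probabilities, R: (S, A) rewards."""
        S, A = R.shape
        # LP: minimize sum_s V(s) subject to
        #     V(s) >= R[s, a] + gamma * sum_{s'} P[s, a, s'] * V(s')  for all s, a.
        c = np.ones(S)
        A_ub = np.zeros((S * A, S))
        b_ub = np.zeros(S * A)
        for s in range(S):
            for a in range(A):
                row = gamma * P[s, a, :].copy()   # coefficients of V(s')
                row[s] -= 1.0                     # rewrite constraint as A_ub @ V <= b_ub
                A_ub[s * A + a] = row
                b_ub[s * A + a] = -R[s, a]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] * S, method="highs")
        return res.x                              # V*(s) for every state s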
6. (30 points) Coding Problem: Consider an 8 × 8 Gridworld discussed in the paper https://auai.org/uai2016/proceedings/papers/219.pdf. The corresponding environment has been coded into a Python file GridWorldFox2016.py and provided to you for better understanding. You can go over the code to understand the cost functions and transition probabilities (as you would need these to determine V∗(s) using value (or policy) iteration).
Write a Q-learning algorithm to determine the optimal policy and value function for the above Gridworld. Use the metric presented in equation (31) of the paper to describe the evolution of the learning algorithm.
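A minimal tabular Q-learning sketch is given below, written for a cost-minimization setting (so the greedy action is the one with the lowest estimated cost-to-go). The environment interface assumed here (reset() and step(a) returning (next_state, cost, done), together with n_states and n_actions) is an assumption; adapt it to whatever GridWorldFox2016.py actually exposes, set gamma = 1 if the problem is an undiscounted total-cost problem as in Question 2, and add your own logging of the metric from equation (31).

    # Sketch only: the env interface below is assumed, not taken from
    # GridWorldFox2016.py -- adapt reset()/step() to the provided file.
    import numpy as np

    def q_learning(env, n_states, n_actions, gamma=0.95, alpha=0.1,
                   epsilon=0.1, n_episodes=5000, max_steps=200):
        Q = np.zeros((n_states, n_actions))
        rng = np.random.default_rng(0)
        for _ in range(n_episodes):
            s = env.reset()
            for _ in range(max_steps):
                # epsilon-greedy behaviour policy; greedy = lowest estimated cost-to-go
                if rng.random() < epsilon:
                    a = int(rng.integers(n_actions))
                else:
                    a = int(np.argmin(Q[s]))
                s_next, cost, done = env.step(a)
                # Q-learning target: bootstrap with the greedy (min-cost) action
                target = cost + (0.0 if done else gamma * np.min(Q[s_next]))
                Q[s, a] += alpha * (target - Q[s, a])
                s = s_next
                if done:
                    break
        V = Q.min(axis=1)           # estimated optimal value function
        policy = Q.argmin(axis=1)   # greedy (cost-minimizing) policy
        return Q, V, policy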