AI 3000 / CS5500: Reinforcement Learning Exam 1: Instructions
Exam № 1
Due Date : 23/10/2021, 3:00 PM
Instructions
Read all the instructions below carefully before you start answering the questions.
• Please include your institute roll number on the first page of the answer sheet.
• All answers should include suitable justification; otherwise no marks will be awarded.
• The estimated amount of work for this exam is about three hours.
• Please submit the answer sheets by 3:00 PM IST, Saturday, October 23, 2021.
• Submit your answer sheet as a private post to me and the Instructors on Piazza under
the midterm-exam tab.
Problem 1 : Markov Reward Process
Consider the snakes and ladders game depicted in the figure below.
• The initial state is S and a fair four-sided die is used to decide the next state at each time step.
• Die throws that would take you beyond state W leave the state unchanged.
(a) Identify the states and the transition matrix of this Markov process. (1 Point)
(b) Construct a suitable reward function and discount factor, and use the Bellman equation for the
Markov reward process to compute how long it takes "on average" (the expected number
of die throws) to reach state W from any other state. (7 Points)
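The exact transition probabilities for part (b) depend on the board in the figure, but the computation itself reduces to a small linear solve: with a reward of 1 per throw, γ = 1, and W treated as terminal, the expected number of throws h satisfies h = 1 + Qh over the non-terminal states. The sketch below illustrates this on a hypothetical 3-state restriction Q; the matrix entries and the helper name expected_throws are assumptions for illustration, not the actual game.

```python
import numpy as np

def expected_throws(Q):
    """Expected number of throws to reach the absorbing state W.

    Q is the transition matrix restricted to the non-terminal states
    (rows/columns exclude W). With reward 1 per throw and gamma = 1,
    the Bellman equation h = 1 + Q h gives h = (I - Q)^{-1} 1.
    """
    n = Q.shape[0]
    return np.linalg.solve(np.eye(n) - Q, np.ones(n))

# Hypothetical 3 non-terminal states feeding into W; the real matrix must be
# read off from the snakes-and-ladders board in the figure.
Q = np.array([
    [0.25, 0.50, 0.00],   # from the first state (remaining mass goes to W)
    [0.00, 0.25, 0.50],   # from the second state
    [0.00, 0.00, 0.75],   # throws past W leave the state unchanged
])
print(expected_throws(Q))  # expected number of throws from each non-terminal state
```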
Problem 2 : Markov Decision Processes
(a) Consider an MDP M = ⟨S, A, P, R, γ⟩ where the reward function has the structure
R(s, a) = R1(s, a) + R2(s, a).
Suppose we are given action value functions Qπ1 and Qπ2, for a given policy π, corresponding
to reward functions R1 and R2, respectively. Explain whether it is possible to combine these
action value functions in a simple manner to compute the action value function Qπ
corresponding to the composite reward function R. (4 Points)
(b) Let M = ⟨S, A, P, R, γ⟩ be an MDP with finite state and action spaces. We further assume
that the reward function R is a deterministic function of the current state s ∈ S and action
a ∈ A. Let f and g be two arbitrary action value functions mapping a state-action pair of the
MDP to a real number, i.e. f, g : S × A → R. Let L denote the Bellman optimality operator
(for the action value function) given by,
(Lf)(s, a) = R(s, a) + γ Σ_{s′ ∈ S} P(s′ | s, a) V_f(s′), where V_f(s) = max_{a′ ∈ A} f(s, a′).
Show that L is a contraction with respect to the max norm, i.e. ‖Lf − Lg‖∞ ≤ γ ‖f − g‖∞.
(6 Points)
[ Note : The Bellman optimality operator defined above is for action value functions and is
different from the one defined in the lectures, which is for value functions. Think of
V_f(s) as a transformation that turns a vector f ∈ R^{|S||A|} into a vector of length |S|.
The max norm of an action value function f is defined as ‖f‖∞ = max_s max_a |f(s, a)|. ]
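As a numerical sanity check on the definition (not a substitute for the proof asked for in part (b)), the sketch below applies the operator L to two random action value functions on a randomly generated finite MDP and compares ‖Lf − Lg‖∞ with γ‖f − g‖∞. The MDP, the seed, and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 3, 0.9

# Random finite MDP: P[s, a, s'] is a transition kernel, R[s, a] a deterministic reward.
P = rng.random((nS, nA, nS))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((nS, nA))

def L(f):
    """Bellman optimality operator for action value functions:
    (Lf)(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) * max_a' f(s', a')."""
    Vf = f.max(axis=1)          # V_f(s') = max_a' f(s', a')
    return R + gamma * P @ Vf   # result has shape (nS, nA)

f = rng.normal(size=(nS, nA))
g = rng.normal(size=(nS, nA))

lhs = np.abs(L(f) - L(g)).max()    # ||Lf - Lg||_inf
rhs = gamma * np.abs(f - g).max()  # gamma * ||f - g||_inf
print(lhs <= rhs + 1e-12, lhs, rhs)
```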
Problem 3 : Monte Carlo Methods
Consider a Markov process with two states S = {S, A} with transition probabilities as shown in
the table below, where p ∈ (0, 1) is a non-zero probability. To generate an MRP from this Markov
chain, assume that the rewards for being in states S and A are 1 and 0, respectively. In addition,
let the discount factor of the MRP be γ = 1.
From \ To    S        A
S            1 − p    p
A            0        1
(a) Provide a generic form for a typical trajectory starting at state S. (1 Point)
(f) In general, for an MRP, comment on the convergence properties of the first-visit MC and
every-visit MC algorithms. (2 Points)
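For intuition about the two estimators on this particular chain, the sketch below samples trajectories started at S and forms the first-visit and every-visit Monte Carlo estimates of V(S) with γ = 1, treating the episode as ending once the chain is absorbed in A (an assumption, since all rewards from A onwards are 0). The function names and episode count are illustrative.

```python
import random

def sample_rewards(p, rng):
    """Rewards along one trajectory started at S: a reward of 1 is received
    while in S; the episode ends once the chain is absorbed in A."""
    rewards = []
    while True:
        rewards.append(1.0)        # reward for being in state S
        if rng.random() < p:       # transition S -> A (absorbing)
            return rewards

def mc_estimates(p, n_episodes=20000, seed=0):
    """First-visit and every-visit Monte Carlo estimates of V(S) with gamma = 1."""
    rng = random.Random(seed)
    first_visit, every_visit = [], []
    for _ in range(n_episodes):
        rewards = sample_rewards(p, rng)
        # With gamma = 1, the return from time t is the sum of the later rewards.
        returns = [sum(rewards[t:]) for t in range(len(rewards))]
        first_visit.append(returns[0])   # return from the first visit to S only
        every_visit.extend(returns)      # returns from every visit to S
    return sum(first_visit) / len(first_visit), sum(every_visit) / len(every_visit)

print(mc_estimates(p=0.25))  # both estimates approach 1/p as the number of episodes grows
```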
Problem 4 : MDP Formulation
Consider a SUBWAY outlet in your locality. Customers arrive at the store at times governed
by an unknown probability distribution. The outlet sells sandwiches with a certain type of bread
(choice of 4 types) and filling (choice of 5 types). If a customer cannot get the desired sandwich,
he/she will not visit the store again. Ingredients need to be discarded 3 days after
purchase. The store owner wants to figure out a policy for buying ingredients in such a way
as to maximize his long-term profit using reinforcement learning. To this end, we will formulate
the problem as an MDP. You are free to make other assumptions regarding the problem setting.
Please enumerate your assumptions while answering the questions below.
(a) Suggest a suitable state and action space for the MDP. (5 Points)
(c) Would you use a discounted or undiscounted setting in your MDP formulation? Justify your
answer. (3 Points)
(d) Would you use dynamic programming or reinforcement learning to solve the problem?
Explain with reasons. (3 Points)
(e) Between MC and TD methods, which would you use for learning? Why? (3 Points)
(f) Is function approximation required to solve this problem? Why or why not? (3 Points)
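Many encodings are possible for part (a); purely as one illustrative sketch of the kind of assumptions the question asks you to enumerate, the containers below track inventory per ingredient broken down by remaining shelf life, plus a coarse time feature, and treat an action as an order quantity per ingredient. Every field, name, and the ingredient catalogue here is a hypothetical assumption, not part of the problem statement.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical ingredient catalogue (the problem only states 4 breads and 5 fillings).
BREADS = ("bread_1", "bread_2", "bread_3", "bread_4")
FILLINGS = ("filling_1", "filling_2", "filling_3", "filling_4", "filling_5")

@dataclass
class State:
    """One possible state encoding: stock of each ingredient broken down by
    remaining shelf life (1-3 days), plus the day of the week."""
    stock: Dict[Tuple[str, int], int] = field(default_factory=dict)  # (ingredient, days_left) -> units
    day_of_week: int = 0                                             # 0-6, demand may vary by day

@dataclass
class Action:
    """One possible action encoding: units of each ingredient to buy today."""
    order: Dict[str, int] = field(default_factory=dict)              # ingredient -> units purchased
```

Tracking units by remaining shelf life is one way to reflect the 3-day spoilage constraint in the state; other choices are equally defensible provided the assumptions are stated.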
Problem 5 : Miscellaneous Questions
(a) What algorithm results if, in the TD(λ) algorithm, we set λ = 1? (1 Point)
(b) What are the possible reasons to study TD(λ) over the TD(0) method? (2 Points)
(c) Given an MDP, does scaling the rewards by a positive scale factor change the optimal
policy? (3 Points)
(d) In off-policy evaluation, would it be beneficial to have the behaviour policy be deterministic
and the target policy be stochastic? (2 Points)
(e) Under what conditions do temporal difference methods for policy evaluation converge to the true value
of the policy π? Explain intuitively the reasoning behind those conditions. (3 Points)
(f) Why do MC methods for policy evaluation yield an unbiased estimate of the true value of
the policy? (2 Points)
(g) Let M = ⟨S, A, P, R, γ⟩ be an MDP with finite state and action spaces. We further assume
that the reward function R is non-negative for all state-action pairs. In addition, suppose that
for every state s ∈ S, there is some action a_s such that P(s′ | s, a_s) ≥ p for some p ∈ [0, 1].
We intend to find the optimal value function V∗ using value iteration. Initialize V_0(s) = 0 for all
states of the MDP and let V_t(s) denote the value of state s after t iterations. Prove that for all
states s and t ≥ 0, V_{t+1}(s) ≥ pγ V_t(s). (4 Points)
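As a numerical illustration of the claimed bound (not a proof), the sketch below runs value iteration on a small random MDP with non-negative rewards in which action 0 keeps each state in place with probability at least p, which is how this sketch reads the condition P(s′ | s, a_s) ≥ p; that reading, the random construction, and all names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma, p = 5, 3, 0.9, 0.3

# Random MDP with non-negative rewards.
P = rng.random((nS, nA, nS))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((nS, nA))                      # R(s, a) >= 0 for all (s, a)

# Make action 0 in every state keep the agent in place with probability at least p.
for s in range(nS):
    P[s, 0] = (1 - p) * P[s, 0]
    P[s, 0, s] += p

V = np.zeros(nS)                              # V_0(s) = 0 for all states
for t in range(50):
    V_next = (R + gamma * P @ V).max(axis=1)  # value iteration update
    assert np.all(V_next >= p * gamma * V - 1e-12), "bound violated"
    V = V_next
print("V_{t+1}(s) >= p * gamma * V_t(s) held at every iteration")
```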