
Reinforcement Learning (RO3002) Quiz-2

Name

ID No.

Total Marks: 15 Duration: 30 min

Q1. Expected Return:


A. (2 marks) Self-balancing Segway: We want a Segway to learn to balance by applying the right forces to the
Segway cart moving along a track so that the rider stays upright. A failure is said to occur if the rider falls
past a given angle from vertical, and the Segway pole and rider are reset to vertical after each failure. We can
treat self-balancing as a continuing task with discounting, where the reward is -1 on each failure and zero at
all other times. Consider an experiment run for 100 timesteps in which failures were observed at timesteps
t = 47 and t = 78. Given the discount factor γ = 0.9, what will the returns G_0 and G_10 be?
Solution: Using G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ⋯, with R_47 = R_78 = −1 and every other reward equal to 0:

G_0 = (−1)(0.9)^46 + (−1)(0.9)^77 ≈ −0.0082

G_10 = (−1)(0.9)^36 + (−1)(0.9)^67 ≈ −0.0234
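A quick numerical check of these returns (a minimal Python sketch, assuming the convention G_t = R_{t+1} + γ R_{t+2} + ⋯ used above):

# Verify G_0 and G_10 for Q1.A (failures at t = 47 and t = 78, gamma = 0.9).
gamma = 0.9
rewards = {47: -1.0, 78: -1.0}          # R_t = -1 at each failure, 0 otherwise

def ret(t, horizon=100):
    # G_t = sum_{k >= 0} gamma^k * R_{t+k+1}
    return sum(gamma**k * rewards.get(t + k + 1, 0.0) for k in range(horizon))

print(ret(0))    # -(0.9**46 + 0.9**77) ≈ -0.0082
print(ret(10))   # -(0.9**36 + 0.9**67) ≈ -0.0234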

B. (2 marks) In another continuing task, suppose γ = 0.8 and the reward sequence is R1 = 3, R2 = 2, followed by
an infinite sequence of 5s. What will the returns G0 and G5 be?
Solution: G_0 = R_1 + γ R_2 + γ² R_3 + ⋯ = 3 + (0.8)(2) + (0.8)² [5 + (0.8)(5) + (0.8)²(5) + ⋯]
= 3 + 1.6 + (0.64)(5 / (1 − 0.8)) = 3 + 1.6 + (0.64)(25) = 20.6

G_5 = R_6 + γ R_7 + γ² R_8 + ⋯ = 5 + (0.8)(5) + (0.8)²(5) + ⋯ = 5 / (1 − 0.8) = 25
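A similar check for part B (a minimal Python sketch; the infinite tail of 5s is summed in closed form as 5 / (1 − γ)):

# Verify G_0 and G_5 for Q1.B (gamma = 0.8; R1 = 3, R2 = 2, then 5 forever).
gamma = 0.8
tail = 5.0 / (1.0 - gamma)                   # value of an infinite stream of 5s
G5 = tail                                    # from t = 5 onward every reward is 5
G0 = 3.0 + gamma * 2.0 + gamma**2 * tail     # 3 + 1.6 + 0.64 * 25 = 20.6
print(G0, G5)                                # 20.6 25.0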
Q2. A tech giant, Coocle, wants you to build an MDP to model their employee retention strategy. The company
categorizes its employees into three categories based on job satisfaction: low satisfaction (L), medium
satisfaction (M), and high satisfaction (H). When employees raise a concern, the company has two strategies for
addressing it: provide minimal support (ms) at zero cost, or offer full support (fs) at a cost of 2 units. Minimal
support causes the employee to drop one satisfaction level (unless they are already at the lowest level, in which
case they stay in the same state) with probability 0.8, or to stay in the same state with probability 0.2. Full
support increases employee satisfaction by one level (unless they are already at the highest level, in which case
they stay in the same state) with probability 0.9, or leaves them in the same state with probability 0.1. Any
transition that increases employee satisfaction yields an immediate reward of 8 units, and any transition that
decreases employee satisfaction incurs an immediate loss of 8 units.

A. (2 marks) Build a state transition diagram for this MDP with clear indication of states, actions, transition
probabilities, and rewards.

Solution (state transition diagram, described in text): three states {L, M, H} and two actions {ms, fs}; each arc
carries a transition probability p and an immediate reward r, with the fs cost of 2 netted into the reward (so a
satisfaction increase under fs is worth 8 − 2 = 6).

ms: L → L (p = 1.0, r = 0); M → L (p = 0.8, r = −8); M → M (p = 0.2, r = 0); H → M (p = 0.8, r = −8); H → H (p = 0.2, r = 0)

fs: L → M (p = 0.9, r = 6); L → L (p = 0.1, r = −2); M → H (p = 0.9, r = 6); M → M (p = 0.1, r = −2); H → H (p = 1.0, r = −2)
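For reference, the same transitions written out in Python (a minimal sketch; the dictionary P and the (probability, next state, reward) layout are conventions chosen here, with the fs cost of 2 already netted into the rewards):

# Coocle MDP from Q2: P[(state, action)] = [(prob, next_state, reward), ...]
# Rewards: +8 for a satisfaction increase, -8 for a decrease; every fs transition
# additionally pays the support cost of 2 (so an increase under fs is worth 8 - 2 = 6).
P = {
    ("L", "ms"): [(1.0, "L", 0.0)],                    # already at the lowest level
    ("M", "ms"): [(0.8, "L", -8.0), (0.2, "M", 0.0)],
    ("H", "ms"): [(0.8, "M", -8.0), (0.2, "H", 0.0)],
    ("L", "fs"): [(0.9, "M", 6.0), (0.1, "L", -2.0)],
    ("M", "fs"): [(0.9, "H", 6.0), (0.1, "M", -2.0)],
    ("H", "fs"): [(1.0, "H", -2.0)],                   # already at the highest level
}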

B. (4 marks) Consider a policy where the company always chooses minimal support (ms). Compute the true value
function for each state and determine possible improving actions, given a discount factor γ = 0.9.
Solution: For the policy π(s) = ms for all s ∈ S, with γ = 0.9:

V_π(L) = 1.0·(0 + 0.9·V_π(L))  ⇒  V_π(L) = 0

V_π(M) = 0.8·(−8 + 0.9·V_π(L)) + 0.2·(0 + 0.9·V_π(M)) = −6.4 + 0.18·V_π(M)  ⇒  V_π(M) = −6.4/0.82 ≈ −7.80

V_π(H) = 0.8·(−8 + 0.9·V_π(M)) + 0.2·(0 + 0.9·V_π(H)) ≈ −12.02 + 0.18·V_π(H)  ⇒  V_π(H) ≈ −12.02/0.82 ≈ −14.66

Possible improving actions (one-step look-ahead with q_π(s, a)):

q_π(L, ms) = 0 + 0.9·V_π(L) = 0
q_π(L, fs) = 0.9·(6 + 0.9·V_π(M)) + 0.1·(−2 + 0.9·V_π(L)) ≈ 0.9·(−1.02) − 0.2 ≈ −1.12
Since q_π(L, ms) > q_π(L, fs), there is no improving action at L.

q_π(M, ms) = V_π(M) ≈ −7.80
q_π(M, fs) = 0.9·(6 + 0.9·V_π(H)) + 0.1·(−2 + 0.9·V_π(M)) ≈ 0.9·(−7.19) + 0.1·(−9.02) ≈ −7.38
Since q_π(M, fs) > q_π(M, ms), fs is an improving action at M.

q_π(H, ms) = V_π(H) ≈ −14.66
q_π(H, fs) = 1.0·(−2 + 0.9·V_π(H)) ≈ −2 − 13.19 ≈ −15.19
Since q_π(H, ms) > q_π(H, fs), there is no improving action at H.
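These numbers can be reproduced with a short policy-evaluation sketch in Python (it assumes the transition dictionary P from the Q2.A sketch above is in scope):

# Evaluate the all-ms policy on the Q2 MDP, then check each action with a one-step look-ahead.
gamma = 0.9
states, actions = ["L", "M", "H"], ["ms", "fs"]
V = {s: 0.0 for s in states}

for _ in range(1000):                      # iterative policy evaluation for pi(s) = "ms"
    V = {s: sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, "ms")]) for s in states}

def q(s, a):
    # Expected return of taking action a in state s and following pi afterwards.
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])

print({s: round(V[s], 2) for s in states})                    # {'L': 0.0, 'M': -7.8, 'H': -14.66}
print({(s, a): round(q(s, a), 2) for s in states for a in actions})
# q(M, fs) ≈ -7.38 > q(M, ms) ≈ -7.80, so fs is an improving action at M;
# at L and H the ms action remains greedy with respect to V_pi.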
Q3. (1 mark) Suppose you are in an infinite-horizon MDP with rewards bounded by |R| ≤ 1 and γ = 0.99. What can
we say about the maximum possible value of a state V(s)?

A) V(s) can be arbitrarily large.
B) V(s) is at most 100.
C) V(s) is at most 1.
D) The maximum value of V(s) depends on the policy.

Answer: B
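The bound behind option B, written out: since every reward satisfies |R| ≤ 1,

V(s) = E[ R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ⋯ ] ≤ 1 + γ + γ² + ⋯ = 1 / (1 − γ) = 1 / (1 − 0.99) = 100.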

Q4. (1 mark) Suppose you have two Markov Decision Processes (MDPs) with the same state transitions and rewards
but different discount factors γ_1 and γ_2. If γ_1 < γ_2, which of the following is true about their value functions
V(s)?

A) V_1(s) is always less than V_2(s) for all states s.
B) V_2(s) ≥ V_1(s) for all s, only if all the rewards are positive.
C) V_1(s) is always ≥ V_2(s) for all states s.
D) The Bellman equation does not hold when comparing two different discount factors.

Answer: B (when rewards can be negative, increasing γ can decrease V(s), so no general ordering holds)

Q5. (1 mark) Given an MDP with three states {A, B, C}, and two actions {Left, Right}, consider a policy π where:
π(A) = Left
π(B) = Right
π(C) = Left
After performing one step of policy improvement, which of the following is necessarily true?

A) The new policy π' will always have a higher value function than π.
B) The new policy π' will always be different from π.
C) The new policy π' is guaranteed to be the optimal policy.
D) The new policy π' will be at least as good as π.

Answer: D (by the policy improvement theorem; π' may equal π if π is already greedy with respect to V_π)

Q6. (1 mark) Which of the following scenarios would cause value iteration to take an unusually long time to
converge?

A) Low discount factor (γ close to 0).
B) Deterministic transitions.
C) Small action space.
D) Sparse reward structure.

Answer: D (with sparse rewards, value information must propagate through many backups before estimates change, slowing convergence)

Q7. (1 mark) If we initialize the value function V(s) arbitrarily and apply Value Iteration, what happens as the number
of iterations increases?

A) V(s) oscillates and does not converge
B) V(s) converges to the optimal value function V∗(s)
C) V(s) may diverge if initialized incorrectly
D) V(s) converges only if the reward function is bounded

Answer: B
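To illustrate Q7 (and the role of γ and the reward structure from Q6), here is a minimal value-iteration sketch in Python on the Q2 MDP, started from an arbitrary initialization (it again assumes the dictionary P from the Q2.A sketch):

# Value iteration from a random start: the iterates contract toward V* at rate gamma.
import random

gamma = 0.9
states, actions = ["L", "M", "H"], ["ms", "fs"]
V = {s: random.uniform(-50.0, 50.0) for s in states}     # arbitrary initialization

for sweep in range(500):
    # Bellman optimality backup: V(s) <- max_a sum_s' p * (r + gamma * V(s'))
    V_new = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
                    for a in actions)
             for s in states}
    done = max(abs(V_new[s] - V[s]) for s in states) < 1e-8
    V = V_new
    if done:
        break

greedy = {s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)]))
          for s in states}
print({s: round(V[s], 2) for s in states}, greedy)
# Regardless of the start, this converges to roughly {'L': 0.69, 'M': -5.64, 'H': -12.76}
# with greedy policy {'L': 'fs', 'M': 'fs', 'H': 'ms'}.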
