Reinforcement Learning - Unit 13 - Week 10

This page contains Week 10 : Assignment 10 of the NPTEL course on Reinforcement Learning, with questions on SMDP Q-learning, semi-Markov decision processes, options, and hierarchical abstract machines (HAMs). Answers may be submitted any number of times before the due date; the final submission is considered for grading.


Week 10 : Assignment 10

Your last recorded submission was on 2024-04-03, 00:01 IST. Due date: 2024-04-03, 23:59 IST.

1) Consider the update equation for SMDP Q-learning: (1 point)

Q(s, a) = Q(s, a) + α[A + B max_{a'} Q(s', a') − Q(s, a)]

Which of the following are the correct values of A and B?
(r_k is the reward received at time step k, and γ is the discount factor)

A = r_t ; B = γ
A = r_t + γ r_{t+1} + ... + γ^{τ−1} r_{t+τ} ; B = γ^τ
A = γ^t r_t + γ^{t+1} r_{t+1} + ... + γ^{t+τ−1} r_{t+τ} ; B = γ^{t+τ}
A = γ^{τ−1} r_{t+τ} ; B = γ^t
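For reference, here is a minimal sketch (Python, not part of the assignment) of how an SMDP Q-learning update of the form above is commonly implemented. The function name, the tabular Q array, and the step-size value are illustrative assumptions; the point is only to show where the accumulated reward and the discounted bootstrap enter the update.

```python
import numpy as np

# Minimal sketch of an SMDP Q-learning update. Q is assumed to be a 2-D
# array indexed by (state, action); `rewards` holds r_t, ..., r_{t+tau-1},
# the rewards collected while the extended action executed.
def smdp_q_update(Q, s, a, rewards, s_next, alpha=0.1, gamma=0.9):
    tau = len(rewards)
    # Discounted reward accumulated over the duration of the extended action.
    discounted_return = sum(gamma ** k * r for k, r in enumerate(rewards))
    # The value at the terminating state is discounted by the elapsed time tau.
    target = discounted_return + gamma ** tau * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```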
2) Consider an SMDP in which the next state and the reward only depend on the previous state and action, i.e. P(s', τ | s, a) = P(s' | s, a) P(τ | s, a) and R(s, a, τ, s') = R(s, a, s'). If we solve the above SMDP with conventional Q-learning, we will end up with the same policy as solving it with SMDP Q-learning. (1 point)

yes, because now τ won't change anything and we end up with the same states and action sequences
no, because τ still depends on the state-action pair and discounting may have an effect on the final policies.
no, because the next state will still depend on τ.
yes, because the Bellman equation is the same for both methods in this case.
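As an aside, the independence condition appearing in this question (and again in question 9) can be made concrete with a small numerical check: for a fixed (s, a) pair, the joint distribution over (s', τ) factorizes exactly when it equals the outer product of its two marginals. The helper below is an assumed illustration, not course code.

```python
import numpy as np

# joint[s_next, tau] = P(s', tau | s, a) for one fixed state-action pair.
def factorizes(joint, atol=1e-9):
    p_next = joint.sum(axis=1)   # marginal over next states: P(s' | s, a)
    p_tau = joint.sum(axis=0)    # marginal over transition times: P(tau | s, a)
    # Independence holds exactly when the joint equals the product of the marginals.
    return np.allclose(joint, np.outer(p_next, p_tau), atol=atol)
```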
3) In HAM, what will be the immediate rewards received between two choice states? (1 point)

Accumulation of immediate rewards of the core MDP obtained between these choice points.
The return of the next choice state.
The reward of only the next primitive action taken.
Immediate reward is always zero.

4) Which of the following is true about Markov and Semi-Markov Options? (1 point)

In a Markov Option, the option's policy depends only on the current state.
In a Semi-Markov Option, the option's policy can depend only on the current state.
In a Semi-Markov Option, the option's policy may depend on the history since the execution of the option began.
A Semi-Markov Option is always a Markov Option but not vice versa.
5) Consider the two statements below for an SMDP for a HAM: (1 point)

Statement 1: The state of the SMDP is defined by the state of the base MDP, the call stack and the state of the machine currently executing.
Statement 2: The actions of the SMDP can only be defined by the action states.

Which of the following are true?

Statement 1 is True and Statement 2 is True.
Statement 1 is True and Statement 2 is False.
Statement 1 is False and Statement 2 is True.
Statement 1 is False and Statement 2 is False.
6) Which of the following are possible advantages of formulating a given problem as a hierarchy of sub-problems? (1 point)

A reduced state space.
More meaningful state abstraction.
Temporal abstraction of behaviour.
Re-usability of learnt sub-problems.

7) In an SMDP, consider the case when τ is fixed for all state-action pairs. Will we always get the same policy for conventional Q-learning and SMDP Q-learning then? Provide the answer for the three cases τ = 3, τ = 2, τ = 1. (1 point)

yes, yes, no
no, no, no
yes, yes, yes
no, no, yes

8) State True or False: (1 point)

In the classical options framework, each option has a non-zero probability of terminating in any state of the environment.

True
False
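For background on the classical options framework referenced here: an option is typically specified as a triple (I, π, β) with an initiation set, an intra-option policy, and a termination probability β(s). The representation below is an assumed, minimal sketch rather than anything from the course material.

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    initiation_set: Set[int]             # states where the option can be invoked
    policy: Callable[[int], int]         # intra-option policy: state -> action
    termination: Callable[[int], float]  # beta(s): probability of terminating in s

# Example: an option that may start in states {0, 1, 2}, always takes action 1,
# and terminates (with probability 1) only when state 5 is reached.
go_right = Option(
    initiation_set={0, 1, 2},
    policy=lambda s: 1,
    termination=lambda s: 1.0 if s == 5 else 0.0,
)
```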
9) Suppose that we model a robot in a room as an SMDP, such that the position of the robot in the room is the state of the SMDP. Which of the following scenarios satisfy the assumption that the next state and transition time are independent of each other given the current state and action, i.e. P(s', τ | s, a) = P(s' | s, a) P(τ | s, a)? (Assume that the primitive actions <left, right, up, down> take a single time step to execute.) (1 point)

The room has a single door. The actions available are: {exit the room, move left, move right, move up, move down}.
The room has two doors. The actions available are: {exit the room, move left, move right, move up, move down}.
The room has two doors. The actions available are: {move left, move right, move up, move down}.
None of the above.

10) Which of the following is a correct Bellman equation for an SMDP? (1 point)
Note: R(s, a, s') ⟹ the reward is a function of only s, a and s'.

V*(s) = max_{a∈A(s)} [ R(s, a, τ, s') + γ^τ P(s'|s, a) V*(s') ]
V*(s) = max_{a∈A(s)} [ Σ_{s',τ} P(s'|s, a, τ) ( R(s, a, τ, s') + γ V*(s') ) ]
V*(s) = max_{a∈A(s)} [ Σ_{s',τ} P(s', τ|s, a) ( R(s, a, τ, s') + γ^τ V*(s') ) ]
V*(s) = max_{a∈A(s)} [ Σ_{s',τ} P(s', τ|s, a) ( R(s, a, s') + γ V*(s') ) ]
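To make the structure of these candidate equations concrete, the sketch below performs one value-iteration backup for an SMDP whose model is given as a joint distribution over next state and transition time. The array shapes and names are assumptions for illustration, and the reward follows the question's note in depending only on s, a and s'.

```python
import numpy as np

# One SMDP value-iteration backup.
# P[s, a, s_next, tau] = P(s', tau | s, a); R[s, a, s_next] = expected reward.
def smdp_backup(V, P, R, gamma=0.95):
    n_states, n_actions, _, n_tau = P.shape
    V_new = np.empty(n_states)
    for s in range(n_states):
        q = np.zeros(n_actions)
        for a in range(n_actions):
            for s_next in range(n_states):
                for tau in range(n_tau):
                    # Future value is discounted by gamma**tau, the elapsed time.
                    q[a] += P[s, a, s_next, tau] * (R[s, a, s_next] + gamma ** tau * V[s_next])
        V_new[s] = q.max()
    return V_new
```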

You may submit any number of times before the due date. The final submission will be considered for
grading.
