Reinforcement Learning - Unit 13 - Week 10
Thank you for taking the Week 10 : Assignment 10.
1) Which of the following are the correct values of A and B?
(r_k is the reward received at time step k, and γ is the discount factor)
A = r_t ; B = γ
A = r_t + γ r_{t+1} + ... + γ^{τ−1} r_{t+τ} ; B = γ^τ
A = γ^t r_t + γ^{t+1} r_{t+1} + ... + γ^{t+τ−1} r_{t+τ} ; B = γ^{t+τ}
A = γ^{τ−1} r_{t+τ} ; B = γ^t
2) Consider an SMDP in which the next state and the reward depend only on the previous state and action, i.e. P(s′, τ | s, a) = P(s′ | s, a) P(τ | s, a) and R(s, a, τ, s′) = R(s, a, s′). If we solve the above SMDP with conventional Q-learning, we will end up with the same policy as solving it with SMDP Q-learning. (1 point)
yes, because now τ won't change anything and we end up with the same state and action sequences
no, because τ still depends on the state-action pair and discounting may have an effect on the final policies.
no, because the next state will still depend on τ.
yes, because the Bellman equation is the same for both methods in this case.
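Background note (a minimal sketch, not part of the assignment; the tabular setup, learning rate, and toy numbers are assumptions for illustration only): conventional Q-learning discounts its bootstrap term by γ after every decision, whereas SMDP Q-learning accumulates the discounted rewards observed over the τ steps of a transition and discounts the bootstrap term by γ^τ.

import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Conventional Q-learning: one-step reward, bootstrap discounted by gamma.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def smdp_q_update(Q, s, a, rewards, s_next, alpha=0.1, gamma=0.9):
    # SMDP Q-learning: 'rewards' lists the tau rewards seen during the
    # transition; the bootstrap is discounted by gamma**tau.
    tau = len(rewards)
    R = sum(gamma**k * r_k for k, r_k in enumerate(rewards))
    target = R + gamma**tau * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# Toy usage with a 3-state, 2-action table (hypothetical values):
Q = np.zeros((3, 2))
q_update(Q, s=0, a=1, r=1.0, s_next=2)
smdp_q_update(Q, s=0, a=1, rewards=[1.0, 0.0, 1.0], s_next=2)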
3) In HAM, what will be the immediate rewards received between two choice states? (1 point)

Accumulation of immediate rewards of the core MDP obtained between these choice points.
The return of the next choice state.
The reward of only the next primitive action taken.
Immediate reward is always zero.
4) Which of the following is true about Markov and Semi-Markov Options? (1 point)

In a Markov Option, the option's policy depends only on the current state.
In a Semi-Markov Option, the option's policy can depend only on the current state.
In a Semi-Markov Option, the option's policy may depend on the history since the execution of the option began.
A Semi-Markov Option is always a Markov Option but not vice versa.
5) Consider the two statements below for an SMDP for a HAM: (1 point)

Statement 1: The state of the SMDP is defined by the state of the base MDP, the call stack, and the state of the machine currently executing.
Statement 2: The actions of the SMDP can only be defined by the action states.

Which of the following are true?

Statement 1 is True and Statement 2 is True.
Statement 1 is True and Statement 2 is False.
7) In an SMDP, consider the case when τ is fixed for all state-action pairs. Will we always get the same policy for conventional Q-learning and SMDP Q-learning? Provide the answer for the three cases τ = 3, τ = 2, τ = 1. (1 point)
yes, yes, no
no, no, no
yes, yes, yes
no, no, yes
8)

True
False
9) Suppose that we model a robot in a room as an SMDP, such that the position of the robot in the room is the state of the SMDP. Which of the following scenarios satisfy the assumption that the next state and transition time are independent of each other given the current state and action, i.e. P(s′, τ | s, a) = P(s′ | s, a) P(τ | s, a)? (Assume that the primitive actions <left, right, up, down> take a single time step to execute.) (1 point)
The room has a single door. The actions available are: {exit the room, move left, move right, move up, move down}.
The room has two doors. The actions available are: {exit the room, move left, move right, move up, move down}.
The room has two doors. The actions available are: {move left, move right, move up, move down}.
None of the above.
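Background note (a minimal sketch; the toy dictionaries and state names below are hypothetical, not taken from the question): the independence assumption above says the next state and the holding time are sampled from separate conditionals given (s, a), whereas in the coupled case the holding time depends on which next state is actually reached.

import random

# Hypothetical toy model: from state 's0' under action 'a0' the agent can
# reach 's1' or 's2'.
next_states = {('s0', 'a0'): ['s1', 's2']}
durations = {('s0', 'a0'): [1, 2, 3]}
durations_given_next = {('s0', 'a0', 's1'): [1], ('s0', 'a0', 's2'): [3]}

def sample_factored(s, a):
    # P(s', tau | s, a) = P(s' | s, a) * P(tau | s, a): tau is drawn without
    # looking at which next state was sampled.
    s_next = random.choice(next_states[(s, a)])
    tau = random.choice(durations[(s, a)])
    return s_next, tau

def sample_coupled(s, a):
    # tau is drawn conditioned on s', so the joint distribution does not
    # factor in general (e.g. a more distant next state takes more steps).
    s_next = random.choice(next_states[(s, a)])
    tau = random.choice(durations_given_next[(s, a, s_next)])
    return s_next, tau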
10) Which of the following is a correct Bellman equation for an SMDP? (1 point)
Note: R(s, a, s′) ⟹ the reward is a function of only s, a and s′.
V*(s) = max_{a∈A(s)} [ R(s, a, τ, s′) + γ^τ P(s′ | s, a) V*(s′) ]

V*(s) = max_{a∈A(s)} [ Σ_{s′,τ} P(s′ | s, a, τ) ( R(s, a, τ, s′) + γ V*(s′) ) ]

V*(s) = max_{a∈A(s)} [ Σ_{s′,τ} P(s′, τ | s, a) ( R(s, a, τ, s′) + γ^τ V*(s′) ) ]

V*(s) = max_{a∈A(s)} [ Σ_{s′,τ} P(s′, τ | s, a) ( R(s, a, s′) + γ V*(s′) ) ]
You may submit any number of times before the due date. The final submission will be considered for
grading.
Submit Answers