Module 6.4 Options
REINFORCEMENT LEARNING
OPTIONS
Dr K G Suma
Associate Professor
School of Computer Science and Engineering
Module No. 6
Hierarchical Reinforcement Learning
8 Hours
■ Hierarchical Reinforcement Learning
■ Types of Optimality
■ Semi MDP Model
■ Options
■ Learning with Options
■ Hierarchical Abstract Machines
■ Partially Observable Markov Decision Processes
Options
■ An option is a generalization of the concept of action. It
captures the idea that certain actions are composed of
other sub-actions.
[Figure: an option takes the agent from states in one room to a termination state 𝑆𝑔.]
Options - Framework
■ Initiation Set
■ Termination Condition
■ Intra-Option Policy
■ Policy-over-Options
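As a minimal sketch of how the four components listed above fit together, an option can be represented as a tuple (initiation set, intra-option policy, termination condition); the class and attribute names below are illustrative, not taken from a specific library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Set

State = Any
Action = Any

@dataclass
class Option:
    """An option ω = (I, π, β): initiation set, intra-option policy, termination condition."""
    initiation_set: Set[State]             # I ⊆ S: states where the option may be initiated
    policy: Callable[[State], Action]      # π: intra-option policy mapping states to actions
    termination: Callable[[State], float]  # β(s): probability that the option terminates in s

    def can_initiate(self, state: State) -> bool:
        return state in self.initiation_set
```

A policy-over-options then only needs to choose among the options whose `can_initiate` returns True in the current state.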
Options - Initiation Set
■ The initiation set of an option defines the states in which the option can be
initiated.
■ A commonly used approach defines the initiation set as all states from which
the agent can reliably reach the option's termination condition within a
limited number of steps when following the intra-option policy. It is also
usual to assume that the option can be initiated in every state in which its
policy can continue.
■ For example, one approach uses a logistic regression classifier to build an
initiation set: states observed up to 250 time steps before the option's
termination are labeled as positive initiation states for that option, while
states further away in the trajectory are used as negative examples.
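A hedged sketch of that classifier-based construction is shown below. The 250-step window comes from the slide; the function name, the assumption that states are fixed-length feature vectors, the assumption that each trajectory ends at the option's termination, and the use of scikit-learn's LogisticRegression are all illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_initiation_classifier(trajectories, window=250):
    """Label states seen up to `window` steps before the option's termination
    as positive initiation states; earlier states are used as negatives."""
    X, y = [], []
    for states in trajectories:                 # each trajectory ends at the option's termination
        for t, s in enumerate(states):
            steps_to_termination = len(states) - 1 - t
            X.append(s)                         # assumes states are feature vectors
            y.append(1 if steps_to_termination <= window else 0)
    clf = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
    return clf                                  # clf.predict([s]) ≈ "is s in the initiation set?"
```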
Options - Termination Condition
■ The termination condition decides when an option will halt its execution.
Similarly to the initiation set, a set of termination states is often used.
Reaching a state in this set will cause the option to stop running.
Termination states are better known as subgoal-states. Finding good
termination states is often a matter of finding states with special properties
(e.g., a well-connected state or states often visited on highly rewarded
trajectories).
■ The most common termination condition is to end an option once it has
reached a subgoal-state. However, this alone can cause problems: the option
could, for example, run forever if it is unable to reach the subgoal-state. To
address this, a limit on the number of steps the option policy is allowed to
take is often added to the termination condition.
■ Terminating an option when the agent is no longer inside its initiation set
has also been considered.
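The three criteria above (subgoal reached, step budget exhausted, agent left the initiation set) can be combined into a single termination check; the following is a small sketch with illustrative names and an assumed step limit of 100.

```python
def should_terminate(state, steps_taken, subgoal_states, initiation_set, max_steps=100):
    """Return True if the option should stop executing in `state`."""
    if state in subgoal_states:        # reached a subgoal-state
        return True
    if steps_taken >= max_steps:       # step limit guards against running forever
        return True
    if state not in initiation_set:    # agent drifted outside the option's initiation set
        return True
    return False
```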
Options - Intra-Option Policy
■ If the initiation set and termination condition are specified, the internal
policy of the option can be learned using any RL method. The intra-option
policy can be seen as a control policy over a region of the state-space.
■ The extrinsic reward signal is often not suitable in this case, because the
overall goal does not necessarily align with the termination condition of the
option. Alternatively, the intra-option policy could be rewarded solely for
triggering its termination condition. However, various denser intrinsic
reward signals could also be used. For example, if the termination
condition is based upon a special characteristic of the environment, this
property might serve as an intrinsic reward signal.
■ Intra-option policy learning can happen both on-policy and off-policy.
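As a concrete illustration, the intra-option policy could be trained with tabular Q-learning using a sparse pseudo-reward of +1 when the termination condition triggers. This is only a sketch under those assumptions; `env.reset_to`, `env.step`, and `env.actions` are assumed environment helpers, not a standard API.

```python
from collections import defaultdict
import random

def learn_intra_option_policy(env, option, episodes=500, alpha=0.1, gamma=0.99,
                              eps=0.1, max_option_steps=200):
    """Tabular Q-learning for one option, rewarded only when its termination triggers."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = random.choice(list(option.initiation_set))           # start inside the initiation set
        env.reset_to(s)                                          # assumed environment helper
        for _ in range(max_option_steps):                        # step limit, as on the previous slide
            if random.random() < eps:
                a = random.choice(env.actions)                   # explore
            else:
                a = max(env.actions, key=lambda a_: Q[(s, a_)])  # exploit
            s_next = env.step(a)                                 # assumed to return the next state
            terminated = option.termination(s_next) > 0.5
            r = 1.0 if terminated else 0.0                       # sparse intrinsic pseudo-reward
            target = r if terminated else r + gamma * max(Q[(s_next, a2)] for a2 in env.actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
            if terminated:
                break
    return Q
```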
Options - Policy-over-Options
■ A policy-over-options 𝜋(𝜔|𝑠𝑡) selects an option 𝜔∈Ω given a state 𝑠𝑡.
This additional policy can be useful to select the best option, when the
current state belongs to multiple option initiation sets. It can also be used
as an alternative to defining an initiation set for each option.
■ The most often used execution model is the call-and-return model. This
approach is also often called hierarchical-execution. In this model a policy-
over-options selects an option according to the current state. The agent
follows this option until it triggers the termination condition of the
active option. After termination, the agent samples a new option to follow.
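The call-and-return loop can be sketched roughly as follows; `policy_over_options`, `option.policy`, and `option.termination` follow the illustrative interfaces used above, and a gym-like `env.step` signature is assumed.

```python
def run_call_and_return(env, start_state, policy_over_options, max_steps=1000):
    """Call-and-return execution: select an option, follow it until it terminates, repeat."""
    state, steps = start_state, 0
    while steps < max_steps:
        option = policy_over_options(state)         # π(ω | s_t): pick an option for the current state
        while steps < max_steps:
            action = option.policy(state)           # follow the intra-option policy
            state, reward, done = env.step(action)  # assumed gym-like step signature
            steps += 1
            if done:
                return state
            if option.termination(state) > 0.5:     # option halts; control returns to the top level
                break
    return state
```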
Options - Policy-over-Options
■ When considering a policy-over-options, we can identify different forms of
optimality:
• Hierarchical-optimal: a policy that achieves the highest cumulative reward
on the entire task.
• Recursive-optimal: the different sub-behaviors of the agent are optimal
individually.
■ A policy which is recursive-optimal might not be hierarchical-optimal. There
may exist a better hierarchical policy in which the policy of a sub-task is
locally suboptimal so that the overall policy can be optimal.
■ For example, if a sub-task consists of navigating to the exit of a room, the
policy is recursive-optimal when the agent only fixates on this sub-task.
However, a hierarchical-optimal solution might also take a slight detour to pick
up a key, or charge its battery. These diversions negatively impact the
performance of the sub-task, but improve the performance of the overall task.