
Introduction to Reinforcement Learning
Chapter 1 – Reinforcement Learning: An Introduction
Imitation Learning Lecture Slides from the CMU Deep Reinforcement Learning Course
What is learning?
Learning takes place as a result of interaction between an agent and the world. The idea behind learning is that the percepts received by an agent should be used not only for acting, but also for improving the agent's ability to behave optimally in the future and achieve its goal.
Learning types
Supervised learning:
a situation in which sample (input, output) pairs of the function to be learned can be perceived or are given
 You can think of this as having a kind teacher
Reinforcement learning:
the agent acts on its environment and receives some evaluation of its action (a reinforcement), but is not told which action is the correct one for achieving its goal
What is Reinforcement Learning?
Learning from interaction with an environment to achieve some long-term goal that is related to the state of the environment
The goal is defined by a reward signal, which must be maximised
The agent must be able to partially or fully sense the environment state and take actions that influence that state
The state is typically described by a feature vector
RL is learning from interaction
RL model
Each percept (e) is enough to determine the state (the state is accessible)
The agent can decompose the reward component from a percept
The agent's task is to find an optimal policy, mapping states to actions, that maximises a long-run measure of the reinforcement
Think of reinforcement as reward
This can be modelled as an MDP!
 The Markov decision process (MDP) is a mathematical framework for modeling decision-making problems where the outcomes are partly random and partly controllable
 It is a framework that can address most reinforcement learning problems
Review of MDP model
MDP model <S, A, T, R>
• S – set of states
• A – set of actions
• T(s, a, s') = P(s' | s, a) – the probability of transitioning from s to s' given action a
• R(s, a) – the expected reward for taking action a in state s

R(s, a) = Σ_{s'} P(s' | s, a) r(s, a, s') = Σ_{s'} T(s, a, s') r(s, a, s')

[Figure: the agent-environment interaction loop (the agent receives state and reward from the environment and sends back an action), together with a sample trajectory s0, a0, r0, s1, a1, r1, s2, a2, r2, s3, ...]
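As a concrete illustration (not part of the original slides), here is a minimal Python sketch of how the tabular quantities above could be represented; the state and action names are made up for the example.

# Minimal sketch of a tabular MDP <S, A, T, R>; states and actions are hypothetical
S = ["s0", "s1"]
A = ["left", "right"]

# T[(s, a)] maps each next state s' to P(s' | s, a)
T = {
    ("s0", "left"):  {"s0": 0.9, "s1": 0.1},
    ("s0", "right"): {"s0": 0.2, "s1": 0.8},
    ("s1", "left"):  {"s0": 1.0},
    ("s1", "right"): {"s1": 1.0},
}

# r[(s, a, s')] is the immediate reward for that transition (unlisted transitions give 0)
r = {("s0", "right", "s1"): 1.0}

def expected_reward(s, a):
    """R(s, a) = sum over s' of T(s, a, s') * r(s, a, s')."""
    return sum(p * r.get((s, a, s_next), 0.0)
               for s_next, p in T[(s, a)].items())

print(expected_reward("s0", "right"))  # 0.8 * 1.0 = 0.8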
Exploration versus Exploitation
We want a reinforcement learning agent to earn lots of reward
To do so it must prefer actions that it has found in the past to be effective at producing reward: it must exploit what it already knows to obtain reward
But it must also select untested actions in order to discover reward-producing actions: it must explore to make better action selections in the future
There is a trade-off between exploration and exploitation; a simple ε-greedy rule, sketched below, is one common way to balance the two
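A minimal sketch of such an ε-greedy rule, assuming tabular action-value estimates Q[state][action]; the names and the value of ε are illustrative, not from the slides.

import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon pick a random action (explore),
    otherwise pick the action with the highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[state][a])

# Example action-value estimates for a single state
Q = {"s0": {"left": 0.2, "right": 0.7}}
print(epsilon_greedy(Q, "s0", ["left", "right"]))  # usually "right", occasionally a random action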
Passive learning vs. Active learning
Passive learning
The agent simply watches the world go by and tries to learn the utilities of being in various states
Active learning
The agent does not simply watch, but also acts
Passive learning scenario
The agent sees sequences of state transitions and the associated rewards
The environment generates the state transitions and the agent perceives them
Key idea: update the utility value of each state using the given training sequences (a sketch follows below)
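One simple way to do this (an illustrative sketch, not taken from the slides) is to keep a running average of the discounted returns observed from each state:

from collections import defaultdict

def update_utilities(U, counts, episode, gamma=0.9):
    """Update state utilities from one training sequence.

    episode: list of (state, reward) pairs seen by the passive agent.
    U[s] is kept as the running average of the discounted returns
    observed from state s onward (a Monte Carlo / LMS-style estimate).
    """
    G = 0.0
    for state, reward in reversed(episode):   # accumulate the return backwards
        G = reward + gamma * G
        counts[state] += 1
        U[state] += (G - U[state]) / counts[state]
    return U

U, counts = defaultdict(float), defaultdict(int)
update_utilities(U, counts, [("s0", 0.0), ("s1", 1.0)])
print(dict(U))   # {'s1': 1.0, 's0': 0.9}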
Reinforcement Learning
Systems
Reinforcement learning systems have 4
main elements:
Policy
Reward signal
Value function
Optional model of the environment
Model-based vs. Model-free approaches
 But often we don't know anything about the environment model, i.e. the transition function T(s, a, s')
 This leads to two approaches:
 Model-based RL: learn the model, and use it to derive the optimal policy, e.g. the adaptive dynamic programming (ADP) approach
 Model-free RL: derive the optimal policy without learning the model, e.g. LMS and temporal-difference approaches (a minimal TD(0) update is sketched below)
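A minimal sketch of a temporal-difference update for state utilities (a TD(0) rule; the learning rate, discount factor, and variable names are illustrative, not from the slides):

def td0_update(U, s, reward, s_next, alpha=0.1, gamma=0.9):
    """TD(0): move U[s] toward the bootstrapped target r + gamma * U[s'].
    No model of T(s, a, s') is needed, only the observed transition."""
    target = reward + gamma * U.get(s_next, 0.0)
    U[s] = U.get(s, 0.0) + alpha * (target - U.get(s, 0.0))
    return U

U = {}
td0_update(U, "s0", reward=1.0, s_next="s1")
print(U)   # {'s0': 0.1}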
Model-free versus Model-
based
 A model of the environment allows inferences to be
made about how the environment will behave
 Example: Given a state and an action to be taken
while in that state, the model could predict the next
state and the next reward
 Models are used for planning, which means deciding
on a course of action by considering possible future
situations before they are experienced
 Model-based methods use models and planning. Think
of this as modelling the dynamics p(s’ | s, a)
 Model-free methods learn exclusively from trial-and-
error (i.e. no modelling of the environment)
 This presentation focuses on model-free methods
Policy
A policy is a mapping from the perceived states of the environment to the actions to be taken when in those states
A reinforcement learning agent uses a policy to select actions given the current environment state (a toy example follows below)
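As a toy illustration (not from the slides), a deterministic tabular policy can literally be stored as a state-to-action mapping:

# A deterministic policy as a plain state -> action mapping (toy example)
policy = {"s0": "right", "s1": "left"}

def select_action(policy, state):
    return policy[state]

print(select_action(policy, "s0"))   # "right"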
On-policy versus Off-policy
An on-policy agent learns only about the
policy that it is executing
An off-policy agent learns about a policy or
policies different from the one that it is
executing
Credit Assignment Problem
Given a sequence of states and actions,
and the final sum of time-discounted future
rewards, how do we infer which actions
were effective at producing lots of reward
and which actions were not effective?
How do we assign credit for the observed
rewards given a sequence of actions over
time?
Every reinforcement learning algorithm
must address this problem
Reward Design
We need rewards to guide the agent to
achieve its goal
Option 1: Hand-designed reward functions
Option 2: Learn rewards from
demonstrations
Instead of having a human expert tune a
system to achieve the desired behaviour, the
expert can demonstrate desired behaviour
and the robot can tune itself to match the
demonstration
Reward Signal
The reward signal defines the goal
On each time step, the environment sends
a single number called the reward to the
reinforcement learning agent
The agent’s objective is to maximise the
total reward that it receives over the long
run
The reward signal is used to alter the policy
Value Function (1)
The reward signal indicates what is good in
the short run while the value function
indicates what is good in the long run
The value of a state is the total amount of
reward an agent can expect to accumulate
over the future, starting in that state
Compute the value using the states that
are likely to follow the current state and the
rewards available in those states
Future rewards may be time-discounted with a factor γ in the interval [0, 1] (a short worked example follows below)
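A small worked example of a time-discounted return (the reward values and the factor γ = 0.9 are made up for illustration):

def discounted_return(rewards, gamma=0.9):
    """G = r0 + gamma*r1 + gamma^2*r2 + ... for one observed reward sequence."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

print(discounted_return([1.0, 0.0, 2.0]))   # 1 + 0.9*0 + 0.81*2 = 2.62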
Use the values to make and evaluate
decisions
Action choices are made based on value
judgements
Prefer actions that bring about states of
highest value instead of highest reward
Rewards are given directly by the
environment
Values must continually be re-estimated
from the sequence of observations that an
agent makes over its lifetime
What is Deep Reinforcement
Learning?
 Deep reinforcement learning is standard reinforcement learning in which a deep neural network is used to approximate either a policy or a value function (a minimal sketch follows below)
 Deep neural networks require lots of real or simulated interaction with the environment to learn
 Lots of trials/interactions are possible in simulated environments
 We can easily parallelise the trials/interactions in simulated environments
 We cannot easily do this with real robots outside simulation: action execution takes time, accidents and failures are expensive, and there are safety concerns
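A minimal sketch of a deep network approximating a policy, written with PyTorch; the layer sizes and the state/action dimensions are placeholders, not taken from the slides.

import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Maps a state feature vector to a probability distribution over actions."""
    def __init__(self, state_dim=4, n_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        logits = self.net(state)
        return torch.softmax(logits, dim=-1)   # action probabilities

policy = PolicyNet()
state = torch.zeros(1, 4)              # dummy state feature vector
probs = policy(state)
action = torch.multinomial(probs, 1)   # sample an action from the policy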
Summary
Get faster training because of parallelism
Can use on-policy reinforcement learning
methods
Diversity in exploration can lead to better
performance than synchronous methods
In practice, the on-policy A3C algorithm appears to be the best asynchronous reinforcement learning method in terms of final performance and training speed
