
Introduction to RL – Part 2

Agent Environment Loop
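The slide presumably showed the usual agent-environment interaction diagram. As a rough sketch, assuming the Gymnasium library and its CartPole-v1 environment (neither is mentioned in the lecture), one episode of the loop looks like this:

import gymnasium as gym

# One episode of the agent-environment loop: the agent picks an action,
# the environment returns the next observation and a reward.
env = gym.make("CartPole-v1")
obs, info = env.reset()

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()          # placeholder "agent": act randomly
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print("episode return:", total_reward)
env.close()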


Action Space
• The set of all valid actions in a given environment is called the action space.
• There are two types of action space (illustrated below):
– Discrete action space
• Only a finite number of actions is possible.
• For example, turning left or right.
– Continuous action space
• An infinite number of actions is possible.
• For instance, choosing a steering angle instead of simply turning left or right.
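A small illustration of the two kinds of action space, assuming the gymnasium.spaces module (an assumption, not something the slides use):

import numpy as np
from gymnasium import spaces

# Discrete action space: a finite number of actions, e.g. 0 = turn left, 1 = turn right.
turn = spaces.Discrete(2)

# Continuous action space: infinitely many actions, e.g. any steering angle in [-30, 30] degrees.
steer = spaces.Box(low=-30.0, high=30.0, shape=(1,), dtype=np.float32)

print(turn.sample())    # e.g. 0 or 1
print(steer.sample())   # e.g. [12.7]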
Rewards
• A reward R_t is a scalar feedback signal.
• It indicates how well the agent is doing at step t.
• The agent’s job is to maximize the cumulative reward, called the return (sketched in code below):
  G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
  where \gamma is the discount factor, 0 \le \gamma \le 1.
• The return is only about the future: rewards received at or before step t do not appear in G_t.
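A minimal sketch of computing a return from a sequence of future rewards; the reward values and the discount factor gamma = 0.9 are made up for illustration:

# G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ...
def discounted_return(future_rewards, gamma=0.9):
    return sum(gamma ** k * r for k, r in enumerate(future_rewards))

rewards = [1.0, 0.0, 0.0, 5.0]            # R_{t+1}, R_{t+2}, ...
print(discounted_return(rewards))          # 1.0 + 0.9**3 * 5.0 = 4.645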


State Value
• Since the return is about the future, we cannot always know the actual return value.
• Therefore we consider the expected return.
• These values depend on the actions the agent takes.
State Value
• Now the goal is to maximize the expected return value by choosing suitable actions.
• In RL, we cannot say whether a particular action is correct or wrong (since this is not supervised learning).
• However, the sequence of actions taken from a state s to reach the goal produces the state value of s.
Sequential Decision Making
• Selecting actions to maximize total future reward
• Actions may have long-term consequences
• Reward may be delayed
• It may be better to sacrifice immediate reward to gain more long-term reward (see the sketch below)
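A made-up numeric illustration: with gamma = 0.9, an action that gives nothing now but a larger reward later can still have the higher return:

gamma = 0.9
ret = lambda rewards: sum(gamma ** k * r for k, r in enumerate(rewards))

option_a = [1.0, 0.0, 0.0, 0.0]   # +1 immediately, nothing afterwards
option_b = [0.0, 0.0, 0.0, 5.0]   # nothing now, +5 three steps later

print(ret(option_a))   # 1.0
print(ret(option_b))   # 0.9**3 * 5.0 = 3.645  -> the patient option wins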
Sequential Decision Making
• Examples:
– A financial investment (may take months to
mature)
– Refueling a helicopter (might prevent a crash in
several hours)
– Blocking opponent moves (might help winning
chances many moves from now)
Recursive Definitions – Return

G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots

G_{t+1} = R_{t+2} + \gamma R_{t+3} + \dots

Therefore

G_t = R_{t+1} + \gamma G_{t+1}
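A sketch of using the recursion G_t = R_{t+1} + \gamma G_{t+1}: the returns for a whole episode can be computed with one backward sweep over the rewards (the numbers are illustrative):

def returns_from_rewards(rewards, gamma=0.9):
    G = 0.0
    returns = []
    for r in reversed(rewards):      # start from the end of the episode
        G = r + gamma * G            # G_t = R_{t+1} + gamma * G_{t+1}
        returns.append(G)
    returns.reverse()
    return returns                   # returns[t] == G_t

print(returns_from_rewards([1.0, 0.0, 0.0, 5.0]))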
Recursive Definitions – State Value

v(s) = \mathbb{E}[G_t \mid S_t = s]
     = \mathbb{E}[R_{t+1} + \gamma G_{t+1} \mid S_t = s]

Therefore

v(s) = \mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s]
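One way to read v(s) = \mathbb{E}[G_t \mid S_t = s] is as an average of observed returns. A minimal Monte Carlo sketch, assuming episodes are given as (state, reward) pairs (the data below is made up):

from collections import defaultdict

def mc_state_values(episodes, gamma=0.9):
    """Average the returns observed from each state (every-visit Monte Carlo)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for episode in episodes:
        G = 0.0
        # Walk backward so that G = R_{t+1} + gamma * G_{t+1} at every step.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            totals[state] += G
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

# Two toy episodes: from state 'A' the goal reward is reached only sometimes.
episodes = [[("A", 0.0), ("B", 1.0)],
            [("A", 0.0), ("C", 0.0)]]
print(mc_state_values(episodes))    # e.g. {'B': 1.0, 'A': 0.45, 'C': 0.0}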
Policy
• A mapping from states to actions
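A tiny illustration of a policy as a mapping from states to actions (the state and action names are made up):

# Deterministic policy written as a lookup table.
policy_table = {"low_battery": "recharge", "high_battery": "search"}

def policy(state):
    """Policy as a function: state -> action."""
    return policy_table[state]

print(policy("low_battery"))   # recharge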
Action Value
• Values associated with the action to be taken
• Here, we pin down only the first action; the future actions are not decided yet and will follow the policy.
• Therefore, q depends on the policy (see the sketch below)
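A rough Monte Carlo sketch of q(s, a): the first action is pinned to a, every later action comes from the policy pi, so the estimate depends on pi. The toy environment and policy below are invented for illustration:

def estimate_q(state, action, pi, env_step, gamma=0.9, n_rollouts=100, horizon=20):
    """Average return over rollouts that start with (state, action) and then follow pi."""
    total = 0.0
    for _ in range(n_rollouts):
        s, a, G, discount = state, action, 0.0, 1.0
        for _ in range(horizon):
            s, r, done = env_step(s, a)     # environment transition (assumed interface)
            G += discount * r
            discount *= gamma
            if done:
                break
            a = pi(s)                        # all later actions follow the policy
        total += G
    return total / n_rollouts

# Toy two-state environment: "go" from state 1 reaches the goal (reward 1).
def env_step(s, a):
    if s == 1 and a == "go":
        return 1, 1.0, True
    return 1, 0.0, False

pi = lambda s: "go"
print(estimate_q(0, "go", pi, env_step))     # 0.9 (= gamma * 1.0)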
Thank You
