Reinforcement Learning
There is no dataset given in advance: you receive data as your agent performs sequential actions in an environment.
A reward signal tells the agent whether its actions were good or bad.
Goal: learn good behaviour in as few trials as possible (minimize the number of trials).
Example (seasoning a dish): the agent adds salt and tastes; if it tastes good, that is a reward; otherwise it adjusts (add salt / less salt) and tries again, choosing actions so as to maximize reward.
Note that data gathered this way is not independent and identically distributed: each observation depends on the actions taken so far.
A small per-step penalty encourages the agent to complete the task as soon as possible.
Formally: states S = {s0, s1, s2, ...}, actions A = {a0, a1, ...}, and transition probabilities P(s' | s, a), e.g. P(s2 | s1, a1) is the probability of landing in s2 after taking action a1 in state s1.
A policy is a mapping from states to actions.
[Figure: a policy drawn on a gridworld whose cells each carry a small step reward of 0.03]
If the reward function changes, the policy will also change: the optimal policy is sensitive to the reward function.
Transitions may also be stochastic: after taking an action, the agent may not actually end up in the intended new state.
[Figure: the same gridworld with a larger per-step reward (0.3), giving a different optimal policy]
Discounted return: G = R_1 + γ R_2 + γ² R_3 + ...
The discount factor γ keeps diluting the contribution of future transitions.
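As a quick sketch (Python, with illustrative reward values of my own), the discounted return can be computed directly from a list of rewards:

    # Discounted return: G = R1 + gamma*R2 + gamma^2*R3 + ...
    def discounted_return(rewards, gamma):
        return sum((gamma ** t) * r for t, r in enumerate(rewards))

    # Illustrative rewards: the later a reward arrives, the more gamma dilutes it.
    print(discounted_return([1, 0, 2], gamma=0.5))   # 1 + 0.5*0 + 0.25*2 = 1.5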
An MDP is specified by (S, A, P, R): states, actions, transition probabilities P : S × A × S → [0, 1], and rewards R.
Mars Rover example
State:    1    2    3    4    5    6
Reward: 100    0    0    0    0   40
Actions: left, right (deterministic; states 1 and 6 are terminal)

Starting at state 4 and always moving left, the visited states are 4 → 3 → 2 → 1 with rewards 0, 0, 0, 100.
γ = 0.9: return = 0 + 0.9·0 + 0.9²·0 + 0.9³·100 = 72.9
γ = 0.5: return = 0 + 0.5·0 + 0.5²·0 + 0.5³·100 = 12.5
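A small Python sketch (assuming the deterministic six-state rover above, with states 1 and 6 terminal) that replays the always-left trajectory from state 4 and reproduces the two returns:

    # Mars Rover: reward 100 at state 1, 40 at state 6, 0 elsewhere.
    REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
    TERMINAL = {1, 6}

    def rollout_left(start):
        """States visited when always moving left until a terminal state."""
        states = [start]
        while states[-1] not in TERMINAL:
            states.append(states[-1] - 1)   # "left" decreases the state index
        return states

    def discounted_return(states, gamma):
        return sum((gamma ** t) * REWARDS[s] for t, s in enumerate(states))

    path = rollout_left(4)                            # [4, 3, 2, 1]
    print(round(discounted_return(path, 0.9), 4))     # 72.9  (0.9**3 * 100)
    print(round(discounted_return(path, 0.5), 4))     # 12.5  (0.5**3 * 100)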
Return
The return is the discounted sum of rewards along a trajectory: G = R_1 + γ R_2 + γ² R_3 + ...
The return depends on the rewards collected, and the rewards depend on the actions taken (and the states visited), so different actions give different returns.
Example (γ = 0.5): from state 5, moving right gives return 0 + 0.5·40 = 20, while always moving left gives only 0.5⁴·100 = 6.25.
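To make "the return depends on the actions" concrete, here is a sketch (Python, γ = 0.5, same rover as above) comparing the return from every starting state under an always-left policy versus an always-right policy:

    REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
    TERMINAL = {1, 6}
    GAMMA = 0.5

    def policy_return(start, step):
        """Discounted return from `start`, repeatedly applying `step` until terminal."""
        s, t, g = start, 0, 0.0
        while True:
            g += (GAMMA ** t) * REWARDS[s]
            if s in TERMINAL:
                return g
            s, t = step(s), t + 1

    for s in range(1, 7):
        left = policy_return(s, lambda x: x - 1)
        right = policy_return(s, lambda x: x + 1)
        print(s, left, right)
    # e.g. from state 5: always-left gives 6.25, always-right gives 20.0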
Optimal policy
A policy π maps each state s to an action a; the optimal policy π* picks, in every state, the action (move) that maximizes the expected return.
State value function and action value function
In a stochastic environment the return R_1 + γ R_2 + γ² R_3 + ... is a random variable, so we work with its expectation, E[X] = Σ_x x·P(x). The value functions are the expected returns obtained when acting according to the optimal policy.
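When transitions are random, a given policy no longer yields a single return, so we look at the expected return. A Monte Carlo sketch (Python; the 10% "misstep" probability below is my own illustrative assumption, not a number from the notes) approximates that expectation by averaging many sampled returns:

    import random

    REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
    TERMINAL = {1, 6}
    GAMMA, MISSTEP = 0.5, 0.1      # MISSTEP: chance the rover slips the opposite way

    def sampled_return(start, intended=-1):
        """One sampled discounted return when always commanding 'left' (intended=-1)."""
        s, t, g = start, 0, 0.0
        while True:
            g += (GAMMA ** t) * REWARDS[s]
            if s in TERMINAL:
                return g
            move = intended if random.random() > MISSTEP else -intended
            s = min(6, max(1, s + move))
            t += 1

    # E[X] = sum_x x * P(x): approximate the expectation by averaging samples.
    samples = [sampled_return(4) for _ in range(100_000)]
    print(sum(samples) / len(samples))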
V(s): state value function, the expected return starting from state s.
Q(s, a): action value function; for every state s and every action a, Q(s, a) is the return obtained by starting in s, taking action a once, and behaving optimally afterwards.
Example (Mars Rover, γ = 0.5, state 4, move left): 0 + 0.5·0 + 0.5²·0 + 0.5³·100 = 12.5.
Mars Rover: Q values (γ = 0.5)
[Figure: Mars Rover state diagram annotated with Q(s, a) for every state and action]
Q(s, a) = R(s) + γ · max_a' Q(s', a')
Q(2, →) = R(2) + 0.5 · max_a' Q(3, a') = 0 + 0.5 · max(25, 6.25) = 12.5
Q(4, ←) = R(4) + 0.5 · max_a' Q(3, a') = 0 + 0.5 · max(25, 6.25) = 12.5
Bellman equation
Q(s, a): start from state s, take action a once, then behave optimally; Q(s, a) is the resulting return.
Bellman equation: Q(s, a) = R(s) + γ · max_a' Q(s', a')
If the environment is stochastic, take the expectation over the next state: Q(s, a) = R(s) + γ · E_s'[ max_a' Q(s', a') ], where E[X] = Σ_x x·P(x).
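Turning the Bellman equation into code is a single update inside a loop. The sketch below (Python, deterministic rover, γ = 0.5) repeatedly applies Q(s, a) ← R(s) + γ·max_a' Q(s', a') until the values settle, and it reproduces the hand-computed numbers above:

    REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
    TERMINAL = {1, 6}
    GAMMA = 0.5
    ACTIONS = {"left": -1, "right": +1}

    def next_state(s, a):
        return min(6, max(1, s + ACTIONS[a]))

    # Start from Q = 0 and sweep the Bellman update until it converges.
    Q = {(s, a): 0.0 for s in REWARDS for a in ACTIONS}
    for _ in range(50):
        for s in REWARDS:
            for a in ACTIONS:
                if s in TERMINAL:
                    Q[(s, a)] = REWARDS[s]
                else:
                    s2 = next_state(s, a)
                    Q[(s, a)] = REWARDS[s] + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)

    print(Q[(2, "right")], Q[(4, "left")])   # 12.5 12.5
    print(Q[(3, "left")], Q[(3, "right")])   # 25.0 6.25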
VALUE AND POLICY ITERATION
VALUE ITERATION
[Worked figure: value iteration on a small example, repeatedly updating the state values until they converge]
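The notes only show the worked figure here, so this is a minimal value-iteration sketch for a generic MDP with known P and R (Python; the tiny two-state MDP at the bottom is an illustrative placeholder, not taken from the notes):

    # Value iteration: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) * V(s') ]
    def value_iteration(states, actions, P, R, gamma, tol=1e-8):
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                best = max(
                    R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                    for a in actions
                )
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < tol:
                return V

    # Illustrative two-state MDP: "stay" keeps the state, "go" usually switches it.
    states, actions = ["A", "B"], ["stay", "go"]
    P = {"A": {"stay": {"A": 1.0}, "go": {"B": 0.8, "A": 0.2}},
         "B": {"stay": {"B": 1.0}, "go": {"A": 0.8, "B": 0.2}}}
    R = {"A": {"stay": 0.0, "go": 0.0}, "B": {"stay": 1.0, "go": 0.0}}
    print(value_iteration(states, actions, P, R, gamma=0.9))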
POLICY ITERATION
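The notes leave this section without a body, so the following is only a rough sketch of the standard policy-iteration loop (Python, same (states, actions, P, R, gamma) dictionaries as in the value-iteration sketch above): alternate between evaluating the current policy and improving it greedily until it stops changing.

    def policy_iteration(states, actions, P, R, gamma, eval_tol=1e-8):
        policy = {s: actions[0] for s in states}        # arbitrary initial policy
        V = {s: 0.0 for s in states}
        while True:
            # Policy evaluation: V(s) = R(s, pi(s)) + gamma * sum_s' P(s'|s,pi(s)) * V(s')
            while True:
                delta = 0.0
                for s in states:
                    a = policy[s]
                    v = R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                    delta = max(delta, abs(v - V[s]))
                    V[s] = v
                if delta < eval_tol:
                    break
            # Policy improvement: act greedily with respect to V.
            new_policy = {
                s: max(actions,
                       key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items()))
                for s in states
            }
            if new_policy == policy:
                return policy, V
            policy = new_policy

It can be called with the same states, actions, P, R dictionaries used in the value-iteration example.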
Temporal Difference and Q-learning
We have seen how to solve an MDP when P and R are known, using value and policy iteration.
When P and R are unknown, we estimate the needed quantities by taking actions in the environment. The update involves an expectation over next states under P; every time we take an action we get to see one sample of that transition, and we treat this single sample as a proxy for the entire distribution.
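A tabular Q-learning sketch along those lines (Python; the rover layout is reused from above, and the misstep probability, learning rate α, and ε-greedy exploration rate are illustrative assumptions). Each observed transition supplies one sampled next state, which stands in for the expectation under P:

    import random

    REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
    TERMINAL = {1, 6}
    GAMMA, ALPHA, EPSILON, MISSTEP = 0.5, 0.1, 0.1, 0.1
    ACTIONS = (-1, +1)                       # left, right

    def step(s, a):
        """Sample one transition: the rover occasionally slips the opposite way."""
        move = a if random.random() > MISSTEP else -a
        return min(6, max(1, s + move))

    Q = {(s, a): 0.0 for s in REWARDS for a in ACTIONS}
    for s in TERMINAL:                       # terminal states just hold their reward
        for a in ACTIONS:
            Q[(s, a)] = REWARDS[s]

    for _ in range(20_000):                  # episodes
        s = random.randint(2, 5)
        while s not in TERMINAL:
            # epsilon-greedy action selection
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2 = step(s, a)
            # One sampled s2 replaces the expectation over next states under P:
            # Q(s,a) <- Q(s,a) + alpha * [ R(s) + gamma * max_a' Q(s2,a') - Q(s,a) ]
            target = REWARDS[s] + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2

    print({k: round(v, 2) for k, v in Q.items()})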
References
https://www.youtube.com/playlist?list=PLYgyoWurxA_8ePNUuTLDtMvzyf-YW7im2
http://incompleteideas.net/book/ebook/
https://www.coursera.org/learn/unsupervised-learning-recommenders-reinforcement-learning