“(...) Muad'Dib learned rapidly because his first training was in how to
learn. And the first lesson of all was the basic trust that he could
learn. It's shocking to find how many people do not believe they can
learn, and how many more believe learning to be difficult. Muad'Dib
knew that every experience carries its lesson.”

Frank Herbert, Dune

CMPUT 365
Introduction to RL
Marlos C. Machado Class 19/35

Reminder I
You should be enrolled in the private session we created in Coursera for CMPUT 365.

I cannot use marks from the public repository for your course marks. You need to check,
every time, that you are in the private session and that you are submitting quizzes and
assignments to the private session.

At the end of the term, I will not port grades from the public session in Coursera.

If you have any questions or concerns, talk with the TAs or email us
[email protected].

Reminder II
● Exam viewing on Tuesday and Wednesday
○ Tuesday: 2pm - 5pm in ATH 3-28
○ Wednesday: 3pm - 5pm in ATH 3-32


Plan / Reminder III
● What I plan to do today:
○ Continue talking about TD Learning for Prediction (beginning of Chapter 6 of the textbook).

● Due today (Monday):
○ Programming assignment (Policy evaluation with TD learning).

● I still recommend that you read Chapter 6 of the textbook.


Please, interrupt me at any time!


Last Class: Temporal-Difference Error
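
As a reminder, in the notation of the textbook's Chapter 6 (assumed here: V is the current value estimate and γ the discount factor), the TD error compares the current estimate for a state with a one-step bootstrapped target:

```latex
\delta_t \doteq R_{t+1} + \gamma V(S_{t+1}) - V(S_t)
```

A positive error means the transition turned out better than V(S_t) predicted; a negative error means it turned out worse.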


TD is a sample update with bootstrapping


● Dynamic programming update:

● Monte Carlo update:

● TD update:
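
For reference, the three updates can be sketched as follows. The notation is assumed from the textbook rather than taken from the slide: π is the policy, p the environment dynamics, G_t the return, α the step size, and γ the discount factor.

```latex
\begin{align*}
\text{DP:} \quad & V(s) \leftarrow \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V(s')\bigr] \\
\text{MC:} \quad & V(S_t) \leftarrow V(S_t) + \alpha \bigl[G_t - V(S_t)\bigr] \\
\text{TD:} \quad & V(S_t) \leftarrow V(S_t) + \alpha \bigl[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\bigr]
\end{align*}
```

DP bootstraps but uses an expected update, which requires the model p. MC uses a sample but waits for the full return G_t. TD does both: it samples one transition and bootstraps from V(S_{t+1}).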


Example – Driving Home
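
A minimal sketch of how the MC and TD errors differ on a single drive home. The travel times below are an assumption (recalled from the textbook's Example 6.1, not taken from these slides), as are the choices γ = 1 and a reward equal to the minutes spent on each leg. The function names are hypothetical.

```python
# Travel-time numbers are ASSUMED (recalled from the textbook's Example 6.1),
# not taken from the slides. All times are in minutes.
states = ["leave office", "reach car", "exit highway",
          "secondary road", "home street", "arrive home"]
elapsed = [0, 5, 20, 30, 40, 43]          # minutes elapsed on reaching each state
predicted_to_go = [30, 35, 15, 10, 3, 0]  # V(s): predicted remaining time


def mc_errors(elapsed, v):
    """G_t - V(S_t) with gamma = 1: actual remaining time minus the prediction."""
    total = elapsed[-1]
    return [(total - elapsed[t]) - v[t] for t in range(len(v) - 1)]


def td_errors(elapsed, v):
    """R_{t+1} + V(S_{t+1}) - V(S_t) with gamma = 1; reward = minutes spent on the leg."""
    return [(elapsed[t + 1] - elapsed[t]) + v[t + 1] - v[t]
            for t in range(len(v) - 1)]


mc = mc_errors(elapsed, predicted_to_go)
td = td_errors(elapsed, predicted_to_go)
for s, m, d in zip(states, mc, td):
    print(f"{s:15s} MC error {m:+3d}   TD error {d:+3d}")
```

MC can only compute its errors once the car is home, because each one needs the final total. TD can compute each error as soon as the next state is reached. With V held fixed during the episode, the TD errors from a state onward sum to that state's MC error (here 10 - 5 + 5 + 3 + 0 = 13).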


Advantages of TD Prediction Methods


● TD methods do not require a model of the environment.
● TD methods are implemented in an online, fully incremental fashion.
● TD(0) is biased towards the initialization while MC methods have higher variance.
● When processing a fixed batch of data, TD(0) and MC methods converge to different answers.
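
The first two points can be illustrated with a minimal sketch of tabular TD(0) prediction. It consumes transitions one at a time and never queries a model. The function names, the transition tuple layout `(s, r, s_next, done)`, and the two-step episode are all hypothetical choices made for this example.

```python
from collections import defaultdict


def td0_update(V, s, r, s_next, done, alpha=0.1, gamma=1.0):
    """Apply one TD(0) update to V in place and return the TD error."""
    # The value of a terminal state is taken to be 0.
    target = r + (0.0 if done else gamma * V[s_next])
    delta = target - V[s]
    V[s] += alpha * delta
    return delta


def td0_prediction(transitions, alpha=0.1, gamma=1.0):
    """Process a stream of (s, r, s_next, done) tuples online, one update per step."""
    V = defaultdict(float)  # every state starts at 0 (the initialization)
    for s, r, s_next, done in transitions:
        td0_update(V, s, r, s_next, done, alpha, gamma)
    return V


# Hypothetical two-step episode, seen twice: A -> B (reward 0), B -> terminal (reward 1).
stream = [("A", 0.0, "B", False), ("B", 1.0, None, True)] * 2
V = td0_prediction(stream, alpha=0.5)
print(dict(V))
```

After two passes, V(A) lags behind V(B) because A's target is bootstrapped from B's estimate, which itself started at the initial value of 0. This is the bias towards the initialization mentioned in the third point.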


Optimality of TD(0)
● Under batch training, constant-α MC converges to values, V(s), that are sample
averages of the actual returns experienced after visiting each state s. These are
optimal estimates in the sense that they minimize the mean square error from
the actual returns in the training set.

● Batch TD(0) gives us the answer that is based on first modeling the Markov
process and then computing the correct estimates given the model (the
certainty-equivalence estimate).


Example

V(A) = ?

V(B) = ?


V(A) = ? ¾ (TD) or 0 (MC)?

V(B) = ¾
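
The two answers for V(A) can be reproduced with a short sketch. The batch of eight episodes is an assumption (recalled from the textbook's Example 6.4, since the slide's figure is not reproduced here): one episode A, 0, B, 0; six episodes B, 1; one episode B, 0. The function names, step size, and number of sweeps are hypothetical choices, and γ = 1.

```python
# Episodes are ASSUMED (recalled from the textbook's Example 6.4).
# Each step is (state, reward received on leaving that state).
episodes = [[("A", 0), ("B", 0)]] + [[("B", 1)]] * 6 + [[("B", 0)]]


def batch_mc(episodes):
    """Average the undiscounted returns observed after each visit to a state."""
    returns = {}
    for ep in episodes:
        rewards = [r for _, r in ep]
        for t, (s, _) in enumerate(ep):
            returns.setdefault(s, []).append(sum(rewards[t:]))
    return {s: sum(g) / len(g) for s, g in returns.items()}


def batch_td0(episodes, alpha=0.05, sweeps=2000):
    """Batch TD(0), gamma = 1: sum the increments over the whole batch, then apply them."""
    V = {s: 0.0 for ep in episodes for s, _ in ep}
    for _ in range(sweeps):
        inc = {s: 0.0 for s in V}
        for ep in episodes:
            for t, (s, r) in enumerate(ep):
                # The state after the last step is terminal, with value 0.
                v_next = V[ep[t + 1][0]] if t + 1 < len(ep) else 0.0
                inc[s] += alpha * (r + v_next - V[s])
        for s in V:
            V[s] += inc[s]
    return V


v_mc = batch_mc(episodes)
v_td = batch_td0(episodes)
print("MC:", v_mc)
print("TD:", {s: round(v, 4) for s, v in v_td.items()})
```

Both methods agree that V(B) = ¾. MC sets V(A) = 0 because the only return ever observed from A was 0. Batch TD(0) sets V(A) = ¾ because A always led to B with reward 0, so its value is inherited from V(B).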


TD vs Monte Carlo

“Batch Monte Carlo methods always find the estimates that minimize mean square
error on the training set, whereas batch TD(0) always finds the estimates that would
be exactly correct for the maximum-likelihood model of the Markov process.”

In general, the maximum-likelihood estimate of a parameter is the parameter
value whose probability of generating the data is greatest.
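
As a small worked instance of this definition, consider estimating a single probability p (for example, the chance that a transition out of some state yields reward 1) from n observed trials, of which k were successes. The symbols p, k, and n are introduced here for illustration only.

```latex
\begin{align*}
L(p) &= p^{k}(1-p)^{n-k}, \\
\frac{d}{dp}\log L(p) &= \frac{k}{p} - \frac{n-k}{1-p} = 0
\;\Longrightarrow\; \hat{p}_{\mathrm{ML}} = \frac{k}{n}.
\end{align*}
```

The maximum-likelihood model therefore uses the observed frequencies as its probabilities, and batch TD(0) finds the value function that is exactly correct for that model.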
