Lecture 19 35

Uploaded by

BRIAN CHENG

“(...) Muad'Dib learned rapidly because his first training was in how to
learn. And the first lesson of all was the basic trust that he could
learn. It's shocking to find how many people do not believe they can
learn, and how many more believe learning to be difficult. Muad'Dib
knew that every experience carries its lesson.”

Frank Herbert, Dune

CMPUT 365
Introduction to RL
Marlos C. Machado Class 19/35

Reminder I
You should be enrolled in the private session we created in Coursera for CMPUT 365.

I cannot use marks from the public session for your course marks. You need to check,
every time, that you are in the private session and that you are submitting quizzes and
assignments to the private session.

At the end of the term, I will not port grades from the public session in Coursera.

If you have any questions or concerns, talk with the TAs or email us
[email protected].

Reminder II
● Exam viewing on Tuesday and Wednesday
○ Tuesday: 2pm - 5pm in ATH 3-28
○ Wednesday: 3pm - 5pm in ATH 3-32


Plan / Reminder III

● What I plan to do today:
○ Continue talking about TD Learning for Prediction (Beginning of Chapter 6 of the textbook).

● Due today (Monday):

○ Programming assignment (Policy evaluation with TD learning).

● I still recommend that you read Chapter 6 of the textbook.


Please, interrupt me at any time!


Last Class: Temporal-Difference Error
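
As a reminder, the one-step TD error is usually written as below. The notation is the standard textbook convention and is assumed here rather than taken from the slide: V is the current value estimate, γ the discount factor, and R_{t+1}, S_{t+1} the next reward and state.

```latex
\delta_t \doteq R_{t+1} + \gamma V(S_{t+1}) - V(S_t)
```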

TD is a sample update with bootstrapping

● Dynamic programming update:

● Monte Carlo update:

● TD update:
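
The three updates can be written side by side as follows. This is a sketch in the standard textbook notation, which is assumed rather than copied from the slide: π is the policy being evaluated, p the environment dynamics, G_t the return from time t, and α the step size.

```latex
\begin{align*}
\text{DP:}\quad & V(s) \leftarrow \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V(s')\bigr] \\
\text{MC:}\quad & V(S_t) \leftarrow V(S_t) + \alpha\,\bigl[G_t - V(S_t)\bigr] \\
\text{TD:}\quad & V(S_t) \leftarrow V(S_t) + \alpha\,\bigl[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\bigr]
\end{align*}
```

Read this way, DP bootstraps but takes a full expectation that needs the model, MC samples but waits for the complete return, and TD both samples and bootstraps.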


Example – Driving Home
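
A minimal sketch of the comparison this example is usually used for, assuming the slides follow the driving-home example from Chapter 6 of the textbook. The state names and the numbers below are taken from that textbook example, not from the slides, and the helper names are hypothetical. Rewards are the minutes elapsed on each leg of the trip and γ = 1.

```python
# Driving-home data as in the textbook example (an assumption; not shown in the slides).
states = ["leaving office", "reach car", "exiting highway",
          "secondary road", "home street", "arrive home"]
elapsed = [0, 5, 20, 30, 40, 43]          # minutes since leaving the office
predicted_to_go = [30, 35, 15, 10, 3, 0]  # V(s): predicted remaining minutes


def mc_errors(elapsed, to_go):
    """G_t - V(S_t): actual remaining time minus predicted remaining time."""
    total = elapsed[-1]
    return [(total - e) - v for e, v in zip(elapsed[:-1], to_go[:-1])]


def td_errors(elapsed, to_go):
    """R_{t+1} + V(S_{t+1}) - V(S_t) for each transition, with gamma = 1."""
    return [
        (elapsed[t + 1] - elapsed[t]) + to_go[t + 1] - to_go[t]
        for t in range(len(elapsed) - 1)
    ]


MC_ERRORS = mc_errors(elapsed, predicted_to_go)
TD_ERRORS = td_errors(elapsed, predicted_to_go)

if __name__ == "__main__":
    for s, mc, td in zip(states, MC_ERRORS, TD_ERRORS):
        print(f"{s:16s} MC error {mc:+3d}   TD error {td:+3d}")
```

The MC errors can only be computed once the trip is over, whereas each TD error is available as soon as the next state is reached. With γ = 1 and no updates made during the episode, the TD errors sum to the MC error of the first state.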


Advantages of TD Prediction Methods

● TD methods do not require a model of the environment.
● TD methods are implemented in an online, fully incremental fashion.
● TD(0) is biased towards the initialization while MC methods have higher variance.
● When processing a fixed batch of data, TD(0) and MC converge to different points.
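
The first three points can be illustrated with a minimal tabular TD(0) sketch. The function name `td0_update`, the two-state transition stream, and the step size are hypothetical choices for illustration. The update uses only the observed transition, with no model, and is applied immediately after each step.

```python
from collections import defaultdict


def td0_update(V, s, r, s_next, terminal, alpha=0.1, gamma=1.0):
    """Apply one TD(0) update to V in place and return the TD error."""
    # The value of a terminal state is zero, so the target is just the reward.
    target = r + (0.0 if terminal else gamma * V[s_next])
    delta = target - V[s]
    V[s] += alpha * delta
    return delta


V = defaultdict(float)  # every state starts at the initial value 0

# Hypothetical experience: A -> B with reward 0, then B -> terminal with reward 1.
transitions = [("A", 0.0, "B", False), ("B", 1.0, None, True)]
for s, r, s_next, terminal in transitions:
    td0_update(V, s, r, s_next, terminal, alpha=0.5)
```

After this single pass V(B) has moved halfway to 1, but V(A) is still 0 because its target was bootstrapped from the initial V(B). This is the bias towards the initialization mentioned above.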


Optimality of TD(0)
● Under batch training, constant-α MC converges to values, V(s), that are sample
averages of the actual returns experienced after visiting each state s. These are
optimal estimates in the sense that they minimize the mean square error from
the actual returns in the training set.

● Batch TD(0) gives the answer we would get by first modeling the Markov
process and then computing the correct estimates given that model (the
certainty-equivalence estimate).

Example

V(A) = ?   ¾ (TD) or 0 (MC)?

V(B) = ¾
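
Both answers can be reproduced with a short sketch. The eight episodes below are assumed to be those of the textbook's two-state prediction example (one episode A, 0, B, 0; six episodes B, 1; one episode B, 0), since the slide's data is not reproduced in the text. The function names, step size, and number of sweeps are illustrative choices.

```python
# Each transition is (state, reward, next_state); next_state None means terminal.
# Episode data assumed from the textbook example, not taken from the slides.
EPISODES = (
    [[("A", 0, "B"), ("B", 0, None)]]
    + [[("B", 1, None)] for _ in range(6)]
    + [[("B", 0, None)]]
)


def batch_mc(episodes, gamma=1.0):
    """Average the observed returns following each state visit."""
    returns = {}
    for episode in episodes:
        g = 0.0
        for s, r, _ in reversed(episode):
            g = r + gamma * g
            returns.setdefault(s, []).append(g)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}


def batch_td0(episodes, alpha=0.01, gamma=1.0, sweeps=3000):
    """Sweep the batch repeatedly, applying the summed TD(0) increments once per sweep."""
    V = {s: 0.0 for ep in episodes for s, _, _ in ep}
    for _ in range(sweeps):
        inc = {s: 0.0 for s in V}
        for episode in episodes:
            for s, r, s_next in episode:
                target = r + (gamma * V[s_next] if s_next is not None else 0.0)
                inc[s] += alpha * (target - V[s])
        for s in V:
            V[s] += inc[s]
    return V


V_MC = batch_mc(EPISODES)
V_TD = batch_td0(EPISODES)
```

Batch MC sets V(A) to the only return ever observed from A, which is 0. Batch TD(0) uses the observed fact that A always led to B with reward 0, so V(A) inherits V(B) = ¾.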


TD vs Monte Carlo

“Batch Monte Carlo methods always find the estimates that minimize mean square
error on the training set, whereas batch TD(0) always finds the estimates that would
be exactly correct for the maximum-likelihood model of the Markov process.”

In general, the maximum-likelihood estimate of a parameter is the parameter
value whose probability of generating the data is greatest.
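
For a Markov reward process observed as a batch of transitions, the maximum-likelihood model can be sketched as follows. The count notation N(s) and N(s, s') is introduced here for illustration and does not appear in the slides: N(s) is the number of observed transitions out of s, and N(s, s') the number of those that went to s'.

```latex
\hat{p}(s' \mid s) = \frac{N(s, s')}{N(s)},
\qquad
\hat{r}(s, s') = \frac{1}{N(s, s')} \sum_{i \,:\, (S_i, S'_i) = (s, s')} R_i
```

The certainty-equivalence estimate is the value function that would be exactly correct if this estimated model were the true process.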
