Lecture 6 MONTE CARLO Example

Monte Carlo methods estimate value functions using returns from sample episodes of interaction with an environment, without knowledge of the environment's dynamics. There are two types of Monte Carlo methods: First Visit estimates the value of a state as the average return following the first visit to that state in each episode. Every Visit estimates the value as the average return following every visit to a state across all episodes. These methods were demonstrated on examples to calculate the value of states A and B in a 3-state environment using 2 sample episodes.


MONTE CARLO method for Reinforcement Learning
What is meant by Monte Carlo?

The term “Monte Carlo” is often used more broadly for any estimation method whose operation involves a significant random component.

Monte Carlo methods require only experience: sample sequences of states, actions, and rewards from actual or simulated interaction with an environment. Learning from actual experience is striking because it requires no prior knowledge of the environment’s dynamics, yet one can still attain optimal behavior.


What is this Monte thing used for in RL?
It is a method for estimating the action-value function Q (value given a state and an action) or the state-value function V (value given a state) using sample runs (episodes) from the environment whose value function we are estimating.
● Let us consider a system of 3 states: A, B and terminate.

● We are given two sample episodes (for any environment, such episodes can be generated simply by interacting with it, e.g. with random walks).

● The notation A+3 → A+2 means a transition from state A back to state A, with a reward of 3 earned on that transition (and 2 on the next transition out of A). The two episodes used in the calculations below are A+3 → A+2 → B-4 → A+4 → B-3 → terminate and B-2 → A+3 → B-3 → terminate; they are also written out as data in the sketch below.
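Since the original figure with the two episodes is not reproduced here, they can be written out as plain data. A minimal sketch in Python; the (state, reward) pair representation is an assumption of this sketch, with the reward being the one earned on the transition out of that state:

```python
# The two sample episodes used in the calculations below, written as
# (state, reward) pairs. The terminal state is implicit at the end of each list.
episode_1 = [('A', 3), ('A', 2), ('B', -4), ('A', 4), ('B', -3)]
episode_2 = [('B', -2), ('A', 3), ('B', -3)]
```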
Now, we know that averaging rewards can give us the value function for multi-state RL problems as well. But things are not quite that easy, because the value function depends on future rewards too. Hence we have 2 types of Monte Carlo learning, differing in how the returns are averaged:

First Visit Monte Carlo: estimates V(S1) as the average of the returns following the first visit to state S1 in each episode.

Every Visit Monte Carlo: estimates V(S1) as the average of the returns following every visit to state S1, across all episodes.
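Both procedures can be captured in one small function. This is a minimal sketch, not code from the lecture: the name mc_state_values, the (state, reward) episode format, and the gamma parameter (set to 1 to match the undiscounted sums below) are assumptions of the sketch.

```python
from collections import defaultdict

def mc_state_values(episodes, first_visit=True, gamma=1.0):
    """Monte Carlo prediction of V(s) from a list of sample episodes.

    Each episode is a list of (state, reward) pairs, where the reward is
    the one received on the transition out of that state, as in the
    example worked through below (gamma=1 matches its undiscounted sums).
    """
    returns = defaultdict(list)              # state -> list of sampled returns
    for episode in episodes:
        # Compute the return from every time step by walking the episode backwards.
        g = 0.0
        rets = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            g = episode[t][1] + gamma * g
            rets[t] = g
        seen = set()
        for t, (state, _) in enumerate(episode):
            if first_visit and state in seen:
                continue                     # first-visit: only the first occurrence counts
            seen.add(state)
            returns[state].append(rets[t])
    # Value estimate = average of the collected returns for each state.
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}
```

Run on the two episodes above with first_visit=True and then first_visit=False, this should reproduce the hand calculations that follow.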


We will be calculating V(A) & V(B) using the above-mentioned Monte Carlo methods.
First Visit Monte Carlo:

● Calculating V(A)

Since we are given 2 different episodes, we sum all the rewards coming after the first visit to ‘A’ (including the reward earned on leaving A itself). Therefore, there is at most one summation term per episode for a given state.

Hence,

● For the 1st episode: 3 + 2 + (-4) + 4 + (-3) = 2

● For the 2nd episode: 3 + (-3) = 0

As we have two terms, we average them, i.e. V(A) = (2 + 0) / 2 = 1.
Note: if an episode contains no occurrence of ‘A’, it is not considered in the average.

Hence, even if a 3rd episode like B-3 → B-3 → terminate existed, V(A) using first-visit MC would still be 1.


Calculating V(B)
Drawing on the same episodes:

● 1st episode: -4 + 4 + (-3) = -3

● 2nd episode: -2 + 3 + (-3) = -2

Averaging, V(B) = (-3 + (-2)) / 2 = -2.5
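These first-visit averages can be checked with a few lines of Python. The helper first_visit_return and the episode lists are illustrative assumptions of this sketch; an episode that never visits the state is simply skipped, matching the note above.

```python
# First-visit check: one return per episode, taken from the first visit onward.
episodes = [[('A', 3), ('A', 2), ('B', -4), ('A', 4), ('B', -3)],   # episode 1
            [('B', -2), ('A', 3), ('B', -3)]]                        # episode 2

def first_visit_return(episode, state):
    """Undiscounted return from the first occurrence of `state`, or None if absent."""
    for i, (s, _) in enumerate(episode):
        if s == state:
            return sum(r for _, r in episode[i:])
    return None

for state in ('A', 'B'):
    gs = [g for g in (first_visit_return(ep, state) for ep in episodes) if g is not None]
    print(state, sum(gs) / len(gs))   # A -> 1.0, B -> -2.5
```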
Every Visit MC: Calculating V(A)
Here, we create a new summation term for every occurrence of ‘A’, adding all the rewards coming after that occurrence (again including the reward earned on leaving A).

● From the 1st episode: (3 + 2 - 4 + 4 - 3), (2 - 4 + 4 - 3), (4 - 3) = 2, -1, 1

● From the 2nd episode: (3 - 3) = 0

As we have 4 summation terms, we average with N = 4,

i.e. V(A) = (2 + (-1) + 1 + 0) / 4 = 0.5
Calculating V(B)
● From the 1st episode: (-4 + 4 - 3), (-3) = -3, -3

● From the 2nd episode: (-2 + 3 - 3), (-3) = -2, -3

As we have 4 summation terms, averaging with N = 4,

V(B) = (-3 + (-3) + (-2) + (-3)) / 4 = -2.75
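Similarly, the every-visit numbers can be verified with a short standalone check (again, every_visit_returns and the episode lists are illustrative, not from the lecture):

```python
# Every-visit check: one return per occurrence of the state, averaged over all of them.
episodes = [[('A', 3), ('A', 2), ('B', -4), ('A', 4), ('B', -3)],   # episode 1
            [('B', -2), ('A', 3), ('B', -3)]]                        # episode 2

def every_visit_returns(episode, state):
    """One undiscounted return for every occurrence of `state` in the episode."""
    return [sum(r for _, r in episode[i:])
            for i, (s, _) in enumerate(episode) if s == state]

for state in ('A', 'B'):
    gs = [g for ep in episodes for g in every_visit_returns(ep, state)]
    print(state, sum(gs) / len(gs))   # A -> 0.5, B -> -2.75
```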
