
Solutions to Reinforcement Learning by Sutton

Chapter 4
Yifan Wang

May 2019

Exercise 4.1

If π is the equiprobable random policy:

qπ(11, down) = −1 + vπ(T) = −1 + 0 = −1

qπ(7, down) = −1 + vπ(11) = −1 + (−14) = −15

Exercise 4.2

Adding state 15 gives:

vπ(15) = −1 + 0.25(−20 − 22 − 14 + vπ(15)) = −15 + 0.25 vπ(15)

vπ(15) = −15/0.75 = −20

Changing the dynamics does not require recalculating the whole game: the successor set S′ of state 15 is exactly the same as that of state 13, so they must share the same state value, −20.

Here is my script implementation of this game; feel free to add a state 15 to it.
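For reference, here is a minimal sketch of iterative policy evaluation for this 4×4 gridworld (not the linked script; the (row, col) state encoding and the helper names are assumptions of the sketch):

```python
# Minimal sketch of iterative policy evaluation for Example 4.1's gridworld
# under the equiprobable random policy (undiscounted).  Not the linked script;
# the (row, col) state encoding and helper names are assumptions of this sketch.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
TERMINAL = {(0, 0), (3, 3)}                    # the two shaded terminal states

def step(state, action):
    """Deterministic move: reward -1, stay in place when moving off the grid."""
    if state in TERMINAL:
        return state, 0
    row, col = state
    nr, nc = row + action[0], col + action[1]
    if not (0 <= nr < 4 and 0 <= nc < 4):
        nr, nc = row, col
    return (nr, nc), -1

def policy_evaluation(theta=1e-6):
    V = {(r, c): 0.0 for r in range(4) for c in range(4)}
    while True:
        delta = 0.0
        for s in V:
            v_new = 0.0
            for a in ACTIONS:                  # pi(a|s) = 0.25 for every action
                s2, reward = step(s, a)
                v_new += 0.25 * (reward + V[s2])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            break
    return V
```

Exercise 4.2's extra state 15 can be added by extending `step` (and `V`) with the new transitions.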


Exercise 4.3

qπ(s, a) ≐ Eπ[Gt | St = s, At = a]
         = Eπ[Rt+1 + γGt+1 | St = s, At = a]
         = Eπ[Rt+1 + γ Σ_{a′} π(a′|St+1) qπ(St+1, a′) | St = s, At = a]
         = Σ_{s′,r} p(s′, r | s, a) [ r + γ Σ_{a′} π(a′|s′) qπ(s′, a′) ]

qk+1(s, a) ≐ Eπ[Rt+1 + γGt+1 | St = s, At = a]
           = Σ_{s′,r} p(s′, r | s, a) [ r + γ Σ_{a′} π(a′|s′) qk(s′, a′) ]

Exercise 4.4

In step 3, Policy Improvement, the algorithm says:

If old-action ≠ π(s), then ......

This is a bug: when two or more actions are equally good, the arg max can keep switching among them, so the policy never stabilizes and the algorithm never terminates. One way to fix it is to say the following instead:

If old-action ∉ {ai}, which is the set of all equi-best actions for π(s), ......

Exercise 4.5

1. Initialization
Q(s, a) ∈ R and π(s) ∈ A(s) arbitrarily for all s ∈ S, a ∈ A(s)

2. Policy Evaluation
Loop:
    ∆ ← 0
    Loop for each s ∈ S and a ∈ A(s):
        q ← Q(s, a)
        Q(s, a) ← Σ_{s′,r} p(s′, r | s, a) [ r + γ Σ_{a′} π(a′|s′) Q(s′, a′) ]
        ∆ ← max(∆, |q − Q(s, a)|)
until ∆ < θ (a small positive number determining the accuracy of estimation)

3. Policy Improvement
policy-stable ← true
For each s ∈ S:
    old-action ← π(s)
    π(s) ← arg max_a Q(s, a)
    If old-action ∉ {ai}, the set of equi-best actions arg max_a Q(s, a), then policy-stable ← false
If policy-stable, then stop and return Q ≈ q∗ and π ≈ π∗; else go to 2
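Below is a minimal runnable sketch of this algorithm, assuming the MDP is supplied as a table p[s][a] of (probability, next state, reward) triples; that data layout and every name are assumptions of the sketch, not the book's notation:

```python
# Hedged sketch of the action-value policy iteration above.  The MDP is
# assumed to be supplied as p[s][a] = [(prob, next_state, reward), ...];
# terminal states should appear in p with a single zero-reward self-loop.
def policy_iteration_q(p, gamma=1.0, theta=1e-8):
    states = list(p)
    Q = {s: {a: 0.0 for a in p[s]} for s in states}
    pi = {s: next(iter(p[s])) for s in states}          # arbitrary initial policy

    def backup(s, a):
        # Q(s,a) <- sum_{s',r} p(s',r|s,a) [ r + gamma * Q(s', pi(s')) ]
        # (pi is deterministic, so the inner sum over a' collapses)
        return sum(prob * (r + gamma * Q[s2][pi[s2]]) for prob, s2, r in p[s][a])

    while True:
        # 2. Policy Evaluation
        while True:
            delta = 0.0
            for s in states:
                for a in p[s]:
                    q_old = Q[s][a]
                    Q[s][a] = backup(s, a)
                    delta = max(delta, abs(q_old - Q[s][a]))
            if delta < theta:
                break
        # 3. Policy Improvement, with the tie fix from Exercise 4.4
        policy_stable = True
        for s in states:
            old_action = pi[s]
            best = max(Q[s].values())
            equi_best = [a for a in p[s] if Q[s][a] == best]
            if old_action in equi_best:
                pi[s] = old_action                       # keep the old choice among ties
            else:
                pi[s] = equi_best[0]
                policy_stable = False
        if policy_stable:
            return Q, pi
```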

Exercise 4.6

Step 3 changes: we decide that policy-stable is false only when the change in the policy is not due to exploration, i.e. only when the greedy part of the ε-soft policy changes (see the sketch below).

Step 2 changes: θ should not be set above the limit of the ε-soft method.

Step 1 changes: π should be well defined as an ε-soft policy, and ε should be given.
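As a concrete illustration of the step-3 change, here is a hedged sketch of an ε-soft policy-improvement step; the names Q, policy, and epsilon are assumptions of the sketch, not the book's notation:

```python
# Hedged sketch of the modified policy-improvement step for epsilon-soft
# policy iteration.  Q[s][a] are action values, policy[s][a] are action
# probabilities; these names are assumptions of the sketch.
def eps_soft_improvement(Q, policy, epsilon):
    policy_stable = True
    for s, q_s in Q.items():
        n_actions = len(q_s)
        old_greedy = max(policy[s], key=policy[s].get)   # previously favoured action
        greedy = max(q_s, key=q_s.get)                   # ties handled as in Exercise 4.4
        # every action keeps at least epsilon / |A(s)| probability (exploration)
        for a in q_s:
            policy[s][a] = epsilon / n_actions
        policy[s][greedy] += 1.0 - epsilon
        # exploration alone never sets policy-stable to false;
        # only a change of the greedy choice does
        if greedy != old_greedy:
            policy_stable = False
    return policy, policy_stable
```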


Exercise 4.7

Partial answer here. The programming implementation of DP is extremely time consuming. I did not get exactly the same answer as the book, for some unknown reason (algorithm differences, floating-point precision, etc.). Still, feel free to check it out and complete the whole picture! (Warning: it takes about 30 minutes to train.)
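To give a sense of why this DP is so slow, here is a hedged sketch of the expected update for a single (state, action) pair in the original car-rental problem of Example 4.2 (not the modified version of this exercise); the truncation cutoff and every name are assumptions of the sketch:

```python
# Hedged sketch of the expected return of one (state, action) pair in
# Jack's Car Rental (Example 4.2).  Constants follow the book; the
# truncation cutoff and all names are assumptions of this sketch.
from math import exp, factorial

GAMMA = 0.9
MAX_CARS = 20
RENT_REWARD = 10
MOVE_COST = 2
CUTOFF = 11          # truncate the Poisson tails here

def poisson(n, lam):
    return exp(-lam) * lam ** n / factorial(n)

def expected_return(state, action, V):
    """Expected return for moving `action` cars overnight from lot 1 to lot 2.

    `state` is (cars at lot 1, cars at lot 2); `V` is a (MAX_CARS+1) x (MAX_CARS+1)
    table of state values; `action` is assumed feasible for `state`.
    """
    total = -MOVE_COST * abs(action)
    cars1 = min(state[0] - action, MAX_CARS)
    cars2 = min(state[1] + action, MAX_CARS)
    # Four nested loops over rental requests and returns at both lots --
    # this expectation is what makes straightforward DP here so slow.
    for req1 in range(CUTOFF):
        for req2 in range(CUTOFF):
            for back1 in range(CUTOFF):
                for back2 in range(CUTOFF):
                    p = (poisson(req1, 3) * poisson(req2, 4) *
                         poisson(back1, 3) * poisson(back2, 2))
                    rented1 = min(cars1, req1)
                    rented2 = min(cars2, req2)
                    reward = (rented1 + rented2) * RENT_REWARD
                    next1 = min(cars1 - rented1 + back1, MAX_CARS)
                    next2 = min(cars2 - rented2 + back2, MAX_CARS)
                    total += p * (reward + GAMMA * V[next1][next2])
    return total
```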


Exercise 4.8

The gambler's problem has such a curious optimal policy because at a capital of 50 you can suddenly win with probability ph. Thus, the best policy bets everything when Capital = 50, and similarly at the capitals that can reach 50 with a single all-in bet, such as 25.

Think of a capital of 51 as 50 plus 1. Of course we could bet everything at 51, but the better policy is to see whether we can earn more from the extra 1 dollar. If this return g is positive, we have an extra g dollars and can keep betting it until we reach 75, where the next sudden winning chance appears. On the contrary, if we first bet 50 out of the 51, our chance of winning is only ph, and if we lose we give up the chance to reach 75; instead we have to struggle back to 25 with a single dollar, a much worse position.

Conclusion: the indicated optimal policy creates more chances to win and guarantees that the gambler is better off when he loses.

Exercise 4.9

Program here.

Plot A, Plot B, Plot C, Plot D (the four figures are not reproduced in this text version)

With proper thinking, you could easily recognize which plot is which.
(Think of human playing technique!)
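Not necessarily the linked program, but here is a minimal sketch of value iteration for the gambler's problem that produces plots like these; ph, theta, and the function name are assumptions of the sketch:

```python
# Minimal sketch of value iteration for the gambler's problem (Exercise 4.9).
# The +1 reward for reaching the goal is folded into V[goal] = 1 (undiscounted).
import numpy as np

def gambler_value_iteration(ph=0.4, goal=100, theta=1e-9):
    V = np.zeros(goal + 1)
    V[goal] = 1.0
    while True:
        delta = 0.0
        for s in range(1, goal):                       # capitals 1..goal-1
            stakes = range(1, min(s, goal - s) + 1)
            best = max(ph * V[s + a] + (1 - ph) * V[s - a] for a in stakes)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy with ties broken toward the smallest stake
    policy = np.zeros(goal + 1, dtype=int)
    for s in range(1, goal):
        stakes = list(range(1, min(s, goal - s) + 1))
        values = [ph * V[s + a] + (1 - ph) * V[s - a] for a in stakes]
        policy[s] = stakes[int(np.argmax(values))]
    return V, policy
```

Running it with ph = 0.25 and ph = 0.55, as the exercise asks, gives the value functions and final policies behind the plots above.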


Exercise 4.10

qk+1(s, a) ≐ E[Rt+1 + γ max_{a′} qk(St+1, a′) | St = s, At = a]
           = Σ_{s′,r} p(s′, r | s, a) [ r + γ max_{a′} qk(s′, a′) ]
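A minimal sketch of one synchronous sweep of this update, again assuming the MDP is given as p[s][a] = [(probability, next state, reward), ...]; the layout and names are my own:

```python
# Hedged sketch of one synchronous sweep of the action-value analogue of
# value iteration (Exercise 4.10).  p[s][a] is a list of (prob, next_state,
# reward) triples; terminal states should appear in Q with all-zero values.
def q_value_iteration_sweep(Q, p, gamma):
    new_Q = {s: dict(actions) for s, actions in Q.items()}
    for s in Q:
        for a in Q[s]:
            # q_{k+1}(s,a) = sum_{s',r} p(s',r|s,a) [ r + gamma * max_{a'} q_k(s',a') ]
            new_Q[s][a] = sum(prob * (r + gamma * max(Q[s2].values()))
                              for prob, s2, r in p[s][a])
    return new_Q
```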
