Reinforcement Learning Notes
Monte-Carlo Methods:
Monte-Carlo is a general term for any estimation method that relies on
repeated random sampling.
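A minimal sketch of estimation by repeated random sampling. The die example and the names `mc_estimate` and `sample_fn` are illustrative assumptions, not part of the notes.

```python
import random

def mc_estimate(sample_fn, n_samples, seed=0):
    """Average n_samples independent draws produced by sample_fn(rng)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += sample_fn(rng)
    return total / n_samples

# Hypothetical example: expected value of a fair six-sided die (true value 3.5).
estimate = mc_estimate(lambda rng: rng.randint(1, 6), n_samples=100_000)
print(estimate)
```

The estimate approaches the true expectation as the number of samples grows, which is the property Monte-Carlo value estimation relies on.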
value function,
$v_\pi(s) \doteq \mathbb{E}_\pi[G_t \mid S_t = s]$
$G_t = R_{t+1} + \gamma G_{t+1}$
$G_t = 0$ if $t$ is the last time step
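The return recursion $G_t = R_{t+1} + \gamma G_{t+1}$ (with a return of 0 at the last step) can be computed by sweeping an episode backwards, and the value of a state estimated by averaging sampled returns. The sketch below assumes first-visit averaging and an episode format of `(states, rewards)` with `rewards[t]` holding $R_{t+1}$. Both are assumptions for illustration.

```python
from collections import defaultdict

def discounted_returns(rewards, gamma):
    """Return a list G with G[t] = rewards[t] + gamma * G[t + 1]."""
    returns = [0.0] * len(rewards)
    g = 0.0  # return after the last step is 0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

def first_visit_mc_prediction(episodes, gamma):
    """Estimate v(s) as the average return following the first visit to s."""
    returns_by_state = defaultdict(list)
    for states, rewards in episodes:
        returns = discounted_returns(rewards, gamma)
        seen = set()
        for t, s in enumerate(states):
            if s not in seen:  # only the first visit in each episode counts
                seen.add(s)
                returns_by_state[s].append(returns[t])
    return {s: sum(gs) / len(gs) for s, gs in returns_by_state.items()}

# Hypothetical two-episode data set.
episodes = [(["a", "b"], [1.0, 1.0]), (["b"], [3.0])]
values = first_visit_mc_prediction(episodes, gamma=1.0)
print(values)
```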
Problem formulation,
$\arg\max_a q_\pi(s, a)$
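A greedy policy picks, in each state, the action with the largest action value. A small sketch, assuming the action values are stored in a nested dict `q[state][action]` (a hypothetical layout chosen for illustration):

```python
def greedy_policy(q):
    """Map each state to the action with the highest estimated value.

    Ties are broken in favour of the action listed first.
    """
    return {s: max(action_values, key=action_values.get)
            for s, action_values in q.items()}

# Hypothetical action-value table.
q = {
    "s0": {"left": 0.1, "right": 0.7},
    "s1": {"left": 0.4, "right": 0.2},
}
policy = greedy_policy(q)
print(policy)
```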
Generalized policy iteration (GPI)
$\pi_0 \to q_{\pi_0} \to \pi_1 \to q_{\pi_1} \to \cdots$
Improvement
$Q(S, A)$ ← average($Returns(S, A)$)   (evaluation)
$\pi(S)$ ← $\arg\max_a Q(S, a)$   (improvement)
Epsilon-soft policies,
[Figure: $\pi$, with axis label "State"]
Initialize
$\pi$ ← an arbitrary $\varepsilon$-soft policy
$G$ ← $\gamma G + R_{t+1}$
Append $G$ to $Returns(S_t, A_t)$
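A sketch of how one episode might be processed in on-policy Monte-Carlo control with an epsilon-soft policy. The notes only show the initialisation, the return update and the append step. The first-visit check, the averaging of returns and the epsilon-greedy improvement are filled in here as assumptions, and the function name and data layout are hypothetical.

```python
from collections import defaultdict

def epsilon_soft_update(episode, q, returns, policy, actions, gamma, epsilon):
    """Process one episode of (S_t, A_t, R_{t+1}) tuples, updating in place."""
    g = 0.0
    for t in reversed(range(len(episode))):
        s, a, r = episode[t]
        g = gamma * g + r  # G <- gamma * G + R_{t+1}
        earlier_pairs = [(x[0], x[1]) for x in episode[:t]]
        if (s, a) in earlier_pairs:
            continue  # first-visit: only the earliest occurrence is recorded
        returns[(s, a)].append(g)  # Append G to Returns(S_t, A_t)
        q[(s, a)] = sum(returns[(s, a)]) / len(returns[(s, a)])
        best = max(actions, key=lambda act: q[(s, act)])
        # Epsilon-soft improvement: every action keeps probability >= eps / |A|.
        for act in actions:
            policy[(s, act)] = epsilon / len(actions)
        policy[(s, best)] += 1.0 - epsilon

actions = ["left", "right"]
q = defaultdict(float)
returns = defaultdict(list)
policy = {}
epsilon_soft_update([("s", "right", 1.0)], q, returns, policy,
                    actions, gamma=1.0, epsilon=0.2)
print(policy)
```

Because every action keeps probability at least epsilon divided by the number of actions, the policy continues to explore without needing exploring starts.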
Importance Sampling,
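$\mathbb{E}_\pi[X] = \sum_x x\,\pi(x) = \sum_x x\,\frac{\pi(x)}{b(x)}\,b(x) = \mathbb{E}_b[X\rho(X)] \approx \frac{1}{n}\sum_{i=1}^{n} x_i\,\rho(x_i)$
where $\rho(x) = \pi(x)/b(x)$ and $x_i \sim b$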
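A small numerical sketch of importance sampling: samples are drawn from a behaviour distribution b, and each one is weighted by the ratio pi(x) / b(x) to estimate an expectation under the target distribution pi. The two distributions below are made-up values for illustration.

```python
import random

def importance_sampling_mean(values, pi, b, n_samples, seed=0):
    """Estimate E_pi[X] from samples x_i ~ b, weighting by pi(x) / b(x)."""
    rng = random.Random(seed)
    samples = rng.choices(values, weights=[b[x] for x in values], k=n_samples)
    return sum(x * pi[x] / b[x] for x in samples) / n_samples

values = [1, 2, 3]
pi = {1: 0.1, 2: 0.2, 3: 0.7}  # target distribution, E_pi[X] = 2.6
b = {1: 0.4, 2: 0.3, 3: 0.3}   # behaviour distribution used for sampling
estimate = importance_sampling_mean(values, pi, b, n_samples=200_000)
print(estimate)
```

The estimate is only well defined when b gives non-zero probability to every value that pi can produce.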