Module 3: TD Methods
Learning
Dr. D. John Pradeep
Associate Professor
VIT-AP University
TD-Learning
• TD learning combines ideas from Monte Carlo methods and dynamic
programming (DP)
• Like Monte Carlo methods, TD methods can learn directly from raw
experience, without a model of the environment’s dynamics
• Like DP, TD methods update estimates based in part on other learned
estimates, without waiting for a final outcome (they bootstrap)
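To make the bootstrapping idea concrete, here is a minimal Python sketch of tabular TD(0) prediction. The function names, the step size `alpha`, the discount `gamma`, and the five-state random-walk environment are all assumptions chosen for illustration; none of them come from the slides.

```python
import random


def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    """Apply one TD(0) update to V[s] in place and return the TD error."""
    # The target r + gamma * V[s_next] uses the current estimate of the
    # next state (bootstrapping) instead of waiting for the final return.
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error


def random_walk_td0(num_episodes=500, alpha=0.1, seed=0):
    """Estimate state values for a hypothetical 5-state random walk."""
    rng = random.Random(seed)
    # States 0 and 6 are terminal (value fixed at 0); states 1..5 start at 0.5.
    V = [0.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0]
    for _ in range(num_episodes):
        s = 3  # every episode starts in the middle state
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))   # step left or right at random
            r = 1.0 if s_next == 6 else 0.0    # reward only on the right exit
            td0_update(V, s, r, s_next, alpha)  # learn from this one step
            s = s_next
    return V


if __name__ == "__main__":
    print([round(v, 2) for v in random_walk_td0()])
```

Each update happens immediately after a single transition, which is the sense in which TD learns from raw experience (no model is consulted) while still bootstrapping from its own estimates.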
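[Figure: backup diagrams from St to St+1 comparing dynamic programming (full backup, bootstrapping), Monte Carlo methods (sample backup, no bootstrapping), and TD(0) (sample backup, bootstrapping), with the TD(0) error]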
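The three kinds of backup contrasted here can be written side by side. This is a sketch in the standard textbook notation, which is an assumption, as the slides do not define the symbols: $V$ is the state-value estimate, $R_{t+1}$ the reward, $G_t$ the full return, $\gamma$ the discount factor, and $\alpha$ the step size.

```latex
\begin{align*}
\text{DP (full backup, bootstraps):}\quad
  & V(S_t) \leftarrow \mathbb{E}_{\pi}\!\left[ R_{t+1} + \gamma V(S_{t+1}) \right] \\
\text{Monte Carlo (sample backup):}\quad
  & V(S_t) \leftarrow V(S_t) + \alpha \left[ G_t - V(S_t) \right] \\
\text{TD(0) (sample backup, bootstraps):}\quad
  & V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right] \\
\text{TD(0) error:}\quad
  & \delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t)
\end{align*}
```

DP takes an expectation over all successors, which requires a model. Monte Carlo and TD(0) each use a single sampled successor, and only TD(0) and DP replace the unknown remainder of the return with the current estimate $V(S_{t+1})$.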
TD - Methods
Sarsa – On-policy TD control
• Consider an episode consisting of an alternating sequence of states
and state–action pairs, as shown below