Reinforcement Learning: Mitchell, Ch. 13 (See Also Barto & Sutton Book On-Line)

Reinforcement learning allows an agent to learn from experience through trial-and-error interactions with its environment. The agent receives feedback in the form of rewards or punishments, without being explicitly told which actions to take. The goal is for the agent to learn a policy that maximizes its total reward over time through exploration and exploitation. Markov decision processes provide a framework for modeling reinforcement learning problems and finding optimal policies using techniques like value iteration, policy iteration, and Q-learning. Q-learning is a model-free approach that can learn directly from experience without knowing the transition and reward functions of the environment.


Reinforcement Learning

Mitchell, Ch. 13
(see also Barto & Sutton book on-line)
Rationale
• Learning from experience
• Adaptive control
• Examples are not explicitly labeled; feedback is delayed
• Problem of credit assignment – which action(s) led to the payoff?
• Trade off short-term thinking (immediate reward) against long-term consequences
Agent Model
• Transition function T: S×A → S (the environment)
• Reward function R: S×A → ℝ (the payoff)
• Stochastic but Markov:
  P(s_{t+1} | s_t, a_t, s_{t-1}, a_{t-1}, …) = P(s_{t+1} | s_t, a_t)
• Policy = decision function, π: S → A
• “Rationality” – maximize long-term expected reward
  – Discounted long-term reward (a convergent series):
    V^π(s_t) = E[ r_t + γ·r_{t+1} + γ²·r_{t+2} + … ] = E[ Σ_{i≥0} γ^i·r_{t+i} ],  0 ≤ γ < 1
  – Alternatives: finite time horizon, uniform weights
[Figure: agent–environment interaction loop, governed by R and T]
Markov Decision Processes (MDPs)
• if R and T (= P) are known, solve for the value function V(s)
• policy evaluation
• Bellman equations
• dynamic programming (|S| equations in |S| unknowns)
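For a fixed policy π the Bellman equations read V^π(s) = R(s, π(s)) + γ·Σ_{s'} T(s, π(s), s')·V^π(s'), i.e. |S| linear equations in |S| unknowns. A minimal policy-evaluation sketch under an assumed numpy array layout (the names and shapes are illustrative, not from the slides):

import numpy as np

def evaluate_policy(T, R, policy, gamma=0.9):
    """Solve the Bellman equations for a fixed deterministic policy.

    T      : transition probabilities, shape (|S|, |A|, |S|)
    R      : expected rewards, shape (|S|, |A|)
    policy : chosen action per state, shape (|S|,)
    """
    n_states = T.shape[0]
    # Transition matrix and reward vector induced by the policy
    T_pi = T[np.arange(n_states), policy]   # shape (|S|, |S|)
    R_pi = R[np.arange(n_states), policy]   # shape (|S|,)
    # |S| linear equations in |S| unknowns: (I - gamma * T_pi) V = R_pi
    return np.linalg.solve(np.eye(n_states) - gamma * T_pi, R_pi)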
MDPs
• finding optimal policies

• Value iteration – update V(s) iteratively until
  π(s) = argmax_a [ R(s, a) + γ·Σ_{s'} T(s, a, s')·V(s') ]
  stops changing

• Policy iteration – alternate between choosing π and updating V over all states

• Monte Carlo sampling – run random scenarios using π and take the average rewards as V(s)
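A minimal value-iteration sketch under the same assumed array layout as above; the stopping tolerance is an illustrative choice:

import numpy as np

def value_iteration(T, R, gamma=0.9, tol=1e-6):
    """Iterate the Bellman optimality update until V stops changing.

    T : transition probabilities, shape (|S|, |A|, |S|)
    R : expected rewards, shape (|S|, |A|)
    Returns the value function V and the greedy policy.
    """
    V = np.zeros(T.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' T[s, a, s'] * V[s']
        Q = R + gamma * (T @ V)
        V_new = Q.max(axis=1)
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < tol:
            break
    return V, Q.argmax(axis=1)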
Q-learning: model-free
• Q-function: reformulate the value function in terms of both S and A, so it can be learned independently of R and T (= δ)
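For reference, the definition being summarized (Mitchell Ch. 13, deterministic case):

  Q(s, a) ≡ r(s, a) + γ·V*(δ(s, a)),   π*(s) = argmax_a Q(s, a)

so an agent that has learned Q can act optimally without ever knowing the reward function r or the transition function δ.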
Q-learning algorithm
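The algorithm body on this slide did not survive conversion to text; below is a minimal tabular sketch of the deterministic-world update Q̂(s, a) ← r + γ·max_{a'} Q̂(s', a') inside a generic episode loop. The environment interface (env.reset(), env.step(), env.actions) and the ε-greedy exploration parameter are assumptions for illustration, not part of the slides.

import random
from collections import defaultdict

def q_learning(env, n_episodes=1000, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning for a deterministic environment (Mitchell Ch. 13 style).

    Assumes a hypothetical env with reset() -> s, step(a) -> (s', r, done),
    and a finite action list env.actions.
    """
    Q = defaultdict(float)                 # Q[(s, a)], initialized to 0
    for _ in range(n_episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy choice between exploration and exploitation
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(a)
            # deterministic-world update: Q(s,a) <- r + gamma * max_a' Q(s',a')
            Q[(s, a)] = r + gamma * max(Q[(s_next, a_)] for a_ in env.actions)
            s = s_next
    return Q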
Convergence
• Theorem: Q̂ converges to Q* after each state–action pair is visited infinitely often (assuming bounded rewards, |r| < ∞)
• Proof idea: over each interval in which every (s, a) pair is visited, the magnitude of the largest error in the Q̂ table decreases by at least a factor of γ
• “on-policy” training
  – exploitation vs. exploration
  – will the relevant parts of the space be explored if the agent sticks to its current (sub-optimal) policy?
  – ε-greedy policies: choose the action with the maximum Q value most of the time, or a random action an ε fraction of the time (sketched after this list)
• “off-policy”
  – learn from simulations or traces
  – SARSA: database of training examples as tuples <s, a, r, s', a'>
• Actor–critic
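Minimal sketches of ε-greedy action selection and of the update applied to one stored <s, a, r, s', a'> tuple (the standard SARSA update). The Q dictionary layout matches the earlier tabular sketch (a defaultdict keyed by (state, action)); the learning rate alpha is an illustrative assumption.

import random

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)              # explore
    return max(actions, key=lambda a: Q[(s, a)])   # exploit

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One update computed from a stored <s, a, r, s', a'> tuple."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])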
Non-deterministic case
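This slide's body did not survive conversion; the non-deterministic training rule it most likely showed (Mitchell Ch. 13) averages new estimates into the table with a decaying learning rate:

  Q̂_n(s, a) ← (1 − α_n)·Q̂_{n−1}(s, a) + α_n·[ r + γ·max_{a'} Q̂_{n−1}(s', a') ],
  where α_n = 1 / (1 + visits_n(s, a))

which still converges to Q* when rewards and transitions are stochastic.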
Temporal Difference Learning
• convergence is not the problem
• representation of a large Q table is the problem (domains with many states or continuous actions)
• how to represent large Q tables? (a sketch follows this list)
  – neural network
  – function approximation
  – basis functions
  – hierarchical decomposition of the state space
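One way to replace the table: a minimal sketch of a semi-gradient Q-learning step with a linear basis-function approximator, Q(s, a) ≈ w·φ(s, a). The feature map phi, the weight vector w, and the step size alpha are illustrative assumptions, not from the slides.

import numpy as np

def td_step(w, phi, s, a, r, s_next, actions, alpha=0.05, gamma=0.9):
    """One semi-gradient Q-learning update for a linear approximator Q(s, a) = w . phi(s, a)."""
    q_sa = w @ phi(s, a)
    # bootstrapped target: reward plus discounted best next-state estimate
    q_next = max(w @ phi(s_next, a_) for a_ in actions)
    td_error = (r + gamma * q_next) - q_sa
    # gradient of w . phi(s, a) with respect to w is just phi(s, a)
    return w + alpha * td_error * phi(s, a)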
