
Reinforcement Learning

Source: T. Mitchell, Machine Learning, Chapter 13.


Overview
• Supervised Learning: Immediate feedback (labels provided for every input).

• Unsupervised Learning: No feedback (no labels provided).

• Reinforcement Learning: Delayed scalar feedback (a number called reward).

• RL deals with agents that must sense & act upon their environment.
This combines classical AI and machine learning techniques.
It is the most comprehensive problem setting.

• Examples:
• A robot cleaning my room and recharging its battery
• Robot-soccer
• How to invest in shares
• Modeling the economy through rational agents
• Learning how to fly a helicopter
• Scheduling planes to their destinations
• and so on
The Big Picture

Your action influences the state of the world, which in turn determines the reward you receive.
Complications
• The outcome of your actions may be uncertain

• You may not be able to perfectly sense the state of the world

• The reward may be stochastic.

• Reward is delayed (e.g. finding food at the end of a maze)

• You may have no clue (model) about how the world responds to your actions.

• You may have no clue (model) of how rewards are being paid off.

• The world may change while you try to learn it

• How much time do you need to explore uncharted territory before you
exploit what you have learned?
The Task
• To learn an optimal policy that maps states of the world to actions of the agent.
I.e., if this patch of room is dirty, I clean it. If my battery is empty, I recharge it.

• What is it that the agent tries to optimize?


Answer: the total future discounted reward:
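In standard notation (following Mitchell, Ch. 13), with discount factor 0 <= gamma < 1, this is

V^pi(s_t) = r_t + gamma r_{t+1} + gamma^2 r_{t+2} + ... = sum_{i=0..inf} gamma^i r_{t+i}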

Note: immediate reward is worth more than future reward.


What would happen to a mouse in a maze with gamma = 0?
Value Function
• Let's say we have access to the optimal value function V*(s), which computes the
total future discounted reward from every state.

• What would be the optimal policy?

• Answer: we choose the action that maximizes the immediate reward plus the
discounted value of the next state (written out after this list).

• We assume that we know what the reward will be if we perform action "a" in
state "s": the reward function r(s,a).

• We also assume we know what the next state of the world will be if we perform
action "a" in state "s": the transition function delta(s,a).
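Putting these together (in the notation of Mitchell, Ch. 13), the optimal policy picks

pi*(s) = argmax_a [ r(s,a) + gamma V*(delta(s,a)) ]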
Example I
• Consider some complicated graph, in which we would like to find the shortest
path from a start node Si to a goal node G.

• Traversing an edge costs you that edge's length in dollars.

• The value function encodes the total remaining distance to the goal node from
any node s, i.e. V(s) = "1 / distance" to goal from s.

• If you know V(s), the problem is trivial: you simply choose the neighboring node
that has the highest V(s).
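A minimal sketch of this example, assuming a hypothetical graph given as a dict of edge
lengths (names and interface chosen for illustration, not taken from the slides): distances
to the goal are computed with Dijkstra, so V(s) is known everywhere, and the path is then
read off greedily, maximizing r(s,a) + V(next node), i.e. minimizing edge length plus
remaining distance.

import heapq

def distances_to_goal(graph, goal):
    """graph[u] = {v: edge_length}; Dijkstra on the reversed graph gives distance-to-goal."""
    rev = {}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            rev.setdefault(v, {})[u] = w
    dist, pq = {goal: 0.0}, [(0.0, goal)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in rev.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def follow_value_function(graph, start, goal):
    """Greedy traversal: at each node pick the neighbor minimizing edge length + remaining distance."""
    dist = distances_to_goal(graph, goal)   # V(s) is a monotone function of -dist[s]
    path, s = [start], start
    while s != goal:
        s = min(graph[s], key=lambda v: graph[s][v] + dist.get(v, float("inf")))
        path.append(s)
    return path

# e.g. follow_value_function({"Si": {"A": 2, "B": 5}, "A": {"G": 4}, "B": {"G": 2}, "G": {}},
#                            "Si", "G") returns ["Si", "A", "G"]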
Example II
Find your way to the goal.
Bellman Equation:
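For the deterministic setting used in these slides (following Mitchell, Ch. 13), the Bellman optimality equation for V* reads

V*(s) = max_a [ r(s,a) + gamma V*(delta(s,a)) ]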

• One approach to RL is then to try to estimate V*(s).

• However, this approach requires you to know r(s,a) and delta(s,a).

• This is unrealistic in many real problems. What is the reward if a robot
is exploring Mars and decides to take a right turn?

• Fortunately we can circumvent this problem by exploring and experiencing
how the world reacts to our actions. We need to learn r & delta.
Q-Function

• We want a function that directly learns good state-action pairs, i.e.
what action should I take in this state. We call this Q(s,a).

• Given Q(s,a) it is now trivial to execute the optimal policy, without knowing
r(s,a) and delta(s,a). We have:
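In Mitchell's notation these read

Q(s,a) = r(s,a) + gamma V*(delta(s,a))

pi*(s) = argmax_a Q(s,a),   and hence   V*(s) = max_a Q(s,a)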
Example II

Check that V*(s) = max_a Q(s,a).
Q-Learning

• The recursive definition Q(s,a) = r(s,a) + gamma max_a' Q(delta(s,a), a') still depends on r(s,a) and delta(s,a).

• However, imagine the robot is exploring its environment, trying new actions
as it goes.

• At every step it receives some reward "r", and it observes the environment
change into a new state s' = s_{t+1} after taking action a.
How can we use these observations (s, a, s', r) to learn a model?
Q-Learning
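In standard notation (the tabular Q-learning update from Mitchell, Ch. 13), after observing the transition (s, a, r, s'), where s' = s_{t+1} is the observed next state, the estimate is updated as

Q(s,a) <- r + gamma max_a' Q(s', a')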

• This equation continually estimates Q at state s so that it stays consistent with the estimate
of Q at state s', one step in the future: temporal difference (TD) learning.

• Note that s’ is closer to goal, and hence more “reliable”, but still an estimate itself.

• Updating estimates based on other estimates is called bootstrapping.

• We do an update after each state-action pair. I.e., we are learning online!

• We are learning useful things about explored state-action pairs. These are typically
most useful because they are likely to be encountered again.

• Under suitable conditions, these updates can actually be proved to converge to the
real answer.
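A minimal tabular sketch of this learner, assuming a hypothetical environment object env with reset() and step(a) methods that return the next state, reward, and an end-of-episode flag (these names are illustrative, not from the slides):

import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, gamma=0.9, alpha=1.0, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.
    With alpha = 1 and deterministic dynamics this is exactly Q(s,a) <- r + gamma max_a' Q(s',a');
    alpha < 1 averages over noisy rewards and transitions."""
    Q = defaultdict(float)                                # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:                 # occasionally explore ...
                a = random.choice(actions)
            else:                                         # ... otherwise exploit the current estimate
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(a)                 # observe (s, a, r, s')
            target = r + gamma * max(Q[(s_next, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])     # move the estimate toward the target
            s = s_next
    return Q

The epsilon-greedy choice anticipates the exploration/exploitation discussion below.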
Example Q-Learning

Q-learning propagates Q-estimates 1-step backwards
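As a worked one-step backup (in the style of Mitchell's grid-world example, with gamma = 0.9 and a reward of 100 received only on the move into the goal state; the state names s1 and s2 are illustrative): for s1, one step from the goal, Q(s1, a_into_goal) <- 100 + 0.9 * 0 = 100; for s2, two steps away, Q(s2, a_towards_s1) <- 0 + 0.9 * max_a' Q(s1, a') = 0.9 * 100 = 90; one further step back the estimate becomes 0.9 * 90 = 81. Each update pulls discounted value one step backwards from the goal.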


Exploration / Exploitation

• It is very important that the agent does not simply follow the current policy
when learning Q (off-policy learning). The reason is that you may get stuck
in a suboptimal solution, i.e. there may be other solutions out there that you
have never seen.

• Hence it is good to try new things every now and then, e.g.
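one standard choice is Boltzmann (softmax) exploration with temperature T, which picks action a in state s with probability

P(a | s) = exp( Q(s,a) / T ) / sum_{a'} exp( Q(s,a') / T )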


If T is large there is lots of exploring; if T is small we mostly follow the current policy.
One can decrease T over time.
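A small sketch of this selection rule, assuming numpy and a Q table indexed by (state, action) pairs as in the earlier sketch:

import numpy as np

def boltzmann_action(Q, s, actions, T=1.0):
    """Sample an action with probability proportional to exp(Q(s,a)/T)."""
    prefs = np.array([Q[(s, a)] for a in actions], dtype=float) / T
    prefs -= prefs.max()                               # subtract the max for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return actions[np.random.choice(len(actions), p=probs)]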
Improvements
• One can trade off memory and computation by caching (s, a, s', r) for observed
transitions. After a while, as Q(s',a') has changed, you can "replay" the same update
on the stored tuples (see the sketch after this list).

• One can actively search for state-action pairs for which Q(s,a) is
expected to change a lot (prioritized sweeping).

• One can do updates along the sampled path much further back than just
one step (TD(lambda) learning).
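A minimal sketch of the caching-and-replay idea from the first bullet, reusing the Q table and update from the earlier Q-learning sketch; the buffer is simply a list of observed (s, a, r, s') tuples:

import random

def replay_updates(Q, buffer, actions, gamma=0.9, alpha=0.5, n=32):
    """Re-apply the Q-learning update to n randomly drawn cached transitions (s, a, r, s')."""
    for s, a, r, s_next in random.sample(buffer, min(n, len(buffer))):
        target = r + gamma * max(Q[(s_next, a_)] for a_ in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])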
Extensions
• To deal with stochastic environments, we need to maximize the
expected future discounted reward, E[ sum_i gamma^i r_{t+i} ].

• Often the state space is too large to deal with all states separately. In this case we
need to learn a parameterized function that approximates Q(s,a) and generalizes across states (function approximation).

• Neural networks trained with back-propagation have been quite successful here.

• For instance, TD-Gammon is a backgammon program that plays at expert level:
its state space is very large, it was trained by playing against itself, it uses a neural
network to approximate the value function, and it uses TD(lambda) for learning.
More on Function Approximation
• For instance, a linear function: Q(s,a) = theta_1 Phi_1(s,a) + theta_2 Phi_2(s,a) + ... + theta_n Phi_n(s,a).

The features Phi are fixed measurements of the state (e.g. # stones on the board).
We only learn the parameters theta.

• Update rule (start in state s, take action a, observe reward r and end up in state s'):

theta_k <- theta_k + alpha [ r + gamma max_a' Q(s',a') - Q(s,a) ] Phi_k(s,a)

where the bracketed term is the change in Q (the temporal-difference error).
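A minimal sketch of this update in code (numpy; the feature map phi(s, a), returning a fixed-length vector, and the action list are assumed placeholders):

import numpy as np

def linear_q(theta, phi, s, a):
    """Q(s,a) = sum_k theta_k * Phi_k(s,a) for the linear model."""
    return float(theta @ phi(s, a))

def linear_q_update(theta, phi, s, a, r, s_next, actions, gamma=0.9, alpha=0.01):
    """One parameter update after observing (s, a, r, s')."""
    td_error = (r + gamma * max(linear_q(theta, phi, s_next, a2) for a2 in actions)
                - linear_q(theta, phi, s, a))          # the "change in Q"
    return theta + alpha * td_error * phi(s, a)        # dQ/dtheta_k = Phi_k(s,a) for a linear model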
Conclusion
• Reinforcement learning addresses a very broad and relevant question:
How can we learn to survive in our environment?

• We have looked at Q-learning, which simply learns from experience.
No model of the world is needed.

• We made simplifying assumptions: e.g. the state of the world only depends on the
last state and action. This is the Markov assumption. The model is called
a Markov Decision Process (MDP).

• We assumed a deterministic dynamics and reward function, but the world really is
stochastic.

• There are many extensions to speed up learning.

• There have been many successful real world applications.

http://elsy.gdan.pl/index.php?option=com_content&task=view&id=20&Itemid=39
