
NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Week 5: Machine Learning enabled by prior Theories

Video 5.4 Reinforcement Learning – Part 3 Q-learning


Q-Learning

Q-learning is a model-free, off-policy, temporal-difference (TD) reinforcement learning algorithm.

The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.

For any finite Markov decision process (FMDP), Q-learning finds a policy that is optimal in the sense that it maximizes the expected value of the total reward over all successive steps, starting from the current state.

Q-learning can identify an optimal action-selection policy for any given FMDP, given infinite exploration time
and a partly-random policy.

"Q" names the function Q(s,a) that can be said to stand for the "quality" of an action a taken in a given state s.

Suppose we have the optimal Q-function Q*(s, a); then the optimal policy in state s is argmax_a Q*(s, a).
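In code, extracting this greedy policy is a one-liner. A minimal sketch, assuming the Q-function is stored as a Python dict keyed by (state, action) pairs; the example values are taken from the gridworld table later in this lecture:

```python
def greedy_policy(Q, state, actions):
    """Return the action with the highest Q-value in the given state."""
    return max(actions, key=lambda a: Q[(state, a)])

# Hypothetical learned values for state 9 of the gridworld example:
Q = {(9, "N"): -8.0, (9, "S"): 2.0, (9, "W"): 2.0, (9, "E"): 8.0}
print(greedy_policy(Q, 9, ["N", "S", "W", "E"]))  # -> E
```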
Q-learning Algorithm

Initialize Q(s, a) arbitrarily
Repeat (for each episode):
    Initialize s
    Repeat (for each step of the episode):
        Choose a in s using a policy derived from Q (e.g., ε-greedy)
        Take action a, observe r, s′
        Q(s, a) ← Q(s, a) + α [r + γ max_a′ Q(s′, a′) − Q(s, a)]
        s ← s′
    until s is terminal

With α = 1, or with both α = 1 and γ = 1, the updating formula simplifies to:


Q(s, a) ← r + γ max_a′ Q(s′, a′)    (α = 1)
Q(s, a) ← r + max_a′ Q(s′, a′)    (α = 1 and γ = 1)
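A minimal Python sketch of the whole loop above. The env object with reset() and step() methods and the ε-greedy exploration rate are illustrative assumptions, not part of the lecture:

```python
import random

def q_learning(env, states, actions, alpha=1.0, gamma=0.5,
               epsilon=0.1, episodes=1000):
    """Tabular Q-learning; env.reset() is assumed to return a start state,
    env.step(s, a) a (reward, next_state, done) triple."""
    Q = {(s, a): 0.0 for s in states for a in actions}  # initialize (here: all zeros)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy: mostly exploit, sometimes explore
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            r, s_next, done = env.step(s, a)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```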
Example

[Figure: the example gridworld. Transitions into the goal state give r = 8, transitions into a penalty state give r = −8, and all other transitions give r = 0.]
States and Actions

States: s ∈ {1, …, 20}, arranged in a 4 × 5 grid:

 1  2  3  4  5
 6  7  8  9 10
11 12 13 14 15
16 17 18 19 20

Actions: a ∈ {N, S, E, W} (move one cell north, south, east or west).

Assume α = 1 and γ = 0.5, so the update becomes Q(s, a) ← r + 0.5 max_a′ Q(s′, a′).
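On this grid a move changes the state index by −5 (N), +5 (S), +1 (E) or −1 (W). A small sketch of the transition function, assuming that a move off the grid leaves the state unchanged (the slides do not specify border behaviour):

```python
def next_state(s, a, rows=4, cols=5):
    """Next state for action a in state s on a rows x cols grid
    numbered 1..rows*cols row by row; off-grid moves leave s unchanged."""
    row, col = divmod(s - 1, cols)
    if a == "N" and row > 0:
        row -= 1
    elif a == "S" and row < rows - 1:
        row += 1
    elif a == "W" and col > 0:
        col -= 1
    elif a == "E" and col < cols - 1:
        col += 1
    return row * cols + col + 1

assert next_state(7, "N") == 2   # moving North from state 7 reaches state 2
assert next_state(9, "E") == 10  # moving East from state 9 reaches state 10
```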


Initializing the Q(s, a) function

(rows: actions, columns: states)

States:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
N        0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
S        0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
W        0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E        0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
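In code this all-zero table is a single array; a sketch using NumPy (the action-row ordering is a convention chosen here):

```python
import numpy as np

ACTIONS = ["N", "S", "W", "E"]      # row order as in the table above
Q = np.zeros((len(ACTIONS), 20))    # Q[action, state - 1]

print(Q[ACTIONS.index("N"), 7 - 1])  # Q(7, N) before any episode -> 0.0
```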
An Episode

[Figure: the 4 × 5 grid of states 1–20 with the first episode's path drawn on it; the episode ends by moving North from state 7 into a penalty state.]
Calculating new Q(s, a) values

1st–3rd steps: every intermediate reward is 0 and Q is still all zeros, so each update gives Q(s, a) ← 0 + 0.5 · 0 = 0.

4th step: the move North from state 7 enters the penalty state (r = −8), so Q(7, N) ← −8 + 0.5 · max_a′ Q(s′, a′) = −8 + 0.5 · 0 = −8.
The Q(s, a) function after the first episode

States:  1  2  3  4  5  6   7  8  9 10 11 12 13 14 15 16 17 18 19 20
N        0  0  0  0  0  0  -8  0  0  0  0  0  0  0  0  0  0  0  0  0
S        0  0  0  0  0  0   0  0  0  0  0  0  0  0  0  0  0  0  0  0
W        0  0  0  0  0  0   0  0  0  0  0  0  0  0  0  0  0  0  0  0
E        0  0  0  0  0  0   0  0  0  0  0  0  0  0  0  0  0  0  0  0
A second episode

[Figure: the 4 × 5 grid with the second episode's path; this episode ends by moving East from state 9 into the goal state.]
Calculating new Q(s, a) values

1st–3rd steps: the visited entries are still 0 and the intermediate rewards are 0 (the only nonzero entry so far, Q(7, N) = −8, is never the maximum in its state), so these updates leave Q unchanged.

4th step: the move East from state 9 enters the goal state (r = 8), so Q(9, E) ← 8 + 0.5 · max_a′ Q(s′, a′) = 8 + 0.5 · 0 = 8.
The Q(s, a) function after the second episode

States:  1  2  3  4  5  6   7  8  9 10 11 12 13 14 15 16 17 18 19 20
N        0  0  0  0  0  0  -8  0  0  0  0  0  0  0  0  0  0  0  0  0
S        0  0  0  0  0  0   0  0  0  0  0  0  0  0  0  0  0  0  0  0
W        0  0  0  0  0  0   0  0  0  0  0  0  0  0  0  0  0  0  0  0
E        0  0  0  0  0  0   0  0  8  0  0  0  0  0  0  0  0  0  0  0
The Q(s, a) function after a few episodes

States:  1  2  3  4  5  6    7   8   9 10 11  12   13  14 15 16 17 18 19 20
N        0  0  0  0  0  0   -8  -8  -8  0  0   1    2   4  0  0  0  0  0  0
S        0  0  0  0  0  0  0.5   1   2  0  0  -8   -8  -8  0  0  0  0  0  0
W        0  0  0  0  0  0   -8   1   2  0  0  -8  0.5   1  0  0  0  0  0  0
E        0  0  0  0  0  0    2   4   8  0  0   1    2  -8  0  0  0  0  0  0
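Note how the goal reward 8 is discounted by γ = 0.5 for each step back along a path to the goal (8, 4, 2, 1, 0.5). Reading the greedy policy off this table in code also makes the ties visible; a short sketch, with the six middle-state columns copied from the table above:

```python
ACTIONS = ["N", "S", "W", "E"]

# Q-values for states 7, 8, 9, 12, 13, 14 (in the order N, S, W, E)
Q = {
    7:  [-8, 0.5,  -8,  2],
    8:  [-8,   1,   1,  4],
    9:  [-8,   2,   2,  8],
    12: [ 1,  -8,  -8,  1],
    13: [ 2,  -8, 0.5,  2],
    14: [ 4,  -8,   1, -8],
}

for s, values in Q.items():
    best = max(values)
    greedy = [a for a, v in zip(ACTIONS, values) if v == best]
    print(s, greedy)
# States 12 and 13 print ['N', 'E']: these ties are why
# more than one optimal policy exists.
```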
One of the optimal policies

An optimal policy takes the greedy action argmax_a Q(s, a) in each state. Reading the table above: East in states 7, 8 and 9 (entering the goal from state 9), North in state 14, and, resolving the ties in states 12 and 13 toward North, North in those two states as well.
An optimal policy graphically

[Figure: the 4 × 5 grid with arrows marking this optimal policy's action in each state along the path to the goal.]
Another of the optimal policies

The Q-table is the same; only the greedy choices differ. Because Q(12, N) = Q(12, E) = 1 and Q(13, N) = Q(13, E) = 2, the ties can equally be resolved toward East: East in states 12 and 13, North from state 14 into state 9, and East in states 7, 8 and 9 as before. Both policies collect the same discounted reward, so the optimal policy is not unique.
Another optimal policy graphically

[Figure: the 4 × 5 grid with arrows for the alternative optimal policy.]
NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Thanks for your attention!

The next lecture, 5.5, will be on the topic:

Case-Based Reasoning
