
Reinforcement Learning

Raghunath Reddy, IHub Data Foundation


Reinforcement learning
• Reinforcement learning is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal.
• Trial-and-error search
• Delayed reward
• Absence of a model
• Partially observable states
• Large number of states
Source: Richard S. Sutton & Andrew G. Barto (2018), Reinforcement Learning: An Introduction, 2nd Edition, A Bradford Book.
Reinforcement Learning (RL)
Reinforcement learning sits at the intersection of many fields, each with its own name for the same underlying problem:
• Computer Science: Machine Learning
• Engineering: Optimal Control
• Neuroscience: Reward System
• Mathematics: Operations Research
• Psychology: Classical/Operant Conditioning
• Economics: Bounded Rationality
Source: David Silver (2015), Introduction to Reinforcement Learning, https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ
Branches of Machine Learning (ML)
• Supervised Learning
  • Labeled data
  • Direct feedback
  • Predict an outcome
• Unsupervised Learning
  • No labels
  • No feedback
  • Find hidden structure
• Reinforcement Learning
  • Decision process
  • Reward system
  • Learn a series of actions
Source: David Silver (2015), Introduction to Reinforcement Learning, https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ
Reinforcement Learning Applications
• RL solves problems that are sequential and whose goal is long-term, such as game playing, robotics, resource management, and industrial automation.
• RL is not suitable for problems where the solution can be obtained directly and complete information is available for supervised learning, such as object detection or fraud detection.
Elements of Reinforcement Learning
• Agent
• Environment
• Policy
• Reward signal
• Value function
• Model
Elements of Reinforcement Learning
• Policy
  • The agent's behavior
  • A map from state to action
  • Deterministic policy: a = π(s)
  • Stochastic policy: π(a|s) = P[A_t = a | S_t = s]
• Reward signal
  • Defines the goal of the reinforcement learning problem
• Value function
  • How good each state and/or action is
  • A prediction of future reward
• Model
  • The agent's representation of the environment
Source: Richard S. Sutton & Andrew G. Barto (2018), Reinforcement Learning: An Introduction, 2nd Edition, A Bradford Book.
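To make the two kinds of policy concrete, here is a minimal Python sketch; the toy states, actions, and probabilities are invented for illustration and are not from the slides.

```python
import random

# Deterministic policy: a = pi(s), a fixed mapping from state to action.
deterministic_pi = {"s0": "left", "s1": "right"}

def act_deterministic(state):
    return deterministic_pi[state]

# Stochastic policy: pi(a|s) = P[A_t = a | S_t = s],
# a probability distribution over actions in each state.
stochastic_pi = {
    "s0": {"left": 0.9, "right": 0.1},
    "s1": {"left": 0.2, "right": 0.8},
}

def act_stochastic(state):
    actions = list(stochastic_pi[state])
    weights = list(stochastic_pi[state].values())
    return random.choices(actions, weights=weights, k=1)[0]

print(act_deterministic("s0"))  # always 'left'
print(act_stochastic("s0"))     # 'left' about 90% of the time
```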
Reinforcement Learning
• Value Based
  • No policy (implicit)
  • Value function
• Policy Based
  • Policy
  • No value function
• Actor Critic
  • Policy
  • Value function
Source: David Silver (2015), Introduction to Reinforcement Learning, https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ
Examples of Rewards
• Make a humanoid robot walk
  • +ve reward for forward motion
  • -ve reward for falling over
• Play many different Atari games better than humans
  • +/-ve reward for increasing/decreasing score
• Manage an investment portfolio
  • +ve reward for each $ in the bank
Source: David Silver (2015), Introduction to Reinforcement Learning, https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ
Reinforcement Learning
• Model Free
  • Policy and/or value function
  • No model
• Model Based
  • Policy and/or value function
  • Model
Source: David Silver (2015), Introduction to Reinforcement Learning, https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ
Learning and Planning
Two fundamental problems in sequential decision making:
• Reinforcement Learning
  • The environment is initially unknown
  • The agent interacts with the environment
  • The agent improves its policy
• Planning
  • A model of the environment is known
  • The agent performs computations with its model (without any external interaction)
  • The agent improves its policy
  • a.k.a. deliberation, reasoning, introspection, pondering, thought, search
Source: David Silver (2015), Introduction to Reinforcement Learning, https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ
Exploration and Exploitation
• Reinforcement learning is like trial-and-error learning
• The agent should discover a good policy from its experiences of the environment, without losing too much reward along the way
• Exploration finds more information about the environment
• Exploitation exploits known information to maximise reward
• It is usually important to explore as well as exploit
Source: David Silver (2015), Introduction to Reinforcement Learning, https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ
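A standard way to trade off the two is ε-greedy action selection: with probability ε the agent explores with a random action, otherwise it exploits the action with the highest estimated value. A minimal sketch (the Q-table and ε below are illustrative assumptions, not from the slides):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon explore (random action);
    otherwise exploit (the action with the highest Q-value)."""
    if random.random() < epsilon:
        return random.choice(actions)                 # explore
    return max(actions, key=lambda a: Q[(state, a)])  # exploit

# Toy Q-table: one state 's' with two actions.
Q = {("s", "left"): 0.4, ("s", "right"): 0.7}
print(epsilon_greedy(Q, "s", ["left", "right"]))  # 'right' about 95% of the time
```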
Exploration and Exploitation Examples
• Restaurant Selection
  • Exploitation: go to your favorite restaurant
  • Exploration: try a new restaurant
• Online Banner Advertisements
  • Exploitation: show the most successful advert
  • Exploration: show a different advert
• Oil Drilling
  • Exploitation: drill at the best known location
  • Exploration: drill at a new location
• Game Playing
  • Exploitation: play the move you believe is best
  • Exploration: play an experimental move
Source: David Silver (2015), Introduction to Reinforcement Learning, https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ
Prediction and Control
• Prediction: evaluate the future, given a policy
• Control: optimise the future by finding the best policy
Source: David Silver (2015), Introduction to Reinforcement Learning, https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ
Generalized Policy Iteration (GPI)
Policy iteration alternates two interacting processes:
• Evaluation: V → v_π (estimate the value function of the current policy π)
• Improvement: π → greedy(V) (make the policy greedy with respect to the current value estimate)
Iterating the two drives both toward the optimal policy π* and the optimal value function v*.
Source: Richard S. Sutton & Andrew G. Barto (2018), Reinforcement Learning: An Introduction, 2nd Edition, A Bradford Book.
Generalized Policy Iteration (GPI)
GPI is any interleaving of policy evaluation and policy improvement, independent of their granularity. The same scheme applies to action-value functions:
• Evaluation: Q → q_π
• Improvement: π → greedy(Q)
Source: Richard S. Sutton & Andrew G. Barto (2018), Reinforcement Learning: An Introduction, 2nd Edition, A Bradford Book.
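A minimal Python sketch of GPI on a tiny, fully known MDP (the transition model, rewards, and discount below are invented for illustration):

```python
# P[s][a] = list of (probability, next_state, reward) transitions
# for a toy two-state MDP (an illustrative assumption, not from the slides).
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 0, 0.0)]},
}
gamma, states, actions = 0.9, [0, 1], ["stay", "go"]

def q_value(V, s, a):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

pi = {s: "stay" for s in states}  # arbitrary initial policy
V = {s: 0.0 for s in states}
stable = False
while not stable:
    # Policy evaluation: sweep V toward v_pi for the current policy.
    for _ in range(100):
        V = {s: q_value(V, s, pi[s]) for s in states}
    # Policy improvement: make pi greedy with respect to V.
    new_pi = {s: max(actions, key=lambda a: q_value(V, s, a)) for s in states}
    stable = new_pi == pi
    pi = new_pi

print(pi, V)  # converges to the optimal policy and value estimates
```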
Bellman Equation
The Bellman equation decomposes the value function into two parts: the immediate reward plus the discounted value of the successor state.
This decomposition simplifies the computation of the value function: rather than summing rewards over many future time steps, we can find the optimal solution of a complex problem by breaking it into simpler, recursive subproblems and solving those.
Bellman Equation: Examples
[Worked numerical examples appeared as figures on the original slides.]
Q-Learning Algorithm
Q-Learning Example
[The example environment and its reward matrix R appeared as figures on the original slides.]
The transition rule of Q-learning is a very simple formula:
Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
Example Continued
Look at the second row (state 1) of matrix R. There are two possible actions from the current state 1: go to state 3 or go to state 5. By random selection, we choose going to state 5 as our action.
Now imagine the agent is in state 5. Look at the sixth row of the reward matrix R (i.e. state 5). It has three possible actions: go to state 1, 4, or 5.
Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
Q(1, 5) = R(1, 5) + 0.8 * Max[Q(5, 1), Q(5, 4), Q(5, 5)] = 100 + 0.8 * 0 = 100
Example Continued
For the next episode, we start with a randomly chosen initial state; this time it is state 3. Look at the fourth row of matrix R: there are three possible actions, go to state 1, 2, or 4. By random selection, we choose going to state 1. Now imagine that we are in state 1. Look at the second row of the reward matrix R (i.e. state 1): it has two possible actions, go to state 3 or state 5. We then compute the Q value:
Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
Q(3, 1) = R(3, 1) + 0.8 * Max[Q(1, 3), Q(1, 5)] = 0 + 0.8 * Max(0, 100) = 80
We use the matrix Q as updated in the last episode: Q(1, 3) = 0 and Q(1, 5) = 100. The result is Q(3, 1) = 80 because the immediate reward R(3, 1) is zero. The matrix Q becomes:
Example Continued
[The updated Q matrix after each episode, and the final converged Q matrix, appeared as figures on the original slides.]
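For reference, here is a minimal Python sketch of this tabular Q-learning loop. The reward matrix R below is the classic six-state room-navigation example consistent with the values quoted above (R(1, 5) = 100, R(3, 1) = 0, Gamma = 0.8); treat it as an assumption rather than a transcription of the slides.

```python
import random

# Reward matrix: -1 marks an impossible transition, 0 a valid move,
# 100 a move into the goal state 5 (assumed classic example).
R = [[-1, -1, -1, -1,  0,  -1],
     [-1, -1, -1,  0, -1, 100],
     [-1, -1, -1,  0, -1,  -1],
     [-1,  0,  0, -1,  0,  -1],
     [ 0, -1, -1,  0, -1, 100],
     [-1,  0, -1, -1,  0, 100]]
GAMMA, GOAL, EPISODES = 0.8, 5, 500

Q = [[0.0] * 6 for _ in range(6)]

for _ in range(EPISODES):
    state = random.randrange(6)                # random initial state
    while state != GOAL:
        # Valid actions are the next states reachable from 'state'.
        actions = [a for a in range(6) if R[state][a] >= 0]
        action = random.choice(actions)        # pure random exploration
        # Transition rule: Q(s, a) = R(s, a) + Gamma * Max[Q(s', all actions)]
        Q[state][action] = R[state][action] + GAMMA * max(Q[action])
        state = action                         # the chosen action is the next state

# After training, the greedy policy traces a path to the goal.
state = 2
path = [state]
while state != GOAL:
    state = max(range(6), key=lambda a: Q[state][a])
    path.append(state)
print(path)  # e.g. [2, 3, 1, 5]
```

After enough episodes, Q reproduces the hand-computed values above (Q(1, 5) = 100, Q(3, 1) = 80), and following the greedy policy from any state reaches goal state 5.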
Thank You
