

Reinforcement Learning
Dr. Thomas Abraham J V
Associate Professor Senior, School of Computing and Engineering Science, VIT Chennai.
Introduction
Reinforcement Learning (RL) can be defined as the study of making optimal decisions using experience.
It is mainly intended to solve problems in which decision making is sequential and the goal or objective is long-term; this includes robotics, game playing, and even logistics and resource management.
Reinforcement learning is a machine learning technique that teaches software to make decisions by using a reward-and-punishment system.
Reinforcement learning is the process of training a program to attain a goal through trial and error by incentivising it with a combination of rewards and penalties.
RL Applications
RL Terminologies
Agent: The agent in RL can be defined as the entity that acts as the learner and decision-maker. It is empowered to interact continually, select its own actions, and respond to those actions.
Environment: It is the abstract world through which the agent moves. The environment takes the current state and action of the agent as input and returns its next state and an appropriate reward as output.
States: The specific situation in which an agent finds itself is called a state. This can be the agent's current situation in the environment or any future situation.
Actions: This defines the set of all possible moves an agent can make within an environment.
Reward or Penalty: This is the feedback by which the success or failure of an agent's action in a given state is measured. Rewards are used to evaluate an agent's actions effectively.
Policy or Strategy: It maps states to actions. The agent uses a strategy to determine the next best action based on its current state.
Reinforcement Learning Procedure
1. The Agent perceives the environment and observes the current state.
2. According to the current state, the Agent takes an action in the environment using its strategy/policy.
3. The Agent receives a reward from the environment and updates its strategy/policy.
4. After the action is taken, the environment updates and transitions to the next state.
5. Repeat steps 1–4.
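To make the order of these steps concrete, here is a minimal sketch of the interaction loop in Python; the CoinEnvironment and Agent classes below are toy placeholders invented for illustration, not part of the slides.

```python
import random

# A hedged, self-contained sketch of the perceive -> act -> reward -> update loop.
# The environment and agent below are toy placeholders invented for illustration.

class CoinEnvironment:
    """Two states; action 1 always pays +1, action 0 pays -1."""
    def __init__(self):
        self.state = 0
    def step(self, action):
        reward = 1 if action == 1 else -1
        self.state = (self.state + 1) % 2        # environment moves to its next state
        return self.state, reward

class Agent:
    """Keeps a running value estimate per action and mostly picks the better one."""
    def __init__(self):
        self.values = [0.0, 0.0]
    def act(self, state):
        if random.random() < 0.2:                # occasionally explore
            return random.randint(0, 1)
        return self.values.index(max(self.values))
    def update(self, action, reward):            # refine the strategy from feedback
        self.values[action] += 0.1 * (reward - self.values[action])

env, agent = CoinEnvironment(), Agent()
state = env.state                                # agent perceives the current state
for t in range(100):
    action = agent.act(state)                    # act according to the current policy
    state, reward = env.step(action)             # environment returns next state and reward
    agent.update(action, reward)                 # update the policy using the reward
print(agent.values)                              # the agent learns to prefer action 1
```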
Passive vs Active RL
A policy is a strategy or set of rules that an agent follows to make decisions in an environment. It defines the mapping from states of the world to the actions the agent should take. Essentially, a policy guides the agent on what action to choose when it encounters a particular state.

Both passive and active reinforcement learning are types of RL. In passive RL, the agent's policy is fixed, which means that it is told what to do.

In contrast, in active RL, an agent needs to decide what to do, as there is no fixed policy it can act on. Therefore, the goal of a passive RL agent is to execute a fixed policy (sequence of actions) and evaluate it, while that of an active RL agent is to act and learn an optimal policy.
Types of RL
Value-Based Reinforcement Learning

Value-based reinforcement learning focuses on finding the optimal value function that
measures how good it is for an agent to be in a given state (or take a given action). The goal is
to maximize the value function, which represents the long-term cumulative reward.

Example: Q-Learning, Deep Q-Learning

Policy-Based Reinforcement Learning

A policy is a strategy or set of rules that an agent follows to make decisions in an environment. It defines the mapping from states of the world to the actions the agent should take. Essentially, a policy guides the agent on what action to choose when it encounters a particular state.

Unlike value-based methods, policy-based RL methods aim to directly learn the optimal policy
π(a∣s), which maps states to probabilities of selecting actions. These methods can be effective
for environments with high-dimensional or continuous action spaces, where value-based
methods struggle.
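To make the contrast concrete, the sketch below shows what policy-based methods learn directly: a parameterised policy π(a∣s) that turns per-state preferences into action probabilities. The table sizes and variable names are illustrative assumptions, not from the slides.

```python
import numpy as np

# Illustrative only: a policy pi(a|s) stored directly as parameters and turned into
# action probabilities with a softmax.
n_states, n_actions = 4, 2
theta = np.zeros((n_states, n_actions))          # policy parameters, one row per state

def policy(state):
    """Return pi(.|state): a probability for every action in this state."""
    prefs = theta[state]
    e = np.exp(prefs - prefs.max())               # numerically stable softmax
    return e / e.sum()

probs = policy(0)                                 # e.g. [0.5, 0.5] before any learning
action = np.random.choice(n_actions, p=probs)     # sample an action from the policy
```

A value-based method would instead learn Q(s, a) and derive its actions from those values.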
Types of RL
On-Policy vs Off-policy Learning

On-policy methods are about learning from what you are currently doing. The policy directs the
agent's actions in every state, including the decision-making process while learning. The agent
evaluates the outcomes of its present actions, refining its strategy incrementally.

Off-policy methods, on the other hand, are like learning from someone else's experience. They involve learning the value of the optimal policy independently of the agent's actions. These methods enable the agent to learn about the optimal policy from observations, even when that policy is not the one being followed. This is useful for learning from a fixed dataset or a teaching policy.
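As a hedged illustration of this distinction (the array shapes and hyperparameter values below are assumptions), the two tabular update rules differ only in which next-state value they bootstrap from: SARSA, the on-policy method, uses the action the agent actually takes next, while Q-learning, the off-policy method, uses the greedy action.

```python
import numpy as np

alpha, gamma = 0.1, 0.9            # learning rate and discount factor (illustrative values)

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy: bootstrap from the action the agent will actually take next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next):
    """Off-policy: bootstrap from the greedy (maximal) next action, whatever is executed."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

Q = np.zeros((3, 2))               # illustrative 3-state, 2-action Q-table
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=0)
```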
Types of RL
Model-based reinforcement learning
In model-based algorithms, the agent builds an internal model of the
environment. This model represents the dynamics of the environment,
including state transitions and reward probabilities. The agent can then use
this model to plan and evaluate different actions before taking them in the
real environment.
This approach has the advantage of being more sample-efficient,
especially in complex environments.
The disadvantage is that building an accurate model can be challenging,
especially for complex environments. The model may not reflect the real
environment accurately, leading to suboptimal behaviour.
Types of RL
Model-free reinforcement learning
This approach focuses on learning directly from interaction with the
environment without explicitly building an internal model. The agent learns
the value of states and actions or the optimal strategy through trial and
error.
Model-free RL offers a simpler approach in environments where building an accurate model is challenging. For Bob, this means he doesn't need to create a complex mental map of the room; he can learn through scratching and experiencing the consequences.
Model-free RL excels in dynamic environments where the rules might change. However, learning only through trial and error can be less sample-efficient.
Q-Learning: The algorithm learns a Q-value for each state-action pair. The Q-value
represents the expected future reward of taking a specific action in a particular
state. The agent can then choose the action with the highest Q-value to maximize
its long-term reward (we’ll explain this in more detail in the next section).
SARSA (State-Action-Reward-State-Action): This is similar to Q-learning, but it is on-policy: it updates the value of a state-action pair using the reward received and the next state-action pair that the agent actually takes.
Policy gradient methods: These algorithms directly learn the policy function, which maps states to actions. They use gradients to update the policy in the direction expected to lead to higher rewards. Examples include REINFORCE and Proximal Policy Optimization (PPO); a minimal REINFORCE sketch is shown after this list.
Deep Q-Networks (DQN): This algorithm combines Q-learning with deep neural
networks to handle high-dimensional state spaces, often encountered in complex
environments like video games.
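The following is a minimal, hedged sketch of the policy gradient idea: REINFORCE on a two-armed bandit, where the policy parameters are nudged along the gradient of the log-probability of the chosen action, weighted by the reward. The bandit, learning rate, and reward distribution are invented for illustration.

```python
import numpy as np

theta = np.zeros(2)                      # action preferences (policy parameters)
lr = 0.1                                 # illustrative learning rate

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for t in range(1000):
    probs = softmax(theta)                               # pi(a) for the single state
    a = np.random.choice(2, p=probs)                     # sample an action
    r = np.random.normal(1.0 if a == 1 else 0.0, 1.0)    # arm 1 pays more on average
    grad_log_pi = np.eye(2)[a] - probs                   # gradient of log pi(a) w.r.t. theta
    theta += lr * r * grad_log_pi                        # REINFORCE update
print(softmax(theta))                                    # probability mass shifts to arm 1
```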
Bellman Equation for State Value Function V(s)

This is used for policy evaluation under a fixed policy π. It expresses the
expected return from a state following the policy:

V(s)=Eπ[R(s,a)+γ⋅V(s′)]

If the policy is deterministic, it simplifies to:

V(s)=R(s,a)+γ⋅V(s′)

States: S1, S2, S3, S4

The agent follows a fixed policy and always transitions:

S1 → S2, reward = +1; S2 → S3, reward = 0; S3 → S4, reward = +2; S4 → S1, reward = −1

Discount factor: γ = 0.9


Bellman Equation for State Value Function V(s)

The Bellman Equations

V(S1) = 1 + 0.9⋅V(S2)

V(S2) = 0 + 0.9⋅V(S3)

V(S3) = 2 + 0.9⋅V(S4)

V(S4) = −1 + 0.9⋅V(S1)

Solve the system of equations and back-substitute to get the other values:

V(S1) ≈ 5.5, V(S4) = −1 + 0.9⋅5.5 ≈ 3.95,
V(S3) = 2 + 0.9⋅3.95 ≈ 5.55, V(S2) = 0.9⋅5.55 ≈ 5.0
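As a quick numerical check (a hedged sketch; the variable names are mine), the same four equations can be solved as the linear system V = R + γPV, which reproduces the values above.

```python
import numpy as np

# The fixed-policy chain S1 -> S2 -> S3 -> S4 -> S1 with rewards +1, 0, +2, -1.
gamma = 0.9
R = np.array([1.0, 0.0, 2.0, -1.0])          # reward received on leaving each state
P = np.array([[0, 1, 0, 0],                   # deterministic transition matrix
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)

# Solve (I - gamma * P) V = R for the state values V(S1)..V(S4)
V = np.linalg.solve(np.eye(4) - gamma * P, R)
print(V)   # approximately [5.50, 5.00, 5.55, 3.95]
```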
Q-learning
Q-learning is a model-free, value-based, off-policy
algorithm that will find the best series of actions based on
the agent's current state. The “Q” stands for quality. Quality
represents how valuable the action is in maximising future
rewards.

1. Initialise your Q-table
2. Choose an action using the Epsilon-Greedy Exploration Strategy
3. Update the Q-table using the Bellman Equation
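Step 2 can be sketched as follows; the Q-table shape and epsilon value are illustrative assumptions. With probability ε the agent explores a random action, otherwise it exploits the action with the highest Q-value for the current state.

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon=0.1):
    """Pick a random action with probability epsilon, else the greedy one."""
    n_actions = Q.shape[1]
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)      # explore
    return int(np.argmax(Q[state]))              # exploit

# Example: a Q-table with 5 states and 4 actions, initialised to zeros
Q = np.zeros((5, 4))
action = epsilon_greedy(Q, state=0, epsilon=0.1)
```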
Initialize your Q-table
Bellman Equation

S = the state or observation
A = the action the agent takes
R = the reward from taking an action
t = the time step
α = the learning rate
γ = the discount factor, which causes rewards to lose their value over time so that more immediate rewards are valued more highly
Bellman Equation
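These symbols plug into the Q-learning form of the Bellman equation, Q(St, At) ← Q(St, At) + α⋅[R + γ⋅max over a of Q(St+1, a) − Q(St, At)]. A minimal sketch of that single update step is shown below; the array names and default values are assumptions for illustration.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q[s, a] a fraction alpha toward the TD target."""
    td_target = r + gamma * np.max(Q[s_next])   # reward plus discounted best next value
    Q[s, a] += alpha * (td_target - Q[s, a])    # shrink the temporal-difference error
    return Q
```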
Key Terminologies in Q-learning

States(s): the current position of the agent in the environment.

Action(a): a step taken by the agent in a particular state.

Rewards: for every action, the agent receives a reward or a penalty.
Key Terminologies in Q-learning (contd)

Episodes: the end of the stage, where the agent can't take a new action. It happens when the agent has achieved the goal or has failed.

Q(St+1, a): the expected optimal Q-value of taking an action in the next state.

Q(St, At): the current estimate for the present state-action pair, which is updated toward the reward plus the discounted Q(St+1, a).

Q-Table: the agent maintains a Q-table of sets of states and actions.

Temporal Difference (TD): used to estimate the expected value of Q(St+1, a) by using the current state and action and the previous state and action.
How Does Q-Learning Work?
Q-Table

The agent will use a Q-table to take the best possible action based on the expected reward for each state in the environment. In simple words, a Q-table is a data structure of sets of actions and states, and we use the Q-learning algorithm to update the values in the table.

Q-Function

The Q-function uses the Bellman equation and takes state (s) and action (a) as input. The equation simplifies the calculation of state values and state-action values.
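Putting the pieces together, here is a minimal, hedged sketch of tabular Q-learning on a toy chain environment. The environment, its size, and all hyperparameter values are illustrative assumptions rather than anything from the slides: the Q-table starts at zero, actions are chosen epsilon-greedily, and every transition updates the table with the temporal-difference rule.

```python
import numpy as np

# Toy chain environment (an illustrative assumption): states 0..4, action 1 moves right,
# action 0 moves left, and reaching state 4 gives reward +1 and ends the episode.
n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = np.zeros((n_states, n_actions))          # 1. initialise the Q-table

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward, s_next == n_states - 1

for episode in range(500):
    s, done = 0, False
    for t in range(100):                      # cap episode length
        # 2. epsilon-greedy action selection (ties broken randomly)
        if np.random.rand() < epsilon:
            a = np.random.randint(n_actions)
        else:
            a = int(np.random.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s_next, r, done = step(s, a)
        # 3. Bellman / temporal-difference update of the Q-table
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if done:
            break

print(Q)      # values for "move right" dominate, pointing toward the goal state
```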
Understanding Q-learning
