Lecture 9: Reinforcement Learning

Uploaded by

abdallahsirmajor02

Reinforcement learning

Introduction

• Reinforcement learning (RL) seems intriguing, right?

• It is the idea that a machine can teach itself from the results of its
own actions: it tries an action, observes the outcome, and adjusts its
behavior accordingly.
How does reinforcement learning work?
1. Start in a state.
2. Take an action.
3. Receive a reward or penalty from the environment.
4. Observe the new state of the environment.
5. Update your policy to maximize future rewards.
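The five steps above can be sketched as a loop between an agent and an environment. This is only an illustration under stated assumptions: `CorridorEnv`, its reward values, and `run_episode` are hypothetical names invented here, and the policy update in step 5 is left as a placeholder comment.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor: states 0..4, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # assumed: small penalty per non-goal step
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    state = env.reset()                              # 1. start in a state
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                       # 2. take an action
        next_state, reward, done = env.step(action)  # 3. reward, 4. new state
        total_reward += reward
        state = next_state                           # 5. a learning update would go here
        if done:
            break
    return total_reward


always_right = lambda state: 1
episode_return = run_episode(CorridorEnv(), always_right)
```

With the always-right policy the agent pays the step penalty three times and then collects the goal reward.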
How it works
Terminologies
• Agent – the sole decision-maker and learner
• Environment – the world (physical or simulated) in which the agent learns and
decides which actions to perform
• Action – one of the set of moves the agent can perform
• State – the current situation of the agent in the environment
• Reward – for each action selected by the agent, the environment returns a reward.
It is usually a scalar value and is simply feedback from the environment
• Policy – the strategy (decision-making rule) the agent uses to map situations to
actions
• Value Function – the value of a state is the cumulative reward the agent can expect
to collect starting from that state and following the policy
• Model – not every RL agent uses a model of its environment. When one is used, it
maps state-action pairs to probability distributions over the next states
Elements of reinforcement learning
(Policy)
• Policy: A policy defines how the agent behaves at a
given time. It maps the perceived states of the
environment to the actions to be taken in those states.
The policy is the core element of RL because it alone
determines the agent's behavior. It may be a simple
function or a lookup table, and it can be either
deterministic or stochastic:
• For a deterministic policy: a = π(s)
• For a stochastic policy: π(a | s) = P[A_t = a | S_t = s]
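The two kinds of policy can be written as lookup tables. The sketch below is illustrative only: the states `0..2`, the action names, and the probabilities are made-up values, not taken from the lecture.

```python
import random

ACTIONS = ["left", "right"]

# Deterministic policy: a = pi(s), stored as a lookup table
deterministic_pi = {0: "right", 1: "right", 2: "left"}


def act_deterministic(state):
    return deterministic_pi[state]


# Stochastic policy: pi(a | s) = P[A_t = a | S_t = s]
stochastic_pi = {
    0: {"left": 0.1, "right": 0.9},
    1: {"left": 0.5, "right": 0.5},
    2: {"left": 0.8, "right": 0.2},
}


def act_stochastic(state, rng=random):
    probs = stochastic_pi[state]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    # draw one action according to the probabilities for this state
    return rng.choices(actions, weights=weights, k=1)[0]
```

Calling `act_deterministic(2)` always gives the same action, while repeated calls to `act_stochastic(2)` return "left" about 80% of the time under these assumed probabilities.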
Elements of reinforcement learning
(Reward Signal)
• Reward Signal: The goal of reinforcement learning is
defined by the reward signal. At each time step, the
environment sends an immediate signal to the learning
agent, and this signal is known as the reward.
Rewards are given according to how good or bad the
agent's actions are. The agent's main objective
is to maximize the total reward it accumulates over
time. The reward signal can change the policy: for
example, if an action selected by the agent leads to a
low reward, the policy may change to select other
actions in the future.
Elements of reinforcement
learning(Value Function)
• Value Function: The value function tells how good a
state (or a state-action pair) is, measured by how
much reward the agent can expect from it. A reward is
the immediate signal for a good or bad action,
whereas the value function captures how good a
state and action are in the long run.
• The value function depends on the reward: without
rewards, there could be no value. The goal of estimating
values is to obtain more reward.
Elements of reinforcement
learning(Model)
• Model: The last element of reinforcement learning is the
model, which mimics the behavior of the environment. With
the help of the model, one can make inferences about how the
environment will behave. For example, given a state and an
action, a model can predict the next state and reward.
• The model is used for planning, which means deciding on a
course of action by considering possible future situations
before actually experiencing them. Approaches that solve RL
problems with the help of a model are called model-based
approaches. By contrast, an approach that does not use a model
is called a model-free approach.
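To make "predict the next state and reward" and "planning" concrete, here is a minimal sketch. It assumes a deterministic model stored as a dictionary; the states `A`, `B`, `GOAL`, the rewards, and the function names `predict` and `plan` are hypothetical.

```python
# Hypothetical deterministic model: (state, action) -> (next_state, reward)
model = {
    ("A", "left"): ("A", 0.0),
    ("A", "right"): ("B", 0.0),
    ("B", "left"): ("A", 0.0),
    ("B", "right"): ("GOAL", 1.0),
}


def predict(state, action):
    """Ask the model what would happen, without acting in the environment."""
    return model[(state, action)]


def plan(state, actions, depth):
    """Exhaustive lookahead: return (best total reward, best first action)."""
    if depth == 0 or state == "GOAL":
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in actions:
        next_state, reward = predict(state, action)
        future, _ = plan(next_state, actions, depth - 1)
        if reward + future > best_value:
            best_value, best_action = reward + future, action
    return best_value, best_action


value, first_action = plan("A", ["left", "right"], depth=2)
```

Looking two steps ahead from `A`, the planner finds that going right twice reaches the goal, so it recommends "right" as the first action. A model-free agent would have to discover this by trial and error instead.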
Approaches to implement
reinforcement learning algorithms
• Value-Based – The main goal of this method is to
maximize a value function. The agent, following a
policy, estimates the long-term return of the current state.
• Policy-Based – In policy-based methods, you try to find
a strategy (policy) directly, so that the action performed
in each state helps to gain maximum reward in the
future. The two types of policy-based methods are
deterministic and stochastic.
• Model-Based – In this method, we create a
virtual model of each environment, and the agent learns
to perform in that specific environment.
Types of reinforcement learning:
Positive Reinforcement

• Positive reinforcement occurs when an event that follows a
specific behavior increases the strength and frequency
of that behavior. It has a positive impact on behavior.
• Advantages
• Maximizes the performance of an action
• Sustains change for a longer period
• Disadvantage
• Excess reinforcement can lead to an overload of states, which
would diminish the results.
Types of reinforcement learning:
Negative Reinforcement

• Negative reinforcement is the strengthening of a
behavior because a negative condition is stopped or
avoided as a result of that behavior.
• Advantages
• Maximizes behavior
• Provides a decent minimum standard of performance
• Disadvantage
• It provides only enough to meet the minimum required behavior
Types of Reinforcement
Learning
Widely used models for
reinforcement learning
• The Markov Decision Process (MDP) is used to formalize
reinforcement learning problems.
• If the environment is completely observable, then its
dynamics can be modeled as a Markov process. In an
MDP, the agent constantly interacts with the
environment and performs actions; after each action, the
environment responds and generates a new state.
Widely used models for
reinforcement learning
• Markov Decision Processes (MDPs) are mathematical frameworks for
formulating problems and their solutions in RL. An MDP is described by these parameters:
• Set of finite states – S,
• Set of possible Actions in each state – A,
• Reward – R,
• Model – T,
• Policy – π.

The outcome of applying an action in a state does not depend on previous
actions or states, only on the current action and state.
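The MDP parameters listed above (S, A, R, T, π) can be held in a small data structure. The following is a sketch with invented contents: the two-state machine-maintenance example, its numbers, and the helper `is_valid` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: list        # S: finite set of states
    actions: dict       # A: state -> list of possible actions
    rewards: dict       # R: (state, action) -> reward
    transitions: dict   # T: (state, action) -> {next_state: probability}


# Hypothetical example: a machine that is either working or broken
mdp = MDP(
    states=["ok", "broken"],
    actions={"ok": ["run", "service"], "broken": ["repair"]},
    rewards={("ok", "run"): 10.0, ("ok", "service"): 4.0, ("broken", "repair"): -5.0},
    transitions={
        ("ok", "run"): {"ok": 0.8, "broken": 0.2},
        ("ok", "service"): {"ok": 1.0},
        ("broken", "repair"): {"ok": 1.0},
    },
)

# pi: a deterministic policy mapping each state to an action
policy = {"ok": "run", "broken": "repair"}


def is_valid(mdp):
    """Each (state, action) needs a reward and probabilities that sum to 1."""
    for s in mdp.states:
        for a in mdp.actions[s]:
            if (s, a) not in mdp.rewards:
                return False
            dist = mdp.transitions.get((s, a), {})
            if abs(sum(dist.values()) - 1.0) > 1e-9:
                return False
    return True
```

Note that `transitions` is keyed only by the current state and action, which is exactly the Markov assumption: nothing about earlier states or actions is needed to describe what happens next.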
Markov Decision Process
• Markov Property:
• It says: "If the agent is in the current state s1,
performs an action a1, and moves to the state s2, then the
transition from s1 to s2 depends only on the current state and
action. Future states and actions do not depend on past actions,
rewards, or states."
• Finite MDP:
• A finite MDP has a finite number of states, rewards, and actions. In
RL, we consider only finite MDPs.
• Markov Process:
• A Markov process is a memoryless process: a sequence of random states S1,
S2, ..., St that satisfies the Markov property. A Markov process is also known as a
Markov chain, which is a tuple (S, P) with state set S and transition function P. These
two components (S and P) define the dynamics of the system.
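A Markov chain (S, P) can be simulated in a few lines. The weather states and transition probabilities below are assumptions chosen for illustration; the point is that the next state is drawn using only the current state.

```python
import random

# S: the set of states; P: transition probabilities P[current][next]
S = ["sunny", "rainy"]
P = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def sample_chain(start, steps, rng):
    """Sample S1, S2, ..., St; each draw looks only at the current state."""
    state, path = start, [start]
    for _ in range(steps):
        row = P[state]
        state = rng.choices(list(row), weights=list(row.values()), k=1)[0]
        path.append(state)
    return path


path = sample_chain("sunny", steps=10, rng=random.Random(0))
```

The function never inspects `path` when choosing the next state, which is what "memoryless" means here.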
Q Learning
• It is a value-based, model-free approach that
indicates which action an agent should
perform.
• It revolves around updating Q-values, which
represent the value of taking action a in state s.
• The value update rule is the core of the Q-learning
algorithm.
• The main objective of Q-learning is to learn a
policy that tells the agent which action to take
in which circumstances in order to maximize the
reward.
• Q-learning is an off-policy RL algorithm based on
temporal-difference (TD) learning. TD methods learn
by comparing temporally successive predictions.
• It learns the value function Q(s, a), which expresses how
good it is to take action "a" in a particular state "s".
• The goal of the agent in Q-learning is to maximize the
value of Q.
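The slides do not spell out the value update rule, so the sketch below uses the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. The learning rate `alpha`, the state and action names, and the numbers are assumptions for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step (off-policy: uses the max over next actions)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_target = r + gamma * best_next           # bootstrapped target
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])  # move Q toward the target
    return Q[(s, a)]


Q = defaultdict(float)          # unseen (state, action) pairs start at 0
actions = ["left", "right"]
Q[("s2", "right")] = 2.0        # pretend this value was learned earlier

# The agent took "right" in s1, received reward 1.0, and landed in s2
new_value = q_learning_update(Q, "s1", "right", 1.0, "s2", actions)
```

Here the target is 1.0 + 0.9 × 2.0 = 2.8, and with α = 0.5 the old estimate of 0 moves halfway toward it, giving 1.4.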
Bellman equation
• The Q-learning values can be derived from the
Bellman equation. Consider the Bellman equation given
below.
• The equation was introduced by the mathematician Richard
Ernest Bellman in 1953, and hence it is
called the Bellman equation. It is associated with
dynamic programming and is used to calculate the value
of a decision problem at a certain point in terms of the
values of the states that follow.
• The equation has various components, including the
reward, the discount factor (γ), probability, and the next
state s'.
Bellman equation
• The action performed by the agent is denoted "a".
• The state in which the action is performed is denoted "s".
• The reward (feedback) obtained for each good or bad action is
denoted "R".
• The discount factor is gamma, "γ".

V(s) = max_a [R(s, a) + γ V(s')]

Where:
• V(s) = the value calculated at state s.
• R(s, a) = the reward received in state s for performing action a.
• γ = the discount factor.
• V(s') = the value of the next state s'.
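As a worked example of V(s) = max [R(s, a) + γ V(s')], the sketch below repeatedly applies the equation to every state until the values settle (a procedure usually called value iteration). The three-state chain, its rewards, and γ = 0.9 are assumptions, and transitions are taken to be deterministic.

```python
GAMMA = 0.9

# Assumed deterministic moves: (state, action) -> (next_state, reward)
MOVES = {
    ("s1", "right"): ("s2", 0.0),
    ("s2", "left"): ("s1", 0.0),
    ("s2", "right"): ("s3", 0.0),
    ("s3", "left"): ("s2", 0.0),
    ("s3", "right"): ("end", 1.0),   # reaching "end" pays reward 1
}
STATES = ["s1", "s2", "s3"]


def value_iteration(sweeps=50):
    V = {s: 0.0 for s in STATES}
    V["end"] = 0.0                   # terminal state has no future reward
    for _ in range(sweeps):
        for s in STATES:
            # V(s) = max over actions of R(s, a) + gamma * V(s')
            V[s] = max(
                reward + GAMMA * V[next_state]
                for (state, _action), (next_state, reward) in MOVES.items()
                if state == s
            )
    return V


V = value_iteration()
```

The state next to the goal gets value 1, and each step further away is discounted once more: V(s3) = 1, V(s2) = 0.9, V(s1) = 0.81.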
Bellman equation
• The agent has three value options: V(s1), V(s2), V(s3).
• Because this is an MDP, the agent cares only about the current state and
the future state.
• The agent can move in any direction (up, left, or right), so it
needs to decide where to go to follow the optimal path.
• Here the agent moves on a probability basis and
changes state. If we want it to make specific moves,
we need to express the problem in terms
of the Q-value.
Q represents the quality of the actions at each state. So instead
of using a value at each state, we use a pair of state and
action, i.e., Q(s, a).
The Q-value specifies which action is more lucrative than the
others, and the agent takes its next move according to the best
Q-value. The Bellman equation can be used to derive the
Q-value.
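Bellman equation
• For performing an action, the agent receives a reward R(s,
a) and ends up in a certain state s', so the
Q-value equation will be:

Q(s, a) = R(s, a) + γ V(s')

(With stochastic transitions, V(s') is replaced by its expected value over the possible next states.)

Hence, we can say that V(s) = max_a [Q(s, a)]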
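The relation V(s) = max [Q(s, a)] can be checked numerically. The sketch assumes deterministic transitions, so that Q(s, a) = R(s, a) + γ V(s'); the value table, the moves out of `s2`, and γ = 0.9 are made-up numbers for illustration.

```python
GAMMA = 0.9

# Assumed state values and deterministic moves out of state s2
V = {"s1": 0.81, "s2": 0.9, "s3": 1.0}
MOVES = {
    ("s2", "left"): ("s1", 0.0),    # (next_state, reward)
    ("s2", "right"): ("s3", 0.0),
}


def q_value(state, action):
    """Q(s, a) = R(s, a) + gamma * V(s') for a deterministic move."""
    next_state, reward = MOVES[(state, action)]
    return reward + GAMMA * V[next_state]


q_s2 = {action: q_value("s2", action) for action in ("left", "right")}
best_action = max(q_s2, key=q_s2.get)   # the action with the highest Q-value
```

Q(s2, left) is 0.729 and Q(s2, right) is 0.9, so the agent prefers "right", and the larger of the two equals V(s2).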

Q Learning
State Action Reward State
Action (SARSA):
• SARSA stands for State Action Reward State Action. It
is an on-policy temporal-difference learning method. An
on-policy control method selects the action for each state
while learning, using a specific policy.
• The goal of SARSA is to calculate Q π (s, a) for the
currently selected policy π and all state-action pairs (s, a).
• The main difference between Q-learning and SARSA
is that, unlike Q-learning, SARSA does not need the maximum
Q-value over the next state's actions to update the Q-value in
the table. It uses the Q-value of the action actually chosen next.
• In SARSA, the next action is selected using the same
policy that determined the original action.
State Action Reward State
Action (SARSA):
• SARSA is named after the quintuple (s,
a, r, s', a') that it uses, where
s: original state
a: original action
r: reward observed after taking action a in state s
s' and a': new state-action pair.
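The quintuple (s, a, r, s', a') maps directly onto the arguments of the standard tabular SARSA update, Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)]. The learning rate `alpha`, the state and action names, and the numbers below are assumptions for illustration.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """One tabular SARSA step (on-policy: uses the action actually chosen next)."""
    td_target = r + gamma * Q[(s_next, a_next)]   # no max over actions here
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)
Q[("s2", "left")] = 0.5
Q[("s2", "right")] = 2.0

# The policy picked "left" in s2, even though "right" has the higher Q-value
new_value = sarsa_update(Q, "s1", "right", 1.0, "s2", "left")
```

The target is 1.0 + 0.9 × 0.5 = 1.45, so the new estimate is 0.725. Q-learning would have used the maximum next Q-value (2.0) regardless of which action the policy chose.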
Deep Q-Network (DQN)
• As the name suggests, DQN is Q-learning using
neural networks.
• For an environment with a large state space, defining
and updating a Q-table is a challenging and complex
task.
• To solve this issue, we can use the DQN algorithm:
instead of defining a Q-table, a neural network
approximates the Q-values for each action and state.
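The idea of replacing the Q-table with a network can be sketched as a forward pass that takes a state vector and returns one Q-value per action. This is not a full DQN: there is no training, replay buffer, or target network, the weights are random, and `TinyQNetwork` with its layer sizes is a hypothetical class written in plain Python for illustration.

```python
import random


class TinyQNetwork:
    """One hidden layer; maps a state vector to one Q-value per action."""

    def __init__(self, state_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        # randomly initialised weights stand in for trained parameters
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(state_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                   for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def q_values(self, state):
        # hidden layer with ReLU activation
        hidden = [max(0.0, sum(w * x for w, x in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # linear output layer: one Q-value per action
        return [sum(w * h for w, h in zip(row, hidden)) + b
                for row, b in zip(self.w2, self.b2)]

    def greedy_action(self, state):
        q = self.q_values(state)
        return max(range(len(q)), key=q.__getitem__)


net = TinyQNetwork(state_dim=4, hidden_dim=8, n_actions=3)
state = [0.1, -0.2, 0.3, 0.0]
q = net.q_values(state)
action = net.greedy_action(state)
```

The memory used depends on the number of weights, not on the number of distinct states, which is why this approach scales to state spaces where a table would be impractical.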
Practical Applications of reinforcement learning

• Robotics for Industrial Automation

• Text summarization engines, dialogue agents (text, speech),
game playing
• Autonomous Self Driving Cars
• Machine Learning and Data Processing
• Training system which would issue custom instructions and
materials with respect to the requirements of students
• AI Toolkits, Manufacturing, Automotive, Healthcare, and Bots
• Aircraft Control and Robot Motion Control
• Building artificial intelligence for computer games
Further reading
• https://www.javatpoint.com/reinforcement-learning#Q-Learning
