Reinforcement Learning

Reinforcement learning is a type of machine learning where an agent learns how to achieve a goal by interacting with its environment. The agent performs actions and receives rewards or punishments, allowing it to gradually learn what actions yield the maximum reward. Key aspects of reinforcement learning include the agent, environment, actions, states, rewards, and policies to maximize long-term rewards. Methods like Q-learning use reinforcement learning to find optimal actions by learning action-value functions through trial-and-error interactions with dynamic and uncertain environments.

Reinforcement Learning

By Shweta Saxena
Types of machine learning
Reinforcement Learning

Figure: The Agent (a computer program) performs an Action in the Environment, which returns (State, Action, Reward) to the Agent.
Reinforcement Learning
• Art of optimal decision making
• Reinforcement learning is a type of machine learning method in which an
intelligent agent (computer program) interacts with the environment and
learns to act within it.
• In RL an agent learns by trial and error using feedback from its own actions
and experiences.
• How a robotic dog learns the movement of its limbs is an example of reinforcement
learning.
• RL solves a specific type of problem where decision making is sequential and
the goal is long-term.
• Game playing
• Robotics, etc.
Reinforcement Learning
• The figure below illustrates the action-reward feedback loop of a
generic RL model.
Reinforcement Learning
Reinforcement Learning
• The image above shows a robot, a diamond, and fire.
• The goal of the robot is to reach the reward, the diamond, while
avoiding the hurdles, the fire.
• The robot learns by trying all the possible paths and then choosing
the path that gives it the reward with the fewest hurdles.
• Each right step gives the robot a reward, and each wrong step
subtracts from the robot's reward.
• The total reward is calculated when the robot reaches the final reward,
the diamond.
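The robot/diamond/fire story above can be sketched as a tiny path-scoring exercise. The grid layout and reward values below are illustrative assumptions, not taken from the slides:

```python
# Hypothetical 2x2 grid for the robot example; cell contents and
# reward values are made-up assumptions for illustration.
GRID = [["start", "fire"],
        ["empty", "diamond"]]
REWARD = {"start": 0, "empty": 0, "fire": -10, "diamond": 10}

def path_reward(path):
    """Total reward for a path given as a list of (row, col) cells."""
    return sum(REWARD[GRID[r][c]] for r, c in path)

safe = path_reward([(1, 0), (1, 1)])   # around the fire, then the diamond
risky = path_reward([(0, 1), (1, 1)])  # through the fire, then the diamond
```

After trying both paths, the robot would prefer the one with the higher total reward, which here is the path that avoids the fire.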
Main points in Reinforcement learning
1. Input: The input should be an initial state from which the model will
start.
2. Output: There are many possible outputs, as there are a variety of
solutions to a particular problem.
3. Training:
• The training is based upon the input.
• The model returns a state, and the user decides whether to reward or punish the
model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
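The interaction loop described in these points can be sketched in a few lines. The two-state environment and its reward rule below are hypothetical stand-ins, not part of the slides:

```python
import random

def step(state, action):
    """Hypothetical environment: returns (next_state, reward) for an action.
    Rewarding 'right' and punishing 'left' is an illustrative assumption."""
    reward = 1 if action == "right" else -1
    next_state = (state + 1) % 2
    return next_state, reward

state = 0            # input: the initial state the model starts from
total_reward = 0
for _ in range(10):  # the model keeps interacting and accumulating feedback
    action = random.choice(["left", "right"])
    state, reward = step(state, action)
    total_reward += reward  # the best behavior maximizes this total
```

A learning agent would use the accumulated rewards to prefer actions that scored well, rather than choosing uniformly at random as this sketch does.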
Terms used in Reinforcement Learning
• Agent(): An entity that can perceive/explore the environment and act upon it.
• Environment(): The situation in which the agent is present or by which it is surrounded. In RL, we
assume a stochastic environment, which means it is random in nature.
• Action(): Actions are the moves taken by an agent within the environment.
• State(): State is a situation returned by the environment after each action taken by the agent.
• Reward(): A feedback returned to the agent from the environment to evaluate the action of the
agent.
• Policy(): Policy is a strategy applied by the agent for the next action based on the current state.
• Value(): The expected long-term return with the discount factor, as opposed to the short-
term reward.
• Q-value(): Mostly similar to the value, but it takes one additional parameter, the current
action (a).
Reinforcement learning and Supervised
learning
• Both supervised and reinforcement learning use a mapping between
input and output.
• In supervised learning, the feedback provided to the agent is the correct set
of actions for performing a task.
• Reinforcement learning uses rewards and punishments as signals for positive
and negative behavior.
• The goal in unsupervised learning is to find similarities and differences
between data points.
• In reinforcement learning, the goal is to find a suitable action model
that maximizes the total cumulative reward of the agent.
Difference between Reinforcement learning
and Supervised learning
Reinforcement learning:
• All about making decisions sequentially: the output depends on the state of
the current input, and the next input depends on the output of the previous input.
• Decisions are dependent, so we give labels to sequences of dependent decisions.
• Example: a chess game.

Supervised learning:
• The decision is made on the initial input, or the input given at the start.
• Decisions are independent of each other, so labels are given to each decision.
• Example: object recognition.

Markov Decision Process
• A Markov Decision Process, or MDP, is used to formalize
reinforcement learning problems.
• If the environment is completely observable, then its dynamics can be
modeled as a Markov Process.
• In MDP, the agent constantly interacts with the environment and
performs actions; at each action, the environment responds and
generates a new state.
Markov Decision Process
Markov Decision Process
• An MDP contains a tuple of four elements (S, A, Pa, Ra):
• S = a finite set of states
• A = a finite set of actions
• Pa = the probability of transitioning from state S to state S'
due to action a
• Ra = the reward received after transitioning from state S to state S'
due to action a
• MDP uses the Markov property.
Markov property
• "If the agent is present in the current state S1, performs an action a1,
and moves to the state S2, then the state transition from S1 to S2 depends
only on the current state; future actions and states do not
depend on past actions, rewards, or states."
• As per the Markov property, the current state transition does not depend
on any past action or state.
• Example: in a chess game, the players focus only on the current state and do not
need to remember past actions or states.
Finite MDP
• A finite MDP is when there are finite states, finite rewards, and finite
actions.
• In RL, we consider only the finite MDP.
• Markov Process:
• It is a memoryless process with a sequence of random states S1, S2, ....., St
that uses the Markov Property.
• Markov process is also known as Markov chain, which is a tuple (S, P) on state
S and transition function P.
• These two components (S and P) can define the dynamics of the system.
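A Markov chain (S, P) as defined above can be simulated directly: the next state is sampled using only the current state. The states and transition probabilities below are illustrative assumptions:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical two-state Markov chain (S, P); the weather states and
# probabilities are made up for illustration.
S = ["sunny", "rainy"]
P = {"sunny": {"sunny": 0.8, "rainy": 0.2},
     "rainy": {"sunny": 0.4, "rainy": 0.6}}

def next_state(s):
    """Sample a successor of s; note it depends only on the current
    state, never on the history (the Markov property)."""
    r = random.random()
    cum = 0.0
    for s2, p in P[s].items():
        cum += p
        if r < cum:
            return s2
    return s2  # guard against floating-point round-off

chain = ["sunny"]
for _ in range(5):
    chain.append(next_state(chain[-1]))
```

The two components S and P are all the sampler needs, which is exactly the sense in which they define the dynamics of the system.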
Types of learning
Passive Reinforcement learning
• The agent's policy is fixed.
• The goal of the agent is to evaluate how good the fixed policy is.
• The agent needs to learn the expected utility Uπ(s) for each state s
(or for state-action pairs).
• This can be done in three ways.
• Direct Utility Estimation
• Adaptive Dynamic Programming(ADP)
• Temporal Difference Learning (TD)
Active Reinforcement learning
• The goal of the agent is not only to evaluate a policy; the agent
must also learn what to do, i.e., find an optimal policy.
• Types of Active reinforcement learning
• Adaptive Dynamic Programming(ADP) with exploration function
Q-learning
• Q-learning is used when we have to find the optimal path.
• It finds the next best action, given a current state.
• It may choose actions at random while exploring, but its aim is to
maximize the reward.
Q-Learning
• Q stands for Quality, which means it specifies the quality of an action taken by
the agent.
• Q-learning is an off-policy RL algorithm.
• The objective of the model is to find the best course of action given its current
state.
• To do this, it may come up with rules of its own, or it may act outside
the policy it is given to follow.
• Because the policy being learned (the greedy policy) can differ from
the policy used to act, Q-learning is called off-policy.
• At each state S, we choose an action "a" which maximizes the function Q(S, a).
• The function Q(S, a) measures how good it is to take action "a" in a particular state "S".
• Q(S, a) is updated with the Bellman equation, which is used to modify the Q-table.
Q table
• A Q-table or matrix is created while performing the Q-learning. The
table follows the state and action pair, i.e., [s, a], and initializes the
values to zero.
• After each action, the table is updated, and the Q-values are stored
within the table.
• The RL agent uses this Q-table as a reference table to select the best
action based on the Q-values.
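A Q-table keyed by [s, a] pairs, initialized to zero and updated after each observed transition, can be sketched as follows. The states, actions, and the single transition shown are illustrative assumptions:

```python
# Hypothetical states and actions for a minimal Q-table sketch.
states, actions = [0, 1], ["left", "right"]
Q = {(s, a): 0.0 for s in states for a in actions}  # initialize to zero

alpha, gamma = 0.5, 0.9  # learning rate and discount factor (assumed values)

def update(s, a, r, s2):
    """Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

update(0, "right", 1.0, 1)  # one observed (s, a, r, s') transition

# The agent then uses the table as a reference to pick the best action:
best = max(actions, key=lambda a: Q[(0, a)])
```

After the single update, Q[(0, "right")] has risen above zero, so the lookup selects "right" in state 0, which is how the Q-table serves as a reference for action selection.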
Working of Q learning
Bellman Equation
• It is used to determine the value of a particular state, i.e., to deduce how
good it is to be in that state.
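Although the slide does not write it out, the Bellman equation for the value of a state s, with reward R, discount factor γ, and transition probabilities P, is commonly stated as:

```latex
V(s) = \max_{a} \Big( R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big)
```

In words: the value of a state is the best achievable sum of the immediate reward and the discounted expected value of the next state.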
Discount Factor/ Rate
• It determines how much importance is to be given to the immediate
reward and future rewards.
• This basically helps us to avoid infinity as a reward in continuous
tasks.
• It has a value between 0 and 1.
• A value of 0 means that more importance is given to the immediate reward
and a value of 1 means that more importance is given to future rewards.
• In practice, an agent with a discount factor of 0 never learns beyond the
immediate reward, as it ignores the future entirely.
• A discount factor of 1 weighs future rewards fully, which may lead to an
infinite return in continuous tasks.
• Therefore, the optimal value for the discount factor typically lies between 0.2 and 0.8.
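The effect of the discount factor on the return can be checked with a short computation. The reward sequence below is an illustrative assumption:

```python
# Hypothetical reward sequence: one unit of reward per time step.
rewards = [1, 1, 1, 1, 1]

def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over the sequence (0**0 == 1 in Python)."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

g0 = discounted_return(rewards, 0.0)  # gamma = 0: only the immediate reward
g8 = discounted_return(rewards, 0.8)  # gamma = 0.8: future rewards count too
```

With γ = 0 the return collapses to the first reward, while γ = 0.8 accumulates a geometrically weighted sum of the whole sequence; keeping γ strictly below 1 is what bounds that sum in continuing tasks.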
Example
• An advertisement recommendation system.
• In a normal ad recommendation system, the ads you get are based on your
previous purchases or websites you may have visited.
• If you’ve bought a TV, you will get recommended TVs of different brands.

Figure: Ad Recommendation System


Example
• Using Q-learning, we can optimize the ad recommendation system to
recommend products that are frequently bought together.
• The reward will be if the user clicks on the suggested product.

Figure: Ad Recommendation System with Q-Learning


State Action Reward State action (SARSA)
• It is an on-policy temporal difference learning method.
• The on-policy control method selects the action for each state while
learning using a specific policy.
• SARSA is so named because its update uses the quintuple (s, a, r, s', a').
Where,
s: original state
a: original action
r: reward observed after taking action a
s', a': new state-action pair.
State Action Reward State action (SARSA)
• The goal of SARSA is to calculate Qπ(s, a) for the currently selected
policy π and all state-action pairs (s, a).
• The main difference between the Q-learning and SARSA algorithms is that,
unlike Q-learning, SARSA does not use the maximum reward of the next state
when updating the Q-value in the table.
• In SARSA, the new action and reward are selected using the same policy
that determined the original action.
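The contrast between the two updates can be sketched on a single transition. The states, Q-values, and the next action a' chosen by the policy are illustrative assumptions:

```python
# Hypothetical Q-values and one (s, a, r, s') transition for illustration.
alpha, gamma = 0.5, 0.9
Q = {("s", "a"): 0.0, ("s2", "a1"): 2.0, ("s2", "a2"): 4.0}

s, a, r, s2 = "s", "a", 1.0, "s2"
a2 = "a1"  # the next action the current policy actually picked

# SARSA (on-policy): the target uses the Q-value of the action the
# policy chose, whether or not it is the best one.
sarsa_target = r + gamma * Q[(s2, a2)]

# Q-learning (off-policy): the target uses the maximum Q-value over
# all next actions, regardless of what the policy would pick.
q_target = r + gamma * max(Q[(s2, "a1")], Q[(s2, "a2")])

sarsa_q = Q[(s, a)] + alpha * (sarsa_target - Q[(s, a)])
q_learning_q = Q[(s, a)] + alpha * (q_target - Q[(s, a)])
```

Because the policy here picked the non-greedy action a1, the SARSA update is smaller than the Q-learning update, which is exactly the on-policy versus off-policy distinction described above.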
Generalization in Reinforcement learning
• In Reinforcement learning, the generalization of the agents is
benchmarked on the environments they have been trained on.
• In a supervised learning setting, this would mean testing the model
using the training dataset.
Policy search
• It is a subfield of reinforcement learning that focuses on finding
good parameters for a given policy parametrization.
• It is well suited for robotics as it can cope with high-dimensional state
and action spaces, one of the main challenges in robot learning.
References
• https://www.geeksforgeeks.org/what-is-reinforcement-learning/
• https://www.kdnuggets.com/2018/06/explaining-reinforcement-learning-active-passive.html
• https://www.javatpoint.com/reinforcement-learning#Markov
• https://www.freecodecamp.org/news/an-introduction-to-q-learning-reinforcement-learning-14ac0b4493cc/
• https://www.simplilearn.com/tutorials/machine-learning-tutorial/what-is-q-learning
• https://towardsdatascience.com/introduction-to-reinforcement-learning-markov-decision-process-44c533ebf8da
