UNIT 5: REINFORCEMENT LEARNING
Reinforcement learning works on a feedback-based process in which an AI agent (a
software component) automatically explores its surroundings by hit and trial: taking
actions, learning from experience, and improving its performance. The agent gets rewarded
for each good action and punished for each bad action; hence the goal of a
reinforcement learning agent is to maximize the rewards.
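This feedback loop can be sketched in a few lines of Python; the Environment class and its reset()/step() methods below are made-up placeholders for illustration, not a particular library:

# Minimal sketch of the reinforcement learning feedback loop.
# The Environment class and its reset()/step() interface are illustrative
# placeholders, not a specific library.

class Environment:
    def reset(self):
        # return the starting state of a toy one-state task
        return 0

    def step(self, action):
        # return (next_state, reward, done): good action rewarded, bad punished
        reward = 1 if action == 1 else -1
        return 0, reward, False

env = Environment()
state = env.reset()
total_reward = 0
for t in range(10):                          # the agent explores by trial and error
    action = t % 2                           # placeholder exploration strategy
    state, reward, done = env.step(action)   # feedback from the environment
    total_reward += reward                   # the agent tries to maximize this sum
print("total reward:", total_reward)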
In reinforcement learning, there is no labelled data as in supervised learning, and the agent
learns only from its own experience. The reinforcement learning process is similar to the way a
human being learns; for example, a child learns various things through experience in day-to-day
life.
An example of reinforcement learning is playing a game, where the game is the
environment, the agent's moves at each step define the states, and the goal of the agent
is to get a high score. The agent receives feedback in terms of rewards and punishments.
Due to its way of working, reinforcement learning is employed in different fields such as
game theory, operations research, information theory, and multi-agent systems.
Reinforcement Learning Algorithms
There are three approaches to implementing reinforcement learning algorithms:
Value-Based – The main goal of this method is to maximize a value function. Here, the agent,
acting under a policy, expects a long-term return from the current states.
Policy-Based – In policy-based methods, you try to come up with a policy (strategy) such that
the actions performed in each state help you gain maximum reward in the future. Two types
of policy-based methods are deterministic and stochastic.
Model-Based – In this method, we create a virtual model of the environment, and the agent
learns to perform within that specific environment; what each of the three approaches stores is sketched just below this list.
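A minimal sketch, using made-up states, actions, and numbers, of what each approach keeps in memory:

# What each approach keeps in memory, sketched with plain dictionaries.
# The state names, action names, and numbers are invented for illustration.

states, actions = ["s0", "s1"], ["left", "right"]

# Value-based: a value estimate per state (or per state-action pair).
V = {"s0": 0.0, "s1": 1.0}

# Policy-based: a policy mapping each state to an action (deterministic)
# or to a probability distribution over actions (stochastic).
deterministic_policy = {"s0": "right", "s1": "left"}
stochastic_policy = {"s0": {"left": 0.3, "right": 0.7},
                     "s1": {"left": 0.5, "right": 0.5}}

# Model-based: a learned (virtual) model of the environment,
# e.g. transition probabilities P(next_state | state, action).
model = {("s0", "right"): {"s1": 0.9, "s0": 0.1}}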
Types of Reinforcement Learning:
There are two types:
1. Positive Reinforcement
Positive reinforcement is defined as an event that, occurring because of a specific behavior, increases the
strength and frequency of that behavior. It has a positive impact on behavior.
Advantages
– Maximizes the performance of an action
– Sustains change for a longer period
Disadvantage
– Excess reinforcement can lead to an overload of states, which can diminish the results.
2. Negative Reinforcement
Negative reinforcement is the strengthening of a behavior because a negative condition is stopped or
avoided; in other words, the agent learns to take actions that prevent the negative condition from occurring in the future.
Advantages
– Maximizes behavior
– Provides at least a minimum standard of performance
Disadvantage
– It only provides enough to meet the minimum standard of behavior
Learning Models for Reinforcement (Markov Decision Process):
Reinforcement Learning is defined by a specific type of problem, and all its solutions are classed
as Reinforcement Learning algorithms. In this problem, an agent is supposed to decide the best
action to select based on its current state. When this step is repeated, the problem is known as
a Markov Decision Process.
The Markov decision process (MDP) is a mathematical framework used for modeling decision-making
problems where the outcomes are partly random and partly under the control of the decision maker.
It is a framework that can address most reinforcement learning (RL) problems.
A Markov Decision Process (MDP) model contains:
A set of possible world states S.
A set of Models.
A set of possible actions A.
A real-valued reward function R(s,a).
A policy π, the solution of the Markov Decision Process.
What is a State?
A State is a set of tokens that represent every state that the agent can be in.
What is a Model?
A Model (sometimes called Transition Model) gives an action’s effect in a state. In particular,
T(S, a, S’) defines a transition T where being in state S and taking an action ‘a’ takes us to state
S’ (S and S’ may be the same). For stochastic actions (noisy, non-deterministic) we also define a
probability P(S’|S,a) which represents the probability of reaching a state S’ if action ‘a’ is taken
in state S. The Markov property states that the effects of an action taken in a state depend only
on that state and not on the prior history.
What are Actions?
An action set A is the set of all possible actions. A(s) defines the set of actions that can be taken
in state S.
What is a Reward?
A Reward is a real-valued reward function. R(s) indicates the reward for simply being in the
state S. R(S,a) indicates the reward for being in a state S and taking an action ‘a’. R(S,a,S’)
indicates the reward for being in a state S, taking an action ‘a’ and ending up in a state S’.
What is a Policy?
A Policy is a solution to the Markov Decision Process. A policy is a mapping from S to A. It
indicates the action ‘a’ to be taken while in state S.
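Putting the five components together, a tiny MDP can be written down directly as data. The states, probabilities, and rewards below are invented purely for illustration:

import random

# States S and actions A (all names and numbers are illustrative).
S = ["s0", "s1", "s2"]
A = ["stay", "go"]

# Transition model T(S, a, S') given as probabilities P(S' | S, a);
# by the Markov property they depend only on the current state and action.
P = {
    ("s0", "go"):   {"s1": 0.8, "s0": 0.2},
    ("s0", "stay"): {"s0": 1.0},
    ("s1", "go"):   {"s2": 1.0},
    ("s1", "stay"): {"s1": 1.0},
}

# Reward function R(s, a).
R = {("s0", "go"): -0.1, ("s0", "stay"): 0.0,
     ("s1", "go"): 1.0,  ("s1", "stay"): 0.0}

# A policy: a mapping from states to actions.
policy = {"s0": "go", "s1": "go"}

def sample_next_state(s, a):
    # draw S' according to P(S' | S, a)
    next_states = list(P[(s, a)].keys())
    weights = list(P[(s, a)].values())
    return random.choices(next_states, weights=weights)[0]

print(sample_next_state("s0", policy["s0"]))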
EXAMPLE: Let us take the example of a grid world:
An agent lives in the grid. The example above is a 3×4 grid. The grid has a START state (grid no
1,1). The purpose of the agent is to wander around the grid to finally reach the Blue Diamond
(grid no 4,3). Under all circumstances, the agent should avoid the Fire grid (orange color, grid
no 4,2). Also, grid no 2,2 is a blocked grid; it acts as a wall, hence the agent cannot enter it.
The agent can take any one of these actions: UP, DOWN, LEFT, RIGHT
Walls block the agent's path, i.e., if there is a wall in the direction the agent would have moved,
the agent stays in the same place. So, for example, if the agent moves LEFT in the START grid, it
stays put in the START grid.
First Aim: To find the shortest sequence getting from START to the Diamond. Two such
sequences can be found:
RIGHT RIGHT UP UP RIGHT
UP UP RIGHT RIGHT RIGHT
The agent receives a reward at each time step:
A small reward at each step (it can be negative, in which case it can also be termed a punishment;
in the above example, entering the Fire grid can have a reward of -1).
Big rewards come at the end (good or bad).
The goal is to maximize the sum of rewards.
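As a sketch, the grid world above can be solved by value iteration over the MDP it defines. The step reward of -0.04, terminal rewards of +1 and -1, discount factor of 0.9, and the assumption of deterministic moves are illustrative textbook-style choices, not values given in this unit:

# Value iteration on the 3x4 grid world, treating moves as deterministic for simplicity.
# Coordinates are (column, row); (2,2) is the wall, (4,3) the diamond, (4,2) the fire.
GAMMA = 0.9            # discount factor (assumed)
STEP_REWARD = -0.04    # small negative reward per step (assumed)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
WALL = (2, 2)
STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) != WALL]
MOVES = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}

def next_state(s, move):
    # bumping into the wall or the border leaves the agent in place
    c, r = s[0] + MOVES[move][0], s[1] + MOVES[move][1]
    if (c, r) == WALL or not (1 <= c <= 4 and 1 <= r <= 3):
        return s
    return (c, r)

V = {s: 0.0 for s in STATES}
for _ in range(100):                       # repeat the Bellman update until it converges
    for s in STATES:
        if s in TERMINALS:
            V[s] = TERMINALS[s]
            continue
        V[s] = max(STEP_REWARD + GAMMA * V[next_state(s, m)] for m in MOVES)

policy = {s: max(MOVES, key=lambda m: V[next_state(s, m)])
          for s in STATES if s not in TERMINALS}
print(policy[(1, 1)])   # best first move from START (UP and RIGHT are tied here)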
Q-Learning:
The discount factor, 𝛾, is a real value ∈ [0, 1] that determines how much importance the agent gives
to future rewards compared with the immediate reward: a value close to 0 makes the agent care mostly
about immediate rewards, while a value close to 1 makes it weigh long-term (future) rewards almost
as heavily as present ones.
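For example, with 𝛾 = 0.9 the discounted return of a short, made-up reward sequence can be computed as:

# Discounted return G = r0 + gamma*r1 + gamma^2*r2 + ...
gamma = 0.9
rewards = [1, 0, 0, 5]                 # illustrative reward sequence
G = sum(gamma**t * r for t, r in enumerate(rewards))
print(G)                               # 1 + 0 + 0 + 0.9**3 * 5 = 4.645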
Main terminologies in Q-Learning:
1. Agent: It is an assumed entity which performs actions in an environment to gain some
reward.
2. Environment (e): A scenario that an agent has to face.
3. Rewards: For every action, the agent will get a positive or negative reward.
4. Episodes: When an agent ends up in a terminating state and can’t take a new action.
5. Q-Values: Used to determine how good an action A taken at a particular state S is, denoted
Q(S, A).
6. Value Function: It specifies the value of a state, that is, the total amount of reward an
agent should expect to accumulate starting from that state.
7. State (s): State refers to the current situation returned by the environment.
8. Policy (π): It is a strategy applied by the agent to decide the next action based on
the current state.
9. Temporal Difference: A formula used to update the Q-value using the value of the current
state and action and the previous state and action. Temporal Difference Learning in machine
learning is a method for learning how to predict a quantity that depends on future values of a
given signal. It can be used to learn both the V-function and the Q-function,
whereas Q-learning is a specific TD algorithm used to learn the Q-function; a single TD(0) update is sketched below.
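As an illustration, a single TD(0) update for the V-function nudges the current estimate toward the observed reward plus the discounted value of the next state; the learning rate and the values below are assumed:

# One TD(0) update for the state-value function V.
alpha, gamma = 0.1, 0.9        # learning rate and discount factor (assumed)
V = {"s": 0.5, "s_next": 1.0}  # current value estimates (illustrative)
reward = 0.0                   # reward observed on the transition s -> s_next
td_target = reward + gamma * V["s_next"]
td_error = td_target - V["s"]
V["s"] += alpha * td_error     # move V(s) a little toward the TD target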
Q-LEARNING ALGORITHM:
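In outline, Q-learning initializes a Q-table, repeatedly selects actions (e.g. ε-greedily), observes the reward and next state, and applies the update Q(s,a) ← Q(s,a) + α[r + 𝛾·max Q(s',a') − Q(s,a)]. A minimal tabular sketch follows; the environment interface (reset()/step()/actions) and the hyperparameter values are assumptions for illustration:

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    # Tabular Q-learning; `env` is assumed to provide reset() -> state,
    # step(action) -> (next_state, reward, done), and a list env.actions.
    Q = defaultdict(float)                       # Q[(state, action)], initialized to 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection: explore sometimes, exploit otherwise
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # temporal-difference update toward r + gamma * max_a' Q(s', a')
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q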
Application of Reinforcement Learning:
1. RL in Marketing:
Marketing is all about promoting and then selling the products or services of either
your brand or someone else's. In the process of marketing, finding the right audience,
which yields a larger return on the investment you or your company is making, is a challenge
in itself.
2. RL in Healthcare
Healthcare is an important part of our lives and through DTRs (a sequence-based use-case
of RL), doctors can discover the treatment type, appropriate doses of drugs, and timings for
taking such doses.
DTRs are equipped with:
a sequence of rules which determine the current health status of a patient.
Then, they optimally propose treatments for diseases like diabetes,
HIV, cancer, and mental illness.
If required, these DTRs (i.e. Dynamic Treatment Regimes) can reduce or remove the
delayed impact of treatments through their multi-objective healthcare optimization
solutions.
3. RL in Robotics
Robotics, without any doubt, is about training a robot in such a way that it can
perform tasks just like a human being can. But there is still a bigger challenge the robotics
industry faces today: robots are not able to use common sense while making various
moral and social decisions. Here, a combination of Deep Learning and Reinforcement Learning,
i.e. Deep Reinforcement Learning, comes to the rescue and equips robots with a “Learn
How To Learn” model. With this, robots can now:
improve their decisions, for example by grasping well the various objects visible to them.
solve complicated tasks that even humans may fail at, since robots now know what to learn and
how to learn from the different levels of abstraction of the datasets available to
them.
4. RL in Gaming
Gaming is something that a huge number of people can't live without nowadays. By
optimizing games through Reinforcement Learning algorithms, we may
expect better performance of our favorite games related to adventure, action, or
mystery.
5. RL in Image Processing
Image processing is another important method of enhancing the current version of an
image to extract some useful information from it. There are several steps involved, such as:
Capturing the image with machines like scanners.
Analyzing and manipulating it.
Using the output image obtained after analysis for representation and description
purposes.
6. RL in Manufacturing
Manufacturing is all about producing goods that can satisfy our basic needs and
essential wants. Cobot manufacturers (i.e. manufacturers of collaborative robots that
can perform various manufacturing tasks alongside a workforce of more than 100 people) are
helping a lot of businesses with their own RL solutions for packaging and quality testing.
Introduction to Deep Q Learning:
Q-Learning creates an exact matrix (the Q-table) for the working agent,
which it can “refer to” to maximize its reward in the long run.
Although this approach is not wrong in itself, this is only practical for very small
environments and quickly loses its feasibility when the number of states and actions in
the environment increases.
Imagine an environment with 10,000 states and 1,000 actions per state. This would
create a table of 10 million cells. Things will quickly get out of control!
This presents two problems:
First, the amount of memory required to save and update that table would
increase as the number of states increases
Second, the amount of time required to explore each state to create the
required Q-table would be unrealistic
So, the idea is to approximate these Q-values with machine learning models such as a
neural network.
The basic working step in Deep Q-Learning is that the initial state is fed into the neural
network, which returns the Q-values of all possible actions as output. The difference
between Q-Learning and Deep Q-Learning can be illustrated as follows:
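A minimal sketch of the function approximator used in Deep Q-Learning: a small neural network takes a state vector and outputs one Q-value per action, replacing the Q-table lookup. The layer sizes and the use of PyTorch here are assumptions for illustration:

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Maps a state vector to one Q-value per possible action.
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),          # one output per action
        )

    def forward(self, state):
        return self.net(state)

# Instead of looking up a row of the Q-table, we query the network:
q_net = QNetwork(state_dim=4, n_actions=2)     # illustrative dimensions
state = torch.zeros(1, 4)                      # illustrative state vector
q_values = q_net(state)                        # Q-value for every action
action = q_values.argmax(dim=1).item()         # greedy action, as with a Q-table row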