MLT Unit-5 Notes
Reinforcement Learning
Reinforcement learning works on a feedback-based process in which an AI agent
(a software component) automatically explores its surroundings by trial and error:
it takes actions, learns from experience, and improves its performance. The agent
is rewarded for each good action and punished for each bad action; hence the
goal of a reinforcement learning agent is to maximize the total reward.
Reinforcement learning has four main elements:
1. Policy
2. Reward Signal
3. Value Function
4. Model of the environment
3) Value Function: The value function tells how good a situation or action is
and how much reward an agent can expect from it. Whereas the reward signal
indicates the immediate return of each good or bad action, the value function
specifies which states and actions are good in the long run. The value function
depends on the reward, as without reward there could be no value, and the whole
purpose of estimating values is to obtain more reward.
4) Model of the environment: The model is used for planning, which means it
provides a way to decide on a course of action by considering all possible future
situations before actually experiencing them. Approaches that solve RL problems
with the help of a model are termed model-based approaches; by contrast, an
approach that does not use a model is called a model-free approach.
There are two main types of reinforcement:
o Positive Reinforcement
o Negative Reinforcement
Markov Decision Process (MDP):
A Markov Decision Process (MDP) is used to describe the environment in RL, and
almost all RL problems can be formalized as an MDP.
MDP relies on the Markov property, so to understand MDP we first need to
understand this property.
Markov Property:
It says that "If the agent is present in the current state S1, performs
an action a1 and move to the state s2, then the state transition
from s1 to s2 only depends on the current state and future action
and states do not depend on past actions, rewards, or states."
Or, in other words, as per Markov Property, the current state transition does
not depend on any past action or state. Hence, MDP is an RL problem that
satisfies the Markov property. Such as in a Chess game, the players only
focus on the current state and do not need to remember past
actions or states.
Finite MDP:
A finite MDP is one in which the sets of states, actions, and rewards are all finite.
In RL, we consider only finite MDPs.
Markov Process:
A Markov process is a memoryless random process, i.e., a sequence of random
states S1, S2, ....., St that satisfies the Markov property. A Markov process is
also known as a Markov chain, which is a tuple (S, P) consisting of a set of
states S and a state-transition probability function P. These two components
(S and P) fully define the dynamics of the system.
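As a small illustration of the tuple (S, P), the sketch below simulates a toy Markov chain; the three weather-like states and the transition probabilities are made up purely for illustration.

```python
import numpy as np

# A Markov chain is the tuple (S, P): a set of states and a transition matrix.
# The states and probabilities below are invented purely for illustration.
S = ["Sunny", "Cloudy", "Rainy"]
P = np.array([
    [0.7, 0.2, 0.1],   # transition probabilities from "Sunny"
    [0.3, 0.4, 0.3],   # from "Cloudy"
    [0.2, 0.3, 0.5],   # from "Rainy"
])

rng = np.random.default_rng(0)
state = 0                      # start in "Sunny"
trajectory = [S[state]]
for _ in range(5):
    # Markov property: the next state depends only on the current state.
    state = rng.choice(len(S), p=P[state])
    trajectory.append(S[state])
print(" -> ".join(trajectory))
```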
o Q-Learning:
o Q-learning is an off-policy RL algorithm used for temporal difference (TD)
learning. Temporal difference learning methods compare temporally
successive predictions.
o It learns the action-value function Q(s, a), which measures how good it is
to take action "a" in a particular state "s".
o [Flowchart: working of Q-learning]
o State Action Reward State Action (SARSA):
o SARSA stands for State Action Reward State Action and is an on-policy
temporal difference learning method. An on-policy control method selects
the action for each state while learning, using a specific policy.
o The goal of SARSA is to estimate Q π(s, a) for the currently followed
policy π and all state-action pairs (s, a).
o The main difference between the Q-learning and SARSA algorithms is that,
unlike Q-learning, SARSA does not use the maximum Q-value of the next state
to update the Q-table; it uses the Q-value of the action that is actually taken next.
o In SARSA, the next action and reward are selected using the same policy
that determined the original action, as sketched in the example below.
o SARSA is so named because its update uses the quintuple (s, a, r, s', a').
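Below is a minimal sketch of the SARSA update on a tabular problem. The chain-walk environment, its step(state, action) function, and the hyperparameter values are assumptions made for illustration, not part of the notes above.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # arbitrary hyperparameters
rng = np.random.default_rng(0)

def epsilon_greedy(state):
    # Behaviour policy: mostly greedy, sometimes random (SARSA is on-policy,
    # so this same policy is also the one being evaluated).
    if rng.random() < epsilon:
        return rng.integers(n_actions)
    return int(np.argmax(Q[state]))

def step(state, action):
    # Hypothetical toy environment: move right on action 1, left on action 0;
    # reward 1 only when the right end of the chain is reached.
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for episode in range(200):
    s = 0
    a = epsilon_greedy(s)
    for _ in range(20):
        s2, r = step(s, a)
        a2 = epsilon_greedy(s2)     # next action chosen by the SAME policy
        # SARSA update uses the quintuple (s, a, r, s', a'):
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])
        s, a = s2, a2
        if s == n_states - 1:
            break
print(Q)
```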
Q-Learning Explanation:
To perform any action a in state s, the agent receives a reward R(s, a) and ends
up in a new state s', so the Q-value update equation is:
Q(s, a) ← Q(s, a) + α [ R(s, a) + γ max_a' Q(s', a') − Q(s, a) ]
where α is the learning rate and γ is the discount factor.
The Q in Q-learning stands for quality, which means it specifies the quality
of an action taken by the agent.
The RL agent uses this Q-table as a reference table to select the best action
based on the Q-values.
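A minimal sketch of this table-based update is shown below; it reuses the same assumed chain-walk environment as the SARSA sketch above, and all hyperparameter values are arbitrary.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))       # the Q-table, initialised to zero
alpha, gamma, epsilon = 0.1, 0.9, 0.1     # arbitrary hyperparameters
rng = np.random.default_rng(0)

def step(state, action):
    # Hypothetical toy environment (same chain-walk idea as the SARSA sketch above).
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for episode in range(200):
    s = 0
    for _ in range(20):
        # epsilon-greedy action selection from the Q-table
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # Off-policy update: bootstrap from the BEST next action, max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2
        if s == n_states - 1:
            break
print(Q)   # the agent can now act greedily with respect to this table
```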
Deep Q-Learning:
Deep Q-learning takes the Q-learning idea one step further: instead of a Q-table,
a neural network takes the state as input and approximates the Q-values for each
action based on that state.
We do this because a classic Q-table is not very scalable. It might work for
simple games, but in a more complex game with dozens of possible actions and
game states the Q-table quickly becomes too large to build and maintain.
So we use a deep neural network that gets the state as input and produces a
Q-value for each action. Then, again, we choose the action with the highest
Q-value. The learning process is still the same iterative update approach, but
instead of updating the Q-table, we update the weights of the neural network so
that its outputs improve.
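The sketch below shows this idea using PyTorch; the state size, network shape, and hyperparameters are illustrative assumptions, and a practical Deep Q-Network would also add experience replay and a target network, which are omitted here for brevity.

```python
import torch
import torch.nn as nn

state_dim, n_actions = 4, 2      # illustrative sizes, not taken from the notes

# The network replaces the Q-table: state in, one Q-value per action out.
q_net = nn.Sequential(
    nn.Linear(state_dim, 64),
    nn.ReLU(),
    nn.Linear(64, n_actions),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def select_action(state):
    # Greedy action: index of the largest predicted Q-value.
    with torch.no_grad():
        return int(q_net(state).argmax())

def update(state, action, reward, next_state, done):
    # Same iterative idea as tabular Q-learning, but instead of overwriting a
    # table cell we nudge the network weights toward the target value.
    q_pred = q_net(state)[action]
    with torch.no_grad():
        q_target = reward + gamma * q_net(next_state).max() * (1.0 - done)
    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example call with dummy tensors (a real agent would get these from the environment):
s = torch.randn(state_dim)
update(s, select_action(s), reward=0.0, next_state=torch.randn(state_dim), done=0.0)
```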
Genetic Algorithm:
A genetic algorithm is an adaptive heuristic search algorithm inspired by
"Darwin's theory of evolution in nature." It is used to solve optimization
problems in machine learning and is an important algorithm because it helps
solve complex problems that would otherwise take a long time to solve.
A genetic algorithm works through the following five phases:
o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination
1. Initialization
The genetic algorithm starts by generating a set of individuals, called the
population. Each individual is a candidate solution to the given problem. An
individual is characterized by a set of parameters called genes; the genes are
joined into a string to form a chromosome, which represents a solution to the
problem. One of the most popular initialization techniques is the use of random
binary strings, as sketched below.
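A minimal sketch of random binary-string initialization; the population size and chromosome length are arbitrary choices for illustration.

```python
import random

POP_SIZE, CHROM_LENGTH = 10, 8     # arbitrary sizes for illustration

def init_population():
    # Each individual (chromosome) is a random binary string of genes.
    return [[random.randint(0, 1) for _ in range(CHROM_LENGTH)]
            for _ in range(POP_SIZE)]

population = init_population()
print(population[0])   # e.g. [1, 0, 0, 1, 1, 0, 1, 0]
```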
2. Fitness Assignment
The fitness function determines how fit an individual is, i.e., its ability to
compete with other individuals. In every iteration, individuals are evaluated
using the fitness function, which assigns a fitness score to each of them. This
score determines the probability of being selected for reproduction: the higher
the fitness score, the greater the chance of being selected (a sketch combining
fitness scoring and selection follows the next phase).
3. Selection
The selection phase chooses the individuals that will reproduce offspring. The
selected individuals are arranged in pairs, and each pair then passes its genes
on to the next generation.
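The sketch below combines fitness assignment and fitness-proportionate (roulette-wheel) selection; the "count of 1-bits" fitness function is only a stand-in for a real problem's objective.

```python
import random

def fitness(chromosome):
    # Stand-in fitness score: here simply the number of 1-bits ("one-max").
    return sum(chromosome)

def select_pair(population):
    # Roulette-wheel selection: the probability of being picked is proportional
    # to the fitness score, so fitter individuals reproduce more often.
    weights = [fitness(ind) + 1e-9 for ind in population]   # avoid all-zero weights
    return random.choices(population, weights=weights, k=2)

population = [[random.randint(0, 1) for _ in range(8)] for _ in range(10)]
parent1, parent2 = select_pair(population)
```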
4. Reproduction
After the selection process, the creation of children takes place in the
reproduction step. In this step, the genetic algorithm applies two variation
operators to the parent population. The two operators involved in the
reproduction phase are given below:
o Crossover
A crossover point is chosen at random within the genes, and the parents exchange
their genes up to that point. The newly generated offspring are added to the
population. This process is also called recombination or crossover (a sketch of
both operators appears after the mutation types below). Types of crossover
available:
o One point crossover
o Two-point crossover
o Uniform crossover
o Inheritable Algorithms crossover
o Mutation
The mutation operator inserts random genes into the offspring (new child) to
maintain diversity in the population. It can be done by flipping some bits of
the chromosome.
Mutation helps solve the problem of premature convergence and enhances
diversification.
Types of mutation available:
o Flip bit mutation
o Gaussian mutation
o Exchange/Swap mutation
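A minimal sketch of one-point crossover followed by flip-bit mutation on binary chromosomes; the mutation rate and example parents are arbitrary.

```python
import random

MUTATION_RATE = 0.05    # arbitrary probability of flipping each gene

def one_point_crossover(parent1, parent2):
    # Pick a random crossover point and exchange the tails of the parents.
    point = random.randint(1, len(parent1) - 1)
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

def flip_bit_mutation(chromosome):
    # Flip each bit with a small probability to keep diversity in the population.
    return [1 - gene if random.random() < MUTATION_RATE else gene
            for gene in chromosome]

p1, p2 = [1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]
c1, c2 = one_point_crossover(p1, p2)
c1 = flip_bit_mutation(c1)
```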
5. Termination
After the reproduction phase, a stopping criterion is applied as the basis for
termination. The algorithm terminates once a solution reaches the threshold
fitness, and the best individual in the final population is identified as the
final solution.
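The compact driver below ties the five phases together on a toy "one-max" problem (maximize the number of 1-bits) and stops once a fitness threshold is reached or a generation limit expires; all sizes and rates are illustrative assumptions.

```python
import random

POP_SIZE, CHROM_LENGTH, MAX_GENERATIONS = 20, 12, 100
MUTATION_RATE, FITNESS_THRESHOLD = 0.05, 12   # stop once a perfect string appears

def fitness(ind):
    return sum(ind)                # toy "one-max" objective

def evolve():
    # 1. Initialization: random binary strings.
    population = [[random.randint(0, 1) for _ in range(CHROM_LENGTH)]
                  for _ in range(POP_SIZE)]
    for generation in range(MAX_GENERATIONS):
        # 2. Fitness assignment + 5. Termination check.
        best = max(population, key=fitness)
        if fitness(best) >= FITNESS_THRESHOLD:
            return best, generation
        next_generation = []
        while len(next_generation) < POP_SIZE:
            # 3. Selection: fitness-proportionate (roulette-wheel) choice of parents.
            p1, p2 = random.choices(population,
                                    weights=[fitness(i) + 1e-9 for i in population],
                                    k=2)
            # 4. Reproduction: one-point crossover followed by flip-bit mutation.
            point = random.randint(1, CHROM_LENGTH - 1)
            child = p1[:point] + p2[point:]
            child = [1 - g if random.random() < MUTATION_RATE else g for g in child]
            next_generation.append(child)
        population = next_generation
    return max(population, key=fitness), MAX_GENERATIONS

best, gens = evolve()
print(f"best individual {best} found after {gens} generations")
```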
Application areas of genetic algorithms include:
2. Mutation testing
3. Code breaking