
UNIT-5

 Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (a software component) automatically explores its surroundings by hit and trial, taking actions, learning from experience, and improving its performance. The agent is rewarded for each good action and punished for each bad action; hence the goal of a reinforcement learning agent is to maximize the rewards.

In reinforcement learning, there is no labelled data as in supervised learning; agents learn from their experience only.

The reinforcement learning process is similar to the way a human being learns; for example, a child learns various things through experience in day-to-day life. An example of reinforcement learning is playing a game, where the game is the environment, the moves of the agent at each step define the states, and the goal of the agent is to get a high score. The agent receives feedback in terms of punishments and rewards.

Applications of Reinforcement Learning:

1. Robotics: RL is used in robot navigation, Robo-soccer, walking, juggling, etc.

2. Control: RL can be used for adaptive control, such as factory processes, admission control in telecommunication, and helicopter piloting.

3. Game Playing: RL can be used in game playing, such as tic-tac-toe, chess, etc.

4. Chemistry: RL can be used for optimizing chemical reactions.

5. Business: RL is now used for business strategy planning.

6. Manufacturing: In various automobile manufacturing companies, robots use deep reinforcement learning to pick goods and put them in containers.

7. Finance Sector: RL is currently used in the finance sector for evaluating trading strategies.

 Elements of Reinforcement Learning


There are four main elements of Reinforcement Learning, which are given
below:

1. Policy
2. Reward Signal
3. Value Function
4. Model of the environment

1) Policy: A policy defines the way an agent behaves at a given time. It maps the perceived states of the environment to the actions to be taken in those states. The policy is the core element of RL, as it alone can define the behavior of the agent. In some cases it may be a simple function or a lookup table, whereas in other cases it may involve general computation such as a search process. A policy can be deterministic or stochastic:

For a deterministic policy: a = π(s)

For a stochastic policy: π(a | s) = P[A_t = a | S_t = s]
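To make the two cases concrete, here is a minimal Python sketch (the states, actions, and probabilities are hypothetical, not from the notes): a deterministic policy stored as a lookup table, and a stochastic policy stored as per-state action probabilities.

```python
import random

# Hypothetical two-state, two-action environment used only for illustration.
states = ["s1", "s2"]
actions = ["left", "right"]

# Deterministic policy: a = pi(s), a direct lookup table.
deterministic_policy = {"s1": "right", "s2": "left"}

# Stochastic policy: pi(a | s) = P[A_t = a | S_t = s], probabilities per state.
stochastic_policy = {
    "s1": {"left": 0.2, "right": 0.8},
    "s2": {"left": 0.7, "right": 0.3},
}

def act_deterministic(state):
    return deterministic_policy[state]

def act_stochastic(state):
    probs = stochastic_policy[state]
    return random.choices(list(probs.keys()), weights=list(probs.values()))[0]

print(act_deterministic("s1"))   # always "right"
print(act_stochastic("s1"))      # "right" about 80% of the time
```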

2) Reward Signal: The goal of reinforcement learning is defined by the reward signal. At each step, the environment sends an immediate signal to the learning agent, and this signal is known as the reward. Rewards are given according to the good and bad actions taken by the agent. The agent's main objective is to maximize the total reward it receives for good actions. The reward signal can change the policy; for example, if an action selected by the agent leads to a low reward, the policy may change to select other actions in the future.

3) Value Function: The value function gives information about how good a situation or action is and how much reward an agent can expect. A reward indicates the immediate desirability of each good or bad action, whereas a value function specifies how good a state or action is in the long run. The value function depends on the reward: without reward, there could be no value. The goal of estimating values is to obtain more reward.

4) Model: The last element of reinforcement learning is the model, which mimics the behavior of the environment. With the help of the model, one can make inferences about how the environment will behave. For example, if a state and an action are given, the model can predict the next state and reward.

The model is used for planning, which means it provides a way to decide on a course of action by considering future situations before actually experiencing them. Approaches that solve RL problems with the help of a model are termed model-based approaches, whereas an approach that does not use a model is called a model-free approach.
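As a rough illustration (a hypothetical sketch, not part of the notes), a model of a small deterministic environment can be stored as a mapping from a state-action pair to the predicted next state and reward, and a planner can query it without touching the real environment:

```python
# Hypothetical model of a tiny deterministic environment:
# (state, action) -> (predicted next_state, predicted reward)
model = {
    ("s1", "right"): ("s2", 0.0),
    ("s2", "right"): ("goal", 1.0),
    ("s1", "left"):  ("s1", -0.1),
    ("s2", "left"):  ("s1", -0.1),
}

def plan_one_step(state, actions=("left", "right")):
    """Model-based planning: pick the action whose predicted reward is highest."""
    best_action, best_reward = None, float("-inf")
    for a in actions:
        next_state, reward = model[(state, a)]
        if reward > best_reward:
            best_action, best_reward = a, reward
    return best_action

print(plan_one_step("s2"))  # "right", since the model predicts reward 1.0
```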

 Types of Reinforcement learning:


There are mainly two types of reinforcement learning, which are:

o Positive Reinforcement
o Negative Reinforcement

Positive Reinforcement: Positive reinforcement means adding something to increase the tendency that the expected behavior will occur again. It impacts the behavior of the agent positively and increases the strength of the behavior. This type of reinforcement can sustain changes for a long time, but too much positive reinforcement may lead to an overload of states that can diminish the results.

Negative Reinforcement: Negative reinforcement is the opposite of positive reinforcement: it increases the tendency that the specific behavior will occur again by avoiding a negative condition. Depending on the situation and behavior, it can be more effective than positive reinforcement, but it provides reinforcement only up to the minimum required behavior.

How to represent the agent state?

We can represent the agent state using a Markov state that contains all the required information from the history. A state S_t is a Markov state if it satisfies the condition:

P[S_t+1 | S_t] = P[S_t+1 | S_1, ..., S_t]

The Markov state follows the Markov property, which says that the future is independent of the past and can be determined from the present alone. RL here works on fully observable environments, where the agent can observe the environment and act in the new state. The complete process is known as the Markov Decision Process, which is explained below:

Markov Decision Process

A Markov Decision Process, or MDP, is used to formalize reinforcement learning problems. If the environment is completely observable, then its dynamics can be modeled as a Markov process. In an MDP, the agent constantly interacts with the environment and performs actions; at each action, the environment responds and generates a new state.

MDP is used to describe the environment for RL, and almost all RL problems can be formalized using an MDP.

An MDP is a tuple of four elements (S, A, P_a, R_a):

o A finite set of states S
o A finite set of actions A
o A transition probability P_a(s, s'), the probability of moving from state s to state s' due to action a
o A reward R_a(s, s'), received after transitioning from state s to state s' due to action a

A minimal sketch of this tuple in code is given below.
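To make the tuple concrete, here is a minimal sketch (the states, probabilities, and rewards are hypothetical, not from the notes) of a finite MDP written out explicitly in Python:

```python
from collections import namedtuple

# A finite MDP written out as the tuple (S, A, P_a, R_a).
MDP = namedtuple("MDP", ["states", "actions", "transition_probs", "rewards"])

mdp = MDP(
    states=["s1", "s2"],
    actions=["left", "right"],
    # transition_probs[(s, a)] -> {s': probability of landing in s'}
    transition_probs={
        ("s1", "right"): {"s2": 1.0},
        ("s1", "left"):  {"s1": 1.0},
        ("s2", "right"): {"s2": 1.0},
        ("s2", "left"):  {"s1": 1.0},
    },
    # rewards[(s, a, s')] -> reward received for that transition
    rewards={
        ("s1", "right", "s2"): 1.0,
        ("s1", "left", "s1"):  0.0,
        ("s2", "right", "s2"): 0.0,
        ("s2", "left", "s1"):  0.0,
    },
)
```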

The MDP uses the Markov property, so to understand the MDP better, we first need to understand that property.
Markov Property:
It says: "If the agent is present in the current state s1, performs an action a1, and moves to the state s2, then the state transition from s1 to s2 depends only on the current state; the future actions and states do not depend on past actions, rewards, or states."

In other words, as per the Markov property, the current state transition does not depend on any past action or state. Hence, an MDP is an RL problem that satisfies the Markov property. For example, in a game of chess, the players only need to focus on the current board state and do not need to remember past actions or states.

Finite MDP:

A finite MDP is one in which the sets of states, rewards, and actions are all finite. In RL, we consider only finite MDPs.

Markov Process:
A Markov process is a memoryless process with a sequence of random states S1, S2, ..., St that satisfies the Markov property. The Markov process is also known as a Markov chain, which is a tuple (S, P) of a state set S and a transition function P. These two components (S and P) define the dynamics of the system.
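As a small illustration (the states and transition probabilities below are hypothetical, not from the notes), a Markov chain (S, P) can be simulated by repeatedly sampling the next state from the transition function for the current state only:

```python
import random

# Hypothetical Markov chain: P[s] -> {s': probability of moving to s'}
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def simulate(start, steps):
    """Generate a sequence of states S1, S2, ... using only the current state."""
    state, trajectory = start, [start]
    for _ in range(steps):
        state = random.choices(list(P[state]), weights=list(P[state].values()))[0]
        trajectory.append(state)
    return trajectory

print(simulate("sunny", 5))  # e.g. ['sunny', 'sunny', 'rainy', 'rainy', 'sunny', 'sunny']
```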

Reinforcement Learning Algorithms

Reinforcement learning algorithms are mainly used in AI applications and gaming applications. The most commonly used algorithms are:

o Q-Learning:
o Q-learning is an off-policy RL algorithm which is used for temporal difference learning. Temporal difference learning methods are ways of comparing temporally successive predictions.
o It learns the action-value function Q(s, a), which measures how good it is to take action "a" in a particular state "s".
o The working of Q-learning can be summarized as a loop: observe the current state, select an action, receive the reward, observe the next state, and update the corresponding Q-value.
o State Action Reward State Action (SARSA):
o SARSA stands for State Action Reward State Action, which is an on-policy temporal difference learning method. An on-policy control method selects the action for each state while learning, using a specific policy.
o The goal of SARSA is to calculate Q_π(s, a) for the currently selected policy π and all state-action pairs (s, a).
o The main difference between the Q-learning and SARSA algorithms is that, unlike Q-learning, SARSA does not use the maximum Q-value of the next state when updating the Q-value in the table.
o In SARSA, the new action and reward are selected using the same policy that determined the original action.
o SARSA is so named because it uses the quintuple (s, a, r, s', a'), where s is the original state, a is the original action, r is the reward observed after taking that action, and s', a' are the new state-action pair.
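A minimal sketch of the tabular SARSA update (the learning rate and discount factor are assumed values, and Q is assumed to be a dict keyed by (state, action); this is an illustration, not taken verbatim from the notes):

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.9          # assumed learning rate and discount factor
Q = defaultdict(float)           # Q-table: (state, action) -> value, default 0

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy update: Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)].
    a_next must be the action actually chosen next by the same policy."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
```

Note that the target uses Q(s', a') for the action the policy actually takes next, which is exactly what makes SARSA on-policy.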

o Deep Q Neural Network (DQN):
o As the name suggests, DQN is Q-learning using neural networks.
o For an environment with a big state space, it is a challenging and complex task to define and update a Q-table.
o To solve such an issue, we can use the DQN algorithm, where, instead of defining a Q-table, a neural network approximates the Q-values for each action and state.

Q-Learning Explanation:

o Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation.
o The main objective of Q-learning is to learn a policy that tells the agent what actions should be taken, and under what circumstances, to maximize the reward.
o It is an off-policy RL algorithm that attempts to find the best action to take in the current state.
o The goal of the agent in Q-learning is to maximize the value of Q.
o The value of Q can be derived from the Bellman equation. Consider the Bellman equation for the value of a state:

V(s) = max_a [ R(s, a) + γ Σ_s' P(s' | s, a) V(s') ]

In this equation we have various components: the reward R(s, a), the discount factor (γ), the transition probability P(s' | s, a), and the next states s'. No Q-value appears yet, so first consider a simple situation: an agent can reach three next states with values V(s1), V(s2), and V(s3). As this is an MDP, the agent only cares about the current state and the future state. The agent can go in any direction (Up, Left, or Right), so it needs to decide where to go to follow the optimal path. With state values alone, the agent moves on a probability basis and changes its state. But if we want it to make exact moves, we need to reason in terms of Q-values.

Q represents the quality of the actions at each state. So instead of using a value for each state alone, we use a state-action pair, i.e., Q(s, a). The Q-value specifies which action is more lucrative than the others, and according to the best Q-value, the agent takes its next move. The Bellman equation can be used to derive the Q-value.

On performing an action, the agent receives a reward R(s, a) and ends up in some next state, so the Q-value equation becomes:

Q(s, a) = R(s, a) + γ Σ_s' P(s' | s, a) max_a' Q(s', a')

Hence, we can say that V(s) = max_a Q(s, a).

The above formula is used to estimate the Q-values in Q-learning.
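A minimal tabular Q-learning sketch (the environment interface, learning rate, discount factor, and exploration rate are assumed for illustration; this is not a full training loop):

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # assumed hyperparameters
Q = defaultdict(float)                  # Q-table: (state, action) -> value, initialized to zero

def choose_action(state, actions):
    """Epsilon-greedy action selection from the Q-table."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s_next, actions):
    """Off-policy update: Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    best_next = max(Q[(s_next, a_)] for a_ in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```

Unlike SARSA, the target here takes the maximum over all next actions, regardless of which action the behavior policy actually selects.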


What is 'Q' in Q-learning?

The Q stands for quality in Q-learning, which means it specifies the quality
of an action taken by the agent.

Q-table: A Q-table (or matrix) is created while performing Q-learning. The table is indexed by state-action pairs [s, a], with all values initialized to zero. After each action, the table is updated and the Q-values are stored within it.

The RL agent uses this Q-table as a reference table to select the best action based on the Q-values.

 Difference between Reinforcement Learning and Supervised Learning

Reinforcement learning and supervised learning are both part of machine learning, but the two types of learning are very different from each other. RL agents interact with the environment, explore it, take actions, and get rewarded, whereas supervised learning algorithms learn from a labeled dataset and, on the basis of that training, predict the output.

The differences between RL and supervised learning are given below:

o RL works by interacting with the environment, whereas supervised learning works on an existing dataset.
o The RL algorithm works the way the human brain works when making decisions, whereas supervised learning works the way a human learns things under the supervision of a guide.
o In RL no labeled dataset is present, whereas in supervised learning a labeled dataset is present.
o No previous training is provided to the RL learning agent, whereas in supervised learning training is provided to the algorithm so that it can predict the output.
o RL helps to take decisions sequentially, whereas in supervised learning decisions are made when the input is given.
 Deep Q Learning:

Deep Q Learning takes the Q-learning idea one step further. Instead of using a Q-table, we use a neural network that takes a state and approximates the Q-values for each action based on that state.

We do this because a classic Q-table is not very scalable. It might work for simple games, but in a more complex game with dozens of possible actions and game states the Q-table soon becomes too large to build and update efficiently.

So we use a deep neural network that gets the state as input and produces a Q-value for each action. Then, again, we choose the action with the highest Q-value. The learning process is still the same iterative update approach, but instead of updating the Q-table, we update the weights of the neural network so that its outputs improve.
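A minimal PyTorch-style sketch of this idea (the network sizes, hyperparameters, and environment dimensions are assumed; a full DQN would also use a replay buffer and a target network, which are omitted here):

```python
import torch
import torch.nn as nn
import torch.optim as optim

state_dim, n_actions = 4, 2          # assumed environment dimensions
gamma = 0.99                         # assumed discount factor

# The network replaces the Q-table: it maps a state to one Q-value per action.
q_net = nn.Sequential(
    nn.Linear(state_dim, 64),
    nn.ReLU(),
    nn.Linear(64, n_actions),
)
optimizer = optim.Adam(q_net.parameters(), lr=1e-3)

def select_action(state):
    """Pick the action with the highest predicted Q-value."""
    with torch.no_grad():
        return int(q_net(state).argmax())

def train_step(state, action, reward, next_state, done):
    """Move the network output toward the TD target r + gamma * max_a' Q(s', a')."""
    q_value = q_net(state)[action]
    with torch.no_grad():
        target = reward + gamma * q_net(next_state).max() * (1.0 - done)
    loss = nn.functional.mse_loss(q_value, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```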
Genetic Algorithm:
A genetic algorithm is an adaptive heuristic search algorithm inspired by Darwin's theory of evolution in nature. It is used to solve optimization problems in machine learning. It is an important algorithm because it helps solve complex problems that would otherwise take a long time to solve.

Genetic algorithms are widely used in different real-world applications, for example designing electronic circuits, code-breaking, image processing, and artificial creativity.

In this topic, we explain genetic algorithms in detail, including the basic terminologies used in a genetic algorithm, how it works, and its advantages and limitations.

Basic terminologies used in a Genetic Algorithm:

o Population: The population is the subset of all possible or probable solutions which can solve the given problem.
o Chromosomes: A chromosome is one of the solutions in the population for the given problem; a collection of genes makes up a chromosome.
o Gene: A chromosome is divided into different genes; a gene is an element of the chromosome.
o Allele: An allele is the value given to a gene within a particular chromosome.
o Fitness Function: The fitness function is used to determine an individual's fitness level in the population, i.e., the ability of an individual to compete with other individuals. In every iteration, individuals are evaluated based on their fitness function.
o Genetic Operators: In a genetic algorithm, the best individuals mate to produce offspring better than the parents. Genetic operators change the genetic composition of the next generation.
o Selection: After calculating the fitness of every individual in the population, a selection process is used to determine which individuals in the population will get to reproduce and create the offspring that form the coming generation.

Types of selection methods available:

o Roulette wheel selection
o Tournament selection
o Rank-based selection

So, we can now define a genetic algorithm as a heuristic search algorithm for solving optimization problems. It is a subset of evolutionary algorithms used in computing, and it applies the concepts of genetics and natural selection to solve optimization problems.

How Does a Genetic Algorithm Work?

The genetic algorithm works through an evolutionary generational cycle to generate high-quality solutions. It uses different operations that either enhance or replace the population to produce an improved, fitter solution.

It basically involves five phases to solve complex optimization problems, which are given below:

o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination

1. Initialization
The process of a genetic algorithm starts by generating a set of individuals, called the population. Each individual is a solution to the given problem. An individual is characterized by a set of parameters called genes. Genes are joined into a string to form a chromosome, which encodes a solution to the problem. One of the most popular initialization techniques is to use random binary strings, as in the sketch below.
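A minimal sketch of initialization with random binary strings (the chromosome length and population size are assumed values for illustration):

```python
import random

CHROMOSOME_LENGTH = 8   # assumed number of genes per chromosome
POPULATION_SIZE = 6     # assumed number of individuals

def random_chromosome():
    """A chromosome is a string of genes; here each gene is a random bit."""
    return [random.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]

population = [random_chromosome() for _ in range(POPULATION_SIZE)]
print(population)   # e.g. [[1, 0, 1, 1, 0, 0, 1, 0], ...]
```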
2. Fitness Assignment
The fitness function is used to determine how fit an individual is, i.e., its ability to compete with other individuals. In every iteration, individuals are evaluated based on the fitness function, which assigns a fitness score to each individual. This score determines the probability of being selected for reproduction: the higher the fitness score, the higher the chance of being selected.

3. Selection
The selection phase involves selecting individuals for the reproduction of offspring. The selected individuals are arranged in pairs, and these individuals then transfer their genes to the next generation.

There are three commonly used selection methods (a sketch of roulette wheel selection is given after this list):

o Roulette wheel selection
o Tournament selection
o Rank-based selection
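A minimal sketch of roulette wheel selection (the fitness function here is the hypothetical "count the 1-bits" problem, chosen only for illustration; it assumes at least one individual has non-zero fitness):

```python
import random

def fitness(chromosome):
    """Hypothetical fitness: number of 1-bits in the chromosome (the OneMax problem)."""
    return sum(chromosome)

def roulette_wheel_select(population):
    """Select one parent with probability proportional to its fitness score."""
    scores = [fitness(ind) for ind in population]
    return random.choices(population, weights=scores, k=1)[0]
```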

4. Reproduction
After the selection process, the creation of a child occurs in the reproduction
step. In this step, the genetic algorithm uses two variation operators that are
applied to the parent population. The two operators involved in the
reproduction phase are given below:

o Crossover: Crossover plays the most significant role in the reproduction phase of the genetic algorithm. In this process, a crossover point is selected at random within the genes. The crossover operator then swaps the genetic information of two parents from the current generation to produce a new individual representing the offspring.

The genes of the parents are exchanged among themselves up to the crossover point, and the newly generated offspring are added to the population. This process is also called recombination. Types of crossover available:
o One-point crossover
o Two-point crossover
o Uniform crossover
o Inheritable algorithms crossover
o Mutation: The mutation operator inserts random genes into the offspring (the new child) to maintain the diversity of the population. It can be done by flipping some bits in the chromosome, for example changing 10110 to 10010.
Mutation helps to solve the problem of premature convergence and enhances diversification. Types of mutation available:
o Flip-bit mutation
o Gaussian mutation
o Exchange/Swap mutation

A short sketch of one-point crossover and flip-bit mutation is given below.
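A minimal sketch of the two reproduction operators on bit-string chromosomes (the mutation rate is an assumed value for illustration):

```python
import random

def one_point_crossover(parent1, parent2):
    """Swap the genes of two parents after a randomly chosen crossover point."""
    point = random.randint(1, len(parent1) - 1)
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

def flip_bit_mutation(chromosome, rate=0.05):
    """Flip each bit with a small probability to maintain diversity."""
    return [1 - gene if random.random() < rate else gene for gene in chromosome]
```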
5. Termination
After the reproduction phase, a stopping criterion is applied to decide when to terminate. The algorithm terminates once the threshold fitness is reached, and the best solution in the final population is reported as the result. A minimal end-to-end sketch combining all five phases is given below.
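The following is a minimal end-to-end sketch of the five phases on the hypothetical OneMax problem (maximize the number of 1-bits); all sizes, rates, and the tournament-style selection are assumed choices for illustration, not prescribed by the notes:

```python
import random

LENGTH, POP_SIZE, GENERATIONS, MUTATION_RATE = 8, 20, 50, 0.05  # assumed settings

def fitness(ind):
    return sum(ind)                      # OneMax: count the 1-bits

def select(population):
    a, b = random.sample(population, 2)  # tournament of size two
    return max(a, b, key=fitness)

def crossover(p1, p2):
    point = random.randint(1, LENGTH - 1)
    return p1[:point] + p2[point:]

def mutate(ind):
    return [1 - g if random.random() < MUTATION_RATE else g for g in ind]

# 1. Initialization
population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    # 2-4. Fitness assignment, selection, and reproduction
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]
    best = max(population, key=fitness)
    # 5. Termination: stop once the threshold fitness is reached
    if fitness(best) == LENGTH:
        break

print("Best solution:", best, "fitness:", fitness(best))
```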

 Application of Genetic Algorithms:

1. Recurrent Neural Network

2. Mutation testing

3. Code breaking

4. Filtering and signal processing

5. Learning fuzzy rule base

Life Cycle of Genetic Algorithm:

1. Initialization of Population: Every gene represents a parameter (variable) in the solution. The collection of parameters that forms a solution is the chromosome, and the population is a collection of chromosomes. The order of genes on the chromosome matters.

2. Fitness Function: Out of the available chromosomes, we have to select the best ones for the reproduction of offspring, so each chromosome is given a fitness value. The fitness score helps to select the individuals that will be used for reproduction.

3. Selection: The main goal of this phase is to find the region where the chances of getting the best solution are highest.

4. Reproduction: Generation of offspring happens in two ways:

 Crossover: A random point is selected while mating a pair of parents to generate offspring.
 Mutation: It is applied to maintain diversity in the population and to prevent premature convergence.

5. Convergence (when to stop):

 When there is no further improvement in the solution.
 When a fixed number of generations or amount of time has been reached.
 When an acceptable solution has been obtained.
