
Machine Learning Techniques
KCS 055
Unit 5: Reinforcement Learning
Reinforcement Learning
• It is a feedback-based machine learning approach.
• The agent learns from changes occurring in the environment, without any labelled data.
• Goal: to perform actions by observing the environment and collect the maximum positive reward.
• Example: Chess
  – Goal: to win the game
  – Feedback: based on whether each move is the right choice
Reinforcement Learning
• The agent learns from its own experience, as there is no labelled data.
• It is used to solve problems where decision making is sequential and the goal is long-term, such as game playing and robotics.
Basic Components of Reinforcement Learning
• Agent → A hardware/software/computer program, e.g. an AI robot or a robotic car.
• Environment → The situation or surroundings of the agent, e.g. a road or highway.
• Action → The movement of the agent inside the environment, e.g. move right/left/up/down.
• State → The situation returned by the environment after each action.
Basic Components of Reinforcement Learning
• Reward → Positive feedback.
• Penalty → Negative feedback.
• Policy → The agent's strategy for choosing its next action.
• Policy map → The mapping from states to the actions the agent selects.
Steps in Reinforcement Learning
1. Take an action.
2. Get feedback (reward or penalty).
3. Remain in the same state or move to a new state, then repeat (see the sketch below).
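The loop above can be written as a short program. The sketch below is a minimal, hypothetical example: the GridEnvironment class, its reward values, and the random action choice are assumptions made for illustration, not part of the slides.

import random

# Minimal feedback loop: take an action, get a reward or penalty,
# and remain in the same state or move to a new one.
class GridEnvironment:
    def __init__(self, size=4):
        self.size = size
        self.state = 0                      # agent starts at cell 0

    def step(self, action):
        """Apply an action and return (next_state, reward)."""
        if action == "right" and self.state < self.size - 1:
            self.state += 1                 # move forward
        elif action == "left" and self.state > 0:
            self.state -= 1                 # move backward
        # reaching the last cell is the goal (+1); every other step costs -0.1
        reward = 1.0 if self.state == self.size - 1 else -0.1
        return self.state, reward

env = GridEnvironment()
state = env.state
for step in range(10):
    action = random.choice(["left", "right"])   # no policy yet: act randomly
    state, reward = env.step(action)             # feedback from the environment
    print(f"step={step} action={action} state={state} reward={reward}")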
Two types of Reinforcement Learning:

Positive Reinforcement Learning
• Recurrence of behavior due to positive rewards.
• Such rewards increase the strength and frequency of a specific behavior and encourage the agent to take similar actions in the future.

Negative Reinforcement Learning
• Negative rewards are used as a deterrent to weaken a behavior and to avoid it.
• These rewards decrease the strength and frequency of a specific behavior.
Markov Decision Problem
Q-Learning Algorithm

• Model-free reinforcement learning algorithm.

• Learns the value of an action in a particular state.

• The ‘Q’ stands for quality of actions.

• The quality represents the usefulness of a given action.


Q-Learning Algorithm
• States (s): the current position of the agent in the environment.
• Actions (a): a step taken by the agent in a particular state.
• Rewards: for every action, the agent receives a reward or a penalty.
• Episodes: the end of a stage, where the agent can take no new action. It happens when the agent has achieved the goal or failed.
Q-Learning Algorithm
• Q(St+1, a): the expected optimal Q-value of taking an action in the next state.
• Q(St, At): the current estimate, which is updated towards Q(St+1, a).
• Q-Table: the agent maintains a Q-table over all pairs of states and actions.
• Temporal Difference (TD): used to estimate the expected value of Q(St+1, a) by using the current state and action and the previous state and action.
Steps followed:
• Exploration: explore all possible paths.
• Exploitation: the best possible path is identified.

Initialize Q-table → Choose an action → Perform the action → Measure the reward → Update the Q-table

A number of iterations results in a good Q-table.


Q function
• Based on the Bellman equation.
• Takes two inputs: a state (s) and an action (a), combined as shown below.
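As a sketch (the slides do not write the formula out, but this is the standard Q-learning update derived from the Bellman equation), the Q-table entry for the current state and action is updated as:

Q(St, At) ← Q(St, At) + α [ Rt+1 + γ · max_a Q(St+1, a) − Q(St, At) ]

where α is the learning rate, γ is the discount factor, and Rt+1 is the reward received after taking action At in state St.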
Updating Q-table
Q Table
• Example: in a game
• Actions: up, down, right, left
• States: Start, End, Idle, Hole, etc.

Reference: https://www.datacamp.com/tutorial/introduction-q-learning-beginner-tutorial
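A minimal tabular Q-learning sketch in this spirit is given below. The tiny 5-state chain environment, the +1 reward at the goal, and all hyper-parameter values are assumptions made for illustration; only the update rule itself follows the slides.

import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right (assumed toy setup)
alpha, gamma, epsilon = 0.1, 0.9, 0.2
Q = np.zeros((n_states, n_actions)) # Step 1: initialize the Q-table

def step(state, action):
    """Move left/right on a chain; the last state is the goal (+1), else 0 reward."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Step 2: choose an action (explore with probability epsilon, else exploit)
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = int(np.argmax(Q[state]))
        # Steps 3-4: perform the action and measure the reward
        next_state, reward, done = step(state, action)
        # Step 5: update the Q-table with the temporal-difference rule
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(Q)   # after many iterations the Q-table reflects the best path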
Deep Q Learning
• Q-Learning creates an exact matrix for the working agent, which it can "refer to" to maximize its reward in the long run.
• This is only practical for very small environments and quickly loses its feasibility when the number of states and actions in the environment increases.
Deep Q Learning
• The solution to the above problem comes from the realization that the values in the matrix only have relative importance, i.e. the values matter only with respect to the other values.
• This thinking leads us to Deep Q-Learning, which uses a deep neural network to approximate the values.
• The basic working step of Deep Q-Learning is that the current state is fed into the neural network, which returns the Q-values of all possible actions as its output.
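A minimal sketch of the network part of Deep Q-Learning is below. PyTorch is used only as an illustrative choice (the slides do not name a framework), and the state size, action count, and layer widths are assumptions; the point is simply that the network maps a state to the Q-values of all possible actions in one pass.

import torch
import torch.nn as nn

# Sketch: a small neural network that approximates the Q-table.
# Input : the current state (here a 4-dimensional feature vector, an assumption)
# Output: one Q-value per possible action, all returned in a single forward pass.

state_dim, n_actions = 4, 2          # illustrative sizes, not from the slides

q_network = nn.Sequential(
    nn.Linear(state_dim, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, n_actions),        # Q-value of every action for the input state
)

state = torch.rand(1, state_dim)     # a dummy state, just to show the call
q_values = q_network(state)          # shape (1, n_actions)
action = int(torch.argmax(q_values)) # greedy action = index of the largest Q-value
print(q_values, action)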
Q learning vs Deep Q learning
Genetic Algorithm
• A search-based optimization technique.
• Based on the principles of genetics and natural selection.
• It keeps evolving better solutions over successive generations until it reaches a stopping criterion.
Basic Terminologies of Genetic Algorithm
• Gene: a single bit of a bit string.
• Chromosome: a possible solution (a bit string, i.e. a collection of genes).
• Population: the set of solutions.
Basic Terminologies of Genetic Algorithm
• Allele: a possible combination of genes that makes up a property.
• Gene pool: the set of all possible combinations of genes, i.e. all alleles.
Basic Terminologies of Genetic Algorithm
• Crossover: the process of taking two individual bit strings (solutions) and producing new child bit strings (offspring) from them.
Basic Terminologies of Genetic Algorithm
• Three types of crossover (sketched in code below):
  – Single-point crossover: data bits are swapped between the two parent strings after the crossover point.
  – Two-point crossover: bits between the two points are swapped.
  – Uniform crossover: random bits are swapped with equal probability.
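The three crossover types can be sketched in a few lines of Python. The parent strings and the fixed crossover points below are illustrative assumptions; only the operator definitions follow the slides.

import random

def single_point(p1, p2, point):
    # bits after the crossover point are swapped between the two parents
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point(p1, p2, a, b):
    # bits between the two points are swapped
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def uniform(p1, p2, prob=0.5):
    # each bit position is swapped independently with equal probability
    c1, c2 = list(p1), list(p2)
    for i in range(len(p1)):
        if random.random() < prob:
            c1[i], c2[i] = c2[i], c1[i]
    return "".join(c1), "".join(c2)

print(single_point("01101", "11000", 4))   # -> ('01100', '11001')
print(two_point("11000", "10011", 2, 5))   # swaps the segment between the points
print(uniform("01101", "11000"))           # random result on each run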
Basic Terminologies of Genetic Algorithm
• Mutation: a small random change in a chromosome. It is used to introduce diversity into the genetic population.
• Types of mutation:
  – Bit flip mutation
  – Swap mutation
  – Random resetting
  – Scramble mutation
  – Inversion mutation
Basic Terminologies of
Genetic Algorithm

• Bit Flip Mutation: One or more random bits are selected and flipped.
Basic Terminologies of
Genetic Algorithm

• Random Resetting: an extension of the bit flip method, for integer representations.
Basic Terminologies of
Genetic Algorithm
• Swap Mutation: We
select 2 positions on the
chromosome at random
and interchange their
values.
Basic Terminologies of
Genetic Algorithm
• Scramble
Mutation: A subset
of genes is chosen,
and their values
are shuffled
randomly.
Basic Terminologies of
Genetic Algorithm
• Inversion Mutation: a subset of genes is chosen, and the genes in it are inverted (reversed) as a string. All five mutation operators are sketched in code below.
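Hypothetical implementations of these mutation operators are sketched below; the example chromosome and the chosen positions are assumptions made for illustration.

import random

def bit_flip(chrom, i):
    # flip the bit at position i
    c = list(chrom)
    c[i] = '1' if c[i] == '0' else '0'
    return "".join(c)

def swap(chrom, i, j):
    # interchange the values at two positions
    c = list(chrom)
    c[i], c[j] = c[j], c[i]
    return "".join(c)

def scramble(chrom, i, j):
    # shuffle the genes between positions i and j randomly
    c = list(chrom)
    mid = c[i:j]
    random.shuffle(mid)
    return "".join(c[:i] + mid + c[j:])

def inversion(chrom, i, j):
    # reverse the chosen subset of genes as a string
    return chrom[:i] + chrom[i:j][::-1] + chrom[j:]

def random_reset(chrom, i, alphabet="0123456789"):
    # random resetting for integer representations: replace one gene with a random value
    c = list(chrom)
    c[i] = random.choice(alphabet)
    return "".join(c)

print(bit_flip("10000", 3))      # -> '10010'
print(swap("10010", 0, 4))       # -> '00011'
print(inversion("10010", 1, 4))  # -> '11000'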
Flow chart of GA

Start → Initial population of solutions → Terminate?
  – No → Selection → Crossover → Mutation → Evolution of the next generation → back to Terminate?
  – Yes → Best individual solution → Optimal solution as output → Stop

A skeleton of this loop is sketched in code below.
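The flow chart maps directly onto a loop like the one below. The helper functions select, crossover, mutate, and fitness are hypothetical stand-ins for the operators described on the previous slides, and the termination test (a fixed generation budget) is an assumption.

def genetic_algorithm(initial_population, fitness, select, crossover, mutate,
                      n_generations=100):
    """Skeleton of the GA flow chart: evaluate, terminate?, select, crossover, mutate."""
    population = list(initial_population)                   # initial population of solutions
    for generation in range(n_generations):                 # terminate? (fixed budget here)
        parents = select(population, fitness)               # selection
        children = crossover(parents)                       # crossover
        population = [mutate(child) for child in children]  # mutation -> next generation
    return max(population, key=fitness)                     # best individual solution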
Fitness Function
• Determines the fitness of an individual solution (bit string).
• Fitness refers to the ability of an individual to compete with other individuals.
• An individual solution is selected based on its fitness score, as in the selection sketch below.
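A minimal sketch of fitness-proportionate (roulette-wheel) selection, the kind of selection used in Example-1 later in these slides, is given below; the example population and the use of random.choices are illustrative assumptions.

import random

def roulette_wheel_select(population, fitness, k):
    """Pick k individuals with probability proportional to their fitness score."""
    scores = [fitness(ind) for ind in population]
    total = sum(scores)
    probabilities = [s / total for s in scores]   # Sf(x) = f(x) / sum of f(x)
    return random.choices(population, weights=probabilities, k=k)

population = ["01101", "11000", "01000", "10011"]
fitness = lambda bits: int(bits, 2) ** 2          # f(x) = x^2, as in Example-1
print(roulette_wheel_select(population, fitness, k=4))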
Advantages and Disadvantages of GA

Advantages
• It has a wide solution space.
• It is easier to discover the global optimum.
• Multiple GAs can run together on the same CPU.

Disadvantages
• The fitness function calculation is a limitation.
• Convergence of a GA can be too fast or too slow.
• There are limits on selecting the parameters.
Example-1

• Let the population of chromosomes in a genetic algorithm be represented in terms of binary numbers. The strength of fitness of a chromosome with decimal value x is given by Sf(x) = f(x) / Σ f(x), where f(x) = x².
• The population is given by P, where:
  P = {(01101), (11000), (01000), (10011)}
P = {(01101), (11000), (01000), (10011)}
Sf(x) = f(x) / Σ f(x), where f(x) = x²

Step 1: Selection

P     | x (decimal) | f(x) = x²
01101 | 13          | 169
11000 | 24          | 576
01000 | 8           | 64
10011 | 19          | 361
P = {(01101), (11000), (01000), (10011)}
Sf(x) = f(x) / Σ f(x), where f(x) = x²

P     | x (decimal) | f(x) = x² | Sf(x) = f(x) / Σ f(x)
01101 | 13          | 169       | 169/1170 = 0.14
11000 | 24          | 576       | 576/1170 = 0.49
01000 | 8           | 64        | 64/1170  = 0.06
10011 | 19          | 361       | 361/1170 = 0.31
Total |             | 1170      |
P = {(01101), (11000), (01000), (10011)}
Sf(x) = f(x) / Σ f(x), where f(x) = x²

Step 1: Selection

P     | x (decimal) | f(x) = x² | Sf(x) | Expected count N*Sf(x)
01101 | 13          | 169       | 0.14  | 4*0.14 = 0.56
11000 | 24          | 576       | 0.49  | 4*0.49 = 1.96
01000 | 8           | 64        | 0.06  | 4*0.06 = 0.24
10011 | 19          | 361       | 0.31  | 4*0.31 = 1.24
Total |             | 1170      |       |
P = {(01101), (11000), (01000), (10011)}
Sf(x) = f(x) / Σ f(x), where f(x) = x²

Step 2: Crossover

P (initial) | Crossover point | After crossover | x (decimal) | f(x) = x²
0110|1      | 4               | 01100           | 12          | 144
1100|0      | 4               | 11001           | 25          | 625
11|000      | 2               | 11011           | 27          | 729
10|011      | 2               | 10000           | 16          | 256
Total       |                 |                 |             | 1754
P = {(01101), (11000), (01000), (10011)}
Sf(x) = f(x) / Σ f(x), where f(x) = x²

Step 3: Mutation

After crossover | After mutation | x (decimal) | f(x) = x²
01100           | 11100          | 26          | 676
11001           | 11001          | 25          | 625
11011           | 11011          | 27          | 729
10000           | 10010          | 18          | 324
Total           |                |             | 2354
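The arithmetic in the three tables above can be checked with a short script. The sketch below reproduces the selection probabilities, the crossover at the points shown, and the fitness totals; the mutation results are taken from the table rather than chosen randomly.

# Reproduce Example-1: selection scores, crossover, mutation, fitness totals.
population = ["01101", "11000", "01000", "10011"]
f = lambda bits: int(bits, 2) ** 2                 # f(x) = x^2

total = sum(f(p) for p in population)              # 1170
for p in population:
    print(p, int(p, 2), f(p), round(f(p) / total, 2), round(4 * f(p) / total, 2))

# Step 2: crossover at point 4 for the first pair and point 2 for the second
after_cross = ["01101"[:4] + "11000"[4:], "11000"[:4] + "01101"[4:],
               "11000"[:2] + "10011"[2:], "10011"[:2] + "11000"[2:]]
print(after_cross, sum(f(c) for c in after_cross))  # total 1754

# Step 3: mutation as in the table (one bit flipped in the 1st and 4th strings)
after_mut = ["11100", "11001", "11011", "10010"]
print(after_mut, sum(f(m) for m in after_mut))       # total 2354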
Example - 2

Suppose a genetic algorithm uses chromosomes of the form x = "a b c d e f g h" with a fixed length of eight genes. Each gene can be any digit between 0 and 9. Let the fitness of an individual x be calculated as f(x) = (a+b) - (c+d) + (e+f) - (g+h). Let the initial population consist of four individuals with the following chromosomes:
x1 = 6 5 4 1 3 5 3 2
x2 = 8 7 1 2 6 6 0 1
x3 = 2 3 9 2 1 2 8 5
x4 = 4 1 8 5 2 0 9 4
Example - 2

a. Evaluate the fitness of each individual, showing all your workings, and arrange them in order with the fittest first and the least fit last.
b. Perform the following crossover operations:
   i. Cross the two fittest individuals using one-point crossover at the middle point.
   ii. Cross the second and third fittest individuals using a two-point crossover (points b and f).
   iii. Cross the first and third fittest individuals (ranked 1st and 3rd) using a uniform crossover.
c. Suppose the new population consists of the six offspring individuals produced by the crossover operations above. Evaluate the fitness of the new population, showing all the workings. Has the overall fitness improved?
x = “a b c d e f g h”
f(x) = (a+b)-(c+d)+(e+f)-(g+h)
x1 = 6 5 4 1 3 5 3 2
x2 = 8 7 1 2 6 6 0 1
x3 = 2 3 9 2 1 2 8 5
x4 = 4 1 8 5 2 0 9 4
a. Evaluate the fitness of each individual, showing all your
workings, and arrange them in order with the fittest first
and the least fit last.
Sol: f(x1) = (6+5) - (4+1) + (3+5) - (3+2) = 9
     f(x2) = (8+7) - (1+2) + (6+6) - (0+1) = 23
     f(x3) = (2+3) - (9+2) + (1+2) - (8+5) = -16
     f(x4) = (4+1) - (8+5) + (2+0) - (9+4) = -19
The order (fittest first) is x2, x1, x3, x4.
x = “a b c d e f g h”
f(x) = (a+b)-(c+d)+(e+f)-(g+h)
x1 = 6 5 4 1 3 5 3 2
x2 = 8 7 1 2 6 6 0 1
x3 = 2 3 9 2 1 2 8 5
x4 = 4 1 8 5 2 0 9 4
i. Cross the two fittest individuals using one-point crossover at the middle point.
Sol: x2 = 8 7 1 2 | 6 6 0 1   →   o1 = 8 7 1 2 3 5 3 2
     x1 = 6 5 4 1 | 3 5 3 2   →   o2 = 6 5 4 1 6 6 0 1
x = “a b c d e f g h”
f(x) = (a+b)-(c+d)+(e+f)-(g+h)
x1 = 6 5 4 1 3 5 3 2
x2 = 8 7 1 2 6 6 0 1
x3 = 2 3 9 2 1 2 8 5
x4 = 4 1 8 5 2 0 9 4
ii. Cross the second and third fittest individuals using a two-point crossover
(points b and f).
Sol: x1 = 6 5 | 4 1 3 5 | 3 2   →   o3 = 6 5 9 2 1 2 3 2
     x3 = 2 3 | 9 2 1 2 | 8 5   →   o4 = 2 3 4 1 3 5 8 5
x = “a b c d e f g h”
f(x) = (a+b)-(c+d)+(e+f)-(g+h)
x1 = 6 5 4 1 3 5 3 2
x2 = 8 7 1 2 6 6 0 1
x3 = 2 3 9 2 1 2 8 5
x4 = 4 1 8 5 2 0 9 4
iii. Cross the first and third fittest individuals (ranked 1st and 3rd) using a
uniform crossover.
Sol: x2 = 8 7 1 2 6 6 0 1   →   o5 = 2 7 1 2 6 2 0 1
     x3 = 2 3 9 2 1 2 8 5   →   o6 = 8 3 9 2 1 6 8 5
x = “a b c d e f g h”
f(x) = (a+b)-(c+d)+(e+f)-(g+h)
o1 = 8 7 1 2 3 5 3 2
o2 = 6 5 4 1 6 6 0 1
o3 = 6 5 9 2 1 2 3 2
o4 = 2 3 4 1 3 5 8 5
o5 = 2 7 1 2 6 2 0 1
o6 = 8 3 9 2 1 6 8 5

c. Suppose the new population consists of the six offspring individuals produced by the crossover operations above. Evaluate the fitness of the new population, showing all the workings. Has the overall fitness improved?
Sol: f(o1) = (8+7) - (1+2) + (3+5) - (3+2) = 15
     f(o2) = (6+5) - (4+1) + (6+6) - (0+1) = 17
     f(o3) = (6+5) - (9+2) + (1+2) - (3+2) = -2
     f(o4) = (2+3) - (4+1) + (3+5) - (8+5) = -5
     f(o5) = (2+7) - (1+2) + (6+2) - (0+1) = 13
     f(o6) = (8+3) - (9+2) + (1+6) - (8+5) = -6
Yes, the overall fitness has improved.
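The fitness values in parts (a) and (c) can be verified with a few lines of code; the chromosomes are exactly those given in the example.

# Verify the fitness values of Example-2: f(x) = (a+b) - (c+d) + (e+f) - (g+h)
def fitness(chrom):
    a, b, c, d, e, f, g, h = chrom
    return (a + b) - (c + d) + (e + f) - (g + h)

parents = {"x1": [6, 5, 4, 1, 3, 5, 3, 2], "x2": [8, 7, 1, 2, 6, 6, 0, 1],
           "x3": [2, 3, 9, 2, 1, 2, 8, 5], "x4": [4, 1, 8, 5, 2, 0, 9, 4]}
offspring = {"o1": [8, 7, 1, 2, 3, 5, 3, 2], "o2": [6, 5, 4, 1, 6, 6, 0, 1],
             "o3": [6, 5, 9, 2, 1, 2, 3, 2], "o4": [2, 3, 4, 1, 3, 5, 8, 5],
             "o5": [2, 7, 1, 2, 6, 2, 0, 1], "o6": [8, 3, 9, 2, 1, 6, 8, 5]}

for name, chrom in {**parents, **offspring}.items():
    print(name, fitness(chrom))
# Average parent fitness is (9 + 23 - 16 - 19) / 4 = -0.75; average offspring
# fitness is (15 + 17 - 2 - 5 + 13 - 6) / 6 = 5.33, so overall fitness improved.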
Reference Books
• Tom M. Mitchell, Machine Learning, McGraw-Hill Education (India) Private Limited, 2013.
• Ethem Alpaydin, Introduction to Machine Learning (Adaptive Computation and Machine Learning), The MIT Press, 2004.
• Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press, 2009.
• Bishop, C., Pattern Recognition and Machine Learning. Berlin: Springer-Verlag.

Text Books
• Saikat Dutt, Subramanian Chandramouli, Amit Kumar Das, Machine Learning, Pearson.
• Andreas C. Müller and Sarah Guido, Introduction to Machine Learning with Python.
• John Paul Mueller and Luca Massaron, Machine Learning for Dummies.
• Dr. Himanshu Sharma, Machine Learning, S.K. Kataria & Sons, 2022.
