ML Unit IV
Genetic algorithms are based on the ideas of natural selection and genetics. They are an
intelligent exploitation of random search, guided by historical data, that directs the search
toward regions of better performance in the solution space. They are commonly used to
generate high-quality solutions to optimization and search problems.
Genetic algorithms simulate the process of natural selection: individuals that can adapt to
changes in their environment survive, reproduce, and pass their genes on to the next
generation. In simple words, they simulate "survival of the fittest" among the individuals of
consecutive generations to solve a problem. Each generation consists of a population of
individuals, and each individual represents a point in the search space and a possible
solution. Each individual is encoded as a string of characters/integers/floats/bits. This string
is analogous to a chromosome.
Genetic algorithms are widely used in real-world applications, for example designing
electronic circuits, code-breaking, image processing, and artificial creativity.
Foundation of GAs
Genetic algorithms are based on an analogy with the genetic structure and behaviour of
chromosomes in a population. Based on this analogy, solving a complex optimization problem
with a GA involves five phases, given below:
o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination
1. Initialization
The genetic algorithm starts by generating a set of individuals, called the population. Each
individual is a candidate solution to the given problem. An individual is characterized by a set of
parameters called genes; genes are joined into a string to form a chromosome, which encodes the
solution. One of the most popular initialization techniques is the use of random binary strings, as
in the sketch below.
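A minimal sketch of random binary-string initialization in Python (the population size and chromosome length below are arbitrary assumptions):

import random

def init_population(pop_size, chromosome_length):
    # Each individual is a random binary string (list of 0/1 genes).
    return [[random.randint(0, 1) for _ in range(chromosome_length)]
            for _ in range(pop_size)]

# Example: 6 individuals of 10 genes each (both sizes are arbitrary choices).
population = init_population(pop_size=6, chromosome_length=10)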
2) Fitness Score
A fitness score is assigned to each individual and reflects its ability to "compete". Individuals
with optimal (or near-optimal) fitness scores are sought.
3) Selection
The selection phase chooses individuals for the reproduction of offspring. The selected
individuals are arranged in pairs, and these pairs pass their genes on to the next generation.
4) Reproduction
Once the initial generation is created, the algorithm evolves the generation using the
following operators (a code sketch follows this list):
1) Selection operator: give preference to individuals with good fitness scores and allow them
to pass their genes to successive generations.
2) Crossover operator: mate two selected individuals by choosing a crossover site at random
and exchanging the genes beyond that site, producing new offspring.
3) Mutation operator: insert random genes into offspring to maintain diversity in the
population and avoid premature convergence.
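A minimal sketch of these three operators for binary-string chromosomes (the single-point crossover and the 0.01 mutation rate are illustrative assumptions):

import random

def select_parent(population, fitness_scores):
    # Selection operator: fitness-proportionate (roulette wheel) choice.
    return random.choices(population, weights=fitness_scores, k=1)[0]

def crossover(parent1, parent2):
    # Crossover operator: single-point crossover, exchanging genes after a random site.
    point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def mutate(chromosome, rate=0.01):
    # Mutation operator: flip each gene with a small probability to keep diversity.
    return [1 - gene if random.random() < rate else gene for gene in chromosome]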
5. Termination
After the reproduction phase, a stopping criterion is checked. The algorithm terminates once a
solution reaches the threshold fitness, and the best individual in the final population is returned
as the solution.
Example problem and solution using Genetic Algorithms
Given a target string, the goal is to produce the target string starting from a random string of
the same length. In the following implementation, the following analogies are made:
Characters A-Z, a-z, 0-9, and other special symbols are considered genes.
A string generated from these characters is considered a chromosome/solution/individual.
The fitness score is the number of characters that differ from the characters of the target string
at the corresponding index, so an individual with a lower fitness value is given more preference.
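A sketch of this fitness measure; the target string used here is a hypothetical placeholder:

TARGET = "HELLO WORLD"   # hypothetical target string

def fitness(chromosome):
    # Count the positions where the candidate character differs from the target.
    return sum(1 for c, t in zip(chromosome, TARGET) if c != t)

print(fitness("HELLO WORLE"))   # -> 1 (lower is better)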
Consider next an optimization problem with an objective function of two parameters, a and b.
From our prior knowledge, we assume the value of this objective function at the optimum is
zero. The objective function is used to estimate values of a and b such that its value decreases
toward zero.
To begin, we guess initial sets of values for a and b; these may or may not include the optimal
values. We refer to these value sets as chromosomes, and this process is called "population
initialization". The set of [a, b] pairs is referred to as the population. For this optimization
example, we take six sets of a and b values generated between 1 and 10.
The next step, step 2, is to compute the value of the objective function for each chromosome;
this step is known as "selection", because we select the fittest chromosomes from the
population for subsequent operations. The fitness values are used to discard chromosomes with
low fitness, so that the fitter individuals survive into the succeeding generations.
The following selection method is widely used:
Roulette wheel method
The roulette wheel can be pictured as a pie plot in which each slice is sized according to a
fitness probability. Because we are minimising the objective function, chromosomes that
produce a low objective value are assigned a high fitness probability; the two quantities are
inversely related.
The chromosomes that have a higher fitness probability will have a greater chance
of being selected.
It is to be noted that the sum of all fitness probabilities always equals one.
Let us assume a scenario where we draw six random values between 0 and 1, one per
chromosome, say:
For chromosome 1, Probability1 = 0.02.
For chromosome 2, Probability2 = 0.13.
For chromosome 3, Probability3 = 0.40.
For chromosome 4, Probability4 = 0.60.
For chromosome 5, Probability5 = 0.85.
For chromosome 6, Probability6 = 0.96.
These values are placed on the roulette wheel according to the cumulative fitness
probability, and each segment of the wheel represents the corresponding chromosome.
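A sketch of roulette-wheel selection: a random number in [0, 1) is compared against the cumulative fitness probabilities, and the chromosome whose segment contains it is selected (the six fitness probabilities below are hypothetical and sum to 1):

import random

fitness_prob = [0.02, 0.11, 0.27, 0.20, 0.25, 0.15]   # hypothetical values, sum to 1

def roulette_select(probabilities):
    # Spin the wheel: return the index of the first segment whose cumulative
    # probability exceeds the random draw.
    r = random.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r <= cumulative:
            return index
    return len(probabilities) - 1   # guard against floating-point rounding

selected = [roulette_select(fitness_prob) for _ in range(6)]   # chosen chromosome indices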
The third step is known as crossover. In this step the chromosomes are expressed in terms of
genes: we convert the values of a and b into bit strings. One parameter to keep in mind is the
crossover parameter (crossover rate), which decides which of the six chromosomes will be able
to produce offspring.
Hypotheses Space Search
As is clear from the illustrative example above, genetic algorithms employ a randomized
beam search to seek maximally fit hypotheses. In this hypothesis space search, the gradient
descent search used in back-propagation moves smoothly from one hypothesis to another,
whereas the genetic algorithm search can move much more abruptly: it replaces parent
hypotheses with offspring that can be very different from the parents. For this reason, a
genetic algorithm search is less likely to fall into the same kind of local minima that plague
gradient descent methods.
One practical difficulty often encountered in genetic algorithms is crowding. Crowding is the
phenomenon in which individuals that are fitter than the others reproduce quickly, so that
copies of these individuals take over a large fraction of the population. Most of the strategies
used to combat crowding are inspired by biological evolution. One such strategy is fitness
sharing, in which the measured fitness of an individual is decreased by the presence of other,
similar individuals. Another is to restrict the kinds of individuals allowed to recombine to form
offspring: by allowing only similar individuals to recombine, clusters of similar individuals
(multiple subspecies) form in the population. Yet another method is to spatially distribute
individuals and allow only nearby individuals to recombine.
Genetic programming
In artificial intelligence, genetic programming (GP) is a technique for evolving programs fit for
a particular task, starting from a population of unfit (usually random) programs, by applying
operations analogous to natural genetic processes to the population of programs.
The operations are: selection of the fittest programs for reproduction (crossover)
and mutation according to a predefined fitness measure, usually proficiency at the
desired task. The crossover operation involves swapping random parts of selected
pairs (parents) to produce new and different offspring that become part of the new
generation of programs. Mutation involves substitution of some random part of a
program with some other random part of a program. Some programs not selected
for reproduction are copied from the current generation to the new generation.
Then the selection and other operations are recursively applied to the new
generation of programs.
Typically, members of each new generation are on average more fit than the
members of the previous generation, and the best-of-generation program is often
better than the best-of-generation programs from previous generations.
Termination of the evolution usually occurs when some individual program reaches
a predefined proficiency or fitness level.
It may and often does happen that a particular run of the algorithm results in
premature convergence to some local maximum which is not a globally optimal or
even good solution. Multiple runs (dozens to hundreds) are usually necessary to
produce a very good result. It may also be necessary to have a large starting
population size and variability of the individuals to avoid pathologies.
GP evolves computer programs, traditionally represented in memory as tree
structures. Trees can be easily evaluated in a recursive manner. Every tree node
has an operator function and every terminal node has an operand, making
mathematical expressions easy to evolve and evaluate. Thus traditionally GP
favors the use of programming languages that naturally embody tree structures (for
example, Lisp; other functional programming languages are also suitable).
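A minimal sketch of such a tree representation and its recursive evaluation in Python (the Node/Terminal classes and the example expression are illustrative, not a specific GP library):

import operator

class Terminal:
    # Leaf node: holds an operand (a constant or a bound variable value).
    def __init__(self, value):
        self.value = value
    def evaluate(self):
        return self.value

class Node:
    # Internal node: holds an operator function and two child subtrees.
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def evaluate(self):
        # Recursively evaluate the children, then apply this node's operator.
        return self.op(self.left.evaluate(), self.right.evaluate())

# Example tree encoding the expression (3 + 4) * 2
tree = Node(operator.mul, Node(operator.add, Terminal(3), Terminal(4)), Terminal(2))
print(tree.evaluate())   # -> 14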
Non-tree representations have been suggested and successfully implemented, such
as linear genetic programming which suits the more traditional imperative
languages. The commercial GP software Discipulus uses automatic induction
of binary machine code ("AIM") to achieve better performance. µGP uses directed
multigraphs to generate programs that fully exploit the syntax of a given assembly
language. Multi expression programming uses Three-address code for encoding
solutions. Other program representations on which significant research and
development have been conducted include programs for stack-based virtual
machines, and sequences of integers that are mapped to arbitrary programming
languages via grammars. Cartesian genetic programming is another form of GP,
which uses a graph representation instead of the usual tree based representation to
encode computer programs.
Most representations have structurally non-effective code (introns). Such non-
coding genes may seem to be useless because they have no effect on the
performance of any one individual. However, they alter the probabilities of
generating different offspring under the variation operators, and thus alter the
individual's variational properties. Experiments seem to show faster convergence
when using program representations that allow such non-coding genes, compared
to program representations that do not have any non-coding genes.
Crossover
In Genetic Programming two fit individuals are chosen from the population to be
parents for one or two children. In tree genetic programming, these parents are
represented as inverted Lisp-like trees, with their root nodes at the top. In subtree
crossover, a subtree is randomly chosen in each parent. In the root-donating parent the
chosen subtree is removed and replaced with a copy of the randomly chosen subtree from
the other parent, to give a new child tree.
Sometimes two-child crossover is used, in which case the removed subtree is not simply
deleted but is copied into a copy of the second parent, replacing that copy's randomly
chosen subtree. This type of subtree crossover thus takes two fit trees and generates two
child trees.
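A sketch of one-child subtree crossover using the tree classes from the earlier sketch (it assumes each parent has at least one internal node; real GP systems also bound the depth of the offspring):

import copy
import random

def all_nodes(node):
    # Collect every node in a tree (internal nodes and terminals).
    nodes = [node]
    if isinstance(node, Node):
        nodes += all_nodes(node.left) + all_nodes(node.right)
    return nodes

def subtree_crossover(parent1, parent2):
    # Copy parent1, then replace one randomly chosen subtree of the copy
    # with a copy of a randomly chosen subtree of parent2.
    child = copy.deepcopy(parent1)
    donor = copy.deepcopy(random.choice(all_nodes(parent2)))
    target = random.choice([n for n in all_nodes(child) if isinstance(n, Node)])
    if random.random() < 0.5:
        target.left = donor
    else:
        target.right = donor
    return child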
Sequential Learning Algorithm
In the beginning, an empty decision list 'R' is created; rules are then learned one at a time
(the 'Learn-One-Rule' step) as follows:
Step 2.a – if all training examples ∈ class 'y', then they are classified as positive examples.
Step 2.b – else if all training examples ∉ class 'y', then they are classified as negative
examples.
Step 3 – A rule becomes 'desirable' when it covers a majority of the positive examples.
Step 4 – When such a rule is obtained, delete all the training data covered by that rule
(i.e. when the rule is applied to the dataset, the examples it covers are removed).
Step 5 – The new rule is added to the bottom of the decision list 'R'. (Fig.3)
Let us understand step by step how the algorithm is working in the example shown in
Fig.4.
First, we create an empty decision list. During Step 1, we see that there are three sets of
positive examples present in the dataset. As per the algorithm, we consider the set with the
maximum number of positive examples. Once we cover these 6 positive examples, we get our
first rule R1, which is pushed into the decision list, and those positive examples are removed
from the dataset.
Next, we take the next-largest group of positive examples and follow the same process until
we get rule R2 (and likewise for R3).
In the end, we obtain our final decision list with all the desirable rules.
Sequential learning is a powerful algorithm for generating rule-based classifiers in machine
learning. It uses the 'Learn-One-Rule' algorithm as its base to learn a sequence of disjunctive
rules. A sketch of the overall loop is given below.
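A sketch of the sequential covering loop; learn_one_rule() and covers() are hypothetical helper functions assumed to be provided elsewhere, and examples are assumed to expose a .label attribute:

def sequential_covering(examples, target_class):
    # Learn rules one at a time; after each rule, remove the positive
    # examples it covers and repeat until none remain.
    decision_list = []
    positives = [e for e in examples if e.label == target_class]
    while positives:
        rule = learn_one_rule(positives, examples)      # hypothetical helper
        if rule is None:                                # no acceptable rule found
            break
        decision_list.append(rule)                      # add rule to bottom of list R
        positives = [e for e in positives if not covers(rule, e)]   # hypothetical helper
    return decision_list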
First-Order Logic:
All expressions in first-order logic are composed of the following attributes:
1. constants — e.g. tyler, 23, a
2. variables — e.g. A, B, C
3. predicate symbols — e.g. male, father (True or False values only)
4. function symbols — e.g. age (can take on any constant as a value)
5. connectives — e.g. ∧, ∨, ¬, →, ←
6. quantifiers — e.g. ∀, ∃
Term: It can be defined as any constant, variable or function applied to any term.
e.g. age(bob)
Literal: It can be defined as any predicate or negated predicate applied to any
terms. e.g. female(sue), father(X, Y)
Algorithm Involved
FOIL(Target_predicate, Predicates, Examples)
– Pos ← those Examples for which Target_predicate is true
– Neg ← those Examples for which Target_predicate is false
– Learned_rules ← {}
– while Pos, do (learn a new rule)
  – NewRule ← the rule that predicts Target_predicate with no preconditions
  – NewRuleNeg ← Neg
  – while NewRuleNeg, do (add a new literal to specialize NewRule)
    1. Candidate_literals ← generate candidate new literals for NewRule, based on Predicates
    2. Best_literal ← argmax L ∈ Candidate_literals Foil_Gain(L, NewRule)
    3. Add Best_literal to the preconditions of NewRule
    4. NewRuleNeg ← subset of NewRuleNeg that satisfies the preconditions of NewRule
  – Learned_rules ← Learned_rules + NewRule
  – Pos ← Pos − {members of Pos covered by NewRule}
– return Learned_rules
As an example, consider learning the target predicate GrandDaughter(x, y), where the earlier
steps have produced the partial rule GrandDaughter(x, y) ← Father(y, z).
Step 4 – FOIL now considers all the literals from the previous step as well as Female(z),
Father(z, w), Father(w, z), etc., and their negations.
Step 5 – FOIL might select Father(z, x), and on the next step Female(y), leading to
NewRule = GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y)
FOIL then removes all positive examples covered by this new rule. If any positive examples
remain, the outer while loop continues.
FOIL: Performance Evaluation Measure
Unlike the PERFORMANCE method in the Learn-One-Rule algorithm, the performance of a new
rule in FOIL is not measured by entropy. FOIL instead uses a gain measure, Foil_Gain, to decide
which specialized rule to adopt; each candidate literal's utility is estimated from the reduction
in the number of bits required to encode the rule's positive bindings:
Foil_Gain(L, R) = t · ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )
where,
L is the candidate literal to add to rule R,
p0 and n0 are the numbers of positive and negative bindings of R,
p1 and n1 are the numbers of positive and negative bindings of the specialized rule R + L, and
t is the number of positive bindings of R that are still covered after adding L to R.
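A direct transcription of the gain formula above as a small helper (it assumes the rule still covers at least one positive binding, i.e. p0 > 0 and p1 > 0):

import math

def foil_gain(p0, n0, p1, n1, t):
    # p0, n0: positive/negative bindings of rule R
    # p1, n1: positive/negative bindings of the specialized rule R + L
    # t: positive bindings of R still covered after adding literal L
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))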
Inverting Resolution
Resolution is a theorem-proving technique that proceeds by building refutation proofs, i.e., proofs
by contradiction. It was invented by the mathematician John Alan Robinson in 1965. Resolution is
used when several statements are given and we need to prove a conclusion from those statements.
Unification is a key concept in proofs by resolution. Resolution is a single inference rule which can
efficiently operate on the conjunctive normal form or clausal form.
Clause: A disjunction of literals is called a clause; a clause containing a single literal (an atomic
sentence) is known as a unit clause.
This rule is also called the binary resolution rule because it only resolves exactly two literals.
Example:
We can resolve two clauses which are given below:
where the two complementary literals are: Loves(f(x), x) and ¬ Loves(a, b)
These literals can be unified with the unifier θ = [a/f(x), b/x], and this will generate a resolvent
clause:
To better understand all the above steps, we will take an example in which we will apply
resolution.
Example:
In the first step we convert all the given statements into first-order logic.
For resolution in first-order logic, the FOL statements must then be converted into CNF, as the
CNF form makes resolution proofs easier.
o Eliminate all implication (→) and rewrite
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ [eats(x, y) Λ ¬ killed(x)] V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x¬ [¬ killed(x) ] V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
o Move negation (¬) inwards and rewrite
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ eats(x, y) V killed(x) V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x killed(x) V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
o Rename variables or standardize variables
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀y ∀z ¬ eats(y, z) V killed(y) V food(z)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀w¬ eats(Anil, w) V eats(Harry, w)
f. ∀g killed(g) V alive(g)
g. ∀k ¬ alive(k) V ¬ killed(k)
h. likes(John, Peanuts).
o Eliminate existential quantifiers.
In this step, we eliminate the existential quantifier ∃; this process is known
as Skolemization. Since there is no existential quantifier in this example problem, all the
statements remain the same in this step.
o Drop universal quantifiers.
In this step we drop all universal quantifiers, since all the statements are implicitly
universally quantified and the quantifiers are no longer needed.
a) ¬ food(x) V likes(John, x)
b) food(Apple)
c) food(vegetables)
d) ¬ eats(y, z) V killed(y) V food(z)
e) eats(Anil, Peanuts)
f) alive(Anil)
g) ¬ eats(Anil, w) V eats(Harry, w)
h) killed(g) V alive(g)
i) ¬ alive(k) V ¬ killed(k)
j) likes(John, Peanuts)
Next, we apply negation to the conclusion statement, which is written as ¬likes(John, Peanuts),
and add it to the set of clauses. Resolving this negated conclusion against the clauses above
eventually yields the empty clause; hence the negation of the conclusion is a complete
contradiction with the given set of statements, which proves likes(John, Peanuts).
Reinforcement Learning
Consider an environment containing a robot, a diamond, and fire. The goal of the robot is to get
the reward, the diamond, while avoiding the hurdles, the fire. The robot learns by trying all the
possible paths and then choosing the path which gives it the reward with the fewest hurdles.
Each right step gives the robot a reward and each wrong step subtracts from the robot's reward.
The total reward is calculated when it reaches the final reward, the diamond.
Main points in reinforcement learning –
Input: The input should be an initial state from which the model will start.
Output: There are many possible outputs, as there are a variety of solutions to a particular
problem.
Training: The training is based upon the input; the model will return a state, and the user will
decide whether to reward or punish the model based on its output.
The model continues to learn.
The best solution is decided based on the maximum reward.
The Learning Task:
Now that we understand the basic terminology, let's talk about formalising this whole
process using a concept called a Markov Decision Process, or MDP. An MDP is defined by:
A set of possible world states S
A set of possible actions A
A real-valued reward function R(s, a)
A description T of each action's effects in each state
Any random process in which the probability of being in a given state depends only on
the previous state is a Markov process.
In other words, in the Markov decision process setup, the environment's response at
time t+1 depends only on the state and action representations at time t, and is
independent of whatever happened in the past.
St: State of the agent at time t
At: Action taken by agent at time t
Rt: Reward obtained at time t
At each time step, the agent in state St takes an action At, receives a reward Rt+1, and ends up
in state St+1. The overall goal for the agent is to maximise the cumulative reward it receives
in the long run. The total reward at any time instant t is given by:
Gt = Rt+1 + Rt+2 + Rt+3 + … + RT
where T is the final time step of the episode. In the above equation, we see that all
future rewards have equal weight which might not be desirable. That’s where an
additional concept of discounting comes into the picture. Basically, we define γ as a
discounting factor and each reward after the immediate reward is discounted by this
factor as follows:
Gt = Rt+1 + γ Rt+2 + γ² Rt+3 + … = Σ (k = 0 to ∞) γ^k Rt+k+1
For discount factor < 1, the rewards further in the future are getting diminished. This
can be understood as a tuning parameter which can be changed based on how much
one wants to consider the long term (γ close to 1) or short term (γ close to 0).
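A small sketch of computing the discounted return from a finite sequence of rewards (the reward values and gamma are arbitrary):

def discounted_return(rewards, gamma):
    # Gt = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ...
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

print(discounted_return([1, 1, 1, 10], gamma=0.9))   # nearer rewards weigh more than distant ones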
Can we use the reward defined at each time step to quantify how good it is to be in a given
state under a given policy? The value function v(s) under a policy π represents how good a
state is for an agent to be in; in other words, it is the expected return the agent obtains when
starting from the current state and following policy π:
vπ(s) = Eπ[ Gt | St = s ], for all s ∈ S
Eπ in the above equation represents the expectation when the agent follows policy π, and S
represents the set of all possible states.
Now, it is intuitive that the optimal policy is reached when the value function is maximised for
each state. This optimal policy is then given by:
π* = argmax over all policies π of vπ(s), for every state s
Q-Learning
Let’s say that a robot has to cross a maze and reach the end point. There are mines, and the
robot can only move one tile at a time. If the robot steps onto a mine, the robot is dead. The
robot has to reach the end point in the shortest time possible.
The scoring/reward system is as below:
1. The robot loses 1 point at each step. This is done so that the robot takes the shortest path and
reaches the goal as fast as possible.
2. If the robot steps on a mine, the point loss is 100 and the game ends.
3. If the robot reaches the end goal, the robot gets 100 points.
Now, the obvious question is: How do we train a robot to reach the end goal with the
shortest path without stepping on a mine?
In the Q-Table, the columns are the actions and the rows are the states.
Each Q-table score will be the maximum expected future reward that the robot will get if it takes
that action at that state. This is an iterative process, as we need to improve the Q-Table at each
iteration.
There is an iterative process of updating the values. As we start to explore the environment, the
Q-function gives us better and better approximations by continuously updating the Q-values in
the table.
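A minimal sketch of this iterative update, using the Q-learning rule Q(s, a) ← Q(s, a) + α [ r + γ · max over a' of Q(s', a') − Q(s, a) ]; the environment interface env.reset()/env.step()/env.actions and the hyper-parameters are hypothetical assumptions:

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    # Q-table mapping (state, action) -> expected future reward, initialised to 0.
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()                                   # hypothetical environment API
        done = False
        while not done:
            # Epsilon-greedy: usually exploit the best known action, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)       # hypothetical environment API
            best_next = max(Q[(next_state, a)] for a in env.actions)
            # Q-learning update rule.
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q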
As said, we will proceed using the iterative solution, because the analytical one is hard to
obtain. We start with an initial random value at each state, then we pick a random policy to
follow. The reason we need a policy is simply that, in order to compute any state-value
function, we need to know how the agent is behaving. (If you are wondering whether we
aren't supposed to compute v* and π*: be patient.)
So we start by randomly initializing the values of all states and call the resulting state-value
function v0(s) (we call it a function because, after assigning values to all the states, v0(s)
returns the value of any state s). Now we follow the policy π, and on each iteration we update
the values of all the states. After updating all the states we obtain a new state-value function,
and the values of the states after each iteration get closer to the theoretical values given by
vπ(s) (remember that we are doing this because the theoretical/analytical solution vπ(s) is
hard to get).
So as the iteration goes on we obtain the functions v1(s), v2(s), v3(s), …, vk(s), and we keep
iterating until the absolute difference |vk-1(s) − vk(s)| for every state s is less than a number θ.
This θ is a threshold below which we consider that vk(s) has converged sufficiently towards
vπ(s). A sketch of this procedure follows.
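A sketch of this iterative policy evaluation for a small, fully known MDP; the states, policy, transition(s, a) and reward(s, a, s2) arguments are hypothetical placeholders:

def policy_evaluation(states, policy, transition, reward, gamma=0.9, theta=1e-4):
    # transition(s, a) returns a list of (probability, next_state) pairs;
    # reward(s, a, s2) gives the immediate reward. Both are hypothetical helpers.
    v = {s: 0.0 for s in states}                  # v0(s): initial values (zeros here)
    while True:
        delta = 0.0
        new_v = {}
        for s in states:
            a = policy[s]                         # the fixed policy pi being evaluated
            new_v[s] = sum(p * (reward(s, a, s2) + gamma * v[s2])
                           for p, s2 in transition(s, a))
            delta = max(delta, abs(new_v[s] - v[s]))
        v = new_v
        if delta < theta:                         # |vk(s) - vk-1(s)| < theta for every state s
            return v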
We define the reward for moving from one state to another and the discount factor gamma,
then run the iterations, one step at a time or several steps at once. After a certain number of
iterations, the value at each state varies only a little; this is when we know vk(s) has become
close to vπ(s).
It is not hard to see that in some states (or cells) there is more than one equally good action
to take.
Conclusion