AI - Unit - 5 - Machine Learning
What is learning?
- “Learning denotes changes in the system that are adaptive in the sense that they enable the
system to do the same task (or tasks drawn from the same population) more effectively the
next time.” --Herbert Simon
- "Learning is constructing or modifying representations of what is being experienced." --
Ryszard Michalski
- "Learning is making useful changes in our minds." --Marvin Minsky
Types of Learning
Supervised learning
In supervised learning, the AI model is trained on inputs paired with their expected outputs,
i.e., the labels of the inputs. The model learns a mapping from inputs to outputs and uses that
mapping to predict the labels of future inputs.
Let’s suppose we have to develop a model that differentiates between a cat and a dog. To train the
model, we feed multiple images of cats and dogs into the model, each with a label indicating whether
the image is of a cat or a dog. The model tries to learn the relationship between the input images
and their labels. After training, the model can predict whether an image is of a cat or a dog even if
it has never seen that image before.
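As a rough illustration of the train-then-predict idea above, here is a minimal sketch using a 1-nearest-neighbour rule. The two numeric features (weight and ear length) are hypothetical stand-ins for images, and the function names `train` and `predict` are chosen for this example only.

```python
# Toy supervised learner: 1-nearest-neighbour on hand-made feature vectors.
# The features (weight in kg, ear length in cm) are hypothetical stand-ins for images.
from math import dist

def train(examples):
    """'Training' for 1-NN simply stores the labelled examples."""
    return list(examples)

def predict(model, x):
    """Return the label of the stored example closest to x."""
    _, label = min(model, key=lambda example: dist(example[0], x))
    return label

labelled = [((4.0, 6.5), "cat"), ((3.5, 6.0), "cat"),
            ((20.0, 10.0), "dog"), ((30.0, 12.0), "dog")]
model = train(labelled)

print(predict(model, (5.0, 7.0)))    # unseen input close to the cat examples
print(predict(model, (25.0, 11.0)))  # unseen input close to the dog examples
```

The labels drive the prediction here: without them the stored examples would carry no class information.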
Unsupervised learning
In unsupervised learning, the AI model is trained only on the inputs, without their labels. The
model groups the input data into clusters that share similar features. A new input is then
assigned to the cluster whose features it most closely resembles.
Suppose we have a collection of red and blue balls and we have to separate them into two groups.
Let’s say all other features of the balls are the same except for their color. The model looks for
the feature on which the balls differ, here the color, and uses it to split them. The result is two
clusters of balls, one blue and one red.
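The ball example can be sketched with a very small clustering routine. This is only an illustration: each ball is reduced to a single hypothetical "redness" score between 0 and 1, and the routine is a two-cluster k-means, which is one of several possible clustering methods.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a tiny k-means (k = 2)."""
    c0, c1 = min(values), max(values)   # initial centroids: the two extremes
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        # Move each centroid to the mean of its group.
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return g0, g1

# Hypothetical "redness" of each ball: near 0 = blue, near 1 = red. No labels are given.
redness = [0.95, 0.05, 0.90, 0.10, 0.85, 0.15]
blue_cluster, red_cluster = two_means(redness)
print(blue_cluster, red_cluster)
```

Note that the routine never sees the words "red" or "blue"; it only discovers that the values fall into two groups.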
Reinforcement learning
In reinforcement learning, the AI model tries to take the best possible action in a given situation
so as to maximize its cumulative reward. The model learns from feedback on the outcomes of its past actions.
Consider the example of a robot that is asked to choose a path between A and B. In the beginning,
the robot chooses either of the paths as it has no past experience. The robot is given feedback on
the path it chooses and learns from this feedback. The next time the robot gets into a similar
situation, it can use that feedback to solve the problem. For example, if the robot chooses path B and
gets a reward, i.e., positive feedback, then next time the robot knows that it should choose path B to
maximize its reward.
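The robot example can be sketched as a two-action learning loop. This is a minimal sketch under stated assumptions: the rewards (0 for path A, 1 for path B), the epsilon-greedy exploration rule, and the function name `learn_path` are all choices made for this illustration.

```python
import random

def learn_path(rewards, episodes=200, epsilon=0.1, seed=0):
    """Estimate the reward of each path from feedback (epsilon-greedy)."""
    rng = random.Random(seed)
    paths = sorted(rewards)
    value = {p: 0.0 for p in paths}   # current reward estimate per path
    count = {p: 0 for p in paths}     # how often each path was tried
    for _ in range(episodes):
        untried = [p for p in paths if count[p] == 0]
        if untried:
            path = untried[0]                          # no experience yet: just try it
        elif rng.random() < epsilon:
            path = rng.choice(paths)                   # occasionally explore
        else:
            path = max(paths, key=lambda p: value[p])  # otherwise exploit the best estimate
        reward = rewards[path]                         # feedback from the environment
        count[path] += 1
        value[path] += (reward - value[path]) / count[path]   # running average
    return value

values = learn_path({"A": 0.0, "B": 1.0})   # assumed rewards: only path B pays off
best = max(values, key=values.get)
print(best, values)
```

After a few episodes the estimate for path B exceeds that for path A, so the robot chooses B in most later episodes.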
The general learning approach is to generate potential improvements, test them, and discard those
which do not work. Naturally, there are many ways we might generate the potential improvements,
and many ways we can test their usefulness. At one extreme, there are model driven (top-down)
generators of potential improvements, guided by an understanding of how the problem domain
works. At the other, there are data driven (bottom-up) generators, guided by patterns in some set
of training data.
Machine Learning
As regards machines, we might say, very broadly, that a machine learns whenever it changes its
structure, program, or data (based on its inputs or in response to external information) in such a
manner that its expected future performance improves. Some of these changes, such as the
addition of a record to a database, fall comfortably within the province of other disciplines and
are not necessarily better understood for being called learning. But, for example, when the
performance of a speech-recognition machine improves after hearing several samples of a person's
speech, we feel quite justified in saying that the machine has learned.
Machine learning usually refers to the changes in systems that perform tasks associated with
artificial intelligence (AI). Such tasks involve recognition, diagnosis, planning, robot control,
prediction, etc. The changes might be either enhancements to already performing systems or
synthesis of new systems.
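Naïve Bayesian Classifier
Suppose that there are m classes, $C_1, C_2, \ldots, C_m$. Given a tuple $X = (x_1, x_2, \ldots, x_n)$, the classifier
will predict that X belongs to the class having the highest posterior probability, conditioned on X.
That is, the naïve Bayesian classifier predicts that tuple X belongs to the class $C_i$ if and only if
$$P(C_i \mid X) > P(C_j \mid X) \quad \text{for } 1 \le j \le m,\ j \ne i.$$
By Bayes’ Theorem,
$$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}.$$
Now, as the denominator $P(X)$ is the same for every class for a given input, we can drop that term
and simply maximize
$$P(X \mid C_i)\, P(C_i).$$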
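The decision rule above can be sketched in a few lines. This sketch assumes the usual "naïve" simplification that attributes are conditionally independent given the class, so that P(X | Ci) is the product of the per-attribute probabilities P(xk | Ci); that assumption, the toy weather data, and the function names are not from the notes. No smoothing is applied, so an unseen attribute value gives a probability of zero.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, attribute index) -> value counts
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            value_counts[(c, k)][v] += 1
    return class_counts, value_counts

def predict_nb(model, row):
    """Return the class Ci that maximizes P(Ci) * product over k of P(x_k | Ci)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total                           # prior P(Ci)
        for k, v in enumerate(row):
            score *= value_counts[(c, k)][v] / n_c    # likelihood P(x_k | Ci)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical training tuples: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rainy", "mild")))
print(predict_nb(model, ("sunny", "hot")))
```

The denominator P(X) never appears in the code, because it is the same for every class and does not change which class wins.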
Genetic Algorithm
- A genetic algorithm is an adaptive heuristic search algorithm inspired by Darwin's theory of
natural evolution.
- It is used to solve optimization problems in machine learning.
- It is an important algorithm because it helps solve complex problems that would otherwise take a
long time to solve.
- Genetic algorithms are widely used in real-world applications, for example,
designing electronic circuits, code-breaking, image processing, and artificial creativity.
Some Terminologies
- Before looking at how a genetic algorithm works, let's first go through some basic terminology:
Population: The population is a subset of all possible (or probable) solutions to the given
problem.
Chromosome: A chromosome is one of the solutions in the population for the given
problem; it is made up of a collection of genes.
Gene: A gene is one element of a chromosome; a chromosome is divided into
genes.
Allele: An allele is the value taken by a gene within a particular chromosome.
Fitness Function: The fitness function is used to determine an individual's fitness
level in the population, i.e., the ability of the individual to compete with other
individuals. In every iteration, individuals are evaluated using the fitness function.
Genetic Operators: In a genetic algorithm, the fittest individuals mate to produce
offspring that are expected to be better than their parents. Genetic operators change the genetic
composition of the next generation.
Selection
After the fitness of every individual in the population has been calculated, a selection process
determines which individuals in the population will get to reproduce
and produce the offspring that will form the next generation.
A genetic algorithm involves five phases to solve complex optimization problems, as listed
below:
1. Initialization
2. Fitness Assignment
3. Selection
4. Reproduction
5. Termination
1. Initialization
The process of a genetic algorithm starts by generating a set of individuals, called the
population. Each individual is a candidate solution to the given problem. An individual is
characterized by a set of parameters called genes. Genes are joined into a string to form a
chromosome, which encodes the solution to the problem. One of the most popular techniques for
initialization is the use of random binary strings.
2. Fitness Assignment
The fitness function is used to determine how fit an individual is, i.e., the ability of an individual
to compete with other individuals. In every iteration, individuals are evaluated using the
fitness function, which assigns a fitness score to each individual. This score in turn
determines the probability of being selected for reproduction: the higher the fitness score, the
greater the chance of being selected.
3. Selection
The selection phase involves selecting the individuals that will reproduce. The
selected individuals are arranged in pairs for reproduction. These
individuals then transfer their genes to the next generation.
4. Reproduction
After the selection process, the creation of a child occurs in the reproduction step. In this step, the
genetic algorithm uses two variation operators that are applied to the parent population. The two
operators involved in the reproduction phase are given below:
Crossover:
Crossover plays the most significant role in the reproduction phase of the genetic algorithm. In
this process, a crossover point is selected at random within the genes. The crossover operator
then swaps genetic information between two parents from the current generation to produce new
individuals, the offspring.
The genes of the parents are exchanged among themselves until the crossover point is reached. The
newly generated offspring are added to the population. This process is also called recombination.
Common crossover styles include one-point and two-point crossover.
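As an illustration of the crossover operator, here is a minimal sketch of one-point and two-point crossover on bit strings. The parent strings and cut positions are hypothetical, and the cut points are passed in explicitly (rather than chosen at random) so that the result is easy to follow.

```python
def one_point_crossover(p1, p2, point):
    """Swap the tails of two parents after a single crossover point."""
    child1 = p1[:point] + p2[point:]
    child2 = p2[:point] + p1[point:]
    return child1, child2

def two_point_crossover(p1, p2, start, end):
    """Swap the middle segment [start, end) between two parents."""
    child1 = p1[:start] + p2[start:end] + p1[end:]
    child2 = p2[:start] + p1[start:end] + p2[end:]
    return child1, child2

# Hypothetical parents chosen so that the origin of every gene is visible.
parent1, parent2 = "11111111", "00000000"
print(one_point_crossover(parent1, parent2, 3))     # ('11100000', '00011111')
print(two_point_crossover(parent1, parent2, 2, 5))  # ('11000111', '00111000')
```

In a full genetic algorithm the crossover points would normally be drawn at random for each pair of parents.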
Mutation
The mutation operator inserts random genes into the offspring (new child) to maintain diversity
in the population. It can be done by flipping some bits in the chromosome.
Mutation helps in solving the issue of premature convergence and enhances diversification.
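Bit-flip mutation can be sketched as follows. The mutation rate, the function name `mutate`, and the example chromosome are assumptions made for this illustration; each gene is flipped independently with probability `rate`.

```python
import random

def mutate(chromosome, rate, rng):
    """Flip each bit of a binary-string chromosome with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[g] if rng.random() < rate else g for g in chromosome)

rng = random.Random(42)   # seeded only to make runs repeatable
child = "10101010"
print(mutate(child, 0.0, rng))   # rate 0: nothing changes
print(mutate(child, 1.0, rng))   # rate 1: every bit is flipped
print(mutate(child, 0.1, rng))   # typical small rate: a few bits may flip
```

A small rate keeps most of the child intact while still introducing occasional new gene values into the population.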
5. Termination
After the reproduction phase, a stopping criterion is applied as the basis for termination. The
algorithm terminates once a solution reaches the threshold fitness. The best individual in the
final population is returned as the solution.
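The five phases can be put together in a short sketch. Several choices here are assumptions made for illustration and are not prescribed by the notes: the "OneMax" fitness (count of 1-bits), roulette-wheel selection, keeping the best individual unchanged (elitism), and all parameter values.

```python
import random

def fitness(chrom):
    return sum(chrom)                     # assumed toy fitness: number of 1-bits

def select(pop, rng):
    """Roulette-wheel selection: higher fitness means higher chance of being picked."""
    weights = [fitness(c) + 1 for c in pop]   # +1 avoids an all-zero wheel
    return rng.choices(pop, weights=weights, k=1)[0]

def crossover(a, b, rng):
    point = rng.randint(1, len(a) - 1)    # random one-point crossover
    return a[:point] + b[point:]

def mutate(chrom, rate, rng):
    return [1 - g if rng.random() < rate else g for g in chrom]

def run_ga(length=12, pop_size=20, generations=50, rate=0.05, seed=1):
    rng = random.Random(seed)
    # 1. Initialization: random binary strings
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(pop, key=fitness)      # 2. Fitness assignment
        history.append(fitness(best))
        if fitness(best) == length:       # 5. Termination: threshold fitness reached
            break
        children = [best]                 # elitism (an addition for this sketch)
        while len(children) < pop_size:
            a, b = select(pop, rng), select(pop, rng)                   # 3. Selection
            children.append(mutate(crossover(a, b, rng), rate, rng))    # 4. Reproduction
        pop = children
    return max(pop, key=fitness), history

best, history = run_ga()
print(best, history)
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` never decreases.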
Neural Networks
A neuron is a cell in the brain whose principal function is the collection, processing, and
dissemination of electrical signals. The brain's information-processing capacity comes from networks
of such neurons. For this reason, some of the earliest AI work aimed to create artificial neural
networks (ANNs). (Other names are connectionism, parallel distributed processing, and neural computing.)
Advantages of neural networks include:
1. Adaptive learning: An ability to learn how to do tasks based on the data given for training
or initial experience.
2. Self-Organization: An ANN can create its own organization or representation of the
information it receives during learning time.
3. Real Time Operation: ANN computations may be carried out in parallel, and special
hardware devices are being designed and manufactured which take advantage of this
capability.
4. Fault Tolerance via Redundant Information Coding: Partial destruction of a network
leads to the corresponding degradation of performance. However, some network
capabilities may be retained even with major network damage.
Neural networks take a different approach to problem solving than that of conventional computers.
Conventional computers use an algorithmic approach, i.e., the computer follows a set of instructions
in order to solve a problem. Unless the specific steps that the computer needs to follow are known,
the computer cannot solve the problem. That restricts the problem-solving capability of
conventional computers to problems that we already understand and know how to solve. But
computers would be so much more useful if they could do things that we don't exactly know how
to do.
Neural networks process information in a way similar to the human brain. The network is
composed of a large number of highly interconnected processing elements (neurons) working in
parallel to solve a specific problem. Neural networks learn by example; they cannot be
programmed to perform a specific task. The examples must be selected carefully, otherwise useful
time is wasted or, even worse, the network might function incorrectly. The disadvantage is
that because the network finds out how to solve the problem by itself, its operation can be
unpredictable.
On the other hand, conventional computers use a cognitive approach to problem solving; the way
the problem is to be solved must be known and stated in small unambiguous instructions. These
instructions are then converted to a high-level language program and then into machine code that
the computer can understand. These machines are totally predictable; if anything goes wrong, it is
due to a software or hardware fault.
Nodes (units):
Nodes are the basic processing elements of the network; they are connected to one another by
directed links.
Links:
Links are directed arrows that show propagation of information from one node to another node.
Activation:
The activation of a node is the value it outputs and propagates along its outgoing links.
Weight:
Each link has a weight associated with it, which determines the strength and sign of the connection.
Activation function:
A function which is used to derive output activation from the input activations to a given node is
called activation function.
Bias Weight:
Bias weight is used to set the threshold for a unit. A unit is activated when the weighted sum of its real
inputs exceeds the bias weight.
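A neural network is composed of nodes (units) connected by directed links. A link from unit j to unit i
serves to propagate the activation $a_j$ from j to i. Each link has a numeric weight $W_{j,i}$ associated
with it, which determines the strength and sign of the connection. Each unit i first computes a weighted
sum of its inputs and then applies an activation function g to obtain its output:
$$in_i = \sum_j W_{j,i}\, a_j, \qquad a_i = g(in_i)$$
Here, $a_j$ is the output activation of unit j and $W_{j,i}$ is the weight on the link from unit j to this node.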
Activation functions typically fall into one of three categories:
• Linear
• Threshold (Heaviside function)
• Sigmoid
For linear activation functions, the output activity is proportional to the total weighted input.
For threshold activation functions, the output is set at one of two levels, depending on whether
the total input is greater than or less than some threshold value k:
$$g(x) = \begin{cases} 1 & \text{if } x \ge k \\ 0 & \text{if } x < k \end{cases}$$
For sigmoid activation functions, the output varies continuously but not linearly as the input
changes. Sigmoid units bear a greater resemblance to real neurons than do linear or threshold units.
The sigmoid also has the advantage of being differentiable:
$$g(x) = \frac{1}{1 + e^{-x}}$$
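The three activation functions, and the way a single unit uses one of them, can be sketched as below. The function names, the `slope` parameter of the linear function, and the example weights are assumptions made for this illustration.

```python
import math

def linear(x, slope=1.0):
    """Output proportional to the total weighted input."""
    return slope * x

def threshold(x, k=0.0):
    """Heaviside step: 1 if the input reaches the threshold k, else 0."""
    return 1 if x >= k else 0

def sigmoid(x):
    """Smooth, differentiable squashing of the input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(inputs, weights, g):
    """Weighted sum of the input activations followed by activation function g."""
    total_input = sum(w * a for w, a in zip(weights, inputs))
    return g(total_input)

print(linear(2.0), threshold(0.4, k=0.5), threshold(0.5, k=0.5), sigmoid(0.0))
print(unit_output([1, 1], [0.5, 0.5], threshold))   # weighted sum 1.0 >= 0, so the unit fires
print(unit_output([1, 1], [0.5, 0.5], sigmoid))
```

Swapping the function passed as `g` changes the behaviour of the unit without changing how the weighted sum is computed.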
Network structures
Feed-forward networks:
Feed-forward ANNs allow signals to travel one way only: from input to output. There is no
feedback (loops), i.e., the output of any layer does not affect that same layer. Feed-forward ANNs
tend to be straightforward networks that associate inputs with outputs. They are extensively used
in pattern recognition. This type of organization is also referred to as bottom-up or top-down.
Feedback networks:
Feedback networks (figure 1) can have signals traveling in both directions by introducing loops in
the network. Feedback networks are very powerful and can get extremely complicated. Feedback
networks are dynamic; their 'state' changes continuously until they reach an equilibrium point.
They remain at the equilibrium point until the input changes and a new equilibrium needs to be
found. Feedback architectures are also referred to as interactive or recurrent.
A neural network in which all the inputs are connected directly to the outputs is called a single-layer
neural network, or a perceptron network. Since each output unit is independent of the others, each
weight affects only one of the outputs.
A neural network that contains one or more hidden layers in addition to its input and output layers
is called a multilayer neural network. The advantage of adding hidden layers is that it enlarges the
space of hypotheses the network can represent. Layers of the network are normally fully connected.
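One way to see how a hidden layer enlarges the hypothesis space is the XOR function, which a single-layer perceptron cannot represent because it is not linearly separable. The sketch below wires a tiny two-layer network of threshold units by hand; the weights and thresholds are chosen for this illustration and are not learned.

```python
def step(x, k):
    """Threshold unit: 1 if the total input reaches k, else 0."""
    return 1 if x >= k else 0

def xor_net(x1, x2):
    # Hidden layer: one unit behaves like OR, the other like AND.
    h_or = step(1.0 * x1 + 1.0 * x2, 0.5)
    h_and = step(1.0 * x1 + 1.0 * x2, 1.5)
    # Output unit: fires when OR is on but AND is off.
    return step(1.0 * h_or - 1.0 * h_and, 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Removing the hidden units leaves a single threshold unit over x1 and x2, and no choice of two weights and one threshold reproduces all four rows of the XOR table.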
Once the number of layers and the number of units in each layer have been selected, training is used to
set the network's weights and thresholds so as to minimize the prediction error made by the
network.
Training is the process of adjusting the weights and thresholds to produce the desired result for
different sets of data.
The operation of a neural network is determined by the values of the interconnection weights.
There is no algorithm that determines in advance how the weights should be assigned in order to solve
specific problems. Hence, the weights are determined by a learning process.
Supervised Learning:
In supervised learning, the network is presented with inputs together with the target (teacher signal)
outputs. Then, the neural network tries to produce an output as close as possible to the target signal
by adjusting the values of internal weights. The most common supervised learning method is the
“error correction method”.
The error correction method is used for networks whose neurons have discrete output functions.
Neural networks are trained with this method in order to reduce the error (the difference between the
network's output and the desired output) to zero.
Unsupervised Learning:
In unsupervised learning, there is no teacher (target signal) from outside, and the network adjusts
its weights in response to the input patterns only. A typical example of unsupervised learning is
Hebbian learning.
Hebbian Learning
The oldest and most famous of all learning rules is Hebb’s postulate of learning:
When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part
in firing it, some growth process or metabolic changes take place in one or both cells such that
A’s efficiency as one of the cells firing B is increased.
From the point of view of artificial neurons and artificial neural networks, Hebb's principle can be
described as a method of determining how to alter the weights between model neurons. The weight
between two neurons increases if the two neurons activate simultaneously—and reduces if they
activate separately. Nodes that tend to be either both positive or both negative at the same time
have strong positive weights, while those that tend to be opposite have strong negative weights.
Hebb’s Algorithm:
Step 0: Initialize all weights to 0.
Step 1: Given a training input, s, with its target output, t, set the activations of the input units:
$x_i = s_i$
Step 2: Set the activation of the output unit to the target value: $y = t$
Step 3: Adjust the weights: $w_i(\text{new}) = w_i(\text{old}) + x_i\, y$
Step 4: Adjust the bias (just like the weights): $b(\text{new}) = b(\text{old}) + y$
Example:
PROBLEM: Construct a Hebb Net which performs like an AND function, that is, only when both
features are “active” will the data be in the target class.
Solution:
Training-First Input:
Final Neuron:
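The following is a minimal sketch of how a Hebb net for the AND problem could be trained; it is not the original worked solution. It assumes bipolar encoding (+1 for "active", -1 for "inactive") for both inputs and targets, and it uses the standard Hebb weight update w_i(new) = w_i(old) + x_i y alongside the bias update b(new) = b(old) + y. The function names are chosen for this example.

```python
def train_hebb(samples):
    """One pass of Hebb learning over (input, target) pairs."""
    n = len(samples[0][0])
    w = [0] * n                  # Step 0: all weights (and the bias) start at 0
    b = 0
    for s, t in samples:
        x = list(s)              # Step 1: input activations x_i = s_i
        y = t                    # Step 2: output activation y = t
        w = [wi + xi * y for wi, xi in zip(w, x)]   # weight update: w_i += x_i * y
        b = b + y                # bias update: b += y
    return w, b

def classify(w, b, x):
    """Bipolar output: +1 if the net input is positive, else -1."""
    net = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net > 0 else -1

# Bipolar AND: the target is +1 only when both features are +1.
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = train_hebb(and_samples)
print(w, b)   # [2, 2] -2
print([classify(w, b, x) for x, _ in and_samples])   # [1, -1, -1, -1]
```

With these assumptions the final neuron has weights 2 and 2 and bias -2, which separates the input (1, 1) from the other three.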
Perceptron Learning
The term "perceptron" was coined by Frank Rosenblatt in 1962 and is used to describe the
connection of simple neurons into networks. These networks are simplified versions of the real
nervous system where some properties are exaggerated and others are ignored. For the moment
we will concentrate on single-layer perceptrons.
So how can we achieve learning in our model neuron? We need to train the neurons so they can do things
that are useful. To do this, we must allow the neuron to learn from its mistakes. There is in fact a
learning paradigm that achieves this; it is known as supervised learning and works in the following
manner.
The algorithm for Perceptron Learning is based on the supervised learning procedure discussed
previously.
Algorithm:
Steps 3 and 4 are repeated until the iteration error is less than a user-specified error threshold
or a predetermined number of iterations has been completed.
Note that the weights change only when an error is made, so learning occurs only on examples that
the perceptron currently gets wrong.
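Perceptron learning can be sketched as follows. This is an illustrative version under stated assumptions: a threshold output unit, weights and bias initialized to zero, the update w_i = w_i + lr * (target - output) * x_i, and the AND function as training data. The names `predict`, `train_perceptron`, `lr`, and `max_epochs` are chosen for this example.

```python
def predict(w, b, x):
    """Threshold unit: output 1 if the weighted sum plus bias is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, lr=1.0, max_epochs=100):
    w = [0.0] * len(samples[0][0])      # initial weights
    b = 0.0                             # initial bias (threshold term)
    for _ in range(max_epochs):         # stop after a fixed number of iterations...
        errors = 0
        for x, target in samples:
            error = target - predict(w, b, x)   # compare actual and desired output
            if error != 0:                      # weights change only when an error is made
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
                errors += 1
        if errors == 0:                 # ...or earlier, once an epoch has no errors
            break
    return w, b

and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_samples)
print(w, b)
print([predict(w, b, x) for x, _ in and_samples])
```

Because AND is linearly separable, the loop ends with an error-free epoch well before `max_epochs`; for a non-separable problem such as XOR it would run until the iteration limit.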
Delta Rule:
The delta rule is a gradient descent learning rule for updating the weights of the artificial neurons
in a single-layer perceptron. It is a special case of the more general backpropagation algorithm.
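For a neuron $j$ with activation function $g(x)$, the delta rule for $j$'s $i$-th weight $w_{ji}$ is given by
$$\Delta w_{ji} = \alpha\,(t_j - y_j)\, g'(h_j)\, x_i$$
where $\alpha$ is a small constant called the learning rate, $g(x)$ is the neuron's activation function, $t_j$ is the
target output, $h_j$ is the weighted sum of the neuron's inputs, $y_j$ is the actual output, and $x_i$ is the
$i$-th input. It holds that $h_j = \sum_i x_i w_{ji}$ and $y_j = g(h_j)$.
The delta rule is commonly stated in simplified form for a perceptron with a linear activation
function as
$$\Delta w_{ji} = \alpha\,(t_j - y_j)\, x_i$$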
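The delta rule for a single neuron can be sketched as below. The sketch defaults to a linear activation, in which case the derivative term equals 1 and the update reduces to learning rate times (target minus output) times input. The training data (targets generated as 2*x1 - x2), the learning rate, and the function name are assumptions made for this illustration.

```python
def delta_rule_train(samples, n_inputs, alpha=0.1, epochs=200,
                     g=lambda h: h, g_prime=lambda h: 1.0):
    """Train one neuron with the delta rule (linear activation by default)."""
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for x, t in samples:
            h = sum(wi * xi for wi, xi in zip(w, x))   # weighted sum of the inputs
            y = g(h)                                   # actual output
            # Delta rule: change each weight by alpha * (t - y) * g'(h) * x_i
            w = [wi + alpha * (t - y) * g_prime(h) * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical targets produced by t = 2*x1 - 1*x2.
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
w = delta_rule_train(samples, n_inputs=2)
print(w)   # approaches [2.0, -1.0]
```

Unlike the perceptron rule, the weights move on every example in proportion to the size of the error, not only when the output is wrong.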
Backpropagation
It is a supervised learning method and a generalization of the delta rule. It requires a teacher
that knows, or can calculate, the desired output for any given input. It is most useful for
feed-forward networks (networks that have no feedback, or simply, no connections that loop).
The term is an abbreviation for "backwards propagation of errors". Backpropagation requires that
the activation function used by the artificial neurons (or "nodes") is differentiable.
As the algorithm's name implies, the errors (and therefore the learning) propagate backwards from
the output nodes to the inner nodes. So technically speaking, backpropagation is used to calculate
the gradient of the error of the network with respect to the network's modifiable weights. This
gradient is almost always then used in a simple stochastic gradient descent algorithm to
find weights that minimize the error. (Stochastic gradient descent is a general optimization
algorithm, but it is typically used to fit the parameters of a machine learning model.) Often the
term "backpropagation" is used in a more general
sense, to refer to the entire procedure encompassing both the calculation of the gradient and its use
in stochastic gradient descent. Backpropagation usually allows quick convergence on satisfactory
local minima for error in the kind of networks to which it is suited.
Backpropagation networks are necessarily multilayer perceptrons (usually with one input, one
hidden, and one output layer). In order for the hidden layer to serve any useful function, multilayer
networks must have non-linear activation functions for the multiple layers: a multilayer network
using only linear activation functions is equivalent to some single layer, linear network.
Summary of the backpropagation technique:
1. Present a training sample to the neural network.
2. Compare the network's output to the desired output for that sample, and calculate the error in
each output neuron.
3. For each neuron, calculate what the output should have been, and a scaling factor, how
much lower or higher the output must be adjusted to match the desired output. This is the
local error.
4. Adjust the weights of each neuron to lower the local error.
5. Assign "blame" for the local error to neurons at the previous level, giving greater
responsibility to neurons connected by stronger weights.
6. Repeat from step 3 on the neurons at the previous level, using each one's "blame" as its
error.
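The steps above can be sketched for a small network with one hidden layer of sigmoid units. This is a minimal sketch, not a definitive implementation: the squared-error measure, the 2-2-1 layout, the starting weights, and the learning rate are all assumptions. As is standard, the "blame" for the hidden units is computed from the output weights as they were before the update.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(net, x):
    """Return the hidden activations and the network output for input x."""
    w_h, b_h, w_o, b_o = net
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in zip(w_h, b_h)]
    y = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + b_o)
    return h, y

def backprop_step(net, x, t, lr=0.5):
    """One gradient-descent update on the squared error 0.5 * (t - y)^2."""
    w_h, b_h, w_o, b_o = net
    h, y = forward(net, x)
    delta_o = (t - y) * y * (1 - y)                    # local error at the output unit
    delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j])    # blame passed back, scaled by weight
               for j in range(len(h))]
    new_w_o = [w_o[j] + lr * delta_o * h[j] for j in range(len(h))]
    new_b_o = b_o + lr * delta_o
    new_w_h = [[w_h[j][i] + lr * delta_h[j] * x[i] for i in range(len(x))]
               for j in range(len(h))]
    new_b_h = [b_h[j] + lr * delta_h[j] for j in range(len(h))]
    return new_w_h, new_b_h, new_w_o, new_b_o

# Hypothetical starting weights: (hidden weights, hidden biases, output weights, output bias)
net = ([[0.5, -0.5], [0.3, 0.8]], [0.0, 0.0], [1.0, -1.0], 0.0)
x, t = (1.0, 0.0), 1.0
_, y_before = forward(net, x)
net = backprop_step(net, x, t)
_, y_after = forward(net, x)
print(y_before, y_after)   # the output moves toward the target t
```

In practice this step is repeated over all training samples for many passes; whether a given problem is learned depends on the starting weights and the learning rate.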
Backpropagation Algorithm
Assignment #5
1. What is the role of activation function in ANN? How sigmoid function works? Discuss
about perceptron learning. (10) [TU 2080]
2. Define selection, crossover, and mutation operations in genetic algorithm. (5) [TU 2080]
3. Discuss the different operators used in genetic algorithm. (5) [TU 2079]
4. Give an example of reinforcement learning. Explain the types of ANN. (5) [TU 2079]
5. Describe mathematical model of neural network. What does it means to train a neural
network? Write algorithm for perceptron learning. (10) [TU 2078]
6. What is crossover operation in genetic algorithm? Given following chromosomes show the
result of one-point and two point crossover.
C1 = 01100010
C2 = 10101100
Choose appropriate crossover points as per your own suggestions. (5) [TU 2078]
7. Define mathematical model of artificial neural network. Discuss how Hebbian learning
algorithm can be used to train a neural network. Support your answer with an example.
(10) [TU 2076]
8. Write an algorithm for learning by Genetic Approach. (5) [TU 2076]