Unit 4
The Learning Mechanisms
Introduction
• The back-propagation learning algorithm for the supervised multilayer perceptron may be viewed as the application of a recursive technique known in statistics as stochastic approximation.
• The radial basis function network takes a completely different approach from other neural networks.
• A radial basis function neural network is used for the curve-fitting problem.
• According to this viewpoint, learning is equivalent to finding a
surface in a multidimensional space that provides a best fit to the
training data, with the criterion for “best fit” being measured in
some statistical sense.
• In the context of a neural network, the hidden units provide a set of functions that constitute an arbitrary basis for the input patterns (vectors) when they are expanded into the hidden space; these functions are called radial basis functions.
• The construction of a radial basis function (RBF) network, in its most basic form, involves three layers with entirely different roles.
Introduction
• For complex pattern classification and recognition problems, a hybrid network with two stages is used.
• The first stage transforms a given set of nonlinearly separable patterns into a new set for which, under certain conditions, the likelihood of the transformed patterns becoming linearly separable is high (Cover's theorem).
✓ A pattern classification problem cast in a high-dimensional space is more likely to be linearly separable than in a low-dimensional space, hence the reason for frequently making the dimension of the hidden space in an RBF network high.
✓ Another important point is the fact that the dimension of the hidden space is directly related to the capacity of the network to approximate a smooth input-output mapping.
✓ The higher the dimension of the hidden space, the more accurate the approximation will be.
• The second stage completes the solution to the classification problem by using least-squares estimation.
Radial Basis Function (RBF)
• Radial Basis Function networks are a special class of feed-forward neural networks consisting of three layers: an input layer, a hidden layer, and an output layer.
• This is fundamentally different from most neural network architectures, which are composed of many layers and bring about nonlinearity by repeatedly applying non-linear activation functions.
• The hidden units provide a set of radial basis functions that form a basis for the input patterns when they are expanded into the hidden space.
• The structure of an RBF network consists of three layers:
1. The input layer has source nodes that connect the network to the environment. The input layer receives the input data and passes it to the hidden layer, where the computation occurs.
2. The hidden layer of a Radial Basis Function network is the most powerful part and very different from that of most neural networks. The hidden layer applies a non-linear transformation from the input space to the hidden space.
3. The output layer is linear and gives the response of the network to the activation pattern applied to the input layer. The output layer is used for prediction tasks such as classification or regression.
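To make this three-layer flow concrete, here is a minimal sketch of an RBF forward pass in Python with NumPy. The centres, spread, output weights, and bias are illustrative placeholders rather than values prescribed by the text, and a Gaussian basis function is assumed.

```python
import numpy as np

def rbf_forward(x, centers, sigma, weights, bias=0.0):
    """Forward pass of a simple RBF network.

    x       : (d,)   input vector
    centers : (m, d) matrix of RBF centres (hidden units)
    sigma   : scalar spread of the Gaussian basis functions
    weights : (m,)   linear output weights
    """
    # Hidden layer: non-linear transformation via Gaussian radial basis functions
    dists = np.linalg.norm(centers - x, axis=1)        # distance of x to each centre
    phi = np.exp(-(dists ** 2) / (2.0 * sigma ** 2))   # hidden activations

    # Output layer: linear combination of the hidden activations
    return float(weights @ phi + bias)

# Example: two hidden units, one output
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
print(rbf_forward(np.array([0.0, 1.0]), centers, sigma=1.0,
                  weights=np.array([-1.0, -1.0]), bias=1.0))
```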
Principle of Operation
Cover's Theorem (Separability of Patterns)
• To sum up, Cover's theorem on the separability of patterns encompasses two basic ingredients:
1. Nonlinear formulation of the hidden functions, defined by a Gaussian of the form φ(x) = exp(−‖x − t‖² / (2σ²)), where t is the centre;
2. High dimensionality of the hidden space compared with that of the input space.
• σ is the width or radius of the bell shape and has to be determined empirically.
Architecture of RBNN
• Functions that are covered by Micchelli's theorem include:
• Multiquadrics:
– φ(r) = (r² + c²)^(1/2), c > 0, r ∈ ℝ
• Inverse multiquadrics:
– φ(r) = 1 / (r² + c²)^(1/2), c > 0, r ∈ ℝ
• Gaussian functions:
– φ(r) = exp(−r² / (2σ²)), σ > 0, r ∈ ℝ
• All that is required for nonsingularity of the interpolation matrix is that the points xᵢ be distinct.
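For quick reference, these radial functions translate directly into code; this is a small illustrative sketch in which c and sigma are free parameters to be chosen by the user.

```python
import numpy as np

def multiquadric(r, c=1.0):
    # phi(r) = (r^2 + c^2)^(1/2), with c > 0
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):
    # phi(r) = 1 / (r^2 + c^2)^(1/2), with c > 0
    return 1.0 / np.sqrt(r**2 + c**2)

def gaussian(r, sigma=1.0):
    # phi(r) = exp(-r^2 / (2 * sigma^2)), with sigma > 0
    return np.exp(-r**2 / (2.0 * sigma**2))
```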
Similarities between RBF and MLP
• Both are feedforward
• Both are universal approximators
• Both are used in similar application areas
Differences between MLP and RBF
• MLP can have any number of hidden layers; RBF can have only one hidden layer.
• MLP can be fully or partially connected; RBF has to be fully connected.
• In MLP, processing nodes in different layers share a common neuronal model; in RBF, the hidden nodes operate very differently and have a different purpose.
• In MLP, the argument of each hidden unit's activation function is the inner product of the inputs and the weights; in RBF, it is the distance between the input and the centre (weight vector).
• MLP is trained with a single global supervised algorithm; RBF networks are usually trained one layer at a time.
• MLP training is slower compared to RBF; RBF training is comparatively faster than MLP.
• After training, MLP is much faster than RBF; after training, RBF is slower than MLP.
Example: the XOR problem
[Figure: Input space: the four XOR patterns (0,0), (0,1), (1,0), (1,1) in the (x1, x2) plane. Output (feature) space: the patterns plotted in the (φ1, φ2) plane, where (0,1) and (1,0) coincide and a straight decision boundary separates the two classes.]
• When mapped into the feature space ⟨φ1, φ2⟩ (hidden layer), C1 and C2 become linearly separable. So a linear classifier with φ1(x) and φ2(x) as inputs can be used to solve the XOR problem.
RBF NN for the XOR problem
φ1(x) = exp(−‖x − t1‖²), with t1 = (0,0)
φ2(x) = exp(−‖x − t2‖²), with t2 = (1,1)
[Figure: RBF network for XOR: inputs x1 and x2 feed two hidden units centred at t1 and t2; both hidden-to-output weights are −1, a bias of +1 is added, and the output is y.]
Pattern X1 X2
1 0 0
2 0 1
3 1 0
4 1 1
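Evaluating φ1 and φ2 on the four patterns listed above shows the effect of the hidden layer; the short check below is a sketch that simply applies the two Gaussian definitions given earlier.

```python
import numpy as np

t1, t2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])  # the two RBF centres
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]

for x1, x2 in patterns:
    x = np.array([x1, x2], dtype=float)
    phi1 = np.exp(-np.sum((x - t1) ** 2))  # phi1(x) = exp(-||x - t1||^2)
    phi2 = np.exp(-np.sum((x - t2) ** 2))  # phi2(x) = exp(-||x - t2||^2)
    print(f"({x1},{x2}) -> phi1={phi1:.3f}, phi2={phi2:.3f}")

# Approximate output:
# (0,0) -> phi1=1.000, phi2=0.135
# (0,1) -> phi1=0.368, phi2=0.368
# (1,0) -> phi1=0.368, phi2=0.368
# (1,1) -> phi1=0.135, phi2=1.000
# (0,1) and (1,0) collapse onto the same point in the (phi1, phi2) plane,
# so a single straight line can now separate the two XOR classes.
```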
RBF network parameters
• What do we have to learn for a RBF NN with a given
architecture?
– The centers of the RBF activation functions
– the spreads of the Gaussian RBF activation
functions
– the weights from the hidden to the output layer
• Different learning algorithms may be used for
learning the RBF network parameters. We describe
three possible methods for learning centers, spreads
and weights.
Learning Algorithm 1
• Centers: are selected at random
– centers are chosen randomly from the training
set
• Spreads: are chosen by normalization:
σ = d_max / √(2·m₁)
where d_max is the maximum distance between any 2 centers and m₁ is the number of centers.
• Then the activation function of hidden neuron i becomes:
φᵢ(x) = exp(−(m₁ / d²_max) · ‖x − tᵢ‖²)
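A compact sketch of Learning Algorithm 1 follows: centres drawn at random from the training set, the spread fixed by σ = d_max / √(2·m₁), and the output weights obtained by least-squares estimation (the second stage described earlier). The training arrays X and y and the number of centres m1 are assumed inputs for the example.

```python
import numpy as np

def train_rbf_random_centers(X, y, m1, rng=np.random.default_rng(0)):
    """Learning Algorithm 1: random centres, normalized spread, least-squares weights."""
    # Centres: chosen randomly from the training set
    centers = X[rng.choice(len(X), size=m1, replace=False)]

    # Spread: sigma = d_max / sqrt(2 * m1), where d_max is the maximum
    # distance between any two centres
    d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)
    sigma = d_max / np.sqrt(2.0 * m1)

    # Hidden activations: phi_i(x) = exp(-(m1 / d_max^2) * ||x - t_i||^2)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-(m1 / d_max**2) * dists**2)

    # Output weights by least-squares estimation
    weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, sigma, weights
```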
Self-organizing Maps
Introduction
• Neural Networks use processing, inspired by the
human brain, as a basis to develop algorithms that
can be used to model and understand complex
patterns and prediction problems.
• There are several types of neural networks and each
has its own unique use.
• The Self Organizing Map (SOM) is one such variant of
the neural network, also known as Kohonen’s Map.
Self-Organizing Maps
• A self-organizing map is also known as SOM
and it was proposed by Kohonen.
• It is an unsupervised neural network, trained with unsupervised learning techniques, that produces a low-dimensional, discretized representation of the input space of the training samples, known as a map; it is therefore a method of dimensionality reduction.
Self-Organizing Maps
• Self-Organizing Maps were initially used only for data visualization, but they have since been applied to a variety of problems, including the Traveling Salesman Problem.
• Map units, or neurons, usually form a two-dimensional grid, and hence a mapping from a high-dimensional space onto a plane is created.
Self-Organizing Maps
• The map retains the calculated relative distance between
the points. Points closer to each other within the input
space are mapped to the nearby map units in Self
Organizing Maps.
• Self-Organizing Maps can thus serve as a cluster analyzing
tool for high dimensional data.
• Self-Organizing Maps also have the capability to generalize.
• During generalization, the network can recognize or characterize inputs it has never seen before.
• A new input is assigned to the map unit it matches best and is thereby mapped.
Uses of Self-Organizing Maps
• Self-Organizing Maps provide an advantage in
maintaining the structural information from the
training data and are not inherently linear.
• Using Principal Component Analysis on high-dimensional data may cause a loss of information when the dimensionality is reduced to two.
• If the data has many dimensions and every dimension present is useful, Self-Organizing Maps can be more useful than PCA for dimensionality reduction.
Uses of Self-Organizing Maps
At each training iteration, the winning vector is the unit with the shortest Euclidean distance to the input, and its weights are updated.
SOM algorithm – weight adaptation
Wᵢⱼ(t + 1) = Wᵢⱼ(t) + hᵢⱼ (X(t) − Wᵢⱼ(t))
where hᵢⱼ = exp(−d²ᵢⱼ / (2σ²)) is the neighbourhood function and dᵢⱼ is the distance of unit (i, j) from the winning unit.
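This adaptation rule translates directly into code; below is a minimal sketch of a single update step, where the map's grid coordinates and a learning rate are assumed inputs (in the formula above, the learning rate can be thought of as absorbed into hᵢⱼ).

```python
import numpy as np

def som_update_step(W, x, winner, grid, lr=0.1, sigma=1.0):
    """One SOM update: W(t+1) = W(t) + lr * h * (X(t) - W(t)).

    W      : (n_units, d) weight vectors of the map units
    x      : (d,)         current input vector X(t)
    winner : index of the winning unit
    grid   : (n_units, 2) grid coordinates of the units
    """
    # Neighbourhood function h_ij = exp(-d_ij^2 / (2 * sigma^2)),
    # where d_ij is the grid distance of unit ij from the winner
    d = np.linalg.norm(grid - grid[winner], axis=1)
    h = np.exp(-d**2 / (2.0 * sigma**2))

    # Move every unit towards the input, scaled by its neighbourhood value
    return W + lr * h[:, None] * (x - W)
```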
Kohonen Learning Algorithm
1. Initialise weights (random values) and set topological neighbourhood and
learning rate parameters
2. While stopping condition is false, do steps 3-8
3. For each input vector X, do steps 4-6
4. For each neuron j, compute the Euclidean distance: D(j) = Σᵢ (xᵢ − wᵢⱼ)²
5. Find index J such that D(J) is a minimum (the winning neuron)
6. Update the weights of neuron J and of all neurons within its topological neighbourhood: wᵢⱼ(new) = wᵢⱼ(old) + α (xᵢ − wᵢⱼ(old))
7. Update the learning rate α
8. Reduce the radius of the topological neighbourhood at specified times, then test the stopping condition
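Putting these steps together, the sketch below is one possible implementation of the Kohonen training loop; the grid size, the number of iterations, and the linear decay schedules for the learning rate and neighbourhood radius are illustrative choices, not values prescribed by the text.

```python
import numpy as np

def train_som(X, rows=10, cols=10, n_iter=100, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    W = rng.random((rows * cols, X.shape[1]))           # step 1: random initial weights

    for t in range(n_iter):                             # step 2: until stopping condition
        lr = lr0 * (1.0 - t / n_iter)                   # step 7: decaying learning rate
        sigma = sigma0 * (1.0 - t / n_iter) + 1e-3      # step 8: shrinking neighbourhood radius
        for x in X[rng.permutation(len(X))]:            # step 3: for each input vector
            dists = np.sum((W - x) ** 2, axis=1)        # step 4: squared Euclidean distances
            j = int(np.argmin(dists))                   # step 5: winning neuron J
            gdist = np.linalg.norm(grid - grid[j], axis=1)
            h = np.exp(-gdist**2 / (2.0 * sigma**2))    # neighbourhood of J
            W += lr * h[:, None] * (x - W)              # step 6: weight update
    return W.reshape(rows, cols, -1)

# Example usage on random 3-dimensional data (e.g. colours)
som = train_som(np.random.default_rng(1).random((200, 3)))
```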
Architecture
• Self-Organizing Maps consist of two important
layers, the first one is the input layer, and the
second one is the output layer, which is also
known as a feature map.
• Each data point in the dataset recognizes itself
by competing for a representation.
• The Self-Organizing Map's mapping steps start from initializing the weight vectors.
Architecture
• After this, a random vector is selected as the sample, and the map's weight vectors are searched to find which weight best represents the chosen sample.
• Each weight vector has neighbouring weights that are close to it. The chosen weight is then rewarded by becoming more like the randomly selected sample vector.
• This helps the map to grow and form different shapes.
Most generally, they form square or hexagonal shapes
in a 2D feature space.
• This whole process is repeated a large number of times, usually more than 1,000 iterations.
Architecture
• Self-Organizing Maps do not use backpropagation with SGD to update their weights; this unsupervised ANN uses competitive learning, which consists of three processes: competition, cooperation, and adaptation.
• Each neuron of the output layer holds a weight vector of dimension n.
• The distance between each neuron present at the
output layer and the input data is computed.
Architecture
• The neuron with the lowest distance is termed the most suitable fit (the winning neuron).
• Updating the vector of the winning neuron in the final process is known as adaptation, and involving its neighbours is the cooperation step.
• After selecting the winning neuron and its neighbours, we update their weight vectors.
• The greater the distance between a neuron and the winning neuron, the less its weights are adjusted.
Architecture
• To simply explain, learning occurs in the following ways:
• Every node is examined to find the one whose weights are most similar to the input vector. This most suitable node is commonly known as the Best Matching Unit (BMU).
• The neighbourhood value of the Best Matching Unit is then
calculated. The number of neighbours tends to decrease over
time.
Architecture
• The suitable weight is further rewarded by transitioning to become more like the sample vector. Its neighbours also transition towards the chosen sample vector.
✓The closer a node is to the Best Matching Unit,
the more its weights get altered and the
farther away the neighbour is from the node,
the less it learns.
❖Repeat the second step for N iterations.
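These two effects, stronger updates near the Best Matching Unit and a neighbourhood that shrinks over time, can be captured by a decaying radius; the sketch below uses an exponential decay with illustrative constants.

```python
import numpy as np

def neighbourhood_influence(dist_to_bmu, t, sigma0=3.0, tau=100.0):
    """How strongly a node at grid distance dist_to_bmu learns at iteration t."""
    sigma_t = sigma0 * np.exp(-t / tau)                   # the radius shrinks over time
    return np.exp(-dist_to_bmu**2 / (2.0 * sigma_t**2))   # closer nodes learn more

# A node two cells from the BMU learns a lot early on and almost nothing later
print(neighbourhood_influence(2.0, t=0))     # ~0.80
print(neighbourhood_influence(2.0, t=300))   # ~0.00
```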
Pros
• Data can be easily interpreted and understood
with the help of techniques like reduction of
dimensionality and grid clustering.
• Self-Organizing Maps are capable of handling several types of classification problems while providing a useful and intelligent summary of the data at the same time.
Cons
• It does not create a generative model for the
data and therefore the model does not
understand how data is being created.
• Self-Organizing Maps do not perform well
while working with categorical data and even
worse for mixed types of data.
• Model preparation is comparatively very slow, and the model is hard to train against slowly evolving data.
Why Recurrent Neural Networks?
• Feed-Forward Neural Network: Used for general Regression
and Classification problems.
• Convolutional Neural Network: Used for object detection and
image classification.
• Deep Belief Network: Used in healthcare sectors for cancer
detection.
• RNN: Used for speech recognition, voice recognition, time
series prediction, and natural language processing.
Why Recurrent Neural Networks?
• In a feed-forward neural network, the decisions are based on
the current input.
• It doesn't memorize past data, and there is no scope for considering future inputs.
• Feed-forward neural networks are used in general regression
and classification problems.
Why Recurrent Neural Networks?
• RNNs were created because there were a few issues with the feed-forward neural network:
✓ It cannot handle sequential data
✓ It considers only the current input
✓ It cannot memorize previous inputs
• The solution to these issues is the RNN. An RNN can handle sequential data, accepting both the current input and previously received inputs.
• RNNs can memorize previous inputs due to their internal
memory.
What Is a Recurrent Neural Network
(RNN)?
• RNN works on the principle of saving the output of a particular
layer and feeding this back to the input in order to predict the
output of the layer.
• Below is how you can convert a Feed-Forward Neural Network
into a Recurrent Neural Network:
[Figure: a feed-forward network converted into a recurrent network by compressing the nodes in its layers into a single recurrent layer.]
What Is a Recurrent Neural Network (RNN)?
• The nodes in the different layers of the neural network are compressed to form a single layer of the recurrent neural network.
• A, B, and C are the parameters of the network.
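As an illustration of this feedback principle, the sketch below implements a single recurrent step in which the hidden state saved from the previous step is fed back together with the current input. The names A, B, and C follow the slide's labelling (recurrent, input, and output parameters), while the sizes and the tanh nonlinearity are assumptions made for the example.

```python
import numpy as np

def rnn_step(x_t, h_prev, A, B, C):
    """One step of a simple recurrent cell.

    x_t    : (d_in,)      current input
    h_prev : (d_h,)       hidden state saved from the previous step
    A      : (d_h, d_h)   recurrent (hidden-to-hidden) parameters
    B      : (d_h, d_in)  input-to-hidden parameters
    C      : (d_out, d_h) hidden-to-output parameters
    """
    h_t = np.tanh(A @ h_prev + B @ x_t)  # new hidden state: current input plus fed-back state
    y_t = C @ h_t                        # output of the layer at this time step
    return h_t, y_t

# Process a short sequence, carrying the hidden state (the network's memory) forward
rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((4, 4)), rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
h = np.zeros(4)
for x in rng.standard_normal((5, 3)):    # five time steps of 3-dimensional input
    h, y = rnn_step(x, h, A, B, C)
```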