
Neural Network

Unit 4
The Learning Mechanisms

Er. Sachita Nand Mishra


Radial Basis Function Neural Network
• A completely different approach: the design of a neural network is viewed as a curve-fitting (approximation) problem in a high-dimensional space (unlike the MLP).
[Figure: In MLP]
[Figure: In RBFN]
Introduction
• The back-propagation learning algorithm for the supervised multilayer perceptron may be viewed as the application of a recursive technique known in statistics as stochastic approximation.
• The radial basis function network takes a completely different approach from other neural networks: it is used for curve-fitting problems.
• According to this viewpoint, learning is equivalent to finding a surface in a multidimensional space that provides a best fit to the training data, with the criterion for "best fit" being measured in some statistical sense.
• In the context of a neural network, the hidden units provide a set of functions that constitute an arbitrary basis for the input patterns (vectors) when they are expanded into the hidden space; these functions are called radial basis functions.
• The construction of a radial basis function (RBF) network, in its most basic form, involves three layers with entirely different roles.
Introduction
• For complex pattern-classification and recognition problems, a hybrid network with two stages is used.
• The first stage transforms a given set of nonlinearly separable patterns into a new set for which, under certain conditions, the likelihood of the transformed patterns becoming linearly separable is high (Cover's theorem).
✓ A pattern-classification problem cast in a high-dimensional space is more likely to be linearly separable than one cast in a low-dimensional space; hence the dimension of the hidden space in an RBF network is frequently made high.
✓ Another important point is that the dimension of the hidden space is directly related to the capacity of the network to approximate a smooth input-output mapping.
✓ The higher the dimension of the hidden space, the more accurate the approximation will be.
• The second stage completes the solution to the classification problem by using least-squares estimation.
Radial Basis Function (RBF)
• Radial Basis Function networks are a special class of feed-forward neural networks consisting of three layers: an input layer, a hidden layer, and an output layer.
• This is fundamentally different from most neural network architectures, which are composed of many layers and bring about nonlinearity by repeatedly applying non-linear activation functions.
• The hidden units provide a set of radial basis functions that form a basis for the input patterns when they are expanded into the hidden space.
• The structure of an RBF network consists of three layers (a minimal forward-pass sketch follows this list):
1. The input layer has source nodes that connect the network to its environment. It receives the input data and passes it on to the hidden layer, where the computation occurs.
2. The hidden layer of a Radial Basis Function network is the most powerful part and is very different from that of most neural networks. It applies a non-linear transformation from the input space to the hidden space.
3. The output layer is linear and gives the response of the network to the activation pattern applied to the input layer. It is used for prediction tasks such as classification or regression.
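
To make the three-layer structure concrete, here is a minimal forward-pass sketch in Python/NumPy. The centres, spread, and output weights in the usage lines are illustrative placeholders rather than values from these slides, and the Gaussian basis function used here is the one introduced later in this unit.

import numpy as np

def rbf_forward(x, centers, sigma, weights):
    """Minimal RBF network forward pass.
    x       : (d,) input vector             -> input layer
    centers : (m, d) hidden-unit centres    -> hidden layer (non-linear)
    sigma   : spread of the Gaussian basis functions
    weights : (m,) hidden-to-output weights -> output layer (linear)
    """
    # Hidden layer: non-linear transformation from input space to hidden space.
    dists = np.linalg.norm(centers - x, axis=1)        # distance to each centre
    phi = np.exp(-(dists ** 2) / (2.0 * sigma ** 2))   # Gaussian radial basis functions
    # Output layer: a linear combination of the hidden activations.
    return phi @ weights

# Illustrative usage with made-up numbers.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
y = rbf_forward(np.array([0.5, 0.2]), centers, sigma=1.0, weights=np.array([0.7, -0.3]))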
Principle of Operation
Cover's Theorem (Separability of Patterns)
• To sum up, Cover's theorem on the separability of patterns encompasses two basic ingredients:
• nonlinear formulation of the hidden function φ(x), defined in terms of the input vector x;
• high dimensionality of the hidden (feature) space, where dimensionality refers to the number of hidden neurons.
Principle of Operation
• The input vector, which lies in a localized (low-dimensional) space, is transformed into a higher-dimensional space.
• How is it transformed?
✓ With the help of basis functions.
• The role of a basis function is to cast data from the lower-dimensional space into a higher-dimensional space.
• The basis function maps patterns from the lower-dimensional space to a higher-dimensional space where the patterns can be classified, i.e. separated. This is known as finding the interpolator.
The Interpolation Problem
Architecture of RBNN

• σ : is the width (radius) of the bell shape and has to be determined empirically.
Architecture of RBNN
• Functions that are covered by Micchelli's theorem include (see the sketch after this list):
• Multiquadrics:
– φ(r) = (r² + c²)^(1/2),  c > 0,  r ∈ ℝ
• Inverse multiquadrics:
– φ(r) = 1 / (r² + c²)^(1/2),  c > 0,  r ∈ ℝ
• Gaussian functions:
– φ(r) = exp(−r² / 2σ²),  σ > 0,  r ∈ ℝ
• All that is required for a nonsingular interpolation matrix Φ is that the data points x be distinct.
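
A small sketch of these three basis functions in Python/NumPy; the parameter values in the last lines are illustrative only.

import numpy as np

def multiquadric(r, c):
    # phi(r) = (r^2 + c^2)^(1/2),  c > 0
    return np.sqrt(r ** 2 + c ** 2)

def inverse_multiquadric(r, c):
    # phi(r) = 1 / (r^2 + c^2)^(1/2),  c > 0
    return 1.0 / np.sqrt(r ** 2 + c ** 2)

def gaussian(r, sigma):
    # phi(r) = exp(-r^2 / (2 * sigma^2)),  sigma > 0
    return np.exp(-(r ** 2) / (2.0 * sigma ** 2))

# Evaluate each basis function at a few radii (illustrative values).
r = np.array([0.0, 0.5, 1.0, 2.0])
print(multiquadric(r, c=1.0), inverse_multiquadric(r, c=1.0), gaussian(r, sigma=1.0))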
Similarities between RBF and MLP
• Both are feedforward
• Both are universal approximators
• Both are used in similar application areas
Differences between MLP and RBF
• MLP: can have any number of hidden layers. RBF: has only one hidden layer.
• MLP: can be fully or partially connected. RBF: has to be completely connected.
• MLP: processing nodes in different layers share a common neuronal model. RBF: hidden nodes operate very differently and have a different purpose.
• MLP: the argument of each hidden-unit activation function is the inner product of the inputs and the weights. RBF: the argument of each hidden-unit activation function is the distance between the input and the centre (weight) vector.
• MLP: trained with a single global supervised algorithm. RBF: usually trained one layer at a time.
• MLP: training is slower compared to RBF. RBF: training is comparatively faster than MLP.
• MLP: after training, much faster than RBF. RBF: after training, slower than MLP.
Example: the XOR problem
• Input space: the four points (0,0), (0,1), (1,0), (1,1) in the (x1, x2) plane.
• Output space: y ∈ {0, 1}.
• Construct an RBF pattern classifier such that:
– (0,0) and (1,1) are mapped to 0, class C1
– (1,0) and (0,1) are mapped to 1, class C2
Example: the XOR problem
• In the feature (hidden-layer) space:
[Figure: the four XOR patterns plotted in the (φ1, φ2) plane; (0,0) and (1,1) lie apart from the coincident points (0,1) and (1,0), and a linear decision boundary separates class C1 from class C2]
• When mapped into the feature space ⟨φ1, φ2⟩ (the hidden layer), C1 and C2 become linearly separable. So a linear classifier with φ1(x) and φ2(x) as inputs can be used to solve the XOR problem.
RBF NN for the XOR problem
φ1(x) = exp( −‖x − t1‖² )   with t1 = (0,0)
φ2(x) = exp( −‖x − t2‖² )   with t2 = (1,1)

[Figure: RBF network for XOR with inputs x1 and x2, hidden units centred at t1 and t2, output weights −1 and −1, bias +1, and output y]

Pattern   x1   x2
   1       0    0
   2       0    1
   3       1    0
   4       1    1
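
A short numerical check of this construction (a sketch only; the output weights −1, −1 and the bias +1 are the values shown in the network figure above, and the point being verified is simply that the two classes separate linearly in the (φ1, φ2) plane).

import numpy as np

t1, t2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])   # hidden-unit centres

def phi(x, t):
    # Gaussian basis function: phi(x) = exp(-||x - t||^2)
    return np.exp(-np.sum((x - t) ** 2))

for x in [np.array(p) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]]:
    p1, p2 = phi(x, t1), phi(x, t2)
    y = -1.0 * p1 - 1.0 * p2 + 1.0        # linear output layer: weights -1, -1, bias +1
    print(x, round(p1, 3), round(p2, 3), round(y, 3))
# (0,0) and (1,1) give y < 0 while (0,1) and (1,0) give y > 0,
# so thresholding y at zero separates class C1 from class C2.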
RBF network parameters
• What do we have to learn for an RBF NN with a given architecture?
– the centers of the RBF activation functions
– the spreads of the Gaussian RBF activation functions
– the weights from the hidden to the output layer
• Different learning algorithms may be used for
learning the RBF network parameters. We describe
three possible methods for learning centers, spreads
and weights.
Learning Algorithm 1
• Centers: are selected at random
– centers are chosen randomly from the training
set
• Spreads: are chosen by normalization:
      σ = d_max / √(2 m1)
  where d_max is the maximum distance between any two centers and m1 is the number of centers.
• Then the activation function of hidden neuron i becomes:
      φi(x) = exp( −(m1 / d_max²) ‖x − ti‖² )
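
A compact sketch of Learning Algorithm 1 in Python/NumPy. Random selection of the centres and the normalized spread follow the formulas above; solving for the output weights by least squares is the second (linear) stage mentioned earlier in this unit. The training data X, y and the number of centres used below are placeholders.

import numpy as np

def train_rbf(X, y, m1, rng=np.random.default_rng(0)):
    """Learning Algorithm 1: random centres + normalized spread + least squares."""
    # Centres: chosen randomly from the training set.
    centers = X[rng.choice(len(X), size=m1, replace=False)]
    # Spread: sigma = d_max / sqrt(2 * m1), with d_max the maximum distance between centres.
    d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)
    # Hidden-layer design matrix: phi_i(x) = exp(-(m1 / d_max^2) * ||x - t_i||^2).
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-(m1 / d_max ** 2) * dists ** 2)
    # Output weights by least-squares estimation (the linear output layer).
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, d_max, w

# Illustrative usage on random data.
X = np.random.default_rng(1).random((50, 2))
centers, d_max, w = train_rbf(X, np.sin(X[:, 0]), m1=10)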
Self-organizing Maps
Introduction
• Neural networks use processing inspired by the human brain as a basis for developing algorithms that can be used to model and understand complex patterns and prediction problems.
• There are several types of neural networks and each
has its own unique use.
• The Self Organizing Map (SOM) is one such variant of
the neural network, also known as Kohonen’s Map.
Self-Organizing Maps
• A self-organizing map is also known as SOM
and it was proposed by Kohonen.
• It is an unsupervised neural network that is
trained using unsupervised learning
techniques to produce a low dimensional,
discretized representation from the input
space of the training samples, known as a map
and is, therefore, a method to reduce data
dimensions.
Self-Organizing Maps
• Self-Organizing Maps were initially used only for data visualization, but these days they have been applied to many different problems, including the Traveling Salesman Problem.
• Map units or neurons usually form a two
dimensional space and hence a mapping from high
dimensional space onto a plane is created.
Self-Organizing Maps
• The map retains the calculated relative distance between
the points. Points closer to each other within the input
space are mapped to the nearby map units in Self
Organizing Maps.
• Self-Organizing Maps can thus serve as a cluster analyzing
tool for high dimensional data.
• Self-Organizing Maps also have the capability to generalize.
• During generalization, the network can recognize or characterize inputs that it has never seen before.
• A new input is matched to its closest map unit and is thereby mapped.
Uses of Self-Organizing Maps
• Self-Organizing Maps provide an advantage in
maintaining the structural information from the
training data and are not inherently linear.
• Using Principal Component Analysis on high-dimensional data may cause loss of information when the dimensionality is reduced to two.
• If the data has many dimensions and every dimension present is useful, Self-Organizing Maps can be very useful over PCA for dimensionality reduction.
Uses of Self-Organizing Maps

• Seismic facies analysis generates groups based on the identification of different individual features.
• This method finds feature organizations in the dataset and forms organized relational clusters.
Uses of Self-Organizing Maps
• However, these clusters sometimes may or may not have any
physical analogs.
• Therefore a calibration method to relate these clusters to
reality is required and Self Organizing Maps do the job.
• This calibration method defines the mapping between the
groups and the measured physical properties.
Uses of Self-Organizing Maps
• Text clustering is another important preprocessing
step that can be performed through Self-Organizing
Maps.
• It is a method that helps to convert the given text into a mathematical representation for further analysis and processing.
• Exploratory data analysis and visualization are also among the most important applications of Self-Organizing Maps.
SOM Training
• A SOM does not use backpropagation with SGD to update its weights; this type of unsupervised artificial neural network uses competitive learning instead.
• Competitive learning is based on three processes:
– Competition
– Cooperation
– Adaptation
Competition
• As we said before, each neuron in a SOM is assigned a weight vector with the same dimensionality as the input space.
• In the example below, each neuron of the output layer holds a vector of dimension n.
• We compute the distance between each neuron (of the output layer) and the input data, and the neuron with the lowest distance is the winner of the competition (a short code sketch follows this slide).
• The Euclidean metric is commonly used to measure distance.
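
A minimal sketch of the competition step (finding the winning neuron by Euclidean distance); the weight matrix W and the input x below are illustrative placeholders.

import numpy as np

def best_matching_unit(W, x):
    """Competition: return the index of the output-layer neuron closest to input x.
    W : (num_neurons, n) weight vectors of the output-layer neurons
    x : (n,) input vector
    """
    distances = np.linalg.norm(W - x, axis=1)   # Euclidean distance to every neuron
    return int(np.argmin(distances))            # winner = neuron with the lowest distance

# Illustrative usage with random weights.
W = np.random.default_rng(0).random((10, 3))
winner = best_matching_unit(W, np.array([0.2, 0.5, 0.1]))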
Cooperation
• We will update the vector of the winner neuron in the final process (adaptation), but it is not the only one: its neighbours will be updated as well.
• How do we choose the neighbours?
• To choose the neighbours we use a neighbourhood kernel function. This function depends on two factors: time (incremented with each new input) and the distance between the winner neuron and the other neuron (how far the neuron is from the winner).
Cooperation
• The image below shows how the winner neuron's (the greenest one, in the center) neighbours are chosen depending on the distance and time factors.
Adaptation
• After choosing the winner neuron and its neighbours, we compute the neuron updates.
• The chosen neurons are not all updated by the same amount: the greater a neuron's distance from the winner, the less we adjust it, as shown in the image below:
Adaptation
• The winner neuron and its neighbours are updated using this formula:
      W_ij(t + 1) = W_ij(t) + η(t) · h_ij(t) · ( X(t) − W_ij(t) )
• Here η(t) is the learning rate, which indicates how much we want to adjust the weights.
• As time t grows towards positive infinity, this learning rate converges to zero, so eventually there is no update, even for the winner neuron.
Adaptation
• The neighbourhood kernel depends on the distance between the winner neuron and the other neuron (they are inversely related: as d increases, h(t) decreases) and on the neighbourhood size, which itself depends on time (it decreases as time increases); this makes the neighbourhood kernel function decrease as well.
SOM algorithm – Finding winning neuron
Find the neuron whose weights most closely match the input:

      Y_J = arg min_j ‖ X_i − W_ij ‖      (winner)

At each training iteration, an input is presented, the winning neuron is decided by the shortest Euclidean distance, and its weight vector is updated.
SOM algorithm – weight adaptation
W_ij(t + 1) = W_ij(t) + η(t) · h_ij(t) · ( X(t) − W_ij(t) )

h_ij = exp( − d_ij² / (2σ²) )
Kohonen Learning Algorithm
1. Initialise the weights (random values) and set the topological neighbourhood and learning-rate parameters.
2. While the stopping condition is false, do steps 3-8.
3. For each input vector X, do steps 4-6.
4. For each neuron j, compute the Euclidean distance to the input:
       D(j) = ‖ X_i − W_ij ‖
5. Find the index J such that D(J) is a minimum (the winning neuron).
6. For all units j within the specified neighbourhood of J, and for all i:
       W_ij(t + 1) = W_ij(t) + η(t) · h_ij(t) · ( X(t) − W_ij(t) )
7. Update the learning rate η.
8. Reduce the topological neighbourhood at specified times.
9. Test the stopping condition.
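
The steps above can be put together in a short Python/NumPy sketch. The grid size, learning-rate schedule, and neighbourhood-decay schedule chosen below are illustrative assumptions, not values from the slides.

import numpy as np

def train_som(data, grid_shape=(10, 10), epochs=100,
              eta0=0.5, sigma0=3.0, rng=np.random.default_rng(0)):
    """Kohonen learning algorithm on a 2-D grid of neurons."""
    rows, cols = grid_shape
    n = data.shape[1]
    # Step 1: initialise weights randomly; grid coordinates give map distances.
    W = rng.random((rows * cols, n))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(epochs):                       # step 2: repeat until stopping condition
        eta = eta0 * np.exp(-t / epochs)          # step 7: decaying learning rate
        sigma = sigma0 * np.exp(-t / epochs)      # step 8: shrinking neighbourhood
        for x in data:                            # step 3: for each input vector
            # Steps 4-5: winner J = neuron with the smallest Euclidean distance to x.
            J = np.argmin(np.linalg.norm(W - x, axis=1))
            # Step 6: update the winner and its neighbours on the map grid.
            d2 = np.sum((coords - coords[J]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))  # neighbourhood kernel h_ij
            W += eta * h[:, None] * (x - W)
    return W

# Illustrative usage on random 3-D data.
W = train_som(np.random.default_rng(1).random((200, 3)))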
Example

Architecture
• Self-Organizing Maps consist of two important
layers, the first one is the input layer, and the
second one is the output layer, which is also
known as a feature map.
• Each data point in the dataset recognizes itself
by competing for a representation.
• The Self-Organizing Maps' mapping steps start by initializing the weight vectors.
Architecture
• After this, a random vector is selected as the sample, and the mapped vectors are searched to find which weight best represents the chosen sample.
• Each weight vector has neighbouring weights that are close to it. The chosen weight is then rewarded by becoming more like the chosen sample vector.
• This helps the map grow and form different shapes. Most commonly, they form square or hexagonal shapes in a 2D feature space.
• This whole process is repeated a large number of times, usually more than 1,000 times.
Architecture
• Self-Organizing Maps do not use backpropagation with SGD to update weights; this unsupervised ANN uses competitive learning to update its weights, i.e. competition, cooperation, and adaptation.
• Each neuron of the output layer holds a vector of dimension n.
• The distance between each neuron present at the
output layer and the input data is computed.
Architecture
• The neuron with the lowest distance is termed the most suitable fit (the winner).
• Updating the vector of the winning neuron is the final process, known as adaptation; its neighbours are updated as well, in cooperation.
• After selecting the suitable neuron and its neighbours, we proceed to update them.
• The greater a neuron's distance from the winner, the less its weights are adjusted.
Architecture
• To simply explain, learning occurs in the following ways:
• Every node is examined to find the one whose weights are most similar to the input vector. This node is commonly known as the Best Matching Unit (BMU).
• The neighbourhood value of the Best Matching Unit is then
calculated. The number of neighbours tends to decrease over
time.
Architecture
• The winning weight is further rewarded by transitioning to become more like the sample vector. The neighbours also transition towards the chosen sample vector.
✓ The closer a node is to the Best Matching Unit, the more its weights get altered; the farther away a neighbour is from the BMU, the less it learns.
❖ Repeat the second step for N iterations.
Pros
• Data can be easily interpreted and understood
with the help of techniques like reduction of
dimensionality and grid clustering.
• Self-Organizing Maps are capable of handling several types of classification problems while providing a useful and intelligent summary of the data at the same time.
Cons
• It does not create a generative model for the
data and therefore the model does not
understand how data is being created.
• Self-Organizing Maps do not perform well
while working with categorical data and even
worse for mixed types of data.
• Model preparation is comparatively very slow, and the model is hard to train on slowly evolving data.
Why Recurrent Neural Networks?
• Feed-Forward Neural Network: Used for general Regression
and Classification problems.
• Convolutional Neural Network: Used for object detection and
image classification.
• Deep Belief Network: Used in healthcare sectors for cancer
detection.
• RNN: Used for speech recognition, voice recognition, time
series prediction, and natural language processing.
Why Recurrent Neural Networks?
• In a feed-forward neural network, the decisions are based on
the current input.
• It does not memorize past data, and it has no notion of future inputs.
• Feed-forward neural networks are used in general regression
and classification problems.
Why Recurrent Neural Networks?
• RNNs were created because there were a few issues with the feed-forward neural network:
✓ Cannot handle sequential data
✓ Considers only the current input
✓ Cannot memorize previous inputs
• The solution to these issues is the RNN. An RNN can handle sequential data, accepting the current input as well as previously received inputs.
• RNNs can memorize previous inputs due to their internal memory.
What Is a Recurrent Neural Network
(RNN)?
• RNN works on the principle of saving the output of a particular
layer and feeding this back to the input in order to predict the
output of the layer.
• Below is how you can convert a Feed-Forward Neural Network
into a Recurrent Neural Network:
[Figure: converting a feed-forward neural network into a recurrent neural network]

What Is a Recurrent Neural Network (RNN)?
• The nodes in different layers of the neural network are
compressed to form a single layer of recurrent neural
networks.
• A, B, and C are the parameters of the network.
• Here, "x" is the input layer, "h" is the hidden layer, and "y" is the output layer. A, B, and C are the network parameters used to improve the output of the model.
What Is a Recurrent Neural Network
(RNN)?
• At any given time t, the current input is a combination of input at x(t) and
x(t-1).
• The output at any given time is fetched back to the network to improve on
the output.
How Do Recurrent Neural Networks Work?
• In Recurrent Neural networks, the information cycles through a
loop to the middle hidden layer.
How Do Recurrent Neural Networks Work?
• The input layer ‘x’ takes in the input to the neural network
and processes it and passes it onto the middle layer.
• The middle layer ‘h’ can consist of multiple hidden layers,
each with its own activation functions and weights and biases.
• If you have a neural network where the various parameters of the different hidden layers are not affected by the previous layer, i.e. the neural network has no memory, then you can use a recurrent neural network.
• The recurrent neural network standardizes the different activation functions, weights, and biases so that each hidden layer has the same parameters.
• Then, instead of creating multiple hidden layers, it creates one and loops over it as many times as required.
Types of Recurrent Neural Networks

• There are four types of Recurrent Neural Networks:
• One to One
• One to Many
• Many to One
• Many to Many
One to One RNN
• This type of neural network is known as the Vanilla Neural Network.
• It is used for general machine learning problems that have a single input and a single output.
One to Many RNN

• This type of neural network has a single input and multiple outputs.
• An example of this is image captioning.
Many to One RNN
• This RNN takes a sequence of
inputs and generates a single
output.
• Sentiment analysis is a good
example of this kind of network
where a given sentence can be
classified as expressing positive
or negative sentiments.
Many to Many RNN

• This RNN takes a sequence of inputs and generates a sequence of outputs.
• Machine translation is one example.
Steps for Training a Recurrent Neural
Network
• In the input layer, the initial input is sent, with all units having the same weight and activation function.
• Using the current input and the previous-state output, the current state is calculated.
• The current state ht then becomes ht-1 for the next time step.
• These steps are repeated; to solve a particular problem, the recurrence can run for as many time steps as needed, joining the information from all the previous steps.
• The final output is then calculated from the current state at the final time step and all the previous steps.
• An error is then generated by comparing the actual (target) output with the output generated by the RNN model.
• In the final step, backpropagation (through time) occurs, wherein the error is propagated back to update the weights. A minimal forward-pass sketch follows this list.
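
A minimal forward-pass sketch in Python/NumPy of the recurrence described above. The naming follows the earlier slide (A, B, and C as the network parameters), but the specific roles assigned to them here (input-to-hidden, hidden-to-hidden, and hidden-to-output weights) and the tanh activation are illustrative assumptions.

import numpy as np

def rnn_forward(xs, A, B, C, h0):
    """Unroll a simple RNN over a sequence of inputs.
    xs : list of (n_in,) input vectors, one per time step
    A  : (n_hid, n_in)  input-to-hidden weights   (assumed role)
    B  : (n_hid, n_hid) hidden-to-hidden weights  (assumed role)
    C  : (n_out, n_hid) hidden-to-output weights  (assumed role)
    h0 : (n_hid,) initial hidden state
    """
    h, outputs = h0, []
    for x in xs:
        # Current state from the current input and the previous-state output.
        h = np.tanh(A @ x + B @ h)
        # Output at this time step from the current hidden state.
        outputs.append(C @ h)
    return outputs, h

# Illustrative usage with tiny random matrices.
rng = np.random.default_rng(0)
A, B, C = rng.random((4, 3)), rng.random((4, 4)), rng.random((2, 4))
ys, h_final = rnn_forward([rng.random(3) for _ in range(5)], A, B, C, np.zeros(4))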
Advantages:
• RNNs can process inputs of any length.
• An RNN model is designed to remember information over time, which is very helpful for any time-series predictor.
• Even if the input size is larger, the model size does not increase.
• The weights can be shared across the time steps.
• RNNs can use their internal memory to process arbitrary series of inputs, which is not the case with feedforward neural networks.
Disadvantages:
• Due to its recurrent nature, the computation is slow.
• Training of RNN models can be difficult.
• If we use ReLU or tanh as the activation function, it becomes very difficult to process sequences that are very long.
• Prone to problems such as exploding and vanishing gradients.
