Unit II-NNDL

The document discusses various training algorithms for pattern association in neural networks, including Auto Associative Memory, Hetero Associative Memory, Bidirectional Associative Memory (BAM), and Hopfield Networks. It outlines the architecture, training, and testing algorithms for each type, emphasizing their ability to store and recall patterns based on input. Additionally, it covers advanced concepts like Temporal Associative Memory, Learning Vector Quantization, and Counter Propagation Networks, highlighting their applications and operational mechanisms.

TRAINING ALGORITHMS FOR PATTERN ASSOCIATION

Write Short notes on Training Algorithms for Pattern Association.


These neural networks work on the basis of pattern association: they can store a set of patterns and, when given an input, produce the stored pattern that best matches it. Such memories are also called Content-Addressable Memory (CAM). An associative memory performs a parallel search over the stored patterns. The following two types of associative memories can be observed:
o Auto Associative Memory
o Hetero Associative memory
Auto Associative Memory
This is a single layer neural network in which the input training vector and the output target
vectors are the same. The weights are determined so that the network stores a set of patterns.
Architecture
As shown in the following figure, the architecture of an Auto Associative Memory network has 'n' input units and the same 'n' output units, since the input training vectors and the output target vectors are identical.

Training Algorithm
For training, this network uses the Hebb rule or the Delta rule.
Step 1 − Initialize all the weights to zero: wij = 0 (i = 1 to n, j = 1 to n)
Step 2 − Perform steps 3-5 for each vector to be stored.
Step 3 − Activate each input unit: xi = si (i = 1 to n)
Step 4 − Activate each output unit: yj = sj (j = 1 to n)
Step 5 − Adjust the weights: wij(new) = wij(old) + xi yj
Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Calculate the net input to each output unit, for j = 1 to n:
y_inj = Σi=1..n xi wij
Step 5 − Apply the following activation function to calculate the output:
yj = +1 if y_inj > 0; −1 if y_inj ≤ 0
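The Hebb-rule training and recall steps above can be sketched in a few lines of numpy; the 4-component bipolar pattern is illustrative:

```python
import numpy as np

def train_auto(patterns):
    """Hebb rule: W is the sum of outer(s, s) over all stored patterns."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for s in patterns:
        W += np.outer(s, s)          # wij(new) = wij(old) + xi*yj with y = x
    return W

def recall(W, x):
    """Net input y_in = x @ W, followed by the sign activation."""
    return np.where(x @ W > 0, 1, -1)

patterns = np.array([[1, -1, 1, -1]])
W = train_auto(patterns)
print(recall(W, np.array([1, -1, 1, -1])))   # recalls the stored pattern
```

Note that the net also recalls the stored pattern from a slightly corrupted input, which is the content-addressable property described above.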

HETERO ASSOCIATIVE MEMORY


Similar to the Auto Associative Memory network, this is also a single layer neural network. However, in this network the input training vectors and the output target vectors are not the same. The weights are determined so that the network stores a set of pattern pairs. The hetero associative network is static in nature; hence there are no non-linear or delay operations.

Architecture
As shown in the following figure, the architecture of Hetero Associative Memory
network has ‘n’ number of input training vectors and ‘m’ number of output target vectors.
Training Algorithm
For training, this network uses the Hebb rule or the Delta rule.
Step 1 − Initialize all the weights to zero: wij = 0 (i = 1 to n, j = 1 to m)
Step 2 − Perform steps 3-5 for each pattern pair to be stored.
Step 3 − Activate each input unit: xi = si (i = 1 to n)
Step 4 − Activate each output unit: yj = sj (j = 1 to m)
Step 5 − Adjust the weights: wij(new) = wij(old) + xi yj
Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Calculate the net input to each output unit, for j = 1 to m:
y_inj = Σi=1..n xi wij
Step 5 − Apply the following activation function to calculate the output:
yj = +1 if y_inj > 0; 0 if y_inj = 0; −1 if y_inj < 0
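A minimal numpy sketch of the hetero-associative version, where the stored input vectors (length n) and target vectors (length m) differ; the pattern pairs are illustrative:

```python
import numpy as np

def train_hetero(S, T):
    """Hebb rule for pairs: wij accumulates xi*yj, giving an n x m matrix."""
    n, m = S.shape[1], T.shape[1]
    W = np.zeros((n, m))
    for s, t in zip(S, T):
        W += np.outer(s, t)
    return W

def recall(W, x):
    return np.where(x @ W > 0, 1, -1)

S = np.array([[1, -1, -1, 1], [1, 1, -1, -1]])   # input vectors (n = 4)
T = np.array([[1, -1], [-1, 1]])                 # target vectors (m = 2)
W = train_hetero(S, T)
print(recall(W, S[0]))   # recalls the associated target T[0]
```

The two stored inputs here are orthogonal, so each input recalls its own target without interference.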

ANN – BIDIRECTIONAL ASSOCIATIVE MEMORY (BAM)


BAM is a recurrent neural network (RNN) that has connections between neurons in
both forward and backward directions. This bidirectional connectivity allows information to
flow not only from input to output but also from output to input.
The architecture of BAM consists of two layers: an input layer and an output layer.
Each neuron in the input layer is connected to every neuron in the output layer, and
vice versa. The connections between neurons are characterized by weights that determine the
strength of the connection.
The learning process in BAM involves adjusting the weights of these connections based
on the input and output patterns presented during training. BAM is trained in a supervised
manner, meaning that it requires input-output pairs to learn the associations between patterns.
The key feature of BAM is its ability to associate input patterns with output
patterns bidirectionally. Given an input pattern, the network can recall the associated output
pattern, and vice versa. This bidirectional association makes BAM suitable for tasks where
both forward and backward information retrieval is important.
BAM Architecture:
When BAM accepts an n-dimensional input vector X from set A, the model recalls the associated m-dimensional vector Y from set B. Similarly, when Y is treated as input, the BAM recalls X.

Algorithm:
1. Storage (Learning): In this step the weight matrix is calculated so that the M pattern pairs (fundamental memories) are stored in the synaptic weights of the network, following the equation
W = Σm=1..M XmT Ym
(with Xm and Ym taken as row vectors).
2. Testing: We have to check that the BAM recalls Ym perfectly for the corresponding Xm, and recalls Xm for the corresponding Ym, using
Y = f(X W), X = f(Y WT)
All pairs should be recalled accordingly.


3. Retrieval: Present an unknown vector X (a corrupted or incomplete version of a pattern from set A or B) to the BAM and retrieve a previously stored association by applying Y = f(X W) and X = f(Y WT) repeatedly until the states stabilize.
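The storage and bidirectional recall steps can be sketched as follows, assuming bipolar patterns; the single stored pair below is illustrative:

```python
import numpy as np

def bam_weights(X, Y):
    """W = sum over pairs of outer(x, y), storing each association."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    for x, y in zip(X, Y):
        W += np.outer(x, y)
    return W

def sgn(v):
    return np.where(v >= 0, 1, -1)

def recall_forward(W, x):    # X -> Y direction uses W
    return sgn(x @ W)

def recall_backward(W, y):   # Y -> X direction uses W transpose
    return sgn(y @ W.T)

X = np.array([[1, -1, 1, -1, 1, -1]])   # pattern from set A (n = 6)
Y = np.array([[1, 1, -1, -1]])          # pattern from set B (m = 4)
W = bam_weights(X, Y)
print(recall_forward(W, X[0]))    # recalls Y[0]
print(recall_backward(W, Y[0]))   # recalls X[0]
```

The same weight matrix serves both directions, which is the bidirectional property the text describes.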
HOPFIELD NETWORKS
It is a fully interconnected neural network where each unit is connected to every other unit. It behaves in a discrete manner, i.e. it gives finite distinct outputs.
Structure & Architecture of a Hopfield Network: Each neuron has an inverting and a non-inverting output. Being fully connected, the output of each neuron is an input to all other neurons, but not to itself. The figure below shows a sample representation of a Discrete Hopfield Neural Network architecture.

[ x1 , x2 , ... , xn ] -> Input to the n given neurons.


[ y1 , y2 , ... , yn ] -> Output obtained from the n given neurons
Wij -> weight associated with the connection between the ith and the jth neuron.
Training Algorithm
For storing a set of input patterns S(p) [p = 1 to P], where S(p) = S1(p) … Si(p) … Sn(p), the weight matrix is given by:
For binary patterns: wij = Σp [2Si(p) − 1][2Sj(p) − 1], for i ≠ j
For bipolar patterns: wij = Σp Si(p) Sj(p), for i ≠ j
In both cases wii = 0 (no self-connections).
Steps Involved in the training of a Hopfield Network are as mapped below:


1. Initialize the weights (wij) to store the patterns (using the training algorithm).
2. For each input vector x, perform steps 3-7.
3. Set the initial activations of the network equal to the external input vector x: yi = xi (i = 1 to n).
4. For each unit Yi, perform steps 5-7.


5. Calculate the total input to unit Yi using the equation given below:
y_ini = xi + Σj yj wji
6. Apply the activation over the total input to calculate the output:
yi = 1 if y_ini > θi; yi (unchanged) if y_ini = θi; 0 if y_ini < θi
(where θi is the threshold, normally taken as 0)


7. Now feedback the obtained output yi to all other units. Thus, the activation vectors are
updated.
8. Test the network for convergence.
Energy Function:
Hopfield Networks use an energy function to represent the state of the network; the goal during learning is to make the stored patterns minima of this function, and recall proceeds by lowering the energy. The energy function is defined from the connection weights and the (binary) states of the neurons:
E = −(1/2) Σi Σj wij yi yj − Σi xi yi + Σi θi yi
Each asynchronous update either lowers E or leaves it unchanged, which guarantees convergence to a stable state.
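A compact numpy sketch of Hopfield storage, asynchronous recall, and the energy function (without external input or thresholds, i.e. x = 0 and θ = 0 inside `energy`); the stored pattern and iteration count are illustrative:

```python
import numpy as np

def hopfield_weights(patterns):
    """Bipolar Hebb storage: wij = sum_p si(p)*sj(p), zero diagonal."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for s in patterns:
        W += np.outer(s, s)
    np.fill_diagonal(W, 0)           # no self-connections (wii = 0)
    return W

def energy(W, y):
    """E = -1/2 * sum_ij wij yi yj (external input and thresholds omitted)."""
    return -0.5 * y @ W @ y

def recall(W, x, sweeps=10):
    y = x.copy()
    for _ in range(sweeps):
        for i in range(len(y)):      # asynchronous, unit-by-unit update
            y[i] = 1 if W[i] @ y >= 0 else -1
    return y

stored = np.array([[1, -1, 1, -1, 1, -1]])
W = hopfield_weights(stored)
noisy = np.array([1, 1, 1, -1, 1, -1])   # stored pattern with one bit flipped
print(recall(W, noisy))                  # converges back to the stored pattern
```

The noisy state has higher energy than the stored pattern, so the updates roll the network downhill into the stored minimum.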
ITERATIVE AUTOASSOCIATIVE MEMORY NETWORKS- TEMPORAL
ASSOCIATIVE MEMORY NETWORK
Temporal associative memory:
It is a special case of a hetero-associative memory, where each subpattern is a
pattern (example) from another time interval.
Usually, the task is to predict the pattern of the next time interval given the patterns from
the current time interval and from several past time intervals.
Autoassociative memory
Autoassociative memories are capable of retrieving a piece of data upon presentation of
only partial information from that piece of data.
Hopfield networks have been shown to act as autoassociative memory since they are
capable of remembering data by observing a portion of that data.
Iterative Autoassociative Net
In some cases, an auto-associative net does not reproduce a stored pattern the first
time around, but if the result of the first showing is input to the net again, the stored
pattern is reproduced.
They are of 3 further kinds — Recurrent linear auto-associator, Brain-State-in-a-Box
net, and Discrete Hopfield net.
The Hopfield Network is the most well-known example of an autoassociative memory.
FIXED WEIGHT COMPETITIVE NETS
These are additional structures included in multi-output networks in order to force the output layer to decide which one neuron will fire.
This mechanism is called competition. When competition is complete, only one output neuron has a nonzero output. Fixed (symmetric) weight nets include the Maxnet and the Hamming Net.
1- Maxnet
-Maxnet is based on the winner-take-all policy.
-The n nodes of Maxnet are completely interconnected.
-There is no need to train the network, since the weights are fixed: each node has a self-connection of weight 1 and an inhibitory connection of weight −є to every other node.
-The Maxnet operates as a recurrent recall network: the activations are updated repeatedly until only one node remains active.
Activation function:
f(x) = x if x > 0; 0 otherwise
where є is usually a positive number less than 1.

Maxnet Algorithm
Step 1: Set activations and weights,
aj (0) is the starting input value to node Aj

Step 2: While more than one node has a nonzero output, perform steps 3 to 5.


Step 3: Update the activation (output) at each node for j = 1, 2, 3……., n
aj (t+1) = f [ aj (t) – є Σ ai (t)] , i ≠ j
є < 1/m where m is the number of competitive neurons
Step 4: Save activations for use in the next iteration.
aj (t+1) → aj (t)
Step 5: Test for stopping condition. If more than one node has a nonzero output
then Go To step 3, Else Stop.
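The Maxnet algorithm above can be sketched as follows; the initial activations and the choice є = 0.15 (satisfying є < 1/m for m = 4) are illustrative:

```python
import numpy as np

def maxnet(a, eps=0.15):
    """Winner-take-all by mutual inhibition; eps must be < 1/m."""
    a = np.array(a, dtype=float)
    f = lambda x: np.maximum(x, 0.0)           # ramp activation f(x)
    while np.count_nonzero(a) > 1:
        # aj(t+1) = f[aj(t) - eps * sum of the OTHER activations]
        a = f(a - eps * (a.sum() - a))
    return a

winner = maxnet([0.2, 0.4, 0.6, 0.8])
print(winner)   # only the node with the largest initial activation survives
```

No training occurs anywhere in this loop; the fixed −є inhibition alone drives all but the strongest node to zero.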

2. Hamming Net:
The Hamming net is a maximum likelihood classifier. It is used to determine which exemplar vector is most similar to an input vector. The measure of similarity is obtained from the formula:
x·y = a − D = 2a − n, since a + D = n --------- (5)
where D is the Hamming distance (the number of components in which the vectors differ), a is the number of components in which the vectors agree, and n is the number of components in each vector.
When the weight vector of a class unit is set to one half of the exemplar vector, and the bias to n/2, the net finds the unit with the closest exemplar as the unit with the maximum net input. A Maxnet is used for this purpose.
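A sketch of the Hamming-net similarity computation built from equation (5); the exemplars are illustrative, and for brevity the final Maxnet stage is replaced by a plain argmax (an assumption, since the lecture uses a Maxnet there):

```python
import numpy as np

def hamming_net(exemplars, x):
    """Net input = x @ (e/2) + n/2 = (2a - n)/2 + n/2 = a, the number
    of agreeing components; the largest net input wins."""
    E = np.array(exemplars, dtype=float)
    n = E.shape[1]
    net = x @ (E.T / 2.0) + n / 2.0   # weights = e/2, bias = n/2
    return int(np.argmax(net))         # index of the closest exemplar

exemplars = [[1, -1, -1, -1], [-1, -1, -1, 1]]
x = np.array([1, 1, -1, -1])           # differs from exemplar 0 in one place
print(hamming_net(exemplars, x))
```

For this input the net inputs are a = 3 agreements with exemplar 0 and a = 1 with exemplar 1, so unit 0 wins.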

KOHONEN SELF-ORGANIZING FEATURE MAPS


There can be various topologies; however, the following two topologies are used the most.
Rectangular Grid Topology
This topology has 24 nodes in the distance-2 grid, 16 nodes in the distance-1 grid, and 8 nodes in the distance-0 grid; the difference between successive rectangular grids is 8 nodes. The winning unit is indicated by #.
Hexagonal Grid Topology
This topology has 18 nodes in the distance-2 grid, 12 nodes in the distance-1 grid, and 6 nodes in the distance-0 grid; the difference between successive hexagonal grids is 6 nodes. The winning unit is indicated by #.

Architecture
The architecture of KSOM is similar to that of the competitive network. With the help of
neighborhood schemes, discussed earlier, the training can take place over the extended
region of the network.

Algorithm for training


Step 1 − Initialize the weights, the learning rate α and the neighborhood topological scheme.
Step 2 − Continue step 3-9, when the stopping condition is not true.
Step 3 − Continue step 4-6 for every input vector x.
Step 4 − Calculate the squared Euclidean distance for j = 1 to m:
D(j) = Σi=1..n (xi − wij)²
Step 5 − Obtain the winning unit J where D(j) is minimum.
Step 6 − Calculate the new weights of the winning unit by the following relation (for i = 1 to n):
wiJ(new) = wiJ(old) + α[xi − wiJ(old)]
Step 7 − Update the learning rate α by the following relation:
α(t+1) = 0.5 α(t)
Step 8 − Reduce the radius of topological scheme.
Step 9 − Check for the stopping condition for the network.
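The training loop above can be sketched in numpy; this minimal version assumes the neighborhood radius has already shrunk to zero (winner-only updates, i.e. step 8 is done), and the data, unit count, and random initialization are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(X, m, alpha=0.5, epochs=20):
    """m output units; winner-take-all updates with halving learning rate."""
    W = rng.random((m, X.shape[1]))              # random initial weights
    for _ in range(epochs):
        for x in X:
            d = ((x - W) ** 2).sum(axis=1)       # D(j) = sum_i (xi - wij)^2
            J = d.argmin()                       # winning unit J
            W[J] += alpha * (x - W[J])           # move winner toward x
        alpha *= 0.5                             # alpha(t+1) = 0.5 alpha(t)
    return W

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
W = train_som(X, m=2)
print(W)   # two weight vectors drawn toward the data
```

Because each update is a convex combination of the old weight and the input, the weights stay inside the data's bounding box.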
LEARNING VECTOR QUANTIZATION
Explain Learning Vector Quantization (LVQ).
LVQ, unlike Vector Quantization (VQ) and Kohonen Self-Organizing Maps (KSOM), is a competitive network that uses supervised learning.
It can be defined as a process of classifying patterns in which each output unit represents a class.
Because it uses supervised learning, the network is given a set of training patterns with known classifications, along with an initial distribution of the output classes.
After the training process is complete, LVQ classifies an input vector by assigning it to the same class as that of the winning output unit.
Architecture:

The following figure shows the architecture of LVQ, which is quite similar to the architecture of KSOM. As we can see, there are "n" input units and "m" output units. The layers are fully interconnected, with weights on the connections.
Parameters Used:
Following are the parameters used in LVQ training process as well as in the flowchart
x = training vector (x1,...,xi,...,xn)
T = class for training vector x
wj = weight vector for jth output unit
Cj = class associated with the jth output unit
Training Algorithm:
Step 1 − Initialize the reference vectors, which can be done as follows:
Step 1(a) − From the given set of training vectors, take the first "m" training vectors (one per cluster) and use them as weight vectors; the remaining vectors can be used for training.
Step 1(b) − Assign the initial weights and classifications randomly.
Step 1(c) − Apply the K-means clustering method.
Step 2 − Initialize the learning rate α.
Step 3 − Continue with steps 4-9, if the condition for stopping this algorithm is not met.
Step 4 − Follow steps 5-6 for every training input vector x.
Step 5 − Calculate the squared Euclidean distance for j = 1 to m:
D(j) = Σi=1..n (xi − wij)²
Step 6 − Obtain the winning unit J where D(j) is minimum.
Step 7 − Calculate the new weight of the winning unit by the following relation:
if T = CJ, then wJ(new) = wJ(old) + α[x − wJ(old)]
if T ≠ CJ, then wJ(new) = wJ(old) − α[x − wJ(old)]
Step 8 − Reduce the learning rate α.
Step 9 − Test for the stopping condition. It may be as follows −
Maximum number of epochs reached.
Learning rate reduced to a negligible value.
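The attract/repel rule of step 7 can be sketched as follows; the data, class labels, and initial reference vectors are illustrative:

```python
import numpy as np

def train_lvq(X, T, W, C, alpha=0.3, epochs=10):
    """LVQ-1 style: the winner moves toward x if its class matches the
    target T, and away from x otherwise."""
    W = W.astype(float).copy()
    for _ in range(epochs):
        for x, t in zip(X, T):
            J = ((x - W) ** 2).sum(axis=1).argmin()   # winning unit J
            if t == C[J]:
                W[J] += alpha * (x - W[J])   # correct class: attract
            else:
                W[J] -= alpha * (x - W[J])   # wrong class: repel
        alpha *= 0.5                         # reduce the learning rate
    return W

X = np.array([[0., 0.2], [0.1, 0.], [1., 0.9], [0.9, 1.]])
T = np.array([0, 0, 1, 1])                 # known classes of the data
W0 = np.array([[0., 0.], [1., 1.]])        # initial reference vectors
C = np.array([0, 1])                       # class of each output unit
W = train_lvq(X, T, W0, C)
```

After training, classification assigns each input the class of its nearest reference vector, as the text describes.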
COUNTER PROPAGATION NETWORKS
CPN (Counter Propagation Network) was proposed by Hecht-Nielsen in 1987. It is a multilayer network based on a combination of input, clustering, and output layers.
Applications of the counter propagation net include data compression, function approximation, and pattern association.
There are two types of counter propagation net:
1. Full counter propagation network
2. Forward-only counter propagation network
1. Full counter propagation network:
The full CPN efficiently represents a large number of vector pairs x:y by adaptively constructing a look-up table.
The full CPN works best if the inverse function exists and is continuous. The vectors x and y propagate through the network in a counterflow manner to yield the output vectors x* and y*.
Architecture of Full CPN:

The four major components of the instar-outstar model are the input layer, the instar, the
competitive layer and the outstar.
For each node in the input layer there is an input value xi. All the instars are grouped into a layer called the competitive layer.
Each instar responds maximally to a group of input vectors from a different region of space.
An outstar model has all the nodes in the output layer and a single node in the competitive layer. The outstar looks like the fan-out of a node.
Training Algorithm for Full CPN:
Step 0: Set the weights and the initial learning rate.
Step 1: Perform step 2 to 7 if stopping condition is false for phase I training.
Step 2: For each training input vector pair x:y presented, perform steps 3 to 5.
Step 3: Make the X-input layer activations to vector X. Make the Y-input layer activation to
vector Y.
Step 4: Find the winning cluster unit.
If the dot product method is used, find the cluster unit zj with the largest net input; for j=1 to p,
zinj=Σxi.vij + Σyk.wkj ---------------------(1)
If Euclidean distance method is used, find the cluster unit zj whose squared distance
from input vectors is the smallest:
Dj=Σ(xi-vij)2 + Σ(yk-wkj)2 ---------------------(2)
If a tie occurs in the selection of the winner unit, the unit with the smallest index is the winner. Take the winner unit index as J.
Step 5: Update the weights for the winner unit zJ. For i=1 to n,
viJ(new)=viJ(old) + α[xi-viJ(old)] ---------------------(3)
For k =1 to m, wkJ(new)=wkJ(old) + β[yk-wkJ(old)]
Step 6: Reduce the learning rates.
α (t+1)=0.5α(t); β(t+1)=0.5β(t) ---------------------(4)
Step 7: Test stopping condition for phase I training.
Step 8: Perform step 9 to 15 when stopping condition is false for phase II training.
Step 9: Perform step 10 to 13 for each training input vector pair x:y. Here α and β are
small constant values.
Step 10: Make the X-input layer activations to vector x. Make the Y-input layer
activations to vector y.
Step 11: Find the winning cluster unit (Using the formula from step 4). Take the
winner unit index as J.
Step 12: Update the weights entering into unit zJ.
For i=1 to n,
viJ(new)=viJ(old) + α[xi-viJ(old)]
For k =1 to m,
wkJ(new)=wkJ(old) + β[yk-wkJ(old)]
Step 13: Update the weights from unit zJ to the output layers.
For i=1 to n, tJi(new)=tJi(old) + b[xi-tJi(old)]
For k =1 to m, uJk(new)=uJk(old) + a[yk-uJk(old)]
Step 14: Reduce the learning rates a and b.
a(t+1)=0.5a(t); b(t+1)=0.5b(t)
Step 15: Test stopping condition for phase II training.

2. Forward-only Counter propagation network:


A simplified version of the full CPN is the forward-only CPN. The forward-only CPN uses only the x vectors to form the clusters on the Kohonen units during phase I training.
In the forward-only CPN, the input vectors are presented to the input units first. The weights between the input layer and the cluster layer are trained first; then the weights between the cluster layer and the output layer are trained.
This is a specific competitive network, with the targets known.
Architecture of forward-only CPN:
It consists of three layers: input layer, cluster layer and output layer.
Its architecture resembles the back-propagation network, but in CPN there exist interconnections between the units in the cluster layer.
Training Algorithm for Forward-only CPN:
Step 0: Initialize the weights and learning rates.
Step 1: Perform steps 2 to 7 when the stopping condition for phase I training is false.
Step 2: Perform steps 3 to 5 for each training input x.
Step 3: Set the X-input layer activations to vector x.
Step 4: Compute the winning cluster unit J. If dot product method is used, find the
cluster unit zJ with the largest net input:
zinj=Σxi.vij ---------------------(5)
If Euclidean distance is used, find the cluster unit zJ whose squared distance
from the input pattern is smallest:
Dj=Σ(xi-vij)2 ---------------------(6)
If there exists a tie in the selection of winner unit, the unit with the smallest
index is chosen as the winner.
Step 5: Perform weight updation for unit zJ.
For i=1 to n,
viJ(new)=viJ(old) + α[xi-viJ(old)]
Step 6: Reduce learning rate α: α (t+1)=0.5α(t)
Step 7: Test the stopping condition for phase I training.
Step 8: Perform steps 9 to 15 when the stopping condition for phase II training is false.
Step 9: Perform step 10 to 13 for each training input pair x:y.
Step 10: Set X-input layer activations to vector X. Set Y-output layer activation to vector Y.
Step 11: Find the winning cluster unit J.
Step 12: Update the weights into unit zJ.
For i=1 to n,
viJ(new)=viJ(old) + α[xi-viJ(old)]
Step 13: Update the weights from unit zJ to the output units.
For k=1 to m,
wJk(new)=wJk(old) + β[yk-wJk(old)]
Step 14: Reduce learning rate β,
β(t+1)=0.5β(t)
Step 15: Test the stopping condition for phase II training
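The two training phases above can be sketched in numpy; initializing the cluster weights from the first p inputs (rather than randomly) is an assumption made here for determinism, and the data are illustrative:

```python
import numpy as np

def train_cpn(X, Y, p, alpha=0.5, beta=0.5, epochs=10):
    V = X[:p].astype(float).copy()      # input -> cluster weights (assumed init)
    W = np.zeros((p, Y.shape[1]))       # cluster -> output weights
    for _ in range(epochs):             # phase I: cluster on x only
        for x in X:
            J = ((x - V) ** 2).sum(axis=1).argmin()   # winning cluster unit
            V[J] += alpha * (x - V[J])
        alpha *= 0.5                    # reduce learning rate
    for _ in range(epochs):             # phase II: train outstar weights
        for x, y in zip(X, Y):
            J = ((x - V) ** 2).sum(axis=1).argmin()
            W[J] += beta * (y - W[J])
        beta *= 0.5
    return V, W

def predict(V, W, x):
    J = ((x - V) ** 2).sum(axis=1).argmin()
    return W[J]                         # outstar output of the winning unit

X = np.array([[0., 0.], [1., 1.]])
Y = np.array([[1., 0.], [0., 1.]])
V, W = train_cpn(X, Y, p=2)
print(predict(V, W, X[0]))
```

Because the learning rates halve each epoch, the outstar weights approach (but do not exactly reach) the targets; the winning component is still the correct one.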
ADAPTIVE RESONANCE THEORY NETWORK
Explain about Adaptive Resonance Theory Network
Adaptive resonance theory is a type of neural network technique developed by
Stephen Grossberg and Gail Carpenter in 1987. The basic ART uses unsupervised
learning technique.
The terms "adaptive" and "resonance" suggest that these networks are open to new learning (adaptive) without discarding the previous or old information (resonance).
The ART networks are known to solve the stability-plasticity dilemma i.e., stability
refers to their nature of memorizing the learning and plasticity refers to the fact that
they are flexible to gain new information.
Types of Adaptive Resonance Theory (ART)
Carpenter and Grossberg developed different ART architectures as a result of 20 years
of research. The ARTs can be classified as follows:
• ART1 – It is the simplest and the basic ART architecture. It is capable of
clustering binary input values.
• ART2 – It is an extension of ART1 that is capable of clustering continuous-valued
input data.
• Fuzzy ART – It is the augmentation of fuzzy logic and ART.
• ARTMAP – It is a supervised form of ART learning where one ART learns based
on the previous ART module. It is also known as predictive ART.
• FARTMAP – This is a supervised ART architecture with Fuzzy logic included.
Basic of Adaptive Resonance Theory (ART) Architecture
The adaptive resonant theory is a type of neural network that is self-organizing and
competitive.

It can be of both types, the unsupervised ones(ART1, ART2, ART3, etc) or the supervised
ones(ARTMAP). Generally, the supervised algorithms are named with the suffix “MAP”.
But the basic ART model is unsupervised in nature and consists of :
o The F1 layer accepts the inputs and performs some processing and transfers it to the F2
layer that best matches with the classification factor.
o There exist two sets of weighted interconnections for controlling the degree of similarity between the units in the F1 and the F2 layers.
o The F2 layer is a competitive layer. The cluster unit with the largest net input becomes the candidate to learn the input pattern first, and the rest of the F2 units are ignored.
o The reset unit decides whether or not the candidate cluster unit is allowed to learn the input pattern, depending on how similar its top-down weight vector is to the input vector. This is called the vigilance test.
Thus we can say that the vigilance parameter helps to incorporate new memories or new
information. Higher vigilance produces more detailed memories, lower vigilance produces
more general memories.
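An ART1-style vigilance test can be sketched as follows; the match ratio |x ∧ w| / |x| and the example vectors are illustrative assumptions, not the full ART algorithm:

```python
import numpy as np

def passes_vigilance(x, w, rho):
    """Candidate cluster passes only if the fraction of the input's
    active bits also present in the top-down weights reaches rho."""
    match = np.logical_and(x, w).sum() / x.sum()
    return bool(match >= rho)

x = np.array([1, 1, 0, 1])   # binary input pattern (|x| = 3)
w = np.array([1, 1, 0, 0])   # candidate cluster's top-down weights
print(passes_vigilance(x, w, rho=0.5))   # match = 2/3, passes
print(passes_vigilance(x, w, rho=0.9))   # 2/3 < 0.9, reset fires
```

Raising rho makes the test stricter, which is exactly the "higher vigilance produces more detailed memories" behavior noted above.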
Generally two types of learning exist: slow learning and fast learning. In fast learning, weight updates during resonance occur rapidly; it is used in ART1. In slow learning, the weight change occurs slowly relative to the duration of a learning trial; it is used in ART2.
Advantage of Adaptive Resonance Theory (ART)
It exhibits stability and is not disturbed by a wide variety of inputs provided to the network.
It can be integrated and used with various other techniques to give better results.
It can be used in various fields such as mobile robot control, face recognition, land cover classification, target recognition, medical diagnosis, signature verification, clustering web users, etc.
It has advantages over plain competitive learning (e.g. BPNN-style networks), which lacks the capability to add new clusters when deemed necessary.
Limitations of Adaptive Resonance Theory:
Some ART networks (such as Fuzzy ART and ART1) are inconsistent, as their results depend upon the order in which the training data are presented, or upon the learning rate.
