Unit II-NNDL
Training Algorithm
For training, this network uses the Hebb or Delta learning rule.
Step 1 − Initialize all the weights to zero as wij = 0, i=1 to n, j=1 to n
Step 2 − Perform steps 3-4 for each input vector.
Step 3 − Activate each input unit as follows, xi=si(i=1 to n)
Step 4 − Activate each output unit as follows, yj=sj(j=1 to n)
Step 5 − Adjust the weights as follows, wij(new)=wij (old)+xiyj
Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Calculate the net input to each output unit j = 1 to n: yinj = Σi=1 to n xi wij
Step 5 − Apply the activation function to calculate the output: yj = +1 if yinj > 0, and yj = −1 if yinj ≤ 0.
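The training and testing steps above amount to accumulating one outer product per stored vector and then thresholding the net input. Below is a minimal NumPy sketch of both procedures, assuming bipolar (+1/−1) patterns; the function and variable names are illustrative, not taken from the text.

```python
import numpy as np

# Hebb-rule training and recall for an auto-associative memory
# (minimal sketch, assuming bipolar +1/-1 pattern vectors of length n).

def train_auto(patterns):
    """Steps 1-5: wij(new) = wij(old) + xi*yj, with x = y = stored pattern s."""
    n = patterns.shape[1]
    W = np.zeros((n, n))              # Step 1: initialize weights to zero
    for s in patterns:                # Steps 2-5: one outer-product update per vector
        W += np.outer(s, s)
    return W

def recall_auto(W, x):
    """Testing: net input yinj = sum_i xi*wij, then bipolar threshold activation."""
    y_in = x @ W
    return np.where(y_in > 0, 1, -1)

patterns = np.array([[1, 1, -1, -1]])
W = train_auto(patterns)
print(recall_auto(W, np.array([1, 1, -1, -1])))   # recovers the stored pattern
```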
Architecture
As shown in the following figure, the architecture of the Hetero Associative Memory network has 'n' input units and 'm' output units.
Training Algorithm
For training, this network uses the Hebb or Delta learning rule.
Step 1 − Initialize all the weights to zero as wij = 0, i = 1 to n, j = 1 to m
Step 2 − Perform steps 3-4 for each input vector.
Step 3 − Activate each input unit as follows − xi=si(i=1 to n)
Step 4 − Activate each output unit as follows − yj=sj(j=1 to m)
Step 5 − Adjust the weights as follows − wij(new)=wij(old)+xiyj
Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Calculate the net input to each output unit j = 1 to m: yinj = Σi=1 to n xi wij
Step 5 − Apply the activation function to calculate the output: yj = +1 if yinj > 0, yj = 0 if yinj = 0, and yj = −1 if yinj < 0.
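A hetero-associative pair (s, t) is stored in the same way, except that the weight matrix is n × m. The sketch below, again assuming bipolar vectors and using illustrative names, covers the training and testing steps above.

```python
import numpy as np

# Hebb-rule training and recall for a hetero-associative memory
# (minimal sketch: inputs s of length n, targets t of length m, bipolar values).

def train_hetero(S, T):
    """wij(new) = wij(old) + xi*yj, accumulated over all training pairs (s, t)."""
    n, m = S.shape[1], T.shape[1]
    W = np.zeros((n, m))
    for s, t in zip(S, T):
        W += np.outer(s, t)
    return W

def recall_hetero(W, x):
    """Net input yinj = sum_i xi*wij, followed by a bipolar threshold activation."""
    return np.where(x @ W > 0, 1, -1)

S = np.array([[1, -1, 1, -1], [1, 1, -1, -1]])
T = np.array([[1, -1], [-1, 1]])
W = train_hetero(S, T)
print(recall_hetero(W, S[0]))   # expected output: [ 1 -1]
```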
Algorithm:
1. Storage (Learning): In this learning step of BAM, the weight matrix is calculated from the M pairs of patterns (fundamental memories) that are stored in the synaptic weights of the network, following the equation
W = Σm=1 to M Xm Ym^T (the sum of the outer products of the stored pairs)
2. Testing: We have to check that the BAM recalls Ym perfectly for the corresponding Xm, and recalls Xm for the corresponding Ym, using
Ym = sign(W^T Xm) and Xm = sign(W Ym), for m = 1, 2, ..., M.
6. Apply activation over the total input to calculate the output as per the equation given below:
yj = +1 if yinj > 0; yj = yj (previous state) if yinj = 0; yj = −1 if yinj < 0
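Putting the storage and recall equations together, a compact BAM sketch might look as follows. The bipolar activation that keeps the previous state when the net input is zero is assumed, and all names and the toy patterns are illustrative.

```python
import numpy as np

# BAM sketch: storage W = sum_m outer(Xm, Ym), forward recall Y = f(X W),
# backward recall X = f(Y W^T), with f the bipolar activation above.

def bam_store(X, Y):
    return sum(np.outer(x, y) for x, y in zip(X, Y))

def bipolar(net, previous):
    """Keep the previous state wherever the net input is exactly zero."""
    return np.where(net > 0, 1, np.where(net < 0, -1, previous))

X = np.array([[1, 1, -1, -1], [-1, -1, 1, 1]])
Y = np.array([[1, -1], [-1, 1]])
W = bam_store(X, Y)

y = bipolar(X[0] @ W, previous=Y[0])     # forward recall, should reproduce Y[0]
x = bipolar(Y[0] @ W.T, previous=X[0])   # backward recall, should reproduce X[0]
print(y, x)
```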
Maxnet Algorithm
Step 1: Set activations and weights:
aj(0) is the starting input value to node Aj; the weights are wij = 1 if i = j, and −ε otherwise, where 0 < ε < 1/m and m is the number of nodes.
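The remaining Maxnet steps iterate the mutual inhibition aj(new) = f[aj(old) − ε Σk≠j ak(old)], with f(x) = max(x, 0), until only one unit stays active. A small sketch of that competition is given below; the value of ε and the iteration cap are assumptions added for safety.

```python
import numpy as np

# Maxnet competition sketch: each unit keeps its own activation (self-weight 1)
# and inhibits every other unit with weight -epsilon, 0 < epsilon < 1/m.

def maxnet(a, epsilon=0.15, max_iter=100):
    a = np.clip(np.asarray(a, dtype=float), 0, None)
    for _ in range(max_iter):
        if np.count_nonzero(a) <= 1:          # stop when one winner remains
            break
        # aj(new) = f[aj(old) - epsilon * sum of the other activations]
        a = np.clip(a - epsilon * (a.sum() - a), 0, None)
    return a

print(maxnet([0.2, 0.4, 0.6, 0.8]))   # only the largest input survives
```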
2. Hamming Net:
Hamming net is a maximum likelihood classifier net. It is used to determine an exemplar
vector which is most similar to an input vector. The measure of similarity is obtained from
the formula:
x·y = a − D = 2a − n, since a + D = n     (5)
where D is the Hamming distance (the number of components in which the vectors differ), a is the number of components in which the vectors agree, and n is the number of components in each vector.
When the weight vector of a class unit is set to one half of the exemplar vector, and the bias to n/2, the net will find the unit whose exemplar is closest to the input by finding the unit with the maximum net input. Maxnet is used for this purpose.
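A quick sketch of this computation: with weights ej/2 and bias n/2, the net input of class unit j equals a, the number of components in which the input agrees with exemplar ej, so the largest net input identifies the closest exemplar. Here argmax stands in for the Maxnet stage; the names and toy exemplars are assumptions.

```python
import numpy as np

# Hamming net sketch: net_j = b + x.w_j = (n + x.e_j) / 2 = number of agreements.

def hamming_net(exemplars, x):
    exemplars = np.asarray(exemplars, dtype=float)
    n = exemplars.shape[1]
    net = exemplars @ x / 2 + n / 2       # weights e_j/2, bias n/2
    return net, int(np.argmax(net))       # argmax plays the role of Maxnet here

exemplars = [[1, -1, 1, -1], [1, 1, -1, -1]]
net, winner = hamming_net(exemplars, np.array([1, -1, 1, 1]))
print(net, winner)   # unit 0 agrees in 3 components, unit 1 in 1 -> winner 0
```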
Architecture
The architecture of KSOM is similar to that of the competitive network. With the help of
neighborhood schemes, discussed earlier, the training can take place over the extended
region of the network.
The following figure shows the architecture of LVQ, which is quite similar to the architecture of KSOM. As we can see, there are "n" input units and "m" output units. The layers are fully interconnected, with weights on the connections.
Parameters Used:
Following are the parameters used in the LVQ training process as well as in the flowchart:
x = training vector (x1,...,xi,...,xn)
T = class for training vector x
wj = weight vector for jth output unit
Cj = class associated with the jth output unit
Training Algorithm:
Step 1 − Initialize reference vectors, which can be done as follows −
Step 1(a) − From the given set of training vectors, take the first "m" (number of clusters) training vectors and use them as weight vectors. The remaining vectors can be used for training.
Step 1(b) − Assign the initial weights and classifications randomly.
Step 1(c) − Apply the K-means clustering method.
Step 2 − Initialize the learning rate α.
Step 3 − Continue with steps 4-9, if the condition for stopping this algorithm is not met.
Step 4 − Follow steps 5-6 for every training input vector x.
Step 5 − Calculate the square of the Euclidean distance for j = 1 to m:
D(j) = Σi=1 to n (xi − wij)2
Step 6 − Obtain the winning unit J where D(j) is minimum.
Step 7 − Calculate the new weight of the winning unit by the following relation −
if T = Cj, then wj(new) = wj(old) + α[x − wj(old)]
if T ≠ Cj, then wj(new) = wj(old) − α[x − wj(old)]
Step 8 − Reduce the learning rate α.
Step 9 − Test for the stopping condition. It may be as follows −
Maximum number of epochs reached.
Learning rate reduced to a negligible value.
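The steps above translate almost line for line into code. The following LVQ-1 sketch uses illustrative names and toy data; the epoch count and the learning-rate decay factor are assumptions.

```python
import numpy as np

# LVQ-1 sketch following the steps above: squared Euclidean distance D(j),
# winner J, and a weight update that moves wJ toward or away from x
# depending on whether the classes match.

def lvq_train(X, T, W, C, alpha=0.1, epochs=10, decay=0.9):
    W = W.astype(float).copy()
    for _ in range(epochs):                    # Step 3: repeat until stopping
        for x, t in zip(X, T):                 # Step 4: each training vector x
            d = ((x - W) ** 2).sum(axis=1)     # Step 5: D(j) = sum_i (xi - wij)^2
            J = int(np.argmin(d))              # Step 6: winning unit J
            if t == C[J]:                      # Step 7: update the winner
                W[J] += alpha * (x - W[J])
            else:
                W[J] -= alpha * (x - W[J])
        alpha *= decay                         # Step 8: reduce the learning rate
    return W

# Toy data: the first "m" = 2 vectors become the reference vectors (Step 1(a)).
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.2], [0.9, 0.8]])
T = np.array([0, 1, 0, 1])
W0, C = X[:2].copy(), T[:2].copy()
print(lvq_train(X[2:], T[2:], W0, C))
```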
COUNTER PROPAGATION NETWORKS
CPNs (Counter Propagation Networks) were proposed by Hecht-Nielsen in 1987. They are multilayer networks based on combinations of the input, clustering, and output layers.
The applications of counter propagation nets are data compression, function approximation, and pattern association.
There are two types of counter propagation nets:
1. Full counter propagation network
2. Forward-only counter propagation network
1. Full counter propagation network:
Full CPN efficiently represents a large number of vector pairs x:y by adaptively constructing a look-up table.
The full CPN works best if the inverse function exists and is continuous. The vectors x and y propagate through the network in a counterflow manner to yield the output vectors x* and y*.
Architecture of Full CPN:
The four major components of the instar-outstar model are the input layer, the instar, the
competitive layer and the outstar.
For each node in the input layer there is an input value xi. All the instars are grouped into a layer called the competitive layer.
Each instar responds maximally to a group of input vectors in a different region of space.
An outstar model has all the nodes in the output layer and a single node in the competitive layer. The outstar looks like the fan-out of a node.
Training Algorithm for Full CPN:
Step 0: Set the weights and the initial learning rate.
Step 1: Perform step 2 to 7 if stopping condition is false for phase I training.
Step 2: For each training input vector pair x:y presented, perform steps 3 to 5.
Step 3: Make the X-input layer activations to vector X. Make the Y-input layer activation to
vector Y.
Step 4: Find the winning cluster unit.
If the dot product method is used, find the cluster unit zj with the largest net input: for j = 1 to p,
zinj = Σi xi vij + Σk yk wkj     (1)
If Euclidean distance method is used, find the cluster unit zj whose squared distance
from input vectors is the smallest:
Dj = Σi (xi − vij)2 + Σk (yk − wkj)2     (2)
If there occurs a tie in case of selection of winner unit, the unit with the smallest index
is the winner. Take the winner unit index as J.
Step 5: Update the weights over the calculated winner unit zJ.
For i = 1 to n, viJ(new) = viJ(old) + α[xi − viJ(old)]     (3)
For k = 1 to m, wkJ(new) = wkJ(old) + β[yk − wkJ(old)]
Step 6: Reduce the learning rates.
α(t+1) = 0.5α(t); β(t+1) = 0.5β(t)     (4)
Step 7: Test stopping condition for phase I training.
Step 8: Perform step 9 to 15 when stopping condition is false for phase II training.
Step 9: Perform step 10 to 13 for each training input vector pair x:y. Here α and β are
small constant values.
Step 10: Make the X-input layer activations to vector x. Make the Y-input layer
activations to vector y.
Step 11: Find the winning cluster unit (Using the formula from step 4). Take the
winner unit index as J.
Step 12: Update the weights entering into unit zJ.
For i=1 to n,
viJ(new)=viJ(old) + α[xi-viJ(old)]
For k =1 to m,
wkJ(new)=wkJ(old) + β[yk-wkJ(old)]
Step 13: Update the weights from unit zJ to the output layers.
For i=1 to n, tJi(new)=tJi(old) + b[xi-tJi(old)]
For k =1 to m, uJk(new)=uJk(old) + a[yk-uJk(old)]
Step 14: Reduce the learning rates a and b.
a(t+1)=0.5a(t); b(t+1)=0.5b(t)
Step 15: Test stopping condition for phase II training.
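The two phases above can be compressed into a short sketch. It uses the Euclidean-distance winner rule of equation (2); in phase II the rates α and β are simply the small values left over from phase I, matching the remark in step 9 that they are small constants. All array names, shapes, and the toy data are assumptions.

```python
import numpy as np

# Full CPN training sketch (phases I and II above). v, w are the weights from
# the x- and y-input layers into the cluster layer; t, u are the outstar
# weights from the cluster layer back to the x* and y* output layers.

def full_cpn_train(X, Y, p, epochs=20, alpha=0.5, beta=0.5, a=0.5, b=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], Y.shape[1]
    v, w = rng.random((n, p)), rng.random((m, p))      # Step 0: initial weights
    t, u = rng.random((p, n)), rng.random((p, m))

    def winner(x, y):                                  # Steps 4 and 11, eq. (2)
        d = ((x[:, None] - v) ** 2).sum(0) + ((y[:, None] - w) ** 2).sum(0)
        return int(np.argmin(d))

    for _ in range(epochs):                            # Phase I: steps 1-7
        for x, y in zip(X, Y):
            J = winner(x, y)
            v[:, J] += alpha * (x - v[:, J])           # Step 5, eq. (3)
            w[:, J] += beta * (y - w[:, J])
        alpha *= 0.5; beta *= 0.5                      # Step 6, eq. (4)

    for _ in range(epochs):                            # Phase II: steps 8-15
        for x, y in zip(X, Y):
            J = winner(x, y)
            v[:, J] += alpha * (x - v[:, J])           # Step 12
            w[:, J] += beta * (y - w[:, J])
            t[J] += b * (x - t[J])                     # Step 13: outstar weights
            u[J] += a * (y - u[J])
        a *= 0.5; b *= 0.5                             # Step 14
    return v, w, t, u

X = np.array([[0.0, 0.0], [1.0, 1.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
v, w, t, u = full_cpn_train(X, Y, p=2)
print(t.round(2), u.round(2))   # learned outstar weights, drawn toward the stored x and y
```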
ADAPTIVE RESONANCE THEORY (ART)
ART can be of two types, the unsupervised ones (ART1, ART2, ART3, etc.) or the supervised ones (ARTMAP). Generally, the supervised algorithms are named with the suffix "MAP".
But the basic ART model is unsupervised in nature and consists of:
o The F1 layer accepts the inputs, performs some processing, and transfers them to the F2 layer unit that best matches with the classification factor.
o There exist two sets of weighted interconnection for controlling the degree of similarity
between the units in the F1 and the F2 layer.
o The F2 layer is a competitive layer. The cluster unit with the largest net input becomes the candidate to learn the input pattern first, and the rest of the F2 units are ignored.
o The reset unit decides whether or not the cluster unit is allowed to learn the input pattern, depending on how similar its top-down weight vector is to the input vector. This decision is called the vigilance test.
Thus we can say that the vigilance parameter helps to incorporate new memories or new
information. Higher vigilance produces more detailed memories, lower vigilance produces
more general memories.
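As an illustration of the vigilance test for binary (ART1-style) inputs: the candidate cluster unit J is allowed to learn only if the fraction of the input's active components that its top-down weights match reaches the vigilance parameter ρ. The sketch below uses assumed names and toy vectors.

```python
import numpy as np

# Vigilance test sketch (ART1-style, binary vectors): candidate unit J passes
# if the match ratio |x AND tJ| / |x| is at least the vigilance parameter rho.

def vigilance_test(x, t_J, rho):
    x, t_J = np.asarray(x), np.asarray(t_J)
    match = np.logical_and(x, t_J).sum() / x.sum()
    return match >= rho

x = np.array([1, 1, 0, 1])
t_J = np.array([1, 0, 0, 1])
print(vigilance_test(x, t_J, rho=0.9))   # 2/3 < 0.9 -> reset, try another unit
print(vigilance_test(x, t_J, rho=0.5))   # 2/3 >= 0.5 -> resonance, unit learns
```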
Generally, two types of learning exist: slow learning and fast learning. In fast learning, the weight update during resonance occurs rapidly; it is used in ART1. In slow learning, the weight change occurs slowly relative to the duration of the learning trial; it is used in ART2.
Advantage of Adaptive Resonance Theory (ART)
It exhibits stability and is not disturbed by a wide variety of inputs provided to its
network.
It can be integrated and used with various other techniques to give better results.
It can be used for various fields such as mobile robot control, face recognition, land
cover classification, target recognition, medical diagnosis, signature verification,
clustering web users, etc.
It has advantages over competitive learning networks (like BPNN, etc.): competitive learning lacks the capability to add new clusters when deemed necessary.
Limitations of Adaptive Resonance Theory:
Some ART networks are inconsistent (like Fuzzy ART and ART1), as they depend upon the order in which the training data are presented, or upon the learning rate.