
NEURAL NETWORKS AND DEEP LEARNING

UNIT – II (Unsupervised Learning Networks)

Unsupervised Learning Networks- Introduction, Fixed Weight Competitive Nets, Maxnet,


Hamming Network, Kohonen Self-Organizing Feature Maps, Learning Vector Quantization,
Counter Propagation Networks, Adaptive Resonance Theory Networks. Special Networks-
Introduction to various networks.

UNSUPERVISED LEARNING NETWORKS


Introduction
 Learning is fundamental to the creation of intelligence in every human being. Humans derive their
intelligence from the brain's capacity to learn from experience and to use that learning to adapt when
confronted with existing and new circumstances.
 Reproduction of human intelligence in machines and computers is the objective of artificial
intelligence techniques, one of which is an Artificial Neural Network.
 ANNs are models defined to mimic the learning capability of human brains. As with human learning,
training, validation, and testing are significant components in building such computational models.
 Artificial Neural Networks acquire knowledge from datasets (which may be labelled or unlabelled) by
computationally adjusting the network's free parameters in response to the environment, typically through
simulation.
 Based on the learning rules and training process, learning in ANNs can be sorted into supervised,
reinforcement, and unsupervised learning.
 Unsupervised learning is used when it is not feasible to augment the training data sets with class identities
(labels). This difficulty arises in situations where there is no knowledge of the system, or the cost
of obtaining such knowledge is too high.
 In unsupervised learning, as its name suggests, the ANN is not under the guidance of a "teacher."
Instead, it is provided with unlabelled data sets (contains only the input data) and left to discover the
patterns in the data and build a new model from it. In this situation, ANN figures out how to arrange
the data by exploiting the separation between clusters within it.

Fixed Weight Competitive Nets
 During the training process, the weights remain fixed in these competitive networks. The idea of
competition is used among neurons for enhancement of contrast in their activation functions.
 There are two such networks: the Maxnet and the Hamming network.

MAXNET
 Maxnet network was developed by Lippmann in 1987.
 The Maxnet serves as a subnet for picking the node whose net input is the largest.
 All the nodes present in this subnet are fully interconnected and there exist symmetrical weights in
all these weighted interconnections.

Architecture of Maxnet
 In the architecture of Maxnet, fixed symmetrical weights are present over the weighted
interconnections. The weights between the neurons are inhibitory and fixed.
 The Maxnet with this structure can be used as a subnet to select a particular node whose net input is
the largest.

Testing Algorithm of Maxnet


 The Maxnet uses the following activation function:
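In the standard formulation of Maxnet (following Lippmann), this activation is the positive-ramp function:

```latex
f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases}
```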

Testing algorithm
Step – 0: Initial weights and initial activations are set. The mutual inhibition weight ε is chosen so that
0 < ε < 1/m, where "m" is the total number of nodes. Let

Xj(0) = input to the node Xj
and
wij = 1 if i = j, and wij = −ε if i ≠ j (the fixed symmetrical weights of the subnet).

Step – 1: Perform Steps 2-4, when stopping condition is false.


Step – 2: Update the activations of each node. For j = 1 to m,

Step – 3: Save the activations obtained for use in the next iteration. For j = 1 to m,

Step – 4: Finally, test the stopping condition for convergence of the network. The following is the
stopping condition: If more than one node has a nonzero activation, continue; else stop.
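
The following is a minimal Python sketch of this testing algorithm, assuming the standard Maxnet weight setting (self-weight 1, mutual inhibition −ε) and the ramp activation given above; the function and variable names are illustrative, not part of any library.

```python
import numpy as np

def maxnet(x0, epsilon=None, max_iter=100):
    """Iterate the Maxnet until at most one node has a nonzero activation.

    x0      : initial activations (net inputs to the m nodes)
    epsilon : mutual inhibition weight, 0 < epsilon < 1/m
    """
    x = np.asarray(x0, dtype=float)
    m = len(x)
    if epsilon is None:
        epsilon = 0.5 / m                      # any value in (0, 1/m)

    f = lambda a: np.maximum(a, 0.0)           # activation: f(a) = a if a >= 0 else 0

    for _ in range(max_iter):                  # Step 1: repeat while stopping condition is false
        # Step 2: each node keeps its own activation and is inhibited by all the others
        x = f(x - epsilon * (x.sum() - x))     # Step 3: these become the activations of the next iteration
        if np.count_nonzero(x) <= 1:           # Step 4: stop when at most one node is nonzero
            break
    return x

# Example: node 2 (initial activation 0.9) survives the competition
print(maxnet([0.5, 0.7, 0.9, 0.3]))
```

Each iteration subtracts ε times the total activation of the competitors from every node, so the weaker nodes are driven to zero first and only the node with the largest initial input survives.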

Hamming Network
 The Hamming network is a two-layer feedforward neural network for classification of binary bipolar
n-tuple input vectors using the minimum Hamming distance, denoted as DH (Lippmann, 1987).
 The first layer is the input layer for the n-tuple input vectors.
 The second layer (also called the memory layer) stores p memory patterns.
 A p-class Hamming network has p output neurons in this layer. The strongest response of a neuron is
indicative of the minimum Hamming distance between the stored pattern and the input vector.

Hamming Distance
 For two bipolar vectors x and y of dimension n, the dot product can be written as
x·y = a − d
where a is the number of bits in agreement in x and y (no. of similar bits), and d is the number of
bits different in x and y (no. of dissimilar bits). The Hamming distance between x and y is d, the number
of components in which they differ.
 Since the total number of components is n, we have
n = a + d
i.e., d = n − a

 On simplification, we get
x·y = a − (n − a) = 2a − n, i.e., a = ½(x·y) + ½n
 From the above equation, it is clearly understood that the weights can be set to one-half the exemplar
vector and the bias can be set initially to n/2, so that the net input to each output unit equals a, the number
of components in which the input agrees with the corresponding stored exemplar.

Testing Algorithm of Hamming Network


Step – 0: Initialize the weights. For i = 1 to n and j = 1 to m,

Initialize the bias for storing the "m" exemplar vectors. For j = 1 to m,

Step – 1: Perform Steps 2-4 for each input vector x.


Step – 2: Calculate the net input to each unit Yj, i.e.,

Step – 3: Initialize the activations for Maxnet, i.e.,

Step – 4: Maxnet iterates to find the exemplar that best matches the input pattern.
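
A compact Python sketch of this testing procedure is shown below, assuming bipolar (±1) exemplars and the weight/bias setting derived earlier (weights equal to half the exemplars, bias n/2); in the full network the resulting net inputs would initialise the Maxnet, which is abbreviated here with argmax.

```python
import numpy as np

def hamming_net(exemplars, x):
    """Return the net inputs and the index of the closest stored exemplar.

    exemplars : p x n array of stored bipolar (+1/-1) patterns
    x         : bipolar input vector of length n
    """
    E = np.asarray(exemplars, dtype=float)
    x = np.asarray(x, dtype=float)
    n = E.shape[1]

    W = E.T / 2.0          # Step 0: w_ij = e_j(i)/2  (one-half the exemplar)
    b = n / 2.0            #         b_j  = n/2
    y_in = b + x @ W       # Step 2: net input = a_j, the number of agreeing components

    # Steps 3-4: in the full network these net inputs initialise a Maxnet, which
    # iterates until only the best-matching exemplar stays active; argmax gives
    # the same winner directly.
    return y_in, int(np.argmax(y_in))

exemplars = [[1, -1, -1, -1],
             [-1, -1, -1, 1]]
print(hamming_net(exemplars, [-1, -1, 1, 1]))   # second exemplar wins (3 agreements)
```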

KOHONEN SELF-ORGANIZING FEATURE MAPS


 The Self-Organizing Feature Map (SOM) was developed by Dr. Teuvo Kohonen in 1982. The Kohonen Self-
Organizing Feature Map (KSOM) is a neural network that is trained using competitive
learning.

 Suppose we have patterns of arbitrary dimensionality but need to represent them in one or two
dimensions. The process of feature mapping converts this wide pattern space into a typical feature space.
 Why do we require a self-organizing feature map? Because, along with the capability to convert the
arbitrary dimensions into 1-D or 2-D, it must also have the ability to preserve the neighbour topology.
Neighbour Topologies in Kohonen SOM
 There can be various topologies, however the following two topologies are used the most –
Rectangular Grid Topology
o In this topology, the ring of nodes immediately around the winning unit contains 8 nodes, the next
ring contains 16 nodes, and the ring after that contains 24 nodes; each successive rectangular ring
therefore adds 8 nodes. The winning unit is indicated by #.

Hexagonal Grid Topology


o In this topology, the ring of nodes immediately around the winning unit contains 6 nodes, the next
ring contains 12 nodes, and the ring after that contains 18 nodes; each successive hexagonal ring
therefore adds 6 nodes. The winning unit is indicated by #.

 Basic competitive learning implies that the competition process takes place before the cycle of
learning. The competition process suggests that some criteria select a winning processing element.
 After the winning processing element is selected, its weight vector is adjusted according to the used
learning law.
 Feature mapping is a process which converts patterns of arbitrary dimensionality into the response of a
one- or two-dimensional array of neurons.
 The network performing such a mapping is called a feature map. Along with reducing the higher
dimensionality, it preserves the neighbour topology.

Training Algorithm
Step – 0: Initialize the weights with Random values and the learning rate
Step – 1: Perform Steps 2-8 when stopping condition is false.
Step – 2: Perform Steps 3-5 for each input vector x.
Step – 3: Compute the square of the Euclidean distance, i.e., for each j = 1 to m,

Step – 4: Find the winning unit index J, so that D(J) is minimum.


Step – 5: For all units j within a specific neighbourhood of J and for all i, calculate the new
weights:

Step – 6: Update the learning rate α using the formula (t is the time step)

Step – 7: Reduce radius of topological neighbourhood at specified time intervals.
Step – 8: Test for stopping condition of the network.
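
A minimal Python sketch of Steps 0-8 follows. The 1-D line of units, the geometric learning-rate decay, and the fixed schedule for shrinking the radius are illustrative assumptions; the algorithm itself does not prescribe particular choices for them.

```python
import numpy as np

def train_som(data, m=10, alpha=0.5, radius=2, epochs=50, seed=0):
    """Kohonen SOM on a 1-D line of m units, for n-dimensional input vectors."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.shape[1]
    W = rng.random((m, n))                         # Step 0: random initial weights

    for t in range(epochs):                        # Step 1: until stopping condition
        for x in data:                             # Step 2: for each input vector x
            D = ((W - x) ** 2).sum(axis=1)         # Step 3: squared Euclidean distances D(j)
            J = int(np.argmin(D))                  # Step 4: winning unit index J
            lo, hi = max(0, J - radius), min(m, J + radius + 1)
            W[lo:hi] += alpha * (x - W[lo:hi])     # Step 5: update winner and its neighbours
        alpha *= 0.9                               # Step 6: decay the learning rate
        if t > 0 and t % 10 == 0 and radius > 0:
            radius -= 1                            # Step 7: shrink the neighbourhood radius
    return W                                       # Step 8: stop after a fixed number of epochs
```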

LEARNING VECTOR QUANTIZATION


 In 1980, the Finnish professor Teuvo Kohonen observed that some areas of the brain develop structures
with different regions, each of them highly sensitive to a specific input pattern. LVQ is based on
competition among neural units, using a principle called winner-takes-all.
 Learning Vector Quantization (LVQ) is a prototype-based supervised classification algorithm. A
prototype is an early sample, model, or release of a product built to test a concept or process. One or
more prototypes are used to represent each class in the dataset. New (unknown) data points are then
assigned the class of the prototype that is nearest to them.
 In order for "nearest" to make sense, a distance measure has to be defined. There is no limitation on
how many prototypes can be used per class, the only requirement being that there is at least one
prototype for each class.
 LVQ is a special case of an artificial neural network and it applies a winner-take-all Hebbian
learning-based approach. With a small difference, it is similar to Self-Organizing Maps (SOM)
algorithm. SOM and LVQ were invented by Teuvo Kohonen.
 An LVQ system is represented by prototypes W = (W1, ..., Wn). In winner-take-all training algorithms, the
winner is moved closer if it correctly classifies the data point, or moved away if it classifies the data
point incorrectly. An advantage of LVQ is that it creates prototypes that are easy to interpret for
experts in the respective application domain.

Training Algorithm
Step – 0: Initialize the reference vectors. This can be done in any of the following ways:
o From the given set of training vectors, take the first "m" (number of clusters) training
vectors and use them as weight vectors; the remaining vectors can then be used for training.
o Assign the initial weights and classifications randomly.
o Use the K-means clustering method.
Also set the initial learning rate α.
Step – 1: Perform Steps 2-6 if the stopping condition is false.

Step – 2: Perform Steps 3-4 for each training input vector x
Step – 3: Calculate the Euclidean distance; for i = 1 to n, j = 1 to m,

Find the winning unit index J, when D(J) is minimum


Step – 4: Update the weights on the winning unit, Wj using the following conditions.

Step – 5: Reduce the learning rate α


Step – 6: Test for the stopping condition of the training process. (The stopping condition may
be a fixed number of epochs or the learning rate reducing to a negligible value.)
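
The sketch below illustrates the LVQ1 update of Steps 2-5 in Python, assuming one reference vector per class initialised from the first training example of that class; the names and the decay schedule are illustrative.

```python
import numpy as np

def train_lvq(X, labels, alpha=0.1, epochs=20):
    """LVQ1: one reference (weight) vector per class, initialised from the data."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Step 0: the first vector of each class becomes that class's reference vector
    W = np.stack([X[labels == c][0] for c in classes])

    for _ in range(epochs):                              # Step 1: until stopping condition
        for x, c in zip(X, labels):                      # Step 2: for each training vector x
            D = ((W - x) ** 2).sum(axis=1)               # Step 3: Euclidean distances
            J = int(np.argmin(D))                        # winning unit index J
            if classes[J] == c:                          # Step 4: correct class -> move closer
                W[J] += alpha * (x - W[J])
            else:                                        #         wrong class -> move away
                W[J] -= alpha * (x - W[J])
        alpha *= 0.95                                    # Step 5: reduce the learning rate
    return classes, W                                    # Step 6: stop after a fixed number of epochs
```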

COUNTER PROPAGATION NETWORK


 Counter propagation networks (CPN) were proposed by Hecht-Nielsen in 1987. They are multilayer
networks based on combinations of the input, output, and clustering layers.
 The applications of counter propagation nets are data compression, function approximation and pattern
association. The counter propagation network is basically constructed from an instar-outstar model.
 This model is a three-layer neural network that performs input-output data mapping, producing an
output vector y in response to an input vector x, on the basis of competitive learning.
 The three layers in an instar-outstar model are the input layer, the hidden (competitive) layer and the
output layer.
 There are two stages involved in the training process of a counter propagation net. The input vectors
are clustered in the first stage.
 In the second stage of training, the weights from the cluster layer units to the output units are tuned
to obtain the desired response.
 There are two types of counter propagation network:
1. Full counter propagation network
2. Forward-only counter propagation network
1. Full counter propagation network
 Full CPN efficiently represents a large number of vector pairs x:y by adaptively constructing a look-
up table.
 The full CPN works best if the inverse function exists and is continuous. The vectors x and y
propagate through the network in a counter-flow manner to yield the output vectors x* and y*.

Architecture of Full Counter propagation Network


 The four major components of the instar-outstar model are the input layer, the instar, the
competitive layer and the outstar.
 For each node in the input layer there is an input value xi. All the instars are grouped into a
layer called the competitive layer. Each of the instars responds maximally to a group of input
vectors in a different region of space. An outstar model has all the nodes in the
output layer and a single node in the competitive layer. The outstar looks like the fan-out of a
node.

Training Algorithm for Full Counter propagation Network:


Step – 0: Set the initial weights and the initial learning rate.
Step – 1: Perform Steps 2-7 if stopping condition is false for phase-I training.
Step – 2: For each of the training input vector pair x: y presented, perform Steps 3-5.
Step – 3: Set the X-input layer activations to vector x. Set the Y-input layer activations
to vector y.
Step – 4: Find the winning cluster unit. If the dot product method is used, find the cluster unit Zj
with the largest net input: for j = 1 to p.

If the Euclidean distance method is used, find the cluster unit Zj whose squared distance
from the input vectors is the smallest

If there occurs a tie in case of selection of winner unit, the unit with the smallest
index is the winner. Take the winner unit index as J.
Step – 5: Update the weights over the calculated winner unit Zj.
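In the standard full-CPN notation (as in Fausett), the phase-I updates for the winning cluster unit ZJ take the following form, where v are the weights from the X-input layer to the cluster layer and w are the weights from the Y-input layer to the cluster layer:

```latex
v_{iJ}^{\text{new}} = v_{iJ}^{\text{old}} + \alpha\,[x_i - v_{iJ}^{\text{old}}], \quad i = 1,\dots,n; \qquad
w_{kJ}^{\text{new}} = w_{kJ}^{\text{old}} + \beta\,[y_k - w_{kJ}^{\text{old}}], \quad k = 1,\dots,m
```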

Step – 6: Reduce the learning rates α and β

Step – 7: Test stopping condition for phase-I training.


Step – 8: Perform Steps 9-15 when stopping condition is false for phase-II training.
Step – 9: Perform Steps 10-13 for each training input pair x:y. Here α and β are small constant
values.
Step – 10: Set the X-input layer activations to vector x. Set the Y-input layer activations to
vector y.
Step – 11: Find the winning cluster unit (use formulas from Step 4). Take the winner unit index
as J.
Step – 12: Update the weights entering into unit ZJ.

Step – 13: Update the weights from unit Zj to the output layers.
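In the same standard notation, Step 12 repeats the phase-I updates with the small constants α and β, and Step 13 adjusts the outstar weights from ZJ to the two output layers, with a and b the outstar learning rates (u leading to the Y-output layer, t leading to the X-output layer):

```latex
u_{Jk}^{\text{new}} = u_{Jk}^{\text{old}} + a\,[y_k - u_{Jk}^{\text{old}}], \qquad
t_{Ji}^{\text{new}} = t_{Ji}^{\text{old}} + b\,[x_i - t_{Ji}^{\text{old}}]
```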

Step – 14: Reduce the learning rates a and b.

Step – 15: Test stopping condition for phase-II training.

2. Forward-only Counter propagation network:
 A simplified version of the full CPN is the forward-only CPN. The forward-only CPN uses only the x
vector to form the clusters on the Kohonen units during phase-I training. In the forward-only
CPN, the input vectors are first presented to the input units.
 First, the weights between the input layer and cluster layer are trained. Then the weights between
the cluster layer and output layer are trained.
 This is a special competitive network in which the targets are known.
Architecture of forward-only CPN
o It consists of three layers: input layer, cluster layer and output layer. Its architecture
resembles the back-propagation network, but in CPN there exist interconnections
between the units in the cluster layer.
Training Algorithm for Forward-only Counter propagation network:
Step – 0: Initialize the weights and the learning rate.
Step – 1: Perform Steps 2-7 if stopping condition is false for phase-I training.
Step – 2: Perform Steps 3-5 for each of training input X.
Step – 3: Set the X-input layer activations to vector X.
Step – 4: Compute the winning cluster unit (J). If the dot product method is used, find the cluster
unit Zj with the largest net input.

If the Euclidean distance method is used, find the cluster unit Zj whose squared distance
from the input patterns is the smallest

If there exists a tie in the selection of winner unit, the unit with the smallest index is
chosen as the winner.
Step – 5: Perform weight update for unit Zj. For i= 1 to n,

Step – 6: Reduce the learning rate α

Step – 7: Test stopping condition for phase-I training.
Step – 8: Perform Steps 9-15 when stopping condition is false for phase-II training.
Step – 9: Perform Steps 10-13 for each training input Pair x:y.
Step – 10: Set X-input layer activations to vector X. Set Y-output layer activations to vector Y.
Step – 11: Find the winning cluster unit (use formulas from Step 4). Take the winner unit index
as J.
Step – 12: Update the weights entering into unit ZJ,

Step – 13: Update the weights from unit Zj to the output layers.

Step – 14: Reduce the learning rate β.

Step – 15: Test stopping condition for phase-II training.
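
A condensed Python sketch of the two training phases of the forward-only CPN is given below; using a plain argmin winner instead of an explicit competitive layer, and the particular decay schedules, are simplifying assumptions of this illustration.

```python
import numpy as np

def train_forward_only_cpn(X, Y, p=5, alpha=0.3, beta=0.1, epochs1=30, epochs2=30, seed=0):
    """Forward-only CPN: X -> cluster layer (p units) -> output layer."""
    rng = np.random.default_rng(seed)
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    V = rng.random((p, X.shape[1]))            # input-to-cluster weights
    W = rng.random((p, Y.shape[1]))            # cluster-to-output weights

    for _ in range(epochs1):                   # Phase I: cluster the input vectors
        for x in X:
            J = int(np.argmin(((V - x) ** 2).sum(axis=1)))   # winning cluster unit
            V[J] += alpha * (x - V[J])                       # Kohonen-style update
        alpha *= 0.95

    for _ in range(epochs2):                   # Phase II: learn cluster-to-output weights
        for x, y in zip(X, Y):
            J = int(np.argmin(((V - x) ** 2).sum(axis=1)))
            W[J] += beta * (y - W[J])                        # outstar (Grossberg) update
        beta *= 0.95
    return V, W

def cpn_predict(V, W, x):
    """Map an input to the output vector stored at its winning cluster unit."""
    J = int(np.argmin(((V - np.asarray(x, float)) ** 2).sum(axis=1)))
    return W[J]
```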

ADAPTIVE RESONANCE THEORY NETWORKS


This network was developed by Stephen Grossberg and Gail Carpenter in 1987. It is based on competition
and uses an unsupervised learning model. Adaptive Resonance Theory (ART) networks, as the name suggests,
are always open to new learning (adaptive) without losing the old patterns (resonance). Basically, an ART
network is a vector classifier which accepts an input vector and classifies it into one of the categories
depending upon which of the stored patterns it resembles the most.
The basic ART uses an unsupervised learning technique. The terms “adaptive” and “resonance” suggest that
these networks are open to new learning (i.e., adaptive) without discarding the previous or old
information (i.e., resonance). ART networks are known to solve the stability-plasticity dilemma:
stability refers to their nature of memorizing what has been learned, and plasticity refers to the fact that
they are flexible enough to gain new information. Due to this nature, ART networks are always able to learn
new input patterns without forgetting the past. ART networks implement a clustering algorithm: input is
presented to the network and the algorithm checks whether it fits into one of the already stored clusters.
If it fits, the input is added to the cluster that matches it most closely; otherwise a new cluster is formed.
Types of Adaptive Resonance Theory (ART)
Carpenter and Grossberg developed different ART architectures as a result of 20 years of research. The ARTs
can be classified as follows:

 ART1 – It is the simplest and the basic ART architecture. It is capable of clustering binary
input values.
 ART2 – It is an extension of ART1 that is capable of clustering continuous-valued input data.
 Fuzzy ART – It is the augmentation of fuzzy logic and ART.
 ARTMAP – It is a supervised form of ART learning where one ART learns based on the
previous ART module. It is also known as predictive ART.
 FARTMAP – This is a supervised ART architecture with Fuzzy logic included.

Operating Principle

The main operation of ART classification can be divided into the following phases −

 Recognition phase − The input vector is compared with the classification presented at every node in
the output layer. The output of the neuron becomes “1” if it best matches with the classification applied,
otherwise it becomes “0”.
 Comparison phase − In this phase, a comparison of the input vector with the comparison layer vector is
done. The condition for reset is that the degree of similarity is less than the vigilance parameter.
 Search phase − In this phase, the network searches for a reset as well as for the match obtained in the above
phases. Hence, if there is no reset and the match is quite good, the classification is over.
Otherwise, the process is repeated and the other stored patterns must be tried to find the correct
match.
Basics of Adaptive Resonance Theory (ART) Architecture
The adaptive resonance theory network is a type of neural network that is self-organizing and competitive. It
can be of either type, unsupervised (ART1, ART2, ART3, etc.) or supervised (ARTMAP).
Generally, the supervised algorithms are named with the suffix “MAP”. The basic ART model is
unsupervised in nature and consists of the following two units −


Computational Unit − It is made up of the following −
o Input unit (F1 layer) − It further has the following two portions –
 F1(a) layer (input portion) − In ART1, there is no processing in this portion other than
receiving the input vectors. It is connected to the F1(b) layer (interface portion).

 F1(b) layer (interface portion) − This portion combines the signal from the input portion with
that of the F2 layer. The F1(b) layer is connected to the F2 layer through bottom-up weights bij, and the F2
layer is connected to the F1(b) layer through top-down weights tji.

o Cluster Unit (F2 layer) − This is a competitive layer. The unit having the largest net input is
selected to learn the input pattern. The activations of all other cluster units are set to 0.
o Reset Mechanism − The work of this mechanism is based upon the similarity between the top-down
weight and the input vector. If the degree of this similarity is less than the vigilance parameter,
then the cluster is not allowed to learn the pattern and a reset occurs.
The F1 layer accepts the inputs, performs some processing, and transfers them to the F2 layer that best
matches with the classification factor. There exist two sets of weighted interconnections for controlling
the degree of similarity between the units in the F1 and the F2 layer. The F2 layer is a competitive layer:
the cluster unit with the largest net input becomes the candidate to learn the input pattern, and the
remaining F2 units are ignored. The reset unit decides whether or not the cluster unit is allowed to learn
the input pattern depending on how similar its top-down weight vector is to the input vector. This is
called the vigilance test. Thus, we can say that the vigilance parameter helps to incorporate new memories
or new information: higher vigilance produces more detailed memories, lower vigilance produces more
general memories.
Supplemental Units − The issue with the reset mechanism is that the F2 layer must be inhibited
under certain conditions and must also be available when some learning happens. That is why two
supplemental units, namely G1 and G2, are added along with the reset unit R. They are called gain control units.
These units receive and send signals to the other units present in the network. ‘+’ indicates an excitatory
signal, while ‘−’ indicates an inhibitory signal.

Parameters Used

Following parameters are used −

 n − Number of components in the input vector


 m − Maximum number of clusters that can be formed
 bij − Weight from F1(b) to F2 layer, i.e., bottom-up weights
 tji − Weight from F2 to F1(b) layer, i.e., top-down weights
 ρ − Vigilance parameter
 ||x|| − Norm of vector x

Training Algorithm for ART1:


Step – 1: Initialize the learning rate, the vigilance parameter, and the weights as follows –

Step – 2: Continue Steps 3-9 while the stopping condition is not true.
Step – 3: Continue Steps 4-6 for every training input.
Step – 4: Set activations of all F2 units and F1(a) units as follows (F2 units to zero and F1(a) units to the input vector s)

Step – 5: Send the input signal from the F1(a) layer to the F1(b) layer

Step – 6: For every F2 node that is not inhibited (i.e., yj ≠ −1), compute the net input

Step – 7: Perform Steps 8-10 while reset is true.
Step – 8: Find J such that yJ ≥ yj for all nodes j.
Step – 9: Again calculate the activations on F1(b) as follows

Step – 10: Now, after calculating the norm of vector x and vector s, we need to check the reset
condition as follows –
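
In the usual ART1 notation, with s the signal from the F1(a) layer, x the F1(b) activation computed in Step 9 and ρ the vigilance parameter, this reset test is commonly written as:

```latex
\frac{\lVert x \rVert}{\lVert s \rVert} < \rho \;\Rightarrow\; y_J = -1 \ (\text{reset: inhibit unit } J \text{ and search again}); \qquad
\frac{\lVert x \rVert}{\lVert s \rVert} \ge \rho \;\Rightarrow\; \text{resonance, proceed to the weight update of Step 11}
```

For binary vectors, ||x|| is simply the number of 1s in x.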

Step – 11: Weight updating for node J can be done as follows –

Step – 12: The stopping condition for algorithm must be checked and it may be as follows –

 No change in the weights.
 No reset is performed for any unit.
 Maximum number of epochs reached.
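
The compact Python sketch below ties the above steps together for fast-learning ART1. The choice L = 2, the initial bottom-up weights 1/(1+n) and the handling of the case where every cluster unit is inhibited are assumptions of this illustration, not fixed by the algorithm statement above.

```python
import numpy as np

def train_art1(data, m=10, rho=0.7, L=2.0, epochs=5):
    """Fast-learning ART1 clustering of binary vectors (rows of `data`)."""
    data = np.asarray(data, dtype=float)
    n = data.shape[1]
    b = np.full((n, m), 1.0 / (1.0 + n))       # bottom-up weights b_ij
    t = np.ones((m, n))                        # top-down weights t_ji

    for _ in range(epochs):
        for s in data:
            norm_s = s.sum()                   # ||s|| = number of 1s in the input
            if norm_s == 0:
                continue
            inhibited = np.zeros(m, dtype=bool)
            while True:
                y = s @ b                      # net input to each F2 unit
                y[inhibited] = -1.0
                J = int(np.argmax(y))          # winning (uninhibited) cluster unit
                x = s * t[J]                   # F1(b) activation: s AND t_J
                if x.sum() / norm_s >= rho:    # vigilance test passed -> resonance
                    b[:, J] = L * x / (L - 1.0 + x.sum())   # fast-learning update
                    t[J] = x
                    break
                inhibited[J] = True            # reset: inhibit J and search again
                if inhibited.all():            # no unit matches; in practice a new
                    break                      # cluster unit would be recruited here
    return b, t
```

Each row of t then holds the binary prototype of one learned cluster; raising rho forces an input to match a prototype more exactly before joining it, so more clusters are formed.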

Advantages of Adaptive Resonance Theory (ART)


 It exhibits stability and is not disturbed by a wide variety of inputs provided to its network.
 It can be integrated and used with various other techniques to give better results.
 It can be used for various fields such as mobile robot control, face recognition, land cover
classification, target recognition, medical diagnosis, signature verification, clustering web users,
etc.
 It has advantages over competitive learning networks (such as BPNN, etc.): competitive learning lacks the
capability to add new clusters when deemed necessary and does not guarantee the stability of the formed
clusters.
Limitations of Adaptive Resonance Theory
 Some ART networks are inconsistent (like the Fuzzy ART and ART1) as they depend upon the
order of training data, or upon the learning rate.
