
Unit - II

RECURRENT NEURAL NETWORK


Associative Memory
Networks
Associative Memory Networks
• An associative memory network can store a set of patterns as memories.
• When the associative memory is being presented with a key pattern, it responds
by producing one of the stored patterns, which closely resembles or relates to the
key pattern.
• Each association is an input-output vector pair, s:t, where s is the input vector and t is the associated output vector.
• If each vector t is the same as the vector s with which it is associated, the network is an Auto Associative Memory.
• If the vectors t are different from the vectors s, the network is a Hetero Associative Memory.
Associative Memory Networks
• Both these nets are single-layer nets in which the weights are determined so that the net stores a set of pattern associations.

• Training Algorithm: Hebb rule or outer product rule
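The rule itself is not spelled out in these slides; in its standard form (stated here as an assumption), the outer product rule builds the weight matrix from the P training pairs s(p):t(p) as

wij = ∑p si(p).tj(p),  for p = 1 to P

so each association simply adds the outer product of its input and output vectors to W.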


Auto Associative Memory
• In an auto associative memory neural network, the input and target vectors are the same. Determining the weights of the association net is called storing the vectors.

• The network performance is based on its ability to reproduce a stored pattern from a noisy input.

• In an auto associative network, the weights on the diagonal can be set to zero (no self-connections).
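A minimal NumPy sketch (illustrative data and function names, not taken from the slides) of storing bipolar patterns with the outer product rule, zeroing the diagonal as noted above, and recalling from a noisy key:

```python
import numpy as np

def store_auto(patterns):
    """Outer-product (Hebb) storage for bipolar patterns; diagonal set to zero."""
    W = sum(np.outer(s, s) for s in patterns).astype(float)
    np.fill_diagonal(W, 0)               # no self-connections
    return W

def recall_auto(W, x):
    """One-pass recall: threshold the net input back to bipolar values."""
    return np.where(W @ x >= 0, 1, -1)

patterns = np.array([[1, 1, -1, -1, 1, -1],
                     [-1, 1, 1, -1, -1, 1]])
W = store_auto(patterns)
noisy_key = np.array([-1, 1, -1, -1, 1, -1])  # first pattern with its first bit flipped
print(recall_auto(W, noisy_key))              # recovers [ 1  1 -1 -1  1 -1]
```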
Auto Associative Memory
• Retrieves the previously stored pattern that most closely resembles the current pattern.

Example: a (possibly noisy) pattern 'A' presented to the MEMORY recalls the stored 'A'.
Testing Algorithm
Hetero Associative Memory
• In a hetero-associative neural net, the training input and the target output vectors are different.

• The weights are determined in a way that the net can store a set of pattern
associations.
Hetero Associative Memory
• The retrieved pattern is generally different from the input pattern (not only in content but also in type and format).

Example: the key 'Tehran' presented to the MEMORY recalls the associated pattern 'City'.


Testing Algorithm

• Step 0: Initialize the weights from the training algorithm.

• Step 1: Perform Steps 2-4 for each input vector presented.

• Step 2: Set the activation of the input layer units equal to that of the current input vector, xi.
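A minimal NumPy sketch of hetero-associative storage and recall (illustrative data and names; the thresholded net input stands in for the activation steps that follow Step 2):

```python
import numpy as np

def store_hetero(S, T):
    """Outer-product storage of input/target pairs s:t."""
    return sum(np.outer(s, t) for s, t in zip(S, T)).astype(float)

def recall_hetero(W, x):
    """Present a key x and threshold the net input to recover the target."""
    return np.where(x @ W >= 0, 1, -1)

S = np.array([[1, -1, -1, -1], [-1, -1, -1, 1]])   # keys
T = np.array([[1, -1], [-1, 1]])                   # associated targets
W = store_hetero(S, T)
print(recall_hetero(W, S[0]))                      # recovers [ 1 -1]
```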
Hopfield neural network
Hopfield neural network
• Hopfield neural network was invented by Dr. John J. Hopfield in 1982.

• It consists of a single layer which contains one or more fully connected recurrent neurons.

• The Hopfield network is commonly used for auto-association and optimization tasks.
Discrete Hopfield Network
• A discrete Hopfield network operates in a discrete fashion: the input and output patterns are discrete vectors, either binary (0, 1) or bipolar (+1, -1) in nature.

• The network has symmetrical weights with no self-connections, i.e., wij = wji and wii = 0.
Discrete Hopfield Network - Architecture
• This model consists of neurons with one inverting and one non-inverting output.

• The output of each neuron should be an input to the other neurons but not an input to itself.

• Weight/connection strength is represented by wij.

• Connections can be excitatory as well as inhibitory: a connection is excitatory if the output of the neuron is the same as the input, otherwise inhibitory.

• Weights should be symmetrical, i.e. wij = wji


Discrete Hopfield Network
Analysis of Energy Function and Storage Capacity of the Discrete Hopfield Network

An energy function is generally defined as a function that is bounded and is a non-increasing function of the state of the system. If the network is stable, the energy function decreases whenever the state of any node changes.
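The formula itself is not reproduced in these slides; a standard textbook form of the discrete Hopfield energy function (stated here as an assumption, with yi the unit outputs, xi the external inputs, and θi the thresholds) is

E = -(1/2) ∑i ∑j≠i wij.yi.yj - ∑i xi.yi + ∑i θi.yi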
Continuous Hopfield
Network
Continuous Hopfield Network
A discrete Hopfield net can be modified into a continuous model, in which time is assumed to be a continuous variable, and can be used for associative memory problems or for optimization problems such as the traveling salesman problem. The nodes of this network have a continuous, graded output rather than a two-state binary output. Thus, the energy of the network decreases continuously with time. A continuous Hopfield network can be realized as an electronic circuit that uses non-linear amplifiers and resistors. This helps in building the Hopfield network with analog VLSI technology.
Hardware Model of Continuous Hopfield Network
Hopfield Network - Limitations

• A major disadvantage of the Hopfield network is that it can settle in a local minimum state instead of the global minimum energy state, thus associating a new input pattern with a spurious state.

• The memory capacity of the Hopfield network is severely limited. Catastrophic forgetting may occur if one tries to memorize more patterns than the network can hold.
Hopfield Network - Applications
• Pattern retrieval

• Pattern restoration

• Pattern completion

• Pattern generalization

• Pattern association

• Solving optimization problems

This network acts like a CAM (content-addressable memory); it is capable of recalling a pattern from the stored memory even if its noisy or partial form is given to the model.
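A small NumPy sketch of this CAM behaviour (illustrative patterns; the asynchronous update order and number of sweeps are assumptions): the net recalls a stored pattern from a noisy key, and the energy defined earlier never increases along the way.

```python
import numpy as np

def energy(W, y):
    """Discrete Hopfield energy (no external inputs or thresholds here)."""
    return -0.5 * y @ W @ y

patterns = np.array([[1, 1, -1, -1, 1, -1],
                     [-1, 1, 1, -1, -1, 1]])
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)                      # symmetric weights, wii = 0

y = np.array([-1, 1, -1, -1, 1, -1])        # first pattern with one bit flipped
rng = np.random.default_rng(0)
for _ in range(3):                          # a few asynchronous sweeps
    for i in rng.permutation(len(y)):
        y[i] = 1 if W[i] @ y >= 0 else -1
    print(energy(W, y))                     # non-increasing as the state settles
print(y)                                    # recalls [ 1  1 -1 -1  1 -1]
```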
Boltzmann Machine
Boltzmann Machine
• When the simulated annealing process is applied to the discrete Hopfield network, it becomes a Boltzmann machine. When the Boltzmann machine is applied to a constrained optimization problem, the weights represent the constraints of the problem and the quantity to be optimized.

• The deterministic local-search dynamics of the Hopfield network is replaced by randomized local-search dynamics.

• The Boltzmann machine uses a stochastic learning algorithm.

• Connections are bidirectional.


Boltzmann Machine
• The Boltzmann machine consists of a set of units (Xi and Xj) and a set of bidirectional connections between pairs of units. This machine can be used as an associative memory. The weights of a Boltzmann machine are fixed; hence there is no specific training algorithm for updating the weights. With the weights remaining fixed, the net makes its transitions toward a maximum of the Consensus Function (CF).
Boltzmann Machine
• Boltzmann machines are typically used to solve computational problems in which the weights on the connections can be fixed and used to represent the cost function of an optimization problem, such as a search problem.

It is assumed that the units are arranged in a two-dimensional array, giving n^2 units. The units within each row and column are fully interconnected. The weights on these interconnections are -p, where p > 0. There also exists a self-connection for each unit, with weight b > 0.
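A minimal sketch (an assumption, not from the slides) of the randomized local search such a machine performs: a randomly chosen binary unit is flipped with probability 1/(1 + exp(-ΔCF/T)), where ΔCF is the change in the consensus function and T is a temperature that is gradually lowered. The two-unit weight matrix below merely illustrates the b and -p pattern described above.

```python
import numpy as np

def boltzmann_step(W, x, T, rng):
    """One stochastic update: pick a unit and accept its flip with
    probability 1 / (1 + exp(-delta_CF / T))."""
    i = rng.integers(len(x))
    # Change in the consensus function if binary unit i flips (0 -> 1 or 1 -> 0)
    delta = (1 - 2 * x[i]) * (W[i, i] + W[i] @ x - W[i, i] * x[i])
    if rng.random() < 1.0 / (1.0 + np.exp(-delta / T)):
        x[i] = 1 - x[i]
    return x

rng = np.random.default_rng(1)
W = np.array([[2.0, -1.0], [-1.0, 2.0]])   # self-connections b = 2, interconnections -p = -1
x = np.array([0, 0])                       # binary unit states
T = 5.0
for _ in range(100):
    x = boltzmann_step(W, x, T, rng)
    T = max(0.95 * T, 0.1)                 # simple annealing schedule with a floor
print(x)                                   # tends toward the consensus-maximizing state
```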
Boltzmann Machine – Training Algorithm
Boltzmann Machine vs Hopfield
Similarities:
1. Neuron states are bipolar.
2. Weights are symmetric, wij = wji.
3. Neurons are selected at random for asynchronous update.

Differences:
1. The Boltzmann machine has a hidden layer of neurons; the Hopfield network does not.
2. Neuron update is stochastic in the Boltzmann machine and deterministic in the Hopfield network.
3. The Hopfield network has no separate learning phase.
Counter propagation
network
Counter propagation network
• The counter propagation network (CPN) is a multilayer network based on a combination of input, output, and clustering layers. It was introduced by Robert Hecht-Nielsen in 1987.

• Applications of the counter propagation net include data compression, function approximation, and pattern association.

• The counter propagation network is basically constructed from an instar-outstar model.

• The instar is trained by unsupervised learning; the outstar is trained by supervised learning.

• This model is a three-layer neural network that performs input-output data mapping, producing an output vector y in response to an input vector x, on the basis of competitive learning.
Counter propagation network
• The three layers in an instar-outstar model are the input layer, the hidden (competitive) layer, and the output layer.

• There are two stages involved in the training process of a counter propagation net. The input vectors are clustered in the first stage. In the second stage of training, the weights from the cluster-layer units to the output units are tuned to obtain the desired response. There are two types of counter propagation net:

1. Full counter propagation network

2. Forward-only counter propagation network


Full Counter propagation network
• Full CPN efficiently represents a large number of vector pairs x:y by adaptively constructing a look-up table.

• The full CPN works best if the inverse function exists and is continuous.

• The vectors x and y propagate through the network in a counterflow manner to yield the output vectors x* and y*.


Full Counter propagation network - Architecture
The four major components of the instar-outstar model are
• input layer
• instar
• competitive layer
• outstar

• For each node in the input layer there is an input value xi.

• All the instars are grouped into a layer called the competitive layer. Each instar responds maximally to a group of input vectors in a different region of space.

• An outstar model has all the nodes in the output layer and a single node in the competitive layer. The outstar looks like the fan-out of a node.
Full CPN – Training Algorithm
• Step 0: Set the weights and the initial learning rates.

• Step 1: Perform Steps 2 to 7 while the stopping condition for phase I training is false.

• Step 2: Perform Steps 3 to 5 for each training input pair x:y.

• Step 3: Set the X-input layer activations to vector x.

Set the Y-input layer activations to vector y.

• Step 4: Find the winning cluster unit. If the dot product method is used, choose the unit with the largest net input: zinj=∑xi.vij + ∑yk.wkj for j=1 to p.

If the Euclidean distance method is used, find the cluster unit zj whose squared distance from the input vectors is smallest: Dj=∑(xi-vij)^2 + ∑(yk-wkj)^2

Winner-take-all: if there is a tie in the selection of the winner unit, the unit with the smallest index is the winner. Take the winner unit index as J.
Full CPN – Training Algorithm
• Step 5: Update the weights for the winning unit zJ.

For i=1 to n,  viJ(new)=viJ(old) + α[xi-viJ(old)]

For k=1 to m,  wkJ(new)=wkJ(old) + β[yk-wkJ(old)]

• Step 6: Reduce the learning rates: α(t+1)=0.5α(t);  β(t+1)=0.5β(t)

• Step 7: Test the stopping condition for phase I training.

• Step 8: Perform Steps 9 to 15 while the stopping condition for phase II training is false.

• Step 9: Perform Steps 10 to 13 for each training input vector pair x:y. Here α and β are small constant values.

• Step 10: Set the X-input layer activations to vector x. Set the Y-input layer activations to vector y.
Full CPN – Training Algorithm
• Step 11: Find the winning cluster unit (using the formula from Step 4). Take the winner unit index as J.

• Step 12: Update the weights entering unit zJ.

For i=1 to n,  viJ(new)=viJ(old) + α[xi-viJ(old)]

For k=1 to m,  wkJ(new)=wkJ(old) + β[yk-wkJ(old)]

• Step 13: Update the weights from unit zJ to the output layers.

For i=1 to n,  tJi(new)=tJi(old) + b[xi-tJi(old)]

For k=1 to m,  uJk(new)=uJk(old) + a[yk-uJk(old)]

• Step 14: Reduce the learning rates a and b: a(t+1)=0.5a(t);  b(t+1)=0.5b(t)

• Step 15: Test the stopping condition for phase II training.
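A compact NumPy sketch of the two phases above (assumptions: random initial weights, Euclidean-distance winner selection, and a fixed number of sweeps in place of an explicit stopping condition; the names v, w, t, u mirror the weight symbols used in the steps):

```python
import numpy as np

def train_full_cpn(X, Y, p, alpha=0.4, beta=0.4, a=0.3, b=0.3,
                   epochs1=10, epochs2=10, seed=0):
    """Two-phase training of a full counter propagation net (sketch)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], Y.shape[1]
    v = rng.random((n, p))      # X-input -> cluster weights
    w = rng.random((m, p))      # Y-input -> cluster weights
    t = rng.random((p, n))      # cluster -> X-output weights
    u = rng.random((p, m))      # cluster -> Y-output weights

    def winner(x, y):
        # Euclidean-distance winner-take-all over the p cluster units
        d = ((x[:, None] - v) ** 2).sum(0) + ((y[:, None] - w) ** 2).sum(0)
        return int(np.argmin(d))

    # Phase I: cluster the (x, y) pairs
    for _ in range(epochs1):
        for x, y in zip(X, Y):
            J = winner(x, y)
            v[:, J] += alpha * (x - v[:, J])
            w[:, J] += beta * (y - w[:, J])
        alpha *= 0.5
        beta *= 0.5

    # Phase II: tune the cluster-to-output (outstar) weights
    for _ in range(epochs2):
        for x, y in zip(X, Y):
            J = winner(x, y)
            v[:, J] += alpha * (x - v[:, J])
            w[:, J] += beta * (y - w[:, J])
            t[J] += b * (x - t[J])
            u[J] += a * (y - u[J])
        a *= 0.5
        b *= 0.5
    return v, w, t, u
```

Halving the learning rates after each sweep mirrors Steps 6 and 14.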


Full CPN – Testing Algorithm
• Step 0: Initialize the weights (from the training algorithm).

• Step 1: Perform Steps 2-4 for each input pair x:y.

• Step 2: Set the X-input layer activations to vector x. Set the Y-input layer activations to vector y.

• Step 3: Find the cluster unit zJ that is closest to the input pair.

• Step 4: Calculate the approximations to x and y:  x*i = tJi,  y*k = uJk
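Continuing the training sketch above (same illustrative weight arrays), the testing phase can be written directly from Steps 2-4: the winner is the closest cluster to the pair, and the approximations are read from the outstar weights.

```python
import numpy as np

def cpn_recall(v, w, t, u, x, y):
    """Full CPN testing: winner J by Euclidean distance, then x* = t[J], y* = u[J]."""
    d = ((x[:, None] - v) ** 2).sum(0) + ((y[:, None] - w) ** 2).sum(0)
    J = int(np.argmin(d))
    return t[J], u[J]
```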


Full CPN – Advantage
• Simple

• Forms a good statistical model of its input vector environment

• Rapid execution; saves considerable computational time


Forward only Counter
propagation Net
Forward only Counter propagation Net
• A simplified version of the full CPN is the forward-only CPN.

• The forward-only CPN uses only the x vectors to form the clusters on the Kohonen units during phase I training.

• In the forward-only CPN, the input vectors are first presented to the input units and the weights between the input layer and the cluster layer are trained. Then the weights between the cluster layer and the output layer are trained. This is a specific competitive network with known targets.
Forward only Counter propagation Net
• It consists of three layers: input layer, cluster layer and output layer.

• Its architecture resembles the back-propagation network, but in the CPN there exist interconnections among the units in the cluster layer.
Forward only Counter propagation Net
Forward only CPN – Training Algorithm
• Step 0: Initialize the weights and learning rates.

• Step 1: Perform Steps 2 to 7 while the stopping condition for phase I training is false.

• Step 2: Perform Steps 3 to 5 for each training input x.

• Step 3: Set the X-input layer activations to vector x.

• Step 4: Compute the winning cluster unit J. If the dot product method is used, find the cluster unit zJ with the largest net input: zinj=∑xi.vij

If the Euclidean distance method is used, find the cluster unit zJ whose squared distance from the input pattern is smallest: Dj=∑(xi-vij)^2

If there is a tie in the selection of the winner unit, the unit with the smallest index is chosen as the winner.
Forward only CPN – Training Algorithm
• Step 5: Update the weights for unit zJ. For i=1 to n,
viJ(new)=viJ(old) + α[xi-viJ(old)]
• Step 6: Reduce the learning rate α: α(t+1)=0.5α(t)
• Step 7: Test the stopping condition for phase I training.
• Step 8: Perform Steps 9 to 15 while the stopping condition for phase II training is false.
• Step 9: Perform step 10 to 13 for each training input pair x:y.
• Step 10: Set X-input layer activations to vector X. Set Y-output layer activation to vector Y.
• Step 11: Find the winning cluster unit J.
• Step 12: Update the weights into unit zJ. For i=1 to n, viJ(new)=viJ(old) + α[xi-viJ(old)]
• Step 13: Update the weights from unit zJ to the output units.
For k=1 to m,   wJk(new)=wJk(old) + β[yk-wJk(old)]
• Step 14: Reduce learning rate β, β(t+1)=0.5β(t)
• Step 15: Test the stopping condition for phase II training.
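A shorter sketch along the same lines as the full-CPN code earlier (same assumptions about initialization and stopping): phase I clusters on x alone, and phase II additionally trains the cluster-to-output weights toward y. Names are illustrative.

```python
import numpy as np

def train_forward_only_cpn(X, Y, p, alpha=0.4, beta=0.4,
                           epochs1=10, epochs2=10, seed=0):
    """Forward-only CPN sketch: phase I clusters on x alone,
    phase II tunes cluster-to-output weights toward y."""
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], Y.shape[1]
    v = rng.random((n, p))            # input -> cluster (Kohonen) weights
    w = rng.random((p, m))            # cluster -> output weights

    def winner(x):
        return int(np.argmin(((x[:, None] - v) ** 2).sum(0)))

    for _ in range(epochs1):          # phase I: unsupervised clustering on x
        for x in X:
            J = winner(x)
            v[:, J] += alpha * (x - v[:, J])
        alpha *= 0.5

    for _ in range(epochs2):          # phase II: supervised output-weight training
        for x, y in zip(X, Y):
            J = winner(x)
            v[:, J] += alpha * (x - v[:, J])
            w[J] += beta * (y - w[J])
        beta *= 0.5
    return v, w

# Prediction for a new x: the output is the weight vector w[J] of its winning cluster.
```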
ART (ADAPTIVE RESONANCE THEORY)
NETWORK
ART (ADAPTIVE RESONANCE THEORY)
• The adaptive resonance theory (ART) network is an unsupervised learning network developed by Stephen Grossberg and Gail Carpenter in 1987. Adaptive resonance theory was developed to solve the problem of instability occurring in feed-forward systems.

Fundamental Architecture: To build an adaptive resonance theory (ART) network, three groups of neurons are used. These include:

1. Input processing neurons (F1 layer): The input processing neurons consist of two portions, an input portion and an interface portion. The input portion performs some processing based on the inputs it receives. The interface portion of the F1 layer combines the inputs from the input portion of the F1 layer and from the F2 layer, for comparing the similarity of the input signal with the weight vector of the cluster unit that has been selected as a unit for learning.

2. Clustering units (F2 layer).

3. Control mechanism (controls the degree of similarity of patterns placed on the same cluster).


ART (ADAPTIVE RESONANCE THEORY)
Classification:

ART is of two types:

1. ART1

2. ART2

ART1 is designed for clustering binary vectors and ART2 is designed to accept
continuous-valued vectors.
Adaptive Resonance Theory
Adaptive Resonance Theory
• Adaptive: learns new information

• Resonance: without losing old information

• Unsupervised learning

• ART is known to solve the stability-plasticity dilemma

• Stability: the ability to memorize what has been learned

• Plasticity: the flexibility to gain new information

• Feedback is present (data in the form of processing-element outputs is reflected back and forth among the layers)

• ART is a clustering algorithm.


Adaptive Resonance Theory
Types:

• ART 1: simple, basic, capable of clustering binary input values

• ART 2: extension of ART 1, capable of clustering continuous-valued inputs


Adaptive Resonance Theory1
Adaptive Resonance Theory1
• The ART1 network is designed for binary input vectors.

• The ART1 net consists of:

• input units (F1 layer) – the comparison field, where inputs are processed

• output units (F2 layer) – the recognition field, consisting of the clustering units

• a reset module – acts as the controlling mechanism; it controls the degree of similarity of patterns placed on the same cluster unit. There exist two sets of weighted interconnection paths between the F1 and F2 layers.

• The ART1 network runs autonomously: it does not require any external control signals and can run stably with an unlimited stream of input patterns.

• The ART1 network is trained using the fast learning method. It performs well with perfect binary input patterns, but it is sensitive to noise in the input data, which should be handled carefully.
Adaptive Resonance Theory1
ART1 Architecture:

It includes two types of units:

1. Computational units

2. Supplemental units

The computational units consist of the following:

i) Input units (F1 units – both input portion and interface portion).

ii) Cluster units (F2 units – output units).

iii) Reset control unit (controls the degree of similarity of patterns placed on the same cluster).
Adaptive Resonance Theory1 -Architecture

Computational units
Adaptive Resonance Theory1
ART1 Architecture:

The F1 layer accepts the input and performs processing, then transfers the best match to the F2 layer for classification. The F1 and F2 layers are connected by two sets of weighted interconnections. In the competitive (F2) layer, the unit with the largest net input becomes the candidate to learn the input pattern, and the rest are ignored. The reset unit decides whether or not the candidate cluster unit is allowed to learn, depending on the top-down weight vector.

A vigilance test is conducted to take this decision: if the degree of similarity is less than the vigilance parameter, the cluster unit is not allowed to learn.

Note: higher vigilance – more detailed memories

Lower vigilance – more general information


Adaptive Resonance Theory1
ART1 Architecture:

• The supplemental units, called gain control units, comprise G1 and G2 along with the reset unit R.

• Excitatory weighted signals: positive (+) weights

• Inhibitory weighted signals: negative (-) weights

• When any designated layer is ON, the F1(b) layer receives input from G1, F1(a), or F2, and

• the F2 layer receives input from G2, F1(b), or R.

• Each F1 and F2 unit thus receives signals from three sources; the requirement that two of the three be active for a unit to fire is called the two-thirds rule.
Adaptive Resonance Theory1 -Architecture

Supplemental units (G1, G2, and the reset unit R)
Adaptive Resonance Theory1
ART1 Architecture:

The F1(b) units should send a signal whenever they receive input from F1(a) and no F2 node is active. After an F2 node has been chosen in the competition, it is necessary that only the F1(b) units whose input signal and top-down signal match remain active. This is achieved by the two gain units G1 and G2. Whenever an F2 unit is on, the G1 unit is inhibited. When no F2 unit is on, each F1 interface unit receives a signal from the G1 unit. In the same way, the G2 unit controls the firing of the F2 units, obeying the two-thirds rule. Vigilance matching is controlled by the reset control unit R, which also receives inhibitory signals from the F1 interface units that are on. If a sufficient number of interface units are on, unit R may be prevented from firing.
ART 1 – Training Algorithm
• Step 0: Initialize the parameters: α>1 and 0<ρ<=1.
Initialize the weights: 0<bij(0)<α/(α-1+n) and tij(0)=1.
• Step 1: Perform Steps 2 to 13 while the stopping condition is false.
• Step 2: Perform Steps 3 to 12 for each training input.
• Step 3: Set the activations of all F2 units to zero. Set the activations of the F1(a) units to the input vector s.
• Step 4: Calculate the norm of s: ||s||=∑si
• Step 5: Send the input signal from the F1(a) layer to the F1(b) layer: xi=si
• Step 6: For each F2 node that is not inhibited, the following rule should hold: if yj≠-1, then yj=∑bij.xi
• Step 7: Perform Steps 8 to 11 while reset is true.
• Step 8: Find J such that yJ>=yj for all nodes j. If yJ=-1, then all the nodes are inhibited and this pattern cannot be clustered.
• Step 9: Recalculate the activations x of F1(b): xi=si.tJi
• Step 10: Calculate the norm of vector x: ||x||=∑xi
ART 1 – Training Algorithm
• Step 11: Test for the reset condition.
If ||x||/||s||<ρ, then inhibit node J (set yJ=-1) and go to Step 7 again.
Else if ||x||/||s||>=ρ, proceed to the next step.
• Step 12: Update the weights for node J:
biJ(new)=αxi / (α-1+||x||)
tJi(new)=xi
• Step 13: Test for the stopping condition. The following may be stopping conditions:
i) no change in weights
ii) no reset of units
iii) maximum number of epochs reached.
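A compact NumPy sketch of the fast-learning loop above (assumptions: a fixed pool of cluster units, uniform initial bottom-up weights below the α/(α-1+n) bound, and a fixed number of epochs as the stopping condition; data is illustrative):

```python
import numpy as np

def art1_train(data, n_clusters, rho=0.7, alpha=2.0, epochs=5):
    """ART1 fast-learning sketch for binary row vectors in `data`."""
    n = data.shape[1]
    b = np.full((n, n_clusters), 1.0 / (1.0 + n))   # bottom-up weights, below alpha/(alpha-1+n)
    t = np.ones((n_clusters, n))                    # top-down weights
    for _ in range(epochs):
        for s in data:
            norm_s = s.sum()
            if norm_s == 0:
                continue                            # skip all-zero inputs
            y = s @ b                               # net input to each F2 unit
            inhibited = np.zeros(n_clusters, bool)
            while True:
                y_masked = np.where(inhibited, -1.0, y)
                J = int(np.argmax(y_masked))
                if y_masked[J] == -1.0:
                    break                           # every unit inhibited: cannot cluster
                x = s * t[J]                        # match with top-down expectation
                if x.sum() / norm_s >= rho:         # vigilance test passed: resonance
                    b[:, J] = alpha * x / (alpha - 1.0 + x.sum())
                    t[J] = x
                    break
                inhibited[J] = True                 # reset: inhibit J and search again
    return b, t

data = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]])
b, t = art1_train(data, n_clusters=3)
print(t)        # top-down weights show which features each cluster has learned
```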
Adaptive Resonance Theory2
Adaptive Resonance Theory2
• ART2 is designed for continuous-valued input vectors.

• The complexity of the ART2 network is higher than that of ART1 because much processing is needed in the F1 layer.

• The ART2 network was designed to self-organize recognition categories for analog as well as binary input sequences. The continuous-valued inputs presented to the ART2 network may be of two forms: the first is a "noisy binary" signal form, and the second is "truly continuous" data.
Adaptive Resonance Theory2
• The major difference between the ART1 and ART2 networks is the input layer.

• A three-layer feedback system is required in the input layer of the ART2 network:

bottom layer: where the input patterns are read in

top layer: where inputs coming from the output layer are read in

middle layer: where the top and bottom patterns are combined to form a matched pattern, which is then fed back to the top and bottom input layers.


Adaptive Resonance Theory2
ART2 Architecture:

• In the ART2 architecture, the F1 layer consists of six types of units (W, X, U, V, P, Q), and there are n units of each type. The supplemental unit "N" between units W and X receives signals from all W units, computes the norm of vector w, and sends this signal to each of the X units. Similarly, there exist supplemental units between U and V, and between P and Q, performing the same operation as done between W and X. The connections between Pi of the F1 layer and Yj of the F2 layer are the weighted interconnections, which multiply the signals transmitted over those paths.

• The operations performed in the F2 layer are the same for both ART1 and ART2.
Adaptive Resonance Theory2
Adaptive Resonance Theory2 - Algorithm
• Step 0: Initialize the parameters: a, b, c, d, e, α, ρ, θ. Also specify the number of epochs of training (nep) and the number of learning iterations (nit).
• Step 1: Perform Steps 2 to 12 nep times.
• Step 2: Perform Steps 3 to 11 for each input vector s.
• Step 3: Update the F1 unit activations: ui=0;  wi=si;  Pi=0;  qi=0;  vi=f(xi);  xi=si / (e+||s||)
Update the F1 unit activations again: ui=vi / (e+||v||);  wi=si+a.ui;  Pi=ui;  xi=wi / (e+||w||);
qi=Pi / (e+||p||);  vi=f(xi) + b.f(qi)
In ART2 networks, norms are calculated as the square root of the sum of the squares of the respective values.
• Step 4: Calculate the signals to the F2 units: yj=∑bij.Pi
• Step 5: Perform Steps 6 and 7 while reset is true.
• Step 6: Find the F2 unit YJ with the largest signal (J is defined such that yJ>=yj for j=1 to m).
Adaptive Resonance Theory2 - Algorithm
• Step 7: Check for reset:
ui=vi / (e+||v||);  Pi=ui + d.tJi;  ri=(ui + c.Pi) / (e+||u||+c||p||)
If ||r|| < (ρ-e), then set yJ=-1 (inhibit J). Reset is true; perform Step 5.
If ||r|| >= (ρ-e), then wi=si+a.ui;  xi=wi / (e+||w||);  qi=Pi / (e+||p||);  vi=f(xi)+b.f(qi)
Reset is false. Proceed to Step 8.
• Step 8: Perform Steps 9 to 11 for the specified number of learning iterations.
• Step 9: Update the weights for the winning unit J: tJi=α.d.ui + [1+α.d(d-1)]tJi
biJ=α.d.ui + [1+α.d(d-1)]biJ
• Step 10: Update the F1 activations: ui=vi / (e+||v||);  wi=si+a.ui;
Pi=ui+d.tJi;  xi=wi / (e+||w||);  qi=Pi / (e+||p||);  vi=f(xi)+b.f(qi)
• Step 11: Check for the stopping condition for weight updates.
• Step 12: Check for the stopping condition for the number of epochs.
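A small sketch of just the Step 7 reset test (illustrative values; the rest of the ART2 loop is omitted):

```python
import numpy as np

def art2_reset_test(u, p, c, e, rho):
    """Vigilance (reset) check of Step 7: r = (u + c*p) / (e + ||u|| + c*||p||);
    a reset is signalled when ||r|| < rho - e."""
    r = (u + c * p) / (e + np.linalg.norm(u) + c * np.linalg.norm(p))
    return np.linalg.norm(r) < rho - e

# Illustrative values only: u and p agree, so no reset is signalled
u = np.array([0.6, 0.8, 0.0])
p = np.array([0.6, 0.8, 0.0])
print(art2_reset_test(u, p, c=0.1, e=0.0, rho=0.9))   # False: resonance
```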
