Unit_2
The perceptron, a single-layer feedforward neural network, was introduced in the late 1950s by
Frank Rosenblatt. It is one of the first and most straightforward models of artificial neural
networks. Despite its simplicity, the perceptron has proven successful at solving specific
classification problems.
What is Perceptron?
The perceptron is one of the simplest artificial neural network architectures. It was introduced
by Frank Rosenblatt in 1957. It is the simplest type of feedforward neural network, consisting of
a single layer of input nodes that are fully connected to a layer of output nodes. It can learn
linearly separable patterns. It uses a slightly different type of artificial neuron known as the
threshold logic unit (TLU).
Types of Perceptron
Single-Layer Perceptron: This type of perceptron is limited to learning linearly
separable patterns. It is effective for tasks where the data can be divided into distinct
categories by a straight line.
Multilayer Perceptron: Multilayer perceptrons possess enhanced processing
capabilities as they consist of two or more layers, adept at handling more complex
patterns and relationships within the data.
Basic Components of Perceptron
A perceptron, the basic unit of a neural network, comprises essential components that
collaborate in information processing.
Input Features: The perceptron takes multiple input features. Each input feature
represents a characteristic or attribute of the input data.
Weights: Each input feature is associated with a weight, determining the significance of
each input feature in influencing the perceptron’s output. During training, these weights
are adjusted to learn the optimal values.
Summation Function: The perceptron calculates the weighted sum of its inputs using
the summation function. The summation function combines the inputs with their
respective weights to produce a weighted sum.
Activation Function: The weighted sum is then passed through an activation function.
The perceptron uses the Heaviside step function, which compares the weighted sum with a
threshold and outputs 0 or 1.
Output: The final output of the perceptron is determined by the activation function’s
result. For example, in binary classification problems, the output might represent a
predicted class (0 or 1).
Bias: A bias term is often included in the perceptron model. The bias allows the model to
make adjustments that are independent of the input. It is an additional parameter that is
learned during training.
Learning Algorithm (Weight Update Rule): During training, the perceptron learns by
adjusting its weights and bias based on a learning algorithm. A common approach is the
perceptron learning algorithm, which updates weights based on the difference between
the predicted output and the true output.
These components work together to enable a perceptron to learn and make predictions. While a
single perceptron can perform binary classification, more complex tasks require the use of
multiple perceptrons organized into layers, forming a neural network.
How does Perceptron work?
A weight is assigned to each input node of a perceptron, indicating the significance of that input
to the output. The perceptron’s output is the weighted sum of the inputs passed through an
activation function, which decides whether or not the perceptron fires. It computes the
weighted sum of its inputs as:
$$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = X^T W$$
The activation function that perceptrons use most frequently is the step function, which compares
this weighted sum to a threshold and outputs 1 if it is larger than the threshold and 0
otherwise. The most common step function used in the perceptron is the Heaviside step
function:
$$h(z) = \begin{cases} 1 & \text{if } z > \theta \\ 0 & \text{otherwise} \end{cases}$$
A perceptron has a single layer of threshold logic units, with each TLU connected to all inputs.
When all the neurons in a layer are connected to every neuron of the previous layer, it is known
as a fully connected layer or dense layer.
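The computation of such a dense layer of TLUs can be sketched in Python with NumPy. This is only an illustrative sketch: the function names `step` and `tlu_layer`, the bias vector `b`, and the hand-picked weights (chosen so that one unit behaves like AND and the other like OR on 0/1 inputs) are assumptions, not part of the notes above.

```python
import numpy as np

def step(z, threshold=0.0):
    """Step activation: 1 where z is larger than the threshold, else 0."""
    return (z > threshold).astype(int)

def tlu_layer(X, W, b):
    """Dense layer of threshold logic units.

    X: (n_samples, n_inputs), W: (n_inputs, n_units), b: (n_units,)
    """
    z = X @ W + b          # weighted sum for every unit and every sample
    return step(z)

# Hypothetical weights: unit 0 acts like AND, unit 1 like OR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([-1.5, -0.5])

outputs = tlu_layer(X, W, b)
print(outputs)   # one row per input pattern, one column per TLU
```

Every TLU sees every input, which is what makes the layer fully connected. Only the weights and bias differ between units.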
Q. Develop a perceptron for the AND and OR functions using bipolar inputs and targets.
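One possible way to approach this exercise is sketched below. It assumes the common textbook form of the perceptron rule for bipolar data (on a mistake, w ← w + αtx and b ← b + αt), zero initial weights, a learning rate α = 1 and a threshold of 0. None of these choices are fixed by the question, and the function names are hypothetical.

```python
import numpy as np

def bipolar_step(net, theta=0.0):
    """Return +1 above theta, -1 below -theta, and 0 in between."""
    if net > theta:
        return 1
    if net < -theta:
        return -1
    return 0

def train_perceptron(X, t, alpha=1.0, max_epochs=20):
    """Perceptron rule: on a mistake, w += alpha*t*x and b += alpha*t."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        changed = False
        for x, target in zip(X, t):
            y = bipolar_step(w @ x + b)
            if y != target:
                w += alpha * target * x
                b += alpha * target
                changed = True
        if not changed:      # a full epoch without updates: stop
            break
    return w, b

def predict(X, w, b):
    return [bipolar_step(w @ x + b) for x in X]

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
t_and = np.array([1, -1, -1, -1])
t_or = np.array([1, 1, 1, -1])

w_and, b_and = train_perceptron(X, t_and)
w_or, b_or = train_perceptron(X, t_or)
print("AND:", w_and, b_and, predict(X, w_and, b_and))
print("OR: ", w_or, b_or, predict(X, w_or, b_or))
```

Under these assumptions both functions are learned within two epochs, which is expected because AND and OR are linearly separable.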
Backpropagation
Backpropagation is a powerful algorithm in deep learning, primarily used to train artificial
neural networks, particularly feed-forward networks. It works iteratively, minimizing the cost
function by adjusting weights and biases.
In each epoch, the model adapts these parameters, reducing loss by following the error gradient.
Backpropagation is typically paired with optimization algorithms like gradient descent or
stochastic gradient descent, which use the gradients it computes to update the parameters. The
algorithm computes the gradient using the chain rule from calculus, allowing it to propagate the
error through the layers of the neural network and so minimize the cost function.
Fig. (a): A simple illustration of how backpropagation works by adjusting weights
Importance of Backpropagation
Backpropagation plays a critical role in how neural networks improve over time. Here's why:
1. Efficient Weight Update: It computes the gradient of the loss function with respect to
each weight using the chain rule, making it possible to update weights efficiently.
2. Scalability: The backpropagation algorithm scales well to networks with multiple layers
and complex architectures, making deep learning feasible.
3. Automated Learning: With backpropagation, the learning process becomes automated,
and the model can adjust itself to optimize its performance.
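The chain-rule computation mentioned in point 1 can be illustrated on the smallest possible case: a single sigmoid neuron with a squared-error loss. The sketch below is an assumption-laden toy (the loss, the activation, the input values and the learning rate are all chosen for illustration). It compares the chain-rule gradient with a finite-difference estimate and takes one gradient-descent step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, t):
    """Squared error of a single sigmoid neuron."""
    a = sigmoid(w @ x + b)
    return 0.5 * (a - t) ** 2

def gradients(w, b, x, t):
    """Chain rule: dL/dw = dL/da * da/dz * dz/dw (and dz/db = 1)."""
    a = sigmoid(w @ x + b)
    dL_da = a - t
    da_dz = a * (1.0 - a)
    delta = dL_da * da_dz
    return delta * x, delta

# Hypothetical sample and starting parameters
x = np.array([0.5, -1.0])
t = 1.0
w = np.array([0.1, 0.2])
b = 0.0

grad_w, grad_b = gradients(w, b, x, t)

# Finite-difference estimate of dL/dw as a sanity check
eps = 1e-6
num_grad_w = np.array([
    (loss(w + eps * e, b, x, t) - loss(w - eps * e, b, x, t)) / (2 * eps)
    for e in np.eye(2)
])

# One gradient-descent step
lr = 0.5
loss_before = loss(w, b, x, t)
loss_after = loss(w - lr * grad_w, b - lr * grad_b, x, t)
print(grad_w, num_grad_w, loss_before, loss_after)
```

In a multi-layer network the same product of local derivatives is carried backwards layer by layer, which is what keeps the weight update efficient.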
Working of Backpropagation Algorithm
The Backpropagation algorithm involves two main steps: the Forward Pass and the Backward
Pass.
Forward Pass Working
In the forward pass, the input data is fed into the input layer. These inputs, combined with their
respective weights, are passed to hidden layers.
For example, in a network with two hidden layers (h1 and h2 as shown in Fig. (a)), the output
from h1 serves as the input to h2. Before applying an activation function, a bias is added to the
weighted inputs.
Each hidden layer applies an activation function like ReLU (Rectified Linear Unit), which
returns the input if it’s positive and zero otherwise. This adds non-linearity, allowing the model
to learn complex relationships in the data. Finally, the outputs from the last hidden layer are
passed to the output layer, where an activation function, such as softmax, converts the weighted
outputs into probabilities for classification.
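The forward pass described above can be sketched as follows. The layer sizes, the random weights and the parameter names (`W1`, `b1`, and so on) are hypothetical. Only the structure (two hidden ReLU layers followed by a softmax output layer) comes from the text.

```python
import numpy as np

def relu(z):
    """Return z where it is positive and zero otherwise."""
    return np.maximum(0.0, z)

def softmax(z):
    """Convert each row of scores into probabilities."""
    z = z - z.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, params):
    h1 = relu(X @ params["W1"] + params["b1"])       # first hidden layer
    h2 = relu(h1 @ params["W2"] + params["b2"])      # second hidden layer
    return softmax(h2 @ params["W3"] + params["b3"]) # output layer

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]   # hypothetical: 4 inputs, hidden layers of 5 and 3, 2 classes
params = {}
for k, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:]), start=1):
    params[f"W{k}"] = rng.normal(scale=0.5, size=(n_in, n_out))
    params[f"b{k}"] = np.zeros(n_out)

X = rng.normal(size=(3, 4))      # three hypothetical input samples
probs = forward(X, params)
print(probs.sum(axis=1))         # each row of probabilities sums to 1
```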
Associative Memory
These kinds of neural networks work on the basis of pattern association, which means they can
store different patterns and, when producing an output, return one of the stored patterns by
matching it with the given input pattern. These types of memories are also called
Content-Addressable Memory (CAM). Associative memory makes a parallel search with the stored
patterns as data files.
An auto-associative memory is a single-layer neural network in which the input training vectors
and the output target vectors are the same. The weights are determined so that the network stores
a set of patterns.
Architecture
As shown in the following figure, the architecture of Auto Associative memory network
has ‘n’ number of input training vectors and similar ‘n’ number of output target vectors.
Training Algorithm
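For training, this network uses the Hebb or Delta learning rule. The input and output units are
set to the training vector and the weights are then adjusted:
$$x_i = s_i \quad (i = 1 \text{ to } n)$$
$$y_j = s_j \quad (j = 1 \text{ to } n)$$
$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + x_i y_j$$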
Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.
Step 3 − Set the activation of the input units equal to that of the input vector.
$$y_{inj} = \sum_{i=1}^{n} x_i w_{ij}$$
$$y_j = f(y_{inj}) = \begin{cases} +1 & \text{if } y_{inj} > 0 \\ -1 & \text{if } y_{inj} \leq 0 \end{cases}$$
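The Hebb training and the testing rule for the auto-associative network can be sketched together as below. The stored pattern and the noisy test vector are made-up examples, and the function names are hypothetical.

```python
import numpy as np

def train_hebb(patterns):
    """Hebb rule: w_ij += x_i * y_j with x = y = s for every stored pattern s."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for s in patterns:
        W += np.outer(s, s)
    return W

def recall(W, x):
    """Net input y_in = x W, then +1 if y_in > 0 and -1 otherwise."""
    y_in = x @ W
    return np.where(y_in > 0, 1, -1)

stored = np.array([[1, 1, -1, -1]])      # one bipolar pattern (hypothetical)
W = train_hebb(stored)

noisy = np.array([1, -1, -1, -1])        # stored pattern with one component flipped
print(recall(W, stored[0]))
print(recall(W, noisy))
```

In this small case the network returns the stored pattern both for the clean input and for the input with one wrong component.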
Similar to the auto-associative memory network, the hetero-associative memory network is also a
single-layer neural network. However, in this network the input training vectors and the output
target vectors are not the same. The weights are determined so that the network stores a set of
patterns. The hetero-associative network is static in nature; hence, there are no non-linear
or delay operations.
Architecture
As shown in the following figure, the architecture of Hetero Associative Memory network
has ‘n’ number of input training vectors and ‘m’ number of output target vectors.
Training Algorithm
For training, this network uses the Hebb or Delta learning rule.
Step 1 − Initialize all the weights to zero: $w_{ij} = 0$ (i = 1 to n, j = 1 to m)
$$x_i = s_i \quad (i = 1 \text{ to } n)$$
$$y_j = s_j \quad (j = 1 \text{ to } m)$$
$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + x_i y_j$$
Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Calculate the net input to each output unit j = 1 to m;
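$$y_{inj} = \sum_{i=1}^{n} x_i w_{ij}$$
$$y_j = f(y_{inj}) = \begin{cases} +1 & \text{if } y_{inj} > 0 \\ 0 & \text{if } y_{inj} = 0 \\ -1 & \text{if } y_{inj} < 0 \end{cases}$$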
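A minimal sketch of the hetero-associative case follows, assuming Hebb training and two made-up pattern pairs with n = 4 input components and m = 2 output components. The names `train_hetero` and `recall` are hypothetical.

```python
import numpy as np

def train_hetero(S, T):
    """Hebb rule: w_ij += x_i * y_j with x = s and y = t for every pair (s, t)."""
    W = np.zeros((S.shape[1], T.shape[1]))
    for s, t in zip(S, T):
        W += np.outer(s, t)
    return W

def recall(W, x):
    """Net input y_in = x W, then +1, 0 or -1 according to the sign of y_in."""
    return np.sign(x @ W).astype(int)

# Two hypothetical input/target pairs
S = np.array([[1, 1, -1, -1],
              [-1, 1, -1, 1]])
T = np.array([[1, -1],
              [-1, 1]])
W = train_hetero(S, T)

print(recall(W, S[0]))                     # expected to match T[0]
print(recall(W, S[1]))                     # expected to match T[1]
print(recall(W, np.array([1, 1, 1, 1])))   # equally close to both inputs
```

The last input is orthogonal to both stored inputs, so every net input is zero and the activation returns 0, which is the middle case of the function above.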
Hopfield network
The Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a single
layer containing one or more fully connected recurrent neurons. The Hopfield network is
commonly used for auto-association and optimization tasks.
A discrete Hopfield network operates in a discrete fashion; in other words, its input and output
patterns are discrete vectors, which can be either binary (0, 1) or bipolar (+1, −1) in nature.
The network has symmetrical weights with no self-connections, i.e., wij = wji and wii = 0.
Architecture
Following are some important points to keep in mind about discrete Hopfield network −
This model consists of neurons with one inverting and one non-inverting output.
The output of each neuron should be the input of other neurons but not the input of self.
Weight/connection strength is represented by wij.
Connections can be excitatory as well as inhibitory. A connection is excitatory if the output
of the neuron is the same as the input, otherwise it is inhibitory.
Weights should be symmetrical, i.e. wij = wji
The outputs from Y1 going to Y2, Yi and Yn have the weights w12, w1i and w1n respectively.
Similarly, the other arcs have weights on them.
Training Algorithm
During training of the discrete Hopfield network, the weights are updated. The input vectors can
be binary or bipolar, and the weight update differs between the two cases.
For binary input patterns s(p), p = 1 to P:
$$w_{ij} = \sum_{p=1}^{P} [2 s_i(p) - 1][2 s_j(p) - 1] \quad \text{for } i \neq j$$
For bipolar input patterns s(p), p = 1 to P:
$$w_{ij} = \sum_{p=1}^{P} s_i(p)\, s_j(p) \quad \text{for } i \neq j$$
Testing Algorithm
Step 1 − Initialize the weights, which are obtained from the training algorithm using the Hebbian
principle.
Step 2 − Perform steps 3-9 if the activations of the network have not converged.
Step 4 − Make the initial activation of the network equal to the external input vector X.
The net input of unit i is calculated as follows −
$$y_{ini} = x_i + \sum_{j} y_j w_{ji}$$
Step 7 − Apply the activation as follows over the net input to calculate the output −
$$y_i = \begin{cases} 1 & \text{if } y_{ini} > \theta_i \\ y_i & \text{if } y_{ini} = \theta_i \\ 0 & \text{if } y_{ini} < \theta_i \end{cases}$$
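The training relation for binary patterns and the testing steps can be combined into a small sketch. Several choices here are assumptions made only to keep the example deterministic: units are updated one at a time in a fixed order, the same threshold θ = 0 is used for every unit, and iteration stops after a full sweep without changes. The stored pattern and the test vector are made-up.

```python
import numpy as np

def train_hopfield(binary_patterns):
    """w_ij = sum_p [2 s_i(p) - 1][2 s_j(p) - 1] for i != j, and w_ii = 0."""
    S = 2 * np.atleast_2d(binary_patterns) - 1   # binary (0/1) -> bipolar (+1/-1)
    W = S.T @ S
    np.fill_diagonal(W, 0)                       # no self-connections
    return W

def recall(W, x, theta=0.0, max_sweeps=10):
    """Asynchronous recall: update one unit at a time until nothing changes."""
    x = np.asarray(x)
    y = x.copy()                                 # initial activation = input vector
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(y)):
            y_in = x[i] + y @ W[:, i]            # y_in_i = x_i + sum_j y_j w_ji
            if y_in > theta:
                new = 1
            elif y_in < theta:
                new = 0
            else:
                new = y[i]                       # unchanged when y_in equals theta
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:
            break
    return y

W = train_hopfield([[1, 1, 1, 0]])               # store one binary pattern
result = recall(W, [0, 0, 1, 0])                 # input with two wrong components
print(W)
print(result)
```

Here the corrupted input settles on the stored pattern after the first sweep, and the second sweep only confirms that no unit changes.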
An energy function is defined as a function that is bounded and non-increasing with respect to
the state of the system.
The energy function Ef, also called the Lyapunov function, determines the stability of the
discrete Hopfield network and is characterized as follows −
$$E_f = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j w_{ij} - \sum_{i=1}^{n} x_i y_i + \sum_{i=1}^{n} \theta_i y_i$$
Condition − In a stable network, whenever the state of a node changes, the above energy function
will decrease.
Suppose node i changes state from $y_i^{(k)}$ to $y_i^{(k+1)}$; then the energy change
$\Delta E_f$ is given by the following relation
$$\Delta E_f = E_f(y_i^{(k+1)}) - E_f(y_i^{(k)})$$
$$= -\left( \sum_{j=1}^{n} w_{ij} y_j^{(k)} + x_i - \theta_i \right) \left( y_i^{(k+1)} - y_i^{(k)} \right)$$
$$= -(net_i)\, \Delta y_i$$
Here $\Delta y_i = y_i^{(k+1)} - y_i^{(k)}$
The change in energy depends on the fact that only one unit can update its activation at a time.
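The behaviour of the energy function can be checked numerically. The sketch below uses a hypothetical 4-unit network (symmetric weights, zero diagonal, all thresholds zero) and a made-up input, updates one unit at a time, and records both the energy after every update and the change predicted by −(net_i)Δy_i.

```python
import numpy as np

# Hypothetical symmetric weights with zero diagonal
W = np.array([[0, 1, 1, -1],
              [1, 0, 1, -1],
              [1, 1, 0, -1],
              [-1, -1, -1, 0]], dtype=float)
x = np.array([0, 0, 1, 0], dtype=float)   # external input (made-up)
theta = np.zeros(4)                        # thresholds, assumed zero

def energy(y):
    """E_f = -1/2 sum_ij y_i y_j w_ij - sum_i x_i y_i + sum_i theta_i y_i."""
    return -0.5 * y @ W @ y - x @ y + theta @ y

y = x.copy()
energies = [energy(y)]
predicted_changes = []
for i in range(len(y)):                    # only one unit is updated at a time
    net_i = W[i] @ y + x[i] - theta[i]
    old = y[i]
    if net_i > 0:
        y[i] = 1.0
    elif net_i < 0:
        y[i] = 0.0
    predicted_changes.append(-net_i * (y[i] - old))
    energies.append(energy(y))

print(energies)
print(predicted_changes)
```

The recorded energies never increase, and each step matches the predicted change. This relies on the single-unit update: if several units changed at once, the formula for the energy change would no longer apply.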
In comparison with the discrete Hopfield network, the continuous Hopfield network has time as a
continuous variable. It is also used in auto-association and optimization problems such as the
travelling salesman problem.
Model − The model or architecture can be built up by adding electrical components such as
amplifiers, which map the input voltage to the output voltage through a sigmoid activation
function.
The energy function is given by −
$$E_f = \frac{1}{2} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} y_i y_j w_{ij} - \sum_{i=1}^{n} x_i y_i + \frac{1}{\lambda} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij}\, g_{ri} \int_{0}^{y_i} a^{-1}(y)\, dy$$
If there is a tie in the selection of the winner unit, the unit with the smallest index is the
winner. Take the winner unit index as J.
Step 5: Update the weights for the winner unit ZJ
$$v_{iJ}(\text{new}) = v_{iJ}(\text{old}) + \alpha [x_i - v_{iJ}(\text{old})], \quad i = 1 \text{ to } n$$
$$w_{kJ}(\text{new}) = w_{kJ}(\text{old}) + \beta [y_k - w_{kJ}(\text{old})], \quad k = 1 \text{ to } m$$
Step 6: Reduce the learning rates α and β
$$\alpha(t+1) = 0.5\, \alpha(t)$$
$$\beta(t+1) = 0.5\, \beta(t)$$
Step 7: Test stopping condition for phase-I training.
Step 8: Perform Steps 9-15 when stopping condition is false for phase-II training.
Step 9: Perform Steps 10-13 for each training input pair x:y. Here α and β are small constant
values.
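Step 10: Set the X-input layer activations to vector x. Set the Y-input layer activations to
vector y.
Step 11: Find the winning cluster unit (use formulas from Step 4). Take the winner unit index
as J.
Step 12: Update the weights entering into unit ZJ
$$v_{iJ}(\text{new}) = v_{iJ}(\text{old}) + \alpha [x_i - v_{iJ}(\text{old})], \quad i = 1 \text{ to } n$$
$$w_{kJ}(\text{new}) = w_{kJ}(\text{old}) + \beta [y_k - w_{kJ}(\text{old})], \quad k = 1 \text{ to } m$$
Step 13: Update the weights from unit ZJ to the output layers.
$$t_{Ji}(\text{new}) = t_{Ji}(\text{old}) + b [x_i - t_{Ji}(\text{old})], \quad i = 1 \text{ to } n$$
$$u_{Jk}(\text{new}) = u_{Jk}(\text{old}) + a [y_k - u_{Jk}(\text{old})], \quad k = 1 \text{ to } m$$
Step 14: Reduce the learning rates a and b.
$$a(t+1) = 0.5\, a(t)$$
$$b(t+1) = 0.5\, b(t)$$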
Step 15: Test stopping condition for phase-II training.
Forward-only Counterpropagation network:
A simplified version of the full CPN is the forward-only CPN. The forward-only CPN uses only the
x vectors to form the clusters on the Kohonen units during phase I training. In the forward-only
CPN, the input vectors are first presented to the input units. The weights between the input
layer and cluster layer are trained first. Then the weights between the cluster layer and output
layer are trained. This is a specific competitive network, with the target known.
Architecture of forward-only CPN
It consists of three layers: input layer, cluster layer and output layer. Its architecture resembles
the back-propagation network, but in CPN there exist interconnections between the units in the
cluster layer.
Training Algorithm for Forward-only Counterpropagation network:
Step 0: Initialize the weights and learning rates.
Step 1: Perform Steps 2-7 if stopping condition is false for phase-I training.
Step 2: Perform Steps 3-5 for each training input X.
Step 3: Set the X-input layer activations to vector X.
Step 4: Compute the winning cluster unit (J). If the dot product method is used, find the cluster
unit Zj with the largest net input.
$$z_{inj} = \sum_{i=1}^{n} x_i v_{ij}$$
If the Euclidean distance method is used, find the cluster unit Zj whose squared distance from the
input pattern is the smallest.
$$D(j) = \sum_{i=1}^{n} (x_i - v_{ij})^2$$
If there is a tie in the selection of the winner unit, the unit with the smallest index is chosen
as the winner.
Step 5: Perform the weight update for unit ZJ. For i = 1 to n,
$$v_{iJ}(\text{new}) = v_{iJ}(\text{old}) + \alpha [x_i - v_{iJ}(\text{old})]$$
Step 6: Reduce the learning rate α.
$$\alpha(t+1) = 0.5\, \alpha(t)$$
Step 7: Test stopping condition for phase-I training.
Step 8: Perform Steps 9-15 when stopping condition is false for phase-II training.
Step 9: Perform Steps 10-13 for each training input pair x:y.
Step 10: Set the X-input layer activations to vector X. Set the Y-output layer activations to
vector Y.
Step 11: Find the winning cluster unit (use formulas from Step 4). Take the winner unit index
as J.
Step 12: Update the weights entering into unit ZJ,
$$v_{iJ}(\text{new}) = v_{iJ}(\text{old}) + \alpha [x_i - v_{iJ}(\text{old})], \quad i = 1 \text{ to } n$$
Step 13: Update the weights from unit ZJ to the output layers.
$$w_{kJ}(\text{new}) = w_{kJ}(\text{old}) + \beta [y_k - w_{kJ}(\text{old})], \quad k = 1 \text{ to } m$$
Step 14: Reduce the learning rate β.
$$\beta(t+1) = 0.5\, \beta(t)$$
Step 15: Test stopping condition for phase-II training.
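The two training phases of the forward-only CPN can be sketched as below. This is a simplified illustration under several assumptions: the Euclidean distance method is used to pick the winner, each phase stops after a fixed number of epochs (the stopping condition is not specified above), the phase-II value of α is a small constant, and the data, the initial weights and the function names are all made-up.

```python
import numpy as np

def winner(V, x):
    """Index J of the cluster unit with the smallest squared Euclidean distance."""
    d = ((x[:, None] - V) ** 2).sum(axis=0)
    return int(np.argmin(d))          # argmin picks the smallest index on ties

def train_forward_only_cpn(X, Y, V, W, alpha=0.5, beta=0.5, alpha2=0.05, epochs=5):
    V, W = V.copy(), W.copy()
    # Phase I: train input-to-cluster weights V using the x vectors only
    for _ in range(epochs):
        for x in X:
            J = winner(V, x)
            V[:, J] += alpha * (x - V[:, J])
        alpha *= 0.5                  # reduce the learning rate
    # Phase II: train cluster-to-output weights W (alpha2 is a small constant)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            J = winner(V, x)
            V[:, J] += alpha2 * (x - V[:, J])
            W[:, J] += beta * (y - W[:, J])
        beta *= 0.5
    return V, W

def predict(V, W, x):
    """Output weights of the winning cluster unit."""
    return W[:, winner(V, x)]

# Hypothetical data: two groups of 2-D inputs with one-hot targets
X = np.array([[0.1, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]])
Y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
V0 = np.array([[0.0, 1.0],
               [0.0, 1.0]])           # columns are the initial cluster centres
W0 = np.zeros((2, 2))                 # rows: output units, columns: cluster units

V, W = train_forward_only_cpn(X, Y, V0, W0)
print(predict(V, W, np.array([0.15, 0.05])))
print(predict(V, W, np.array([0.95, 0.90])))
```

Because β is halved every epoch, the output weights approach the targets without reaching them exactly. The larger component still identifies the right class in this toy case.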
The Adaptive Resonance Theory (ART) network was developed by Stephen Grossberg and Gail
Carpenter in 1987. It is based on competition and uses an unsupervised learning model. ART
networks, as the name suggests, are always open to new learning (adaptive) without losing the
old patterns (resonance). Basically, an ART network is a vector classifier that accepts an input
vector and classifies it into one of the categories depending on which of the stored patterns it
resembles the most.
Operating Principle
The main operation of ART classification can be divided into the following phases −
Recognition phase − The input vector is compared with the classification presented at
every node in the output layer. The output of a neuron becomes “1” if it best matches
the classification applied; otherwise it becomes “0”.
Comparison phase − In this phase, the input vector is compared with the comparison
layer vector. The condition for reset is that the degree of similarity is less
than the vigilance parameter.
Search phase − In this phase, the network searches for a reset as well as the match
found in the above phases. If there is no reset and the match is good,
then the classification is over. Otherwise, the process is repeated and another
stored pattern is tried to find the correct match.
ART1
It is a type of ART designed to cluster binary vectors. It is best understood through its
architecture.
Architecture of ART1
It consists of the following two units −
Computational Unit − It is made up of the following −
Input unit (F1 layer) − It further has the following two portions −
F1(a) layer (input portion) − In ART1, there is no processing in this portion; it only
holds the input vectors. It is connected to the F1(b) layer (interface portion).
F1(b) layer (interface portion) − This portion combines the signal from the input portion
with that of the F2 layer. The F1(b) layer is connected to the F2 layer through bottom-up
weights bij, and the F2 layer is connected to the F1(b) layer through top-down weights tji.
Cluster Unit (F2 layer) − This is a competitive layer. The unit having the largest net
input is selected to learn the input pattern. The activations of all other cluster units are
set to 0.
Reset Mechanism − The work of this mechanism is based upon the similarity between
the top-down weight and the input vector. If the degree of this similarity is less
than the vigilance parameter, then the cluster is not allowed to learn the pattern and a
reset happens.
Supplement Unit − The issue with the reset mechanism is that the F2 layer must be inhibited
under certain conditions and must also be available when learning happens. That is why two
supplemental units, G1 and G2, are added along with the reset unit, R. They are called gain
control units. These units receive and send signals to the other units present in
the network. ‘+’ indicates an excitatory signal, while ‘−’ indicates an inhibitory signal.
Parameters Used
Following parameters are used −
n − Number of components in the input vector
m − Maximum number of clusters that can be formed
bij − Weight from F1(b) to F2 layer, i.e. bottom-up weights
tji − Weight from F2 to F1(b) layer, i.e. top-down weights
ρ − Vigilance parameter
||x|| − Norm of vector x
Algorithm
Step 1 − Initialize the learning rate, the vigilance parameter, and the weights as follows −
$$\alpha > 1 \text{ and } 0 < \rho \leq 1$$
$$0 < b_{ij}(0) < \frac{\alpha}{\alpha - 1 + n} \text{ and } t_{ji}(0) = 1$$
Step 2 − Continue steps 3-9 while the stopping condition is not true.
Step 3 − Continue steps 4-6 for every training input.
Step 4 − Set the activations of all F1(a) and F2 units as follows
F2 = 0 and F1(a) = input vectors
Step 5 − The input signal from the F1(a) to the F1(b) layer must be sent as
$$s_i = x_i$$
Step 6 − For every F2 node that is not inhibited (yj ≠ −1)
$$y_j = \sum_{i} b_{ij} x_i$$
Step 7 − Perform steps 8-10 when the reset is true.
Step 8 − Find J such that yJ ≥ yj for all nodes j
Step 9 − Again calculate the activation on F1(b) as follows
$$x_i = s_i t_{Ji}$$
Step 10 − Now, after calculating the norm of vector x and vector s, we need to check the reset
condition as follows −
If ||x|| / ||s|| < vigilance parameter ρ, then inhibit node J and go to step 7
Else, if ||x|| / ||s|| ≥ vigilance parameter ρ, then proceed further.
Step 11 − Weight updating for node J can be done as follows −
$$b_{iJ}(\text{new}) = \frac{\alpha x_i}{\alpha - 1 + ||x||}$$
$$t_{Ji}(\text{new}) = x_i$$
Step 12 − The stopping condition for algorithm must be checked and it may be as follows −
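Independently of the stopping condition, Steps 1 to 11 of the algorithm can be sketched in Python as below. The sketch rests on assumptions that are not stated above: training simply runs for a fixed number of epochs, the norm of a binary vector is taken as the sum of its components, the initial bottom-up weights are set to 1/(1 + n), and the input vectors, parameter values and function name are made-up.

```python
import numpy as np

def train_art1(inputs, m, rho=0.4, alpha=2.0, epochs=2):
    """Cluster binary vectors; returns the cluster index chosen for each input."""
    inputs = np.asarray(inputs, dtype=float)
    n = inputs.shape[1]
    B = np.full((n, m), 1.0 / (1.0 + n))   # bottom-up b_ij, below alpha/(alpha-1+n)
    T = np.ones((m, n))                     # top-down t_ji, initially 1
    labels = []
    for _ in range(epochs):                 # assumed stopping rule: fixed epochs
        labels = []
        for s in inputs:
            norm_s = s.sum()                # norm of a binary vector = number of 1s
            y = s @ B                       # net input of every F2 node
            label = -1                      # stays -1 if every node is reset
            for _attempt in range(m):
                J = int(np.argmax(y))       # winner among non-inhibited nodes
                x = s * T[J]                # F1(b) activation: x_i = s_i * t_Ji
                norm_x = x.sum()
                if norm_x / norm_s < rho:   # reset: inhibit node J and try again
                    y[J] = -1.0
                    continue
                B[:, J] = alpha * x / (alpha - 1.0 + norm_x)
                T[J] = x
                label = J
                break
            labels.append(label)
    return labels, B, T

# Hypothetical binary inputs with n = 4 components and at most m = 3 clusters
patterns = [[1, 1, 0, 0],
            [0, 0, 0, 1],
            [1, 0, 0, 0],
            [0, 0, 1, 1]]
labels, B, T = train_art1(patterns, m=3)
print(labels)
print(T)
```

With a vigilance of 0.4, the first and third inputs share one cluster and the second and fourth share another, leaving the third cluster unit unused. A higher vigilance would force more resets and therefore more clusters.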