Unit_2


Perceptron

The perceptron, a single-layer feedforward neural network, was introduced in the late 1950s by
Frank Rosenblatt. It is one of the first and most straightforward models of artificial neural
networks. Despite its simplicity, the perceptron has proven successful in solving certain
classification problems.
What is Perceptron?
The perceptron is one of the simplest artificial neural network architectures. It was introduced by
Frank Rosenblatt in 1957. It is the simplest type of feedforward neural network, consisting of a
single layer of input nodes that are fully connected to a layer of output nodes. It can learn
linearly separable patterns. It uses a slightly different type of artificial neuron known as the
threshold logic unit (TLU).
Types of Perceptron
 Single-Layer Perceptron: This type of perceptron is limited to learning linearly
separable patterns. It is effective for tasks where the data can be divided into distinct
categories by a straight line.
 Multilayer Perceptron: Multilayer perceptrons possess enhanced processing
capabilities as they consist of two or more layers, adept at handling more complex
patterns and relationships within the data.
Basic Components of Perceptron
A perceptron, the basic unit of a neural network, comprises essential components that
collaborate in information processing.
 Input Features: The perceptron takes multiple input features. Each input feature
represents a characteristic or attribute of the input data.
 Weights: Each input feature is associated with a weight, determining the significance of
each input feature in influencing the perceptron’s output. During training, these weights
are adjusted to learn the optimal values.
 Summation Function: The perceptron calculates the weighted sum of its inputs using
the summation function. The summation function combines the inputs with their
respective weights to produce a weighted sum.
 Activation Function: The weighted sum is then passed through an activation function.
The perceptron uses the Heaviside step function, which takes the weighted sum as input,
compares it with the threshold, and outputs 0 or 1.
 Output: The final output of the perceptron is determined by the activation function’s
result. For example, in binary classification problems, the output might represent a
predicted class (0 or 1).
 Bias: A bias term is often included in the perceptron model. The bias allows the model to
make adjustments that are independent of the input. It is an additional parameter that is
learned during training.
 Learning Algorithm (Weight Update Rule): During training, the perceptron learns by
adjusting its weights and bias based on a learning algorithm. A common approach is the
perceptron learning algorithm, which updates weights based on the difference between
the predicted output and the true output.
These components work together to enable a perceptron to learn and make predictions. While a
single perceptron can perform binary classification, more complex tasks require the use of
multiple perceptrons organized into layers, forming a neural network.
How does Perceptron work?
A weight is assigned to each input node of a perceptron, indicating the significance of that input
to the output. The perceptron’s output is the weighted sum of the inputs passed through an
activation function, which decides whether or not the perceptron will fire. It computes the
weighted sum of its inputs as:
$$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = X^T W$$
The activation function that perceptrons use most frequently is the step function, which compares
this weighted sum to a threshold: it outputs 1 if the sum is larger than the threshold value and 0
otherwise. The most common step function used in the perceptron is the Heaviside step
function:
$$h(z) = \begin{cases} 1 & \text{if } z > \text{threshold} \\ 0 & \text{otherwise} \end{cases}$$

A perceptron has a single layer of threshold logic units, with each TLU connected to all inputs.
When all the neurons in a layer are connected to every neuron of the previous layer, it is known
as a fully connected layer or dense layer.
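As a minimal sketch of the computation described above, the following Python snippet evaluates a single TLU: a weighted sum followed by a step activation. The function names, the bias term used in place of an explicit threshold, and the hand-picked AND-gate weights are illustrative assumptions and do not come from the original notes.

```python
def step(z, threshold=0.0):
    """Step activation: 1 if z is larger than the threshold, 0 otherwise."""
    return 1 if z > threshold else 0


def tlu_output(x, w, bias=0.0):
    """Weighted sum z = w1*x1 + ... + wn*xn + bias, passed through the step."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return step(z)


# Hypothetical hand-picked parameters that realise logical AND on binary inputs.
and_weights, and_bias = [1.0, 1.0], -1.5
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_outputs = [tlu_output(x, and_weights, and_bias) for x in inputs]
print(and_outputs)
```

Only the input (1, 1) pushes the weighted sum above zero, so the unit fires for that input alone.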
Q. Develop a perceptron for the AND and OR functions using bipolar inputs and targets.
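One possible way to approach this exercise is sketched below, using the perceptron learning rule (weights change only when the prediction differs from the target). Several choices here are assumptions made for illustration: zero initial weights, a learning rate of 1, a two-valued sign activation that maps a net input of exactly 0 to −1, and the fixed order in which the samples are presented.

```python
def sign(z):
    """Bipolar activation: +1 for positive net input, -1 otherwise (assumption at z == 0)."""
    return 1 if z > 0 else -1


def predict(x, w, b):
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(samples, targets, lr=1.0, max_epochs=100):
    """Perceptron learning rule: on a mistake, w += lr*t*x and b += lr*t."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(max_epochs):
        changed = False
        for x, t in zip(samples, targets):
            if predict(x, w, b) != t:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
                changed = True
        if not changed:  # a full pass without updates: training has converged
            break
    return w, b


samples = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
and_targets = [1, -1, -1, -1]
or_targets = [1, 1, 1, -1]

and_w, and_b = train_perceptron(samples, and_targets)
or_w, or_b = train_perceptron(samples, or_targets)
print("AND:", and_w, and_b)
print("OR: ", or_w, or_b)
```

Under these assumptions both functions are learned within two passes over the four samples, and the two solutions share the same weights and differ only in the bias.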
Backpropagation
Backpropagation is a powerful algorithm in deep learning, primarily used to train artificial
neural networks, particularly feed-forward networks. It works iteratively, minimizing the cost
function by adjusting weights and biases.
In each epoch, the model adapts these parameters, reducing loss by following the error gradient.
Backpropagation often utilizes optimization algorithms like gradient descent or stochastic
gradient descent. The algorithm computes the gradient using the chain rule from calculus,
allowing it to effectively navigate complex layers in the neural network to minimize the cost
function.
Fig. (a): A simple illustration of how backpropagation works by adjusting weights
Importance of Backpropagation
Backpropagation plays a critical role in how neural networks improve over time. Here's why:
1. Efficient Weight Update: It computes the gradient of the loss function with respect to
each weight using the chain rule, making it possible to update weights efficiently.
2. Scalability: The backpropagation algorithm scales well to networks with multiple layers
and complex architectures, making deep learning feasible.
3. Automated Learning: With backpropagation, the learning process becomes automated,
and the model can adjust itself to optimize its performance.
Working of Backpropagation Algorithm
The Backpropagation algorithm involves two main steps: the Forward Pass and the Backward
Pass.
Forward Pass Working
In the forward pass, the input data is fed into the input layer. These inputs, combined with their
respective weights, are passed to hidden layers.
For example, in a network with two hidden layers (h1 and h2 as shown in Fig. (a)), the output
from h1 serves as the input to h2. Before applying an activation function, a bias is added to the
weighted inputs.
Each hidden layer applies an activation function like ReLU (Rectified Linear Unit), which
returns the input if it’s positive and zero otherwise. This adds non-linearity, allowing the model
to learn complex relationships in the data. Finally, the outputs from the last hidden layer are
passed to the output layer, where an activation function, such as softmax, converts the weighted
outputs into probabilities for classification.

The forward pass using weights and biases


How Does the Backward Pass Work?
In the backward pass, the error (the difference between the predicted and actual output) is
propagated back through the network to adjust the weights and biases. One common method for
error calculation is the Mean Squared Error (MSE), given by:
$$\text{MSE} = (\text{Predicted Output} - \text{Actual Output})^2$$
Once the error is calculated, the network adjusts weights using gradients, which are computed
with the chain rule. These gradients indicate how much each weight and bias should be adjusted
to minimize the error in the next iteration. The backward pass continues layer by layer, ensuring
that the network learns and improves its performance. The activation function, through its
derivative, plays a crucial role in computing these gradients during backpropagation.
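The forward and backward passes can be illustrated with a very small network. The sketch below is not the network from the figure: it assumes one hidden layer of two sigmoid units, a linear output, a single training example, and hand-picked starting weights. Sigmoid is used here instead of ReLU and softmax only to keep the example compact. The loss is the squared error from the formula above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Forward pass: hidden activations h and the scalar prediction y_hat."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y_hat = sum(w * hj for w, hj in zip(W2, h)) + b2[0]
    return h, y_hat


def train_step(x, y, W1, b1, W2, b2, lr=0.1):
    """One forward and one backward pass; parameters are updated in place."""
    h, y_hat = forward(x, W1, b1, W2, b2)
    loss = (y_hat - y) ** 2
    d_out = 2.0 * (y_hat - y)                       # dLoss/dy_hat
    for j, hj in enumerate(h):
        # Chain rule through the output weight and the sigmoid derivative h*(1-h).
        d_hidden = d_out * W2[j] * hj * (1.0 - hj)
        W2[j] -= lr * d_out * hj
        for i, xi in enumerate(x):
            W1[j][i] -= lr * d_hidden * xi
        b1[j] -= lr * d_hidden
    b2[0] -= lr * d_out
    return loss


# Hypothetical starting parameters and a single training example.
W1 = [[0.1, 0.2], [-0.3, 0.4]]
b1 = [0.0, 0.0]
W2 = [0.5, -0.5]
b2 = [0.0]
x, y = [1.0, 2.0], 1.0

losses = [train_step(x, y, W1, b1, W2, b2) for _ in range(50)]
print(losses[0], losses[-1])
```

Each call computes the error in the forward direction and then pushes its gradient backwards, layer by layer, which is why the recorded loss shrinks over the iterations.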
Example of Backpropagation in Machine Learning
Study link: https://www.geeksforgeeks.org/backpropagation-in-neural-network/

Associative Memory

These kinds of neural networks work on the basis of pattern association: they can
store different patterns and, when producing an output, return one of the stored
patterns by matching it against the given input pattern. These types of memories are also
called Content-Addressable Memory (CAM). Associative memory performs a parallel
search over the stored patterns, treating them as data files.

Following are the two types of associative memories we can observe −

 Auto Associative Memory


 Hetero Associative memory

Auto Associative Memory

This is a single layer neural network in which the input training vector and the output target
vectors are the same. The weights are determined so that the network stores a set of patterns.

Architecture

As shown in the following figure, the architecture of Auto Associative memory network
has ‘n’ number of input training vectors and similar ‘n’ number of output target vectors.

Training Algorithm
For training, this network is using the Hebb or Delta learning rule.

Step 1 − Initialize all the weights to zero: $w_{ij} = 0$ for $i = 1$ to $n$, $j = 1$ to $n$.

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − Activate each input unit as follows −

$$x_i = s_i \quad (i = 1 \text{ to } n)$$

Step 4 − Activate each output unit as follows −

$$y_j = s_j \quad (j = 1 \text{ to } n)$$

Step 5 − Adjust the weights as follows −

$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + x_i y_j$$

Testing Algorithm

Step 1 − Set the weights to those obtained during training with Hebb’s rule.

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − Set the activation of the input units equal to that of the input vector.

Step 4 − Calculate the net input to each output unit, j = 1 to n −

$$y_{in_j} = \sum_{i=1}^{n} x_i w_{ij}$$

Step 5 − Apply the following activation function to calculate the output −

$$y_j = f(y_{in_j}) = \begin{cases} +1 & \text{if } y_{in_j} > 0 \\ -1 & \text{if } y_{in_j} \leqslant 0 \end{cases}$$
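The training and testing steps above can be sketched in a few lines of Python. The stored pattern and the two corrupted test inputs are made up for illustration, and the function names are hypothetical.

```python
def train_auto(patterns):
    """Hebb rule: w[i][j] += s[i] * s[j] for every stored bipolar pattern s."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                w[i][j] += s[i] * s[j]
    return w


def recall(x, w):
    """Net input per output unit, then +1 if it is positive and -1 otherwise."""
    n = len(w)
    net = [sum(x[i] * w[i][j] for i in range(n)) for j in range(n)]
    return [1 if v > 0 else -1 for v in net]


stored = [1, 1, 1, -1]
w = train_auto([stored])

clean = recall(stored, w)             # the stored pattern itself
missing = recall([0, 1, 1, -1], w)    # first component unknown (set to 0)
flipped = recall([-1, 1, 1, -1], w)   # first component wrong
print(clean, missing, flipped)
```

In this small case all three test inputs recall the stored pattern, which shows the content-addressable behaviour: a partial or slightly corrupted key is enough to retrieve the full vector.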

Hetero Associative memory

Similar to Auto Associative Memory network, this is also a single layer neural network.
However, in this network the input training vector and the output target vectors are not the
same. The weights are determined so that the network stores a set of patterns. Hetero associative
network is static in nature, hence, there would be no non-linear and delay operations.

Architecture
As shown in the following figure, the architecture of Hetero Associative Memory network
has ‘n’ number of input training vectors and ‘m’ number of output target vectors.

Training Algorithm

For training, this network is using the Hebb or Delta learning rule.

Step 1 − Initialize all the weights to zero: $w_{ij} = 0$ for $i = 1$ to $n$, $j = 1$ to $m$.

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − Activate each input unit as follows −

$$x_i = s_i \quad (i = 1 \text{ to } n)$$

Step 4 − Activate each output unit as follows −

$$y_j = s_j \quad (j = 1 \text{ to } m)$$

Step 5 − Adjust the weights as follows −

$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + x_i y_j$$

Testing Algorithm

Step 1 − Set the weights to those obtained during training with Hebb’s rule.

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − Set the activation of the input units equal to that of the input vector.

Step 4 − Calculate the net input to each output unit, j = 1 to m −

$$y_{in_j} = \sum_{i=1}^{n} x_i w_{ij}$$

Step 5 − Apply the following activation function to calculate the output −

$$y_j = f(y_{in_j}) = \begin{cases} +1 & \text{if } y_{in_j} > 0 \\ 0 & \text{if } y_{in_j} = 0 \\ -1 & \text{if } y_{in_j} < 0 \end{cases}$$
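A minimal sketch of the hetero-associative case follows. It uses the same Hebb rule as before, but the input vectors have n = 4 components and the targets have m = 2. The four training pairs are illustrative values chosen for this example, and the function names are hypothetical.

```python
def train_hetero(inputs, targets):
    """Hebb rule: w[i][j] += s[i] * t[j] for every training pair (s, t)."""
    n, m = len(inputs[0]), len(targets[0])
    w = [[0] * m for _ in range(n)]
    for s, t in zip(inputs, targets):
        for i in range(n):
            for j in range(m):
                w[i][j] += s[i] * t[j]
    return w


def recall(x, w):
    """Three-valued activation: +1, 0 or -1 depending on the sign of the net input."""
    n, m = len(w), len(w[0])
    net = [sum(x[i] * w[i][j] for i in range(n)) for j in range(m)]
    return [1 if v > 0 else -1 if v < 0 else 0 for v in net]


# Illustrative bipolar training pairs (assumed for this example).
inputs = [[1, -1, -1, -1], [1, 1, -1, -1], [-1, -1, -1, 1], [-1, -1, 1, 1]]
targets = [[1, -1], [1, -1], [-1, 1], [-1, 1]]

w = train_hetero(inputs, targets)
recalled = [recall(s, w) for s in inputs]
print(w)
print(recalled)
```

An all-zero input gives a net input of 0 on every output unit, which is the case the middle branch of the activation function handles.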

Hopfield network

Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a single
layer which contains one or more fully connected recurrent neurons. The Hopfield network is
commonly used for auto-association and optimization tasks.

Discrete Hopfield Network

A discrete Hopfield network operates in a discrete fashion; in other words, the input and output
patterns are discrete vectors, which can be either binary (0, 1) or bipolar (+1, −1) in nature.
The network has symmetrical weights with no self-connections, i.e., wij = wji and wii = 0.

Architecture

Following are some important points to keep in mind about discrete Hopfield network −

 This model consists of neurons with one inverting and one non-inverting output.
 The output of each neuron should be the input of other neurons but not the input of self.
 Weight/connection strength is represented by wij.
 Connections can be excitatory as well as inhibitory. It would be excitatory, if the output
of the neuron is same as the input, otherwise inhibitory.
 Weights should be symmetrical, i.e. wij = wji
The output from Y1 going to Y2, Yi and Yn have the weights w12, w1i and w1n respectively.
Similarly, other arcs have the weights on them.

Training Algorithm

During training of a discrete Hopfield network, the weights are updated. The input vectors can
be either binary or bipolar. In both cases, the weights are obtained from the following
relations.

Case 1 − Binary input patterns

For a set of binary patterns s(p), p = 1 to P

Here, s(p) = (s1(p), s2(p), ..., si(p), ..., sn(p))

Weight Matrix is given by

$$w_{ij} = \sum_{p=1}^{P} [2s_i(p) - 1][2s_j(p) - 1] \quad \text{for } i \neq j$$

Case 2 − Bipolar input patterns


For a set of bipolar patterns s(p), p = 1 to P

Here, s(p) = (s1(p), s2(p), ..., si(p), ..., sn(p))

Weight Matrix is given by

$$w_{ij} = \sum_{p=1}^{P} [s_i(p)][s_j(p)] \quad \text{for } i \neq j$$

Testing Algorithm

Step 1 − Initialize the weights, which are obtained from the training algorithm by using the Hebbian
principle.

Step 2 − Perform steps 3-9 while the activations of the network have not converged.

Step 3 − For each input vector X, perform steps 4-8.

Step 4 − Make the initial activation of the network equal to the external input vector X as follows −

$$y_i = x_i \quad \text{for } i = 1 \text{ to } n$$

Step 5 − For each unit Yi, perform steps 6-8.

Step 6 − Calculate the net input of the network as follows −

$$y_{in_i} = x_i + \sum_{j} y_j w_{ji}$$

Step 7 − Apply the activation as follows over the net input to calculate the output −

$$y_i = \begin{cases} 1 & \text{if } y_{in_i} > \theta_i \\ y_i & \text{if } y_{in_i} = \theta_i \\ 0 & \text{if } y_{in_i} < \theta_i \end{cases}$$

Here $\theta_i$ is the threshold.

Step 8 − Broadcast this output yi to all other units.

Step 9 − Test the network for convergence.
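The training relation for binary patterns and the testing steps above can be sketched together as follows. The stored pattern, the corrupted probe, the zero thresholds, and the fixed update order (unit 1 to unit n) are assumptions made for this example; the units could equally be visited in another order.

```python
def hopfield_weights(binary_patterns):
    """w[i][j] = sum over patterns of (2*s_i - 1)*(2*s_j - 1), with a zero diagonal."""
    n = len(binary_patterns[0])
    w = [[0] * n for _ in range(n)]
    for s in binary_patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += (2 * s[i] - 1) * (2 * s[j] - 1)
    return w


def recall(x, w, theta=0, max_sweeps=100):
    """Update one unit at a time until a full sweep changes nothing."""
    y = list(x)
    n = len(y)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            net = x[i] + sum(y[j] * w[j][i] for j in range(n))
            # Keep the old activation when the net input equals the threshold.
            new = 1 if net > theta else 0 if net < theta else y[i]
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:
            break
    return y


weights = hopfield_weights([[1, 1, 1, 0]])
restored = recall([0, 0, 1, 0], weights)  # probe with two components missing
print(weights)
print(restored)
```

In this example the probe settles on the stored pattern (1, 1, 1, 0) after the first sweep, and the second sweep confirms that no unit changes any more.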

Energy Function Evaluation

An energy function is defined as a bounded and non-increasing function of the
state of the system.
The energy function Ef, also called the Lyapunov function, determines the stability of a discrete
Hopfield network, and is characterized as follows −

$$E_f = -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j w_{ij} - \sum_{i=1}^{n} x_i y_i + \sum_{i=1}^{n} \theta_i y_i$$

Condition − In a stable network, whenever the state of node changes, the above energy function
will decrease.

Suppose node i changes state from $y_i^{(k)}$ to $y_i^{(k+1)}$. Then the energy
change $\Delta E_f$ is given by the following relation −

$$\Delta E_f = E_f(y_i^{(k+1)}) - E_f(y_i^{(k)})$$
$$= -\left(\sum_{j=1}^{n} w_{ij} y_j^{(k)} + x_i - \theta_i\right)\left(y_i^{(k+1)} - y_i^{(k)}\right)$$
$$= -(net_i)\,\Delta y_i$$

Here $\Delta y_i = y_i^{(k+1)} - y_i^{(k)}$

The change in energy depends on the fact that only one unit can update its activation at a time.
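To make the energy argument concrete, the sketch below evaluates the energy function on a short sequence of states in which one unit changes at a time. The weight matrix, external input, zero thresholds, and state sequence are assumed values for a four-unit network that stores the binary pattern (1, 1, 1, 0).

```python
def energy(y, w, x, theta):
    """E = -1/2 * sum_ij y_i*y_j*w_ij - sum_i x_i*y_i + sum_i theta_i*y_i."""
    n = len(y)
    quad = sum(y[i] * y[j] * w[i][j] for i in range(n) for j in range(n))
    external = sum(xi * yi for xi, yi in zip(x, y))
    threshold = sum(ti * yi for ti, yi in zip(theta, y))
    return -0.5 * quad - external + threshold


# Assumed example values: symmetric weights with a zero diagonal.
w = [[0, 1, 1, -1],
     [1, 0, 1, -1],
     [1, 1, 0, -1],
     [-1, -1, -1, 0]]
x = [0, 0, 1, 0]          # external input
theta = [0, 0, 0, 0]      # thresholds

# States visited when unit 1 and then unit 2 switch on, one at a time.
states = [[0, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 0]]
energies = [energy(s, w, x, theta) for s in states]
print(energies)
```

Each single-unit update lowers the energy by net_i times the change in y_i, so the sequence of energies never increases.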

Continuous Hopfield Network

In comparison with the discrete Hopfield network, the continuous network has time as a continuous
variable. It is also used in auto-association and optimization problems such as the travelling
salesman problem.

Model − The model or architecture can be built up by adding electrical components such as
amplifiers, which map the input voltage to the output voltage through a sigmoid activation
function.

Energy Function Evaluation

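$$E_f = -\frac{1}{2}\sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \neq i}}^{n} y_i y_j w_{ij} - \sum_{i=1}^{n} x_i y_i + \frac{1}{\lambda}\sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij}\, g_{ri} \int_{0}^{y_i} a^{-1}(y)\, dy$$

Here λ is the gain parameter and gri is the input conductance.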

Counter Propagation Networks


Counter propagation networks (CPN) were proposed by Hecht-Nielsen in 1987. They are
multilayer networks based on combinations of the input, output, and clustering layers. The
applications of counter propagation nets are data compression, function approximation and pattern
association. The counter propagation network is basically constructed from an instar-outstar
model. This model is a three-layer neural network that performs input-output data mapping,
producing an output vector y in response to an input vector x, on the basis of competitive learning.
The three layers in an instar-outstar model are the input layer, the hidden (competitive) layer and
the output layer.
There are two stages involved in the training process of a counter propagation net. The input
vectors are clustered in the first stage. In the second stage of training, the weights from the
cluster layer units to the output units are tuned to obtain the desired response.

There are two types of counter propagation network:


1. Full counter propagation network
2. Forward-only counter propagation network
Full counter propagation network
Full CPN efficiently represents a large number of vector pairs x:y by adaptively constructing a
look-up table. The full CPN works best if the inverse function exists and is continuous. The
vectors x and y propagate through the network in a counterflow manner to yield output vectors x*
and y*.
Architecture of Full Counterpropagation Network
The four major components of the instar-outstar model are the input layer, the instar, the
competitive layer and the outstar. For each node in the input layer there is an input value xi. All
the instars are grouped into a layer called the competitive layer. Each of the instars responds
maximally to a group of input vectors in a different region of space. An outstar model consists of
all the nodes in the output layer and a single node in the competitive layer. The outstar
looks like the fan-out of a node.
Training Algorithm for Full Counterpropagation Network:
Step 0: Set the initial weights and the initial learning rates.
Step 1: Perform Steps 2-7 if stopping condition is false for phase-I training.
Step 2: For each of the training input vector pairs x:y presented, perform Steps 3-5.
Step 3: Set the X-input layer activations to vector x. Set the Y-input layer activations to
vector y.
Step 4: Find the winning cluster unit. If the dot product method is used, find the cluster unit Zj with
the largest net input, for j = 1 to p:
$$z_{in_j} = \sum_{i=1}^{n} x_i v_{ij} + \sum_{k=1}^{m} y_k w_{kj}$$
If the Euclidean distance method is used, find the cluster unit Zj whose squared distance from the
input vectors is the smallest:
$$D(j) = \sum_{i=1}^{n} (x_i - v_{ij})^2 + \sum_{k=1}^{m} (y_k - w_{kj})^2$$
If there is a tie in the selection of the winner unit, the unit with the smallest index is the
winner. Take the winner unit index as J.
Step 5: Update the weights of the winner unit ZJ:
$$v_{iJ}(\text{new}) = v_{iJ}(\text{old}) + \alpha[x_i - v_{iJ}(\text{old})], \quad i = 1 \text{ to } n$$
$$w_{kJ}(\text{new}) = w_{kJ}(\text{old}) + \beta[y_k - w_{kJ}(\text{old})], \quad k = 1 \text{ to } m$$
Step 6: Reduce the learning rates α and β:
$$\alpha(t+1) = 0.5\,\alpha(t)$$
$$\beta(t+1) = 0.5\,\beta(t)$$
Step 7: Test stopping condition for phase-I training.
Step 8: Perform Steps 9-15 when stopping condition is false for phase-II training.
Step 9: Perform Steps 10-13 for each training input pair x:y. Here α and β are small constant
values.
Step 10: Set the X-input layer activations to vector x. Set the Y-input layer activations to
vector y.
Step 11: Find the winning cluster unit (use formulas from Step 4). Take the winner unit index
as J.
Step 12: Update the weights entering into unit ZJ:
$$v_{iJ}(\text{new}) = v_{iJ}(\text{old}) + \alpha[x_i - v_{iJ}(\text{old})], \quad i = 1 \text{ to } n$$
$$w_{kJ}(\text{new}) = w_{kJ}(\text{old}) + \beta[y_k - w_{kJ}(\text{old})], \quad k = 1 \text{ to } m$$
Step 13: Update the weights from unit ZJ to the output layers:
$$t_{Ji}(\text{new}) = t_{Ji}(\text{old}) + b[x_i - t_{Ji}(\text{old})], \quad i = 1 \text{ to } n$$
$$u_{Jk}(\text{new}) = u_{Jk}(\text{old}) + a[y_k - u_{Jk}(\text{old})], \quad k = 1 \text{ to } m$$
Step 14: Reduce the learning rates a and b:
$$a(t+1) = 0.5\,a(t)$$
$$b(t+1) = 0.5\,b(t)$$
Step 15: Test stopping condition for phase-II training.
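As an illustration of Steps 4 and 5 only, the sketch below performs one phase-I iteration of a full CPN with the Euclidean distance method: it picks the winning cluster unit and moves that unit's weights towards the presented pair. The starting weights, the single training pair, and the learning rates are hypothetical values, and the function names are not from the original notes.

```python
def squared_distance(x, y, v, w, j):
    """D(j): squared distance of cluster unit j from the pair (x, y)."""
    dx = sum((xi - v[i][j]) ** 2 for i, xi in enumerate(x))
    dy = sum((yk - w[k][j]) ** 2 for k, yk in enumerate(y))
    return dx + dy


def find_winner(x, y, v, w):
    """Index J of the closest cluster unit; list.index returns the smallest index on ties."""
    distances = [squared_distance(x, y, v, w, j) for j in range(len(v[0]))]
    return distances.index(min(distances))


def update_winner(x, y, v, w, J, alpha, beta):
    """Move only the winner's weights towards x (via v) and y (via w)."""
    for i, xi in enumerate(x):
        v[i][J] += alpha * (xi - v[i][J])
    for k, yk in enumerate(y):
        w[k][J] += beta * (yk - w[k][J])


# Hypothetical setup: n = 2 x-inputs, m = 1 y-input, p = 2 cluster units.
v = [[0.8, 0.2], [0.2, 0.8]]   # v[i][j]: weight from x-input i to cluster j
w = [[0.5, 0.5]]               # w[k][j]: weight from y-input k to cluster j
x, y = [1.0, 0.0], [1.0]

J = find_winner(x, y, v, w)
update_winner(x, y, v, w, J, alpha=0.5, beta=0.5)
print(J, v, w)
```

Only the column belonging to the winner changes; the weights of the losing cluster unit stay as they were, which is what makes the learning competitive.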
Forward-only Counterpropagation network:
A simplified version of full CPN is the forward-only CPN. Forward-only CPN uses only the x
vectors to form the clusters on the Kohonen units during phase I training. In forward-only
CPN, the input vectors are first presented to the input units. The weights between the input
layer and cluster layer are trained first. Then the weights between the cluster layer and output layer
are trained. This is a specific competitive network, with the target known.
Architecture of forward-only CPN
It consists of three layers: input layer, cluster layer and output layer. Its architecture resembles
the back-propagation network, but in CPN there exist interconnections between the units in the
cluster layer.
Training Algorithm for Forward-only Counterpropagation network:
Step 0: Initialize the weights and learning rates.
Step 1: Perform Steps 2-7 if stopping condition is false for phase-I training.
Step 2: Perform Steps 3-5 for each training input x.
Step 3: Set the X-input layer activations to vector x.
Step 4: Compute the winning cluster unit (J). If the dot product method is used, find the cluster unit
Zj with the largest net input:
$$z_{in_j} = \sum_{i=1}^{n} x_i v_{ij}$$
If the Euclidean distance method is used, find the cluster unit Zj whose squared distance from the
input pattern is the smallest:
$$D(j) = \sum_{i=1}^{n} (x_i - v_{ij})^2$$
If there is a tie in the selection of the winner unit, the unit with the smallest index is chosen as
the winner.
Step 5: Perform the weight update for unit ZJ. For i = 1 to n,
$$v_{iJ}(\text{new}) = v_{iJ}(\text{old}) + \alpha[x_i - v_{iJ}(\text{old})]$$
Step 6: Reduce the learning rate α:
$$\alpha(t+1) = 0.5\,\alpha(t)$$
Step 7: Test stopping condition for phase-I training.
Step 8: Perform Steps 9-15 when stopping condition is false for phase-II training.
Step 9: Perform Steps 10-13 for each training input pair x:y.
Step 10: Set the X-input layer activations to vector x. Set the Y-output layer activations to vector y.
Step 11: Find the winning cluster unit (use formulas from Step 4). Take the winner unit index
as J.
Step 12: Update the weights entering into unit ZJ:
$$v_{iJ}(\text{new}) = v_{iJ}(\text{old}) + \alpha[x_i - v_{iJ}(\text{old})], \quad i = 1 \text{ to } n$$
Step 13: Update the weights from unit ZJ to the output layers:
$$w_{kJ}(\text{new}) = w_{kJ}(\text{old}) + \beta[y_k - w_{kJ}(\text{old})], \quad k = 1 \text{ to } m$$
Step 14: Reduce the learning rate β:
$$\beta(t+1) = 0.5\,\beta(t)$$
Step 15: Test stopping condition for phase-II training.
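The two training phases above can be sketched compactly as follows. This is an illustrative sketch under several assumptions: the Euclidean distance method is used, a fixed number of epochs stands in for the stopping conditions, the data set and starting weights are made up, and the weights are stored per cluster unit (v[j] is the weight vector entering cluster j, w[j] the weight vector leaving it), which is the transpose of the v_ij, w_kj indexing in the steps.

```python
def winner(x, v):
    """Index J of the cluster unit whose weight vector is closest to x."""
    d = [sum((xi - vi) ** 2 for xi, vi in zip(x, vj)) for vj in v]
    return d.index(min(d))  # smallest index wins ties


def train_forward_only_cpn(X, Y, v, w, alpha=0.5, beta=0.5, epochs=5):
    # Phase I: cluster the x vectors only, halving alpha after each epoch.
    for _ in range(epochs):
        for x in X:
            J = winner(x, v)
            v[J] = [vi + alpha * (xi - vi) for xi, vi in zip(x, v[J])]
        alpha *= 0.5
    # Phase II: alpha is now small; tune the cluster-to-output weights with beta.
    for _ in range(epochs):
        for x, y in zip(X, Y):
            J = winner(x, v)
            v[J] = [vi + alpha * (xi - vi) for xi, vi in zip(x, v[J])]
            w[J] = [wk + beta * (yk - wk) for yk, wk in zip(y, w[J])]
        beta *= 0.5
    return v, w


def predict(x, v, w):
    """The output is the outgoing weight vector of the winning cluster unit."""
    return w[winner(x, v)]


# Hypothetical data: two input patterns with one target value each.
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1.0], [0.0]]
v = [[0.6, 0.4], [0.4, 0.6]]
w = [[0.5], [0.5]]

train_forward_only_cpn(X, Y, v, w)
print(predict([1.0, 0.0], v, w), predict([0.0, 1.0], v, w))
```

After training, each cluster unit acts like one row of a look-up table: an input is mapped to its nearest cluster, and that cluster's outgoing weights are returned as the output.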

Adaptive Resonance Network

This network was developed by Stephen Grossberg and Gail Carpenter in 1987. It is based on
competition and uses an unsupervised learning model. Adaptive Resonance
Theory (ART) networks, as the name suggests, are always open to new
learning (adaptive) without losing the old patterns (resonance). Basically, an ART
network is a vector classifier which accepts an input vector and classifies it into one of the
categories depending upon which of the stored patterns it resembles the most.
Operating Principle
The main operation of ART classification can be divided into the following phases −
 Recognition phase − The input vector is compared with the classification presented at
every node in the output layer. The output of the neuron becomes “1” if it best matches
with the classification applied, otherwise it becomes “0”.
 Comparison phase − In this phase, a comparison of the input vector to the comparison
layer vector is done. The condition for reset is that the degree of similarity would be less
than vigilance parameter.
 Search phase − In this phase, the network will search for reset as well as the match
done in the above phases. Hence, if there would be no reset and the match is quite good,
then the classification is over. Otherwise, the process would be repeated and the other
stored pattern must be sent to find the correct match.
ART1
It is a type of ART designed to cluster binary vectors. Its architecture helps explain how it
works.
Architecture of ART1
It consists of the following two units −
Computational Unit − It is made up of the following −
 Input unit (F1 layer) − It further has the following two portions −
o F1(a) layer (input portion) − In ART1, there is no processing in
this portion; it only holds the input vectors. It is connected to the
F1(b) layer (interface portion).
o F1(b) layer (interface portion) − This portion combines the signal
from the input portion with that of the F2 layer. The F1(b) layer is connected to the F2 layer
through bottom-up weights bij, and the F2 layer is connected to the F1(b) layer through top-down
weights tji.
 Cluster Unit (F2 layer) − This is a competitive layer. The unit having the largest net
input is selected to learn the input pattern. The activations of all other cluster units are set
to 0.
 Reset Mechanism − The work of this mechanism is based upon the similarity between
the top-down weight and the input vector. Now, if the degree of this similarity is less
than the vigilance parameter, then the cluster is not allowed to learn the pattern and a
reset would happen.
Supplement Unit − The issue with the reset mechanism is that the layer F2 must
be inhibited under certain conditions and must also be available when some learning happens.
That is why two supplemental units, namely G1 and G2, are added along with the reset unit, R. They
are called gain control units. These units receive and send signals to the other units present in
the network. ‘+’ indicates an excitatory signal, while ‘−’ indicates an inhibitory signal.
Parameters Used
The following parameters are used −
 n − Number of components in the input vector
 m − Maximum number of clusters that can be formed
 bij − Weight from F1(b) to F2 layer, i.e. bottom-up weights
 tji − Weight from F2 to F1(b) layer, i.e. top-down weights
 ρ − Vigilance parameter
 ||x|| − Norm of vector x
Algorithm
Step 1 − Initialize the learning rate, the vigilance parameter, and the weights as follows −
$$\alpha > 1 \quad \text{and} \quad 0 < \rho \leq 1$$
$$0 < b_{ij}(0) < \frac{\alpha}{\alpha - 1 + n} \quad \text{and} \quad t_{ji}(0) = 1$$
Step 2 − Continue steps 3-12 while the stopping condition is not true.
Step 3 − Continue steps 4-11 for every training input.
Step 4 − Set activations of all F2 and F1(a) units as follows −
F2 = 0 and F1(a) = input vector
Step 5 − Send the input signal from the F1(a) layer to the F1(b) layer −
$$s_i = x_i$$
Step 6 − For every F2 node that is not inhibited (yj ≠ −1) −
$$y_j = \sum_{i} b_{ij} x_i$$
Step 7 − Perform steps 8-10 while the reset is true.
Step 8 − Find J such that yJ ≥ yj for all nodes j.
Step 9 − Recalculate the activation on F1(b) as follows −
$$x_i = s_i t_{Ji}$$
Step 10 − After calculating the norms of vector x and vector s, check the reset
condition as follows −
If ||x|| / ||s|| < vigilance parameter ρ, then inhibit node J and go to step 7.
Else, if ||x|| / ||s|| ≥ vigilance parameter ρ, then proceed further.
Step 11 − Update the weights for node J as follows −
$$b_{iJ}(\text{new}) = \frac{\alpha x_i}{\alpha - 1 + \|x\|}$$
$$t_{Ji}(\text{new}) = x_i$$
Step 12 − Check the stopping condition for the algorithm.
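As an illustration, the ART1 steps can be sketched in Python as shown here. Several choices are assumptions made for this sketch rather than part of the notes: the initial bottom-up weight 1/(1 + n), the stopping condition (top-down weights unchanged over a full epoch, with a cap on the number of epochs), the four binary input vectors, and the parameter values ρ = 0.4 and α = 2. Input vectors are assumed to contain at least one 1 so that the norm of s is non-zero.

```python
def art1(inputs, m, rho=0.4, alpha=2.0, max_epochs=10):
    n = len(inputs[0])
    b_init = 1.0 / (1.0 + n)                # lies below alpha / (alpha - 1 + n)
    b = [[b_init] * m for _ in range(n)]    # bottom-up weights b[i][j]
    t = [[1] * n for _ in range(m)]         # top-down weights t[j][i]
    assignments = []
    for _ in range(max_epochs):
        previous_t = [row[:] for row in t]
        assignments = []
        for s in inputs:
            norm_s = sum(s)
            y = [sum(b[i][j] * s[i] for i in range(n)) for j in range(m)]
            chosen = None
            while chosen is None:
                J = max(range(m), key=lambda j: y[j])    # first maximum wins ties
                if y[J] == -1:                           # every node is inhibited
                    break
                x = [s[i] * t[J][i] for i in range(n)]   # match against top-down weights
                norm_x = sum(x)
                if norm_x / norm_s < rho:
                    y[J] = -1                            # reset: inhibit node J
                else:
                    for i in range(n):
                        b[i][J] = alpha * x[i] / (alpha - 1.0 + norm_x)
                    t[J] = x
                    chosen = J
            assignments.append(chosen)
        if t == previous_t:   # assumed stopping condition: weights are stable
            break
    return assignments, t


patterns = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
clusters, top_down = art1(patterns, m=3)
print(clusters)
print(top_down)
```

With these values the first and third vectors share one cluster and the second and fourth share another, while the third cluster unit is never used. A larger vigilance parameter makes the match test stricter and tends to produce more clusters.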
