
Unit-4

Learning
Learning, in an artificial neural network, is the method of modifying the weights of the
connections between the neurons of a given network. Learning in ANN can be
classified into three categories, namely supervised learning, unsupervised learning, and
reinforcement learning.

Supervised Learning

As the name suggests, this type of learning is done under the supervision of a teacher;
the learning process is dependent on that teacher.
During the training of an ANN under supervised learning, the input vector is presented to
the network, which produces an output vector. This output vector is compared with the
desired output vector, and an error signal is generated if there is a difference between the
actual output and the desired output vector. On the basis of this error signal, the
weights are adjusted until the actual output matches the desired output.
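As a concrete illustration of this loop (a minimal sketch, not taken from the text), here is supervised learning on a single linear neuron: the teacher supplies the desired output, and the error signal (desired − actual) drives the weight adjustment, i.e. the delta/LMS rule.

```python
import random

# Supervised learning sketch: a teacher provides (input, desired) pairs,
# and the error signal adjusts the weights until outputs match.
def train_supervised(samples, lr=0.1, epochs=200):
    random.seed(0)
    n = len(samples[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    for _ in range(epochs):
        for x, desired in samples:
            actual = sum(wi * xi for wi, xi in zip(w, x))       # network output
            error = desired - actual                            # error signal
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]  # adjust weights
    return w

# Learn the mapping y = 2*x1 - x2 from labelled examples
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w = train_supervised(data)
```

After training, the weights settle close to (2, −1), the mapping the teacher's examples encode.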

Unsupervised Learning

As the name suggests, this type of learning is done without the supervision of a
teacher; the learning process is independent.
During the training of an ANN under unsupervised learning, input vectors of similar
type are combined to form clusters. When a new input pattern is applied, the
neural network gives an output response indicating the class to which the input pattern
belongs.
There is no feedback from the environment as to what the desired output should be or
whether it is correct. Hence, in this type of learning, the network itself must
discover the patterns and features in the input data, and the relation between the input
data and the output.
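A minimal sketch (illustrative, not from the text) of this idea is competitive learning: each input is assigned to the nearest unit with no labels involved, and the winning unit's weights move toward that input, so similar input vectors come to share a cluster.

```python
# Unsupervised competitive learning sketch: units compete for each input;
# the nearest (winning) unit moves toward it, forming clusters.
def train_unsupervised(inputs, units, lr=0.3, epochs=20):
    for _ in range(epochs):
        for x in inputs:
            # winner = unit with the smallest squared Euclidean distance
            winner = min(units, key=lambda u: sum((ui - xi) ** 2
                                                  for ui, xi in zip(u, x)))
            for i, xi in enumerate(x):
                winner[i] += lr * (xi - winner[i])  # move winner toward input
    return units

data = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
units = train_unsupervised(data, units=[[0.3, 0.3], [0.7, 0.7]])
# units[0] settles near the low cluster, units[1] near the high cluster
```

No desired outputs appear anywhere: the structure (two clusters) is discovered from the inputs alone.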

Reinforcement Learning

As the name suggests, this type of learning is used to reinforce or strengthen the
network on the basis of some critic information. This learning process is similar to
supervised learning, but far less information is available.
During the training of the network under reinforcement learning, the network receives
some feedback from the environment, which makes it somewhat similar to supervised
learning. However, the feedback obtained here is evaluative, not instructive, which means
there is no teacher as in supervised learning. After receiving the feedback, the network
adjusts its weights so as to obtain better critic feedback in the future.

What is Backpropagation?
Back-propagation is the essence of neural-net training. It is the method of fine-
tuning the weights of a neural net based on the error rate obtained in the
previous epoch (i.e., iteration). Proper tuning of the weights reduces the error
rate and makes the model more reliable by improving its generalization.
Backpropagation is short for "backward propagation of errors." It is a
standard method of training artificial neural networks. This method computes the
gradient of a loss function with respect to all the weights in the
network.

How Backpropagation Works: Simple Algorithm


(Diagram: "How Backpropagation Works"; image not reproduced.)

1. Inputs X arrive through the preconnected paths.

2. The input is modeled using real weights W. The weights are usually
selected at random.
3. Calculate the output of every neuron, from the input layer through the hidden
layers to the output layer.
4. Calculate the error in the outputs:

Error = Actual Output – Desired Output

5. Travel back from the output layer to the hidden layers and adjust the
weights so that the error decreases.

Keep repeating the process until the desired output is achieved.
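The five steps above can be sketched (illustratively, for a single sigmoid neuron and one training example) as a short loop:

```python
import math
import random

# Steps 1-5 above, for one sigmoid neuron trained by repeated error correction.
random.seed(0)
x = [0.5, -0.2]                                  # 1. inputs X
w = [random.uniform(-1, 1) for _ in x]           # 2. random real weights W
desired = 0.8

for _ in range(2000):
    net = sum(wi * xi for wi, xi in zip(w, x))
    out = 1 / (1 + math.exp(-net))               # 3. calculate the output
    error = out - desired                        # 4. actual - desired output
    grad = error * out * (1 - out)               # 5. travel back: gradient of the
    w = [wi - 0.5 * grad * xi                    #    error w.r.t. each weight
         for wi, xi in zip(w, x)]
# ...repeated until the output approaches the desired value
```

With each repetition the output creeps toward the desired 0.8, which is exactly the "keep repeating until the desired output is achieved" step.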

A Step by Step Backpropagation Example

Background

Backpropagation is a common method for training a neural network. There is no shortage of
papers online that attempt to explain how backpropagation works, but few include an example
with actual numbers. This post is my attempt to explain how it works with a concrete example that
folks can compare their own calculations to, in order to ensure they understand backpropagation
correctly.
We're going to use a neural network with two inputs, two hidden neurons, and two output
neurons. Additionally, the hidden and output neurons will each include a bias.

Here’s the basic structure:


In order to have some numbers to work with, here are the initial weights ($w_1$ through
$w_8$), the biases ($b_1$ for the hidden layer, $b_2$ for the output layer), and the
training inputs/outputs (given in the original figure, not reproduced here):

The goal of backpropagation is to optimize the weights so that the neural network can learn
how to correctly map arbitrary inputs to outputs.

For the rest of this tutorial we’re going to work with a single training set: given inputs 0.05
and 0.10, we want the neural network to output 0.01 and 0.99.

The Forward Pass

To begin, let's see what the neural network currently predicts given the weights and biases
above and inputs of 0.05 and 0.10. To do this we'll feed those inputs forward through the
network.

We figure out the total net input to each hidden layer neuron, squash the total net input using
an activation function (here we use the logistic function), then repeat the process with the
output layer neurons.
Total net input is also referred to as just net input by some sources.

Here's how we calculate the total net input for the first hidden neuron, $h_1$:

$$net_{h1}\:=\:w_{1}i_{1}\:+\:w_{2}i_{2}\:+\:b_{1}$$

We then squash it using the logistic function to get the output of $h_1$:

$$out_{h1}\:=\:\frac{1}{1+e^{-net_{h1}}}$$

Carrying out the same process for $h_2$ we get its output $out_{h2}$.

We repeat this process for the output layer neurons, using the outputs of the hidden layer
neurons as inputs.

Here's the output for $o_1$:

$$out_{o1}\:=\:\frac{1}{1+e^{-(w_{5}out_{h1}\:+\:w_{6}out_{h2}\:+\:b_{2})}}$$

And carrying out the same process for $o_2$ we get its output $out_{o2}$.

Calculating the Total Error

We can now calculate the error for each output neuron using the squared error function and
sum them to get the total error:

$$E_{total}\:=\:\sum \frac{1}{2}(target\:-\:output)^2$$

Some sources refer to the target as the ideal and the output as the actual.

The $\frac{1}{2}$ is included so that the exponent is cancelled when we differentiate later on.
The result is eventually multiplied by a learning rate anyway, so it doesn't matter that we
introduce a constant here [1].

For example, the target output for $o_1$ is 0.01 but the neural network output 0.75136507,
therefore its error is:

$$E_{o1}\:=\:\frac{1}{2}(target_{o1}\:-\:out_{o1})^2\:=\:\frac{1}{2}(0.01\:-\:0.75136507)^2\:=\:0.274811083$$

Repeating this process for $o_2$ (remembering that the target is 0.99) we get
$E_{o2}\:=\:0.023560026$. The total error for the neural network is the sum of these errors:

$$E_{total}\:=\:E_{o1}\:+\:E_{o2}\:=\:0.274811083\:+\:0.023560026\:=\:0.298371109$$

The Backwards Pass

Our goal with backpropagation is to update each of the weights in the network so that they
cause the actual output to be closer to the target output, thereby minimizing the error for each
output neuron and the network as a whole.

Output Layer

Consider $w_5$. We want to know how much a change in $w_5$ affects the total error,
aka $\frac{\partial E_{total}}{\partial w_{5}}$.

$\frac{\partial E_{total}}{\partial w_{5}}$ is read as "the partial derivative of $E_{total}$ with
respect to $w_{5}$". You can also say "the gradient with respect to $w_{5}$".

By applying the chain rule we know that:

$$\frac{\partial E_{total}}{\partial w_{5}}\:=\:\frac{\partial E_{total}}{\partial out_{o1}}\:\times\:\frac{\partial out_{o1}}{\partial net_{o1}}\:\times\:\frac{\partial net_{o1}}{\partial w_{5}}$$

Visually, here’s what we’re doing:

We need to figure out each piece in this equation.

First, how much does the total error change with respect to the output?

$$\frac{\partial E_{total}}{\partial out_{o1}}\:=\:-(target_{o1}\:-\:out_{o1})$$

$-(target\:-\:out)$ is sometimes expressed as $(out\:-\:target)$.

When we take the partial derivative of the total error with respect to $out_{o1}$, the
quantity $\frac{1}{2}(target_{o2}\:-\:out_{o2})^2$ becomes zero because $out_{o1}$ does not affect
it, which means we're taking the derivative of a constant, which is zero.

Next, how much does the output of $o_1$ change with respect to its total net input?

The partial derivative of the logistic function is the output multiplied by 1 minus the output:

$$\frac{\partial out_{o1}}{\partial net_{o1}}\:=\:out_{o1}(1\:-\:out_{o1})$$
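This identity is easy to verify numerically (an illustrative check, not part of the original example): compare output × (1 − output) against a finite-difference estimate of the slope.

```python
import math

# Check that d/dnet logistic(net) == out * (1 - out).
def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

net = 0.7
out = logistic(net)
analytic = out * (1 - out)                                   # output * (1 - output)
h = 1e-6
numeric = (logistic(net + h) - logistic(net - h)) / (2 * h)  # central difference
```

The two values agree to many decimal places; note also that out(1 − out) never exceeds 0.25, which is why logistic gradients shrink as signals pass backward through many layers.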

Finally, how much does the total net input of $o_1$ change with respect to $w_5$?

$$\frac{\partial net_{o1}}{\partial w_{5}}\:=\:out_{h1}$$

Putting it all together:

$$\frac{\partial E_{total}}{\partial w_{5}}\:=\:-(target_{o1}\:-\:out_{o1})\:\times\:out_{o1}(1\:-\:out_{o1})\:\times\:out_{h1}$$

You'll often see this calculation combined in the form of the delta rule. The first two
factors can be written as $\delta_{o1}$ (the Greek letter delta), aka the node delta, which
we can use to rewrite the calculation above:

$$\delta_{o1}\:=\:\frac{\partial E_{total}}{\partial net_{o1}}\:=\:-(target_{o1}\:-\:out_{o1})\:\times\:out_{o1}(1\:-\:out_{o1})$$

$$\frac{\partial E_{total}}{\partial w_{5}}\:=\:\delta_{o1}\:out_{h1}$$
Therefore, plugging in the values computed in the forward pass gives the numeric gradient
for $w_5$.

Some sources extract the negative sign from $\delta$, in which case the expression is written
with the signs reversed.

To decrease the error, we then subtract this value from the current weight (optionally
multiplied by some learning rate, eta, which we'll set to 0.5):

$$w_{5}^{+}\:=\:w_{5}\:-\:\eta\:\frac{\partial E_{total}}{\partial w_{5}}$$

Some sources use $\alpha$ (alpha) to represent the learning rate, others use $\eta$ (eta), and
others even use $\epsilon$ (epsilon).

We can repeat this process to get the new weights $w_6$, $w_7$, and $w_8$.

We perform the actual updates in the neural network after we have the new weights leading
into the hidden layer neurons (i.e., we use the original weights, not the updated weights,
when we continue the backpropagation algorithm below).

Hidden Layer

Next, we'll continue the backwards pass by calculating new values for $w_1$, $w_2$, $w_3$,
and $w_4$.

Big picture, here's what we need to figure out:

$$\frac{\partial E_{total}}{\partial w_{1}}\:=\:\frac{\partial E_{total}}{\partial out_{h1}}\:\times\:\frac{\partial out_{h1}}{\partial net_{h1}}\:\times\:\frac{\partial net_{h1}}{\partial w_{1}}$$

We're going to use a similar process as we did for the output layer, but slightly different to
account for the fact that the output of each hidden layer neuron contributes to the output
(and therefore error) of multiple output neurons. We know that $out_{h1}$ affects both
$out_{o1}$ and $out_{o2}$, therefore $\frac{\partial E_{total}}{\partial out_{h1}}$ needs to take
into consideration its effect on both output neurons:

$$\frac{\partial E_{total}}{\partial out_{h1}}\:=\:\frac{\partial E_{o1}}{\partial out_{h1}}\:+\:\frac{\partial E_{o2}}{\partial out_{h1}}$$

Starting with $\frac{\partial E_{o1}}{\partial out_{h1}}$:

$$\frac{\partial E_{o1}}{\partial out_{h1}}\:=\:\frac{\partial E_{o1}}{\partial net_{o1}}\:\times\:\frac{\partial net_{o1}}{\partial out_{h1}}$$

We can calculate $\frac{\partial E_{o1}}{\partial net_{o1}}\:=\:\delta_{o1}$ using values we
calculated earlier, and $\frac{\partial net_{o1}}{\partial out_{h1}}$ is equal to $w_{5}$:

$$\frac{\partial E_{o1}}{\partial out_{h1}}\:=\:\delta_{o1}\:w_{5}$$

Following the same process for $\frac{\partial E_{o2}}{\partial out_{h1}}$, we get
$\delta_{o2}\:w_{7}$. Therefore:

$$\frac{\partial E_{total}}{\partial out_{h1}}\:=\:\delta_{o1}\:w_{5}\:+\:\delta_{o2}\:w_{7}$$
Now that we have $\frac{\partial E_{total}}{\partial out_{h1}}$, we need to figure out
$\frac{\partial out_{h1}}{\partial net_{h1}}$ and then $\frac{\partial net_{h1}}{\partial w}$ for
each weight:

$$\frac{\partial out_{h1}}{\partial net_{h1}}\:=\:out_{h1}(1\:-\:out_{h1})$$

We calculate the partial derivative of the total net input to $h_1$ with respect to $w_1$ the
same as we did for the output neuron:

$$\frac{\partial net_{h1}}{\partial w_{1}}\:=\:i_{1}$$

Putting it all together:

$$\frac{\partial E_{total}}{\partial w_{1}}\:=\:(\delta_{o1}w_{5}\:+\:\delta_{o2}w_{7})\:\times\:out_{h1}(1\:-\:out_{h1})\:\times\:i_{1}$$

You might also see this written in terms of a hidden-layer node delta $\delta_{h1}$, so that
$\frac{\partial E_{total}}{\partial w_{1}}\:=\:\delta_{h1}\:i_{1}$.

We can now update $w_1$:

$$w_{1}^{+}\:=\:w_{1}\:-\:\eta\:\frac{\partial E_{total}}{\partial w_{1}}$$

Repeating this for $w_2$, $w_3$, and $w_4$ gives the remaining updated weights.

Finally, we've updated all of our weights! When we fed forward the 0.05 and 0.1 inputs
originally, the error on the network was 0.298371109. After this first round of
backpropagation, the total error is now down to 0.291027924. It might not seem like much,
but after repeating this process 10,000 times, for example, the error plummets to
0.0000351085. At that point, when we feed forward 0.05 and 0.1, the two output neurons
generate 0.015912196 (vs 0.01 target) and 0.984065734 (vs 0.99 target).
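The whole worked example can be condensed into a short script. The figure listing the initial weights and biases was not reproduced above, so the values below (w1..w8, b1, b2) are an assumption: they are the set conventionally used with this example, chosen to be consistent with the quoted numbers (out_o1 = 0.75136507, total error 0.298371109 before and 0.291027924 after the first update).

```python
import math

# One full forward pass and one round of backpropagation for the 2-2-2 network.
# NOTE: the initial weights/biases below are assumed (figure lost), chosen to
# match the quoted outputs; biases are left unchanged, as in the walkthrough.
def logistic(net):
    return 1 / (1 + math.exp(-net))

i1, i2 = 0.05, 0.10          # training inputs
t1, t2 = 0.01, 0.99          # target outputs
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # input -> hidden (assumed)
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # hidden -> output (assumed)
b1, b2 = 0.35, 0.60                        # hidden / output bias (assumed)
eta = 0.5                                  # learning rate from the text

def forward(w1, w2, w3, w4, w5, w6, w7, w8):
    out_h1 = logistic(w1 * i1 + w2 * i2 + b1)
    out_h2 = logistic(w3 * i1 + w4 * i2 + b1)
    out_o1 = logistic(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = logistic(w7 * out_h1 + w8 * out_h2 + b2)
    return out_h1, out_h2, out_o1, out_o2

out_h1, out_h2, out_o1, out_o2 = forward(w1, w2, w3, w4, w5, w6, w7, w8)
e_total = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2  # 0.298371109

# Output-layer node deltas (the delta rule)
d_o1 = (out_o1 - t1) * out_o1 * (1 - out_o1)
d_o2 = (out_o2 - t2) * out_o2 * (1 - out_o2)

# Hidden-layer node deltas use the ORIGINAL hidden->output weights
d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1 - out_h1)
d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1 - out_h2)

# Weight updates: w_new = w - eta * delta * upstream_output
w5n, w6n = w5 - eta * d_o1 * out_h1, w6 - eta * d_o1 * out_h2
w7n, w8n = w7 - eta * d_o2 * out_h1, w8 - eta * d_o2 * out_h2
w1n, w2n = w1 - eta * d_h1 * i1, w2 - eta * d_h1 * i2
w3n, w4n = w3 - eta * d_h2 * i1, w4 - eta * d_h2 * i2

_, _, o1n, o2n = forward(w1n, w2n, w3n, w4n, w5n, w6n, w7n, w8n)
e_after = 0.5 * (t1 - o1n) ** 2 + 0.5 * (t2 - o2n) ** 2        # 0.291027924
```

Running the script reproduces the error drop quoted above; wrapping the update in a loop reproduces the long-run behaviour as well.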

Kohonen Self-Organizing Feature Maps


Suppose we have patterns of arbitrary dimensions; however, we need them in
one dimension or two dimensions. The process of feature mapping is then very
useful for converting the wide pattern space into a typical feature space. Now, the question
arises: why do we require a self-organizing feature map? The reason is that, along with the
capability to convert arbitrary dimensions into 1-D or 2-D, it must also have the ability to
preserve the neighbor topology.

Neighbor Topologies in Kohonen SOM


There can be various topologies; however, the following two topologies are used the
most −

Rectangular Grid Topology

In this topology, the ring of nodes at distance 1 from the winning unit contains 8 nodes,
the ring at distance 2 contains 16 nodes, and the ring at distance 3 contains 24 nodes,
which means each successive ring of the rectangular grid grows by 8 nodes. The winning
unit is indicated by #.
Hexagonal Grid Topology

In this topology, the ring of nodes at distance 1 from the winning unit contains 6 nodes,
the ring at distance 2 contains 12 nodes, and the ring at distance 3 contains 18 nodes,
which means each successive ring of the hexagonal grid grows by 6 nodes. The winning
unit is indicated by #.

Architecture

The architecture of KSOM is similar to that of the competitive network. With the help of
neighborhood schemes, discussed earlier, the training can take place over the
extended region of the network.
Algorithm for training

Step 1 − Initialize the weights, the learning rate α and the neighborhood topological
scheme.
Step 2 − Repeat steps 3-9 while the stopping condition is not true.
Step 3 − Repeat steps 4-6 for every input vector x.
Step 4 − For j = 1 to m, calculate the squared Euclidean distance
$$D(j)\:=\:\displaystyle\sum\limits_{i=1}^n (x_{i}\:-\:w_{ij})^2$$
Step 5 − Obtain the winning unit J where D(j) is minimum.
Step 6 − Calculate the new weight of the winning unit (and of the units within its
topological neighborhood) by the following relation −
$$w_{ij}(new)\:=\:w_{ij}(old)\:+\:\alpha[x_{i}\:-\:w_{ij}(old)]$$
Step 7 − Update the learning rate α by the following relation −
$$\alpha(t\:+\:1)\:=\:0.5\:\alpha(t)$$
Step 8 − Reduce the radius of topological scheme.
Step 9 − Check for the stopping condition for the network.
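Steps 4-7 can be sketched for a 1-D Kohonen map as follows (a minimal illustration, not the full algorithm: the neighborhood is a fixed radius of 1 here, and the radius-shrinking of step 8 is omitted):

```python
# One SOM training epoch: find the winner by squared Euclidean distance,
# update it and its neighbours within the radius, then decay alpha.
def som_epoch(inputs, weights, alpha, radius=1):
    m = len(weights)
    for x in inputs:
        # Steps 4-5: distances D(j); winner J minimises D(j)
        d = [sum((xi - wj[i]) ** 2 for i, xi in enumerate(x)) for wj in weights]
        J = d.index(min(d))
        # Step 6: update the winner and its topological neighbours
        for j in range(max(0, J - radius), min(m, J + radius + 1)):
            weights[j] = [wji + alpha * (xi - wji)
                          for wji, xi in zip(weights[j], x)]
    # Step 7: decay the learning rate
    return weights, 0.5 * alpha

weights = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
alpha = 0.6
data = [[0.0, 1.0], [1.0, 0.0]]
for _ in range(5):
    weights, alpha = som_epoch(data, weights, alpha)
```

After a few epochs the two end units specialise on the two inputs while the middle unit, dragged by both neighborhoods, sits between them, which is the topology-preserving behaviour described above.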

Hopfield Networks
Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a single
layer which contains one or more fully connected recurrent neurons. The Hopfield network
is commonly used for auto-association and optimization tasks.

Discrete Hopfield Network


A discrete Hopfield network operates in a discrete fashion; in other words, the
input and output patterns are discrete vectors, which can be either binary (0, 1)
or bipolar (+1, -1) in nature. The network has symmetrical weights with no self-
connections, i.e., wij = wji and wii = 0.

Architecture

Following are some important points to keep in mind about discrete Hopfield network −
 This model consists of neurons with one inverting and one non-inverting output.
 The output of each neuron should be an input of the other neurons, but not an input of itself.
 Weight/connection strength is represented by wij.
 Connections can be excitatory as well as inhibitory. A connection is excitatory if the output of the
neuron is the same as the input; otherwise it is inhibitory.
 Weights should be symmetrical, i.e. wij = wji

The output from Y1 going to Y2, Yi and Yn has the weights w12, w1i and w1n respectively.
Similarly, the other arcs have their own weights.

Training Algorithm

During training of a discrete Hopfield network, the weights are updated. As we know, we
can have binary input vectors as well as bipolar input vectors. Hence, in both
cases, the weight updates can be done with the following relations.
Case 1 − Binary input patterns
For a set of binary patterns s(p), p = 1 to P
Here, s(p) = s1(p), s2(p),..., si(p),..., sn(p)
The weight matrix is given by
$$w_{ij}\:=\:\sum_{p=1}^P[2s_{i}(p)-\:1][2s_{j}(p)-\:1]\:\:\:\:\:for\:i\:\neq\:j$$
Case 2 − Bipolar input patterns
For a set of bipolar patterns s(p), p = 1 to P
Here, s(p) = s1(p), s2(p),..., si(p),..., sn(p)
The weight matrix is given by
$$w_{ij}\:=\:\sum_{p=1}^P[s_{i}(p)][s_{j}(p)]\:\:\:\:\:for\:i\:\neq\:j$$
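The bipolar rule above amounts to summing outer products of the stored patterns while keeping the diagonal at zero; a minimal sketch:

```python
# Hebbian training for a discrete Hopfield network with bipolar patterns:
# w_ij = sum over patterns p of s_i(p) * s_j(p), with no self-connections.
def hopfield_train(patterns):
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for s in patterns:                      # each s is a bipolar (+1/-1) vector
        for i in range(n):
            for j in range(n):
                if i != j:                  # w_ii stays 0 (no self-connection)
                    w[i][j] += s[i] * s[j]
    return w

w = hopfield_train([[1, -1, 1, -1]])
```

Because each term s_i(p)·s_j(p) equals s_j(p)·s_i(p), the resulting matrix is automatically symmetric, satisfying wij = wji.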

Testing Algorithm

Step 1 − Initialize the weights, which are obtained from the training algorithm using the
Hebbian principle.
Step 2 − Perform steps 3-9 as long as the activations of the network have not converged.
Step 3 − For each input vector X, perform steps 4-8.
Step 4 − Make initial activation of the network equal to the external input vector X as
follows −
$$y_{i}\:=\:x_{i}\:\:\:for\:i\:=\:1\:to\:n$$
Step 5 − For each unit Yi, perform steps 6-9.
Step 6 − Calculate the net input of the network as follows −
$$y_{ini}\:=\:x_{i}\:+\:\displaystyle\sum\limits_{j}y_{j}w_{ji}$$
Step 7 − Apply the activation as follows over the net input to calculate the output −
$$y_{i}\:=\begin{cases}1 & if\:y_{ini}\:>\:\theta_{i}\\y_{i} & if\:y_{ini}\:=\:\theta_{i}\\0 &
if\:y_{ini}\:<\:\theta_{i}\end{cases}$$
Here $\theta_{i}$ is the threshold.
Step 8 − Broadcast this output yi to all other units.
Step 9 − Test the network for convergence.
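Steps 1-9 can be sketched as follows. This is an illustrative bipolar (+1/-1) variant rather than the binary (0/1) activation shown above, and the thresholds θi are taken as 0, which is an assumption:

```python
# Hopfield recall: initialise activations to the input vector, then update
# each unit from y_in_i = x_i + sum_j y_j * w_ji until nothing changes.
def hopfield_recall(w, x, theta=0.0, max_sweeps=10):
    y = list(x)                                    # Step 4: initial activation
    for _ in range(max_sweeps):
        prev = list(y)
        for i in range(len(y)):                    # Steps 5-8: update each unit
            y_in = x[i] + sum(y[j] * w[j][i] for j in range(len(y)))
            if y_in > theta:
                y[i] = 1
            elif y_in < theta:
                y[i] = -1                          # bipolar variant of the rule
            # y_in == theta: activation stays unchanged, as in step 7
        if y == prev:                              # Step 9: converged
            break
    return y

# Store one bipolar pattern via the Hebbian rule, then recall from a noisy probe
pattern = [1, -1, 1, -1]
w = [[0 if i == j else pattern[i] * pattern[j] for j in range(4)] for i in range(4)]
recalled = hopfield_recall(w, [1, -1, -1, -1])
```

Starting from a probe with one flipped element, the updates pull the state back to the stored pattern, which is the auto-association behaviour the network is used for.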
