LECTURE NOTES ON
(KCS056) (APPLICATION OF SOFT COMPUTING)
(B.TECH.) (CSE/IT) (IVTH-YEAR) (VIITH-SEMESTER)
(AKTU)
UNIT – 2
(NEURAL NETWORKS-II)
CONTENTS
1. Introduction
1.2 What is BPN ?
1.3 Back-Propagation Network
2. Architecture of a back propagation network
2.1 Perceptron model
2.2 The solution
2.3 Single layer artificial neural network
2.4 Multilayer perceptron model
3. Back propagation learning methods
3.1 AND Operation
3.2 Input layer Computation
3.3 Output layer Computation
3.4 Hidden layer Computation
3.5 Calculation of error
4. Effect of learning rate coefficient
5. Back propagation algorithm
6. Factors Affecting back propagation training
6.1 Momentum Factor
6.2 Learning Coefficient
6.3 Sigmoidal Gain
6.4 Threshold Value
7. Back Propagation Algorithm Application
7.1 Classification of Soil
7.2 Hot Extrusion of steel
7.3 Design of Journal Bearing
Introduction:
What is BPN?
• A single-layer neural network has many restrictions. This network can accomplish very limited
classes of tasks.
• Minsky and Papert (1969) showed that a two-layer feed-forward network can overcome many
restrictions, but they did not present a solution to the problem of how to adjust the weights
from the input to the hidden layer.
• An answer to this question was presented by Rumelhart, Hinton and Williams in 1986. The
central idea behind this solution is that the errors for the units of the hidden layer are
determined by back-propagating the errors of the units of the output layer.
• Back-propagation can also be considered as a generalization of the delta rule for non-linear activation
functions and multi-layer networks.
• Back-propagation is a systematic method of training multi-layer artificial neural networks.
Back-Propagation Network
The real world presents situations where data is incomplete or noisy. Making reasonable predictions
about what is missing from the available information is difficult when there is no good theory
to help reconstruct the missing data. It is in such situations that Back-propagation
(Back-Prop) networks may provide some answers.
• A Back-Prop network consists of at least three layers of units :
- an input layer,
- at least one intermediate hidden layer, and
- an output layer.
• Typically, units are connected in a feed-forward fashion with input units fully connected to units in the
hidden layer and hidden units fully connected to units in the output layer.
• When a Back Prop network is cycled, an input pattern is propagated forward to the output units through
the intervening input-to-hidden and hidden-to-output weights.
• The output of a Back Prop network is interpreted as a classification decision.
• With BackProp networks, learning occurs during a training phase. The steps followed during learning are :
− Each input pattern in a training set is applied to the input units and then propagated forward.
− The pattern of activation arriving at the output layer is compared with the correct (associated) output
pattern to calculate an error signal. The error signal for each such target output pattern is then back-
propagated from the outputs to the inputs in order to appropriately adjust the weights in each layer of the
network.
− After a BackProp network has learned the correct classification for a set of inputs, it can be tested on a
second set of inputs to see how well it classifies untrained patterns.
The initial approach to solving problems that are not linearly separable was to use more than one
perceptron, each set up to identify a small, linearly separable section of the inputs.
Combining their outputs in another perceptron would then produce a final indication of the class to
which the input belongs.
The problem for the artificial neural network is to classify the inputs as odd parity or even parity.
Here, odd parity means odd numbers of 1 bits in the inputs and even parity refers to even number of 1
bits .in the inputs.
The figure illustrates the combination of perceptrons to solve the XOR problem. However, on deeper
study, we find that this arrangement will not work, because this arrangement of perceptrons in layers is
unable to learn. We know that each neuron in the structure takes the weighted sum of its inputs,
thresholds it, and accordingly gives a zero or a one as output.
For the perceptrons in the first layer, the input comes from the actual inputs while the perceptrons
present in the second layer get their input from the outputs of the first layer perceptrons. The
perceptrons of the second layer cannot distinguish whether the actual inputs from the first layer were
on or off.
The perceptron is a computational model of the retina of the eye. The network comprises three units:
the sensory unit S, the association unit A and the response unit R.
The S unit comprises 400 photo-detectors. It receives input images and provides a 0/1 electric signal
as output. If the input signal exceeds a threshold, the photo-detector outputs 1, else 0. The photo-
detectors are randomly connected to the association unit A. The A unit comprises predicates, which
examine the output of the S unit for specific features of the image. The third unit R comprises
perceptrons, which receive the results of the predicates, also in binary form. While the
weights of the S and A units are fixed, those of R are adjustable. The output of the R unit is such that if
the weighted sum of its inputs is less than or equal to zero, the output is zero; otherwise it is the
weighted sum itself.
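The output rule of the R unit can be sketched in a few lines of Python; the function name and the example weights here are illustrative, not part of the original model:

```python
def response_unit(predicate_outputs, weights):
    """R-unit rule: if the weighted sum of the binary predicate
    outputs is <= 0, output 0; otherwise output the weighted sum."""
    s = sum(w * x for w, x in zip(weights, predicate_outputs))
    return s if s > 0 else 0.0

# Example: binary outputs from three hypothetical predicates
r1 = response_unit([1, 0, 1], [0.5, -1.0, 0.2])   # weighted sum 0.7 > 0
r2 = response_unit([1, 1, 0], [-1.0, 0.5, 2.0])   # weighted sum -0.5, so 0
```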
Limitation :
1. It is impossible to strengthen the connections between the active inputs and the correct parts
of the network.
2. The actual inputs are effectively masked off from the output units by the intermediate layer. The
two states of a neuron, on or off (Fig. A), do not give us any indication of the scale by which we
have to adjust the weights.
3. The hard-hitting threshold function removes the information that is needed if the network is to
learn successfully.
Hence, due to all the above reasons together, the network is unable to determine which of the input
weights should be increased and which one should not and so, it is unable to work to produce a better
solution.
The Solution :
If we smoothen the threshold function so that it more or less turns on or off as before, but has a sloping
region in the middle that gives us information about the inputs, we will be able to determine when we
need to strengthen or weaken the relevant weights. Now the network will be able to learn as required.
Several such thresholding functions are possible; a list is given in Table B.
Even now, the value of the output will be practically one if the input exceeds the threshold by a lot,
and practically zero if the input is far less than the threshold.
However, when the input and the threshold are almost the same, the output of the neuron will have a value
between zero and one, meaning that the output of the neuron is related to its input in a more
informative way.
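The contrast between a hard threshold and its smoothed version can be illustrated with a small sketch; the function names and the gain value λ = 1 are arbitrary choices for illustration:

```python
import math

def hard_threshold(x, theta=0.0):
    """Two-state neuron: output is 0 or 1, nothing in between."""
    return 1 if x > theta else 0

def smooth_threshold(x, theta=0.0, lam=1.0):
    """Sigmoid: ~0 far below theta, ~1 far above, graded near theta."""
    return 1.0 / (1.0 + math.exp(-lam * (x - theta)))

# Near the threshold the smooth version carries graded information
out_hard = hard_threshold(0.1)        # 1, with no hint of how close we were
out_soft = smooth_threshold(0.1)      # a value a little above 0.5
```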
An artificial neuron receives many inputs, representing the outputs of other neurons. Each input is
multiplied by a corresponding weight, analogous to a synaptic strength. All of these weighted inputs are then
summed up and passed through an activation function to determine the neuron output. The artificial
neuron model is shown in the figure.
The activation function f(u) is chosen as a nonlinear function to emulate the nonlinear behavior of
the conduction-current mechanism in a biological neuron. However, as artificial neurons are not
intended to be exact copies of biological neurons, many forms of nonlinear function are used.
Some of these activation functions are given in Table B.
Frank Rosenblatt (1962) and Minsky and Papert (1969) developed a large class of artificial neural
networks known as "Perceptrons". Perceptrons mainly use the McCulloch-Pitts model of a neuron with a
threshold output function. The Perceptron's iterative learning converges to correct weights,
i.e., weights that produce the exact output value for each corresponding training input pattern.
The Perceptron can be divided into three layers :
1. Sensory Layer/Unit
2. Associator Layer/Unit
3. Response Layer/Unit.
The first two units (i.e., sensory and associator) support only binary activations, while the last unit
(i.e., the response unit) supports the activations -1, 0, +1.
All units are connected by weighted interconnections, as shown in the figure below.
Consider a single-layer feed-forward neural network as in Fig. 2.2, consisting of an input layer to
receive the inputs and an output layer to produce the output vector. The input layer consists of 'n'
neurons and the output layer consists of 'm' neurons. The weight of the synapse connecting the ith input
neuron to the jth output neuron is denoted Wij.
The inputs of the input layer and the corresponding outputs of the output layer are given by

{I} = (I1, I2, ..., In)T ,  {O} = (O1, O2, ..., Om)T                 ... (2.1)

Assume a linear transfer function for the neurons in the input layer and the unipolar sigmoidal
function for the neurons in the output layer.
Fig. 2.3. Block Diagram for Single Layer Feed-forward Neural Network.
Using the unipolar sigmoidal or squashed-S function, as shown in Fig. 2.4, with the slope of the function
as given in Fig. 2.5, for the neurons in the output layer, the output is given by

OOK = 1 / (1 + e^(-λ IOK))                                           ... (2.5)

where λ is known as the sigmoidal gain. The block diagram representation of equation 2.6 is shown in
Fig. 2.5. In equation 2.4, [W] is called the weight matrix, also known as the connection matrix.
A Multilayer perceptron model is a model in which the adaptive neurons are arranged in layers, unlike
the single layer perceptron, which had only a single layer of perceptrons.
The layers in the multilayer perceptron can be divided into three basic categories, namely the input layer,
the output layer and a hidden layer, which lies between the input and output layers and has no
direct connection with the external environment. For the perceptrons in the input layer we use a linear
transfer function, and for the perceptrons in the hidden and output layers we use the sigmoidal or
squashed-S function.
The input layer has nothing to do with weighted sums or thresholds; it is concerned only with
distributing the values it receives to the next layer.
The input-output mapping of the multilayer perceptron is shown in Fig. 2.6 and can be represented by

O = N3[N2[N1[I]]]                                                    ... (2.7)
Here N1, N2 and N3 are the non-linear mappings provided by the input, hidden and output layers
respectively. The ability of the multilayer perceptron to recognize complex patterns is due to the nonlinear
activation functions between layers. Many capabilities of neural networks, such as nonlinear functional
approximation, learning and generalization, are in fact due to the nonlinear activation function of each
neuron.
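The mapping O = N3[N2[N1[I]]] can be sketched as a composition of layer functions; the layer sizes and the random weights below are made up purely for illustration:

```python
import numpy as np

def layer(weights, activation):
    """Return the mapping N(x) = activation(W^T x) of one layer."""
    return lambda x: activation(weights.T @ x)

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))
identity = lambda u: u

rng = np.random.default_rng(1)
N1 = layer(np.eye(3), identity)                  # linear input layer
N2 = layer(rng.normal(size=(3, 4)), sigmoid)     # hidden layer
N3 = layer(rng.normal(size=(4, 2)), sigmoid)     # output layer

I = np.array([0.2, 0.5, 0.9])
O = N3(N2(N1(I)))                                # O = N3[N2[N1[I]]]
```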
The three layer network shown in Fig. 2.6 and block diagram in Fig. 2.7 show that the activity of
neurons in the input layers represent the raw information that is fed into the network.
Here, the activity of neurons in the hidden layer is computed by the activity of neurons in the input
layer and the corresponding connecting weights between the input and hidden units. In the same way,
the activity of the output layer depends upon the activity of the hidden layer and the corresponding
weights between the hidden and output layers .
I: Input
N1 : Non-linear mapping provided by input layer
N2 : Non-linear mapping provided by hidden layer
N3 : Non-linear mapping provided by output layer
O: Output
AND function
− There are 4 inequalities in the AND function and they must be satisfied:

w1·0 + w2·0 < θ ,    w1·0 + w2·1 < θ ,
w1·1 + w2·0 < θ ,    w1·1 + w2·1 > θ
Although it is straightforward to calculate a solution to the AND function explicitly, the
question is how the network can learn such a solution. That is, given random values for the weights,
can we define an incremental procedure which will converge to a set of weights that implements the AND
function?
Example
Consider a simple neural network made up of two inputs connected to a single output unit.
− The output of the network is determined by calculating a weighted sum of its two inputs and
comparing this value with a threshold θ.
− If the net input (net) is greater than the threshold, then the output is 1, else it is 0.
A perceptron consists of a set of input units and a single output unit. As in the AND network, the
output of the perceptron is calculated by comparing the net input

net = Σ (i = 1 to n) Wi Ii

with a threshold θ. If the net input is greater than the threshold θ, the output unit is turned on; otherwise
it is turned off. To address the learning question, Rosenblatt solved two problems.
− First, he defined a cost function which measured error.
− Second, he defined a procedure, or rule, which reduced that error by appropriately adjusting each of
the weights in the network.
However, the procedure (or learning rule) requires assessing the relative contribution of each
weight to the total error.
The learning rule that Rosenblatt developed is based on the difference between the actual output of the
network and the target output (0 or 1), called the "error measure".
Case 1 : If the output unit is 1 but should be 0, then the weights on the active input lines are decremented
and the threshold is incremented by 1 (to make it less likely that the output unit will be turned on if the
same input vector is presented again).
Case 2 : If the output unit is 0 but should be 1, then the opposite changes are made.
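Rosenblatt's rule, applied to the AND function discussed earlier, can be sketched like this; the epoch count and unit increments are illustrative choices:

```python
def train_perceptron(patterns, epochs=25):
    """Rosenblatt's rule: if output is 1 but should be 0, decrement the
    weights on active inputs and increment the threshold; if output is
    0 but should be 1, make the opposite changes."""
    w = [0.0, 0.0]
    theta = 0.0
    for _ in range(epochs):
        for (x1, x2), target in patterns:
            net = w[0] * x1 + w[1] * x2
            out = 1 if net > theta else 0
            if out == 1 and target == 0:
                w[0] -= x1; w[1] -= x2; theta += 1
            elif out == 0 and target == 1:
                w[0] += x1; w[1] += x2; theta -= 1
    return w, theta

# the four input/target pairs of the AND function
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, theta = train_perceptron(AND)
```

Because AND is linearly separable, the perceptron convergence theorem guarantees that this procedure settles on a satisfying set of weights.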
Hidden Layer:
Back-propagation is simply a way to determine the error values in the hidden layers. This needs to be done
in order to update the weights.
The best example to explain where back-propagation can be used is the XOR problem.
Consider a simple graph shown below.
− all points on the right side of the line are +ve, therefore the output of the neuron should be +ve.
− all points on the left side of the line are –ve, therefore the output of the neuron should be –ve.
With this graph, one can make a simple table of inputs and outputs as shown below.
AND
INPUTS OUTPUT
X1 X2 Y
1 1 1
1 0 0
0 1 0
0 0 0
But an XOR problem can't be solved using only one neuron. If we want to train an XOR network, we need 3
neurons, fully connected in a feed-forward network.
Example
The table below indicates an 'nset' of input and output data. It shows 'l' inputs and the corresponding 'n'
outputs.
Table : 'nset' of input and output data
The hidden neurons are connected by synapses to the input neurons; let Vij denote the weight of
the arc between the ith input neuron and the jth hidden neuron. The input to a hidden neuron is the
weighted sum of the outputs of the input neurons, so the input to the pth hidden neuron is

IHp = Σ (i = 1 to l) Vip OIi

Denoting the weight matrix (or connectivity matrix) between the input neurons and the hidden neurons as
[V] (l × m), we can write the input to the hidden neurons as

{I}H  =  [V]T  {O}I
(m×1)    (m×l) (l×1)

where {O} and {I} are the output and input vectors.
Considering the sigmoidal or squashed-S function, the output of the pth hidden neuron is given by

OHp = 1 / (1 + e^(-λ(IHp - θHp)))

where OHp is the output of the pth hidden neuron, IHp is the input to the pth hidden neuron, and θHp is the
threshold of the pth neuron. A non-zero threshold neuron is computationally equivalent to an extra input
that always has the value -1, with the non-zero threshold serving as the connecting weight. This is
shown in Fig. 2.8.
Treating each component of the input of the hidden neuron separately, we get the outputs of the
hidden neurons as given by the above equation. Now, the input to an output neuron is the weighted sum
of the outputs of the hidden neurons. The input to the qth output neuron, denoted IOq, is therefore

IOq = Σ (p = 1 to m) Wpq OHp

Denoting the weight matrix (connectivity matrix) between the hidden neurons and the output neurons as
[W], the input to the output neurons is

{I}O = [W]T {O}H
Considering the sigmoidal function, the output of the qth output neuron is given by

OOq = 1 / (1 + e^(-λ(IOq - θOq)))

where OOq is the output of the qth output neuron, IOq is the input to the qth output neuron, and θOq is the
threshold of the qth neuron. This threshold may again be handled by considering an extra 0th neuron in
the hidden layer with output -1, so that the threshold value θOq becomes the connecting weight, as
shown in Fig. 2.9.
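The two computations above, the hidden-layer outputs OHp and the output-layer outputs OOq, together form one forward pass. A minimal sketch follows; the weights, thresholds and layer sizes are made-up example values:

```python
import numpy as np

def forward(I, V, W, theta_H, theta_O, lam=1.0):
    """One forward pass:
    O_Hp = 1/(1+exp(-lam*(I_Hp - theta_Hp))) for the hidden layer,
    O_Oq = 1/(1+exp(-lam*(I_Oq - theta_Oq))) for the output layer."""
    I_H = V.T @ I                                    # weighted sums into hidden layer
    O_H = 1.0 / (1.0 + np.exp(-lam * (I_H - theta_H)))
    I_O = W.T @ O_H                                  # weighted sums into output layer
    O_O = 1.0 / (1.0 + np.exp(-lam * (I_O - theta_O)))
    return O_H, O_O

# l = 2 inputs, m = 3 hidden neurons, n = 1 output; arbitrary values
rng = np.random.default_rng(0)
O_H, O_O = forward(np.array([0.3, 0.8]),
                   rng.uniform(-1, 1, (2, 3)),   # [V], l x m
                   rng.uniform(-1, 1, (3, 1)),   # [W], m x n
                   np.zeros(3), np.zeros(1))
```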
Calculation of Error:
Consider any rth output neuron. For the training example, we have calculated the output 'O' for
which the target output 'T' is given. Hence, the error in the output of the rth output neuron is

er = Tr - Or

where er is the error in the rth neuron for the given training pattern. The square of the error is
considered, since irrespective of whether the error is positive or negative we consider only its
absolute value. The Euclidean norm of the error E1 for the first training pattern is given by

E1 = (1/2) Σ (r = 1 to n) (Tr - Or)2

The above equation gives the error function for one training pattern. If we apply the same technique to
all the training patterns, we get

E = Σ (j = 1 to nset) Ej

where E is the error function depending on the m(l + n) weights of [W] and [V]. This is a classic type of
optimization problem. For such problems, an objective function or cost function is usually defined, to be
maximized or minimized with respect to a set of parameters. In this case, the network parameters that
optimize the error function E over the 'nset' of pattern sets [Inset, Tnset] are the synaptic weight values
[V] (of size l × m) and [W] (of size m × n).
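The per-pattern squared error and its sum over the training set can be computed as sketched below; the factor of 1/2 follows the usual back-propagation convention, and the numbers are made up:

```python
import numpy as np

def pattern_error(T, O):
    """Squared-error norm for one training pattern:
    E = 0.5 * sum_r (T_r - O_r)^2."""
    e = np.asarray(T, float) - np.asarray(O, float)
    return 0.5 * float(e @ e)

def total_error(targets, outputs):
    """Error function E summed over all 'nset' training patterns."""
    return sum(pattern_error(T, O) for T, O in zip(targets, outputs))

E1 = pattern_error([1.0, 0.0], [0.5, 0.5])   # 0.5 * (0.25 + 0.25) = 0.25
```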
The learning rate coefficient determines the size of the weight adjustment made at each iteration and hence
influences the rate of convergence. A poor choice of the coefficient can result in a failure to converge.
The coefficient is usually kept constant through all the iterations for best results. If the learning rate
coefficient is too large, the search path will oscillate and converge more slowly than a direct descent, as
shown in Fig. 2.9(a). If the coefficient is too small, the descent will progress in small steps, significantly
increasing the time to converge (see Fig. 2.9(b)). Here the learning coefficient is taken as 0.9, which
seems to be near optimum (see Fig. 2.9(c)). Jacobs (1988) suggested the use of an adaptive coefficient,
where the value of the learning coefficient is a function of the error derivative on successive updates.
We consider the sigmoidal function as the activation function for the hidden and output layers and a
linear activation function for the input layer, and the number of neurons 'm' in the hidden layer may be
chosen to lie between l and 2l, i.e., l < m < 2l.
Step 1. Normalize the inputs and outputs with respect to their maximum values. It has been shown that
neural networks work better if the inputs and outputs lie between 0 and 1. For each training pair, assume
there are 'l' inputs given by {I}I (l×1) and 'n' outputs given by {O}O (n×1) in normalized form.
Step 2. Assume the number of neurons in the hidden layer to lie between l and 2l, i.e., l < m < 2l.
Step 3. [V] represents the weights of the synapses connecting the input neurons to the hidden neurons,
and [W] represents the weights of the synapses connecting the hidden neurons to the output neurons.
Initialize the weights to small random values, usually from -1 to +1. For general problems, λ can be
assumed as 1 and the threshold values can be taken as zero.
Step 4. For the training data, present one set of inputs and outputs. Present the pattern {I}I to the input
layer. Using the linear activation function, the output of the input layer is evaluated as

{O}I  =  {I}I
(l×1)    (l×1)

Step 5. Compute the inputs to the hidden layer by multiplying the corresponding synaptic weights:

{I}H  =  [V]T  {O}I
(m×1)    (m×l) (l×1)
Step 6. Let the hidden layer units evaluate the output using the sigmoidal function:

{O}H = { 1 / (1 + e^(-IHi)) }
(m×1)
Step 7. Compute the inputs to the output layer by multiplying the corresponding synaptic weights:

{I}O  =  [W]T  {O}H
(n×1)    (n×m) (m×1)

Step 8. Let the output layer units evaluate the output using the sigmoidal function:

{O}O = { 1 / (1 + e^(-IOj)) }
(n×1)
Step 9. Calculate the error, i.e., the difference between the network output and the desired output, for
the ith training set. The error term back-propagated to the hidden layer is

{d*} = ei (OHi)(1 - OHi)
(m×1)
Step 17. Repeat steps 4 to 16 until the error rate converges to within the tolerance value.
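The full procedure can be sketched end-to-end, here training on the XOR patterns. The hidden-layer size, learning coefficient, tolerance and epoch limit are illustrative choices, and the thresholds are handled by an extra input fixed at -1, as described earlier:

```python
import numpy as np

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# Step 1: normalized training pairs (XOR: l = 2 inputs, n = 1 output)
I_set = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T_set = np.array([[0], [1], [1], [0]], float)
l, n = 2, 1
m = 3                                    # Step 2: between l and 2l, rounded up

rng = np.random.default_rng(42)          # Step 3: small random weights in [-1, 1];
V = rng.uniform(-1, 1, (l + 1, m))       # the extra row carries the hidden thresholds
W = rng.uniform(-1, 1, (m + 1, n))       # the extra row carries the output thresholds
eta = 0.9
errors = []
for it in range(20000):                  # Step 17: repeat until convergence
    E = 0.0
    for I, T in zip(I_set, T_set):
        O_I = np.append(I, -1.0)         # Step 4: linear input layer plus '-1' input
        I_H = V.T @ O_I                  # Step 5
        O_H = sigmoid(I_H)               # Step 6
        O_Hb = np.append(O_H, -1.0)      # '-1' hidden unit for the output thresholds
        I_O = W.T @ O_Hb                 # Step 7
        O_O = sigmoid(I_O)               # Step 8
        d = (T - O_O) * O_O * (1 - O_O)          # output-layer error term
        d_star = (W[:m] @ d) * O_H * (1 - O_H)   # hidden-layer error term {d*}
        W += eta * np.outer(O_Hb, d)     # weight updates
        V += eta * np.outer(O_I, d_star)
        E += 0.5 * float((T - O_O) @ (T - O_O))
    errors.append(E)
    if E < 0.01:                         # tolerance check
        break
```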
A proper selection of tuning parameters such as the momentum factor, learning coefficient, sigmoidal gain
and threshold value is required for efficient learning and the design of a stable network.
Weight adjustment is based on the momentum method. The momentum factor has a significant role in
deciding the values of the learning rate that will produce rapid learning. If the momentum is zero, the
smoothening is minimum and the entire weight adjustment comes from the newly calculated change. If the
momentum is one, the new adjustment is ignored and the previous one is repeated. Between 0 and 1 is a
region where the weight adjustment is smoothened by an amount proportional to the momentum factor.
The role of the momentum factor is to increase the speed of learning without leading to oscillations.
Momentum also overcomes the effect of local minima: the use of the momentum term will often carry
the weight-change process through one or more local minima and into the global minimum. This is
perhaps its most important function. If α is the momentum coefficient, then

ΔW(t+1) = -η (∂E/∂W) + α ΔW(t)

Note : α should be positive but less than 1, i.e., the range of α is 0 < α < 1. Typical values of α lie
between 0.5 and 0.9.
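The momentum behaviour described above (α = 0 keeps only the new change, α = 1 just repeats the previous one) fits in one line of code; η and α below are example values:

```python
def momentum_update(grad, prev_delta, eta=0.6, alpha=0.9):
    """Delta W(t+1) = -eta * dE/dW + alpha * Delta W(t):
    alpha = 0 ignores history, alpha = 1 repeats the last adjustment."""
    return -eta * grad + alpha * prev_delta

# alpha = 0: the whole step comes from the newly calculated change
step_new_only = momentum_update(1.0, 5.0, eta=0.5, alpha=0.0)   # -0.5
# alpha = 1: the previous adjustment is simply repeated
step_repeat = momentum_update(0.0, 5.0, eta=0.5, alpha=1.0)     # 5.0
```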
The choice of learning coefficient is a tricky task in the back-propagation algorithm. The range of
learning coefficients that will produce rapid training depends on the number and types of input patterns.
An empirical formula to select the learning coefficient, suggested by Eaton and Oliver (1992), is

η = 1.5 / sqrt(N1² + N2² + ... + Nm²)

where N1 is the number of patterns of type 1 and m is the number of different pattern types. If the learning
coefficient is large, i.e., greater than 0.5, the weights are changed drastically, but this may cause the
optimum combination of weights to be "overshot", resulting in oscillations about the optimum. If the
learning coefficient is small, i.e., less than 0.2, the weights are changed in small increments, causing the
system to converge more slowly but with little oscillation.
In some problems, when the weights become large and force the neuron to operate in a region where the
sigmoidal function is very flat, a better method of coping with network paralysis is to adjust the
sigmoidal gain. By decreasing this scaling factor, we effectively spread out the sigmoidal function over a
wider range, so that training proceeds faster.
If the sigmoidal function is selected, the input-output relationship of the neuron can be set as

O = 1 / (1 + e^(-λ(I + θ)))

where λ is a scaling factor known as the sigmoidal gain. As the scaling factor increases, the input-output
characteristic of the analog neuron approaches that of the two-state neuron, i.e., the activation function
approaches a step function.
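The effect of the sigmoidal gain λ can be seen numerically in a short sketch; the gain values chosen here are arbitrary:

```python
import math

def neuron_output(I, theta=0.0, lam=1.0):
    """O = 1 / (1 + exp(-lam * (I + theta))); lam is the sigmoidal gain."""
    return 1.0 / (1.0 + math.exp(-lam * (I + theta)))

# small gain: a gently sloping response either side of zero
low_gain = neuron_output(0.2, lam=1.0)     # a little above 0.5
# large gain: the response approaches the 0/1 step of a two-state neuron
high_gain = neuron_output(0.2, lam=50.0)   # very close to 1
```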
(iv) Threshold Value : O = 1 / (1 + e^(-λ(I + θ)))
θ in the above equation is commonly called the threshold value of the neuron, or the bias or the noise
factor. A neuron fires, i.e., generates an output, if the weighted sum of its inputs exceeds the threshold
value.
The objectives of soil classification are to find the suitability of a soil for the construction of different
structures, embankments, sub-grades and wearing surfaces. The classification is given in Fig. 2.10.
A NN is used for the classification of soil. The architecture used for grouping the soil has 6 input
neurons, 6 hidden neurons and 1 output neuron. The six inputs represent the colour of the soil, the
percentage of gravel, the percentage of sand, the liquid limit WL and the plastic limit WP. The output
represents the IS classification of the soil. The codes taken for the IS classification are:
0.1 ➔ Clayey sand
0.2 ➔ Clay with medium compressibility
Hot metal forming has become an attractive process in industry due to the energy and material savings
achieved, quality improvement and the development of homogeneous properties throughout the
component. NNs are nowadays applied to the simulation of hot extrusion, because finite element
solutions of hot forging are very complex and require a lot of computer time.
The NN architecture consists of three layers: two input neurons, eight neurons in a hidden layer and five
output neurons. Both inputs and outputs are normalized so that their values lie between 0 and 1. The
learning rate and momentum factor are 0.6 and 0.8 respectively. The error rate reaches the tolerance
value after 3500 iterations.
The inputs are the die angle and punch velocity, and the outputs are the forging load, maximum
equivalent strain, maximum equivalent stress, maximum normal velocity and equivalent strain rate.
Twenty-four data sets are taken for training and eighteen data sets for testing. After the neural network is
trained, it is used for inference.
Wherever machine elements move, there are bearing surfaces, some of which are lubricated easily,
some of which are lubricated incompletely and with difficulty, and some of which are not lubricated at
all. When there is relative motion between two machine parts, one of which supports the other, the
supporting member is called a bearing. Bearings are classified as shown in Fig. 2.11.
Out of these bearings, let us consider the journal bearing. The design of a journal bearing depends upon
the load, the speed of the journal, the clearance in the bearing, the length and diameter of the bearing,
and the kinds of surfaces.
"NEURONET" is used to train the data and infer the results for the test data. A back-propagation NN
with 8 input neurons, 8 hidden neurons and 2 output neurons has been used. The learning and momentum
factors are 0.6 and 0.9 respectively. About 5000 iterations were performed before the error rate
converged to the tolerance.
Once the network is trained with the given training data, values are inferred for both the training and
test data.