28-Back Propagation Network-05-10-2024
References:
1. Ethem Alpaydin, "Introduction to Machine Learning", MIT Press / Prentice Hall of India
2. Tom Mitchell, "Machine Learning", McGraw Hill
3. Jiawei Han, Micheline Kamber, Jian Pei, "Data Mining: Concepts and Techniques", 3rd Edition
Neural Networks
• Networks of processing units (neurons)
with connections (synapses) between
them
• Large number of neurons: ~10^11
• Large connectivity: each neuron is connected to ~10^4 other neurons
• Parallel processing
• Distributed computation/memory
Understanding the Brain
• Levels of analysis (Marr, 1982) – Understanding
the information processing system has 3 levels
1. Computational theory - goal of computation and
an abstract definition of the task
2. Representation and algorithm - representation of the input and the output, and specification of the algorithm for the transformation from the input to the output.
3. Hardware implementation - physical realization of
the system.
Biological nervous system
• Biological nervous system is the most important part of
many living things, in particular, human beings.
• Brain is at the centre of human nervous system.
• Any biological nervous system consists of a large number
of interconnected processing units called neurons.
• Each neuron can be quite long, and neurons operate in parallel.
• Typically, a human brain consists of approximately 10^11 neurons communicating with each other with the help of electrical impulses.
Biological Neuron
Neuron: Basic unit of nervous system
• To generate the final output y, the sum I is passed through a filter called the transfer function, which releases the output:
y = ɸ(I)
Artificial Neuron
• A biological neuron is modeled artificially.
• Let us suppose that there are n inputs (such as I1, I2,... , In) to a neuron j.
• The weights connecting the n inputs to the j-th neuron are represented by [W] = [W1j, W2j, ..., Wnj].
• The function of summing junction of an artificial neuron is to collect the weighted
inputs and sum them up.
– Thus, it is similar to the function of combined dendrites and soma.
• The activation function (also known as the transfer function) performs the task of
axon and synapse.
• The output of summing junction may sometimes become equal to zero and to
prevent such a situation, a bias of fixed value bj is added to it.
• Thus the input to the transfer function f is determined as
uj = W1j I1 + W2j I2 + ... + Wnj In + bj
Artificial Neuron
• The output of the j-th neuron, Oj, can be obtained as follows:
Oj = f(uj)
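As a small illustration of the summing junction and transfer function just described, the following sketch (the function name neuron_output and the numbers are made up for illustration) computes uj and passes it through a supplied transfer function f:

# Minimal sketch of an artificial neuron (illustrative names, not from the slides).
def neuron_output(inputs, weights, bias, transfer):
    # Summing junction: weighted sum of the inputs plus the bias, u = sum_i Ii * Wij + bj
    u = sum(i * w for i, w in zip(inputs, weights)) + bias
    # Transfer (activation) function releases the output Oj = f(u)
    return transfer(u)

# Example: identity transfer function just passes the net input through.
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.1, lambda u: u))  # 0.1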
Artificial Neural Network
• Log-Sigmoid Transfer Function:
– It generates the output lying in a range of (0.0, 1.0) and becomes
equal to 0.5, corresponding to an input u = 0.0.
– The output O of the transfer function can be expressed as
O = 1 / (1 + e^(-u))
• A perceptron computes a weighted sum of its inputs:
y = Σ_{j=1}^{d} wj xj + w0 = w^T x,  where x = [1, x1, ..., xd]^T
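The log-sigmoid properties stated above (output in (0.0, 1.0), equal to 0.5 at u = 0.0) can be checked numerically; a minimal sketch with illustrative values:

import math

def log_sigmoid(u):
    # O = 1 / (1 + e^(-u)); the output always lies in the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-u))

print(log_sigmoid(0.0))   # 0.5 exactly at u = 0
print(log_sigmoid(5.0))   # ~0.993, approaching 1 for large positive u
print(log_sigmoid(-5.0))  # ~0.007, approaching 0 for large negative u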
Perceptron
• During testing, with given weights w, for input x we compute the output y.
• When d = 1 and x is fed from the environment through an input unit, y = wx + w0.
• A perceptron with one input and one output can be used to implement a linear fit:
– Regression: y = wx + w0
– Classification: y = 1(wx + w0 > 0)
• With more than one input, the line becomes a (hyper)plane, and the perceptron with more than one input can be used to implement a multivariate linear fit.
Perceptron
• A perceptron can separate two classes by checking the sign of the output.
• Define s(·) as the threshold function:
s(a) = 1 if a > 0, 0 otherwise
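A minimal sketch of this sign-checking behaviour (the function names s and classify and the sample numbers are illustrative, not from the slides):

def s(a):
    # Threshold function: 1 if the argument is positive, 0 otherwise
    return 1 if a > 0 else 0

def classify(x, w, w0):
    # Perceptron separates two classes by the sign of w.x + w0
    return s(sum(wi * xi for wi, xi in zip(w, x)) + w0)

print(classify([2.0, 1.0], [1.0, -1.0], -0.5))  # 1: 2 - 1 - 0.5 = 0.5 > 0
print(classify([0.0, 1.0], [1.0, -1.0], -0.5))  # 0: 0 - 1 - 0.5 = -1.5 <= 0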
Perceptron
• Regression: y=wx+w0 • Classification: y=1(wx+w0>0)
[Figure: perceptron for regression (left) and classification (right), with input x, weight w, bias weight w0, and bias unit x0 = +1.]
Perceptron
• For the posterior probability, use the sigmoid function at the output:
y = sigmoid(o) = 1 / (1 + exp(-w^T x))
• K Outputs:
– Regression:
yi = Σ_{j=1}^{d} wij xj + wi0 = wi^T x
y = Wx
– Classification:
oi = wi^T x
yi = exp(oi) / Σ_k exp(ok)
choose Ci if yi = max_k yk
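A small sketch of the K-output case under these formulas (the weight matrix W, the input x and the function name k_output_classify are made-up illustrations): compute oi = wi^T x, convert to yi with softmax, and choose the class with the largest yi.

import math

def k_output_classify(W, x):
    # oi = wi . x for each of the K output units (each row of W is one wi, last entry the bias wi0)
    o = [sum(wij * xj for wij, xj in zip(w_i, x)) for w_i in W]
    # Softmax: yi = exp(oi) / sum_k exp(ok)
    exps = [math.exp(oi) for oi in o]
    y = [e / sum(exps) for e in exps]
    # Choose Ci if yi = max_k yk
    return y, y.index(max(y))

# x is augmented with a constant 1 so the last weight in each row acts as the bias.
W = [[0.5, -0.2, 0.1],
     [-0.3, 0.8, 0.0],
     [0.2, 0.2, -0.4]]
x = [1.0, 2.0, 1.0]
y, chosen = k_output_classify(W, x)
print([round(v, 3) for v in y], "-> class", chosen)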
Perceptrons
• Each perceptron is a local function of its inputs and synaptic
weights.
• In classification, if we need the posterior probabilities, use the softmax.
• Implementing this as a neural network results in a two-stage process, where the first stage calculates the weighted sums and the second calculates the softmax values (denoted as a single layer).
• The softmax function, or normalized exponential function, is a generalization of the logistic function that "squashes" a K-dimensional vector z of arbitrary real values into a K-dimensional vector σ(z) of real values in the range [0, 1] that add up to 1. The function is given by
σ(z)i = exp(zi) / Σ_{k=1}^{K} exp(zk),  i = 1, ..., K
Softmax example
import math
z = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
z_exp = [math.exp(i) for i in z]
print([round(i, 2) for i in z_exp])
#[2.72, 7.39, 20.09, 54.6, 2.72, 7.39, 20.09]
sum_z_exp = sum(z_exp)
print(round(sum_z_exp, 2)) #114.98
softmax = [round(i / sum_z_exp, 3) for i in z_exp]
print(softmax)
#[0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]
Training
• The perceptron defines a hyperplane, and the neural
network perceptron is just a way of implementing the
hyperplane.
• Given a data sample, the weight values can be calculated
offline and then when they are plugged in, the perceptron
can be used to calculate the output values.
• In training neural networks, an online learning algorithm is used: we are not given the whole sample, but instances one by one
– the network updates its parameters after each instance, adapting itself slowly in time.
Training
• Online (instances seen one by one) vs batch (whole sample)
learning:
– No need to store the whole sample
– Problem may change in time
– Wear and degradation in system components
• Stochastic gradient-descent is used to update after a single
pattern.
• Generic update rule:
Δwij^t = η (ri^t - yi^t) xj^t
• Here η is the learning factor, ri^t the desired output, yi^t the actual output, and xj^t the input:
Update = LearningFactor * (DesiredOutput - ActualOutput) * Input
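A sketch of one such online update under the generic rule (η, the weights, the instance and the function name online_update are chosen only for illustration): the parameters are adjusted after a single instance rather than after the whole sample.

def online_update(w, x, r, y, eta):
    # Generic rule: delta_wj = eta * (r - y) * xj  (learning factor * error * input)
    return [wj + eta * (r - y) * xj for wj, xj in zip(w, x)]

# One instance seen online: desired output r = 1, current output y = 0.2
w = [0.1, -0.3, 0.05]          # current weights (last one acting as bias, paired with input 1.0)
x = [2.0, 1.0, 1.0]
w = online_update(w, x, r=1, y=0.2, eta=0.1)
print([round(wj, 3) for wj in w])  # weights move in the direction that reduces the error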
• In a multilayer network, each hidden unit computes
zh = sigmoid(wh^T x) = 1 / (1 + exp(-(Σ_{j=1}^{d} whj xj + wh0)))
• A two-layer network implementing XOR (threshold units, see below):
y = z1 + z2 - 0.5
z1 = x1 - x2 - 0.5
z2 = x2 - x1 - 0.5
x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2). The hidden units and the output have the threshold activation function with threshold at 0.
XOR
• For every input combination where the output is 1, define a hidden unit
that checks for that particular conjunction of the input.
• The output layer then implements the disjunction.
• Hence, x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)
• The first layer maps inputs from the (x1, x2) to the (z1, z2) space defined
by the first-layer perceptrons.
• Inputs (0,0) and (1,1) are mapped to (0,0) in the (z1, z2) space (as in
previous slide and table given below), allowing linear separability in this
second space.
X1  X2  ~X1  ~X2  Z1 = X1 AND ~X2  Z2 = ~X1 AND X2  Z1 OR Z2  X1 XOR X2
 0   0    1    1          0                 0             0         0
 0   1    1    0          0                 1             1         1
 1   0    0    1          1                 0             1         1
 1   1    0    0          0                 0             0         0
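The construction above can be checked directly; a minimal sketch with threshold units at 0 (the structure follows the slides, the function names are illustrative):

def s(a):
    # Threshold activation with threshold at 0
    return 1 if a > 0 else 0

def xor_net(x1, x2):
    z1 = s(x1 - x2 - 0.5)    # z1 = x1 AND ~x2
    z2 = s(x2 - x1 - 0.5)    # z2 = ~x1 AND x2
    return s(z1 + z2 - 0.5)  # output implements z1 OR z2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))  # reproduces the XOR column of the table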
Training a Perceptron: Regression
• Regression (Linear output):
E^t(w | x^t, r^t) = (1/2)(r^t - y^t)^2 = (1/2)(r^t - w^T x^t)^2
Δwj^t = η (r^t - y^t) xj^t
Classification
• Single sigmoid output:
y^t = sigmoid(w^T x^t)
• Cross-entropy error:
E^t(w | x^t, r^t) = -r^t log y^t - (1 - r^t) log(1 - y^t)
• Update rule:
Δwij^t = η (ri^t - yi^t) xj^t
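Both cases lead to the same update form Δwj^t = η(r^t - y^t)xj^t, differing only in how y is computed; a small illustrative sketch of one update for each case (weights, inputs and η are made-up numbers):

import math

def update(w, x, r, y, eta):
    # delta_wj = eta * (r - y) * xj, shared by the regression and cross-entropy cases
    return [wj + eta * (r - y) * xj for wj, xj in zip(w, x)]

w = [0.2, -0.1, 0.0]          # last weight is the bias, paired with input 1.0
x = [1.0, 2.0, 1.0]

# Regression: linear output y = w.x, squared error E = 0.5 * (r - y)^2
y_lin = sum(wj * xj for wj, xj in zip(w, x))
w_reg = update(w, x, r=0.5, y=y_lin, eta=0.1)

# Classification: sigmoid output y = sigmoid(w.x), cross-entropy error
y_sig = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))
w_cls = update(w, x, r=1, y=y_sig, eta=0.1)

print([round(v, 3) for v in w_reg], [round(v, 3) for v in w_cls])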
MLP - Example
• WSH1 = (0.2 * 10) + (-0.1 * 30) + (0.4 * 20) = 2 - 3 + 8 = 7
• σ(WSH1) = 1 / (1 + e^(-7)) = 1 / (1 + 0.000912) = 0.999
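The arithmetic above can be reproduced directly (variable names are illustrative):

import math

inputs  = [10, 30, 20]          # the three inputs from the example
weights = [0.2, -0.1, 0.4]      # weights into hidden unit H1

ws_h1 = sum(w * x for w, x in zip(weights, inputs))   # 2 - 3 + 8 = 7
out_h1 = 1.0 / (1.0 + math.exp(-ws_h1))               # sigmoid(7) = 1/(1 + 0.000912) ~ 0.999

print(ws_h1, round(out_h1, 3))  # 7 0.999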
Perceptron Example
• Example: Find the weights using perceptron network for the AND function with bipolar inputs and bipolar targets.
– The truth table for the AND function is given below:
x1  x2   t
 1   1   1
 1  -1  -1
-1   1  -1
-1  -1  -1
– Let the initial weights and threshold be set to zero, i.e., w1=w2= 0 and
ɵ = 0.
– The learning rate α(alpha) be set equal to 1.
Perceptron Example
• Example: AND function continued (x2=-x1+1)
– The initial weights and threshold are w1=w2= 0, bias b=0, learning rate α=1 and ɵ = 0
Epoch 2:
x1  x2   t  yin   y  Δw1  Δw2  Δb  w1  w2   b
 1   1   1    1   1    0    0   0   1   1  -1
 1  -1  -1   -1  -1    0    0   0   1   1  -1
-1   1  -1   -1  -1    0    0   0   1   1  -1
-1  -1  -1   -3  -1    0    0   0   1   1  -1
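A sketch of the perceptron learning rule applied to this bipolar AND data (the loop structure and printout are illustrative): after Epoch 1 the weights settle at w1 = w2 = 1, b = -1, and Epoch 2 then leaves them unchanged, matching the rows above.

# Perceptron learning rule for the bipolar AND example (illustrative sketch).
data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w1 = w2 = b = 0.0      # initial weights and bias
alpha = 1.0            # learning rate
theta = 0.0            # threshold

def activation(yin):
    # Bipolar step: +1 above theta, -1 below -theta, 0 in between
    if yin > theta:
        return 1
    if yin < -theta:
        return -1
    return 0

for epoch in range(1, 3):
    for (x1, x2), t in data:
        yin = w1 * x1 + w2 * x2 + b
        y = activation(yin)
        if y != t:                      # update only when the output differs from the target
            w1 += alpha * t * x1
            w2 += alpha * t * x2
            b  += alpha * t
    print("Epoch", epoch, "weights:", w1, w2, b)
# Epoch 2 leaves the weights unchanged at (1, 1, -1): all patterns are classified correctly.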
Perceptron Example
• Example: Find the weights using perceptron network for
ANDNOT function when all the inputs are presented only one
time. Use bipolar input and bipolar targets.
– The truth table for ANDNOT function is given below:
x1 x2 t
1 1 -1
1 -1 1
-1 1 -1
-1 -1 -1
– Let the initial weights and threshold be set to zero, i.e., w1=w2= 0 and
ɵ = 0.
– The learning rate α(alpha) be set equal to 1.
Perceptron
• Example: Implement OR function using perceptron networks
for binary inputs and bipolar targets.
x1 x2 t
1 1 1
1 0 1
0 1 1
0 0 -1
Perceptron
• Example: Implement OR function using perceptron networks
for bipolar inputs and bipolar targets.
x1 x2 t
1 1 1
1 -1 1
-1 1 1
-1 -1 -1
Perceptron
• Example: Implement OR function using perceptron networks
for bipolar inputs and bipolar targets. Also, present the
separating hyperplane.
x1 x2 t
1 1 1
1 -1 1
-1 1 1
-1 -1 -1
Perceptron Example
• Classify the two-dimensional input patterns shown in Figure 6 using a perceptron network.
– The symbol "*" indicates the data representation to be +1 and "•" indicates data to be -1.
– The patterns are I and F.
– For pattern I, the target is +1, and for pattern F, the target is -1.
Perceptron
• Example: Find the weights using perceptron
network for AND function when all the inputs
are presented only one time. Use binary
inputs and bipolar targets.
Back Propagation Network
[Figure: multilayer network with inputs x1, ..., xj, ..., xm, a hidden layer, and outputs y1, ..., yi, ..., yn, connected by input-to-hidden and hidden-to-output weights.]
• The i-th output is a weighted sum of the hidden-unit values:
yi = vi^T z = Σ_{h=1}^{H} vih zh + vi0
• Each hidden unit computes
zh = sigmoid(wh^T x) = 1 / (1 + exp(-(Σ_{j=1}^{d} whj xj + wh0)))
• The error gradient for a first-layer weight follows the chain rule:
∂E/∂whj = (∂E/∂yi)(∂yi/∂zh)(∂zh/∂whj)
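A minimal numerical sketch of this chain rule for a single hidden weight, assuming a squared error E = 0.5(r - y)^2 and one linear output unit (the error function, sizes and numbers are assumptions for illustration, not from the slides):

import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# One input pattern, one hidden unit h, one linear output unit i (illustrative values).
x    = [1.0, 0.5]    # inputs xj
w_h  = [0.3, -0.2]   # input-to-hidden weights whj
w_h0 = 0.1           # hidden bias
v_ih = 0.7           # hidden-to-output weight
v_i0 = 0.0           # output bias
r    = 1.0           # desired output

# Forward pass
z_h = sigmoid(sum(w * xj for w, xj in zip(w_h, x)) + w_h0)   # zh = sigmoid(wh^T x)
y_i = v_ih * z_h + v_i0                                      # yi = vih * zh + vi0

# Backward pass for the weight multiplying x[0], assuming E = 0.5 * (r - yi)^2:
# dE/dwh1 = (dE/dyi) * (dyi/dzh) * (dzh/dwh1)
dE_dy  = -(r - y_i)
dy_dz  = v_ih
dz_dw1 = z_h * (1.0 - z_h) * x[0]    # sigmoid derivative times the input
dE_dw1 = dE_dy * dy_dz * dz_dw1

print(round(y_i, 4), round(dE_dw1, 4))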
Other References
1. D. K. Pratihar, "Soft Computing: Fundamentals and Applications", 2nd Ed., Narosa, 2013, Chapter 9
2. S. N. Sivanandam and S. N. Deepa, "Principles of Soft Computing", 3rd Ed., Wiley Publications, 2018