28-Back Propagation Network-05-10-2024

Module5_Neural_Networks

References:
1. Ethem Alpaydin, "Introduction to Machine Learning", MIT Press / Prentice Hall of India.
2. Tom Mitchell, "Machine Learning", McGraw Hill.
3. Jiawei Han, Micheline Kamber, Jian Pei, "Data Mining: Concepts and Techniques", 3rd Edition.
Neural Networks
• Networks of processing units (neurons)
with connections (synapses) between
them
• Large number of neurons: about 10^11
• High connectivity: each neuron is connected
to about 10^4 other neurons
• Parallel processing
• Distributed computation/memory
Understanding the Brain
• Levels of analysis (Marr, 1982) – an information-processing
system can be understood at three levels
1. Computational theory - goal of computation and
an abstract definition of the task
2. Representation and algorithm - representation of
the input and the output, and specification of the
algorithm for the transformation from the input to
the output.
3. Hardware implementation - physical realization of
the system.
Biological nervous system
• Biological nervous system is the most important part of
many living things, in particular, human beings.
• Brain is at the centre of human nervous system.
• Any biological nervous system consists of a large number
of interconnected processing units called neurons.
• Each neuron can be very long, and neurons operate in parallel.
• Typically, a human brain consists of approximately 10^11
neurons communicating with each other with the help of
electrical impulses.
Biological Neuron
Neuron: Basic unit of the nervous system

• Dendrite : A bush of very thin fibres.

• Axon : A long cylindrical fibre.
• Soma : Also called the cell body; it contains the nucleus of the cell.
• Synapse : A junction where an axon makes contact with the dendrites of neighbouring
neurons.
Similarity with Biological Network
• The fundamental processing element of a neural
network is a neuron
Neuron and its working
• Each neuron contains a chemical called a neurotransmitter.
• A signal (also called a sense) is transmitted across neurons by this
chemical.
– all inputs from other neurons arrive at a neuron through its dendrites.
• These signals are accumulated at the synapses of the neuron and then
serve as the output to be transmitted through the neuron.
• An action may produce an electrical impulse, which usually lasts for
about a millisecond.
– this pulse is generated by an incoming signal, but not every signal
produces a pulse in the axon:
– the axon of a neuron fires only when the signals arriving at its dendrites,
summed up at the soma, are strong enough.
Analogy between BNN and ANN
Artificial Neural Networks
• A neuron is part of an interconnected network of the nervous system and
serves the following purposes.
– Compute input signals
– Transportation of signals (at a very high speed)
– Storage of information
– Perception, automatic training and learning
Artificial Neural Networks
• The human brain is a highly complex structure viewed as a
massive, highly interconnected network of simple
processing elements called neurons.
• Artificial neural networks (ANNs), or simply neural networks
(NNs), are simplified models (i.e. imitations) of the
biological nervous system
– have been motivated by the kind of computing performed by the
human brain.
• The behavior of a biological neural network can be captured
by a simple model called Artificial Neural Network.
Artificial Neural Network
• A biological neuron receives all inputs through the dendrites,
sums them and produces an output if the sum is greater than
a threshold value.
• The input signals are passed on to the cell body through the
synapse, which may accelerate or retard an arriving signal.
• It is this acceleration or retardation of the input signals that is
modeled by the weights.
• An effective synapse, which transmits a stronger signal will
have a correspondingly larger weight while a weak synapse
will have smaller weight.
• Thus, weights here are multiplicative factors of the inputs to
account for the strength of the synapse.
Artificial Neural Network
• A neuron is a part of an interconnected network of nervous
system and serves the following.
– Compute input signals
– Transportation of signals (at a very high speed)
– Storage of information
– Perception, automatic training and learning
• Analogy between the biological neuron and artificial neuron:
– Every component of the model (i.e. artificial neuron) bears a direct
analogy to that of a biological neuron.
– It is this model which forms the basis of neural network (i.e. artificial
neural network).
Artificial Neuron

– Here, x1, x2, …, xn are the n inputs to the artificial neuron.
– w1, w2, …, wn are the weights attached to the input links.
Artificial Neuron
• Hence, the total input, say I, received by the soma of the artificial neuron
is

I = x1·w1 + x2·w2 + … + xn·wn = Σ (i = 1 to n) xi·wi

• To generate the final output y, the sum is passed to a filter called the transfer
function, which releases the output:

y = ɸ(I)
Artificial Neuron
• A biological neuron is modeled artificially.
• Let us suppose that there are n inputs (such as I1, I2,... , In) to a neuron j.
• The weights connecting n number of inputs to j-th neuron are represented by [W ] =
[W1j, W2j,... , Wnj].
• The function of summing junction of an artificial neuron is to collect the weighted
inputs and sum them up.
– Thus, it is similar to the function of combined dendrites and soma.
• The activation function (also known as the transfer function) performs the task of
axon and synapse.
• The output of the summing junction may sometimes become equal to zero and, to
prevent such a situation, a bias of fixed value bj is added to it.
• Thus the input to the transfer function f is determined as

uj = Σ (i = 1 to n) Wij·Ii + bj
Artificial Neuron
• The output of the j-th neuron, Oj, can be obtained as follows:

Oj = f(uj)

• In an ANN, the output of a neuron largely depends on its transfer function.
• Different types of transfer function are in use, such as hard-limit, linear, log-
sigmoid, tan-sigmoid, etc.
• Schematic view:
Artificial Neuron
• Schematic View:
Artificial Neuron
• A very commonly known transfer function is
the thresholding function.
• In this thresholding function, weighted sum is
compared with a threshold value ɵ.
• If the value of the weighted sum is greater
than ɵ , then the output is 1 else it is 0.
Artificial Neural Network
• Two simple thresholding functions
– Hard-limit transfer function
– Linear transfer function (Signum transfer function)
Artificial Neural Network
• Hard-limit transfer function:
– It generates either 1.0 or 0.0 depending on its
input u. The output O is calculated from input u
as follows:

O = 0.0 if u < 0.0, otherwise O = 1.0

• If u is found to be less than 0.0, the output of this transfer
function becomes equal to 0.0; otherwise it yields 1.0.
• It is generally used in a perceptron neuron (for
classification).
Artificial Neural Network
• Hard-limit transfer function:
Artificial Neural Network
• Linear transfer function:
– The output of this transfer function is made equal
to its input and lies in the range (-1.0, 1.0).
The input-output relationship of this transfer
function may be expressed like the following:
O=u
– It is generally utilized in a linear filter.
Artificial Neural Network
• Linear transfer function:

O=u
Artificial Neural Network
• Log-Sigmoid Transfer Function:
– It generates an output lying in the range (0.0, 1.0) and becomes
equal to 0.5 for an input u = 0.0.
– The output O of the transfer function can be expressed as

O = 1 / (1 + e^(-a·u))

– where a represents the coefficient of the transfer function.
– The nature of the distribution of this transfer
function depends on the value of a.
– It is generally used in a
Back-Propagation Neural Network (BPNN)
Artificial Neural Network
• Tan-Sigmoid Transfer Function:
– This transfer function yields an output lying in the range (-1.0, 1.0).
– The input-output relationship can be expressed as follows:

O = (e^(a·u) - e^(-a·u)) / (e^(a·u) + e^(-a·u)) = tanh(a·u)

• where a is the coefficient of the transfer function.
• It generates zero output for an input u = 0.0.
– It is frequently used in BPNN.
Artificial Neural Network
• Example with different values of the coefficient of the
transfer function
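To make the four transfer functions concrete, here is a small illustrative sketch; the function names (hardlim, linear, logsig, tansig), the sample inputs and the default coefficient a = 1.0 are our own choices, not taken from the slides.

import math

# Illustrative implementations of the transfer functions discussed above
# (names and default coefficient a = 1.0 are assumptions for this sketch).
def hardlim(u):
    return 0.0 if u < 0.0 else 1.0         # hard-limit: 0.0 for u < 0.0, else 1.0

def linear(u):
    return u                               # linear: output equals input

def logsig(u, a=1.0):
    return 1.0 / (1.0 + math.exp(-a * u))  # log-sigmoid: output in (0.0, 1.0)

def tansig(u, a=1.0):
    return math.tanh(a * u)                # tan-sigmoid: output in (-1.0, 1.0)

for u in [-2.0, 0.0, 2.0]:
    print(u, hardlim(u), linear(u), round(logsig(u), 3), round(tansig(u), 3))
# Larger values of a make logsig and tansig steeper around u = 0.0.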
Advantages of ANN
• ANNs exhibit mapping capabilities; that is, they can map input patterns to their
associated output patterns.
• ANNs learn by example.
– Thus, an ANN architecture can be trained with known examples of a problem before it is tested
for its inference capabilities on unknown instances of the problem. In other words, it can identify
new objects it was not previously trained on.
• ANNs possess the capability to generalize.
– This makes them applicable to problems for which an exact mathematical model is not
available.
• ANNs are robust systems and fault tolerant.
– They can therefore recall full patterns from incomplete, partial or noisy patterns.
• ANNs can process information in parallel, at high speed and in a distributed
manner.
– Thus a massively parallel distributed processing system, made up of highly interconnected (artificial)
neural computing elements with the ability to learn and acquire knowledge, is possible.
Characteristics of Artificial Neural Network

• It is a neurally inspired mathematical model.
• It contains a huge number of interconnected processing elements
called neurons, which perform all operations.
• Information is stored in the weighted links between neurons.
• The input signals arrive at the processing elements through
connections and connecting weights.
• It has the ability to learn, recall and generalize from the given data by
suitable assignment and adjustment of weights.
• The collective behavior of the neurons describes its computational
power; no single neuron carries specific information.
Perceptron
• Basic processing element
• Input is from environment or other perceptron
• Connection/synaptic weight is associated with each input
• Output y is weighted sum of inputs (simple case)

y = Σ (j = 1 to d) wj·xj + w0 = w^T x

where
w = [w0, w1, …, wd]^T
x = [1, x1, …, xd]^T
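As a quick illustration, the weighted sum above can be computed directly; the weights and inputs in the sketch below are made up, and the bias w0 is handled by augmenting x with a constant 1.

# Minimal sketch of a perceptron output y = w^T x (weights and inputs are illustrative).
w = [0.5, 1.0, -2.0]        # [w0, w1, w2], bias first
x = [1.0, 3.0, 0.5]         # [1, x1, x2], input augmented with constant 1
y = sum(wj * xj for wj, xj in zip(w, x))
print(y)                    # 0.5 + 3.0 - 1.0 = 2.5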
Perceptron
• During testing, with given weights w, for input x we compute the output y.
• When d = 1 and x is fed from the environment through an input unit,
y = wx + w0
• A perceptron with one input and one output can therefore be used to
implement a linear fit.
• With more than one input, the line becomes a (hyper)plane, and the
perceptron with more than one input can be used to implement a
multivariate linear fit.
– Regression: y = wx + w0
– Classification: y = 1(wx + w0 > 0)
Perceptron
• Perceptron can separate two classes by
checking the sign of the output.
• Define s(·) as the threshold function:
s(a) = 1 if a > 0, and 0 otherwise
Perceptron
• Regression: y=wx+w0 • Classification: y=1(wx+w0>0)

[Figure: perceptron for regression (linear output) and for classification, with bias input x0 = +1]
Perceptron
• For posterior probability: use the sigmoid
function at the output as

y = sigmoid(w^T x) = 1 / (1 + exp(-w^T x))
• K outputs
– Regression:

yi = Σ (j = 1 to d) wij·xj + wi0 = wi^T x
y = W x

– Classification:

oi = wi^T x
yi = exp(oi) / Σ (k) exp(ok)
choose Ci if yi = max (k) yk
Perceptrons
• Each perceptron is a local function of its inputs and synaptic
weights.
• In classification, if we need the posterior probabilities, use
the softmax.
• Implementing this as a neural network results in a two-stage
process, where the first calculates the weighted sums, and
the second calculates the softmax values (denoted as a
single layer)
• Softmax function, or normalized exponential function, is a generalization
of the logistic function that "squashes" a K-dimensional vector z of
arbitrary real values to a K-dimensional vector σ ( z ) of real values in the
range [0, 1] that add up to 1. The function is given by
Softmax example
import math
z = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
z_exp = [math.exp(i) for i in z]
print([round(i, 2) for i in z_exp])
#[2.72, 7.39, 20.09, 54.6, 2.72, 7.39, 20.09]
sum_z_exp = sum(z_exp)
print(round(sum_z_exp, 2)) #114.98
softmax = [round(i / sum_z_exp, 3) for i in z_exp]
print(softmax)
#[0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]
Training
• The perceptron defines a hyperplane, and the neural
network perceptron is just a way of implementing the
hyperplane.
• Given a data sample, the weight values can be calculated
offline and then when they are plugged in, the perceptron
can be used to calculate the output values.
• In training neural networks, an online learning algorithm is
used where we are not given the whole sample, but we
are given instances one by one
– the network updates its parameters after each instance, adapting
itself slowly in time.
Training
• Online (instances seen one by one) vs batch (whole sample)
learning:
– No need to store the whole sample
– Problem may change in time
– Wear and degradation in system components
• Stochastic gradient-descent is used to update after a single
pattern.
• Generic update rule:

Δwij^t = η·(ri^t - yi^t)·xj^t

Update = LearningFactor × (DesiredOutput - ActualOutput) × Input


Learning Boolean Functions

• A perceptron can be used to represent
boolean functions.
• A boolean function can be viewed as a two-class
classification problem.
• Perceptrons can be used to learn boolean
functions.
Learning Boolean AND
• Perceptron that implements AND and its geometric interpretation in two dimensions
• The discriminant is y = s(x1 + x2 − 1.5)
• y satisfies the four constraints given by the definition of AND function
Learning Boolean OR
• The discriminant is y = s(x1 + x2 − 0.5)
• y satisfies the four constraints given by the definition of the OR function
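As a check, the two discriminants can be evaluated on all four binary inputs; the short sketch below does exactly that, with s implemented as the 0/1 threshold function defined earlier.

# Verify the AND and OR perceptrons on all four binary inputs.
def s(a):
    return 1 if a > 0 else 0            # threshold function

for x1 in (0, 1):
    for x2 in (0, 1):
        y_and = s(x1 + x2 - 1.5)        # discriminant for AND
        y_or = s(x1 + x2 - 0.5)         # discriminant for OR
        print((x1, x2), "AND:", y_and, "OR:", y_or)
# AND is 1 only for (1, 1); OR is 1 for every input except (0, 0).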
Learning Boolean Functions
• Hence Boolean functions like AND and OR are
linearly separable.
– A straight line can separate the given samples into the two classes.
• But XOR is not linearly separable.
Learning XOR
• XOR is not linearly separable (We cannot draw a line where the red circles are on
one side and the blue circles are on the other side)

• No w0, w1, w2 satisfy:

w0 ≤ 0
w2 + w0 > 0
w1 + w0 > 0
w1 + w2 + w0 ≤ 0
Learning XOR
• 1·w1 + 0·w2 ≥ t   (t = threshold)
0·w1 + 1·w2 ≥ t
0·w1 + 0·w2 < t
1·w1 + 1·w2 < t
• Hence w1 ≥ t, w2 ≥ t, 0 < t and w1 + w2 < t,
which is a contradiction.
• Note: We need all 4 inequalities for the contradiction. If weights are negative, e.g. weights
= -4 and t = -5, then weights can be greater than t yet adding them is less than t, but t > 0
stops this.
• A "single-layer" perceptron can't implement XOR. The reason is because the classes in XOR
are not linearly separable.
• We cannot draw a straight line to separate the points (0,0),(1,1) from the points (0,1),(1,0).
• This requires the concept of multi-layer networks.
Learning XOR
• Alternatively, the four input-output requirements of XOR can be written
as a system of linear inequalities in w0, w1 and w2.
• The second, third and fourth inequalities contradict the first.
Hence, XOR cannot be realized by a single linear unit.
Multilayer Perceptrons
yi = vi^T z = Σ (h = 1 to H) vih·zh + vi0

zh = sigmoid(wh^T x)
   = 1 / (1 + exp(-(Σ (j = 1 to d) whj·xj + wh0)))

(Rumelhart et al., 1986)


XOR

y = z1 + z2 - 0.5
z1 = x1 - x2 - 0.5
z2 = x2 - x1 - 0.5

x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2). The hidden units
and the output have the threshold activation function with threshold at 0.
XOR
• For every input combination where the output is 1, define a hidden unit
that checks for that particular conjunction of the input.
• The output layer then implements the disjunction.
• Hence, x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)
• The first layer maps inputs from the (x1, x2) to the (z1, z2) space defined
by the first-layer perceptrons.
• Inputs (0,0) and (1,1) are mapped to (0,0) in the (z1, z2) space (as in
previous slide and table given below), allowing linear separability in this
second space.
X1  X2  ~X1  ~X2  Z1 = X1 AND ~X2  Z2 = ~X1 AND X2  Z1 OR Z2  X1 XOR X2
0   0    1    1          0                 0            0          0
0   1    1    0          0                 1            1          1
1   0    0    1          1                 0            1          1
1   1    0    0          0                 0            0          0
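The construction can be checked in a few lines. The sketch below implements the two hidden threshold units and the output unit exactly as given above (threshold at 0) and prints the resulting truth table.

# Two-layer threshold network for XOR: z1 = x1 - x2 - 0.5, z2 = x2 - x1 - 0.5,
# y = z1 + z2 - 0.5, each passed through a threshold unit at 0.
def s(a):
    return 1 if a > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        z1 = s(x1 - x2 - 0.5)           # fires only for (1, 0): x1 AND ~x2
        z2 = s(x2 - x1 - 0.5)           # fires only for (0, 1): ~x1 AND x2
        y = s(z1 + z2 - 0.5)            # OR of the two hidden units
        print((x1, x2), "->", y)
# The four input pairs give 0, 1, 1, 0 - the XOR truth table.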
Training a Perceptron: Regression
• Regression (linear output):

E^t(w | x^t, r^t) = (1/2)·(r^t - y^t)^2 = (1/2)·(r^t - w^T x^t)^2

Δwj^t = η·(r^t - y^t)·xj^t
Classification
• Single sigmoid output:

y^t = sigmoid(w^T x^t)
E^t(w | x^t, r^t) = -r^t·log y^t - (1 - r^t)·log(1 - y^t)      (cross-entropy)
Δwj^t = η·(r^t - y^t)·xj^t                                      (online update rule)

• K > 2 softmax outputs:

yi^t = exp(wi^T x^t) / Σ (k) exp(wk^T x^t)
E^t({wi} | x^t, r^t) = -Σ (i) ri^t·log yi^t
Δwij^t = η·(ri^t - yi^t)·xj^t
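As with regression, one classification update fits in a few lines. The sketch below performs a single online update for a single sigmoid output; the instance, the initial weights and the learning rate eta are illustrative values of our own.

import math

# One online update for a single sigmoid output with cross-entropy error:
# y = sigmoid(w^T x), then delta wj = eta * (r - y) * xj.
eta = 0.5                    # learning rate (illustrative)
w = [0.0, 0.0, 0.0]          # [w0, w1, w2]
x = [1.0, 1.0, 0.0]          # augmented input [1, x1, x2]
r = 1                        # desired class label (0 or 1)

y = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))
w = [wj + eta * (r - y) * xj for wj, xj in zip(w, x)]
print(round(y, 3), [round(wj, 3) for wj in w])   # 0.5 [0.25, 0.25, 0.0]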
MLP -Example
WSH1 = (0.2 * 10) + (-0.1 * 30) + (0.4 * 20) = 2 - 3 + 8 = 7
σ(WSH1) = 1/(1 + e^-7) = 1/(1 + 0.000912) = 0.999

WSH2 = (0.7 * 10) + (-1.2 * 30) + (1.2 * 20) = 7 - 36 + 24 = -5
σ(WSH2) = 1/(1 + e^5) = 1/(1 + 148.4) = 0.0067

WSO1 = (1.1 * 0.999) + (0.1 * 0.0067) = 1.0996
WSO2 = (3.1 * 0.999) + (1.17 * 0.0067) = 3.1047

σ(WSO1) = 1/(1 + e^-1.0996) = 1/(1 + 0.333) = 0.750
σ(WSO2) = 1/(1 + e^-3.1047) = 1/(1 + 0.045) = 0.957
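The forward-pass arithmetic above can be reproduced directly. The sketch below hard-codes the same inputs and weights as read off the worked example and prints the hidden and output activations.

import math

# Reproduce the worked forward pass: inputs (10, 30, 20), two sigmoid hidden
# units H1, H2 and two sigmoid output units O1, O2, using the example's weights.
def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

x = [10.0, 30.0, 20.0]
w_h1 = [0.2, -0.1, 0.4]      # weights into H1
w_h2 = [0.7, -1.2, 1.2]      # weights into H2
w_o1 = [1.1, 0.1]            # weights into O1
w_o2 = [3.1, 1.17]           # weights into O2

h1 = sigmoid(sum(w * xi for w, xi in zip(w_h1, x)))   # sigmoid(7)  ≈ 0.999
h2 = sigmoid(sum(w * xi for w, xi in zip(w_h2, x)))   # sigmoid(-5) ≈ 0.0067
o1 = sigmoid(w_o1[0] * h1 + w_o1[1] * h2)             # ≈ sigmoid(1.0996) ≈ 0.750
o2 = sigmoid(w_o2[0] * h1 + w_o2[1] * h2)             # ≈ sigmoid(3.1047) ≈ 0.957
print(round(h1, 3), round(h2, 4), round(o1, 3), round(o2, 3))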
Consolidated output for MLP
Perceptron
Perceptron
Perceptron Learning Rule
Perceptron Training Algorithm for Single Output Classes
Perceptron: Flowchart for training
Perceptron Network Testing Algorithm
Perceptron Example
• Example: Implement AND function using perceptron training
algorithm with bipolar inputs and bipolar targets.

x1 x2 t
1 1 1
1 -1 -1
-1 1 -1
-1 -1 -1

– Let the initial weights and threshold be set to zero, i.e., w1=w2= 0 and
ɵ = 0.
– The learning rate α(alpha) be set equal to 1.
Perceptron Example
• Example: AND function continued (x2=-x1+1)
– The initial weights and threshold are w1=w2= 0, bias b=0, learning rate α=1 and ɵ = 0

           x1  x2   t   Net   Calc.   Δw1   Δw2   Δb  |  w1  w2   b
Epoch 1     1   1    1    0     -1      1     1     1  |   1   1   1
            1  -1   -1    1      1     -1     1    -1  |   0   2   0
           -1   1   -1    2      1      1    -1    -1  |   1   1  -1
           -1  -1   -1   -3     -1      0     0     0  |   1   1  -1
Epoch 2     1   1    1    1      1      0     0     0  |   1   1  -1
            1  -1   -1   -1     -1      0     0     0  |   1   1  -1
           -1   1   -1   -1     -1      0     0     0  |   1   1  -1
           -1  -1   -1   -3     -1      0     0     0  |   1   1  -1
(Net = net input, Calc. = calculated output, Δw1 = αtx1, Δw2 = αtx2, Δb = αt)

– The final weights are w1=1, w2= 1, bias b= -1
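The epoch-by-epoch table can be reproduced with a short training loop. The sketch below applies the perceptron learning rule with α = 1 and θ = 0 (using the convention that the activation is 0 when the net input lies in [-θ, θ]) and reaches the same final weights w1 = 1, w2 = 1, b = -1.

# Perceptron training for AND with bipolar inputs and targets (alpha = 1, theta = 0).
samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w1, w2, b = 0.0, 0.0, 0.0
alpha, theta = 1.0, 0.0

for epoch in range(2):                   # two epochs suffice for this example
    for (x1, x2), t in samples:
        net = w1 * x1 + w2 * x2 + b
        y = 1 if net > theta else (-1 if net < -theta else 0)
        if y != t:                       # update only when the output is wrong
            w1 += alpha * t * x1
            w2 += alpha * t * x2
            b += alpha * t
print(w1, w2, b)                         # 1.0 1.0 -1.0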


Perceptron Example
• Example: AND function continued
– The final weights are w1=1, w2= 1, bias b= -1
– the equation of the separating hyperplane (boundary) is ?
Perceptron Example
• Example: AND function continued
– The final weights are w1=1, w2= 1, bias b= -1
– Hence the equation of the separating hyperplane
(boundary) is
w1x1 + w2x2 + b= ɵ
w1x1 + w2x2 + b= 0
1 *x1 + 1 *x2 + (-1)= 0
x2 = - x1 + 1
Perceptron Example
• Example: AND function
continued
– The equation of the
separating hyperplane
(boundary) is
x2 = -x1 + 1
• Since the intercept is 1, the first point of the line is (0, 1).
• Since the slope is -1 (i.e. dy/dx = change in y / change in x = -1/1),
the second point of the line is (1, 0), obtained as (0 + 1, 1 - 1).
Perceptron Example
• Example: Find the weights using perceptron network for
ANDNOT function when all the inputs are presented only one
time. Use bipolar input and bipolar targets.
Perceptron Example
• Example: Find the weights using perceptron network for
ANDNOT function when all the inputs are presented only one
time. Use bipolar input and bipolar targets.

– Let the initial weights and threshold be set to zero, i.e., w1=w2= 0 and
ɵ = 0.
– The learning rate α(alpha) be set equal to 1.
Perceptron Example
• Example: Find the weights using perceptron network for
ANDNOT function when all the inputs are presented only one
time. Use bipolar input and bipolar targets.
– The truth table for ANDNOT function is given below:

x1 x2 t
1 1 -1
1 -1 1
-1 1 -1
-1 -1 -1
– Let the initial weights and threshold be set to zero, i.e., w1=w2= 0 and
ɵ = 0.
– The learning rate α(alpha) be set equal to 1.
Perceptron
• Example: Implement OR function using perceptron networks
for binary inputs and bipolar targets.

x1 x2 t
1 1 1
1 0 1
0 1 1
0 0 -1
Perceptron
• Example: Implement OR function using perceptron networks
for bipolar inputs and bipolar targets.

x1 x2 t
1 1 1
1 -1 1
-1 1 1
-1 -1 -1
Perceptron
• Example: Implement OR function using perceptron networks
for bipolar inputs and bipolar targets. Also, present the
separating hyperplane.

x1 x2 t
1 1 1
1 -1 1
-1 1 1
-1 -1 -1
Perceptron Example
• Classify the two-dimensional input pattern shown in Figure 6 using
perceptron network.
– The symbol "*" indicates the data representation to be +1 and "•" indicates
data to be -1.
– The patterns are I and F.
– For pattern I, the target is + 1, and for F, the target is -1.
Perceptron
• Example: Find the weights using perceptron
network for AND function when all the inputs
are presented only one time. Use binary
inputs and bipolar targets.
Back Propagation Network
[Figure: feed-forward network with inputs xj (j = 1,…,m), hidden units zq (q = 1,…,l) and
outputs yi (i = 1,…,n); vqj are the input-to-hidden weights and wiq the hidden-to-output weights]
Back Propagation learning algorithm - ‘BP’

BP has two phases:

Forward pass phase: computes the 'functional signal' - feed-forward
propagation of input pattern signals through the network.

Backward pass phase: computes the 'error signal' - propagates
the error backwards through the network, starting at the output units
(where the error is the difference between actual and desired
output values).

Conceptually: Forward Activity - Backward Error
Back-propagation algorithm
The back – propagation algorithm can be outlined as
• Step 1: Initialize all weights to small random values.
• Step 2: Choose an input-output training pair.
• Step 3: Calculate the actual output from each neuron in a layer by propagating
the signal forward through the network layer by layer (forward propagation).
• Step 4: Compute the error value and error signals for output layer.
• Step 5: Propagate the errors backward to update the weights and compute the
error signals for the preceding layers.
• Step 6: Check whether the whole set of training data have been cycled once,
yes – go to step 7; otherwise go to step 2.
• Step 7: Check whether the current total error is acceptable; if yes, terminate the
training process and output the final weights; otherwise initiate a new training
epoch by going to step 2.
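The seven steps can be condensed into a short script. The sketch below is a minimal illustration rather than the slides' own worked example: one hidden layer of sigmoid units, a single sigmoid output, squared error and online updates; the layer size H, the learning rate eta and the XOR training set are our own choices.

import math, random

# Minimal back-propagation sketch: 2 inputs -> H sigmoid hidden units -> 1 sigmoid output.
# Online (pattern-by-pattern) updates with squared error, following steps 1-7 above.
random.seed(0)
H, eta = 4, 0.5
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]   # XOR as a toy task

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

# Step 1: initialize all weights to small random values (last entry of each row = bias).
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(H)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(H + 1)]

for epoch in range(20000):                       # Steps 6-7: cycle through epochs
    total_error = 0.0
    for x, r in data:                            # Step 2: choose a training pair
        # Step 3: forward propagation, layer by layer
        z = [sigmoid(wh[0] * x[0] + wh[1] * x[1] + wh[2]) for wh in w_hidden]
        y = sigmoid(sum(v * zh for v, zh in zip(w_out[:H], z)) + w_out[H])
        total_error += 0.5 * (r - y) ** 2
        # Step 4: error signal (delta) for the output unit
        delta_out = (r - y) * y * (1 - y)
        # Step 5: back-propagate to get hidden deltas, then update all weights
        delta_hidden = [delta_out * w_out[h] * z[h] * (1 - z[h]) for h in range(H)]
        for h in range(H):
            w_out[h] += eta * delta_out * z[h]
            w_hidden[h][0] += eta * delta_hidden[h] * x[0]
            w_hidden[h][1] += eta * delta_hidden[h] * x[1]
            w_hidden[h][2] += eta * delta_hidden[h]
        w_out[H] += eta * delta_out
    if total_error < 0.01:                       # Step 7: stop when error is acceptable
        break

for x, r in data:
    z = [sigmoid(wh[0] * x[0] + wh[1] * x[1] + wh[2]) for wh in w_hidden]
    y = sigmoid(sum(v * zh for v, zh in zip(w_out[:H], z)) + w_out[H])
    print(x, r, round(y, 2))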
Forward Propagation of Activity
• Step 1: Initialize weights at random, choose a
learning rate η
• Until network is trained:
– Step 2: For each training example i.e. input pattern and
target output(s):
– Do forward pass through net (with fixed weights) to
produce output(s)
– i.e., in Forward Direction, layer by layer:
• Inputs applied
• Multiplied by weights
• Summed
• ‘Squashed’ by sigmoid activation function
• Output passed to each neuron in next layer
– Repeat above until network output(s) produced
Backpropagation Algorithm
Back- Propagation of error
• Step 3:Compute error (delta or local gradient) for each
output unit (δk )
– Layer-by-layer, compute error (delta or local gradient) for each
hidden unit δj by back propagating errors
• Step 4: Update all the weights Δwij by gradient descent,
and go back to Step 2.
• The overall MLP learning algorithm, involving forward pass
and back-propagation of error (until the network training
completion), is known as the Generalized Delta Rule
(GDR), or more commonly, the Back Propagation (BP)
algorithm
MLP/BP: Example
Example: Forward Pass
Example: Forward Pass
Example: Backward Pass
Example: Update Weights
Using Generalized Delta Rule (BP)
Similarly for the all weights wij:
Verification that it works
Training
• This was a single iteration of back-
propagation.
• Training requires many iterations with many
training examples or epochs (one epoch is
entire presentation of complete training set)
• It can be slow !
• The computation in MLP is local (with
respect to each neuron)
• Parallel computation implementation is also
possible.
Back- Propagation
• Training a multilayer perceptron is the same as
training a perceptron;
– Difference: output is a nonlinear function of the
input.
• Considering the hidden units as inputs, the
second layer is a perceptron.
Backpropagation
• Nonlinear regression (1 output)

yi = vi^T z = Σ (h = 1 to H) vih·zh + vi0

zh = sigmoid(wh^T x)
   = 1 / (1 + exp(-(Σ (j = 1 to d) whj·xj + wh0)))

• The gradient of the error with respect to a first-layer weight follows the chain rule:

∂E/∂whj = (∂E/∂yi) · (∂yi/∂zh) · (∂zh/∂whj)
Other References
1. D. K. Pratihar, "Soft Computing: Fundamentals and Applications", 2nd Ed.,
Narosa, 2013, Chapter 9.
2. S. N. Sivanandam and S. N. Deepa, "Principles of Soft Computing", 3rd Ed.,
Wiley Publications, 2018.
