Module 5 Lecture 2
Neural Networks
• Neural networks are computing systems with interconnected nodes
that work much like neurons in the human brain.
• Using algorithms, they can recognize hidden patterns and correlations
in raw data, cluster and classify it, and – over time – continuously
learn and improve.
• A neuron is just a node with many inputs and one output.
• A neural network consists of many interconnected neurons.
• A “simple” device that receives data at the input and provides a
response.
• First, the neural network learns to correlate incoming and outgoing signals with each other; this is called learning.
• Then the neural network begins to work: it receives input data and generates output signals based on the accumulated knowledge.
Neural Networks
• The key advantage of neural networks is that they can extract features from the data automatically, without a programmer having to design them by hand.
• Analogy with a biological neuron: the inputs play the role of dendrites and the output plays the role of the axon; the node combines its inputs through a summation function.
• (For example, in a dog-and-cat image example, the hidden layers are the ones responsible for learning that the dog picture is linked to the name "dog", and they do this through a series of matrix multiplications and mathematical transformations that learn these mappings.)
• A neuron combines input data, weights, and a bias (or threshold) to produce an output, much like linear regression.
Types of Neural Networks
• Bias is a constant that shifts the model's output so that it can fit the given data better.
Perceptron
• In short, a perceptron is a single-layer neural network.
• It consists of four main parts: input values, weights and bias, net sum, and an activation function.
• A perceptron is a neural network unit that does a precise computation to
detect features in the input data.
• A perceptron is mainly used to classify data into two classes; therefore, it is also known as a linear binary classifier.
• A Perceptron is an algorithm used for supervised learning of binary
classifiers. Binary classifiers decide whether an input, usually
represented by a series of vectors, belongs to a specific class.
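As a rough sketch (not from the slides themselves), a single perceptron can be written in a few lines of Python. The input values, weights, and bias below are arbitrary illustrative assumptions; only the structure (a weighted net sum followed by a step activation that picks one of two classes) reflects the description above.

# Minimal perceptron sketch: net sum + step activation (binary classifier).
def perceptron(inputs, weights, bias):
    # Net sum: weighted sum of the inputs plus the bias.
    net_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step activation: class 1 if the net sum is positive, otherwise class 0.
    return 1 if net_sum > 0 else 0

# Classify one example; the numbers here are made up for illustration.
print(perceptron([0.5, -1.0], weights=[0.8, 0.2], bias=0.1))  # prints 1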
A neural network is composed of 3 types
of layers
• Input layer — It is used to pass in our input (an image, text, or any other suitable type of data for the NN).
• Hidden Layer — These are the layers in between the input and output
layers. These layers are responsible for learning the mapping between
input and output.
• Output Layer — This layer is responsible for giving us the output of
the NN given our inputs.
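To make the role of the three layer types concrete, the sketch below shows a small, hypothetical forward pass in Python with NumPy. The layer sizes and random weights are assumptions chosen only for illustration; the point is that the hidden layer really is a matrix multiplication followed by a non-linear transformation, as described above.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Input layer: one example with 4 features (e.g. pixel values).
x = rng.random(4)

# Hidden layer: matrix multiplication + bias + non-linear transformation.
W_hidden = rng.random((4, 3))   # 4 inputs -> 3 hidden neurons
b_hidden = rng.random(3)
hidden = sigmoid(x @ W_hidden + b_hidden)

# Output layer: maps the hidden representation to the network's output.
W_out = rng.random((3, 1))      # 3 hidden neurons -> 1 output
b_out = rng.random(1)
output = sigmoid(hidden @ W_out + b_out)

print(output)  # the network's prediction for this input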
Example 1:
Example 2:
• Again, the equation for calculating the predicted output has no parameters of its own. But it depends on the variable s (the sum of products, SOP), which in turn depends on the parameters, according to this equation:
• s = X1*W1 + X2*W2 + b
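A quick Python sketch of this forward pass, using the values that appear later in this worked example (X1 = 0.1, X2 = 0.3, initial weights W1 = 0.5 and W2 = 0.2, b = 1.83, desired output 0.03). Up to rounding, it reproduces the predicted output and error used in the derivation below.

import math

# Example values taken from the worked example in these slides.
X1, X2 = 0.1, 0.3
W1, W2, b = 0.5, 0.2, 1.83
desired = 0.03

# SOP (sum of products), sigmoid activation, and squared error.
s = X1 * W1 + X2 * W2 + b              # 1.94
predicted = 1 / (1 + math.exp(-s))     # ~0.874352143
E = 0.5 * (desired - predicted) ** 2   # ~0.356465271

print(s, predicted, E)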
To find how the error changes with each weight, the chain rule needs the derivative of:
1. The network error W.R.T the predicted output.
2. The predicted output W.R.T the SOP.
3. The SOP W.R.T each of the weights.
In total, there are four intermediate partial derivatives:
• ∂E/∂Predicted, ∂Predicted/∂s, ∂s/∂W1 and ∂s/∂W2
To calculate the derivative of the error W.R.T the weights, simply multiply all the derivatives in the chain from the error to each weight, as in the next 2 equations:
• ∂E/∂W1 = ∂E/∂Predicted * ∂Predicted/∂s * ∂s/∂W1
• ∂E/∂W2 = ∂E/∂Predicted * ∂Predicted/∂s * ∂s/∂W2
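A minimal Python sketch of these two chain-rule products, using the same example values as above; the variable names are illustrative, not from the slides.

import math

X1, X2 = 0.1, 0.3
W1, W2, b = 0.5, 0.2, 1.83
desired = 0.03

s = X1 * W1 + X2 * W2 + b
predicted = 1 / (1 + math.exp(-s))

# Chain rule: multiply the intermediate derivatives along the chain.
dE_dpred = predicted - desired           # dE/dPredicted
dpred_ds = predicted * (1 - predicted)   # dPredicted/ds (sigmoid derivative)
ds_dW1, ds_dW2 = X1, X2                  # ds/dW1 and ds/dW2

dE_dW1 = dE_dpred * dpred_ds * ds_dW1    # ~0.009276093
dE_dW2 = dE_dpred * dpred_ds * ds_dW2    # ~0.027828278
print(dE_dW1, dE_dW2)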
Basically, back-propagation updates the weights so as to reduce the loss in the next iteration. The weight-update equation is:
Wnew = W - η*∂E/∂W
1.Predicted Error: For the derivative of the error W.R.T the predicted
output:
• ∂E/∂Predicted = ∂/∂Predicted(1/2(desired-predicted)^2)
• = 2*(1/2)*(desired-predicted)^(2-1)*(0-1)
• = (desired-predicted)*(-1)
• = predicted-desired
• Substituting the values:
• ∂E/∂Predicted = predicted-desired = 0.874352143-0.03
• ∂E/∂Predicted = 0.844352143 ----(1)
2. Predicted Output: Remember that the quotient rule can be used to show that the derivative of the sigmoid function is f(s)*(1-f(s)), i.e. predicted*(1-predicted):
• ∂Predicted/∂s = 0.874352143*(1-0.874352143)
• = 0.874352143*0.125647857
• ∂Predicted/∂s = 0.109860473 ----(2)
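As a small sanity check (not part of the slides), the identity ∂Predicted/∂s = predicted*(1-predicted) can be compared against a finite-difference approximation in Python; s = 1.94 is the SOP implied by the example's initial values.

import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

s = 1.94                  # SOP implied by the example's initial values
p = sigmoid(s)            # ~0.874352143

analytic = p * (1 - p)    # predicted*(1-predicted) ~0.109860473
h = 1e-6
numeric = (sigmoid(s + h) - sigmoid(s - h)) / (2 * h)  # finite difference

print(analytic, numeric)  # the two values agree closely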
3. For the derivative of the SOP W.R.T W1:
∂s/∂W1 = ∂/∂W1(X1*W1 + X2*W2 + b)
= 1*X1*(W1)^(1-1) + 0 + 0
= X1*(W1)^0
= X1*(1)
∂s/∂W1 = X1
By substituting the values:
∂s/∂W1 = X1 = 0.1 ----(3)
4. For the derivative of the SOP W.R.T W2:
∂s/∂W2 = ∂/∂W2(X1*W1 + X2*W2 + b)
= 0 + 1*X2*(W2)^(1-1) + 0
= X2*(W2)^0
= X2*(1)
∂s/∂W2 = X2
By substituting the values:
∂s/∂W2 = X2 = 0.3 ----(4)
• For the derivative of the error W.R.T W1:
• ∂E/∂W1 = 0.844352143*0.109860473*0.1 ----from (1), (2), (3)
• ∂E/∂W1 = 0.009276093
• For the derivative of the error W.R.T W2:
• ∂E/∂W2 = 0.844352143*0.109860473*0.3 ----from (1), (2), (4)
• ∂E/∂W2 = 0.027828278
• Finally, there are two values reflecting how the prediction error changes
with respect to the weights:
• 0.009276093 for W1
• 0.027828278 for W2
• What do these values mean? These results need interpretation.
Conclusions based on derivatives
• Derivative sign
• Derivative magnitude (DM)
• If the derivative sign is positive, that means increasing the weight increases the
error. In other words, decreasing the weight decreases the error.
• If the derivative sign is negative, increasing the weight decreases the error. In other words, decreasing the weight increases the error.
• But by how much does the error increase or decrease? The derivative magnitude
answers this question.
• For positive derivatives, increasing the weight by p increases the error by approximately DM*p.
• For negative derivatives, increasing the weight by p decreases the error by approximately DM*p.
• Because the result of the ∂E/∂W1 derivative is positive, if W1 increases by 1, then the total error increases by about 0.009276093.
• Because the result of the ∂E/∂W2 derivative is positive, if W2 increases by 1, then the total error increases by about 0.027828278.
Updating weights
For W1:
W1new = W1 - η*∂E/∂W1
= 0.5 - 0.01*0.009276093
W1new = 0.49990723907
For W2:
W2new = W2 - η*∂E/∂W2
= 0.2 - 0.01*0.027828278
W2new = 0.1997217172
Note that the term η*∂E/∂W is subtracted from (not added to) the old value of the weight; because the derivative is positive, decreasing the weight is what reduces the error.
The new values for the weights are:
W1=0.49990723907
W2= 0.1997217172
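The same update written as a short Python sketch, with the learning rate η = 0.01 and the gradient values computed above:

# Gradient-descent weight update: W_new = W - learning_rate * dE/dW
learning_rate = 0.01          # η

W1, W2 = 0.5, 0.2             # old weights
dE_dW1, dE_dW2 = 0.009276093, 0.027828278

W1_new = W1 - learning_rate * dE_dW1   # 0.49990723907
W2_new = W2 - learning_rate * dE_dW2   # 0.1997217172...
print(W1_new, W2_new)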
New forward pass calculations:
s = X1*W1 + X2*W2 + b
s = 0.1*0.49990723907 + 0.3*0.1997217172 + 1.83
s = 1.931463718067
f(s) = 1/(1+e^(-s))
f(s) = 1/(1+e^(-1.931463718067))
f(s) = 0.873411342830056
E = 1/2(0.03-0.873411342830056)^2
E = 0.35567134660719907
• The new error (0.35567134660719907) is smaller than the old error (0.356465271), a reduction of 0.0007939243928009043.
• As long as there’s a reduction, we’re
moving in the right direction.
• The error reduction is small because we’re
using a small learning rate (0.01).
• The forward and backward passes should
be repeated until the error is 0 or for a
number of epochs (i.e. iterations).
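Putting the pieces together, here is a minimal, self-contained Python sketch that repeats the forward and backward passes for a fixed number of epochs, using this example's values. It illustrates the loop structure rather than reproducing the exact decimals on the slides.

import math

# Example values from the worked example.
X1, X2, b = 0.1, 0.3, 1.83
W1, W2 = 0.5, 0.2
desired = 0.03
learning_rate = 0.01

for epoch in range(1000):
    # Forward pass: SOP -> sigmoid -> squared error.
    s = X1 * W1 + X2 * W2 + b
    predicted = 1 / (1 + math.exp(-s))
    E = 0.5 * (desired - predicted) ** 2

    # Backward pass: chain rule gives dE/dW1 and dE/dW2.
    dE_ds = (predicted - desired) * predicted * (1 - predicted)
    dE_dW1 = dE_ds * X1
    dE_dW2 = dE_ds * X2

    # Gradient-descent update of the two weights.
    W1 -= learning_rate * dE_dW1
    W2 -= learning_rate * dE_dW2

print(E, W1, W2)  # the error shrinks slowly with this small learning rate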
Example-3