Module 5 Lecture 2

Uploaded by Revanth Narne

Introduction to Neural Networks
Lecture 2
Neural Networks
• Neural networks are computing systems with interconnected nodes
that work much like neurons in the human brain.
• Using algorithms, they can recognize hidden patterns and correlations
in raw data, cluster and classify it, and, over time, continuously
learn and improve.
• A neuron is just a node with many inputs and one output.
• A neural network consists of many interconnected neurons: a "simple"
device that receives data at its inputs and produces a response.
• First, the neural network learns to correlate incoming and outgoing
signals with each other; this is called learning.
• Then the neural network begins to work: it receives input data and
generates output signals based on the accumulated knowledge.
Neural Networks

• Neural networks are modelled to replicate the human brain.

• The key advantage of neural networks is that they extract
features from the data automatically, without a programmer.

• First proposed in 1943 by Warren McCulloch and Walter Pitts.

• The main purpose is to develop algorithms that mimic the human brain.

• The computational unit of the human brain is called the neuron.

• To provide computation we need:

Input --------- Dendrites

Output -------- Axon

• Billions of interconnected neurons transfer data in the form of
electrical impulses; the interconnection of these neurons forms the brain.

• Likewise, artificial neurons are interconnected to form an artificial
neural network.
NODE

• A node combines input data, weights, a bias (or threshold), and an
output, much like a linear regression.
• The summation function adds up the weighted inputs and the bias.
• (In the dog-and-cat example, the hidden layers are the ones
responsible for learning that the dog picture is linked to the name
"dog", and they do this through a series of matrix multiplications
and mathematical transformations that learn these mappings.)
Types of Neural Networks

1. Artificial Neural Network (ANN/MLP)

2. Convolutional Neural Network (CNN)

3. Recurrent Neural Network (RNN)
Artificial Neural Network
• Once an input layer is determined, weights are assigned.
• These weights help determine the importance of any given variable,
with larger ones contributing more significantly to the output than
other inputs.
• All inputs are then multiplied by their respective weights and
summed.
• Afterward, the sum is passed through an activation function, which
determines the output.
• If that output exceeds a given threshold, it "fires" (or activates) the
node, passing data to the next layer in the network.
• This results in the output of one node becoming the input of the
next node. This process of passing data from one layer to the next
defines this neural network as a feedforward network.
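The weighted-sum-then-activation step described above can be sketched in a few lines of Python. This is a minimal illustration; the sigmoid activation and the sample numbers are taken from the worked example later in this lecture.

```python
import math

def node_output(inputs, weights, bias):
    """One feedforward node: multiply each input by its weight,
    sum the products, add the bias, then apply a sigmoid activation."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-s))  # sigmoid squashes s into (0, 1)
```

For example, node_output([0.1, 0.3], [0.5, 0.2], 1.83) reproduces the forward pass computed later in this lecture (s = 1.94, output ≈ 0.8744).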
Why bias?
• Assume the model is constrained so that any line it learns must pass
through the origin.

• The bias is a constant that frees the model from this constraint, so that
it can best fit the given data.
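A tiny numeric illustration of this point (the numbers are hypothetical): without a bias, the model w*x is forced through the origin, so it can never fit data that is offset from it.

```python
# Without a bias, the model can only represent lines through the origin.
def line_no_bias(x, w):
    return w * x

# With a bias, the line can shift to fit offset data such as y = 2x + 5.
def line_with_bias(x, w, b):
    return w * x + b

# At x = 0 the target y = 2x + 5 equals 5: no choice of w alone
# can produce it, but the bias can.
print(line_no_bias(0, 2))        # always 0 at the origin
print(line_with_bias(0, 2, 5))   # 5
```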
Perceptron
• In short, a perceptron is a single-layer neural network.
• It consists of four main parts: input values, weights and bias,
net sum, and an activation function.
• A perceptron is a neural network unit that does a precise computation to
detect features in the input data.
• A perceptron is mainly used to classify data into two parts; therefore,
it is also known as a Linear Binary Classifier.
• A perceptron is an algorithm used for supervised learning of binary
classifiers. Binary classifiers decide whether an input, usually
represented by a series of vectors, belongs to a specific class.
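A minimal sketch of the classic perceptron learning rule follows. The step activation and the AND-gate data are illustrative assumptions, not from the slides; the rule nudges each weight toward correct classification whenever a sample is misclassified.

```python
def perceptron_predict(x, weights, bias):
    """Net sum of inputs times weights plus bias, then a step activation."""
    s = sum(xi * wi for xi, wi in zip(x, weights)) + bias
    return 1 if s > 0 else 0

def perceptron_train(samples, labels, lr=0.1, epochs=20):
    """Classic perceptron learning rule: adjust each weight by
    lr * error * input whenever a sample is misclassified."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - perceptron_predict(x, weights, bias)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Learn the linearly separable AND function.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w, b = perceptron_train(samples, labels)
print([perceptron_predict(x, w, b) for x in samples])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this rule finds a separating line in finitely many updates.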
A neural network is composed of 3 types
of layers
• Input layer: used to pass in our input (an image, text, or any
suitable type of data for the NN).
• Hidden layers: the layers between the input and output
layers. These layers are responsible for learning the mapping between
input and output.
• Output layer: responsible for giving us the output of
the NN for our inputs.
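The three layer types can be sketched as a tiny fully connected network. The 2-3-1 architecture and all weight values below are hypothetical, chosen only to show data flowing from input layer through a hidden layer to the output layer.

```python
import math

def sigmoid(s):
    return 1 / (1 + math.exp(-s))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias per node,
    then a sigmoid activation."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

# Hypothetical 2-3-1 network: 2 inputs, 3 hidden nodes, 1 output.
x = [0.1, 0.3]                                                    # input layer
hidden = layer(x, [[0.5, 0.2], [0.1, 0.4], [0.3, 0.9]], [0.1, 0.2, 0.3])
output = layer(hidden, [[0.6, 0.7, 0.8]], [0.05])                 # output layer
print(output)
```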
Example 1:
• Example 2:

• 1 input layer with 2 inputs
(X1 and X2).
• 1 output layer with 1 output.
There are no hidden layers.
• The weights of the inputs are
W1 and W2, respectively.
• The bias is treated as a new
input neuron to the output
neuron, with a fixed value of
+1 and a weight b. Both the
weights and the bias can be
referred to as parameters.
• s is the sum of products (SOP) between each input and its corresponding
weight, plus the bias:
• s = X1*W1 + X2*W2 + b
Forward pass
• s = X1*W1 + X2*W2 + b
• s = 0.1*0.5 + 0.3*0.2 + 1.83
• s = 1.94
• The value 1.94 is then applied to the activation function (sigmoid),
which results in the value 0.874352143.
• Desired output: 0.03
• Predicted output: 0.874352143
• It's obvious that there's a difference between the desired and predicted
output. But why?
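The forward pass above can be reproduced directly in code, using the example's inputs and initial parameters:

```python
import math

X1, X2 = 0.1, 0.3           # inputs
W1, W2, b = 0.5, 0.2, 1.83  # initial weights and bias

s = X1 * W1 + X2 * W2 + b            # sum of products: s ≈ 1.94
predicted = 1 / (1 + math.exp(-s))   # sigmoid activation ≈ 0.874352143
print(s, predicted)
```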
Why backpropagation?
• The backpropagation algorithm is one of the algorithms responsible
for updating the network weights with the objective of reducing the
network error. It's quite important.
• If the current error is high, the network didn't learn properly from the
data.
• What does this mean? It means that the current set of weights isn't
accurate enough to reduce the network error and make accurate
predictions.
• As a result, we should update the network weights to reduce the network
error.
Error of our network based on an error function
• The error function tells how close the predicted output(s) are to the
desired output(s).
• The optimal value for the error is zero, meaning there's no error at all and
the desired and predicted results are identical.
• One common error function is the squared error function:
E = 1/2 (desired - predicted)^2.
• Knowing that there's an error, what should we do? We should minimize
it.
• To minimize the network error, we must change something in the network.
• Remember that the only parameters we can change are the weights and
biases.
• We can try different weights and biases, and then test our network.
• We calculate the error, the forward pass ends, and then we start
the backward pass to calculate the derivatives and update the
parameters.
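The squared error function is a one-liner; with the example's desired output (0.03) and predicted output (0.874352143), it reproduces the network error used in the rest of this lecture:

```python
def squared_error(desired, predicted):
    """Squared error: zero when the prediction matches the desired output."""
    return 0.5 * (desired - predicted) ** 2

print(squared_error(0.03, 0.874352143))  # ≈ 0.356465271
```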
• The parameter-update equation depends only on the learning rate to
update the parameters. It changes all the parameters in a direction
opposite to the error gradient.
• But, using the backpropagation algorithm, we can know how each
single weight correlates with the error. This tells us the effect of each
weight on the prediction error: which parameters do we increase,
and which ones do we decrease, to get the smallest prediction
error?
• For example, the backpropagation algorithm could tell us useful
information, like that increasing the current value of W1 by 1.0
increases the network error by 0.07. This shows us that a smaller value
for W1 is better for minimizing the error.
• The prediction error is calculated based on this equation:
• E = 1/2 (desired - predicted)^2
• The desired term in the previous equation is a constant, so there's no
way to reach the parameters through it.
• The predicted term is calculated based on the sigmoid function, as in
the next equation:
• Predicted = 1/(1 + e^(-s))
• Again, the equation for calculating the predicted output doesn't directly
contain any parameter. But the variable s (SOP) does depend on the
parameters for its calculation, according to this equation:
• s = X1*W1 + X2*W2 + b
The chain of derivatives we need:
1. Network error W.R.T. the predicted output.
2. Predicted output W.R.T. the SOP.
3. SOP W.R.T. each of the weights (W1 and W2).
In total, there are four intermediate partial derivatives:
• ∂E/∂Predicted, ∂Predicted/∂s, ∂s/∂W1 and ∂s/∂W2
To calculate the derivative of the error W.R.T. the weights, simply multiply all
the derivatives in the chain from the error to each weight, as in the next 2
equations:
• ∂E/∂W1 = ∂E/∂Predicted * ∂Predicted/∂s * ∂s/∂W1
• ∂E/∂W2 = ∂E/∂Predicted * ∂Predicted/∂s * ∂s/∂W2
Basically, backpropagation updates the weights to reduce the loss in the next
iteration, using the weight-update equation.
1. Prediction error: the derivative of the error W.R.T. the predicted
output:
• ∂E/∂Predicted = ∂/∂Predicted (1/2 (desired - predicted)^2)
• = 2 * 1/2 * (desired - predicted)^(2-1) * (0 - 1)
• = (desired - predicted) * (-1)
• = predicted - desired
• By substituting the values:
• ∂E/∂Predicted = predicted - desired = 0.874352143 - 0.03
• ∂E/∂Predicted = 0.844352143 --------------(1)
2. Predicted output: the quotient rule can be used to show that the derivative
of the sigmoid function is:
• ∂Predicted/∂s = Predicted * (1 - Predicted)
• = 0.874352143 * (1 - 0.874352143)
• = 0.874352143 * 0.125647857
• ∂Predicted/∂s = 0.109860473 ----------(2)
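This identity is easy to sanity-check numerically: the analytic form f(s)(1 - f(s)) should match a finite-difference estimate of the sigmoid's slope. This is a quick verification sketch, not part of the slides.

```python
import math

def sigmoid(s):
    return 1 / (1 + math.exp(-s))

s = 1.94  # the SOP from the forward pass
analytic = sigmoid(s) * (1 - sigmoid(s))                  # f(s)(1 - f(s))
numeric = (sigmoid(s + 1e-6) - sigmoid(s - 1e-6)) / 2e-6  # central difference
print(analytic)  # ≈ 0.109860473
```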
3. The derivative of the SOP W.R.T. W1:
∂s/∂W1 = ∂/∂W1 (X1*W1 + X2*W2 + b)
= 1*X1*W1^(1-1) + 0 + 0
= X1*W1^0
= X1*(1)
∂s/∂W1 = X1
By substituting the values:
∂s/∂W1 = X1 = 0.1 ------------(3)

4. The derivative of the SOP W.R.T. W2:
∂s/∂W2 = ∂/∂W2 (X1*W1 + X2*W2 + b)
= 0 + 1*X2*W2^(1-1) + 0
= X2*W2^0
= X2*(1)
∂s/∂W2 = X2
By substituting the values:
∂s/∂W2 = X2 = 0.3 ----------(4)
• For the derivative of the error W.R.T. W1:
• ∂E/∂W1 = 0.844352143 * 0.109860473 * 0.1 ----- from (1)(2)(3)
• ∂E/∂W1 = 0.009276093
• For the derivative of the error W.R.T. W2:
• ∂E/∂W2 = 0.844352143 * 0.109860473 * 0.3 ----- from (1)(2)(4)
• ∂E/∂W2 = 0.027828278
• Finally, there are two values reflecting how the prediction error changes
with respect to the weights:
• 0.009276093 for W1
• 0.027828278 for W2
• What do these values mean? These results need interpretation.
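The two gradients above can be reproduced by chaining the three intermediate derivatives in code, using the example's numbers:

```python
import math

X1, X2 = 0.1, 0.3
W1, W2, b = 0.5, 0.2, 1.83
desired = 0.03

# forward pass
s = X1 * W1 + X2 * W2 + b
predicted = 1 / (1 + math.exp(-s))

# chain rule: dE/dW = dE/dPredicted * dPredicted/ds * ds/dW
dE_dPred = predicted - desired           # (1)
dPred_ds = predicted * (1 - predicted)   # (2)
dE_dW1 = dE_dPred * dPred_ds * X1        # ≈ 0.009276093
dE_dW2 = dE_dPred * dPred_ds * X2        # ≈ 0.027828278
print(dE_dW1, dE_dW2)
```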
Conclusions based on derivatives
• Derivative sign
• Derivative magnitude (DM)
• If the derivative sign is positive, that means increasing the weight increases the
error. In other words, decreasing the weight decreases the error.
• If the derivative sign is negative, increasing the weight decreases the error. In other
words, if it’s negative then decreasing the weight increases the error.
• But by how much does the error increase or decrease? The derivative magnitude
answers this question.
• For positive derivatives, increasing the weight by p increases the error by
approximately DM*p.
• For negative derivatives, increasing the weight by p decreases the error by
approximately DM*p.
• Because the result of the ∂E/∂W1 derivative is positive, if W1 increases
by 1, then the total error increases by about 0.009276093.
• Because the result of the ∂E/∂W2 derivative is positive, if W2 increases
by 1, then the total error increases by about 0.027828278.
Updating weights
For W1:
W1new = W1 - η*∂E/∂W1
= 0.5 - 0.01*0.009276093
W1new = 0.49990723907

For W2:
W2new = W2 - η*∂E/∂W2
= 0.2 - 0.01*0.027828278
W2new = 0.1997217172

Note that the derivative is subtracted from (not added to) the old value of the
weight, because the derivative is positive.
The new values for the weights are:
W1 = 0.49990723907
W2 = 0.1997217172
New forward pass calculations:
s = X1*W1 + X2*W2 + b
s = 0.1*0.49990723907 + 0.3*0.1997217172 + 1.83
s = 1.931463718067
f(s) = 1/(1 + e^(-s))
f(s) = 1/(1 + e^(-1.931463718067))
f(s) = 0.873411342830056
E = 1/2 (0.03 - 0.873411342830056)^2
E = 0.35567134660719907
• New error: 0.35567134660719907
• Old error: 0.356465271
• Reduction: 0.0007939243928009043
• As long as there's a reduction, we're
moving in the right direction.
• The error reduction is small because we're
using a small learning rate (0.01).
• The forward and backward passes should
be repeated until the error reaches 0 or for a
fixed number of epochs (i.e. iterations).
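The whole procedure of repeated forward and backward passes can be sketched as a small training loop. As in the worked example, only W1 and W2 are updated and the bias stays fixed; with the small learning rate the error shrinks a little each epoch:

```python
import math

def sigmoid(s):
    return 1 / (1 + math.exp(-s))

def train(X1, X2, desired, W1, W2, b, lr=0.01, epochs=1000):
    """Repeat the forward and backward passes, updating W1 and W2
    with gradient descent (the bias is left fixed, as in the example)."""
    errors = []
    for _ in range(epochs):
        # forward pass
        s = X1 * W1 + X2 * W2 + b
        predicted = sigmoid(s)
        errors.append(0.5 * (desired - predicted) ** 2)
        # backward pass: chain rule, then the update W = W - lr * dE/dW
        dE_ds = (predicted - desired) * predicted * (1 - predicted)
        W1 -= lr * dE_ds * X1
        W2 -= lr * dE_ds * X2
    return W1, W2, errors

W1, W2, errors = train(0.1, 0.3, 0.03, 0.5, 0.2, 1.83)
print(errors[0], errors[-1])  # the error decreases over the epochs
```

After the first epoch the weights match the hand-computed update above (W1 ≈ 0.49990723907, W2 ≈ 0.1997217172), and the error keeps falling as the loop repeats.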
Example-3
