Chapter 2: ANN Engineering
Subject Code: ECE7419
ASTU
L = loss function; X and Y are the input and output variables; E = expected value;
R = punishment (penalty) cost
There are three major types of error calculations (contrasted in the sketch below):
Stochastic gradient descent (First ppt)
Batch (First ppt)
Mini-batch (First ppt)
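As a rough illustration, the sketch below contrasts the three update schemes for a single-layer network trained with the delta rule. The network size, learning rate, and data are assumptions made for the example (they are not values from these slides), and the same weight matrix is reused across the three schemes purely for brevity.

```python
import numpy as np

# Illustrative single-layer sigmoid network; shapes and data are assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))            # 8 samples, 3 inputs
D = rng.integers(0, 2, size=(8, 1))    # correct outputs
W = rng.normal(size=(3, 1))            # weights
lr = 0.1

def grad(W, x, d):
    """Delta-rule gradient for one sample (sigmoid output node)."""
    y = 1.0 / (1.0 + np.exp(-x @ W))
    delta = (d - y) * y * (1 - y)
    return np.outer(x, delta)

# 1) Stochastic gradient descent: update after every single sample.
for x, d in zip(X, D):
    W += lr * grad(W, x, d)

# 2) Batch: average the gradients over ALL samples, then update once.
W += lr * np.mean([grad(W, x, d) for x, d in zip(X, D)], axis=0)

# 3) Mini-batch: average over a small subset, update once per subset.
batch = 4
for i in range(0, len(X), batch):
    g = np.mean([grad(W, x, d) for x, d in zip(X[i:i + batch], D[i:i + batch])],
                axis=0)
    W += lr * g
```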
Training of a Single-Layer Neural Network: Delta Rule
The previously introduced delta rule is ineffective for training the multi-layer
neural network. This is because the error, the essential element for applying the
delta rule, is not defined in the hidden layers.
The error of an output node is defined as the difference between the correct output
and the output of the neural network. However, the training data does not provide
correct outputs for the hidden layer nodes, so the error cannot be calculated with
the same approach used for the output nodes.
In 1986, the introduction of the back-propagation algorithm finally solved the
training problem of the multi-layer neural network. The significance of the back-
propagation algorithm was that it provided a systematic method to determine the
error of the hidden nodes.
Once the hidden layer errors are determined, the delta rule is applied to adjust the
weights.
In the feed-forward pass, the input data of the neural network travels through the input layer,
the hidden layer, and the output layer.
In contrast, in the back-propagation algorithm, the output error starts from the output layer
and moves backward until it reaches the hidden layer right next to the input layer. This
process is called back-propagation, as it resembles the output error propagating backward.
Even in back-propagation, the signal still flows through the connecting lines, and the weights
are multiplied along the way.
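In symbols, this backward flow can be written as follows. This is the standard formulation (the notation is assumed, not copied from the slides): delta denotes a node's delta, phi the activation function, v the weighted sum, and w_ji the weight connecting hidden node i to output node j.

```latex
% Output node j: the error e_j = d_j - y_j is directly available
\delta_j = \varphi'(v_j)\, e_j
% Hidden node i: its "error" is the output deltas propagated backward
% through the same connecting weights w_{ji}
e_i^{(h)} = \sum_j w_{ji}\,\delta_j , \qquad
\delta_i^{(h)} = \varphi'\!\big(v_i^{(h)}\big)\, e_i^{(h)}
```

Once these hidden deltas are available, the usual delta-rule weight adjustment applies at every layer.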
Back-Propagation Algorithm
This section explains the back-propagation algorithm using a simple multi-layer
neural network as an example. Consider a neural network with two input nodes, two
output nodes, and a single hidden layer that also has two nodes.
When we put this weighted sum, Equation 3.1, into the activation function, we obtain
the output from the hidden nodes.
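In generic form, the weighted sum and the resulting hidden-node output referred to here can be written as follows (the indexing is an assumption; Equation 3.1 in the source may use slightly different notation):

```latex
v_i^{(h)} = \sum_{k} w^{(h)}_{ik}\, x_k , \qquad
y_i^{(h)} = \varphi\!\big(v_i^{(h)}\big)
```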
If we train the neural network with this XOR dataset, we would get a model of the
XOR operation.
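As a concrete sketch, the code below trains a small network on the XOR data with sigmoid activations and the back-propagation procedure described above. The layer sizes, the bias column appended to the inputs, the learning rate, and the epoch count are assumptions made for this example, not values taken from the slides.

```python
import numpy as np

# XOR training data; a constant 1 is appended to each input as a bias term
# (this bias column is an assumption of the sketch).
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(1)
W1 = rng.uniform(-1, 1, size=(3, 4))   # input -> hidden weights (sizes assumed)
W2 = rng.uniform(-1, 1, size=(4, 1))   # hidden -> output weights
lr = 0.9

for epoch in range(10000):              # SGD: adjust weights after every sample
    for x, d in zip(X, D):
        # Feed-forward pass
        y1 = sigmoid(x @ W1)             # hidden-node outputs
        y = sigmoid(y1 @ W2)             # network output

        # Back-propagation: output delta, then hidden error and hidden delta
        delta2 = y * (1 - y) * (d - y)
        e1 = W2 @ delta2                 # error propagated back to hidden nodes
        delta1 = y1 * (1 - y1) * e1

        # Delta-rule weight adjustments
        W2 += lr * np.outer(y1, delta2)
        W1 += lr * np.outer(x, delta1)

print(sigmoid(sigmoid(X @ W1) @ W2).round(2))   # should approach [0, 1, 1, 0]
```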
The benefits of using advanced weight adjustment formulas include higher stability and
faster convergence in the training process of the neural network. These characteristics are
especially favorable for Deep Learning, which is hard to train.
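The slides do not spell out these formulas here; one widely used example is the momentum method, in which a fraction of the previous weight update is added to the current one. A minimal sketch under that assumption, with illustrative values for the learning rate and momentum coefficient:

```python
import numpy as np

def momentum_update(W, grad, velocity, lr=0.1, beta=0.9):
    """One weight adjustment with momentum.

    W        : current weight matrix
    grad     : current error-driven update direction (e.g. outer(input, delta))
    velocity : running momentum term carried over from the previous step
    lr, beta : learning rate and momentum coefficient (assumed values)
    """
    velocity = lr * grad + beta * velocity   # keep a fraction of the old update
    return W + velocity, velocity

# Usage: initialise the velocity to zeros once, then thread it through updates.
W = np.zeros((3, 4))
velocity = np.zeros_like(W)
W, velocity = momentum_update(W, np.ones_like(W), velocity)
```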
The cost function is a rather mathematical concept associated with optimization
theory. The supervised learning of the neural network is a process of adjusting the
weights to reduce the error on the training data.
In this context, the measure of the neural network’s error is the cost function. There
are two primary types of cost functions for the neural network’s supervised learning.
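The two choices usually meant by this, and assumed here, are the sum of squared errors and the cross entropy function. For a single output node with correct output d and network output y they can be written as:

```latex
J_{\mathrm{SSE}} = \tfrac{1}{2}\,(d - y)^2 ,
\qquad
J_{\mathrm{CE}} = -\, d \ln y \;-\; (1 - d)\,\ln(1 - y)
```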
The cross entropy function is much more sensitive to the error. For this
reason, the learning rules derived from the cross entropy function are
generally known to yield better performance.
It is recommended that you use the cross entropy driven learning rules
except in unavoidable cases such as regression.
We had a long introduction to the cost function because the selection of the cost
function affects the learning rule, i.e., the formula of the back-propagation algorithm.
Specifically, the calculation of the delta at the output node changes slightly.
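For a sigmoid output node with weighted sum v, output y = phi(v), and correct output d, the change is the following standard result (notation assumed):

```latex
\text{Sum of squared errors:}\quad \delta = \varphi'(v)\,(d - y)
\qquad
\text{Cross entropy:}\quad \delta = d - y
```

With the cross entropy cost, the derivative of the activation cancels out, leaving only the error itself as the delta.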
The following steps detail the procedure for training the neural network with the
sigmoid activation function at the output node using the cross entropy driven back-
propagation algorithm.
Overfitting is a challenging problem that every technique of Machine Learning
faces. You also saw that one of the primary approaches used to overcome
overfitting is making the model as simple as possible using regularization.
In a mathematical sense, the essence of regularization is adding a term based on the
magnitudes of the weights (typically the sum of squared weights) to the cost function, as shown below.
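In its usual (assumed) form, with lambda controlling how strongly the weights are penalized, the regularized cost function looks like:

```latex
J = \tfrac{1}{2}\sum_{k}\big(d_k - y_k\big)^2 \;+\; \lambda\,\tfrac{1}{2}\sum_{i} w_i^{2}
```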
This cost function remains large if either the output error or the weights stay large.
Therefore, driving only the output error to zero does not suffice to reduce the cost
function. In order to drop the value of the cost function, both the error and the
weights should be kept as small as possible.
For this reason, overfitting of the neural network can be reduced by adding the
sum of the (squared) weights to the cost function and then minimizing it.
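In terms of the weight update, the regularization term simply adds a small pull of each weight toward zero (often called weight decay). A minimal sketch, with lr and lam as assumed hyperparameters:

```python
import numpy as np

def regularized_update(W, grad, lr=0.1, lam=0.01):
    """Delta-rule style update plus the gradient of the regularization term.

    grad : error-driven update direction (e.g. outer(input, delta))
    lam  : regularization strength (assumed value); the -lam * W term
           shrinks every weight slightly at each step.
    """
    return W + lr * (grad - lam * W)

W = np.ones((3, 2))
W = regularized_update(W, np.zeros((3, 2)))   # with zero error, weights shrink
```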