
Backpropagation

ANN

[Slide image: artificial neural network diagram]
Backpropagation

● Backpropagation is an algorithm that propagates the errors from the output nodes back to the
input nodes

● It calculates the errors in the predictions and then adjusts the weights and biases of the network by
moving backwards through the layers, in an effort to train the model

● Together, forward propagation and backpropagation allow a neural network to make
predictions and correct for any errors accordingly. Over time, the algorithm becomes gradually
more accurate
Backpropagation

● The algorithm is used to effectively train a neural network by means of the chain rule. In
simple terms, after each forward pass through a network, backpropagation performs a backward
pass while adjusting the model's parameters (weights and biases)

● Neural networks learn through iterative tuning of parameters (weights and biases) during the
training stage

● At the start, the weights are initialized to randomly generated values, and the biases are set to
zero. This is followed by a forward pass of the data through the network to get the model output

● Lastly, back-propagation is conducted. The model training process typically entails several
iterations of a forward pass, back-propagation, and a parameter update, as sketched below
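To make this loop concrete, here is a minimal Python sketch of one such training process on a single-weight linear model; the synthetic data, learning rate, and epoch count are illustrative assumptions rather than anything from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: y = 2x + 1 plus a little noise
X = rng.uniform(-1, 1, size=100)
y_true = 2 * X + 1 + 0.05 * rng.normal(size=100)

w = rng.normal()   # weight initialized to a random value
b = 0.0            # bias initialized to zero
lr = 0.1           # learning rate

for epoch in range(200):
    # 1. Forward pass: compute the model output
    y_pred = w * X + b
    # 2. Back-propagation: gradients of the mean squared error
    grad_w = np.mean(2 * (y_pred - y_true) * X)
    grad_b = np.mean(2 * (y_pred - y_true))
    # 3. Parameter update
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should approach 2 and 1
```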
Perceptron Learning Rule – Single-layer perceptron

● The Perceptron learning rule is a supervised learning algorithm used in single-layer perceptrons. It is primarily used for binary classification tasks,
where the goal is to separate data points into two classes based on their input features

**Basic Equation of Perceptron Learning Rule**

● For a single-layer perceptron with 'n' input features and a bias term (threshold) 'b', the weight update for the j-th weight (w_j) is given by:

w_j(new) = w_j(old) + learning_rate * (target_output - predicted_output) * input_j

● The perceptron learning rule iteratively updates the weights for each training sample until the model converges to a set of weights that can separate
the data points into the desired classes
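As a minimal sketch of this update rule (the function name and example values are hypothetical):

```python
import numpy as np

def perceptron_update(w, x, target, predicted, lr=0.1):
    # w_j(new) = w_j(old) + learning_rate * (target - predicted) * x_j
    return w + lr * (target - predicted) * x

w = np.array([0.2, -0.4])
print(perceptron_update(w, x=np.array([1.0, 0.5]), target=1, predicted=0))
# -> [0.3, -0.35]: each weight moves in proportion to its input
```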
SLP

● Assume (x1, x2, x3, …, xn) –> set of input vectors

● and (w1, w2, w3, …, wn) –> set of weights

● y = actual output

● wo = initial weight, wnew = new weight, δw = change in weight, α = learning rate

● net input = Σ wi·xi (the weighted sum of the inputs)

● learning signal (e) = t − y (difference between desired output t and actual output y)

● δw = α·x·e

● wnew = wo + δw

● The output is calculated by applying the activation function over the net input, and can be expressed as:

● y = 1, if net input >= θ

● y = 0, if net input < θ
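Putting these equations together, the sketch below trains a single-layer perceptron on the AND function; the dataset, threshold θ = 0.5, and learning rate α = 0.1 are assumptions chosen for illustration:

```python
import numpy as np

# Toy training set: the AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])   # desired outputs

w = np.zeros(2)   # initial weights wo
theta = 0.5       # threshold θ
alpha = 0.1       # learning rate α

for epoch in range(20):
    for xi, ti in zip(X, t):
        net = np.dot(w, xi)            # net input = Σ wi·xi
        y = 1 if net >= theta else 0   # y = 1 if net input >= θ, else 0
        e = ti - y                     # learning signal e = t − y
        w = w + alpha * e * xi         # δw = α·x·e, wnew = wo + δw

print(w)  # converges to weights that implement AND, e.g. [0.3, 0.3]
```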
Backpropagation – Multi-layer perceptron

● Backpropagation is a learning algorithm used in multi-layer perceptrons (MLPs) or deep neural networks (DNNs) with one or more hidden layers. It is a more general
and powerful learning algorithm compared to the perceptron learning rule

● Basic Equation of Backpropagation: In backpropagation, the weight update for the j-th weight (w_j) connecting two neurons in the neural network is given by:

w_j(new) = w_j(old) - learning_rate * gradient

Where:

w_j(new) is the new value of the weight after the update

w_j(old) is the current value of the weight

learning_rate is a hyperparameter controlling the step size of weight updates

gradient is the partial derivative of the loss function with respect to w_j, computed during the backward pass
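A minimal sketch of this update in code (the function name and the numbers are hypothetical; in practice the gradient comes from the backward pass described below):

```python
import numpy as np

def gradient_descent_step(w, grad, learning_rate):
    # w_j(new) = w_j(old) - learning_rate * gradient
    return w - learning_rate * grad

w = np.array([0.4, -0.2, 0.1])          # current weights
grad = np.array([0.05, -0.12, 0.30])    # assumed ∂L/∂w_j values
print(gradient_descent_step(w, grad, learning_rate=0.1))
# -> [0.395, -0.188, 0.07]
```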

MLP

● Learning Rate: The learning rate is a hyperparameter in machine learning algorithms, including
neural networks, that determines the step size at which the model's weights are updated during
training. It controls how much the model adapts its weights based on the gradients computed
during the optimization process

● When training a machine learning model, the goal is to minimize a loss function that quantifies
the difference between the predicted outputs and the true targets (ground truth) in the training
data
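For example, mean squared error is one common choice of such a loss function; a minimal sketch (the values are illustrative):

```python
import numpy as np

def mse(y_pred, y_true):
    # Average of the squared differences between predictions and targets
    return np.mean((y_pred - y_true) ** 2)

print(mse(np.array([0.9, 0.2]), np.array([1.0, 0.0])))  # 0.025
```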

MLP

● A higher learning rate means larger weight updates, which can help the model converge faster, but it also increases the risk of
overshooting the optimal weights and diverging from the optimal solution
● A lower learning rate results in smaller weight updates, leading to slower convergence but potentially better stability

● The learning rate is a critical hyperparameter that requires careful tuning


● Commonly used learning rates are in the range of 0.001 to 0.1, but the optimal value can vary depending on the problem,
architecture, and dataset
● Learning rate schedules, where the learning rate is adjusted during training, can also be employed to improve the convergence
behavior
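To see this trade-off concretely, the toy sketch below runs gradient descent on the one-dimensional loss L(w) = w² at three illustrative learning rates (the function and values are assumptions for demonstration):

```python
def minimize(lr, steps=20, w=1.0):
    # Gradient descent on L(w) = w**2, whose gradient is 2*w
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(minimize(0.01))  # ~0.67: small steps, slow convergence toward 0
print(minimize(0.1))   # ~0.01: larger steps, much faster convergence
print(minimize(1.1))   # ~38:   steps overshoot the minimum and diverge
```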

MLP

Gradient

● In the context of optimization algorithms like gradient descent and backpropagation, the gradient is a vector that contains the partial
derivatives of the loss function with respect to each model parameter (weight)

● It indicates the direction and magnitude of the steepest increase in the loss function

● The gradient is computed during the backward pass (backpropagation) using the chain rule of calculus

● The backward pass through the network calculates the gradients starting from the output layer and moving backward to the input layer.
Once the gradients are known, they are used in weight update rules, such as gradient descent, to iteratively update the model's weights to
minimize the loss function

● These gradients tell the model how much each weight should be adjusted to reduce the error and improve the model's performance
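The sketch below makes this concrete for a tiny two-layer network: the backward pass applies the chain rule layer by layer, and a finite-difference check confirms one of the gradients. All shapes, values, and names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))            # input vector
y = np.array([[1.0]])                  # target
W1 = 0.5 * rng.normal(size=(4, 3))     # hidden-layer weights
W2 = 0.5 * rng.normal(size=(1, 4))     # output-layer weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
z1 = W1 @ x
h = sigmoid(z1)
y_hat = W2 @ h
loss = 0.5 * (y_hat - y) ** 2

# Backward pass: chain rule, from the output layer back toward the input
d_yhat = y_hat - y            # ∂L/∂y_hat
dW2 = d_yhat @ h.T            # ∂L/∂W2
d_h = W2.T @ d_yhat           # ∂L/∂h
d_z1 = d_h * h * (1 - h)      # ∂L/∂z1, using the sigmoid derivative
dW1 = d_z1 @ x.T              # ∂L/∂W1

# Finite-difference check on one entry of W1
eps = 1e-6
W1_shift = W1.copy()
W1_shift[0, 0] += eps
loss_shift = 0.5 * (W2 @ sigmoid(W1_shift @ x) - y) ** 2
print(dW1[0, 0], (loss_shift - loss).item() / eps)  # the two nearly match
```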
● In summary, the learning rate controls the step size of weight updates during training, while the
gradients indicate the direction and magnitude of adjustments required for each weight to
minimize the loss function and improve the model's performance

● The combination of learning rate and gradients plays a crucial role in the success of the
optimization process and the effectiveness of the learning algorithm

Local minima

● The point on a curve which is lower than its preceding and succeeding points is
called a local minimum

● It is the lowest point in a particular region, but not necessarily the absolute lowest point in the
entire function

Global minima

● The point on a curve which is lower than every other point on the curve is called the global
minimum

● A curve can have more than one local minimum, but it has only one global minimum.
While reaching the global minimum in backpropagation would be ideal, it is not a strict requirement for a
good model. A model that performs well on validation data and generalizes effectively is considered
good, even if it doesn't reach the global minimum of the loss function
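A small sketch of this behaviour: gradient descent on a curve with two minima settles in whichever valley the starting point falls into (the function and starting points are assumptions for illustration):

```python
def f(x):       # a curve with one local and one global minimum
    return x**4 - 3 * x**2 + x

def grad(x):    # its derivative f'(x)
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=500):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_local = descend(2.0)     # settles near x ≈ 1.13, the local minimum
x_global = descend(-2.0)   # settles near x ≈ -1.30, the global minimum
print(x_local, f(x_local))
print(x_global, f(x_global))
```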

