Back Propagation
Introduction
• Backpropagation is a fundamental algorithm
for training artificial neural networks,
especially feedforward neural networks.
• It's a supervised learning algorithm used to
adjust the weights and biases of a neural
network in order to minimize the error
between the predicted and actual outputs.
• Backpropagation is the key technique behind
the success of deep learning.
Backpropagation
• Backpropagation is used in the context of
artificial neural networks, which consist of
layers of interconnected artificial neurons.
• The neural network typically consists of an
input layer, one or more hidden layers, and an
output layer.
• Each neuron in the network processes input
data, applies an activation function, and
produces an output.
Forward Pass
• In the forward pass, input data is propagated
through the network to make predictions.
• Each neuron computes a weighted sum of its
inputs and applies an activation function to
produce an output.
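• A minimal NumPy sketch of this computation for one layer (the array values and shapes here are illustrative, not taken from these slides):
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.4])            # input vector (1 feature)
W = np.array([[0.2], [-0.3]])  # weights: 2 hidden neurons, 1 input each
b = np.array([0.1, -0.2])      # one bias per hidden neuron
z = W @ x + b                  # weighted sums of the inputs
a = sigmoid(z)                 # activation function applied to each sum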
Loss or Cost Function
• A loss or cost function measures the error
between the predicted output and the actual
target.
• Common cost functions include Mean Squared
Error (MSE) for regression tasks and Cross-
Entropy for classification tasks.
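• As a sketch, both cost functions can be written in NumPy as follows (the function names are illustrative):
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared differences
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Binary cross-entropy; eps avoids log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))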
Backward Pass (Backpropagation)
• The goal of backpropagation is to minimize the
loss function by adjusting the weights and
biases in the network.
• It is an iterative process: the error measured at the output is propagated backward, layer by layer, to determine how each weight and bias contributed to it.
Error Calculation
• The error is calculated for the output layer by
taking the derivative of the loss function with
respect to the predicted output.
• For regression problems:
δ_output = (Target - Predicted Output) * f'(weighted sum)
• For classification problems with Cross-Entropy loss:
δ_output = Target - Predicted Output
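• For example, with a sigmoid output unit and MSE loss, the output δ can be computed like this (a sketch with illustrative numbers):
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

target = 0.0
z_out = 0.4                    # weighted sum at the output neuron
a_out = sigmoid(z_out)         # predicted output
delta_out = (target - a_out) * a_out * (1 - a_out)  # (Target - Predicted) * f'(weighted sum)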
Propagating Error Backward
• The error is then propagated backward
through the network.
• For each neuron, the error δ is calculated by
taking the derivative of the error with respect
to the weighted sum of inputs.
• This error is used to adjust the neuron's
weights and bias.
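• In NumPy, propagating δ from the output layer back to a hidden layer can be sketched as follows (the values are illustrative):
import numpy as np

delta_out = np.array([[-0.14]])        # delta of the output layer
W_out = np.array([[0.5], [-0.1]])      # hidden-to-output weights
a_hidden = np.array([[0.545, 0.579]])  # sigmoid activations of the hidden layer
# error at the hidden layer, then delta via the sigmoid derivative a * (1 - a)
hidden_error = delta_out @ W_out.T
delta_hidden = hidden_error * a_hidden * (1 - a_hidden)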
Weight and Bias Update
• The weights and biases of each neuron are
updated to minimize the error.
• This update is typically done using gradient descent. With δ defined as (Target - Predicted Output) * f'(weighted sum), the negative gradient of the error with respect to a weight is δ * Input, so the update adds it:
New Weight = Old Weight + Learning Rate * δ * Input
New Bias = Old Bias + Learning Rate * δ
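• The same update in NumPy (a sketch continuing the illustrative values from the previous snippet):
import numpy as np

learning_rate = 0.1
delta = np.array([[-0.14]])             # delta of the output layer
inputs = np.array([[0.545, 0.579]])     # hidden activations feeding the output layer
W = np.array([[0.5], [-0.1]])           # hidden-to-output weights
b = np.array([[0.3]])                   # output bias
W = W + learning_rate * inputs.T @ delta   # New Weight = Old Weight + η * δ * Input
b = b + learning_rate * delta              # New Bias = Old Bias + η * δ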
Iterative Process
• The entire process of forward and backward
passes is repeated iteratively for a defined
number of epochs.
• The learning rate, which controls the step size
in weight updates, is an important
hyperparameter.
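• A self-contained toy loop for a single sigmoid neuron illustrates the iteration (all numbers are illustrative):
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X, y = 0.4, 0.0          # one training example
w, b = 0.5, 0.1          # initial weight and bias
learning_rate = 0.1      # hyperparameter: step size of each update
for epoch in range(1000):            # repeated forward + backward passes
    a = sigmoid(w * X + b)           # forward pass
    delta = (y - a) * a * (1 - a)    # output delta
    w += learning_rate * delta * X   # weight update
    b += learning_rate * delta       # bias update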
Convergence
• Training continues until the loss converges to a
minimum, indicating that the network has
learned the underlying patterns in the data.
Testing and Prediction
• After training, the network can be used to
make predictions on new, unseen data by
performing a forward pass with the learned
weights and biases.
Regularization and Optimization
• To prevent overfitting, techniques like L1 and L2
regularization are often used (see the sketch after this list).
• Variants of gradient descent, like Adam and RMSprop,
are used to improve training efficiency and convergence.
• Backpropagation is a cornerstone of training deep neural
networks and is responsible for the success of deep
learning in various applications, including image
recognition, natural language processing, and more.
• It allows the network to learn complex patterns and
relationships in data through an iterative optimization
process.
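• As an illustration of L2 regularization, a weight-decay term can be added to the gradient before the update (a sketch; λ and the gradient values are assumptions for this example):
import numpy as np

learning_rate = 0.1
lam = 0.01                           # L2 regularization strength (illustrative)
W = np.array([[0.5], [-0.1]])        # weights being updated
grad = np.array([[0.02], [-0.01]])   # gradient of the unregularized loss (illustrative)
W = W - learning_rate * (grad + lam * W)   # gradient descent with L2 penalty (lam/2 * ||W||^2)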
Example
• Problem Statement: We want to train a neural
network to classify whether an input (X) is greater
than 0.5 (Class 1) or less than or equal to 0.5 (Class 0).
• Network Architecture:
– Input Layer: 1 neuron
– Hidden Layer: 2 neurons
– Output Layer: 1 neuron
– Activation Function: Sigmoid
– Loss Function: Mean Squared Error (MSE)
– Learning Rate (η): 0.1
Training Data
– Input (X): 0.4
– Target (y): 0 (because 0.4 is less than or equal to
0.5)
Forward Pass
• Initialize weights and biases with random
values.
– Weights: w1 = 0.2, w2 = -0.3, w3 = 0.5, w4 = -0.1,
w5 = 0.6
– Biases: b1 = 0.1, b2 = -0.2, b3 = 0.3
• Calculate the input to the first hidden neuron
(z1):
z1 = (w1 * X) + b1 = (0.2 * 0.4) + 0.1 = 0.18
• Calculate the output of the first hidden neuron (a1) using the sigmoid activation function:
a1 = σ(z1) = 1 / (1 + e^(-0.18)) ≈ 0.545
• Similarly, calculate the input and output of the second hidden neuron (z2 and a2):
z2 = (w2 * X) + b2 and a2 = σ(z2); the example proceeds with a2 ≈ 0.579
• Calculate the input and output of the output neuron (z3 and a3):
z3 = (w3 * a1) + (w4 * a2) + b3 = (0.5 * 0.545) + (-0.1 * 0.579) + 0.3 ≈ 0.414
a3 = σ(z3) = σ(0.414) ≈ 0.602
• Calculate the error at the output:
E = y - a3 = 0 - 0.602 ≈ -0.602
Backward Pass
• Calculate the delta (δ) for the output neuron:
δ3 = E * f'(z3) = -0.602 * (a3 * (1 - a3)) = -0.602 * (0.602 * 0.398) ≈ -0.144
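• Using the update rule above with η = 0.1, one illustrative weight update for the hidden-to-output weight w3 is:
w3_new = w3 + η * δ3 * a1 = 0.5 + 0.1 * (-0.144) * 0.545 ≈ 0.492
• The same procedure can be implemented in NumPy, as in the following listing.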
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is already a sigmoid output, so the derivative is x * (1 - x)
    return x * (1 - x)

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights_input_hidden = np.random.rand(input_size, hidden_size)
        self.bias_hidden = np.zeros((1, hidden_size))
        self.weights_hidden_output = np.random.rand(hidden_size, output_size)
        self.bias_output = np.zeros((1, output_size))

    def forward(self, X):
        # Forward pass
        self.hidden_input = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = sigmoid(self.hidden_input)
        self.output = sigmoid(np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output)
        return self.output

    def backward(self, X, y):
        # Backpropagation
        self.output_error = y - self.output
        self.output_delta = self.output_error * sigmoid_derivative(self.output)
        self.hidden_error = self.output_delta.dot(self.weights_hidden_output.T)
        self.hidden_delta = self.hidden_error * sigmoid_derivative(self.hidden_output)
        # Update weights and biases using the deltas computed above
        self.weights_hidden_output += self.learning_rate * self.hidden_output.T.dot(self.output_delta)
        self.bias_output += self.learning_rate * np.sum(self.output_delta, axis=0, keepdims=True)
        self.weights_input_hidden += self.learning_rate * X.T.dot(self.hidden_delta)
        self.bias_hidden += self.learning_rate * np.sum(self.hidden_delta, axis=0, keepdims=True)

    def predict(self, X):
        return self.forward(X)

# Train on the example: X = 0.4, target class 0
X = np.array([[0.4]])
y = np.array([[0]])
nn = NeuralNetwork(input_size=1, hidden_size=2, output_size=1)
for epoch in range(1000):
    nn.forward(X)
    nn.backward(X, y)

# Make predictions
predictions = nn.predict(X)
print("Predictions:")
print(predictions)
Output