Neural Networks
Activation functions & Propagation
1 Introduction
Neural networks are computational models inspired by the human brain. This document
introduces key concepts in neural networks, with case studies and minimal mathematics.
Each math expression is preceded by a clear explanation.
Figure 1: Neuron
• Inputs (x1, x2, . . . , xn): Values that represent features or outputs from the previous layer.
• Weights (w1, w2, . . . , wn): Values that scale each input’s contribution to the neuron’s output.
• Bias (b): An additional value that shifts the weighted sum to enhance flexibility.
• Output (y): The final result computed by applying the activation function to the weighted sum of inputs and bias.
In words: The neuron’s output is the result of applying the activation function f to the sum of the weighted inputs (wi × xi) and the bias (b):

y = f\left( \sum_{i=1}^{n} w_i x_i + b \right)
2.3 Example
Consider a neuron with:
• Inputs: x1 = 2, x2 = 3,
• Bias: b = 1,
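The example’s weights are not listed above, so the short Python sketch below assumes illustrative values w1 = 0.4 and w2 = 0.6 and a sigmoid activation (these are assumptions for illustration, not values from the text) and computes the neuron’s output:

```python
import math

def sigmoid(z):
    # Squash the net input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Values given in the example
x1, x2 = 2.0, 3.0
b = 1.0

# Assumed weights (not specified in the example)
w1, w2 = 0.4, 0.6

net = w1 * x1 + w2 * x2 + b              # weighted sum of inputs plus bias
y = sigmoid(net)                         # activation applied to the net input
print(f"net = {net:.2f}, y = {y:.4f}")   # net = 3.60, y = 0.9734
```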
• Hidden Layer(s): Processes inputs using weights and biases, and learns patterns.
w_{new} = w - \eta \cdot \frac{\partial E_{total}}{\partial w}

This equation describes how weights are updated to reduce the total error. Here:
• w_new is the updated weight,
• w is the current weight,
• η is the learning rate, a factor that determines the step size for weight updates,
• ∂E_total/∂w is the gradient of the total error with respect to the weight w.
The gradient ∂E_total/∂w is calculated using the chain rule of differentiation.
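For a weight w that feeds a neuron with net input net and output y = f(net), the chain rule expands this gradient into three factors, each available from the forward pass (the notation follows the definitions used in the forward-pass section below):

\frac{\partial E_{total}}{\partial w} = \frac{\partial E_{total}}{\partial y} \cdot \frac{\partial y}{\partial net} \cdot \frac{\partial net}{\partial w}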
4.2 Overview
Forward propagation involves passing inputs through the network to compute the outputs.
Backward propagation adjusts the weights in the network to minimize the error between
the predicted and target outputs. Backpropagation is the fundamental algorithm for
training artificial neural networks. It utilizes gradient descent to iteratively minimize the
error function by adjusting weights. This iterative process involves two primary steps:
1. Forward Pass: Compute the outputs of the network from the inputs using the current weights.
2. Backward Pass: Calculate gradients (the rate of change of the error) and propagate errors backward using the chain rule of differentiation.
The ultimate goal of backpropagation is to adjust the weights in such a way that the
error between the predicted output and the actual target is minimized.
The forward pass is the process of computing the outputs of a neural network given
the inputs and the current weights. Here’s how it’s done:
1. Weighted Sum of Inputs: For a neuron, the net input is calculated as the
weighted sum of the inputs to the neuron, plus a bias term:
net_j = \sum_i w_{ij} x_i + b_j
where:
• w_{ij} is the weight connecting input i to neuron j,
• x_i is the i-th input to the neuron,
• b_j is the bias term of neuron j.
2. Activation Function: The net input net_j is then passed through an activation function f. A common choice is:
• Sigmoid:

f(net_j) = \frac{1}{1 + e^{-net_j}}
The sigmoid function squashes the output to a range between 0 and 1.
For example, the output of a neuron can be expressed as:

y_j = f(net_j)
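As a hedged illustration of this forward computation, the NumPy sketch below forms the net inputs net_j = Σ_i w_ij x_i + b_j for one small layer and applies the sigmoid; the input, weight, and bias values are placeholders chosen only for the example:

```python
import numpy as np

def sigmoid(z):
    # Element-wise sigmoid: squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Example layer: 3 inputs feeding 2 neurons (placeholder values)
x = np.array([0.5, -1.0, 2.0])            # inputs x_i
W = np.array([[0.1, 0.4, -0.2],           # row j holds the weights w_ij of neuron j
              [0.3, -0.5, 0.8]])
b = np.array([0.05, -0.1])                # biases b_j

net = W @ x + b          # net_j = sum_i w_ij * x_i + b_j, here net ≈ [-0.7, 2.15]
y = sigmoid(net)         # y_j = f(net_j)
print(net, y)
```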
In the backward pass, we calculate how much each weight in the network contributed
to the overall error, and adjust the weights accordingly.
1. Error Calculation: The error for each output neuron is computed using the Mean
Squared Error (MSE) formula:
E_j = \frac{1}{2}(t_j - y_j)^2
where:
• tj is the target output,
• yj is the predicted output from the forward pass.
This error measures how far the predicted output yj is from the desired target output
tj .
2. Total Error: The total error of the network across all output neurons is the sum of the individual errors:

E_{total} = \sum_j E_j
4. Weight Update: The weights are then updated using the gradient descent algo-
rithm:
w_{ij}^{new} = w_{ij} - \eta \cdot \frac{\partial E_{total}}{\partial w_{ij}}
where η is the learning rate, a hyperparameter that controls the step size of the
weight update.
The error is computed as E_j = (1/2)(t_j − y_j)^2, where t_j is the target output and y_j is the predicted output. This error guides the weight update in the backward pass.
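As a minimal sketch of this backward-pass bookkeeping, the snippet below computes the per-neuron errors E_j, the total error E_total, and a single gradient-descent update; the target, prediction, weight, and gradient values are placeholders chosen only for illustration:

```python
import numpy as np

# Placeholder targets and predictions for a 3-neuron output layer
t = np.array([1.0, 0.0, 0.5])        # target outputs t_j
y = np.array([0.8, 0.2, 0.4])        # predicted outputs y_j from the forward pass

E = 0.5 * (t - y) ** 2               # per-neuron errors E_j
E_total = E.sum()                    # total error across all output neurons
print(E, E_total)                    # [0.02 0.02 0.005] and 0.045

# Gradient-descent weight update (the gradients here are placeholders; in practice
# they come from the chain rule applied during the backward pass)
eta = 0.1                                          # learning rate
W = np.array([[0.1, 0.4], [0.3, -0.5]])            # current weights w_ij
dE_dW = np.array([[0.02, -0.01], [0.05, 0.00]])    # dE_total/dw_ij
W_new = W - eta * dE_dW                            # move each weight against its gradient
print(W_new)
```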
4.9 Summary
• Backpropagation computes gradients using the chain rule and updates weights iter-
atively.
• The forward pass calculates the outputs; the backward pass calculates gradients and
propagates errors.
• The weight update formula combines gradients with a learning rate for iterative
improvement.
\frac{\partial E}{\partial w_2} = -0.01228 \cdot 0.5744 \approx -0.00705
The updated weight w2 is:
w_2^{new} = w_2 - \eta \cdot \frac{\partial E}{\partial w_2}

Using a learning rate η = 0.1, the update becomes w_2^{new} = w_2 − 0.1 · (−0.00705) = w_2 + 0.000705, so w_2 increases slightly because its gradient is negative.
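The current value of w2 is not restated at this point, so the short sketch below assumes w2 = 0.5 purely for illustration and reproduces the update with the gradient computed above and η = 0.1:

```python
eta = 0.1                       # learning rate from the text
grad_w2 = -0.01228 * 0.5744     # dE/dw2 ≈ -0.00705, as computed above
w2 = 0.5                        # assumed current value of w2 (not given in the text)

w2_new = w2 - eta * grad_w2     # gradient descent update
print(round(grad_w2, 5), round(w2_new, 5))   # -0.00705 0.50071
```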
Similarly, all the other parameters, such as w1, b1, and b2, are updated using the gradient of the error E with respect to each of them.
For example, a fully connected network with 1000 inputs, 100 hidden neurons, and 10 output neurons has:

Number of parameters = (1000 × 100) + (100 × 10) + 100 + 10 = 101,110 parameters
Training such a network requires a lot of memory and computational resources, which can
make it difficult to scale to larger datasets or more complex tasks.
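The count above can be reproduced with a few lines of Python. The helper below is a hypothetical utility (not part of the text) that sums weight and bias counts for a fully connected network with layer sizes 1000 → 100 → 10:

```python
def count_parameters(layer_sizes):
    # Weights: n_in * n_out for each pair of adjacent layers; biases: one per non-input neuron
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases

print(count_parameters([1000, 100, 10]))   # 101110
```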
Example for Epoch and Iteration: If we have 1000 training examples and a batch size of 100:
• 1 epoch: One complete pass through all 1000 training examples.
• 10 iterations: Process the examples in 10 batches (100 examples each) and update the weights after each batch.
Thus, for 1 epoch, there will be 10 iterations, and after completing all 10 iterations,
the epoch is complete. This process repeats for several epochs to improve the model’s
accuracy.
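As a rough sketch (no real model or data, just the bookkeeping), the loop below shows how epochs and iterations relate for 1000 examples and a batch size of 100, matching the example above:

```python
num_examples = 1000
batch_size = 100
num_epochs = 3

iterations_per_epoch = num_examples // batch_size   # 10 iterations per epoch

for epoch in range(num_epochs):
    for iteration in range(iterations_per_epoch):
        start = iteration * batch_size
        batch_indices = range(start, start + batch_size)   # indices of the current batch
        # forward pass, backward pass, and weight update would go here
    print(f"epoch {epoch + 1} complete after {iterations_per_epoch} iterations")
```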