
Minor in AI

Neural Networks
Activation functions & Propagation

December 30, 2024



1 Introduction
Neural networks are computational models inspired by the human brain. This document
introduces key concepts in neural networks, with case studies and minimal mathematics.
Each math expression is preceded by a clear explanation.

2 Neuron: Basic Building Block


A neuron performs two main operations:

1. Net Operation: Computes the weighted sum of inputs.

2. Out Operation: Applies an activation function to produce the output.

Figure 1: Neuron

2.1 Components of a Neuron


The core elements of a neuron are:

• Inputs (x_1, x_2, ..., x_n): Values that represent features or outputs from the previous
layer.

• Weights (w_1, w_2, ..., w_n): Coefficients that signify the importance of each input.

• Bias (b): An additional value that shifts the weighted sum to enhance flexibility.

• Activation Function (f): A function that introduces non-linearity to the model,
enabling it to learn complex patterns.

• Output (y): The final result computed by applying the activation function to the
weighted sum of inputs and bias.


Figure 2: Components of a Neuron

2.2 Mathematical Representation


To compute the output of a neuron, first calculate the weighted sum of inputs and bias:
\text{Net Output} = \sum_{i=1}^{n} w_i x_i + b

This value is then passed through the activation function:


\text{Output } (y) = f\left( \sum_{i=1}^{n} w_i x_i + b \right)

In words: the neuron's output is the result of applying the activation function f to the
sum of the weighted inputs (w_i x_i) plus the bias (b).

2.3 Example
Consider a neuron with:
• Inputs: x1 = 2, x2 = 3,

• Weights: w1 = 0.5, w2 = 0.8,

• Bias: b = 1,

• Activation Function: Sigmoid, f(x) = \frac{1}{1 + e^{-x}}.

First, calculate the weighted sum:

\text{Net Output} = (0.5 \times 2) + (0.8 \times 3) + 1 = 1 + 2.4 + 1 = 4.4

Then apply the activation function:

y = f(4.4) = \frac{1}{1 + e^{-4.4}} \approx 0.9879

Thus, the neuron's output is approximately 0.9879.
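For readers who prefer code, the same computation can be checked with a short Python snippet. This is an illustrative sketch written for these notes; the names sigmoid, inputs, and weights are our own.

import math

def sigmoid(x):
    # Sigmoid activation: squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

inputs  = [2.0, 3.0]   # x1, x2
weights = [0.5, 0.8]   # w1, w2
bias    = 1.0          # b

# Net operation: weighted sum of inputs plus bias
net = sum(w * x for w, x in zip(weights, inputs)) + bias   # 4.4

# Out operation: apply the activation function
y = sigmoid(net)                                           # ~0.9879

print(f"net = {net}, output = {y:.4f}")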


3 Structure of a Neural Network


A neural network consists of three main layers:
• Input Layer: Accepts raw data and passes it to the next layer.

• Hidden Layer(s): Processes inputs using weights and biases, and learns patterns.

• Output Layer: Produces the final predictions.

Figure 3: Structure of an Artificial Neural Network

3.1 Example: Single Hidden Layer Network


Consider a network with:
• 2 input neurons (x1 , x2 ),

• 2 hidden neurons (h1 , h2 ),

• 2 output neurons (o1 , o2 ).


Each neuron performs two steps:
1. Compute the weighted sum of inputs (net operation).

2. Apply the activation function to get the output (out operation).

4 Forward and Backward Propagation


4.1 Key Equations and Explanations
The weight update process in backpropagation is governed by the following core formula:
w_{new} = w - \eta \cdot \frac{\partial E_{total}}{\partial w}


This equation describes how weights are updated to reduce the total error. Here:

• w_new is the updated weight.
• w is the current weight.
• η is the learning rate, a factor that determines the step size for weight updates.
• ∂E_total/∂w is the gradient of the total error with respect to the weight w.

The gradient ∂E_total/∂w is calculated using the chain rule of differentiation:

\frac{\partial E_{total}}{\partial w} = \frac{\partial E_{total}}{\partial out_{node}} \cdot \frac{\partial out_{node}}{\partial net_{node}} \cdot \frac{\partial net_{node}}{\partial w}

Here:

• ∂E_total/∂out_node: how the total error changes with respect to the output of the node.
• ∂out_node/∂net_node: the sensitivity of the node's output to its net input (typically the derivative of the activation function).
• ∂net_node/∂w: how the weighted sum of inputs changes with respect to the weight being updated.

4.2 Overview
Forward propagation involves passing inputs through the network to compute the outputs.
Backward propagation adjusts the weights in the network to minimize the error between
the predicted and target outputs. Backpropagation is the fundamental algorithm for
training artificial neural networks. It utilizes gradient descent to iteratively minimize the
error function by adjusting weights. This iterative process involves two primary steps:

1. Forward Pass: Compute the outputs based on the current weights.

2. Backward Pass: Calculate gradients (the rate of change of the error) and propa-
gate errors backward using the chain rule of differentiation.

The ultimate goal of backpropagation is to adjust the weights in such a way that the
error between the predicted output and the actual target is minimized.

4.3 Forward Pass

Figure 4: Example of Forward Propagation in a Neural Network

The forward pass is the process of computing the outputs of a neural network given
the inputs and the current weights. Here’s how it’s done:


1. Weighted Sum of Inputs: For a neuron, the net input is calculated as the
weighted sum of the inputs to the neuron, plus a bias term:
net_j = \sum_i w_{ij} x_i + b_j

where:

• w_{ij} are the weights associated with input x_i,
• b_j is the bias term for neuron j,
• x_i are the input features.

2. Activation Function: The output of the neuron is then obtained by applying an
activation function to the net input. Common activation functions include:

• Sigmoid:

  f(net_j) = \frac{1}{1 + e^{-net_j}}

  The sigmoid function squashes the output to a range between 0 and 1.

For example, the output of a neuron can be expressed as:

y_j = f(net_j)

where y_j is the output of neuron j.
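As an illustration of the net and out operations for a whole layer, here is a minimal NumPy sketch. The numeric inputs, weights, and biases are assumed values chosen only for demonstration; they are not taken from this handout.

import numpy as np

def sigmoid(z):
    # Element-wise sigmoid: squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(x, W, b):
    # Net operation: net_j = sum_i w_ij * x_i + b_j for every neuron j in the layer
    net = W @ x + b
    # Out operation: apply the activation function to each net input
    return sigmoid(net)

# Assumed values: 2 inputs feeding a layer of 2 neurons (illustration only)
x = np.array([0.05, 0.10])          # inputs x_i
W = np.array([[0.15, 0.20],         # row j holds the weights w_ij into neuron j
              [0.25, 0.30]])
b = np.array([0.35, 0.35])          # bias b_j for each neuron

print(layer_forward(x, W, b))       # outputs y_j of the layer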

4.4 Backward Pass

Figure 5: Backward Propagation in an Artificial Neural Network

In the backward pass, we calculate how much each weight in the network contributed
to the overall error, and adjust the weights accordingly.


1. Error Calculation: The error for each output neuron is computed using the Mean
Squared Error (MSE) formula:

E_j = \frac{1}{2} (t_j - y_j)^2

where:

• t_j is the target output,
• y_j is the predicted output from the forward pass.

This error measures how far the predicted output y_j is from the desired target output t_j.

2. Total Error: The total error of the network across all output neurons is the sum
of the individual errors:

E_{total} = \sum_j E_j

3. Gradient Calculation: To adjust the weights, we need to compute the gradient of
the total error with respect to each weight. Using the chain rule of differentiation,
the gradient of the total error with respect to a weight w_{ij} is:

\frac{\partial E_{total}}{\partial w_{ij}} = \frac{\partial E_j}{\partial y_j} \cdot \frac{\partial y_j}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{ij}}

Each term is computed as follows:

• ∂E_j/∂y_j = -(t_j - y_j),
• ∂y_j/∂net_j = f'(net_j), the derivative of the activation function (for the sigmoid, f'(net_j) = y_j (1 - y_j)),
• ∂net_j/∂w_{ij} = x_i, the input to the neuron.

4. Weight Update: The weights are then updated using the gradient descent algorithm
(a short code sketch follows this list):

w_{ij}^{new} = w_{ij} - \eta \cdot \frac{\partial E_{total}}{\partial w_{ij}}

where η is the learning rate, a hyperparameter that controls the step size of the weight update.
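The sketch below walks through these four steps in Python for a single output weight. All numeric values (input, weight, bias, target, learning rate) are assumptions made up for illustration.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed values for a single output neuron j fed by one input x_i
x_i, w_ij, b_j = 0.6, 0.4, 0.1
t_j, eta = 1.0, 0.5                 # target output and learning rate

# Forward pass for this neuron
net_j = w_ij * x_i + b_j
y_j = sigmoid(net_j)

# Chain-rule terms from steps 1-3
dE_dy   = -(t_j - y_j)              # dE_j/dy_j
dy_dnet = y_j * (1.0 - y_j)         # dy_j/dnet_j (sigmoid derivative)
dnet_dw = x_i                       # dnet_j/dw_ij

# Step 4: gradient-descent weight update
grad = dE_dy * dy_dnet * dnet_dw
w_ij_new = w_ij - eta * grad

print(f"gradient = {grad:.5f}, updated weight = {w_ij_new:.5f}")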

4.5 Worked Example


Let's consider a network with a single weight w_5. Assume:

w_5 = 0.40, \quad \eta = 0.5, \quad \frac{\partial E_{total}}{\partial w_5} = 0.082167

Using the weight update formula:

w_5^{new} = w_5 - \eta \cdot \frac{\partial E_{total}}{\partial w_5}

Substituting the values:

w_5^{new} = 0.40 - 0.5 \cdot 0.082167 \approx 0.358916

Thus, the new weight becomes w_5^{new} ≈ 0.358916.
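This arithmetic can be verified with a couple of lines of Python:

w5, eta, grad = 0.40, 0.5, 0.082167
w5_new = w5 - eta * grad      # gradient-descent update
print(w5_new)                 # ≈ 0.3589165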


4.6 Detailed Gradient Derivation


To compute the gradient ∂E_total/∂w_5, we sum the contributions from all output nodes
connected to w_5:

\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_1}{\partial w_5} + \frac{\partial E_2}{\partial w_5}

For a single error term E_j, the chain rule gives:

\frac{\partial E_j}{\partial w_{ij}} = \frac{\partial E_j}{\partial y_j} \cdot \frac{\partial y_j}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{ij}}

4.6.1 Component Breakdown


1. Error to Output:

\frac{\partial E_j}{\partial y_j} = -(t_j - y_j)

This represents how much the error changes with respect to the output of the neuron.

2. Output to Net Input:

\frac{\partial y_j}{\partial net_j} = f'(net_j)

For the sigmoid activation function:

f'(net_j) = y_j (1 - y_j)

This represents the rate of change of the output with respect to the net input.

3. Net Input to Weight:

\frac{\partial net_j}{\partial w_{ij}} = x_i

This represents how much the net input changes with respect to the weight.

4.7 Forward Propagation Example


Let’s consider a simple neural network with inputs, weights, biases, and target outputs.
The network consists of an input layer, a hidden layer, and an output layer.

4.7.1 Step 1: Compute Hidden Layer Outputs


To calculate the outputs of the neurons in the hidden layer, we first compute the weighted
sum of inputs for each neuron in the hidden layer. For the first hidden layer neuron h1 ,
the weighted sum is calculated as:
net_{h1} = w_1 \cdot i_1 + w_2 \cdot i_2 + b_1

where w_1, w_2 are the weights, i_1, i_2 are the input values, and b_1 is the bias term. The
output of the hidden neuron is then computed using the sigmoid activation function:

Out_{h1} = \frac{1}{1 + e^{-net_{h1}}}

Similarly, for the second hidden layer neuron h_2, the output is calculated in the same
manner:

Out_{h2} = \frac{1}{1 + e^{-net_{h2}}}


4.7.2 Step 2: Compute Output Layer Outputs


After computing the hidden layer outputs, we move on to the output layer. The output
layer neurons compute a weighted sum of the outputs from the hidden layer neurons. For
output neuron o1 , the weighted sum is:

net_{o1} = w_5 \cdot Out_{h1} + w_6 \cdot Out_{h2} + b_2

The output is then computed using the sigmoid function:

Out_{o1} = \frac{1}{1 + e^{-net_{o1}}}

Similarly, for the second output neuron o_2, the output is computed as:

Out_{o2} = \frac{1}{1 + e^{-net_{o2}}}

4.8 Error Calculation


The total error is calculated using the Mean Squared Error (MSE) function:
E_{total} = \frac{1}{2} \sum_j (t_j - y_j)^2

where t_j is the target output and y_j is the predicted output. This error guides the weight
update in the backward pass.
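A one-function Python sketch of this error measure follows; the target and predicted values are made up for illustration.

def total_error(targets, outputs):
    # MSE as defined above: E_total = 1/2 * sum_j (t_j - y_j)^2
    return 0.5 * sum((t - y) ** 2 for t, y in zip(targets, outputs))

# Assumed target and predicted values for two output neurons (illustration only)
print(total_error([0.01, 0.99], [0.75, 0.77]))   # ≈ 0.298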

4.9 Summary
• Backpropagation computes gradients using the chain rule and updates weights iter-
atively.

• The forward pass calculates the outputs; the backward pass calculates gradients and
propagates errors.

• The weight update formula combines gradients with a learning rate for iterative
improvement.

• Careful tuning of the learning rate ensures optimal convergence.

5 Worked Example: Forward Pass and Backward Pass


This section provides a detailed worked example of the forward pass and backward pass
in a simple neural network.

5.1 Problem Setup


Given the following parameters:

• Input value: x = 0.5


• Weight from input to hidden layer: w_1 = 0.4

• Bias for hidden layer: b_h = 0.1

• Weight from hidden layer to output layer: w_2 = 0.7

• Bias for output layer: b_y = -0.2

• Target output: t = 0.6

Figure 6: Illustration of FP and BP

5.2 Forward Pass


The forward pass consists of the following steps:

5.2.1 Step 1: Compute the Hidden Layer Output


The net input to the hidden layer (z_h) is calculated as:

z_h = w_1 \cdot x + b_h

Substituting the values:

z_h = 0.4 \cdot 0.5 + 0.1 = 0.2 + 0.1 = 0.3

The output of the hidden layer (a_h) is obtained using the sigmoid activation function:

a_h = \frac{1}{1 + e^{-z_h}}

Substituting z_h = 0.3:

a_h = \frac{1}{1 + e^{-0.3}} \approx 0.5744


5.2.2 Step 2: Compute the Output Layer Value


The net input to the output layer (z_o) is calculated as:

z_o = w_2 \cdot a_h + b_y

Substituting the values:

z_o = 0.7 \cdot 0.5744 - 0.2 \approx 0.4021 - 0.2 = 0.2021

The output of the network (y) is obtained using the sigmoid activation function:

y = \frac{1}{1 + e^{-z_o}}

Substituting z_o = 0.2021:

y = \frac{1}{1 + e^{-0.2021}} \approx 0.5504

5.3 Backward Pass


The backward pass involves calculating gradients and updating weights.

5.3.1 Step 1: Compute the Output Error


The error (E) for the output layer is given by:

E = \frac{1}{2}(y - t)^2

Substituting t = 0.6 and y = 0.5504:

E = \frac{1}{2}(0.6 - 0.5504)^2 = \frac{1}{2}(0.0496)^2 \approx 0.00123

5.3.2 Step 2: Compute Gradients for Output Layer


The gradient of the error with respect to the output (∂E/∂y) is:

\frac{\partial E}{\partial y} = -(t - y)

Substituting t = 0.6 and y = 0.5504:

\frac{\partial E}{\partial y} = -(0.6 - 0.5504) = -0.0496

The gradient of the output with respect to the net input to the output layer (∂y/∂z_o) is:

\frac{\partial y}{\partial z_o} = y \cdot (1 - y)

Substituting y = 0.5504:

\frac{\partial y}{\partial z_o} = 0.5504 \cdot (1 - 0.5504) \approx 0.2475

The gradient of the error with respect to the net input to the output layer is:

\frac{\partial E}{\partial z_o} = \frac{\partial E}{\partial y} \cdot \frac{\partial y}{\partial z_o}

Substituting values:

\frac{\partial E}{\partial z_o} = -0.0496 \cdot 0.2475 \approx -0.01228


5.3.3 Step 3: Update the Weight w2


The gradient of the error with respect to w_2 is:

\frac{\partial E}{\partial w_2} = \frac{\partial E}{\partial z_o} \cdot \frac{\partial z_o}{\partial w_2}

Substituting ∂z_o/∂w_2 = a_h ≈ 0.5744:

\frac{\partial E}{\partial w_2} = -0.01228 \cdot 0.5744 \approx -0.00705

The updated weight w_2 is:

w_2^{new} = w_2 - \eta \cdot \frac{\partial E}{\partial w_2}

Using a learning rate η = 0.1:

w_2^{new} = 0.7 - 0.1 \cdot (-0.00705) \approx 0.7007

Similarly, the remaining parameters w_1, b_h, and b_y are updated in the same way, each using its own gradient of the error E.

5.4 Summary of Results


The final results after performing the forward pass and backward pass are as follows:

• Hidden layer output (a_h): 0.5744

• Network output (y): 0.5504

• Error (E): 0.00123

• Gradient for weight w_2: -0.00705

• Updated weight w_2^{new}: 0.7007
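For completeness, the following Python sketch reproduces the worked example end to end. It is an illustration written for these notes, not code from the handout; the variable names simply mirror the notation used above.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Problem setup (values from Section 5.1)
x, w1, bh = 0.5, 0.4, 0.1      # input, input->hidden weight, hidden bias
w2, by    = 0.7, -0.2          # hidden->output weight, output bias
t, eta    = 0.6, 0.1           # target output and learning rate

# Forward pass
zh = w1 * x + bh               # 0.3
ah = sigmoid(zh)               # ~0.5744
zo = w2 * ah + by              # ~0.2021
y  = sigmoid(zo)               # ~0.5504

# Backward pass (output-layer weight only, as in Section 5.3)
E      = 0.5 * (y - t) ** 2    # ~0.00123
dE_dy  = -(t - y)              # ~-0.0496
dy_dzo = y * (1.0 - y)         # ~0.2475
dE_dzo = dE_dy * dy_dzo        # ~-0.01228
dE_dw2 = dE_dzo * ah           # ~-0.00705

w2_new = w2 - eta * dE_dw2     # ~0.7007
print(f"ah={ah:.4f}, y={y:.4f}, E={E:.5f}, dE/dw2={dE_dw2:.5f}, w2_new={w2_new:.4f}")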

6 Drawbacks of Neural Networks


6.1 Lack of Invariance to Shifting and Other Forms of Distortion
One major limitation of neural networks, especially in traditional architectures, is their
inability to be invariant to certain transformations of the input data, such as shifting or
other forms of distortion. This means that small changes or shifts in the input data may
cause significant changes in the output.
For instance, if a neural network is trained to recognize an object in an image, a slight
shift in the position of the object could result in a misclassification. While some models,
such as Convolutional Neural Networks (CNNs), partially address this issue, basic neural
networks do not inherently possess this capability.
Example: If a neural network is trained to recognize a cat in an image and the cat
is moved slightly to the left or right in the input image, the network may not recognize
the cat as the same object anymore, leading to an incorrect prediction.


6.2 Large Number of Trainable Parameters


Another drawback is that as the network’s architecture becomes more complex (e.g., more
layers and neurons), the number of trainable parameters (weights and biases) increases
rapidly. This can make the training process computationally expensive, slow, and prone
to overfitting, especially when there is insufficient training data.
For example, if a neural network has 1000 neurons in the input layer, 100 neurons in
the hidden layer, and 10 neurons in the output layer, the number of parameters will be:

Number of parameters = (1000 × 100) + (100 × 10) + 100 + 10 = 101,110 parameters

Training such a network requires a lot of memory and computational resources, which can
make it difficult to scale to larger datasets or more complex tasks.
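The parameter count for such fully connected networks can be computed with a small helper function. This is a sketch; the name count_parameters is our own.

def count_parameters(layer_sizes):
    # Fully connected layers: (inputs x outputs) weights plus one bias per output neuron
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# 1000 input neurons, 100 hidden neurons, 10 output neurons (the example above)
print(count_parameters([1000, 100, 10]))   # 101110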

7 Epochs and Iterations in Neural Network Training


7.1 What is an Epoch?
An epoch refers to one complete pass through the entire training dataset. In one epoch,
every training sample has been used once to update the weights. The purpose of multiple
epochs is to allow the network to learn and adjust its parameters iteratively to minimize
the error over time.
Example: Suppose we have a dataset of 1000 images, and during one epoch, the
network processes all 1000 images once to adjust the weights. If the training process
involves 10 epochs, then the network will process the 1000 images 10 times.

7.2 What is an Iteration?


An iteration refers to one update of the model’s weights during training. This update
occurs after processing a batch of data. If the dataset is divided into smaller subsets
(batches), each time a batch is processed and the weights are updated, it counts as one
iteration.
Example: Let’s say we have a dataset of 1000 images and we use a batch size of 100.
In this case, there will be 10 iterations per epoch (1000 images ÷ 100 images per batch
= 10 iterations).

7.3 Difference Between Epoch and Iteration


The key difference is that an epoch refers to a complete pass through the entire dataset,
while an iteration refers to a single update step based on a subset (batch) of the data.

• Epoch: One complete pass through the entire training dataset.

• Iteration: One update step based on a batch of data.

Example for Epoch and Iteration: If we have 1000 training examples and a batch
size of 100:

• 1 epoch: Process all 1000 images.


Figure 7: Epoch vs Iteration

• 10 iterations: Process the images in 10 batches (100 images each) and update the
weights after each batch.

Thus, for 1 epoch, there will be 10 iterations, and after completing all 10 iterations,
the epoch is complete. This process repeats for several epochs to improve the model’s
accuracy.
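A minimal training-loop skeleton makes this bookkeeping explicit. The dataset size, batch size, and epoch count below are the assumed values from the example; the loop bodies are placeholders rather than real training code.

dataset_size = 1000
batch_size   = 100
epochs       = 10

iterations_per_epoch = dataset_size // batch_size    # 10 iterations per epoch
total_iterations = epochs * iterations_per_epoch     # 100 weight updates in total

for epoch in range(epochs):                  # one epoch = one full pass over the dataset
    for iteration in range(iterations_per_epoch):
        pass                                 # process one batch and update the weights here

print(iterations_per_epoch, total_iterations)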
