

Backpropagation Algorithm
The backpropagation algorithm consists of two phases:

1. The forward pass, where our inputs are passed through the network and output predictions are obtained (also known as the propagation phase).

2. The backward pass, where we compute the gradient of the loss function at the final layer (i.e., the predictions layer) of the network and use this gradient to recursively apply the chain rule to update the weights in our network (also known as the weight update phase).
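Stated as a formula (added here for reference; the slide gives only the verbal description), each weight matrix W is nudged against the gradient of the loss L, scaled by a learning rate α:

    W ← W − α · ∂L/∂W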

Backpropagation Algorithm
Search for related information on the following concepts:

1. Forward pass

2. Backward pass

3. Gradient

4. Loss function

5. Chain rule

The Forward Pass

The purpose of the forward pass is to propagate our inputs through the network by applying a series of dot products and activations until we reach the output layer of the network (i.e., our predictions).
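For a single layer this amounts to computing (notation added here for reference, not shown on the slide):

    a_out = σ(W · a_in)

where a_in is the vector of activations entering the layer, W is that layer's weight matrix, and σ is the activation function applied elementwise.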

Gradient

The gradient descent method is an iterative optimization


algorithm that operates over a loss landscape
As we can see, our loss landscape has many peaks and valleys
based on which values our parameters take on. Each peak is a
local maximum that represents very high regions of loss –the
local maximum with the largest loss across the entire loss
landscape is the global maximum.
Similarly, we also have local minimum which represents many
small regions of loss

Gradient

The surface of our bowl is the loss landscape, which is a plot of the loss
function. The difference between our loss landscape and your cereal bowl is
that your cereal bowl only exists in three dimensions, while your loss
landscape exists in many dimensions, perhaps tens, hundreds, or thousands of
dimensions.
Each position along the surface of the bowl corresponds to a particular loss
value given a set of parameters W (weight matrix) and b (bias vector). Our
goal is to try different values of W and b, evaluate their loss, and then take a
step towards more optimal values that (ideally) have lower loss.
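A single step of that procedure can be sketched in Python as follows. This is a minimal illustration (the function name and the way the gradients are supplied are assumptions); the gradients themselves are exactly what backpropagation computes:

    import numpy as np

    def gradient_descent_step(W, b, grad_W, grad_b, alpha=0.1):
        # move each parameter a small step against its gradient,
        # i.e., toward a lower point on the loss landscape
        W = W - alpha * grad_W
        b = b - alpha * grad_b
        return W, b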

Loss function

The loss function quantifies how “good” or “bad” of a job a given model is doing at classifying data points from the dataset. Model #1 achieves considerably lower loss than Model #2.
The smaller the loss, the better a job the classifier is doing at modeling the relationship between the input data and output class labels.
To improve our classification accuracy, we need to tune the parameters of our weight matrix W or bias vector b. Exactly how we go about updating these parameters is an optimization problem.
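As one concrete example (the slides later plot a squared loss for the XOR experiment), a sum-of-squared-errors loss can be computed as:

    import numpy as np

    def squared_loss(predictions, targets):
        # sum of squared differences between predictions and ground-truth labels;
        # the 0.5 factor makes the derivative simply (prediction - target)
        return 0.5 * np.sum((predictions - targets) ** 2)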


Backpropagation Algorithm
We present the feature vector (0, 1, 1) (and target output value 1) to the network. Here we can see that 0, 1, and 1 have been assigned to the three input nodes in the network.
To propagate the values through the network and obtain the final classification, we need to take the dot product between the inputs and the weight values, followed by applying an activation function (in this case, the sigmoid function, σ).
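A minimal sketch of that computation is shown below. The weight values from the slide's figure are not reproduced in this text, so the numbers used here are placeholders; only the mechanics (dot product followed by the sigmoid) match the description, and the output will not equal the 0.506 value discussed on the next slides unless the slide's actual weights are used:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.array([0, 1, 1])          # feature vector presented to the input nodes
    w = np.array([0.3, -0.2, 0.5])   # placeholder weights; the slide's values differ
    net = np.dot(x, w)               # weighted sum of the inputs
    out = sigmoid(net)               # network output after the activation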

The Forward Pass


The output of the network is thus 0.506. We can apply a step function to determine if this output is the correct classification or not:


Applying the step function with net = 0.506 we see that our network
predicts 1 which is, in fact, the correct class label. However, our
network is not very confident in this class label – the predicted value
0.506 is very close to the threshold of the step. Ideally, this prediction
should be closer to 0.98−0.99, implying that our network has truly
learned the underlying pattern in the dataset. In order for our
network to actually “learn”, we need to apply the backward pass.
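A step function consistent with this description (the 0.5 threshold is the usual choice; the slide's own code is not reproduced here) could be:

    def step(net, threshold=0.5):
        # binary step: outputs above the threshold are classified as 1, otherwise 0
        return 1 if net > threshold else 0

    step(0.506)   # -> 1, matching the prediction discussed above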


The Backward Pass


Implementing Backpropagation with Python


Open up a new file, name it neuralnetwork.py, store it in the nn submodule of pyimagesearch, and let's get to work:


Line 5 then defines the constructor to our NeuralNetwork class. The constructor requires
a single argument, followed by a second optional one:
• layers: A list of integers which represents the actual architecture of the feedforward
network. For example, a value of [2,2,1] would imply that our first input layer has two
nodes, our hidden layer has two nodes, and our final output layer has one node.
• alpha: Here we can specify the learning rate of our neural network. This value is
applied during the weight update phase.
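The code itself appears on the slides as images that are not reproduced in this text, so the sketch below is a reconstruction that follows the description given here and on the next few slides; its own line numbering will not match the Line 5, Line 8, Line 14, Line 18, and Line 19 references:

    import numpy as np

    class NeuralNetwork:
        def __init__(self, layers, alpha=0.1):
            # list of weight matrices, the network architecture, and the learning rate
            self.W = []
            self.layers = layers
            self.alpha = alpha

            # initialize a weight matrix for every layer except the last two,
            # adding one extra node per layer for a trainable bias term
            for i in np.arange(0, len(layers) - 2):
                w = np.random.randn(layers[i] + 1, layers[i + 1] + 1)
                self.W.append(w / np.sqrt(layers[i]))

            # special case for the last two layers: the input connections need
            # a bias term but the output does not
            w = np.random.randn(layers[-2] + 1, layers[-1])
            self.W.append(w / np.sqrt(layers[-2]))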


Line 8 initializes our list of weights for each layer, W. We then store layers and alpha on Lines 9 and 10. Our weights list W is empty, so let's go ahead and initialize it.

On Line 14 we start looping over the number of layers in the network (i.e., len(layers)), but we stop before the final two layers.
Each layer in the network is randomly initialized by constructing an M×N weight matrix by sampling values from a standard normal distribution (Line 18). The matrix is M×N since we wish to connect every node in the current layer to every node in the next layer.


We scale w by dividing by the square root of the number of nodes in the current layer, thereby normalizing the variance of each neuron's output (Line 19).


The final code block of the constructor handles the special case where the input connections need a bias term, but the output does not. Again, these weight values are randomly sampled and then normalized.


The next function we define is a Python “magic method” named __repr__; this function is useful for debugging. In our case, we'll format a string for our NeuralNetwork object by concatenating the integer value of the number of nodes in each layer.
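A sketch of such a method, added inside the NeuralNetwork class (the exact string format on the slide is not captured in this text), is:

    def __repr__(self):
        # build a string listing the number of nodes in each layer
        return "NeuralNetwork: {}".format(
            "-".join(str(l) for l in self.layers))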


Given a layers value of (2, 2, 1), the output of calling this function will be:
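With the sketch above, this would print something along the lines of: NeuralNetwork: 2-2-1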


Next, we can define our sigmoid activation function:
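A straightforward implementation inside the NeuralNetwork class would be:

    def sigmoid(self, x):
        # compute the sigmoid activation for the input x
        return 1.0 / (1 + np.exp(-x))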


As well as the derivative of the sigmoid, which we'll use during the backward pass:
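Assuming x already holds the sigmoid output (the usual convention when the activations are cached during the forward pass), the derivative simplifies as in this sketch:

    def sigmoid_deriv(self, x):
        # derivative of the sigmoid, assuming x has already been passed
        # through the sigmoid function
        return x * (1 - x)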

Again, note that whenever you perform backpropagation, you'll always want to choose an activation function that is differentiable.


We'll draw inspiration from the scikit-learn library and define a function named fit, which will be responsible for actually training our NeuralNetwork:
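The fit code is shown on the slides as images; the sketch below follows the description of training point by point and reporting the loss periodically (the epochs and displayUpdate parameter names and their defaults are assumptions). The calculate_loss helper it calls is described later in these slides:

    def fit(self, X, y, epochs=1000, displayUpdate=100):
        # insert a column of 1s as the last entry of the feature matrix so the
        # bias can be treated as a trainable parameter inside the weight matrix
        X = np.c_[X, np.ones((X.shape[0]))]

        for epoch in np.arange(0, epochs):
            # loop over each individual data point and train the network on it
            for (x, target) in zip(X, y):
                self.fit_partial(x, target)

            # display a training update every displayUpdate epochs
            if epoch == 0 or (epoch + 1) % displayUpdate == 0:
                loss = self.calculate_loss(X, y)
                print("[INFO] epoch={}, loss={:.7f}".format(epoch + 1, loss))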



The actual heart of the backpropagation algorithm is found inside our fit_partial method below:
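The method itself is shown on the slides as images, so the sketch below is a reconstruction based on the walkthrough on the next several slides (the activations list A, the error A[-1] - y, the deltas list D, and the alpha-scaled weight update); its own line numbering will not match the Line 91 and Line 97 references:

    def fit_partial(self, x, y):
        # A stores the output activations of each layer as the data point
        # flows forward through the network
        A = [np.atleast_2d(x)]

        # FORWARD PASS: compute the net input and activation layer by layer
        for layer in np.arange(0, len(self.W)):
            net = A[layer].dot(self.W[layer])
            out = self.sigmoid(net)
            A.append(out)

        # BACKWARD PASS: the first delta is the error at the output layer
        # multiplied by the derivative of the sigmoid of the output value
        error = A[-1] - y
        D = [error * self.sigmoid_deriv(A[-1])]

        # work backward through the remaining layers via the chain rule
        for layer in np.arange(len(A) - 2, 0, -1):
            delta = D[-1].dot(self.W[layer].T)
            delta = delta * self.sigmoid_deriv(A[layer])
            D.append(delta)

        # the deltas were collected in reverse order, so flip them
        D = D[::-1]

        # WEIGHT UPDATE PHASE: gradient descent scaled by the learning rate
        for layer in np.arange(0, len(self.W)):
            self.W[layer] += -self.alpha * A[layer].T.dot(D[layer])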



From here, we can start the forward propagation phase:


The final entry in A is thus the output of the last layer in our network.



Now that the forward pass is done, we can move on to the slightly more complicated backward pass:



The first phase of the backward pass is to compute our error, or simply the difference between our predicted label and the ground-truth label (Line 91). Since the final entry in the activations list A contains the output of the network, we can access the output prediction via A[-1]. The value y is the target output for the input data point x.



Next, we need to start applying the chain rule to build our list of deltas, D. The deltas will be used to update our weight matrices, scaled by the learning rate alpha. The first entry in the deltas list is the error of our output layer multiplied by the derivative of the sigmoid for the output value (Line 97).



Given the delta for the final layer in the network, we can now work backward using a for loop:



Given our deltas list D, we can move on to the weight update phase:



Once our network is trained on a given dataset, we'll want to make predictions on the testing set, which can be accomplished via the predict method below:
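The method appears on the slides as an image; a sketch consistent with the class built above (the addBias parameter name is an assumption) is:

    def predict(self, X, addBias=True):
        # initialize the prediction as the input features
        p = np.atleast_2d(X)

        # add a column of 1s for the trainable bias if required
        if addBias:
            p = np.c_[p, np.ones((p.shape[0]))]

        # propagate through each layer: dot product followed by the sigmoid
        for layer in np.arange(0, len(self.W)):
            p = self.sigmoid(np.dot(p, self.W[layer]))

        return p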



The final function we'll define inside the NeuralNetwork class will be used to calculate the loss across our entire training set:
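A sketch using the squared loss that is plotted later in these slides (the exact formulation shown on the slide is not captured in this text):

    def calculate_loss(self, X, targets):
        # compute the sum of squared errors over the supplied data points
        targets = np.atleast_2d(targets)
        predictions = self.predict(X, addBias=False)
        return 0.5 * np.sum((predictions - targets) ** 2)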


Backpropagation with Python Example #1: Bitwise XOR


Go ahead and open up a new file, name it nn_xor.py, and insert the following
code:
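The code itself is shown on the slide as an image; a sketch of the setup it describes (importing the NeuralNetwork class from the file created earlier and defining the four XOR data points) might look like:

    from pyimagesearch.nn.neuralnetwork import NeuralNetwork
    import numpy as np

    # construct the XOR dataset
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([[0], [1], [1], [0]])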



We can now define our network architecture and train it:
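A sketch of that step, using the 2-2-1 architecture from the earlier example (the learning rate and epoch count here are assumptions):

    # define a 2-2-1 network and train it on the XOR data
    nn = NeuralNetwork([2, 2, 1], alpha=0.5)
    nn.fit(X, y, epochs=20000)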



Once our network is trained, we'll loop over our XOR datasets, allow the network to predict the output for each one, and display the prediction to our screen:
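A sketch of that loop (thresholding at 0.5 mirrors the step function discussed earlier; the exact output formatting is an assumption):

    # loop over the XOR data points and display the network's predictions
    for (x, target) in zip(X, y):
        pred = nn.predict(x)[0][0]
        step = 1 if pred > 0.5 else 0
        print("[INFO] data={}, ground-truth={}, pred={:.4f}, step={}".format(
            x, target[0], pred, step))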



To train our neural network using backpropagation with Python, simply execute the following command:
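Given the file name used above, that command would presumably be:

    $ python nn_xor.py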

A plot of the squared loss is shown in Figure 10.11. As we can see, the loss slowly decreases to approximately zero over the course of training.



Furthermore, looking at the final four lines of the output we can see our predictions:


Backpropagation with Python Example: MNIST Sample


Let's examine a subset of the MNIST dataset for handwritten digit recognition. This subset of the MNIST dataset is built into the scikit-learn library and includes 1,797 example digits, each of which is an 8×8 grayscale image (the original images are 28×28). When flattened, each image is represented by an 8×8 = 64-dim vector.
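A sketch of loading this subset with scikit-learn (load_digits is the scikit-learn function that returns this 1,797-sample 8×8 digits dataset; the variable names are assumptions):

    from sklearn import datasets

    # load the 8x8 digits subset of MNIST that ships with scikit-learn
    digits = datasets.load_digits()
    data = digits.data.astype("float")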


We also perform min/max normalizing by scaling each digit into the range [0,1] (Line 14).
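That normalization step, continuing the sketch above, is a standard min/max scaling:

    # scale the pixel intensities into the range [0, 1]
    data = (data - data.min()) / (data.max() - data.min())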

Next, let's construct a training and testing split, using 75% of the data for training and 25% for evaluation. We'll also encode our class label integers as vectors, a process called one-hot encoding that we will discuss in detail later in this chapter.
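A sketch of this split and encoding using scikit-learn (train_test_split and LabelBinarizer are standard scikit-learn utilities; their use here is an assumption about the code not reproduced in this text):

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelBinarizer

    # 75% of the data for training, 25% for evaluation
    (trainX, testX, trainY, testY) = train_test_split(
        data, digits.target, test_size=0.25)

    # one-hot encode the integer class labels as vectors
    trainY = LabelBinarizer().fit_transform(trainY)
    testY = LabelBinarizer().fit_transform(testY)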


Here we can see that we are training a NeuralNetwork with a 64-32-16-10 architecture. The output layer has ten nodes because there are ten possible output classes for the digits 0-9. We then allow our network to train for 1,000 epochs.
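A sketch of that training call, following the class defined earlier (the learning rate is left at its default here as an assumption):

    # train a 64-32-16-10 network on the digits data for 1,000 epochs
    nn = NeuralNetwork([trainX.shape[1], 32, 16, 10])
    nn.fit(trainX, trainY, epochs=1000)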


Once our network has been trained, we can evaluate it on the testing set:
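One way to do that is with scikit-learn's classification_report (a standard scikit-learn function; its use here is an assumption consistent with the classification report mentioned on the final slide):

    from sklearn.metrics import classification_report

    # make predictions on the testing set and report per-class metrics
    predictions = nn.predict(testX)
    predictions = predictions.argmax(axis=1)
    print(classification_report(testY.argmax(axis=1), predictions))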


Notice how our loss starts off very high, but quickly drops during the training process. Our classification report demonstrates that we are obtaining ≈ 98% classification accuracy on our testing set; however, we are having some trouble classifying digits 4 and 5 (95% and 94% accuracy, respectively).
