Backpropagation Algorithm
365 DATA SCIENCE
Table of Contents
Abstract
1. The Specific Net and Notation We Will Examine
2. Useful Formulas
3. Backpropagation for the Output Layer
4. Backpropagation of a Hidden Layer
5. Backpropagation Generalization
Abstract
In order to get a truly deep understanding of deep neural networks, one must look
at the mathematics behind them. The backpropagation algorithm trains a neural network
by applying the chain rule to compute the gradient of the loss with respect to every
weight. As it is at the core of the optimization process, we wanted to introduce you
to it. This is not a necessary part of the course, as TensorFlow, scikit-learn, and
most other machine learning packages (as opposed to plain NumPy) have backpropagation
built in.
1. The Specific Net and Notation We Will Examine
Figure 1: Backpropagation
We have two inputs: x1 and x2. There is a single hidden layer with 3 units (nodes): h1, h2,
and h3. Finally, there are two outputs: y1 and y2. The arrows that connect them are the
weights. There are two weight matrices: w and u. The w weights connect the input
layer and the hidden layer. The u weights connect the hidden layer and the output
layer. We use two different letters, w and u, so that the computations that follow are
easier to track.
You can also see that we compare the outputs y1 and y2 with the targets t1 and t2.
There is one last letter we need to introduce before we can get to the computations.
Let a be the linear combination prior to activation. Thus, we have:
$$a^{(1)} = xw + b^{(1)} \quad \text{and} \quad a^{(2)} = hu + b^{(2)},$$
where $b^{(1)}$ and $b^{(2)}$ are the bias vectors of the hidden and output layers.
Since we cannot exhaust all activation functions and all loss functions, we will focus on
two of the most common: a sigmoid activation and an L2-norm loss.
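As a minimal sketch, the two choices named above can be written in plain Python. The factor of 1/2 in the loss is an assumed (though common) convention that simplifies the derivative, and the function names are illustrative rather than taken from the course material.

```python
import math

def sigmoid(a):
    """Logistic sigmoid: 1 / (1 + e^(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_prime(a):
    """Derivative of the sigmoid, expressed through the sigmoid itself."""
    s = sigmoid(a)
    return s * (1.0 - s)

def l2_loss(y, t):
    """L2-norm loss with an assumed factor of 1/2."""
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))

# Compare the analytic derivative with a central finite difference.
a, eps = 0.3, 1e-6
numeric = (sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps)
print(abs(numeric - sigmoid_prime(a)) < 1e-8)
```

Writing the derivative in terms of the sigmoid's own output is what later lets the gradients be expressed using only the activations h and y.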
With this new information and the new notation, the output y is equal to the activated
linear combination. Therefore, for the output layer, we have $y = \sigma(a^{(2)})$, while for the
hidden layer: $h = \sigma(a^{(1)})$.
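The forward pass through this 2-3-2 network can be sketched as follows, using only the standard library. The numeric values are arbitrary illustrations, and the indexing (w[i][j] connects input i to hidden unit j, u[i][j] connects hidden unit i to output j) is an assumption made for this sketch.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w, b1, u, b2):
    """Forward pass: h = sigmoid(x w + b1), y = sigmoid(h u + b2)."""
    # Linear combination and activation for the hidden layer.
    a1 = [sum(x[i] * w[i][j] for i in range(len(x))) + b1[j]
          for j in range(len(b1))]
    h = [sigmoid(a) for a in a1]
    # Linear combination and activation for the output layer.
    a2 = [sum(h[i] * u[i][j] for i in range(len(h))) + b2[j]
          for j in range(len(b2))]
    y = [sigmoid(a) for a in a2]
    return h, y

# Illustrative values only.
x = [0.5, -1.0]
w = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]   # 2 inputs x 3 hidden units
b1 = [0.0, 0.0, 0.0]
u = [[0.7, -0.8], [0.9, 0.1], [-0.2, 0.3]]  # 3 hidden units x 2 outputs
b2 = [0.0, 0.0]

h, y = forward(x, w, b1, u, b2)
print(h, y)
```

Returning h alongside y is a deliberate choice: the hidden activations are needed again when computing gradients.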
We will examine backpropagation for the output layer and the hidden layer separately,
as the methodologies differ.
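2. Useful Formulas
The derivations below rely on two standard results.
L2-norm loss (with the conventional factor of 1/2):
$$L = \frac{1}{2}\sum_{i}(y_i - t_i)^2$$
Sigmoid activation and its derivative:
$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,(1 - \sigma(x))$$
3. Backpropagation for the Output Layer
To obtain the update rule
$$u \leftarrow u - \eta\,\nabla_u L(u),$$
where $\eta$ is the learning rate, we must calculate $\nabla_u L(u)$.
Let's take a single weight $u_{ij}$. The partial derivative of the loss w.r.t. $u_{ij}$ equals:
$$\frac{\partial L}{\partial u_{ij}} = \frac{\partial L}{\partial y_j}\,\frac{\partial y_j}{\partial a^{(2)}_j}\,\frac{\partial a^{(2)}_j}{\partial u_{ij}},$$
where i corresponds to the previous layer (the input of this transformation, here the
hidden layer) and j corresponds to the next layer (the output of the transformation).
The decomposition follows directly from the chain rule, and each factor follows from
the formulas above:
$$\frac{\partial L}{\partial y_j} = y_j - t_j, \qquad \frac{\partial y_j}{\partial a^{(2)}_j} = y_j(1 - y_j), \qquad \frac{\partial a^{(2)}_j}{\partial u_{ij}} = h_i$$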
Therefore, the update rule for a single weight for the output layer, with learning rate $\eta$, is given by:
$$u_{ij} \leftarrow u_{ij} - \eta\,(y_j - t_j)\,y_j(1 - y_j)\,h_i$$
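A minimal sketch of the output-layer step in plain Python, assuming an L2-norm loss with a factor of 1/2 and sigmoid outputs, so that dL/du_ij = (y_j - t_j) y_j (1 - y_j) h_i. The hidden activations, weights, targets, and learning rate below are arbitrary illustrative values, the function names are hypothetical, and biases are omitted to keep the example short.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def outputs(h, u):
    """y_j = sigmoid(sum_i h_i * u[i][j]); biases omitted for brevity."""
    return [sigmoid(sum(h[i] * u[i][j] for i in range(len(h))))
            for j in range(len(u[0]))]

def loss(y, t):
    """L2-norm loss with the assumed factor of 1/2."""
    return 0.5 * sum((yj - tj) ** 2 for yj, tj in zip(y, t))

def output_layer_grad(h, y, t):
    """dL/du[i][j] = (y_j - t_j) * y_j * (1 - y_j) * h_i."""
    delta = [(yj - tj) * yj * (1.0 - yj) for yj, tj in zip(y, t)]
    return [[hi * dj for dj in delta] for hi in h]

def update(u, grad, eta):
    """One gradient-descent step: u_ij <- u_ij - eta * dL/du_ij."""
    return [[u_ij - eta * g_ij for u_ij, g_ij in zip(u_row, g_row)]
            for u_row, g_row in zip(u, grad)]

# Illustrative values only: 3 hidden activations, 2 targets.
h = [0.2, 0.7, 0.5]
u = [[0.7, -0.8], [0.9, 0.1], [-0.2, 0.3]]
t = [1.0, 0.0]

y = outputs(h, u)
grad = output_layer_grad(h, y, t)
u_new = update(u, grad, eta=0.5)
old_loss = loss(y, t)
new_loss = loss(outputs(h, u_new), t)
print(old_loss, new_loss)
```

Comparing the loss before and after the step is a quick sanity check that the update moves the outputs toward the targets.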
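4. Backpropagation of a Hidden Layer
Similarly to the backpropagation of the output layer, the update rule for a
single weight $w_{ij}$ would depend on:
$$\frac{\partial h_j}{\partial a^{(1)}_j} = h_j(1 - h_j)$$
and
$$\frac{\partial a^{(1)}_j}{\partial w_{ij}} = x_i$$
The actual problem for backpropagation comes from the term $\partial L / \partial h_j$. That's due
to the fact that there is no "hidden" target. You can follow the solution for weight $w_{11}$
below. It is advisable to also check Figure 1 while going through the computations.
Since $h_1$ feeds into both outputs, the chain rule sums over both paths:
$$\frac{\partial L}{\partial h_1} = \frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial a^{(2)}_1}\frac{\partial a^{(2)}_1}{\partial h_1} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial a^{(2)}_2}\frac{\partial a^{(2)}_2}{\partial h_1} = (y_1 - t_1)\,y_1(1 - y_1)\,u_{11} + (y_2 - t_2)\,y_2(1 - y_2)\,u_{12}$$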
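From here, we can calculate $\partial L / \partial w_{11}$, which was what we wanted. The final expression
is:
$$\frac{\partial L}{\partial w_{11}} = \left[(y_1 - t_1)\,y_1(1 - y_1)\,u_{11} + (y_2 - t_2)\,y_2(1 - y_2)\,u_{12}\right] h_1(1 - h_1)\,x_1$$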
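One way to gain confidence in the expression for w11 is to compare it with a numerical derivative. The sketch below assumes the 2-3-2 network with sigmoid activations, an L2-norm loss with a factor of 1/2, and zero biases; all numeric values and function names are illustrative assumptions.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w, u):
    """Forward pass with biases omitted (assumed zero) for brevity."""
    h = [sigmoid(sum(x[i] * w[i][j] for i in range(2))) for j in range(3)]
    y = [sigmoid(sum(h[i] * u[i][j] for i in range(3))) for j in range(2)]
    return h, y

def loss(x, w, u, t):
    _, y = forward(x, w, u)
    return 0.5 * sum((yj - tj) ** 2 for yj, tj in zip(y, t))

def grad_w11(x, w, u, t):
    """Analytic dL/dw11 via the chain rule through both outputs."""
    h, y = forward(x, w, u)
    delta = [(yj - tj) * yj * (1.0 - yj) for yj, tj in zip(y, t)]
    # No hidden target exists, so dL/dh1 is gathered from both outputs.
    dL_dh1 = delta[0] * u[0][0] + delta[1] * u[0][1]
    return dL_dh1 * h[0] * (1.0 - h[0]) * x[0]

def perturbed(w, eps):
    """Copy of w with eps added to w[0][0] (the weight w11)."""
    w2 = [row[:] for row in w]
    w2[0][0] += eps
    return w2

# Illustrative values only.
x = [0.5, -1.0]
w = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
u = [[0.7, -0.8], [0.9, 0.1], [-0.2, 0.3]]
t = [1.0, 0.0]

eps = 1e-6
analytic = grad_w11(x, w, u, t)
numeric = (loss(x, perturbed(w, eps), u, t)
           - loss(x, perturbed(w, -eps), u, t)) / (2 * eps)
print(analytic, numeric)
```

The two printed numbers should agree to many decimal places; a visible mismatch would point to an error in the derivation or the code.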
5. Backpropagation Generalization
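Using the results for backpropagation for the output layer and the hidden layer, we
can put them together in one formula that summarizes backpropagation in the presence
of L2-norm loss and sigmoid activations:
$$\frac{\partial L}{\partial w_{ij}} = \sum_{k}(y_k - t_k)\,y_k(1 - y_k)\,u_{jk}\,h_j(1 - h_j)\,x_i$$
Here k runs over the output units, so the sum collects the error that each output sends
back to hidden unit j.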
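Putting both layers together, a full backpropagation step for one training sample might look like the sketch below. It assumes sigmoid activations and an L2-norm loss with a factor of 1/2. The bias gradients (equal to the layer error signals) are an addition not derived in the text, and the learning rate, step count, initial values, and function names are arbitrary illustrative choices.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def backprop(x, t, w, b1, u, b2):
    """Return the loss and the gradients for w, b1, u, b2 on one sample."""
    n_in, n_hid, n_out = len(x), len(b1), len(b2)
    h = [sigmoid(sum(x[i] * w[i][j] for i in range(n_in)) + b1[j])
         for j in range(n_hid)]
    y = [sigmoid(sum(h[j] * u[j][k] for j in range(n_hid)) + b2[k])
         for k in range(n_out)]
    loss = 0.5 * sum((y[k] - t[k]) ** 2 for k in range(n_out))
    # Output error signal: (y_k - t_k) * y_k * (1 - y_k).
    d_out = [(y[k] - t[k]) * y[k] * (1.0 - y[k]) for k in range(n_out)]
    # Hidden error signal: sum_k d_out[k] * u[j][k], times h_j * (1 - h_j).
    d_hid = [sum(d_out[k] * u[j][k] for k in range(n_out)) * h[j] * (1.0 - h[j])
             for j in range(n_hid)]
    grad_u = [[h[j] * d_out[k] for k in range(n_out)] for j in range(n_hid)]
    grad_w = [[x[i] * d_hid[j] for j in range(n_hid)] for i in range(n_in)]
    # Bias gradients equal the error signals of their layer.
    return loss, grad_w, d_hid, grad_u, d_out

def step(params, grads, eta):
    """Apply theta <- theta - eta * grad to (nested) lists of floats."""
    if isinstance(params, list):
        return [step(p, g, eta) for p, g in zip(params, grads)]
    return params - eta * grads

def train(x, t, w, b1, u, b2, eta=0.5, steps=200):
    """Repeat gradient-descent steps on one sample and record the loss."""
    losses = []
    for _ in range(steps):
        loss, gw, gb1, gu, gb2 = backprop(x, t, w, b1, u, b2)
        losses.append(loss)
        w, b1 = step(w, gw, eta), step(b1, gb1, eta)
        u, b2 = step(u, gu, eta), step(b2, gb2, eta)
    return losses

# Illustrative values only.
losses = train(
    x=[0.5, -1.0], t=[1.0, 0.0],
    w=[[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]], b1=[0.0, 0.0, 0.0],
    u=[[0.7, -0.8], [0.9, 0.1], [-0.2, 0.3]], b2=[0.0, 0.0],
)
print(losses[0], losses[-1])
```

Note that the hidden error signal is computed from the output-layer weights before they are updated, which is what the derivation requires.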
Copyright 2022 365 Data Science Ltd. Reproduction is forbidden unless authorized. All rights reserved.
Iliya Valchanov
Email: team@365datascience.com