Annette Paper
Abstract
This paper illustrates how basic theories of linear algebra and calculus can be combined with
computer programming methods to create neural networks.
Keywords: Gradient Descent, Backpropagation, Chain Rule, Automatic Differentiation,
Activation and Loss Functions
1 Introduction
As computers advanced in the 1950s, researchers attempted to simulate biologically inspired
models that could recognize binary patterns. This led to the birth of machine learning, an
application of computer science and mathematics in which systems have the ability to “learn” by
improving their performance. Neural networks are algorithms that can learn patterns and find
connections in data for classification, clustering, and prediction problems. Data including
images, sounds, text, and time series are translated numerically into tensors, thus allowing the
system to perform mathematical analysis.
In this paper, we explore fundamental mathematical concepts behind neural networks,
including reverse-mode automatic differentiation, the gradient descent algorithm, and
optimization functions.
$$ z_i = w_1 a_1 + w_2 a_2 + \cdots + w_n a_n + b $$
A neural network's hidden layers have multiple nodes. For the first node in the hidden layer, we multiply the corresponding weights by the previous layer's activation numbers and add the bias. This must be repeated for every node in the hidden layer. The above equation can be consolidated into vector form to express this:

$$ \vec{z} = W \vec{a} + \vec{b} $$
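To make this concrete, the following is a minimal sketch (using NumPy, with arbitrary placeholder sizes and random values not taken from the paper) of how one hidden layer's weighted sums can be computed as a single matrix-vector product:

```python
# Minimal sketch: the weighted sums of one hidden layer as z = W a + b.
# Sizes and values are arbitrary placeholders.
import numpy as np

rng = np.random.default_rng(0)

a = rng.random(4)        # activation numbers of the previous layer (4 inputs)
W = rng.random((3, 4))   # one row of weights per node in the hidden layer (3 nodes)
b = rng.random(3)        # one bias per hidden node

z = W @ a + b            # every z_i = w_1 a_1 + ... + w_n a_n + b_i at once
print(z)
```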
Sigmoid/Logistic: $f(x) = \dfrac{1}{1 + e^{-\beta x}}$

Piecewise Linear: $f(x) = \begin{cases} 0 & \text{if } x \le x_{\min} \\ mx + b & \text{if } x_{\min} < x < x_{\max} \\ 1 & \text{if } x \ge x_{\max} \end{cases}$

Gaussian: $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

Threshold/Unit Step: $f(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases}$

ReLU: $f(x) = \max(0, x)$

Tanh: $f(x) = \tanh(x)$
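As an illustration, the activation functions listed above can be implemented as follows. This is a sketch using NumPy; the parameter names (beta, m, b, x_min, x_max, mu, sigma) simply mirror the symbols in the table, and the default values are arbitrary placeholders:

```python
# Sketch implementations of the activation functions listed above.
import numpy as np

def sigmoid(x, beta=1.0):
    return 1.0 / (1.0 + np.exp(-beta * x))

def piecewise_linear(x, m=1.0, b=0.5, x_min=-0.5, x_max=0.5):
    return np.where(x <= x_min, 0.0, np.where(x >= x_max, 1.0, m * x + b))

def gaussian(x, mu=0.0, sigma=1.0):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def unit_step(x):
    return np.where(x < 0, 0.0, 1.0)

def relu(x):
    return np.maximum(0.0, x)

def tanh(x):
    return np.tanh(x)
```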
The choice of activation function depends on the range required for the data, the error, and the speed of computation. Without an activation function, a neural network behaves like a linear regression model. The need for an activation function comes from the definition of linear functions and transformations. Previously, we discussed the linear algebra that takes us from the input layer to the hidden layer: the result is a vector of weighted sums. To calculate an output, this vector of weighted sums becomes the "new" activation layer, and these activation numbers have their own set of weights and biases. When we substitute the weighted sums in as the next layer's activations, we see that the composition of two linear functions is itself a linear function. Hence, a nonlinear activation function is needed.
Proof: Composition of Linear Functions
$$ \vec{z}_1 = W_1 \vec{a} + \vec{b}_1 $$

$$ \vec{z}_2 = W_2 \vec{z}_1 + \vec{b}_2 $$

$$ \vec{z}_2 = W_2 (W_1 \vec{a} + \vec{b}_1) + \vec{b}_2 $$

$$ \vec{z}_2 = [W_2 W_1]\, \vec{a} + [W_2 \vec{b}_1 + \vec{b}_2] $$
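The proof can also be checked numerically. The sketch below (arbitrary random matrices, NumPy assumed) confirms that two stacked linear layers collapse into one linear layer, which is exactly why a nonlinear activation is needed between them:

```python
# Numerical check of the composition proof:
# W2(W1 a + b1) + b2 equals (W2 W1) a + (W2 b1 + b2).
import numpy as np

rng = np.random.default_rng(1)
a = rng.random(4)
W1, b1 = rng.random((3, 4)), rng.random(3)
W2, b2 = rng.random((2, 3)), rng.random(2)

z2_composed = W2 @ (W1 @ a + b1) + b2        # two linear layers applied in sequence
z2_single = (W2 @ W1) @ a + (W2 @ b1 + b2)   # one equivalent linear layer

print(np.allclose(z2_composed, z2_single))   # True
```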
There are many loss functions that can be used in neural networks; Mean Squared Error and Cross Entropy Loss are two of the most common.

$$ \text{MSE Cost} = \sum 0.5\,(y - \hat{y})^2 $$

$$ \text{Cross Entropy Cost} = -\sum \left( y \log(\hat{y}) + (1 - y)\log(1 - \hat{y}) \right) $$
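A sketch of these two cost functions in NumPy is shown below; here y is the target, y_hat is the network's prediction, and the sums run over the output nodes (or over a batch of examples):

```python
# Sketch of the two cost functions above; y is the target, y_hat the prediction.
import numpy as np

def mse_cost(y, y_hat):
    return np.sum(0.5 * (y - y_hat) ** 2)

def cross_entropy_cost(y, y_hat):
    # assumes y_hat lies strictly between 0 and 1, e.g. a sigmoid output
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```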
The loss function contains every weight and bias in the neural network. That can be a very big
function!
$$ C(w_1, w_2, \ldots, w_h, b_1, \ldots, b_i) $$
We have derived the equations for the weighted sum, the activated weighted sum, and the cost:

$$ z^L = w^L a^{L-1} + b^L $$

$$ a^L = \sigma(z^L) $$

$$ C = (a^L - y)^2 $$

(The cost function is simplified for proof of concept.)
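For a single output node, these three equations amount to a few lines of code. In the sketch below the numbers are arbitrary placeholders and sigma is the sigmoid activation:

```python
# Sketch of the forward pass for one output node and the simplified cost.
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

a_prev = 0.6          # a^{L-1}: placeholder activation from the previous layer
w_L, b_L = 1.5, -0.3  # placeholder weight and bias
y = 1.0               # placeholder target output

z_L = w_L * a_prev + b_L   # z^L = w^L a^{L-1} + b^L
a_L = sigma(z_L)           # a^L = sigma(z^L)
C = (a_L - y) ** 2         # simplified cost
print(z_L, a_L, C)
```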
We can determine how sensitive the cost function is to changes in a single weight. Beginning
from the output, we can apply the chain rule to every activation layer. For a weight between the
hidden layer and output layer, our derivative is:
$$ \frac{\partial C_k}{\partial w^L} = \frac{\partial z^L}{\partial w^L} \frac{\partial a^L}{\partial z^L} \frac{\partial C_k}{\partial a^L} $$
With the definition of the functions, we can easily solve for the partial derivatives:
$$ \frac{\partial C_k}{\partial a^L} = 2(a^L - y) $$

$$ \frac{\partial a^L}{\partial z^L} = \sigma'(z^L) $$

$$ \frac{\partial z^L}{\partial w^L} = a^{L-1} $$

$$ \frac{\partial C_k}{\partial w^L} = a^{L-1}\, \sigma'(z^L)\, 2(a^L - y) $$
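This chain-rule expression can be verified numerically. The sketch below (placeholder values, sigmoid activation) compares it against a finite-difference estimate of the same derivative:

```python
# Sketch: dC/dw^L = a^{L-1} * sigma'(z^L) * 2(a^L - y), checked by finite differences.
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    s = sigma(z)
    return s * (1.0 - s)

a_prev, w_L, b_L, y = 0.6, 1.5, -0.3, 1.0   # arbitrary placeholder values

def cost(w):
    return (sigma(w * a_prev + b_L) - y) ** 2

z_L = w_L * a_prev + b_L
a_L = sigma(z_L)

dC_dw = a_prev * sigma_prime(z_L) * 2 * (a_L - y)   # chain rule result

eps = 1e-6
numerical = (cost(w_L + eps) - cost(w_L - eps)) / (2 * eps)
print(dC_dw, numerical)   # the two values agree closely
```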
This method is iterated through every weight, activation number, and bias in the system. Above, we calculated the derivative of the cost of one particular training example with respect to a single weight. To account for every training example, the average of these derivatives is taken:
$$ \frac{\partial C}{\partial w^L} = \frac{1}{n} \sum_{k=0}^{n-1} \frac{\partial C_k}{\partial w^L} $$
Similarly, we can calculate the sensitivity of the cost function with respect to a single bias between the hidden layer and the output layer, along with the corresponding average over the training examples:
$$ \frac{\partial C_k}{\partial b^L} = \frac{\partial z^L}{\partial b^L} \frac{\partial a^L}{\partial z^L} \frac{\partial C_k}{\partial a^L} = \sigma'(z^L)\, 2(a^L - y) $$

$$ \frac{\partial C}{\partial b^L} = \frac{1}{n} \sum_{k=0}^{n-1} \frac{\partial C_k}{\partial b^L} $$
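A sketch of both averaged derivatives is shown below, where each entry of the arrays corresponds to one training example k (the inputs and targets are arbitrary placeholders):

```python
# Sketch: averaging dC_k/dw^L and dC_k/db^L over n = 4 training examples.
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    s = sigma(z)
    return s * (1.0 - s)

w_L, b_L = 1.5, -0.3
a_prev = np.array([0.6, 0.1, 0.9, 0.4])   # a^{L-1} for each example (placeholders)
y = np.array([1.0, 0.0, 1.0, 0.0])        # target for each example (placeholders)

z_L = w_L * a_prev + b_L
a_L = sigma(z_L)

common = sigma_prime(z_L) * 2 * (a_L - y)   # sigma'(z^L) * 2(a^L - y), per example
dC_dw = np.mean(a_prev * common)            # (1/n) sum_k dC_k/dw^L
dC_db = np.mean(common)                     # (1/n) sum_k dC_k/db^L, since dz^L/db^L = 1
print(dC_dw, dC_db)
```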
What happens when we go beyond the output layer and the preceding hidden layer? The chain rule is applied once more, and the derivative changes according to its partials. For example, the derivative below gives the partial of the cost function with respect to an activation number in the preceding layer.
$$ \frac{\partial C_k}{\partial a^{L-1}} = \frac{\partial z^L}{\partial a^{L-1}} \frac{\partial a^L}{\partial z^L} \frac{\partial C_k}{\partial a^L} = w^L\, \sigma'(z^L)\, 2(a^L - y) $$
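The same placeholder example extends to this derivative; it is the quantity that lets the chain rule continue backward into earlier layers:

```python
# Sketch: dC_k/da^{L-1} = w^L * sigma'(z^L) * 2(a^L - y) for one example.
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    s = sigma(z)
    return s * (1.0 - s)

a_prev, w_L, b_L, y = 0.6, 1.5, -0.3, 1.0   # arbitrary placeholder values

z_L = w_L * a_prev + b_L
a_L = sigma(z_L)

dC_da_prev = w_L * sigma_prime(z_L) * 2 * (a_L - y)
print(dC_da_prev)
```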
Neural networks tend to have several thousand inputs, outputs, and nodes, so the above equations are highly simplified. Although adding complexity changes the formulas slightly, the concepts remain the same, as seen below:
$$ C_m = \sum_{j=0}^{n_L - 1} \left( a_j^L - y_j \right)^2 $$

$$ a_j^L = \sigma(z_j^L) $$

$$ z_j^L = \cdots + w_{jk}^L\, a_k^{L-1} + \cdots $$
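A vectorized sketch of these generalized equations is shown below, with three output nodes and four activations in the preceding layer (the sizes and values are arbitrary placeholders):

```python
# Sketch of the generalized forward pass and cost for several output nodes j.
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
a_prev = rng.random(4)          # a_k^{L-1}: activations of the previous layer
W_L = rng.random((3, 4))        # w_jk^L: one row of weights per output node
b_L = rng.random(3)
y = np.array([1.0, 0.0, 0.0])   # one target per output node (placeholder)

z_L = W_L @ a_prev + b_L        # z_j^L = sum_k w_jk^L a_k^{L-1} + b_j^L
a_L = sigma(z_L)                # a_j^L = sigma(z_j^L)
C_m = np.sum((a_L - y) ** 2)    # C_m = sum_j (a_j^L - y_j)^2
print(C_m)
```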
By calculating the derivative with respect to every weight and bias, the gradient vector can be found. Although one could try to compute the gradient of a neural network by hand, the vector usually has far too many dimensions for us to handle. Thus, with computational help, our neural network can perform these intricate calculations and repeat them hundreds, if not thousands, of times until the minimum is reached.
$$ \nabla C = \begin{bmatrix} \dfrac{\partial C}{\partial w^1} \\[4pt] \dfrac{\partial C}{\partial b^1} \\[4pt] \vdots \\[4pt] \dfrac{\partial C}{\partial w^L} \\[4pt] \dfrac{\partial C}{\partial b^L} \end{bmatrix} $$
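Once the gradient vector is assembled, gradient descent nudges every weight and bias a small step against its own partial derivative. The sketch below shows one such update; the parameter values, gradient entries, and learning rate are all placeholders:

```python
# Sketch of one gradient descent update on a flattened parameter vector.
import numpy as np

params = np.array([1.5, -0.3, 0.8, 0.2])         # e.g. [w^1, b^1, ..., w^L, b^L] (placeholders)
gradient = np.array([0.04, -0.12, 0.30, 0.07])   # matching entries of grad C (placeholders)

learning_rate = 0.1                              # placeholder step size
params = params - learning_rate * gradient       # step toward a minimum of C
print(params)
```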