
18 October, 2016

Neural Networks
Course 3: Gradient Descent and Backpropagation
Overview

 Feed Forward Network Architecture
 Gradient Descent
 Backpropagation
 A network to read digits
 Conclusions
Feed Forward Network
Architecture
Feed Forward Network Architecture

 Composed of at least 3 layers:

  The first one is called the input layer
  The last one is the output layer
  The ones in the middle are called hidden layers. A hidden layer is composed of
hidden units
 Each unit is connected to all the units in the layer above (there
are no loops)
Feed Forward Network Architecture

 The neurons in the hidden layer and in the output layer are non-linear
neurons. Most of the time they are logistic (sigmoid) neurons, but they can also be
tanh, rectified linear units, or based on some other non-linear function

 This type of network is also called a Multi Layer Perceptron (MLP), but this name is
confusing, since the perceptron is rarely used in this kind of architecture

 Why not use the perceptron?


Feed Forward Network Architecture

 Why not use the perceptron?

  Learning is performed by slightly modifying the weights and observing the output.
However, since the perceptron only outputs 0 or 1, it is possible to update the
weights and see no change at all in the output

[Figure: the perceptron's step function (y jumps from 0 to 1 at the threshold 𝜃)
compared with the smooth sigmoid of Σᵢ 𝑤ᵢ𝑥ᵢ, plotted over the range −6 … 6]
Gradient Descent
Gradient Descent

 Adjust the weights and biases so as to minimize a cost function
 A cost function is a mathematical function that assigns a value (cost) to
how badly a sample is classified
 A commonly used function (that we will also use) is the mean squared error

C(w, b) = (1/2n) Σₓ ‖t − a‖² = (1/2n) Σₓ (t − a)²

w = all weights in the network     b = all biases in the network
t = target output vector for input x     a = output when the input is x, i.e. y(x)
‖v‖ = length of the vector v
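As an illustration, the cost above can be computed in a few lines of NumPy (a minimal sketch; the function name `mse_cost` and the toy values are mine, not from the course):

```python
import numpy as np

def mse_cost(targets, outputs):
    """Quadratic cost C(w, b) = 1/(2n) * sum_x ||t - a||^2.

    targets, outputs: arrays of shape (n_samples, n_outputs)."""
    n = targets.shape[0]
    return np.sum((targets - outputs) ** 2) / (2 * n)

# Two samples, two output units; only the second sample is off by 1 in one unit.
t = np.array([[1.0, 0.0], [0.0, 1.0]])
a = np.array([[1.0, 0.0], [0.0, 0.0]])
print(mse_cost(t, a))  # 1/(2*2) * 1 = 0.25
```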
Gradient Descent

 Why use a cost function? (why not just count the correct outputs?)
  A small update in the weights might not result in a change in the number of
correctly classified samples

 Why use the Mean Squared Error?

  We can interpret the formula as being very similar to the Euclidean distance, so
this can be seen as minimizing the distance between the target and the
output
  The formula also resembles the one for the variance (which computes how far the
elements are from the mean). So, for example in regression, this would be
equivalent to reducing how far the elements stray from the mean (the target
hyperplane)
  It is continuous and easily differentiable (which will be useful later)
Gradient Descent

 What is a gradient?
  A gradient is just a fancy word for derivative 
  The gradient (first derivative) determines the slope of the tangent to the graph of
the function (it points in the direction of the greatest rate of increase)
Gradient Descent

 A function with multiple variables has multiple derivatives:

f(x, y, z) has 3 partial derivatives: ∂f/∂x, ∂f/∂y, ∂f/∂z

∂f/∂x denotes a partial derivative and is obtained by differentiating f with
respect to x while considering all the other variables (y, z) as constants

 If x changes by Δx, y by Δy and z by Δz, then:

Δf ≈ (∂f/∂x)·Δx + (∂f/∂y)·Δy + (∂f/∂z)·Δz
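A partial derivative can be checked numerically by nudging one variable while holding the others fixed. A small sketch, assuming the example function f(x, y, z) = x·y + z² (my choice, not from the slides):

```python
def partial(f, args, i, h=1e-6):
    """Estimate df/d(args[i]) with a central difference, holding the rest fixed."""
    hi = list(args); hi[i] += h
    lo = list(args); lo[i] -= h
    return (f(*hi) - f(*lo)) / (2 * h)

# Example function: f(x, y, z) = x*y + z**2
def f(x, y, z):
    return x * y + z ** 2

grads = [partial(f, (2.0, 3.0, 4.0), i) for i in range(3)]
print(grads)  # approx [3.0, 2.0, 8.0] = (df/dx, df/dy, df/dz) = (y, x, 2z)
```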
Gradient Descent

 Minimizing the Cost function:

Suppose the Cost function has
only two variables (v1 and v2)
and its geometric representation
is a quadratic bowl.

If we move in the direction v1 by
Δv1 and in the direction v2 by
Δv2, then

ΔC(v1, v2) ≈ (∂C/∂v1)·Δv1 + (∂C/∂v2)·Δv2
Gradient Descent

 Minimizing the Cost function:

If we define
∇C = (∂C/∂v1, ∂C/∂v2)
and
Δv = (Δv1, Δv2)
then
ΔC(v1, v2) ≈ ∇C · Δv

Since we want to always move downwards (minimizing the function C) we
want ΔC to be negative. So we choose

Δv = −η∇C

where η is a small number, called the learning rate


Gradient Descent

 Minimizing the Cost function:

  So, adjusting v1 and v2 such that Δv = −η∇C will always lead to a
smaller value of C

  Repeating the above step multiple times drives us to a local minimum

  The learning rate must be small enough not to jump over the minimum

  Even though we have used just two variables, the same principle can be
applied to any differentiable Cost function of many variables
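The loop described above can be sketched in a few lines. This is a minimal illustration on the quadratic bowl C(v1, v2) = v1² + v2²; the bowl, the learning rate, and the step count are assumptions of mine:

```python
# Gradient descent on the quadratic bowl C(v1, v2) = v1^2 + v2^2,
# whose gradient is grad C = (2*v1, 2*v2).
def descend(v1, v2, eta=0.1, steps=100):
    for _ in range(steps):
        g1, g2 = 2 * v1, 2 * v2              # gradient of C at (v1, v2)
        v1, v2 = v1 - eta * g1, v2 - eta * g2  # the update Δv = -eta * grad C
    return v1, v2

v1, v2 = descend(3.0, -4.0)
print(v1, v2)  # both very close to 0, the minimum of the bowl
```

Each step shrinks both coordinates by a constant factor (1 − 2η), which is why a learning rate that is too large (η ≥ 1 here) would overshoot and diverge.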
Gradient Descent

 Example:
  Performing gradient descent for the Adaline perceptron.

The activation function:

y = wx + b

We will use the mean squared error as the cost function:

C(w, b) = (1/2n) Σₓ (t − y)²
Gradient Descent

 Example:
  Performing gradient descent for the Adaline perceptron.
1. Compute the gradient ∇C

∇C = (dC/dw, dC/db)

Remember the chain rule:

C is a function of the output y:  C = (1/2n) Σₓ (t − y)²
y is a function of the variables w and b:  y = wx + b

∂C/∂w = (∂C/∂y) · (∂y/∂w)
Gradient Descent

 Example:
  Performing gradient descent for the Adaline perceptron.
1. Compute the gradient ∇C

C′ = (1/2n) Σₓ [(t − y)²]′ = (2/2n) Σₓ (t − y)·(t − y)′ = (1/n) Σₓ (t − y)·(−y′)

dy/dw = d(wx + b)/dw = x
dy/db = d(wx + b)/db = 1
-----------------------------------------------------------------------------------------------
dC/dw = −(1/n) Σₓ (t − y)·x          dC/db = −(1/n) Σₓ (t − y)
Gradient Descent

 Example:
 Performing gradient descent for the Adaline perceptron.
2. Choose a learning rate η

3. For a fixed number of iterations:

adjust w:  w = w − η·(dC/dw) = w + (η/n) Σₓ (t − y)·x
adjust b:  b = b − η·(dC/db) = b + (η/n) Σₓ (t − y)

  This is the same update rule we used for the Adaline in the previous
course. The formula looks different because now we are averaging over all
samples
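The update rules above can be written as a short training loop. A minimal sketch; the helper name `train_adaline` and the noise-free toy data t = 2x + 1 are assumptions of mine, not from the course:

```python
import numpy as np

def train_adaline(x, t, eta, epochs):
    """Full-batch gradient descent for y = w*x + b under the MSE cost."""
    w, b, n = 0.0, 0.0, len(x)
    for _ in range(epochs):
        y = w * x + b
        w += (eta / n) * np.sum((t - y) * x)  # w = w - eta * dC/dw
        b += (eta / n) * np.sum(t - y)        # b = b - eta * dC/db
    return w, b

# Noise-free line t = 2x + 1: the fit should approach w = 2, b = 1.
x = np.linspace(-1, 1, 50)
t = 2 * x + 1
w, b = train_adaline(x, t, eta=0.5, epochs=500)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```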
Gradient Descent

 We will usually be performing stochastic gradient descent (mini-
batch SGD) instead of full gradient descent.

 The idea is very simple:

  Choose a subset of the training data, of smaller size, whose gradient
approximates ∇C. Use that estimate instead of the real ∇C
  Update the weights using the previously computed value
  Choose another subset from the training data and repeat the previous
steps until all the samples have been processed

This speeds up the learning process by a factor of roughly
(size of training data) / (size of mini-batch)
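The mini-batch procedure above, sketched for the same Adaline model (the function name, shuffling scheme, and toy data are illustrative assumptions, not the course's code):

```python
import numpy as np

def sgd(x, t, eta, batch_size, epochs, seed=0):
    """Mini-batch SGD for the Adaline model y = w*x + b."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        order = rng.permutation(len(x))          # shuffle once per epoch
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            xb, tb = x[idx], t[idx]
            y = w * xb + b
            m = len(xb)
            # gradient estimated on the mini-batch only, not the whole set
            w += (eta / m) * np.sum((tb - y) * xb)
            b += (eta / m) * np.sum(tb - y)
    return w, b

x = np.linspace(-1, 1, 100)
t = 2 * x + 1
w, b = sgd(x, t, eta=0.1, batch_size=10, epochs=50)
print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```

Each epoch makes (size of training data) / (size of mini-batch) updates instead of one, which is where the speedup comes from.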
Backpropagation

 Using gradient descent we can optimize a single neuron (such as the
Adaline). However, can we train a network with multiple layers?
  Yes, but not yet! 
  Why?
Backpropagation

 Using gradient descent we can optimize a single neuron (such as the
Adaline). However, can we train a network with multiple layers?
  Yes, but not yet! 
  Why? We know the error at the last layer, so we can immediately update
the weights that affect that error. However, we do not know how much of
the error depends on the previous layers.

We need to backpropagate the error
Backpropagation
 Some notations:
  wᵢⱼˡ = the weight from neuron i in the l − 1 layer to neuron j in the l layer
  L = the last layer
  yᵢˡ = the activation of neuron i in the l layer
  bᵢˡ = the bias of neuron i in the l layer
  zᵢˡ = the net input of neuron i in the l layer:  zᵢˡ = Σⱼ wⱼᵢˡ · yⱼˡ⁻¹ + bᵢˡ

[Figure: a 4-layer network; the diagram labels the weights w₁₄³ … w₄₄³ going into
neuron 4 of layer 3, its net input z₄³ and activation y₄³, and the activations
y₁² … y₄² of layer 2]
Backpropagation

 We can adjust the error by adjusting the bias and the weights of each
neuron i in each layer l. This will change the net input from zᵢˡ to (zᵢˡ + Δzᵢˡ)
and the activation from σ(zᵢˡ) to σ(zᵢˡ + Δzᵢˡ)

 In this case, the cost function will change by ΔC = (∂C/∂zᵢˡ)·Δzᵢˡ

[Figure: a 4-layer network; perturbing the net input of one neuron by Δzᵢˡ
changes the cost from C to C + (∂C/∂zᵢˡ)·Δzᵢˡ]
Backpropagation

 We can minimize C by making ΔC = (∂C/∂zᵢˡ)·Δzᵢˡ negative:
  Δzᵢˡ must have the opposite sign to ∂C/∂zᵢˡ

Considering that Δzᵢˡ is a small number (since we want to make small changes), the
amount by which the error is reduced depends on how large ∂C/∂zᵢˡ is

If ∂C/∂zᵢˡ is close to zero, then the cost cannot be reduced much further

Thus, we can consider that ∂C/∂zᵢˡ represents a measure of the error of neuron i in
layer l

We will use δᵢˡ = ∂C/∂zᵢˡ to represent the error.

Note that we could have also defined the error with respect to the output yᵢˡ, but
the current variant results in fewer formulas
Backpropagation

 How the backpropagation algorithm works:

  Compute the error for each neuron at the last layer: δᵢᴸ
  For each layer l, starting from the last one down to the first:
    For each neuron i in layer l:
      Compute the error δᵢˡ
      Using this value, compute ∂C/∂bᵢˡ and ∂C/∂wᵢⱼˡ
      Back-propagate the error to the neurons in the previous layer and
repeat the above steps
Backpropagation

 Compute how the cost function depends on the error at the last layer

δᵢᴸ = ∂C/∂zᵢᴸ = (∂C/∂yᵢᴸ) · (∂yᵢᴸ/∂zᵢᴸ) = (∂C/∂yᵢᴸ) · σ′(zᵢᴸ)

[Figure: layer L−1 and the last layer; the net input zᵢᴸ produces the activation
yᵢᴸ, which enters the cost C]
Backpropagation

 Backpropagate the error (write the error with respect to the error in the next
layer)

δᵢˡ = ∂C/∂zᵢˡ = (∂C/∂yᵢˡ) · (∂yᵢˡ/∂zᵢˡ)
    = σ′(zᵢˡ) · Σₖ (∂C/∂zₖˡ⁺¹) · (∂zₖˡ⁺¹/∂yᵢˡ)
    = σ′(zᵢˡ) · Σₖ δₖˡ⁺¹ · wᵢₖˡ⁺¹

[Figure: neuron i in layer l, with activation yᵢˡ feeding through the weights
wᵢₖˡ⁺¹ into the net inputs zₖˡ⁺¹ of layer l + 1, which determine C]
Backpropagation

 Compute how the cost function depends on a weight

∂C/∂wₖᵢˡ = (∂C/∂zᵢˡ) · (∂zᵢˡ/∂wₖᵢˡ) = δᵢˡ · yₖˡ⁻¹

[Figure: the weight wₖᵢˡ connects the activation yₖˡ⁻¹ of neuron k in layer l − 1
to the net input zᵢˡ of neuron i in layer l]
Backpropagation

 Compute how the cost function depends on a bias

∂C/∂bᵢˡ = (∂C/∂zᵢˡ) · (∂zᵢˡ/∂bᵢˡ) = δᵢˡ · 1 = δᵢˡ

[Figure: the bias bᵢˡ feeds directly into the net input zᵢˡ]
Backpropagation

 Doing the math for σ′

dσ/dz = (1/(1 + e⁻ᶻ))′ = ((1 + e⁻ᶻ)⁻¹)′ = −(1 + e⁻ᶻ)′ · (1 + e⁻ᶻ)⁻² =
      = −(−e⁻ᶻ) / (1 + e⁻ᶻ)² = e⁻ᶻ / (1 + e⁻ᶻ)² = (1/(1 + e⁻ᶻ)) · (e⁻ᶻ/(1 + e⁻ᶻ)) =
      = σ(z) · ((1 + e⁻ᶻ − 1)/(1 + e⁻ᶻ)) = σ(z) · (1 − 1/(1 + e⁻ᶻ)) = σ(z) · (1 − σ(z))

So, for a logistic neuron:

σ′(zᵢˡ) = yᵢˡ · (1 − yᵢˡ)
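The identity σ′(z) = σ(z)·(1 − σ(z)) is easy to check numerically (a small sketch; the function names are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)        # sigma'(z) = sigma(z) * (1 - sigma(z))

z = np.array([-2.0, 0.0, 1.5])
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference
print(np.allclose(sigmoid_prime(z), numeric, atol=1e-8))  # True
```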


Backpropagation

 Doing the math for δᵢᴸ

With C = (1/2) Σⱼ (tⱼ − yⱼᴸ)² for a single sample:

δᵢᴸ = (∂C/∂yᵢᴸ) · σ′(zᵢᴸ) = [d/dyᵢᴸ (1/2) Σⱼ (tⱼ − yⱼᴸ)²] · yᵢᴸ(1 − yᵢᴸ) =
    = yᵢᴸ(1 − yᵢᴸ) · (tᵢ − yᵢᴸ) · (−1) = yᵢᴸ(1 − yᵢᴸ)(yᵢᴸ − tᵢ)

δᵢᴸ = yᵢᴸ(1 − yᵢᴸ)(yᵢᴸ − tᵢ)
Backpropagation

 Putting it all together

0. Compute the error for the final layer:
   δᵢᴸ = yᵢᴸ(1 − yᵢᴸ)(yᵢᴸ − tᵢ)

Then, process the layers below:

1. Compute the error for the previous layer:
   δᵢˡ = yᵢˡ(1 − yᵢˡ) · Σₖ δₖˡ⁺¹ · wᵢₖˡ⁺¹

2. Compute the gradient for the weights in the current layer:
   ∂C/∂wₖᵢˡ = δᵢˡ · yₖˡ⁻¹

3. Compute the gradient for the biases in the current layer:
   ∂C/∂bᵢˡ = δᵢˡ

Repeat until we reach the input layer
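Steps 0-3 above can be sketched as a NumPy routine for a single training sample. This is a minimal illustration under my own conventions, not the course's reference code: layer sizes are arbitrary, and `weights[l]` is stored with shape (inputs, outputs) so that z = y @ W + b:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(weights, biases, x, t):
    """Gradients of C = 1/2 * sum_j (t_j - y_j)^2 for a single sample."""
    ys = [x]                                  # activations, layer by layer
    for W, b in zip(weights, biases):
        ys.append(sigmoid(ys[-1] @ W + b))    # forward pass, storing each y
    # Step 0: error at the final layer, delta_L = y(1-y)(y - t).
    delta = ys[-1] * (1 - ys[-1]) * (ys[-1] - t)
    grads_w, grads_b = [], []
    for l in range(len(weights) - 1, -1, -1):
        grads_w.insert(0, np.outer(ys[l], delta))  # step 2: dC/dw_ki = delta_i * y_k
        grads_b.insert(0, delta)                   # step 3: dC/db_i  = delta_i
        if l > 0:  # step 1: backpropagate the error to the layer below
            delta = ys[l] * (1 - ys[l]) * (weights[l] @ delta)
    return grads_w, grads_b

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 2))]
biases = [rng.normal(size=3), rng.normal(size=2)]
x, t = rng.normal(size=4), np.array([0.0, 1.0])
gw, gb = backprop(weights, biases, x, t)
print([g.shape for g in gw])  # [(4, 3), (3, 2)] -- one gradient per weight matrix
```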
A network to read digits
A network to read digits

 We will train a feed forward network, using the backpropagation algorithm,
that can recognize handwritten digits

 The dataset can be downloaded from here:
http://deeplearning.net/data/mnist/mnist.pkl.gz
A network to read digits

 Each image is 28x28 in size and is represented as a vector of 784 pixels,
each pixel having an intensity

 We will use a network of 3 layers:

  784 neurons for the input layer
  36 neurons for the hidden layer
  10 neurons for the output layer

Each neuron in the output layer will activate for a certain digit. The digit
output by the network will be given by the output neuron that has the highest
confidence (the largest output activation)
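The forward pass and the arg-max decision for the 784-36-10 network can be sketched as follows (the weights below are random placeholders, not a trained model, so the prediction is meaningless; the helper name is mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_digit(image, w_hidden, b_hidden, w_out, b_out):
    """Forward pass through a 784-36-10 network; the predicted digit is
    the index of the output neuron with the largest activation."""
    hidden = sigmoid(image @ w_hidden + b_hidden)   # shape (36,)
    output = sigmoid(hidden @ w_out + b_out)        # shape (10,)
    return int(np.argmax(output))

rng = np.random.default_rng(0)
image = rng.random(784)  # stand-in for one 28x28 image flattened to intensities
w_hidden, b_hidden = rng.normal(size=(784, 36)), rng.normal(size=36)
w_out, b_out = rng.normal(size=(36, 10)), rng.normal(size=10)
digit = predict_digit(image, w_hidden, b_hidden, w_out, b_out)
print(digit)  # some digit in 0..9
```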
A network to read digits

 Training info:
  The learning rate used is η = 3.0
  Learning is performed using SGD with a mini-batch size of 10
  Training is performed for 30 iterations
  Training data consists of 50000 images
  Test data consists of 10000 images

 Results:
  The ANN correctly identifies approx. 95% of the test images
  A network made of 10 perceptrons detects approx. 83%

(watch demo)
Questions & Discussion
Bibliography

http://neuralnetworksanddeeplearning.com/
Chris Bishop, “Neural Networks for Pattern Recognition”
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
http://deeplearning.net/
