ML.8-Neural Networks - Deep Learning (Week 12,13)
Chapter 8
CONTENTS
• Perceptron
• Neural networks
• Gradient descent
• Backpropagation
Perceptron

A worked run of the perceptron learning algorithm:

• Initialize w = 0, i.e., w = (0, 0).
• Observation x = (1, -1), label y = -1. Prediction sign(w · x) = sign(0) = +1: no match! Update w ← w + y·x = (0, 0) + (-1)·(1, -1) = (-1, 1).
• Observation x = (-1, 1), label y = +1. Prediction sign(w · x) = sign(2) = +1: match! No update needed.
• Repeat over the observations until every prediction matches its label (see the sketch below).
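A minimal NumPy sketch of this walkthrough (the variable names and the convention sign(0) = +1 are my assumptions, not from the slides):

```python
import numpy as np

# The two observations and labels from the walkthrough above.
X = np.array([[1, -1],
              [-1, 1]])
y = np.array([-1, +1])

w = np.zeros(2)          # initialize w = 0
for epoch in range(10):  # "repeat ..." until no mistakes
    mistakes = 0
    for x_i, y_i in zip(X, y):
        y_hat = 1 if w @ x_i >= 0 else -1  # predict: sign of w.x (sign(0) taken as +1)
        if y_hat != y_i:                   # no match -> update w
            w = w + y_i * x_i              # perceptron update rule
            mistakes += 1
    if mistakes == 0:                      # converged: every label matches
        break

print(w)  # (-1, 1) after the single update, as in the walkthrough
```

After that one update the weight vector is w = (-1, 1), which classifies both observations correctly, so the loop stops.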
Another way to draw it: the inputs feed into an activation function (e.g., a sigmoid of the weighted sum), which produces the output.
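As a sketch, that drawing is only a few lines of code (sigmoid chosen here just because the slide names it as an example):

```python
import numpy as np

def sigmoid(z):
    # squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # output = activation function applied to the weighted sum of the inputs
    return sigmoid(w @ x + b)

print(neuron(np.array([1.0, -1.0]), np.array([0.5, 0.5]), 0.0))  # 0.5
```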
Neural networks

A neural network composes perceptrons; the slide's figure wires up ‘six perceptrons’.

Some terminology: the ‘input’ layer receives the data, the ‘hidden’ layer sits in between, and the ‘output’ layer produces the prediction.
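A hedged sketch of such a network as code (the layer sizes are illustrative, not taken from the figure):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)     # hidden layer: one row of W1 per perceptron
    return sigmoid(W2 @ h + b2)  # output layer

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)  # input layer (2 values) -> 4 hidden units
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # 4 hidden units -> 1 output
print(forward(np.array([1.0, -1.0]), W1, b1, W2, b2))
```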
Gradient descent

Squared error, a popular loss function: E = ½ (y − ŷ)².

Applied to a perceptron, gradient descent gives the update rule w ← w − η ∂E/∂w, where η is the learning rate.
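The slides' exact formulas are not reproduced here, so the sketch below assumes the squared error above with a linear unit ŷ = w · x, which gives the update w ← w + η (y − ŷ) x:

```python
import numpy as np

# Gradient descent on E(w) = 1/2 * (y - w.x)^2;  dE/dw = -(y - w.x) * x
X = np.array([[1.0, -1.0], [-1.0, 1.0]])
y = np.array([-1.0, 1.0])

w = np.zeros(2)
eta = 0.1                       # learning rate
for step in range(100):
    for x_i, y_i in zip(X, y):
        err = y_i - w @ x_i     # prediction error
        w += eta * err * x_i    # w <- w - eta * dE/dw
print(w)                        # approaches (-0.5, 0.5): w.x = -1 and +1 on the two points
```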
CONTENTS
• Perceptron.
• Neural networks.
• Gradient descent.
• Backpropagation.
• Stochastic gradient descent.
Backpropagation

Gradient descent, step by step:
1. Predict
   a. Forward pass
   b. Compute loss (a side computation to track loss; not needed for backprop)
2. Update
   a. Back propagation
   b. Gradient update (with an adjustable step size)
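To make the four steps concrete, here is one training iteration for the simplest possible model, ŷ = w·x with a single scalar weight (all names and numbers below are illustrative):

```python
def train_step(w, x, y, lr):
    y_hat = w * x                  # 1a. forward pass
    loss = 0.5 * (y - y_hat) ** 2  # 1b. compute loss (tracked for monitoring only)
    grad = -(y - y_hat) * x        # 2a. back propagation: dLoss/dw
    w = w - lr * grad              # 2b. gradient update (lr is the adjustable step size)
    return w, loss

w = 0.0
for _ in range(50):
    w, loss = train_step(w, x=2.0, y=4.0, lr=0.1)
print(w, loss)                     # w approaches 2.0, loss approaches 0
```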
Backpropagation

In a multi-layer perceptron, the inputs are known, and each unit applies an activation function that sometimes has unknown parameters of its own. We need to train the network: find the weights (and any unknown activation parameters) from data.
Learning an MLP

Gradient descent, as before:
1. Predict
   a. Forward pass
   b. Compute loss
2. Update
   a. Back propagation (produces the vector of parameter partial derivatives)
   b. Gradient update (applies the vector of parameter update equations)
Backpropagation

The loss depends on the network's output, the output depends on the last layer's weights, and those in turn depend on the outputs of earlier layers. Chain Rule!

The Chain Rule (a.k.a. backpropagation)

The chain rule says the derivative of the loss with respect to an early parameter factors into pieces, e.g. ∂L/∂w1 = (∂L/∂ŷ)(∂ŷ/∂h)(∂h/∂w1), and the later-layer pieces are already computed: re-use (propagate) them instead of recomputing.
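A minimal sketch of the chain rule at work in a two-layer network with scalar weights (the network, numbers, and variable names are mine; they only illustrate the re-use):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward: h = sigmoid(w1 * x),  y_hat = sigmoid(w2 * h),  L = 1/2 (y - y_hat)^2
x, y = 1.0, 1.0
w1, w2, lr = 0.5, 0.5, 0.5

for _ in range(200):
    # forward pass (keep intermediates; they get re-used below)
    h = sigmoid(w1 * x)
    y_hat = sigmoid(w2 * h)

    # backward pass: chain rule, re-using already-computed pieces
    dL_dyhat = -(y - y_hat)          # dL/dy_hat
    dyhat_dz2 = y_hat * (1 - y_hat)  # sigmoid derivative at the output
    delta2 = dL_dyhat * dyhat_dz2    # dL/dz2, computed once ...
    dL_dw2 = delta2 * h              # ... re-used for w2
    dL_dh = delta2 * w2              # ... and propagated to the hidden unit
    dL_dw1 = dL_dh * h * (1 - h) * x # chain rule one layer further back

    w1 -= lr * dL_dw1                # gradient updates
    w2 -= lr * dL_dw2

print(y_hat)  # approaches the target y = 1
```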
SUMMARY
• Perceptron
• Neural networks
• Gradient descent
• Backpropagation
MNIST database

Experiments with the MNIST database:
• The MNIST database of handwritten digits
• Training set of 60,000 examples, test set of 10,000 examples
• Vectors in ℝ^784 (28×28 images)
• Labels are the digits they represent
• Various methods have been tested with this training set and test set (see the sketch below)
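As a hedged example of such an experiment (the library choice is mine; scikit-learn's fetch_openml serves the images as flattened 784-dimensional vectors, and MLPClassifier stands in for the "various methods"):

```python
from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier

# Download MNIST: 70,000 images of 28x28 pixels, flattened to 784-vectors.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, X_test = X[:60000] / 255.0, X[60000:] / 255.0  # the standard 60k/10k split
y_train, y_test = y[:60000], y[60000:]

clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=20)  # one hidden layer
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```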