
Humanity – Service – Liberation

Chapter 8

Neural Networks & Deep Learning


Machine Learning
CONTENTS

• Perceptron

• Neural networks

• Gradient descent

• Backpropagation
Perceptron

1950s Age of the Perceptron


1957 The Perceptron (Rosenblatt)
1969 Perceptrons (Minsky, Papert)

1980s Age of the Neural Network


1986 Backpropagation (Hinton)

1990s Age of the Graphical Model


2000s Age of the Support Vector Machine

2010s Age of the Deep Network

Deep Learning = Known algorithms + Computing power + Big data


…Perceptron
Inspiration from Biology

• Neural nets/perceptrons are loosely inspired by biology.


• But they certainly are not a model of how the brain works,
or even how neurons work.
…Perceptron

The input is an N-d binary vector x and the weights form a vector w;
the perceptron predicts the label ŷ = sign(w · x).

The perceptron is just one line of code!

(by convention, the sign of zero is taken to be +1)
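A minimal C++ sketch of that one line (the helper name and types here are ours, not the slides'):

#include <vector>
using std::vector;

// Perceptron prediction: the sign of the weighted sum w · x
int predict(const vector<float>& x, const vector<float>& w) {
    float a = 0.0f;
    for (size_t i = 0; i < x.size(); ++i) a += w[i] * x[i];   // weighted sum
    return (a >= 0.0f) ? +1 : -1;                             // sign of zero is +1
}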
…Perceptron

A worked run of the perceptron learning rule:

initialize w = (0, 0)

observation x = (1, -1), label y = -1
prediction: sign(w · x) = sign(0) = +1, which does not match the label
update w: w ← w + y · x = (0, 0) + (-1) · (1, -1) = (-1, 1)

observation x = (-1, 1), label y = +1
prediction: sign(w · x) = sign((-1)(-1) + (1)(1)) = sign(2) = +1, which matches the label
update w: no change needed, w stays (-1, 1)

repeat over the training data until every observation is predicted correctly …
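A short C++ sketch of one pass of that rule, assuming ±1 labels and the mismatch-only update w ← w + y · x shown above (the container layout and names are illustrative):

#include <vector>
using std::vector;

// One epoch of perceptron learning on data (X[n], Y[n]) with labels in {-1, +1}
void perceptron_epoch(const vector<vector<float>>& X, const vector<int>& Y,
                      vector<float>& w) {
    for (size_t n = 0; n < X.size(); ++n) {
        float a = 0.0f;
        for (size_t i = 0; i < w.size(); ++i) a += w[i] * X[n][i];   // w · x
        int pred = (a >= 0.0f) ? +1 : -1;                            // sign of zero is +1
        if (pred != Y[n])                                            // no match: update
            for (size_t i = 0; i < w.size(); ++i) w[i] += Y[n] * X[n][i];
    }
}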
…Perceptron
Another way to draw it…

The same perceptron drawn as a diagram: inputs flow along weighted edges into an output node.

(1) Combine the sum and the activation function into a single node; the output is the
activation function (e.g., a sigmoid) applied to the weighted sum.

(2) Suppress the bias term (less clutter).
…Perceptron
Programming the 'forward pass'

Activation function (sigmoid, logistic function):

// needs: #include <cmath>, #include <vector>, using std::vector;
float f(float a){
    return 1.0f / (1.0f + exp(-a));   // squashes any real value into (0, 1)
}

// dot(x, w): inner product of the input and weight vectors
float dot(vector<float> x, vector<float> w){
    float s = 0.0f;
    for (size_t i = 0; i < x.size(); ++i) s += x[i] * w[i];
    return s;
}

Perceptron function (logistic regression): apply the activation to the weighted sum of the inputs.

float perceptron(vector<float> x, vector<float> w){
    float a = dot(x, w);   // weighted sum
    return f(a);           // output of the neuron
}
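A quick usage sketch, assuming the functions above are in scope (the input and weight values are made up for illustration):

#include <cstdio>
#include <vector>
using std::vector;

// f, dot and perceptron as defined above

int main() {
    vector<float> x = {1.0f, -1.0f, 0.5f};   // hypothetical input
    vector<float> w = {0.3f, -0.2f, 0.8f};   // hypothetical weights
    std::printf("output = %f\n", perceptron(x, w));   // a value in (0, 1)
    return 0;
}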
CONTENTS

• Perceptron

• Neural networks
• Gradient descent

• Backpropagation

• Stochastic gradient descent

Neural networks

Connect a bunch of perceptrons together …

a collection of connected perceptrons

‘six perceptrons’
Neural networks

Some terminology…

‘hidden’ layer
‘input’ layer ‘output’ layer

…also called a Multi-layer Perceptron (MLP)


Neural networks
this layer is a
‘fully connected layer’

all pairwise neurons between layers are connected


Neural networks

For the network shown (3 inputs, a hidden layer of 4 neurons, an output layer of 2 neurons):

How many neurons (perceptrons)? 4 + 2 = 6

How many weights (edges)? (3 x 4) + (4 x 2) = 20

How many learnable parameters in total (weights plus one bias per neuron)? 20 + 6 = 26
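A hedged C++ sketch of one fully connected (dense) layer of the kind being counted; the 3-4-2 shape is inferred from the counts above, and the sigmoid activation and all names are illustrative assumptions:

#include <cmath>
#include <vector>
using std::vector;

// One fully connected layer: out[j] = sigmoid( b[j] + sum_i W[j][i] * in[i] )
vector<float> dense(const vector<vector<float>>& W, const vector<float>& b,
                    const vector<float>& in) {
    vector<float> out(W.size());
    for (size_t j = 0; j < W.size(); ++j) {
        float a = b[j];                                   // bias of neuron j
        for (size_t i = 0; i < in.size(); ++i) a += W[j][i] * in[i];
        out[j] = 1.0f / (1.0f + std::exp(-a));            // sigmoid activation
    }
    return out;
}

// For a 3-4-2 network: hidden = dense(W1, b1, x) with W1 of size 4x3,
// output = dense(W2, b2, hidden) with W2 of size 2x4,
// giving (3*4 + 4*2) = 20 weights plus (4 + 2) = 6 biases, 26 parameters in all.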


Neural networks

performance usually tops out at 2-3 layers,


deeper networks don’t really improve performance...

...with the exception of Convolutional Neural Networks for images


CONTENTS

• Perceptron.

• Neural networks.

• Gradient descent
• Backpropagation.

• Stochastic gradient descent.

Gradient descent

Loss function: defines what it means to be close to the true solution.

You choose the loss function! (some loss functions are better than others, depending on what you want to do)

Squared Error (a popular loss function): L(ŷ, y) = (ŷ - y)^2
Gradient descent

world's smallest perceptron!
(a.k.a. the line equation, linear regression): a single weight w, with ŷ = w · x

Given several examples (x, y)

and a perceptron ŷ = w · x,

modify the weight w (the perceptron parameter) such that the perceptron output ŷ gets 'closer' to the true label y.
…Gradient descent

Code to train the perceptron: the weight update is just one line of code!

Now where does this update come from?

…Gradient descent

update rule: w ← w - η · dL/dw   (take a small step, of size η, against the gradient of the loss)
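A minimal sketch of that one-line update in C++, assuming the tiny model ŷ = w · x and the loss L = ½(ŷ - y)^2 so that dL/dw = (ŷ - y) · x; the names and learning rate are illustrative:

// One gradient-descent step for the world's smallest perceptron: yhat = w * x
float gd_step(float w, float x, float y, float lr) {
    float yhat = w * x;            // forward pass
    float grad = (yhat - y) * x;   // dL/dw for L = 0.5 * (yhat - y)^2
    return w - lr * grad;          // the one-line update: w <- w - lr * dL/dw
}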
CONTENTS

• Perceptron.

• Neural networks.

• Gradient descent.

• Backpropagation.
• Stochastic gradient descent.

Backpropagation

Training the world's smallest perceptron: the loss is a function of ONE parameter!

This is just gradient descent; that means the quantity we subtract in the update should be the gradient of the loss function with respect to the weight.

Now where does this come from?


Backpropagation

The derivative dL/dw is the rate at which this will change (the loss function) per unit change of this (the weight parameter).

Let's compute the derivative…


Backpropagation

Compute the derivative of the loss with respect to the weight ("dL/dw" is just shorthand for that).

That means the weight update for gradient descent is
w ← w - η · dL/dw,
i.e., move in the direction of the negative gradient.
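Worked out as a sketch, assuming the tiny model ŷ = w x and the squared error with the common 1/2 convention (the slides' exact constant may differ):

L(w) = \tfrac{1}{2}\,(\hat{y} - y)^2, \qquad \hat{y} = w\,x

\frac{dL}{dw} = \frac{dL}{d\hat{y}} \cdot \frac{d\hat{y}}{dw} = (\hat{y} - y)\,x

w \leftarrow w - \eta\,(\hat{y} - y)\,x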
Backpropagation

Gradient Descent (world’s smallest perceptron)

For each sample

1. Predict

a. Forward pass

b. Compute Loss

2. Update

a. Back Propagation

b. Gradient update
Backpropagation

world’s (second) smallest perceptron!

function of two parameters!


Backpropagation
Derivative computation

Gradient Update
Backpropagation

Gradient Descent

For each sample

1. Predict

a. Forward pass

b. Compute Loss (a side computation to track the loss; not needed for backprop)

2. Update (two lines now: one per parameter)

a. Back Propagation

b. Gradient update (adjustable step size)
Backpropagation

multi-layer perceptron

function of FOUR parameters and FOUR layers!


Backpropagation

The network is a chain of four layers: the input (layer 1) is multiplied by a weight and summed, and an activation gives hidden layer 2; another weight and activation give hidden layer 3; a final weight and activation give the output (layer 4).
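As a sketch of that chain in symbols, assuming one weight per layer transition and a shared activation f (the notation is ours, not the slides'):

a_2 = f(w_1\,x), \qquad a_3 = f(w_2\,a_2), \qquad \hat{y} = f(w_3\,a_3)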
Backpropagation

The entire network can be written out as one long (nested) equation.

We need to train the network:

What is known? What is unknown?

Known: the training samples (the inputs and their labels).
Unknown: the weights; the activation function sometimes has unknown parameters too.
Backpropagation

Learning an MLP

Given a set of samples and an MLP,

Estimate the parameters of the MLP


Backpropagation

Gradient Descent

For each random sample

1. Predict

a. Forward pass

b. Compute Loss

2. Update

a. Back Propagation (gives the vector of parameter partial derivatives)

b. Gradient update (the vector of parameter update equations)
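A sketch of one such predict/update step for the scalar chain above, assuming sigmoid activations, the loss L = ½(ŷ - y)^2, and one weight per layer transition (all names are illustrative, not the slides' code):

#include <cmath>

struct Params { float w1, w2, w3; };

float sigmoid(float a) { return 1.0f / (1.0f + std::exp(-a)); }

// One gradient-descent step on a single sample (x, y)
void train_step(Params& p, float x, float y, float lr) {
    // 1. Predict: forward pass through the chain
    float a2   = sigmoid(p.w1 * x);    // hidden layer 2
    float a3   = sigmoid(p.w2 * a2);   // hidden layer 3
    float yhat = sigmoid(p.w3 * a3);   // output layer 4
    // (the loss 0.5 * (yhat - y)^2 could be tracked here; not needed for backprop itself)

    // 2a. Back Propagation: chain rule, re-using terms from later layers
    float d4 = (yhat - y) * yhat * (1.0f - yhat);  // dL/d(pre-activation of layer 4)
    float d3 = d4 * p.w3 * a3 * (1.0f - a3);       // propagated back to layer 3
    float d2 = d3 * p.w2 * a2 * (1.0f - a2);       // propagated back to layer 2

    // 2b. Gradient update: one update per parameter
    p.w3 -= lr * d4 * a3;
    p.w2 -= lr * d3 * a2;
    p.w1 -= lr * d2 * x;
}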
Backpropagation

So we need to compute the partial derivatives of the loss with respect to each weight.

Backpropagation

According to the chain rule…

Intuitively, the effect of a weight on the loss function is felt through the rest of the network: the loss depends on the output, the output depends on the activations downstream of the weight, and those depend on the weight itself.
Backpropagation

The derivative of the loss with respect to a weight factors through the rest of the network. Chain Rule!
Backpropagation

The first factor is just the partial derivative of the L2 loss; everything else belongs to the rest of the network.
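For reference, that first factor under the common 1/2 convention for the squared (L2) loss (the slides' constant may differ):

L = \tfrac{1}{2}\,(\hat{y} - y)^2 \;\Rightarrow\; \frac{\partial L}{\partial \hat{y}} = \hat{y} - y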
Backpropagation

The next factor is the derivative of the activation; let's use a Sigmoid function.
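The sigmoid's derivative has a convenient closed form, which is what makes it handy here (a standard identity):

\sigma(a) = \frac{1}{1 + e^{-a}}, \qquad \frac{d\sigma}{da} = \sigma(a)\,\bigl(1 - \sigma(a)\bigr)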


Backpropagation

The Chain Rule (a.k.a. backpropagation)

The chain rule says: the loss depends on the output, the output depends on the last activation, which depends on its weighted sum, which depends on the previous layer's activation, and so on back to the weight we care about. The partial derivative of the loss with respect to a weight is the product of the local derivatives along this chain of dependencies.

Moving from a later weight to an earlier one, most of those factors have already been computed; re-use (propagate) them instead of recomputing. Working backwards through the layers and re-using these terms is exactly backpropagation.
Gradient Descent

For each training sample

1. Predict

a. Forward pass

b. Compute Loss

2. Update

a. Back Propagation (gives the vector of parameter partial derivatives)

b. Gradient update (the vector of parameter update equations)
SUMMARY

• Perceptron

• Neural networks

• Gradient descent

• Backpropagation

MNIST database
Experiments with the MNIST database
• The MNIST database of handwritten digits
• Training set of 60,000 examples, test set of 10,000 examples
• Vectors in R^784 (28x28 images)
• Labels are the digits they represent
• Various methods have been tested with this training set and test set

• Linear models: 7% - 12% error


• KNN: 0.5%- 5% error
• Neural networks: 0.35% - 4.7% error
• Convolutional NN: 0.23% - 1.7% error
Demo
Tinker With a Neural Network Right Here in Your Browser
• Open source software (the TensorFlow Playground) to play with neural networks in your browser.
• The dots are colored orange or blue for positive and negative examples.
• It's possible to choose the activation function, architecture, learning rate, etc.
• Very well done! Let’s check it out!

Humanity – Service – Liberation

Enjoy the Course…!
