ANN - Back Propagation
Decision Boundary
• 0 hidden layers: linear classifier
– Hyperplanes
[Figure: linear decision boundary (hyperplane) in (x1, x2) space]
Decision Boundary
• 1 hidden layer
– Boundary of convex region (open or closed)
[Figure: decision boundary enclosing a convex region in (x1, x2) space]
Decision Boundary
• 2 hidden layers
– Combinations of convex regions
[Figure: decision boundary formed from combinations of convex regions in (x1, x2) space, with output y]
Decision Functions: Different Levels of Abstraction
• We don’t know the “right” levels of abstraction
• So let the model figure it out!
Example from Honglak Lee (NIPS 2010)
Decision Functions: Different Levels of Abstraction
Face Recognition:
– A deep network can build up increasingly higher levels of abstraction
– Lines, parts, regions
Example from Honglak Lee (NIPS 2010)
Decision Functions: Different Levels of Abstraction
[Figure: feed-forward network with Input x1, x2, x3, …, xM; Hidden Layer 1 a1…aD; Hidden Layer 2 b1…bE; Hidden Layer 3 c1…cF; Output y]
Example from Honglak Lee (NIPS 2010)
Neural Network Architectures
Even for a basic Neural Network, there are many design decisions to make (see the sketch after this list):
1. # of hidden layers (depth)
2. # of units per hidden layer (width)
3. Type of activation function (nonlinearity)
4. Form of objective function
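To make these four knobs concrete, here is a minimal sketch in plain numpy (all names here are my own, not from the slides): depth and width are set by a list of hidden-layer sizes, the activation is a swappable function, and the objective is whatever loss gets applied to the final linear output.

```python
import numpy as np

def init_mlp(input_dim, hidden_widths, output_dim, seed=0):
    """hidden_widths: its length is the depth (decision 1),
    its entries are the width of each hidden layer (decision 2)."""
    rng = np.random.default_rng(seed)
    dims = [input_dim] + list(hidden_widths) + [output_dim]
    # One (weight matrix, bias vector) pair per layer.
    return [(rng.normal(0.0, 0.1, (d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(params, x, act=np.tanh):
    # Decision 3: `act` is the activation applied at every hidden layer.
    for W, b in params[:-1]:
        x = act(x @ W + b)
    W, b = params[-1]
    # Decision 4 is the objective function applied to this linear output.
    return x @ W + b

params = init_mlp(input_dim=3, hidden_widths=[4, 4], output_dim=2)
print(forward(params, np.ones(3)))
```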
Activation Functions
Sigmoid / Logistic Function:
σ(x) = 1 / (1 + e^(−x))
So far, we’ve assumed that the activation function (nonlinearity) is always the sigmoid function…
Activation Functions
• A new change: modifying the nonlinearity
– The logistic is not widely used in modern ANNs
Alternate 1: tanh
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) = 2σ(2x) − 1, with outputs in (−1, 1) rather than (0, 1)
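Since tanh is just a rescaled, recentered sigmoid, the identity tanh(x) = 2σ(2x) − 1 is easy to verify numerically; a quick check (plain numpy, my own example):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4, 4, 9)
# tanh(x) = 2 * sigmoid(2x) - 1: same S-shape, but zero-centered,
# with outputs in (-1, 1) instead of (0, 1).
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True
```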
Multi-Class Output
[Figure: network with Input x1, x2, x3, …, xM; Hidden Layer a1…aD; and K output units y1…yK]
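The slides do not show the output nonlinearity for the multi-class case; a standard choice is the softmax, which turns the K output scores into a probability distribution over the K classes. A minimal sketch (toy values, my own names):

```python
import numpy as np

def softmax(scores):
    scores = scores - np.max(scores)  # shift for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

a = np.array([0.5, -1.2, 2.0])        # hidden activations a1..aD (toy values)
W = np.full((3, 4), 0.1)              # weights from D=3 hidden units to K=4 outputs
y = softmax(a @ W)
print(y, y.sum())                     # K probabilities summing to 1
```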
Training: Backpropagation
• Question 1: When can we compute the gradients of the parameters of an arbitrary neural network?
• Question 2: When can we make the gradient computation efficient?
Training: Chain Rule
Given: y = g(u) and u = h(x), so that y_i = g_i(u_1, …, u_J) and u_j = h_j(x_1, …, x_M)
Chain Rule: dy_i/dx_k = Σ_j (dy_i/du_j)(du_j/dx_k), for all i, k
[Figure: computation graph in which x2 feeds u1, u2, …, uJ, which in turn feed y1]
Training: Chain Rule
Given: y = g(u) and u = h(x)
Chain Rule: dy_i/dx_k = Σ_j (dy_i/du_j)(du_j/dx_k), for all i, k
Backpropagation is just repeated application of the chain rule from Calculus 101.
[Figure: same computation graph as above, with x2 feeding u1, u2, …, uJ, which feed y1]
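To see the chain rule doing the work backpropagation relies on, here is a tiny worked example (my own, not from the slides): with u1 = x², u2 = sin(x), and y = u1·u2, the chain rule gives dy/dx = u2·2x + u1·cos(x), which a finite difference confirms.

```python
import numpy as np

def y_of(x):
    u1, u2 = x**2, np.sin(x)      # intermediate quantities u_j = h_j(x)
    return u1 * u2                # y = g(u1, u2)

def dy_dx(x):
    u1, u2 = x**2, np.sin(x)
    # dy/dx = (dy/du1)(du1/dx) + (dy/du2)(du2/dx)
    return u2 * 2 * x + u1 * np.cos(x)

x, eps = 1.3, 1e-6
finite_diff = (y_of(x + eps) - y_of(x - eps)) / (2 * eps)
print(np.isclose(dy_dx(x), finite_diff))  # True
```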
Training: Backpropagation
Case 1: Logistic Regression
[Figure: Input x1, x2, x3, …, xM connected by weights θ1, θ2, θ3, …, θM directly to Output y]
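A minimal sketch of Case 1 (the slides' equations were not preserved, so this uses my own notation): the forward pass computes a = θ·x and y = σ(a); with the cross-entropy objective J = −y* log y − (1 − y*) log(1 − y), the backward pass collapses to dJ/da = y − y* and hence dJ/dθ = (y − y*) x.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def logistic_grad(theta, x, y_star):
    # Forward pass
    a = theta @ x                                  # linear score
    y = sigmoid(a)                                 # predicted probability
    J = -(y_star * np.log(y) + (1 - y_star) * np.log(1 - y))
    # Backward pass: dJ/da = y - y*, so dJ/dtheta = (y - y*) * x
    return J, (y - y_star) * x

theta = np.array([0.2, -0.5, 0.1])
x = np.array([1.0, 2.0, -1.0])
print(logistic_grad(theta, x, y_star=1.0))
```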
Training: Backpropagation
Case 2: Neural Network
[Figure: Input x1, x2, x3, …, xM; Hidden Layer z1, z2, …, zD; Output y]
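For Case 2, here is a compact forward/backward sketch for a one-hidden-layer network with sigmoid units, a single sigmoid output, and cross-entropy loss (my own implementation of the standard recipe; the slides' derivation was not preserved). The efficiency point: the backward pass reuses the forward pass's intermediate quantities instead of recomputing them.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward_backward(alpha, beta, x, y_star):
    """alpha: (D, M) weights into hidden units z1..zD;
    beta: (D,) weights into the single sigmoid output y."""
    # Forward pass
    z = sigmoid(alpha @ x)             # hidden activations, shape (D,)
    y = sigmoid(beta @ z)              # output probability
    J = -(y_star * np.log(y) + (1 - y_star) * np.log(1 - y))
    # Backward pass: repeated chain rule, reusing z and y from above
    dJ_db = y - y_star                 # gradient at the output pre-activation
    g_beta = dJ_db * z                 # dJ/dbeta
    dJ_dz = dJ_db * beta               # chain through the output weights
    dJ_da = dJ_dz * z * (1 - z)        # chain through the hidden sigmoids
    g_alpha = np.outer(dJ_da, x)       # dJ/dalpha, shape (D, M)
    return J, g_alpha, g_beta

alpha = np.random.default_rng(0).normal(0.0, 0.1, (3, 4))  # D=3 hidden, M=4 inputs
beta = np.array([0.3, -0.2, 0.5])
x = np.array([1.0, 0.5, -1.0, 2.0])
print(forward_backward(alpha, beta, x, y_star=1.0))
```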
Summary
1. Neural Networks…
– provide a way of learning features
– are highly nonlinear prediction functions
– (can be) a highly parallel network of logistic regression classifiers
– discover useful hidden representations of the input
2. Backpropagation…
– provides an efficient way to compute gradients
– is a special case of reverse-mode automatic differentiation