Winter 1516, Lecture 5
A bit of history
- Frank Rosenblatt, ~1957: the Mark I Perceptron machine recognized letters of the alphabet; its weights were adjusted with the perceptron update rule
- Rumelhart et al., 1986: backpropagation first presented in recognizable maths
- Hinton and Salakhutdinov, 2006: reinvigorated research in Deep Learning
First strong results
Overview
1. One-time setup: activation functions, preprocessing, weight initialization, regularization, gradient checking
2. Training dynamics: babysitting the learning process, parameter updates, hyperparameter optimization
3. Evaluation: model ensembles
Activation Functions
- Sigmoid
- tanh: tanh(x)
- ReLU: max(0, x)
- Leaky ReLU: max(0.1x, x)
- Maxout
- ELU
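For reference, a minimal NumPy sketch (my own, not lecture code) of the simpler activations above. Maxout is omitted because it learns two sets of weights rather than applying a fixed elementwise function; the 0.1 slope for Leaky ReLU and alpha = 1.0 for ELU are assumed defaults.

import numpy as np

def sigmoid(x):
    # squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes to (-1, 1), zero-centered
    return np.tanh(x)

def relu(x):
    # max(0, x): identity for positive inputs, zero otherwise
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.1):
    # like ReLU, but keeps a small slope (here 0.1) for negative inputs
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # exponential linear unit; alpha = 1.0 is an assumed default
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
for f in (sigmoid, tanh, relu, leaky_relu, elu):
    print(f.__name__, f(x))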
Sigmoid: σ(x) = 1 / (1 + e^-x)
- Squashes numbers to range [0,1]
- Historically popular since they have a nice interpretation as a saturating “firing rate” of a neuron

3 problems:
1. Saturated neurons “kill” the gradients
2. Sigmoid outputs are not zero-centered
3. exp() is a bit compute expensive
Problem 1: saturated neurons “kill” the gradients. Consider a sigmoid gate in a computational graph receiving an input x: when x is very negative or very positive the sigmoid saturates, its local gradient dσ/dx is nearly zero, and the upstream gradient gets multiplied by that near-zero value on its way back, so almost no gradient flows to x or to the weights below.
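A small numerical sketch of this effect (my own, not from the slides): the local gradient of the sigmoid is sigma(x) * (1 - sigma(x)), which is about 1/4 at x = 0 and essentially zero once |x| is around 10.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# local gradient of the sigmoid gate: d(sigma)/dx = sigma(x) * (1 - sigma(x))
for x in (-10.0, 0.0, 10.0):
    s = sigmoid(x)
    local_grad = s * (1 - s)
    # any upstream gradient is multiplied by local_grad on its way back to x
    print(f"x = {x:6.1f}  sigma(x) = {s:.5f}  dsigma/dx = {local_grad:.2e}")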
Problem 2: consider what happens when the input to a neuron (x) is always positive, e.g. f( Σ_i w_i x_i + b ). What can we say about the gradients on w?
They are always all positive or all negative :(
Since dL/dw_i = (dL/df) * x_i and every x_i > 0, all the weight gradients share the sign of the upstream gradient, so each update can only move w in an “all-increase” or “all-decrease” direction; reaching a hypothetical optimal w vector then requires an inefficient zig-zag path. (This is also why you want zero-mean data!)
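A tiny sketch of that sign argument (my own code, not from the slides), with a hypothetical upstream gradient dL/df:

import numpy as np

np.random.seed(0)
x = np.abs(np.random.randn(5)) + 0.1   # inputs to the neuron, all positive
w = np.random.randn(5)
b = 0.0

f = w.dot(x) + b                        # pre-activation f = w.x + b
upstream = np.random.randn()            # hypothetical upstream gradient dL/df

grad_w = upstream * x                   # dL/dw_i = dL/df * x_i
print("dL/df:", upstream)
print("dL/dw:", grad_w)                 # every entry has the sign of dL/df
print("all one sign:", np.all(grad_w > 0) or np.all(grad_w < 0))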
tanh(x)
- Squashes numbers to range [-1,1]
- Zero-centered (nice)
- Still kills gradients when saturated :(
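A quick numerical check of these bullets (my sketch, not lecture code): outputs are symmetric around zero, but the gradient 1 - tanh(x)^2 still vanishes for large |x|.

import numpy as np

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
t = np.tanh(x)
grad = 1.0 - t ** 2           # d(tanh)/dx
print("tanh(x):   ", t)       # values in [-1, 1], centered on 0
print("d tanh/dx: ", grad)    # ~0 at x = -10 and x = 10: saturation still kills gradients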
ReLU (Rectified Linear Unit) [Krizhevsky et al., 2012]
- Computes f(x) = max(0, x)
- Does not saturate (in the + region)
- Very computationally efficient
- Converges much faster than sigmoid/tanh in practice (e.g. 6x)
Now consider a ReLU gate in a computational graph receiving an input x: for x > 0 the gate passes the upstream gradient through unchanged, but for x < 0 its local gradient is 0, so no gradient flows back at all.
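A minimal sketch of such a gate's forward and backward pass (my own code; the relu_forward / relu_backward names are just illustrative), assuming the upstream gradient is 1 everywhere:

import numpy as np

def relu_forward(x):
    out = np.maximum(0, x)
    cache = x                      # keep the input for the backward pass
    return out, cache

def relu_backward(dout, cache):
    x = cache
    dx = dout * (x > 0)            # pass gradient where x > 0, kill it where x <= 0
    return dx

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
out, cache = relu_forward(x)
dout = np.ones_like(x)             # pretend the upstream gradient is 1 everywhere
print("out:", out)
print("dx: ", relu_backward(dout, cache))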