Winter1516 Lecture 5

[Figure: data cloud in input space, with an active ReLU (crossing the data) and a dead ReLU (outside the data)]

dead ReLU:
will never activate
=> never update
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 41 20 Jan 2016
[Figure: same data cloud, with an active ReLU and a dead ReLU]

dead ReLU:
will never activate
=> never update

=> people like to initialize ReLU neurons with slightly
positive biases (e.g. 0.01)
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 42 20 Jan 2016
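A minimal numpy sketch of that idea (not from the slides; layer sizes and data are made up for illustration): start ReLU units with a slightly positive bias so they begin in the active regime, and note that a unit that outputs zero for every input gets no gradient.

import numpy as np

D, H = 784, 100                      # hypothetical layer sizes
W = 0.01 * np.random.randn(D, H)     # small random weights
b = 0.01 * np.ones(H)                # slightly positive biases (e.g. 0.01)

x = np.random.randn(32, D)           # a fake minibatch
h = np.maximum(0, x.dot(W) + b)      # ReLU activations

# A "dead" unit outputs zero for every example: it receives no
# gradient through the ReLU and therefore never updates.
dead = (h > 0).mean(axis=0) == 0
print("fraction of dead units:", dead.mean())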
Activation Functions
[Maas et al., 2013]
[He et al., 2015]

Leaky ReLU
f(x) = max(0.01x, x)

- Does not saturate
- Computationally efficient
- Converges much faster than sigmoid/tanh in practice! (e.g. 6x)
- will not “die”.
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 43 20 Jan 2016
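A small numpy sketch of the Leaky ReLU forward pass (illustrative only):

import numpy as np

def leaky_relu(x):
    # f(x) = max(0.01*x, x): identity for x > 0, a small slope of
    # 0.01 for x <= 0, so negative inputs still pass some gradient.
    return np.maximum(0.01 * x, x)

print(leaky_relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
# [-0.02  -0.005  0.  0.5  2. ]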
Activation Functions
[Maas et al., 2013]
[He et al., 2015]

Leaky ReLU
f(x) = max(0.01x, x)

- Does not saturate
- Computationally efficient
- Converges much faster than sigmoid/tanh in practice! (e.g. 6x)
- will not “die”.

Parametric Rectifier (PReLU)
f(x) = max(\alpha x, x)
backprop into \alpha (parameter)
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 44 20 Jan 2016
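A sketch of a PReLU forward/backward pass in numpy, assuming a single scalar \alpha shared across the layer (shapes and names are illustrative):

import numpy as np

def prelu_forward(x, alpha):
    # f(x) = max(alpha*x, x), with alpha a learned parameter
    out = np.where(x > 0, x, alpha * x)
    return out, (x, alpha)

def prelu_backward(dout, cache):
    x, alpha = cache
    dx = dout * np.where(x > 0, 1.0, alpha)
    # "backprop into alpha": alpha only affects the negative inputs
    dalpha = np.sum(dout * np.where(x > 0, 0.0, x))
    return dx, dalpha

out, cache = prelu_forward(np.random.randn(4, 5), alpha=0.25)
dx, dalpha = prelu_backward(np.ones_like(out), cache)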
Activation Functions
[Clevert et al., 2015]

Exponential Linear Units (ELU)
f(x) = x                      if x > 0
f(x) = \alpha (exp(x) - 1)    if x <= 0

- All benefits of ReLU
- Does not die
- Closer to zero mean outputs
- Computation requires exp()
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 45 20 Jan 2016
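A numpy sketch of the ELU forward pass (with the default \alpha = 1, as an illustration):

import numpy as np

def elu(x, alpha=1.0):
    # x for x > 0; alpha*(exp(x) - 1) for x <= 0, which saturates
    # smoothly to -alpha and pulls mean activations closer to zero.
    return np.where(x > 0, x, alpha * np.expm1(x))

print(elu(np.array([-3.0, -1.0, 0.0, 1.0, 3.0])))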
Maxout “Neuron”
[Goodfellow et al., 2013]

max(w_1^T x + b_1, w_2^T x + b_2)

- Does not have the basic form of dot product -> nonlinearity
- Generalizes ReLU and Leaky ReLU
- Linear Regime! Does not saturate! Does not die!

Problem: doubles the number of parameters/neuron :(
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 46 20 Jan 2016
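A sketch of a maxout layer with k = 2 linear pieces in numpy (sizes are illustrative); the two weight matrices are where the doubled parameter count comes from:

import numpy as np

N, D, H = 32, 100, 50                 # illustrative sizes
x = np.random.randn(N, D)

# Two independent affine maps per output unit (k = 2)
W1, b1 = 0.01 * np.random.randn(D, H), np.zeros(H)
W2, b2 = 0.01 * np.random.randn(D, H), np.zeros(H)

# Maxout: elementwise max of the two affine maps.
# With W1 = 0 and b1 = 0 this reduces to an ordinary ReLU.
h = np.maximum(x.dot(W1) + b1, x.dot(W2) + b2)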
TLDR: In practice:

- Use ReLU. Be careful with your learning rates
- Try out Leaky ReLU / Maxout / ELU
- Try out tanh but don’t expect much
- Don’t use sigmoid

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 47 20 Jan 2016
Data Preprocessing

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 48 20 Jan 2016
Step 1: Preprocess the data

(Assume X [N x D] is the data matrix, each example in a row)
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 49 20 Jan 2016
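A short numpy sketch of the usual zero-centering and normalization step for an [N x D] data matrix X (the fake data is just for illustration):

import numpy as np

X = np.random.randn(1000, 50) * 3.0 + 5.0   # fake [N x D] data

X -= np.mean(X, axis=0)   # zero-center each feature (column)
X /= np.std(X, axis=0)    # normalize each feature to unit variance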
Step 1: Preprocess the data
In practice, you may also see PCA and Whitening of the data

(after PCA: data has a diagonal covariance matrix)
(after whitening: the covariance matrix is the identity matrix)

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 50 20 Jan 2016
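A hedged numpy sketch of what PCA and whitening mean here, assuming X is the zero-centered [N x D] data matrix (the 1e-5 is a small fudge factor to avoid division by zero):

import numpy as np

X = np.random.randn(1000, 50)           # fake data
X -= X.mean(axis=0)                     # zero-center first

cov = X.T.dot(X) / X.shape[0]           # [D x D] covariance matrix
U, S, V = np.linalg.svd(cov)

Xrot = X.dot(U)                         # decorrelate: diagonal covariance
Xwhite = Xrot / np.sqrt(S + 1e-5)       # whiten: ~identity covariance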
TLDR: In practice for Images: center only
e.g. consider CIFAR-10 example with [32,32,3] images
- Subtract the mean image (e.g. AlexNet)
(mean image = [32,32,3] array)
- Subtract per-channel mean (e.g. VGGNet)
(mean along each channel = 3 numbers)

Not common to normalize variance, to do PCA or whitening.

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 51 20 Jan 2016
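A numpy sketch contrasting the two options above (array shapes follow the CIFAR-10 example; the data is fake and the batch is small for illustration):

import numpy as np

X = np.random.rand(1000, 32, 32, 3).astype(np.float32)   # fake images

# AlexNet-style: subtract the mean image (a [32, 32, 3] array)
mean_image = X.mean(axis=0)
X_centered_image = X - mean_image

# VGGNet-style: subtract the per-channel mean (3 numbers)
mean_channel = X.mean(axis=(0, 1, 2))
X_centered_channel = X - mean_channel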
Weight Initialization

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 52 20 Jan 2016
- Q: what happens when W=0 init is used?

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 53 20 Jan 2016
- First idea: Small random numbers
(gaussian with zero mean and 1e-2 standard deviation)

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 54 20 Jan 2016
- First idea: Small random numbers
(gaussian with zero mean and 1e-2 standard deviation)

Works ~okay for small networks, but can lead to
non-homogeneous distributions of activations
across the layers of a network.

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 55 20 Jan 2016
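The initialization described above is one line of numpy (layer sizes are illustrative):

import numpy as np

D, H = 500, 500                   # illustrative layer sizes

# small random numbers: zero-mean Gaussian, standard deviation 0.01
W = 0.01 * np.random.randn(D, H)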
Let's look at some activation statistics.

E.g. a 10-layer net with 500 neurons on each layer, using tanh
non-linearities, and initializing as described on the last slide.

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 56 20 Jan 2016
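A sketch of that experiment in numpy (the slide shows similar code): build the 10-layer tanh net with the 0.01-scaled Gaussian initialization and print the mean and standard deviation of the activations at each layer.

import numpy as np

D = np.random.randn(1000, 500)            # random input data
hidden_layer_sizes = [500] * 10
act = np.tanh

Hs = {}
H = D
for i, fan_out in enumerate(hidden_layer_sizes):
    fan_in = H.shape[1]
    W = 0.01 * np.random.randn(fan_in, fan_out)   # small random init
    H = act(H.dot(W))
    Hs[i] = H

for i in range(len(hidden_layer_sizes)):
    print("layer %d: mean %+.5f, std %.5f" % (i + 1, Hs[i].mean(), Hs[i].std()))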
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 57 20 Jan 2016
All activations become zero!

Q: Think about the backward pass. What do the gradients look like?

Hint: think about the backward pass for a W*X gate.

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 58 20 Jan 2016
*1.0 instead of *0.01

Almost all neurons completely saturated, either -1 or 1.
Gradients will be all zero.

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 59 20 Jan 2016
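The same sketch with the weight scale changed from 0.01 to 1.0 shows the saturation (illustrative):

import numpy as np

fan_in, fan_out = 500, 500

# Scaling by 1.0 instead of 0.01: tanh inputs become large, the units
# saturate at -1 or +1, and the local gradient (1 - tanh^2) is ~0.
W = 1.0 * np.random.randn(fan_in, fan_out)
h = np.tanh(np.random.randn(1000, fan_in).dot(W))
print("fraction with |h| > 0.99:", np.mean(np.abs(h) > 0.99))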
“Xavier initialization”
[Glorot et al., 2010]

Reasonable initialization.
(Mathematical derivation assumes linear activations)

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 60 20 Jan 2016
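A numpy sketch of Xavier initialization as described above (layer sizes are illustrative):

import numpy as np

fan_in, fan_out = 500, 500        # illustrative layer sizes

# Xavier initialization: unit-Gaussian weights scaled by 1/sqrt(fan_in),
# so the variance of each layer's output roughly matches the variance
# of its input (the derivation assumes linear activations).
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)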
