Winter1516 Lecture54

but when using the ReLU nonlinearity it breaks: the activation distributions collapse toward zero in the deeper layers.

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 61 20 Jan 2016
He et al., 2015
(note the additional /2: the fan-in is divided by 2 to account for ReLU zeroing half the activations)
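A minimal numpy sketch of the two initializations being contrasted (the layer sizes here are assumptions for illustration, not taken from the slide):

import numpy as np

fan_in, fan_out = 512, 512   # assumed layer sizes, for illustration only

# Xavier initialization (Glorot & Bengio, 2010): variance ~ 1/fan_in
W_xavier = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)

# He initialization (He et al., 2015): note the additional /2 on fan_in,
# i.e. variance ~ 2/fan_in, compensating for ReLU zeroing half the units
W_he = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2)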

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 62 20 Jan 2016
Proper initialization is an active area of research…
Understanding the difficulty of training deep feedforward neural networks by Glorot and Bengio, 2010

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks by Saxe et al., 2013

Random walk initialization for training very deep feedforward networks by Sussillo and Abbott, 2014

Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification by He et al., 2015

Data-dependent Initializations of Convolutional Neural Networks by Krähenbühl et al., 2015

All you need is a good init by Mishkin and Matas, 2015


Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 64 20 Jan 2016
[Ioffe and Szegedy, 2015]
Batch Normalization
“you want unit gaussian activations? just make them so.”

Consider a batch of activations at some layer. To make each dimension unit gaussian, apply:

x̂^(k) = (x^(k) - E[x^(k)]) / sqrt(Var[x^(k)])

This is a vanilla differentiable function...

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 65 20 Jan 2016
[Ioffe and Szegedy, 2015]
Batch Normalization
“you want unit gaussian activations? just make them so.”

Consider a batch of activations at some layer: an N x D matrix X (N examples in the batch, D-dimensional activations).

1. Compute the empirical mean and variance independently for each dimension (i.e. per column, over the N examples in the batch).
2. Normalize:
   x̂^(k) = (x^(k) - E[x^(k)]) / sqrt(Var[x^(k)])
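A minimal numpy sketch of this step (the epsilon and the function name are assumptions added for illustration):

import numpy as np

def batchnorm_normalize(X, eps=1e-5):
    # X: [N x D] batch of activations; treat each dimension (column) independently
    mu = X.mean(axis=0)                      # empirical mean per dimension, shape [D]
    var = X.var(axis=0)                      # empirical variance per dimension, shape [D]
    X_hat = (X - mu) / np.sqrt(var + eps)    # approximately unit gaussian per dimension
    return X_hat, mu, var

# usage: a batch of 32 activations, 50 dimensions, arbitrarily shifted and scaled
X_hat, _, _ = batchnorm_normalize(3.0 * np.random.randn(32, 50) + 2.0)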
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 66 20 Jan 2016
[Ioffe and Szegedy, 2015]
Batch Normalization

Usually inserted after Fully Connected (or Convolutional, as we'll see soon) layers, and before the nonlinearity:

FC -> BN -> tanh -> FC -> BN -> tanh -> ...

Problem: do we necessarily want a unit gaussian input to a tanh layer?

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 67 20 Jan 2016
[Ioffe and Szegedy, 2015]
Batch Normalization
Normalize:
   x̂^(k) = (x^(k) - E[x^(k)]) / sqrt(Var[x^(k)])

And then allow the network to squash the range if it wants to:
   y^(k) = γ^(k) x̂^(k) + β^(k)

Note, the network can learn:
   γ^(k) = sqrt(Var[x^(k)]),   β^(k) = E[x^(k)]
to recover the identity mapping.
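A hedged sketch of the training-time forward pass with the learnable scale and shift (the function name and epsilon are assumptions; gamma and beta are the learnable parameters):

import numpy as np

def batchnorm_forward(X, gamma, beta, eps=1e-5):
    # X: [N x D]; gamma, beta: [D] learnable parameters
    mu = X.mean(axis=0)
    var = X.var(axis=0)
    X_hat = (X - mu) / np.sqrt(var + eps)    # normalize each dimension to ~unit gaussian
    return gamma * X_hat + beta              # squash/shift the range if the network wants to

# if the network learns gamma = sqrt(Var[x]) and beta = E[x],
# the output recovers (approximately) the identity mapping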

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 68 20 Jan 2016
[Ioffe and Szegedy, 2015]
Batch Normalization
- Improves gradient flow through the network
- Allows higher learning rates
- Reduces the strong dependence on initialization
- Acts as a form of regularization in a funny way, and slightly reduces the need for dropout, maybe

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 69 20 Jan 2016
[Ioffe and Szegedy, 2015]
Batch Normalization
Note: at test time the BatchNorm layer functions differently:

The mean/std are not computed based on the batch. Instead, a single fixed empirical mean/std of activations from training is used (e.g. these can be estimated during training with running averages).
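A sketch of how those fixed statistics might be maintained and used (the momentum value, dimensions, and names are assumptions, not from the slide):

import numpy as np

D = 50                        # assumed feature dimension
running_mu = np.zeros(D)      # maintained during training, one pair per BN layer
running_var = np.ones(D)
momentum = 0.9                # assumed decay for the running averages

def update_running_stats(mu, var):
    # called during training after computing the batch statistics mu, var
    global running_mu, running_var
    running_mu = momentum * running_mu + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var

def batchnorm_test(X, gamma, beta, eps=1e-5):
    # at test time: use the fixed running statistics, not the current batch's
    X_hat = (X - running_mu) / np.sqrt(running_var + eps)
    return gamma * X_hat + beta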

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 70 20 Jan 2016
Babysitting the Learning Process

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 71 20 Jan 2016
Step 1: Preprocess the data

(Assume X [N x D] is the data matrix, each example in a row.)
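The slide's code is not preserved in this transcript; a typical zero-center / normalize step might look like the following sketch (the random X is only a stand-in for the real data):

import numpy as np

# stand-in data: N = 100 examples, D = 3072 features (e.g. flattened CIFAR-10 images)
X = np.random.randn(100, 3072)

X -= np.mean(X, axis=0)   # zero-center each feature
X /= np.std(X, axis=0)    # normalize each feature to unit variance (optional for images)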
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 72 20 Jan 2016
Step 2: Choose the architecture:
say we start with one hidden layer of 50 neurons:

input layer (CIFAR-10 images, 3072 numbers) -> hidden layer (50 hidden neurons) -> output layer (10 output neurons, one per class)
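The lecture's own code is not shown in this transcript; a comparable initialization of such a model, as a sketch (the function name and weight scale are assumptions):

import numpy as np

def init_two_layer_model(input_size=3072, hidden_size=50, output_size=10):
    # small random weights, zero biases
    model = {}
    model['W1'] = 0.0001 * np.random.randn(input_size, hidden_size)
    model['b1'] = np.zeros(hidden_size)
    model['W2'] = 0.0001 * np.random.randn(hidden_size, output_size)
    model['b2'] = np.zeros(output_size)
    return model

model = init_two_layer_model()   # CIFAR-10: 3072 inputs -> 50 hidden -> 10 classes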
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 73 20 Jan 2016
Double check that the loss is reasonable:

Disable regularization (the model returns the loss and the gradient for all parameters). The loss comes out to ~2.3, which is “correct” for 10 classes: a softmax classifier at initialization should give roughly -ln(1/10) ≈ 2.3.
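A quick way to see where the ~2.3 comes from (a sketch; the actual check on the slide calls the model's loss function with reg = 0.0):

import numpy as np

num_classes = 10
# at initialization the softmax spreads probability roughly uniformly over classes,
# so the expected data loss is -ln(1/num_classes)
expected_loss = -np.log(1.0 / num_classes)
print(expected_loss)   # ~2.3026 -- compare against the returned loss with regularization disabled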
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 74 20 Jan 2016
Double check that the loss is reasonable:

Crank up regularization: the loss went up, good. (sanity check: the regularization term adds to the loss)
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 75 20 Jan 2016
Let's try to train now…

Tip: Make sure that you can overfit a very small portion of the training data.

The code on the slide (not preserved in this transcript; a sketch follows below) does the following:
- take the first 20 examples from CIFAR-10
- turn off regularization (reg = 0.0)
- use simple vanilla ‘sgd’
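A hedged sketch of that experiment (the data here is a synthetic stand-in for the first 20 CIFAR-10 examples, and all hyperparameters are assumptions):

import numpy as np

np.random.seed(0)
X_tiny = np.random.randn(20, 3072)            # stand-in for 20 CIFAR-10 images (3072 numbers each)
y_tiny = np.random.randint(0, 10, size=20)    # stand-in labels

# two-layer net: 3072 -> 50 -> 10, He-style initialization
W1 = np.random.randn(3072, 50) * np.sqrt(2.0 / 3072); b1 = np.zeros(50)
W2 = np.random.randn(50, 10) * np.sqrt(2.0 / 50);     b2 = np.zeros(10)

lr, reg = 1e-2, 0.0                           # regularization turned off; lr is an assumed value

for it in range(1000):
    # forward: FC -> ReLU -> FC -> softmax loss
    h = np.maximum(0, X_tiny.dot(W1) + b1)
    scores = h.dot(W2) + b2
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(20), y_tiny]).mean()

    # backward (softmax + ReLU gradients)
    dscores = probs.copy()
    dscores[np.arange(20), y_tiny] -= 1
    dscores /= 20
    dW2 = h.T.dot(dscores); db2 = dscores.sum(axis=0)
    dh = dscores.dot(W2.T); dh[h <= 0] = 0
    dW1 = X_tiny.T.dot(dh); db1 = dh.sum(axis=0)

    # simple vanilla 'sgd' update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

train_acc = (probs.argmax(axis=1) == y_tiny).mean()
print(loss, train_acc)   # the loss should shrink and train accuracy approach 1.00;
                         # if the net cannot overfit these 20 examples, something is wrong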

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 76 20 Jan 2016
Let's try to train now…

Tip: Make sure that you can overfit a very small portion of the training data.

Very small loss, train accuracy 1.00, nice!
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 77 20 Jan 2016
Let's try to train now…

I like to start with small regularization and find a learning rate that makes the loss go down.

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 78 20 Jan 2016
Let's try to train now…

I like to start with small regularization and find a learning rate that makes the loss go down.

Loss barely changing

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 79 20 Jan 2016
Let's try to train now…

I like to start with small regularization and find a learning rate that makes the loss go down.

Loss barely changing: the loss is not going down, so the learning rate is probably too low.
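One way to see this effect in isolation, as a rough sketch (a tiny linear softmax classifier on synthetic data; the function, data, and learning-rate values are all assumptions for illustration):

import numpy as np

def final_loss_for(lr, num_iters=100):
    # hypothetical mini-experiment: linear softmax on 20 random examples,
    # used only to illustrate how the learning rate affects the loss
    rng = np.random.RandomState(0)
    X = rng.randn(20, 100)
    y = rng.randint(0, 10, size=20)
    W = 0.01 * rng.randn(100, 10)
    loss = None
    for _ in range(num_iters):
        scores = X.dot(W)
        scores -= scores.max(axis=1, keepdims=True)
        probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
        loss = -np.log(probs[np.arange(20), y]).mean()
        dscores = probs.copy()
        dscores[np.arange(20), y] -= 1
        W -= lr * X.T.dot(dscores) / 20
    return loss

for lr in [1e-6, 1e-3, 1e-1]:
    # with lr = 1e-6 the loss barely moves from ~2.3; larger rates make it go down
    print(lr, final_loss_for(lr))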

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 5 - 80 20 Jan 2016
