
Lecture 7:

Training Neural Networks,


Part I

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 1 April 20, 2021


Administrative: Project Proposal

Due yesterday, 4/19 on GradeScope

1 person per group needs to submit, but tag all group members

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 2 April 20, 2021


Personal announcement

Ranjay:
- I am defending my PhD on Friday, April 23rd at 1pm PST.
- Stanford CS defenses are public events.
- Join CS 547's seminar if you want to watch it.
- If you are unable to find the zoom link to watch it and want to,
send me an email by Thursday 3pm.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 3 April 20, 2021


Administrative: A2

A2 is out, due Wednesday April 30th, 11:59pm

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 4 April 20, 2021


Where we are now...

Computational graphs: x and W feed a multiply node that produces the scores s; the hinge loss and the regularization term R combine (+) into the total loss L.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 5 April 20, 2021


Where we are now...
Neural Networks
Linear score function: f = Wx
2-layer Neural Network: f = W2 max(0, W1 x)

x (3072) → W1 → h (100) → W2 → s (10)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 6 April 20, 2021


Where we are now...
Convolutional Neural Networks

Illustration of LeCun et al. 1998 from CS231n 2017 Lecture 1

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 7 April 20, 2021


Where we are now...
Convolutional Layer: convolve (slide) a 5x5x3 filter over all spatial locations of a 32x32x3 image to produce a 28x28x1 activation map.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 8 April 20, 2021


Where we are now...
Convolutional Layer
For example, if we had 6 5x5 filters, we'd get 6 separate activation maps, each of size 28x28.
We stack these up to get a “new image” of size 28x28x6!

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 9 April 20, 2021


Where we are now...
Learning network parameters through optimization

Landscape image is CC0 1.0 public domain


Walking man image is CC0 1.0 public domain

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 10 April 20, 2021


Where we are now...

Mini-batch SGD
Loop:
1. Sample a batch of data
2. Forward prop it through the graph
(network), get loss
3. Backprop to calculate the gradients
4. Update the parameters using the gradient

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 11 April 20, 2021


Where we are now...

Hardware + Software
PyTorch

TensorFlow

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 12 April 20, 2021


Today: Training Neural Networks

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 13 April 20, 2021


Overview
1. One time setup
activation functions, preprocessing, weight
initialization, regularization, gradient checking
2. Training dynamics
babysitting the learning process,
parameter updates, hyperparameter optimization
3. Evaluation
model ensembles, test-time augmentation, transfer
learning
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 14 April 20, 2021
Part 1
- Activation Functions
- Data Preprocessing
- Weight Initialization
- Batch Normalization
- Transfer learning

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 15 April 20, 2021


Activation Functions

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 16 April 20, 2021


Activation Functions

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 17 April 20, 2021


Activation Functions
Sigmoid Leaky ReLU

tanh Maxout

ReLU ELU

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 18 April 20, 2021


Activation Functions
- Squashes numbers to range [0,1]
- Historically popular since they have a nice interpretation as a saturating “firing rate” of a neuron

Sigmoid

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 19 April 20, 2021


Activation Functions
- Squashes numbers to range [0,1]
- Historically popular since they have a nice interpretation as a saturating “firing rate” of a neuron

Sigmoid

3 problems:

1. Saturated neurons “kill” the gradients

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 20 April 20, 2021


x → sigmoid gate

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 21 April 20, 2021


x → sigmoid gate

What happens when x = -10?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 22 April 20, 2021


x → sigmoid gate

What happens when x = -10?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 23 April 20, 2021


x → sigmoid gate

What happens when x = -10?


What happens when x = 0?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 24 April 20, 2021


x → sigmoid gate

What happens when x = -10?


What happens when x = 0?
What happens when x = 10?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 25 April 20, 2021


x → sigmoid gate

What happens when x = -10?


What happens when x = 0?
What happens when x = 10?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 26 April 20, 2021


x → sigmoid gate

Why is this a problem?

When the sigmoid saturates, its local gradient is (near) zero, so all the gradients flowing back will be zero and the weights will never change.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 27 April 20, 2021
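To make the saturation concrete, here is a minimal NumPy sketch (an illustration, not code from the slides) that evaluates the sigmoid and its local gradient sigmoid(x) * (1 - sigmoid(x)) at the three inputs above; at x = ±10 the local gradient is essentially zero, so almost no gradient flows back through the gate.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in (-10.0, 0.0, 10.0):
    s = sigmoid(x)
    local_grad = s * (1 - s)   # d(sigmoid)/dx
    print(f"x = {x:6.1f}   sigmoid(x) = {s:.6f}   local gradient = {local_grad:.6f}")

# At x = -10 and x = 10 the local gradient is ~4.5e-05, so whatever
# upstream gradient arrives is multiplied by ~0 on its way down.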


Activation Functions
- Squashes numbers to range [0,1]
- Historically popular since they have a nice interpretation as a saturating “firing rate” of a neuron

Sigmoid

3 problems:

1. Saturated neurons “kill” the gradients
2. Sigmoid outputs are not zero-centered

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 28 April 20, 2021


Consider what happens when the input to a neuron is
always positive...

What can we say about the gradients on w?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 29 April 20, 2021


Consider what happens when the input to a neuron is
always positive...

What can we say about the gradients on w?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 30 April 20, 2021


Consider what happens when the input to a neuron is
always positive...

What can we say about the gradients on w?


We know that local gradient of sigmoid is always positive

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 31 April 20, 2021


Consider what happens when the input to a neuron is
always positive...

What can we say about the gradients on w?


We know that local gradient of sigmoid is always positive
We are assuming x is always positive

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 32 April 20, 2021


Consider what happens when the input to a neuron is
always positive...

What can we say about the gradients on w?


We know that local gradient of sigmoid is always positive
We are assuming x is always positive

So!! Sign of gradient for all wi is the same as the sign of upstream scalar gradient!

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 33 April 20, 2021


Consider what happens when the input to a neuron is always positive...

What can we say about the gradients on w?
Always all positive or all negative :(

The allowed gradient update directions are confined to two quadrants, so reaching a hypothetical optimal w vector requires an inefficient zig-zag path.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 34 April 20, 2021


Consider what happens when the input to a neuron is always positive...

What can we say about the gradients on w?
Always all positive or all negative :(
(For a single element! Minibatches help)

The allowed gradient update directions are confined to two quadrants, so reaching a hypothetical optimal w vector requires an inefficient zig-zag path.
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 35 April 20, 2021
Activation Functions
- Squashes numbers to range [0,1]
- Historically popular since they have a nice interpretation as a saturating “firing rate” of a neuron

Sigmoid

3 problems:

1. Saturated neurons “kill” the gradients
2. Sigmoid outputs are not zero-centered
3. exp() is a bit computationally expensive

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 36 April 20, 2021


Activation Functions

- Squashes numbers to range [-1,1]


- zero centered (nice)
- still kills gradients when saturated :(

tanh(x)

[LeCun et al., 1991]

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 37 April 20, 2021


- Computes f(x) = max(0,x)
Activation Functions
- Does not saturate (in +region)
- Very computationally efficient
- Converges much faster than
sigmoid/tanh in practice (e.g. 6x)

ReLU
(Rectified Linear Unit)
[Krizhevsky et al., 2012]

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 38 April 20, 2021


- Computes f(x) = max(0,x)
Activation Functions
- Does not saturate (in +region)
- Very computationally efficient
- Converges much faster than
sigmoid/tanh in practice (e.g. 6x)

- Not zero-centered output


ReLU
(Rectified Linear Unit)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 39 April 20, 2021


- Computes f(x) = max(0,x)
Activation Functions
- Does not saturate (in +region)
- Very computationally efficient
- Converges much faster than
sigmoid/tanh in practice (e.g. 6x)

- Not zero-centered output


ReLU
(Rectified Linear Unit)

An annoyance:
hint: what is the gradient when x < 0?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 40 April 20, 2021


x → ReLU gate

What happens when x = -10?


What happens when x = 0?
What happens when x = 10?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 41 April 20, 2021
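For comparison, the same three inputs through a ReLU gate, as a small NumPy sketch (hypothetical helper names, using the common convention that the gradient at exactly x = 0 is taken to be 0): the local gradient is 1 for x > 0 and 0 for x < 0, which is what makes a neuron stuck in the negative regime “dead”.

import numpy as np

def relu_forward(x):
    return np.maximum(0, x)

def relu_backward(dout, x):
    # Gradient passes through unchanged where x > 0, and is zero elsewhere.
    return dout * (x > 0)

x = np.array([-10.0, 0.0, 10.0])
print(relu_forward(x))                # [ 0.  0. 10.]
print(relu_backward(np.ones(3), x))   # [0. 0. 1.]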


active ReLU: its inputs fall inside the data cloud
dead ReLU: its inputs fall outside the data cloud, so it will never activate => never update
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 42 April 20, 2021
active ReLU: its inputs fall inside the data cloud
dead ReLU: its inputs fall outside the data cloud, so it will never activate => never update

=> people like to initialize ReLU neurons with slightly positive biases (e.g. 0.01)
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 43 April 20, 2021
Activation Functions
[Maas et al., 2013] [He et al., 2015]

- Does not saturate


- Computationally efficient
- Converges much faster than
sigmoid/tanh in practice! (e.g. 6x)
- will not “die”.

Leaky ReLU: f(x) = max(0.01x, x)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 44 April 20, 2021


Activation Functions
[Maas et al., 2013] [He et al., 2015]

- Does not saturate


- Computationally efficient
- Converges much faster than
sigmoid/tanh in practice! (e.g. 6x)
- will not “die”.

Leaky ReLU: f(x) = max(0.01x, x)

Parametric Rectifier (PReLU): f(x) = max(αx, x)
backprop into α (parameter)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 45 April 20, 2021
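A minimal sketch of Leaky ReLU and PReLU in NumPy (hypothetical helper names; 0.01 is the usual default slope for the leaky variant). With PReLU the slope alpha is a learned parameter, so the backward pass also produces a gradient for alpha, matching “backprop into α” above.

import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def prelu_forward(x, alpha):
    return np.where(x > 0, x, alpha * x)

def prelu_backward(dout, x, alpha):
    dx = dout * np.where(x > 0, 1.0, alpha)
    dalpha = np.sum(dout * np.where(x > 0, 0.0, x))   # backprop into alpha
    return dx, dalpha

x = np.array([-2.0, -0.5, 3.0])
print(leaky_relu(x))                 # [-0.02  -0.005  3.   ]
print(prelu_forward(x, alpha=0.1))   # [-0.2   -0.05   3.   ]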


[Clevert et al., 2015]
Activation Functions
Exponential Linear Units (ELU)
- All benefits of ReLU
- Closer to zero mean outputs
- Negative saturation regime
compared with Leaky ReLU
adds some robustness to noise

- Computation requires exp()


(Alpha default = 1)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 46 April 20, 2021


[Klambauer et al. ICLR 2017]
Activation Functions
Scaled Exponential Linear Units (SELU)
- Scaled version of ELU that
works better for deep networks
- “Self-normalizing” property;
- Can train deep SELU networks
without BatchNorm
- (will discuss more later)

α = 1.6733, λ = 1.0507

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 47 April 20, 2021


[Goodfellow et al., 2013]
Maxout “Neuron”
- Does not have the basic form of dot product ->
nonlinearity
- Generalizes ReLU and Leaky ReLU
- Linear Regime! Does not saturate! Does not die!

Problem: doubles the number of parameters/neuron :(

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 48 April 20, 2021


[Ramachandran et al. 2018]
Activation Functions
Swish
- They trained a neural network
to generate and test out
different non-linearities.
- Swish outperformed all other
options for CIFAR-10 accuracy

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 49 April 20, 2021


TLDR: In practice:

- Use ReLU. Be careful with your learning rates


- Try out Leaky ReLU / Maxout / ELU / SELU
- To squeeze out some marginal gains
- Don’t use sigmoid or tanh

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 50 April 20, 2021


Data Preprocessing

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 51 April 20, 2021


Data Preprocessing

(Assume X [NxD] is data matrix, each example in a row)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 52 April 20, 2021


Remember: Consider what happens when the input to a neuron is always positive...

What can we say about the gradients on w?
Always all positive or all negative :(
(this is also why you want zero-mean data!)

The allowed gradient update directions are confined to two quadrants, so reaching a hypothetical optimal w vector requires an inefficient zig-zag path.
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 53 April 20, 2021
Data Preprocessing

(Assume X [NxD] is data matrix, each example in a row)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 54 April 20, 2021
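In code, the zero-centering and normalization on this slide are two NumPy lines (a sketch, assuming X is the N x D data matrix described above):

import numpy as np

X = np.random.randn(100, 3072) * 5.0 + 2.0   # stand-in data matrix, N x D

X -= np.mean(X, axis=0)   # zero-center: subtract the per-dimension mean
X /= np.std(X, axis=0)    # normalize: divide by the per-dimension std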


Data Preprocessing
In practice, you may also see PCA and Whitening of the data

(decorrelated data: data has diagonal covariance matrix)   (whitened data: covariance matrix is the identity matrix)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 55 April 20, 2021
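A sketch of PCA and whitening on the same X (the standard recipe, not code from the lecture): rotate into the eigenbasis of the covariance matrix to decorrelate the data, then divide each dimension by the square root of its eigenvalue.

import numpy as np

X = np.random.randn(100, 50)
X -= np.mean(X, axis=0)               # PCA / whitening assume zero-mean data

cov = X.T @ X / X.shape[0]            # D x D covariance matrix
U, S, _ = np.linalg.svd(cov)          # eigenvectors U, eigenvalues S

X_rot = X @ U                         # decorrelated: diagonal covariance
X_white = X_rot / np.sqrt(S + 1e-5)   # whitened: ~identity covariance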


Data Preprocessing
Before normalization: classification loss very sensitive to changes in weight matrix; hard to optimize.
After normalization: less sensitive to small changes in weights; easier to optimize.

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 56 April 20, 2021
TLDR: In practice for Images: center only
e.g. consider CIFAR-10 example with [32,32,3] images
- Subtract the mean image (e.g. AlexNet)
  (mean image = [32,32,3] array)
- Subtract per-channel mean (e.g. VGGNet)
  (mean along each channel = 3 numbers)
- Subtract per-channel mean and divide by per-channel std (e.g. ResNet)
  (mean and std along each channel = 3 numbers each)

Not common to do PCA or whitening

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 57 April 20, 2021
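The three image recipes above, as a NumPy sketch (assuming a training set stored as a float array imgs of shape [N, 32, 32, 3]):

import numpy as np

imgs = np.random.rand(50000, 32, 32, 3).astype(np.float32) * 255.0

# AlexNet-style: subtract the mean image (a [32, 32, 3] array)
mean_image = imgs.mean(axis=0)
centered = imgs - mean_image

# VGGNet-style: subtract the per-channel mean (3 numbers)
channel_mean = imgs.mean(axis=(0, 1, 2))
centered = imgs - channel_mean

# ResNet-style: per-channel mean and std (3 numbers each)
channel_std = imgs.std(axis=(0, 1, 2))
normalized = (imgs - channel_mean) / channel_std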


Weight Initialization

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 58 April 20, 2021


- Q: what happens when W=constant init is used?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 59 April 20, 2021


- First idea: Small random numbers
(gaussian with zero mean and 1e-2 standard deviation)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 60 April 20, 2021


- First idea: Small random numbers
(gaussian with zero mean and 1e-2 standard deviation)

Works ~okay for small networks, but problems with deeper networks.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 61 April 20, 2021
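The “small random numbers” idea in one line of NumPy (a sketch; Din and Dout stand for a layer's input and output sizes):

import numpy as np

Din, Dout = 3072, 100
W = 0.01 * np.random.randn(Din, Dout)   # zero-mean Gaussian, std = 1e-2
b = np.zeros(Dout)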


Weight Initialization: Activation statistics
Forward pass for a 6-layer
net with hidden size 4096

What will happen to the activations for the last layer?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 62 April 20, 2021
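A sketch of the experiment behind these plots (a reconstruction under the stated setup, not the lecture's exact script): a 6-layer tanh network with hidden size 4096 and small-random-number init, printing the mean and std of each layer's activations.

import numpy as np

dims = [4096] * 7                  # input plus 6 hidden layers of size 4096
x = np.random.randn(16, dims[0])   # a small batch of random inputs

for Din, Dout in zip(dims[:-1], dims[1:]):
    W = 0.01 * np.random.randn(Din, Dout)   # std = 0.01 initialization
    x = np.tanh(x @ W)
    print(f"mean {x.mean():+.6f}   std {x.std():.6f}")

# The std shrinks layer after layer: activations collapse toward zero.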


Weight Initialization: Activation statistics
Forward pass for a 6-layer net with hidden size 4096.
All activations tend to zero for deeper network layers.
Q: What do the gradients dL/dW look like?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 63 April 20, 2021


Weight Initialization: Activation statistics
Forward pass for a 6-layer net with hidden size 4096.
All activations tend to zero for deeper network layers.
Q: What do the gradients dL/dW look like?
A: All zero, no learning =(

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 64 April 20, 2021


Weight Initialization: Activation statistics
Increase std of initial
weights from 0.01 to 0.05

What will happen to the activations for the last layer?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 65 April 20, 2021


Weight Initialization: Activation statistics
Increase std of initial weights from 0.01 to 0.05.
All activations saturate.
Q: What do the gradients look like?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 66 April 20, 2021


Weight Initialization: Activation statistics
Increase std of initial weights from 0.01 to 0.05.
All activations saturate.
Q: What do the gradients look like?
A: Local gradients all zero, no learning =(

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 67 April 20, 2021


Weight Initialization: “Xavier” Initialization
“Xavier” initialization:
std = 1/sqrt(Din)

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 68 April 20, 2021
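The only change from the previous experiment is the scale of the weights; a one-line sketch of the “Xavier” rule:

import numpy as np

Din, Dout = 4096, 4096
W = np.random.randn(Din, Dout) / np.sqrt(Din)   # "Xavier": std = 1/sqrt(Din)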


Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 69 April 20, 2021


Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 70 April 20, 2021


Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x_1 w_1 + x_2 w_2 + ... + x_Din w_Din

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 71 April 20, 2021


Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x_1 w_1 + x_2 w_2 + ... + x_Din w_Din
Assume: Var(x_1) = Var(x_2) = ... = Var(x_Din)

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 72 April 20, 2021


Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x_1 w_1 + x_2 w_2 + ... + x_Din w_Din
Assume: Var(x_1) = Var(x_2) = ... = Var(x_Din)
We want: Var(y) = Var(x_i)

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 73 April 20, 2021


Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x_1 w_1 + x_2 w_2 + ... + x_Din w_Din
Assume: Var(x_1) = Var(x_2) = ... = Var(x_Din)
We want: Var(y) = Var(x_i)

Var(y) = Var(x_1 w_1 + x_2 w_2 + ... + x_Din w_Din)   [substituting value of y]

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 74 April 20, 2021


Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x_1 w_1 + x_2 w_2 + ... + x_Din w_Din
Assume: Var(x_1) = Var(x_2) = ... = Var(x_Din)
We want: Var(y) = Var(x_i)

Var(y) = Var(x_1 w_1 + x_2 w_2 + ... + x_Din w_Din)
       = Din Var(x_i w_i)   [assume all x_i, w_i are iid]

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 75 April 20, 2021


Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x_1 w_1 + x_2 w_2 + ... + x_Din w_Din
Assume: Var(x_1) = Var(x_2) = ... = Var(x_Din)
We want: Var(y) = Var(x_i)

Var(y) = Var(x_1 w_1 + x_2 w_2 + ... + x_Din w_Din)
       = Din Var(x_i w_i)          [assume all x_i, w_i are iid]
       = Din Var(x_i) Var(w_i)     [assume all x_i, w_i are zero mean]

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 76 April 20, 2021


Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x_1 w_1 + x_2 w_2 + ... + x_Din w_Din
Assume: Var(x_1) = Var(x_2) = ... = Var(x_Din)
We want: Var(y) = Var(x_i)

Var(y) = Var(x_1 w_1 + x_2 w_2 + ... + x_Din w_Din)
       = Din Var(x_i w_i)          [assume all x_i, w_i are iid]
       = Din Var(x_i) Var(w_i)     [assume all x_i, w_i are zero mean]

So, Var(y) = Var(x_i) only when Var(w_i) = 1/Din


Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 77 April 20, 2021


Weight Initialization: What about ReLU?
Change from tanh to ReLU

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 78 April 20, 2021


Weight Initialization: What about ReLU?
Change from tanh to ReLU.
Xavier assumes a zero-centered activation function.

Activations collapse to zero again, no learning =(

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 79 April 20, 2021


Weight Initialization: Kaiming / MSRA Initialization
ReLU correction: std = sqrt(2 / Din)
“Just right”: Activations are nicely scaled for all layers!

He et al, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”, ICCV 2015

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 80 April 20, 2021
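The ReLU correction as a one-line sketch: doubling the variance compensates for ReLU zeroing out (roughly) half of its inputs.

import numpy as np

Din, Dout = 4096, 4096
W = np.random.randn(Din, Dout) * np.sqrt(2.0 / Din)   # Kaiming / MSRA init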


Proper initialization is an active area of research…
Understanding the difficulty of training deep feedforward neural networks
by Glorot and Bengio, 2010

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks by Saxe et al, 2013

Random walk initialization for training very deep feedforward networks by Sussillo and Abbott, 2014

Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification by He et al., 2015

Data-dependent Initializations of Convolutional Neural Networks by Krähenbühl et al., 2015

All you need is a good init, Mishkin and Matas, 2015

Fixup Initialization: Residual Learning Without Normalization, Zhang et al, 2019

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, Frankle and Carbin, 2019

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 81 April 20, 2021


Batch Normalization

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 82 April 20, 2021


[Ioffe and Szegedy, 2015]
Batch Normalization
“you want zero-mean unit-variance activations? just make them so.”

consider a batch of activations at some layer. To make each dimension zero-mean unit-variance, apply:

x_normalized = (x - E[x]) / sqrt(Var[x])

this is a vanilla differentiable function...

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 83 April 20, 2021


[Ioffe and Szegedy, 2015]
Batch Normalization

Input x: shape N x D
Per-channel mean 𝞵: shape D
Per-channel var 𝝈: shape D
Normalized x: (x - 𝞵)/𝝈, shape N x D

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - April 20, 2021
[Ioffe and Szegedy, 2015]
Batch Normalization

Input x: shape N x D
Per-channel mean 𝞵: shape D
Per-channel var 𝝈: shape D
Normalized x: (x - 𝞵)/𝝈, shape N x D

Problem: What if zero-mean, unit variance is too hard of a constraint?

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - April 20, 2021
[Ioffe and Szegedy, 2015]
Batch Normalization

Input x: shape N x D
Per-channel mean 𝞵: shape D
Per-channel var 𝝈: shape D
Normalized x: (x - 𝞵)/𝝈, shape N x D

Learnable scale and shift parameters ɣ, β: shape D
Output y = ɣ(x-𝞵)/𝝈 + β: shape N x D

Learning ɣ = 𝝈, β = 𝞵 will recover the identity function!

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - April 20, 2021
Batch Normalization: Test-Time

Input x: shape N x D
Per-channel mean 𝞵: shape D
Per-channel var 𝝈: shape D
Normalized x: (x - 𝞵)/𝝈, shape N x D

Learnable scale and shift parameters ɣ, β: shape D
Output y = ɣ(x-𝞵)/𝝈 + β: shape N x D

Problem: estimates of 𝞵 and 𝝈 depend on the minibatch; can't do this at test-time!

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - April 20, 2021
Batch Normalization: Test-Time

Input x: shape N x D
Per-channel mean 𝞵: shape D (at test time, use a (running) average of values seen during training)
Per-channel var 𝝈: shape D (at test time, use a (running) average of values seen during training)
Normalized x: (x - 𝞵)/𝝈, shape N x D

Learnable scale and shift parameters ɣ, β: shape D
Output y = ɣ(x-𝞵)/𝝈 + β: shape N x D

During testing batchnorm becomes a linear operator!
Can be fused with the previous fully-connected or conv layer

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - April 20, 2021
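A minimal NumPy sketch of batch normalization for a fully-connected layer under the shapes above (an illustration, not the lecture's code): at training time it normalizes with the batch statistics and updates running averages; at test time it reuses those running averages, so the whole operation is a fixed linear transform.

import numpy as np

def batchnorm_forward(x, gamma, beta, running_mean, running_var,
                      train=True, eps=1e-5, momentum=0.9):
    # x: (N, D); gamma, beta, running_mean, running_var: (D,)
    if train:
        mu = x.mean(axis=0)     # per-channel mean, shape (D,)
        var = x.var(axis=0)     # per-channel variance, shape (D,)
        running_mean *= momentum
        running_mean += (1 - momentum) * mu    # updated in place
        running_var *= momentum
        running_var += (1 - momentum) * var
    else:
        mu, var = running_mean, running_var    # fixed statistics at test time
    x_hat = (x - mu) / np.sqrt(var + eps)      # normalized x, shape (N, D)
    return gamma * x_hat + beta                # learnable scale and shift

D = 100
x = np.random.randn(32, D) * 3 + 1
gamma, beta = np.ones(D), np.zeros(D)
run_mu, run_var = np.zeros(D), np.ones(D)
out = batchnorm_forward(x, gamma, beta, run_mu, run_var, train=True)
print(out.mean(), out.std())   # approximately 0 and 1 during training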
[Ioffe and Szegedy, 2015]
Batch Normalization

FC → BN → tanh → FC → BN → tanh → ...

Usually inserted after Fully Connected or Convolutional layers, and before nonlinearity.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 89 April 20, 2021


[Ioffe and Szegedy, 2015]
Batch Normalization

FC → BN → tanh → FC → BN → tanh → ...

- Makes deep networks much easier to train!
- Improves gradient flow
- Allows higher learning rates, faster convergence
- Networks become more robust to initialization
- Acts as regularization during training
- Zero overhead at test-time: can be fused with conv!
- Behaves differently during training and testing: this is a very common source of bugs!

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 90 April 20, 2021


Batch Normalization for ConvNets
Batch Normalization for fully-connected networks:
  x: N × D (normalize over N)
  𝞵,𝝈: 1 × D
  ɣ,β: 1 × D
  y = ɣ(x-𝞵)/𝝈+β

Batch Normalization for convolutional networks (Spatial Batchnorm, BatchNorm2D):
  x: N×C×H×W (normalize over N, H, W)
  𝞵,𝝈: 1×C×1×1
  ɣ,β: 1×C×1×1
  y = ɣ(x-𝞵)/𝝈+β

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - April 20, 2021
Layer Normalization

Batch Normalization for fully-connected networks:
  x: N × D (normalize over N)
  𝞵,𝝈: 1 × D
  ɣ,β: 1 × D
  y = ɣ(x-𝞵)/𝝈+β

Layer Normalization for fully-connected networks (same behavior at train and test! Can be used in recurrent networks):
  x: N × D (normalize over D)
  𝞵,𝝈: N × 1
  ɣ,β: 1 × D
  y = ɣ(x-𝞵)/𝝈+β
Ba, Kiros, and Hinton, “Layer Normalization”, arXiv 2016

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - April 20, 2021
Instance Normalization
Batch Normalization for convolutional networks:
  x: N×C×H×W (normalize over N, H, W)
  𝞵,𝝈: 1×C×1×1
  ɣ,β: 1×C×1×1
  y = ɣ(x-𝞵)/𝝈+β

Instance Normalization for convolutional networks (same behavior at train / test!):
  x: N×C×H×W (normalize over H, W)
  𝞵,𝝈: N×C×1×1
  ɣ,β: 1×C×1×1
  y = ɣ(x-𝞵)/𝝈+β
Ulyanov et al, Improved Texture Networks: Maximizing Quality and Diversity in Feed-forward Stylization and Texture Synthesis, CVPR 2017

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - April 20, 2021
Comparison of Normalization Layers

Wu and He, “Group Normalization”, ECCV 2018

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - April 20, 2021
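The four normalization layers differ only in which axes the statistics are averaged over; a NumPy sketch for an activation tensor x of shape (N, C, H, W) (G is a hypothetical group count that must divide C):

import numpy as np

N, C, H, W, G = 8, 16, 32, 32, 4
x = np.random.randn(N, C, H, W)

bn_mu = x.mean(axis=(0, 2, 3), keepdims=True)   # BatchNorm: over N, H, W -> (1, C, 1, 1)
ln_mu = x.mean(axis=(1, 2, 3), keepdims=True)   # LayerNorm: over C, H, W -> (N, 1, 1, 1)
in_mu = x.mean(axis=(2, 3), keepdims=True)      # InstanceNorm: over H, W -> (N, C, 1, 1)

xg = x.reshape(N, G, C // G, H, W)              # GroupNorm: over each group of channels
gn_mu = xg.mean(axis=(2, 3, 4), keepdims=True)  # -> (N, G, 1, 1, 1)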
Group Normalization

Wu and He, “Group Normalization”, ECCV 2018

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - April 20, 2021
Transfer learning

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 96 April 20, 2021
“You need a lot of data if you want to train/use CNNs”

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 97 April 20, 2021
“You need a lot of data if you want to train/use CNNs” (BUSTED)
Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 98 April 20, 2021
Transfer Learning with CNNs

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 99 April 20, 2021
Transfer Learning with CNNs

AlexNet:
64 x 3 x 11 x 11

(More on this in Lecture 13)


Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 100 April 20, 2021
Transfer Learning with CNNs

Test image L2 Nearest neighbors in feature space

(More on this in Lecture 13)


Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 101 April 20, 2021
Donahue et al, “DeCAF: A Deep Convolutional Activation
Feature for Generic Visual Recognition”, ICML 2014

Transfer Learning with CNNs Razavian et al, “CNN Features Off-the-Shelf: An


Astounding Baseline for Recognition”, CVPR Workshops
2014

1. Train on Imagenet
FC-1000
FC-4096
FC-4096

MaxPool
Conv-512
Conv-512

MaxPool
Conv-512
Conv-512

MaxPool
Conv-256
Conv-256

MaxPool
Conv-128
Conv-128

MaxPool
Conv-64
Conv-64

Image

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 102 April 20, 2021
Donahue et al, “DeCAF: A Deep Convolutional Activation
Feature for Generic Visual Recognition”, ICML 2014

Transfer Learning with CNNs Razavian et al, “CNN Features Off-the-Shelf: An


Astounding Baseline for Recognition”, CVPR Workshops
2014

1. Train on Imagenet: the full network (Image → Conv-64, Conv-64, MaxPool, Conv-128, Conv-128, MaxPool, Conv-256, Conv-256, MaxPool, Conv-512, Conv-512, MaxPool, Conv-512, Conv-512, MaxPool, FC-4096, FC-4096, FC-1000).

2. Small Dataset (C classes): same network, but replace FC-1000 with FC-C. Reinitialize this last layer and train it; freeze all the layers below.

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 103 April 20, 2021
Donahue et al, “DeCAF: A Deep Convolutional Activation
Feature for Generic Visual Recognition”, ICML 2014

Transfer Learning with CNNs Razavian et al, “CNN Features Off-the-Shelf: An


Astounding Baseline for Recognition”, CVPR Workshops
2014

1. Train on Imagenet: the full network (as above).

2. Small Dataset (C classes): replace FC-1000 with FC-C. Reinitialize this last layer and train it; freeze all the layers below.

Finetuned from AlexNet

Donahue et al, “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition”, ICML 2014

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 104 April 20, 2021
Donahue et al, “DeCAF: A Deep Convolutional Activation
Feature for Generic Visual Recognition”, ICML 2014

Transfer Learning with CNNs Razavian et al, “CNN Features Off-the-Shelf: An


Astounding Baseline for Recognition”, CVPR Workshops
2014

1. Train on Imagenet: the full network (as above).

2. Small Dataset (C classes): replace FC-1000 with FC-C. Reinitialize this last layer and train it; freeze all the layers below.

3. Bigger dataset: with a bigger dataset, train more layers. Reinitialize FC-C and train the top FC layers as well; freeze only the lower conv layers. Use a lower learning rate when finetuning; 1/10 of the original LR is a good starting point.

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 105 April 20, 2021
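A sketch of steps 2 and 3 in PyTorch, assuming a torchvision ResNet-18 as the pretrained backbone (the slide uses a VGG-style network; the recipe is the same): swap the last layer for a fresh C-way classifier, freeze everything else, then unfreeze more layers and fine-tune with a learning rate around 1/10 of the original.

import torch
import torch.nn as nn
import torchvision

C = 10                                                 # number of new classes
model = torchvision.models.resnet18(pretrained=True)   # 1. trained on ImageNet

# 2. Small dataset: freeze the pretrained layers, reinitialize the last layer.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, C)   # new head, trained from scratch

# 3. Bigger dataset: unfreeze more layers and finetune with a smaller LR.
for p in model.layer4.parameters():
    p.requires_grad = True
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-3, momentum=0.9)   # roughly 1/10 of a typical from-scratch LR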
The layers of the pretrained network (Image → Conv-64 → ... → FC-4096 → FC-1000) go from more generic (bottom) to more specific (top).

                      | very similar dataset | very different dataset
very little data      | ?                    | ?
quite a lot of data   | ?                    | ?

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 106 April 20, 2021
The layers of the pretrained network (Image → Conv-64 → ... → FC-4096 → FC-1000) go from more generic (bottom) to more specific (top).

                      | very similar dataset               | very different dataset
very little data      | Use Linear Classifier on top layer | ?
quite a lot of data   | Finetune a few layers              | ?

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 107 April 20, 2021
The layers of the pretrained network (Image → Conv-64 → ... → FC-4096 → FC-1000) go from more generic (bottom) to more specific (top).

                      | very similar dataset               | very different dataset
very little data      | Use Linear Classifier on top layer | You're in trouble... Try linear classifier from different stages
quite a lot of data   | Finetune a few layers              | Finetune a larger number of layers

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 108 April 20, 2021
Transfer learning with CNNs is pervasive…
(it’s the norm, not an exception)
Object Detection (Fast R-CNN)
Image Captioning: CNN + RNN

Girshick, “Fast R-CNN”, ICCV 2015. Figure copyright Ross Girshick, 2015. Reproduced with permission.
Karpathy and Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015. Figure copyright IEEE, 2015. Reproduced for educational purposes.

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 109 April 20, 2021
Transfer learning with CNNs is pervasive…
(it’s the norm, not an exception)
Object Detection (Fast R-CNN): CNN pretrained on ImageNet
Image Captioning: CNN + RNN

Girshick, “Fast R-CNN”, ICCV 2015. Figure copyright Ross Girshick, 2015. Reproduced with permission.
Karpathy and Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015. Figure copyright IEEE, 2015. Reproduced for educational purposes.

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 110 April 20, 2021
Transfer learning with CNNs is pervasive…
(it’s the norm, not an exception)
Object Detection (Fast R-CNN): CNN pretrained on ImageNet
Image Captioning: CNN + RNN, with the CNN pretrained on ImageNet and word vectors pretrained with word2vec

Girshick, “Fast R-CNN”, ICCV 2015. Figure copyright Ross Girshick, 2015. Reproduced with permission.
Karpathy and Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015. Figure copyright IEEE, 2015. Reproduced for educational purposes.

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 111 April 20, 2021
Transfer learning with CNNs is pervasive…
(it’s the norm, not an exception)

1. Train CNN on ImageNet
2. Fine-Tune (1) for object detection on Visual Genome
3. Train BERT language model on lots of text
4. Combine (2) and (3), train for joint image / language modeling
5. Fine-tune (4) for image captioning, visual question answering, etc.

Zhou et al, “Unified Vision-Language Pre-Training for Image Captioning and VQA” CVPR 2020 Krishna et al, “Visual genome: Connecting language and vision using crowdsourced dense image annotations” IJCV 2017
Figure copyright Luowei Zhou, 2020. Reproduced with permission. Devlin et al. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” ArXiv 2018

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 112 April 20, 2021
Transfer learning with CNNs -
Architecture matters

We will discuss different architectures in detail in two lectures

Girshick, “The Generalized R-CNN Framework for Object Detection”, ICCV 2017 Tutorial on Instance-Level Visual Recognition

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 113 April 20, 2021
Transfer learning with CNNs is pervasive…
But recent results show it might not always be necessary!
Training from scratch can work just as
well as training from a pretrained
ImageNet model for object detection

But it takes 2-3x as long to train.

They also find that collecting more data is better than finetuning on a related task.

He et al, “Rethinking ImageNet Pre-training”, ICCV 2019


Figure copyright Kaiming He, 2019. Reproduced with permission.

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 114 April 20, 2021
Takeaway for your projects and beyond:

Source: AI & Deep Learning Memes For Back-propagated Poets

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 115 April 20, 2021
Takeaway for your projects and beyond:
Have some dataset of interest but it has < ~1M images?

1. Find a very large dataset that has similar data, train a big ConvNet there
2. Transfer learn to your dataset
Deep learning frameworks provide a “Model Zoo” of pretrained
models so you don’t need to train your own

TensorFlow: https://github.com/tensorflow/models
PyTorch: https://github.com/pytorch/vision

Fei-Fei Li & Ranjay Krishna & Danfei Xu Lecture 7 - 116 April 20, 2021
Summary TLDRs
We looked in detail at:

- Activation Functions (use ReLU)


- Data Preprocessing (images: subtract mean)
- Weight Initialization (use Xavier/He init)
- Batch Normalization (use this!)
- Transfer learning (use this if you can!)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 117 April 20, 2021
Next time:
Training Neural Networks, Part 2
- Parameter update schemes
- Learning rate schedules
- Gradient checking
- Regularization (Dropout etc.)
- Babysitting learning
- Evaluation (Ensembles etc.)
- Hyperparameter Optimization
- Transfer learning / fine-tuning

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 7 - 118 April 20, 2021
