The document discusses the training of neural networks, focusing on activation functions, data preprocessing, and weight initialization techniques. It highlights the importance of using appropriate activation functions like ReLU and its variants, as well as the significance of proper weight initialization methods such as Xavier initialization to prevent issues like vanishing gradients. The lecture also emphasizes the need for careful data preprocessing to enhance model performance.


Lecture 6 (Part 2):

Training Neural Networks

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 1 April 17, 2024
Where we are now...
CNN Architectures

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 2 April 17, 2024
Where we are now...
Learning network parameters through optimization

Landscape image is CC0 1.0 public domain


Walking man image is CC0 1.0 public domain

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 3 April 17, 2024
Where we are now...

Mini-batch SGD
Loop:
1. Sample a batch of data
2. Forward prop it through the graph
(network), get loss
3. Backprop to calculate the gradients
4. Update the parameters using the gradient

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 4 April 17, 2024
Today: Training Neural Networks

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 5 April 17, 2024
Overview
1. One time set up: activation functions, preprocessing,
weight initialization, regularization, gradient checking

1. Training dynamics: babysitting the learning process,


parameter updates, hyperparameter optimization

1. Evaluation: model ensembles, test-time


augmentation, transfer learning

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 6 April 17, 2024
Activation Functions

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 7 April 17, 2024
Activation Functions

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 8 April 17, 2024
Activation Functions
Sigmoid, tanh, ReLU, Leaky ReLU, Maxout, ELU

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 9 April 17, 2024
Activation Functions
- Squashes numbers to range [0,1]
- Historically popular since they
have nice interpretation as a
saturating “firing rate” of a neuron

Sigmoid

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 10 April 17, 2024
Activation Functions
- Squashes numbers to range [0,1]
- Historically popular since they
have nice interpretation as a
saturating “firing rate” of a neuron

3 problems:

1. Saturated neurons “kill” the gradients

Sigmoid

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 11 April 17, 2024
x sigmoid
gate

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 12 April 17, 2024
x sigmoid
gate

What happens when x = -10?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 13 April 17, 2024
x sigmoid
gate

What happens when x = -10?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 14 April 17, 2024
x sigmoid
gate

What happens when x = -10?


What happens when x = 0?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 15 April 17, 2024
x sigmoid
gate

What happens when x = -10?


What happens when x = 0?
What happens when x = 10?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 16 April 17, 2024
x sigmoid
gate

What happens when x = -10?


What happens when x = 0?
What happens when x = 10?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 17 April 17, 2024
x sigmoid
gate

Why is this a problem?


If the local sigmoid gradient is (near) zero, all the gradients flowing back will be
zero and the weights will never change
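To make the saturation concrete, here is a small numpy check (my own sketch, not code from the slides): the local gradient of the sigmoid, σ(x)(1 − σ(x)), is about 4.5e-5 at x = ±10 and at most 0.25 at x = 0.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Local gradient of the sigmoid: d(sigma)/dx = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [-10.0, 0.0, 10.0]:
    print(f"x = {x:+5.1f}  sigmoid(x) = {sigmoid(x):.5f}  local grad = {sigmoid_grad(x):.2e}")
# x = -10 and x = +10 give local gradients ~4.5e-05: almost no signal flows back.
# x = 0 gives the maximum local gradient, 0.25.
```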

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 18 April 17, 2024
Activation Functions

- Squashes numbers to range [-1,1]


- zero centered (nice)
- still kills gradients when saturated :(

tanh(x)

[LeCun et al., 1991]

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 19 April 17, 2024
- Computes f(x) = max(0,x)
Activation Functions
- Does not saturate (in +region)
- Very computationally efficient
- Converges much faster than
sigmoid/tanh in practice (e.g. 6x)

ReLU
(Rectified Linear Unit)
[Krizhevsky et al., 2012]

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 20 April 17, 2024
- Computes f(x) = max(0,x)
Activation Functions
- Does not saturate (in +region)
- Very computationally efficient
- Converges much faster than
sigmoid/tanh in practice (e.g. 6x)

- Not zero-centered output


ReLU
(Rectified Linear Unit)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 21 April 17, 2024
- Computes f(x) = max(0,x)
Activation Functions
- Does not saturate (in +region)
- Very computationally efficient
- Converges much faster than
sigmoid/tanh in practice (e.g. 6x)

- Not zero-centered output


ReLU - An annoyance:
(Rectified Linear Unit)
hint: what is the gradient when x < 0?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 22 April 17, 2024
x ReLU
gate

What happens when x = -10?


What happens when x = 0?
What happens when x = 10?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 23 April 17, 2024
active ReLU
DATA CLOUD

dead ReLU
will never activate
=> never update
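A minimal numpy sketch (not the lecture's code) of why a dead ReLU never updates: the local gradient is 1 for x > 0 and exactly 0 otherwise, so no gradient reaches the weights of a unit whose input stays negative.

```python
import numpy as np

x = np.array([-10.0, 0.0, 10.0])
relu_out = np.maximum(0.0, x)          # forward: f(x) = max(0, x)
upstream = np.ones_like(x)             # pretend dL/dout = 1
local_grad = (x > 0).astype(x.dtype)   # dout/dx is 1 for x > 0, else 0
dx = upstream * local_grad
print(relu_out)  # [ 0.  0. 10.]
print(dx)        # [0. 0. 1.] -> no gradient flows back when x <= 0
```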
Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 24 April 17, 2024
[Maas et al., 2013]
Activation Functions [He et al., 2015]

- Does not saturate


- Computationally efficient
- Converges much faster than
sigmoid/tanh in practice! (e.g. 6x)
- will not “die”.

Leaky ReLU

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 26 April 17, 2024
[Maas et al., 2013]
Activation Functions [He et al., 2015]

- Does not saturate


- Computationally efficient
- Converges much faster than
sigmoid/tanh in practice! (e.g. 6x)
- will not “die”.

Parametric Rectifier (PReLU): f(x) = max(αx, x), where the slope α is learned by backprop


Leaky ReLU

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 27 April 17, 2024
[Hendrycks et al., 2016]
Activation Functions
- Computes f(x) = x*Φ(x)

GELU (Gaussian Error Linear Unit), where Φ(x) is the standard normal CDF

Sources: https://en.wikipedia.org/wiki/Normal_distribution,
https://en.m.wikipedia.org/wiki/File:Cumulative_distribution_function_for_normal_distribution,_mean_0_and_sd_1.png

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 28 April 17, 2024
[Hendrycks et al., 2016]
Activation Functions
- Computes f(x) = x*Φ(x)

Source: https://en.m.wikipedia.org/wiki/File:ReLU_and_GELU.svg

GELU (Gaussian Error Linear Unit), where Φ(x) is the standard normal CDF

Sources: https://en.wikipedia.org/wiki/Normal_distribution,
https://en.m.wikipedia.org/wiki/File:Cumulative_distribution_function_for_normal_distribution,_mean_0_and_sd_1.png

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 29 April 17, 2024
[Hendrycks et al., 2016]
Activation Functions
- Computes f(x) = x*Φ(x)

- Very nice behavior around 0


- Smoothness facilitates training in
practice

- Higher computational cost than ReLU
- Large negative values can still have gradient → 0

GELU (Gaussian Error Linear Unit)
Source: https://en.m.wikipedia.org/wiki/File:ReLU_and_GELU.svg
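As a rough numpy sketch of the definition above (my own illustration; the tanh approximation constants follow Hendrycks & Gimpel, 2016):

```python
import numpy as np
from scipy.special import erf

def gelu_exact(x):
    # f(x) = x * Phi(x), with Phi the standard normal CDF
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # Common tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-4, 4, 9)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))  # the two agree to about 1e-3
```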

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 30 April 17, 2024
TLDR: In practice:

- Use ReLU. Be careful with your learning rates


- Try out Leaky ReLU / PReLU / GELU to squeeze out some marginal gains
- Don’t use sigmoid or tanh

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 31 April 17, 2024
Data Preprocessing

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 32 April 17, 2024
Data Preprocessing

(Assume X [N x D] is the data matrix, each example in a row)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 33 April 17, 2024
TLDR: In practice for Images: center only
e.g. consider CIFAR-10 example with [32,32,3] images
- Subtract the mean image (e.g. AlexNet)
(mean image = [32,32,3] array)
- Subtract per-channel mean (e.g. VGGNet)
(mean along each channel = 3 numbers)
- Subtract per-channel mean and
divide by per-channel std (e.g. ResNet and beyond)
(mean and std along each channel = 3 numbers each)
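A minimal numpy sketch of the three options for a batch of images X_train with shape [N, 32, 32, 3] (array names and sizes are illustrative):

```python
import numpy as np

X_train = np.random.randint(0, 256, size=(1000, 32, 32, 3)).astype(np.float32)

# Option 1 (AlexNet-style): subtract the mean image, a [32, 32, 3] array
mean_image = X_train.mean(axis=0)
X1 = X_train - mean_image

# Option 2 (VGGNet-style): subtract the per-channel mean (3 numbers)
channel_mean = X_train.mean(axis=(0, 1, 2))
X2 = X_train - channel_mean

# Option 3 (ResNet and beyond): per-channel mean and std (3 numbers each)
channel_std = X_train.std(axis=(0, 1, 2))
X3 = (X_train - channel_mean) / channel_std
```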

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 34 April 17, 2024
Weight Initialization

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 35 April 17, 2024
- Q: what happens when W=constant init is used?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 36 April 17, 2024
- First idea: Small random numbers
(gaussian with zero mean and 1e-2 standard deviation)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 37 April 17, 2024
- First idea: Small random numbers
(gaussian with zero mean and 1e-2 standard deviation)

Works ~okay for small networks, but problems with


deeper networks.

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 38 April 17, 2024
Weight Initialization: Activation statistics
Forward pass for a 6-layer
net with hidden size 4096

What will happen to the activations for the last layer?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 39 April 17, 2024
Weight Initialization: Activation statistics
Forward pass for a 6-layer net with hidden size 4096: all activations tend to zero for deeper network layers.
Q: What do the gradients dL/dW look like?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 40 April 17, 2024
Weight Initialization: Activation statistics
Forward pass for a 6-layer net with hidden size 4096: all activations tend to zero for deeper network layers.
Q: What do the gradients dL/dW look like?
A: All zero, no learning =(
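The slide's experiment can be reproduced with a short numpy loop; this is a sketch in the spirit of the lecture (the batch size and exact setup are my assumptions): a 6-layer tanh net with hidden size 4096 and weights drawn from a zero-mean Gaussian with std 0.01.

```python
import numpy as np

dims = [4096] * 7                # input plus 6 hidden layers
x = np.random.randn(16, dims[0])
for Din, Dout in zip(dims[:-1], dims[1:]):
    W = 0.01 * np.random.randn(Din, Dout)   # small random init
    x = np.tanh(x.dot(W))
    print(f"std of activations: {x.std():.6f}")
# The std shrinks towards 0 with depth, so the gradients dL/dW (which depend on
# the activations) also go to 0. Using std 0.05 instead saturates tanh at +/-1,
# which again kills the local gradients.
```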

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 41 April 17, 2024
Weight Initialization: Activation statistics
Increase std of initial
weights from 0.01 to 0.05

What will happen to the activations for the last layer?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 42 April 17, 2024
Weight Initialization: Activation statistics
Increase std of initial weights from 0.01 to 0.05: all activations saturate.
Q: What do the gradients look like?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 43 April 17, 2024
Weight Initialization: Activation statistics
Increase std of initial weights from 0.01 to 0.05: all activations saturate.
Q: What do the gradients look like?
A: Local gradients all zero, no learning =(

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 44 April 17, 2024
Weight Initialization: “Xavier” Initialization
“Xavier” initialization:
std = 1/sqrt(Din)

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 45 April 17, 2024
Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 46 April 17, 2024
Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010
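In code, Xavier initialization is a one-liner (a numpy sketch; the layer sizes are placeholders):

```python
import numpy as np

Din, Dout = 4096, 4096
W = np.random.randn(Din, Dout) / np.sqrt(Din)   # "Xavier": std = 1/sqrt(Din)

# For a conv layer with kernel size k and C input channels, Din = k*k*C:
k, C, F = 3, 64, 128                            # kernel size, input channels, filters
W_conv = np.random.randn(F, C, k, k) / np.sqrt(k * k * C)
```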

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 47 April 17, 2024
Weight Initialization: What about ReLU?
Change from tanh to ReLU

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 48 April 17, 2024
Weight Initialization: What about ReLU?
Change from tanh to ReLU: Xavier assumes a zero-centered activation function.

Activations collapse to zero again, no learning =(

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 49 April 17, 2024
Weight Initialization: Kaiming / MSRA Initialization
ReLU correction: std = sqrt(2 / Din)
“Just right”: Activations are nicely scaled for all layers!

He et al, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”, ICCV 2015
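The ReLU correction is the same one-liner with an extra factor of sqrt(2) (numpy sketch; sizes are placeholders):

```python
import numpy as np

Din, Dout = 4096, 4096
W = np.random.randn(Din, Dout) * np.sqrt(2.0 / Din)   # Kaiming/MSRA: std = sqrt(2/Din)
```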

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 50 April 17, 2024
Proper initialization is an ongoing area of research…
Understanding the difficulty of training deep feedforward neural networks
by Glorot and Bengio, 2010

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks by Saxe et al, 2013

Random walk initialization for training very deep feedforward networks by Sussillo and Abbott, 2014

Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification by He et al., 2015

Data-dependent Initializations of Convolutional Neural Networks by Krähenbühl et al., 2015

All you need is a good init, Mishkin and Matas, 2015

Fixup Initialization: Residual Learning Without Normalization, Zhang et al, 2019

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, Frankle and Carbin, 2019

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 51 April 17, 2024
Training vs. Testing Error

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 52 April 17, 2024
Beyond Training Error

Better optimization algorithms help reduce training loss.
But we really care about error on new data - how to reduce the gap?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 53 April 17, 2024
Early Stopping: Always do this
[Plots: training loss vs. iteration; train/val accuracy vs. iteration, with “stop training here” marked where val accuracy peaks]

Stop training the model when accuracy on the validation set decreases
Or train for a long time, but always keep track of the model snapshot
that worked best on val

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 54 April 17, 2024
Model Ensembles

1. Train multiple independent models


2. At test time average their results
(Take average of predicted probability distributions, then choose argmax)

Enjoy 2% extra performance

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 55 April 17, 2024
How to improve single-model performance?

Regularization

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 56 April 17, 2024
Regularization: Add term to loss

In common use:
L2 regularization (weight decay): R(W) = Σ_k Σ_l W_{k,l}²
L1 regularization: R(W) = Σ_k Σ_l |W_{k,l}|
Elastic net (L1 + L2): R(W) = Σ_k Σ_l (β W_{k,l}² + |W_{k,l}|)
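In code, the regularized objective is just the data loss plus λR(W); a minimal numpy sketch (the data loss value, weight shapes, and λ are placeholders):

```python
import numpy as np

W1 = np.random.randn(100, 50)
W2 = np.random.randn(50, 10)
data_loss = 1.23                      # placeholder: softmax/SVM loss on a minibatch
reg = 1e-4                            # regularization strength lambda

l2_penalty = reg * (np.sum(W1 * W1) + np.sum(W2 * W2))         # L2 / weight decay
l1_penalty = reg * (np.sum(np.abs(W1)) + np.sum(np.abs(W2)))   # L1
loss = data_loss + l2_penalty         # pick one (or combine them for elastic net)
```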

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 57 April 17, 2024
Regularization: Dropout
In each forward pass, randomly set some neurons to zero
Probability of dropping is a hyperparameter; 0.5 is common

Srivastava et al, “Dropout: A simple way to prevent neural networks from overfitting”, JMLR 2014

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 58 April 17, 2024
Regularization: Dropout
Example forward pass with a 3-layer network using dropout

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 59 April 17, 2024
Regularization: Dropout
How can this possibly be a good idea?
Forces the network to have a redundant representation;
Prevents co-adaptation of features

[Figure: features “has an ear”, “has a tail”, “is furry”, “has claws”, “mischievous look” feed into a cat score; dropout randomly zeroes some of them (X)]

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 60 April 17, 2024
Regularization: Dropout
How can this possibly be a good idea?
Another interpretation:

Dropout is training a large ensemble of


models (that share parameters).

Each binary mask is one model

An FC layer with 4096 units has
2^4096 ~ 10^1233 possible masks!
Only ~ 10^82 atoms in the universe...
Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 61 April 17, 2024
Dropout: Test time

Dropout makes our output random: y = f_W(x, z), with input x (image), output y (label), and random dropout mask z.

Want to “average out” the randomness at test-time: y = E_z[f_W(x, z)] = ∫ p(z) f_W(x, z) dz

But this integral seems hard …

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 62 April 17, 2024
Dropout: Test time
Want to approximate
the integral
Consider a single neuron a with inputs x, y and weights w1, w2.

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 63 April 17, 2024
Dropout: Test time
Want to approximate
the integral
Consider a single neuron a with inputs x, y and weights w1, w2.
At test time we have: a = w1x + w2y

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 64 April 17, 2024
Dropout: Test time
Want to approximate
the integral
Consider a single neuron a with inputs x, y and weights w1, w2.
At test time we have: a = w1x + w2y
During training (dropout with p = 0.5) we have: E[a] = ¼(w1x + w2y) + ¼(w1x + 0·y) + ¼(0·x + w2y) + ¼(0·x + 0·y) = ½(w1x + w2y)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 65 April 17, 2024
Dropout: Test time
Want to approximate
the integral
Consider a single neuron a with inputs x, y and weights w1, w2.
At test time we have: a = w1x + w2y
During training (dropout with p = 0.5) we have: E[a] = ½(w1x + w2y)
At test time, multiply by the dropout probability: a = p(w1x + w2y)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 66 April 17, 2024
Dropout: Test time

At test time all neurons are active always


=> We must scale the activations so that for each neuron:
output at test time = expected output at training time

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 67 April 17, 2024
Dropout Summary

drop in train time

scale at test time
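The slide's code is not reproduced in this transcript; a minimal numpy sketch of vanilla dropout for a 2-layer net (drop at train time, scale by p at test time) could look like this (weight and input shapes are assumed to match):

```python
import numpy as np

p = 0.5  # probability of keeping a unit active

def train_step(X, W1, W2):
    H = np.maximum(0, X.dot(W1))
    mask = (np.random.rand(*H.shape) < p)   # drop units with probability 1 - p
    H = H * mask
    return H.dot(W2)

def predict(X, W1, W2):
    H = np.maximum(0, X.dot(W1)) * p        # scale activations by p at test time
    return H.dot(W2)
```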

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 68 April 17, 2024
More common: “Inverted dropout”

test time is unchanged!
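Correspondingly, a sketch of inverted dropout (again my own code, not the slide's): the rescaling by 1/p moves into the training pass, so prediction code stays untouched.

```python
import numpy as np

p = 0.5

def train_step(X, W1, W2):
    H = np.maximum(0, X.dot(W1))
    mask = (np.random.rand(*H.shape) < p) / p   # drop AND rescale at train time
    H = H * mask
    return H.dot(W2)

def predict(X, W1, W2):
    return np.maximum(0, X.dot(W1)).dot(W2)     # test time is unchanged
```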

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 69 April 17, 2024
Regularization: A common pattern
Training: Add some kind of randomness

Testing: Average out randomness (sometimes approximate)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 70 April 17, 2024
Regularization: A common pattern
Training: Add some kind of randomness
Testing: Average out randomness (sometimes approximate)

Example: Batch Normalization
Training: Normalize using stats from random minibatches
Testing: Use fixed stats to normalize
Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 71 April 17, 2024
Regularization: Data Augmentation

Load image and label (“cat”) → CNN → Compute loss

This image by Nikita is licensed under CC-BY 2.0

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 72 April 17, 2024
Regularization: Data Augmentation

Load image and label (“cat”) → Transform image → CNN → Compute loss

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 73 April 17, 2024
Data Augmentation
Horizontal Flips

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 74 April 17, 2024
Data Augmentation
Random crops and scales
Training: sample random crops / scales
ResNet:
1. Pick random L in range [256, 480]
2. Resize training image, short side = L
3. Sample random 224 x 224 patch

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 75 April 17, 2024
Data Augmentation
Random crops and scales
Training: sample random crops / scales
ResNet:
1. Pick random L in range [256, 480]
2. Resize training image, short side = L
3. Sample random 224 x 224 patch

Testing: average a fixed set of crops


ResNet:
1. Resize image at 5 scales: {224, 256, 384, 480, 640}
2. For each size, use 10 224 x 224 crops: 4 corners + center, + flips
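A sketch of the ResNet-style training-time augmentation using PIL (my own illustration; `random_resized_crop` is a hypothetical helper name):

```python
import random
from PIL import Image

def random_resized_crop(img: Image.Image) -> Image.Image:
    # 1. Pick random L in [256, 480] and resize so the short side equals L
    L = random.randint(256, 480)
    w, h = img.size
    scale = L / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)))
    # 2. Sample a random 224 x 224 patch
    w, h = img.size
    x = random.randint(0, w - 224)
    y = random.randint(0, h - 224)
    return img.crop((x, y, x + 224, y + 224))
```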

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 76 April 17, 2024
Data Augmentation
Color Jitter
Simple: Randomize
contrast and brightness

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 77 April 17, 2024
Data Augmentation
Get creative for your problem!
Examples of data augmentations:
- translation
- rotation
- stretching
- shearing
- lens distortions, … (go crazy)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 78 April 17, 2024
Automatic Data Augmentation

Cubuk et al., “AutoAugment: Learning Augmentation Strategies from Data”, CVPR 2019

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 79 April 17, 2024
Regularization: Cutout
Training: Set random image regions to zero
Testing: Use full image
Examples:
Dropout
Batch Normalization
Data Augmentation
Cutout / Random Crop

Works very well for small datasets like CIFAR, less common for large datasets like ImageNet

DeVries and Taylor, “Improved Regularization of Convolutional Neural Networks with Cutout”, arXiv 2017
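Cutout itself is only a few lines; here is a sketch of the idea from DeVries and Taylor (the square size of 8 pixels is an assumption):

```python
import numpy as np

def cutout(img, size=8):
    """Zero out a random size x size square of an [H, W, C] image (train time only)."""
    out = img.copy()
    h, w = img.shape[:2]
    cy, cx = np.random.randint(h), np.random.randint(w)
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out[y0:y1, x0:x1, :] = 0
    return out
```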

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 80 April 17, 2024
Regularization - In practice
Training: Add random noise
Testing: Marginalize over the noise
Examples:
Dropout
Batch Normalization
Data Augmentation
Cutout / Random Crop

- Consider dropout for large fully-connected layers
- Batch normalization and data augmentation almost always a good idea
- Try cutout especially for small classification datasets

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 81 April 17, 2024
Choosing Hyperparameters
(without tons of GPUs)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 82 April 17, 2024
Choosing Hyperparameters
Step 1: Check initial loss

Turn off weight decay, sanity check loss at initialization


e.g. log(C) for softmax with C classes

Random guessing → 1/C probability for each class


Softmax Loss → -log(1/C) = log(C)
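For example, with C = 10 classes (CIFAR-10) the expected initial loss is about 2.3; a one-line numpy check:

```python
import numpy as np
print(-np.log(1.0 / 10))   # 2.302... expected initial softmax loss for 10 classes
```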

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 83 April 17, 2024
Choosing Hyperparameters
Step 1: Check initial loss
Step 2: Overfit a small sample

Try to train to 100% training accuracy on a small sample of


training data (~5-10 minibatches); fiddle with architecture,
learning rate, weight initialization

Loss not going down? LR too low, bad initialization


Loss explodes to Inf or NaN? LR too high, bad initialization

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 84 April 17, 2024
Choosing Hyperparameters
Step 1: Check initial loss
Step 2: Overfit a small sample
Step 3: Find LR that makes loss go down

Use the architecture from the previous step, use all training
data, turn on small weight decay, find a learning rate that
makes the loss drop significantly within ~100 iterations

Good learning rates to try: 1e-1, 1e-2, 1e-3, 1e-4

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 85 April 17, 2024
Choosing Hyperparameters
Step 1: Check initial loss
Step 2: Overfit a small sample
Step 3: Find LR that makes loss go down
Step 4: Coarse grid, train for ~1-5 epochs

Choose a few values of learning rate and weight decay around


what worked from Step 3, train a few models for ~1-5 epochs.

Good weight decay to try: 1e-4, 1e-5, 0

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 86 April 17, 2024
Choosing Hyperparameters
Step 1: Check initial loss
Step 2: Overfit a small sample
Step 3: Find LR that makes loss go down
Step 4: Coarse grid, train for ~1-5 epochs
Step 5: Refine grid, train longer

Pick best models from Step 4, train them for longer (~10-20
epochs) with constant learning rate

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 87 April 17, 2024
Choosing Hyperparameters
Step 1: Check initial loss
Step 2: Overfit a small sample
Step 3: Find LR that makes loss go down
Step 4: Coarse grid, train for ~1-5 epochs
Step 5: Refine grid, train longer
Step 6: Look at loss and accuracy curves

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 88 April 17, 2024
[Plot: train/val accuracy vs. time] Accuracy still going up: you need to train longer

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 89 April 17, 2024
[Plot: train/val accuracy vs. time] Huge train / val gap means overfitting! Increase regularization, get more data

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 90 April 17, 2024
[Plot: train/val accuracy vs. time] No gap between train / val means underfitting: train longer, use a bigger model

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 91 April 17, 2024
Look at learning curves!
[Plots: Training Loss; Train / Val Accuracy]

Losses may be noisy: use a scatter plot and also plot a moving average to see trends better

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 92 April 17, 2024
Cross-validation

We develop "command
centers" to visualize all our
models training with different
hyperparameters

check out Weights & Biases (wandb)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 93 April 17, 2024
You can plot all your loss curves for different hyperparameters on a single plot

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 94 April 17, 2024
Don't look at accuracy or loss curves for too long!

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 95 April 17, 2024
Choosing Hyperparameters
Step 1: Check initial loss
Step 2: Overfit a small sample
Step 3: Find LR that makes loss go down
Step 4: Coarse grid, train for ~1-5 epochs
Step 5: Refine grid, train longer
Step 6: Look at loss and accuracy curves
Step 7: GOTO step 5

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 96 April 17, 2024
Random Search vs. Grid Search

Random Search for Hyper-Parameter Optimization, Bergstra and Bengio, 2012

[Figure: Grid Layout vs. Random Layout over an important and an unimportant parameter - random search tries more distinct values of the important parameter]

Illustration of Bergstra et al., 2012 by Shayne Longpre, copyright CS231n 2017
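In practice, random search just means sampling each hyperparameter independently, usually on a log scale; a sketch (the ranges echo the suggestions from Steps 3-4, and `train_and_evaluate` is a hypothetical placeholder):

```python
import numpy as np

num_trials = 20
for _ in range(num_trials):
    lr = 10 ** np.random.uniform(-4, -1)            # learning rate in [1e-4, 1e-1]
    weight_decay = 10 ** np.random.uniform(-5, -4)  # weight decay in [1e-5, 1e-4]
    # train_and_evaluate(lr, weight_decay)          # placeholder for a short training run
    print(f"lr = {lr:.2e}, weight decay = {weight_decay:.2e}")
```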

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 97 April 17, 2024
Summary TLDRs
We looked in detail at:

- Activation Functions (use ReLU)


- Data Preprocessing (images: subtract mean)
- Weight Initialization (use Xavier/Kaiming init)
- Batch Normalization (use this!)
- Transfer learning (use this if you can!)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 98 April 17, 2024
In Lecture: Recap of Content + QA

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 99 April 17, 2024
Appendix – Slides from Previous Years of the Course

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 100 April 17, 2024
Activation Functions
- Squashes numbers to range [0,1]
- Historically popular since they
have nice interpretation as a
saturating “firing rate” of a neuron

3 problems:

1. Saturated neurons “kill” the gradients
2. Sigmoid outputs are not zero-centered

Sigmoid

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 101 April 17, 2024
Consider what happens when the input to a neuron is
always positive...

What can we say about the gradients on w?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 102 April 17, 2024
Consider what happens when the input to a neuron is
always positive...

What can we say about the gradients on w?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 103 April 17, 2024
Consider what happens when the input to a neuron is
always positive...

What can we say about the gradients on w?


We know that local gradient of sigmoid is always positive

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 104 April 17, 2024
Consider what happens when the input to a neuron is
always positive...

What can we say about the gradients on w?


We know that local gradient of sigmoid is always positive
We are assuming x is always positive

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 105 April 17, 2024
Consider what happens when the input to a neuron is
always positive...

What can we say about the gradients on w?


We know that local gradient of sigmoid is always positive
We are assuming x is always positive

So!! Sign of gradient for all wi is the same as the sign of upstream scalar gradient!

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 106 April 17, 2024
Consider what happens when the input to a neuron is always positive...

[Figure: the allowed gradient update directions are all-positive or all-negative, so updates follow a zig-zag path toward a hypothetical optimal w vector]

What can we say about the gradients on w?
Always all positive or all negative :(

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 107 April 17, 2024
Consider what happens when the input to a neuron is always positive...

[Figure: the allowed gradient update directions are all-positive or all-negative, so updates follow a zig-zag path toward a hypothetical optimal w vector]

What can we say about the gradients on w?
Always all positive or all negative :(
(For a single element! Minibatches help)
Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 108 April 17, 2024
Activation Functions
- Squashes numbers to range [0,1]
- Historically popular since they
have nice interpretation as a
saturating “firing rate” of a neuron

3 problems:

1. Saturated neurons “kill” the gradients
2. Sigmoid outputs are not zero-centered
3. exp() is a bit compute expensive

Sigmoid

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 109 April 17, 2024
[Clevert et al., 2015]
Activation Functions
Exponential Linear Units (ELU)
- All benefits of ReLU
- Closer to zero mean outputs
- Negative saturation regime
compared with Leaky ReLU
adds some robustness to noise

- Computation requires exp()


(Alpha default = 1)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 110 April 17, 2024
[Klambauer et al. ICLR 2017]
Activation Functions
Scaled Exponential Linear Units (SELU)
- Scaled version of ELU that
works better for deep networks
- “Self-normalizing” property;
- Can train deep SELU networks
without BatchNorm

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 111 April 17, 2024
Maxout “Neuron” [Goodfellow et al., 2013]

- Does not have the basic form of dot product ->


nonlinearity
- Generalizes ReLU and Leaky ReLU
- Linear Regime! Does not saturate! Does not die!

Problem: doubles the number of parameters/neuron :(

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 112 April 17, 2024
Remember: Consider what happens when the input to a neuron is always positive...

[Figure: the allowed gradient update directions are all-positive or all-negative, so updates follow a zig-zag path toward a hypothetical optimal w vector]

What can we say about the gradients on w?
Always all positive or all negative :(
(this is also why you want zero-mean data!)
Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 113 April 17, 2024
Data Preprocessing

(Assume X [NxD] is data matrix, each example in a row)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 114 April 17, 2024
Data Preprocessing
In practice, you may also see PCA and Whitening of the data

(decorrelated data: diagonal covariance matrix) (whitened data: covariance matrix is the identity matrix)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 115 April 17, 2024
Data Preprocessing
Before normalization: classification loss very sensitive to changes in weight matrix; hard to optimize.
After normalization: less sensitive to small changes in weights; easier to optimize.

Fei-Fei Li, Yunzhu Li, Ruohan Gao Lecture 7 - 116 April 25, 2023
Xavier Initialization: Proof of Optimality
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x1w1 + x2w2 + ... + xDinwDin

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 117 April 17, 2024
Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x1w1 + x2w2 + ... + xDinwDin
Assume: Var(x1) = Var(x2) = … = Var(xDin)

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 118 April 17, 2024
Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x1w1 + x2w2 + ... + xDinwDin
Assume: Var(x1) = Var(x2) = … = Var(xDin)
We want: Var(y) = Var(xi)

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 119 April 17, 2024
Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x1w1 + x2w2 + ... + xDinwDin
Assume: Var(x1) = Var(x2) = … = Var(xDin)
We want: Var(y) = Var(xi)

Var(y) = Var(x1w1 + x2w2 + ... + xDinwDin)   [substituting value of y]

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 120 April 17, 2024
Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x1w1 + x2w2 + ... + xDinwDin
Assume: Var(x1) = Var(x2) = … = Var(xDin)
We want: Var(y) = Var(xi)

Var(y) = Var(x1w1 + x2w2 + ... + xDinwDin)
       = Din Var(xiwi)   [Assume all xi, wi are iid]

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 121 April 17, 2024
Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x1w1 + x2w2 + ... + xDinwDin
Assume: Var(x1) = Var(x2) = … = Var(xDin)
We want: Var(y) = Var(xi)

Var(y) = Var(x1w1 + x2w2 + ... + xDinwDin)
       = Din Var(xiwi)   [Assume all xi, wi are iid]
       = Din Var(xi) Var(wi)   [Assume all xi, wi are zero mean]

Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 122 April 17, 2024
Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!

For conv layers, Din is filter_size^2 * input_channels

Let: y = x1w1 + x2w2 + ... + xDinwDin
Assume: Var(x1) = Var(x2) = … = Var(xDin)
We want: Var(y) = Var(xi)

Var(y) = Var(x1w1 + x2w2 + ... + xDinwDin)
       = Din Var(xiwi)   [Assume all xi, wi are iid]
       = Din Var(xi) Var(wi)   [Assume all xi, wi are zero mean]

So, Var(y) = Var(xi) only when Var(wi) = 1/Din


Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTAT 2010

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 123 April 17, 2024
Data Augmentation: Color Jitter
Simple: Randomize contrast and brightness

More Complex:
1. Apply PCA to all [R, G, B] pixels in training set
2. Sample a “color offset” along principal component directions
3. Add offset to all pixels of a training image

(As seen in [Krizhevsky et al. 2012], ResNet, etc)

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 124 April 17, 2024
Regularization: A common pattern
Training: Add random noise
Testing: Marginalize over the noise
Examples:
Dropout
Batch Normalization
Data Augmentation

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 125 April 17, 2024
Regularization: DropConnect
Training: Drop connections between neurons (set weights to 0)
Testing: Use all the connections
Examples:
Dropout
Batch Normalization
Data Augmentation
DropConnect

Wan et al, “Regularization of Neural Networks using DropConnect”, ICML 2013

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 126 April 17, 2024
Regularization: Fractional Pooling
Training: Use randomized pooling regions
Testing: Average predictions from several regions
Examples:
Dropout
Batch Normalization
Data Augmentation
DropConnect
Fractional Max Pooling

Graham, “Fractional Max Pooling”, arXiv 2014

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 127 April 17, 2024
Regularization: Stochastic Depth
Training: Skip some layers in the network
Testing: Use all the layers
Examples:
Dropout
Batch Normalization
Data Augmentation
DropConnect
Fractional Max Pooling
Stochastic Depth

Huang et al, “Deep Networks with Stochastic Depth”, ECCV 2016

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 128 April 17, 2024
Regularization: Mixup
Training: Train on random blends of images
Testing: Use original images
Examples:
Dropout
Batch Normalization
Data Augmentation
DropConnect
Fractional Max Pooling
Stochastic Depth
Cutout / Random Crop
Mixup

Randomly blend the pixels of pairs of training images, e.g. 40% cat, 60% dog; target label for the CNN: cat 0.4, dog 0.6
Zhang et al, “mixup: Beyond Empirical Risk Minimization”, ICLR 2018
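A minimal numpy sketch of mixup for one pair of examples, following Zhang et al. (the Beta(α, α) parameter value is an assumption):

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2):
    """Blend two images and their one-hot labels with weight lam ~ Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2       # e.g. 40% cat pixels + 60% dog pixels
    y = lam * y1 + (1 - lam) * y2       # target label: cat 0.4, dog 0.6
    return x, y
```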

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 129 April 17, 2024
Transfer learning

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 130 April 17, 2024
You need a lot of data if you want to train/use CNNs?

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 131 April 17, 2024
Transfer Learning with CNNs

Donahue et al, “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition”, ICML 2014
Razavian et al, “CNN Features Off-the-Shelf: An Astounding Baseline for Recognition”, CVPR Workshops 2014

1. Train on Imagenet
FC-1000
FC-4096
FC-4096

MaxPool
Conv-512
Conv-512

MaxPool
Conv-512
Conv-512

MaxPool
Conv-256
Conv-256

MaxPool
Conv-128
Conv-128

MaxPool
Conv-64
Conv-64

Image

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 132 April 17, 2024
Transfer Learning with CNNs

Donahue et al, “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition”, ICML 2014
Razavian et al, “CNN Features Off-the-Shelf: An Astounding Baseline for Recognition”, CVPR Workshops 2014

1. Train on Imagenet → 2. Small Dataset (C classes)

[Two side-by-side architecture columns: Image → Conv-64 ×2 → MaxPool → Conv-128 ×2 → MaxPool → Conv-256 ×2 → MaxPool → Conv-512 ×2 → MaxPool → Conv-512 ×2 → MaxPool → FC-4096 → FC-4096 → FC-1000]

For the small dataset: replace FC-1000 with FC-C, reinitialize this last layer and train it; freeze all the other layers.

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 133 April 17, 2024
Donahue et al, “DeCAF: A Deep Convolutional Activation
Feature for Generic Visual Recognition”, ICML 2014

Transfer Learning with CNNs Razavian et al, “CNN Features Off-the-Shelf: An


Astounding Baseline for Recognition”, CVPR Workshops
2014

1. Train on Imagenet → 2. Small Dataset (C classes)

[Same two architecture columns as the previous slide: reinitialize and train the last layer (FC-1000 → FC-C), freeze all other layers. The slide also shows results labeled “Finetuned from AlexNet”.]

Donahue et al, “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition”, ICML 2014

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 134 April 17, 2024
Donahue et al, “DeCAF: A Deep Convolutional Activation
Feature for Generic Visual Recognition”, ICML 2014

Transfer Learning with CNNs Razavian et al, “CNN Features Off-the-Shelf: An


Astounding Baseline for Recognition”, CVPR Workshops
2014

1. Train on Imagenet → 2. Small Dataset (C classes) → 3. Bigger dataset

[Three side-by-side architecture columns of the same network]

2. Small dataset: reinitialize the last layer (FC-1000 → FC-C) and train it; freeze all the other layers.
3. With a bigger dataset, train more layers: train the FC layers (FC-C, FC-4096, FC-4096), freeze the remaining conv layers.

Lower learning rate when finetuning; 1/10 of the original LR is a good starting point.

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 135 April 17, 2024
Takeaway for your projects and beyond:
Have some dataset of interest but it has < ~1M images?

1. Find a very large dataset that has


similar data, train a big ConvNet there
2. Transfer learn to your dataset
Deep learning frameworks provide a “Model Zoo” of pretrained
models so you don’t need to train your own

TensorFlow: https://github.com/tensorflow/models
PyTorch: https://github.com/pytorch/vision
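A PyTorch-style sketch of this recipe (the choice of ResNet-18, learning rate, and number of classes are illustrative, not from the lecture):

```python
import torch
import torch.nn as nn
import torchvision

# Pretrained on ImageNet (torchvision >= 0.13 weights API)
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

for param in model.parameters():        # freeze all pretrained layers
    param.requires_grad = False

num_classes = 10                        # C classes in the small target dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)  # reinitialize the last layer

# Only the new layer is trained here; when finetuning more layers on a bigger
# dataset, use ~1/10 of the original learning rate.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
```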

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 136 April 17, 2024
Summary
- Improve your training error:
- Optimizers
- Learning rate schedules
- Improve your test error:
- Regularization
- Choosing Hyperparameters

Fei-Fei Li, Ehsan Adeli, Zane Durante Lecture 7 - 137 April 17, 2024
