
Regularization in Deep Learning
Different Scenarios
p-norms visualized

[Figure: curves in the (w1, w2) plane where the p-norm penalty equals 1, for several values of p.]

For example, if w1 = 0.5 and the penalty is held at 1, the corresponding w2 on each curve is:

p = 1   → w2 = 0.5
p = 1.5 → w2 = 0.75
p = 2   → w2 = 0.87
p = 3   → w2 = 0.95
p = ∞   → w2 = 1
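These values can be reproduced numerically. Below is a small sketch (not from the original slides); the helper name max_w2 is illustrative:

import numpy as np

def max_w2(w1, p, penalty=1.0):
    """Largest w2 such that the p-norm of (w1, w2) equals the given penalty."""
    if np.isinf(p):
        return penalty                              # infinity norm: max(|w1|, |w2|) = penalty
    return (penalty ** p - abs(w1) ** p) ** (1.0 / p)

for p in [1, 1.5, 2, 3, np.inf]:
    print(p, round(max_w2(0.5, p), 2))              # ≈ the w2 values in the table above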
p-norms visualized

• All p-norms penalize larger weights.
• p < 2 tends to create sparse solutions (i.e. lots of 0 weights).
• p > 2 tends to favor similar weights.
Regularizers summarized
• L1 is popular because it tends to result in sparse
solutions (i.e. lots of zero weights).
However, it is not differentiable, so it only works for gradient
descent solvers.
• L2 is also popular because for some loss functions it
can be solved directly (no gradient descent required,
though iterative solvers are often still used).
• Lp norms with p > 2 are less popular since they don't tend to shrink the
weights enough.
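As a minimal numerical sketch (not from the original slides), the L1 and L2 penalty terms that get added to the training loss can be written directly in NumPy; the weight vector and lambda value below are illustrative:

import numpy as np

def p_norm(w, p):
    """p-norm of a weight vector, matching the 'penalty' in the earlier table."""
    return np.sum(np.abs(w) ** p) ** (1.0 / p)

w = np.array([0.5, -1.5, 0.0, 2.0])   # example weights
lam = 0.01                            # regularization strength (lambda)

l1_term = lam * p_norm(w, 1)          # L1: promotes sparse (exactly zero) weights
l2_term = lam * p_norm(w, 2)          # L2: shrinks weights towards zero
# Note: frameworks such as Keras implement L2 as lam * sum(w**2), i.e. the
# squared norm, which has the same shrinking effect. Either term is added
# to the data loss during training.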
Regularization: Dropout

• In addition to L1 and L2 regularization, another well-known and
powerful regularization technique is dropout. The procedure behind
dropout regularization is quite simple.
• In a nutshell, dropout means that during training, each neuron of the
neural network gets turned off with some probability P. Let's look at a
visual example.
Dropout

• Assume that on the left side we have a feedforward neural
network with no dropout. Applying dropout with, say, a
probability of P = 0.5 that each neuron gets turned off
during training would result in the neural network shown
on the right side.
Dropout
• In this case, you can observe that approximately half of
the neurons are not active and are not considered part of
the neural network. As a result, the neural network
becomes simpler.
• A simpler version of the neural network has less
complexity, which can reduce overfitting. The deactivation
of neurons with a certain probability P is applied at each
forward propagation and weight update step.
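As a minimal sketch of these mechanics (not part of the original slides), a dropout mask can be applied to a layer's activations as follows; the array and function names are illustrative, and the 1/(1-p) rescaling is the commonly used "inverted dropout" convention:

import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, p=0.5, training=True):
    """Zero out each activation with probability p during training."""
    if not training:
        return activations                            # no dropout at test time
    keep_mask = rng.random(activations.shape) >= p    # keep with probability 1 - p
    return activations * keep_mask / (1.0 - p)        # rescale survivors (inverted dropout)

h = np.array([0.2, 1.5, -0.7, 0.9])   # example hidden-layer activations
print(dropout_forward(h, p=0.5))      # roughly half the entries become 0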
In a nutshell….
• Overfitting occurs in more complex neural network models (many
layers, many neurons)
• Complexity of the neural network can be reduced by using L1 and
L2 regularization as well as dropout
• L1 regularization forces the weight parameters to become zero
• L2 regularization forces the weight parameters towards zero (but
never exactly zero)
• Smaller weight parameters make some neurons negligible →
neural network becomes less complex → less overfitting
• During dropout, some neurons get deactivated with a random
probability P → Neural network becomes less complex → less
overfitting
Dataset augmentation
• Dataset augmentation is the process of generating data
artificially from the existing training data by applying minor
changes like rotations, flips, adding blur to some pixels of
the original image, or translations. Augmenting with
more data will make it harder for the neural network to
drive the training error to zero.
Data Augmentation
• By generating more data, the network will have a better chance of
performing well on the test data. Depending on the task at hand,
we might use all of the augmentation techniques and generate more
training data.
• To apply data augmentation, we can make use of the existing
methods in frameworks like Keras and PyTorch.
• In Keras, we can use ImageDataGenerator to augment or create more
data by applying transformations, and similarly, we can use the
transforms module in torchvision from PyTorch to augment data.
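For instance, here is a hedged sketch of the torchvision approach mentioned above; the specific transforms and parameters are illustrative, not prescribed by the slides:

from torchvision import transforms

# Example augmentation pipeline: random flips, small rotations, and blur
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.GaussianBlur(kernel_size=3),
    transforms.ToTensor(),
])
# Typically passed to a dataset, e.g.:
# dataset = torchvision.datasets.ImageFolder("data/train", transform=train_transforms)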
Early Stopping

• The idea behind early stopping is that while we're fitting a neural
network on the training data, the model is evaluated on unseen
data after each iteration. If the performance of the model on the
validation data is not improving, i.e. the validation error is increasing or
staying the same for a certain number of iterations, then there is no point in
training the model further. This process of stopping model training
before it reaches the lowest training error is known as early stopping.
Early Stopping

• Let's say we have set a patience of 5 epochs
(i.e. the number of epochs to wait before stopping early). For
5 epochs, we'll monitor the validation error, and if it
isn't improving (it either remains constant or increases)
while the training error decreases, then we don't want
to train any further, as sketched below.
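A minimal sketch of this patience logic (not from the original slides); max_epochs, train_one_epoch, evaluate, model and val_data are hypothetical placeholders for your own training setup:

best_val_error = float("inf")
patience = 5
epochs_without_improvement = 0

for epoch in range(max_epochs):
    train_one_epoch(model)                 # your training step (hypothetical helper)
    val_error = evaluate(model, val_data)  # your validation step (hypothetical helper)

    if val_error < best_val_error:
        best_val_error = val_error
        epochs_without_improvement = 0     # improvement: reset the counter
    else:
        epochs_without_improvement += 1    # no improvement this epoch

    if epochs_without_improvement >= patience:
        print(f"Early stopping at epoch {epoch}")
        break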
Early Stopping

• By using the early stopping technique, we're making
sure that the model doesn't memorize the patterns and
noise present in the training data. Instead, we're
pushing it towards generalizing beyond the training data.
• Early stopping can be applied manually during the
training process, or you can do even better by
integrating these rules into your experiment through the
hooks/callbacks provided in most common frameworks
like PyTorch, Keras and TensorFlow.
Sample code: L2 regularization on a Dense layer

from keras import regularizers
from keras.layers import Dense

# assumes an existing Sequential model; L2 penalty applied to this layer's weights
model.add(Dense(64, input_dim=64,
                kernel_regularizer=regularizers.l2(0.01)))

Here the value 0.01 is the regularization parameter, i.e. lambda, which we need to tune further.
We can optimize it using the grid-search method.
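A hedged sketch of such a grid search (the candidate lambda values and the build_model helper are illustrative, not from the slides):

candidate_lambdas = [0.0001, 0.001, 0.01, 0.1]
results = {}

for lam in candidate_lambdas:
    model = build_model(l2_strength=lam)   # hypothetical builder that sets regularizers.l2(lam)
    model.fit(x_train, y_train, epochs=10,
              validation_data=(x_val, y_val), verbose=0)
    results[lam] = model.evaluate(x_val, y_val, verbose=0)  # validation loss (no extra metrics assumed)

best_lambda = min(results, key=results.get)  # pick the lambda with the lowest validation loss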
Code: Dropout

In Keras, we can implement dropout using the core Dropout layer.
Below is the Python code for it:

from keras.layers import Dense, Dropout
from keras.models import Sequential

model = Sequential([
    Dense(hidden1_num_units, input_dim=input_num_units, activation='relu'),
    Dropout(0.25),                     # drop 25% of the previous layer's units
    Dense(output_num_units, activation='softmax'),
])

As you can see, we have defined 0.25 as the probability of dropping a unit.
We can tune it further for better results using the grid-search method.
Data Augmentation
Data augmentation can be a big leap in improving the accuracy of the model.
It can be considered an almost mandatory trick for improving our predictions.
In Keras, we can perform all of these transformations
using ImageDataGenerator. It has a long list of arguments which you
can use to pre-process your training data.
Below is the sample code to implement it.

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(horizontal_flip=True)
datagen.fit(train)   # 'train' is the array of training images
Callbacks API

• A callback is an object that can perform actions at various
stages of training (e.g. at the start or end of an epoch, before
or after a single batch, etc.).
• You can use callbacks to:
• Write TensorBoard logs after every batch of training to monitor
your metrics
• Periodically save your model to disk
• Do early stopping
• Get a view on internal states and statistics of a model during
training
• ...and more
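A hedged sketch of wiring such callbacks into training (the log directory, file name and epoch count are illustrative, and the model and data variables are assumed to already exist):

from keras.callbacks import TensorBoard, ModelCheckpoint

callbacks = [
    TensorBoard(log_dir="./logs"),                       # write TensorBoard logs during training
    ModelCheckpoint("model.h5", save_best_only=True),    # periodically save the model to disk
]

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=50,
          callbacks=callbacks)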
Code: Early Stopping

• In Keras, we can apply early stopping using a callback. Below is the sample
code for it.
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=5)

• Here, monitor denotes the quantity that needs to be monitored, and 'val_loss' denotes
the validation loss.
• Patience denotes the number of epochs with no further improvement after which
training will be stopped. For a better understanding, consider the validation error curve
again. After the dotted line, each epoch results in a higher validation error.
Therefore, 5 epochs after the dotted line (since our patience is equal to 5), our model
will stop training because no further improvement is seen.
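To take effect, the callback has to be passed to model.fit. A hedged usage sketch (the data variables and epoch count are illustrative; restore_best_weights is an optional Keras argument, not mentioned in the slides):

early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True)   # roll back to the best epoch's weights

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=100,                 # upper bound; training may stop earlier
          callbacks=[early_stop])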
