
What is Regularization?

Regularization is a technique used in machine learning and deep learning to prevent overfitting and improve a model’s generalization performance. It involves adding a penalty term to the loss function during training.

This penalty discourages the model from becoming too complex or having large parameter values, which helps control the model’s ability to fit noise in the training data. Regularization methods in deep learning include L1 and L2 regularization, dropout, early stopping, and more. By applying regularization, deep learning models become more robust and better at making accurate predictions on unseen data.

Before we dive deep into the topic, take a look at this image:

Have you seen this image before? As we move towards the right in this image, our model tries to learn the details and the noise in the training data too well, which ultimately results in poor performance on unseen data.

In other words, while going toward the right, the complexity of the
model increases such that the training error reduces but the testing
error doesn’t. This is shown in the image below:

If you’ve built a neural network before, you know how complex they
are. This makes them more prone to overfitting.

Regularization is a technique that modifies the learning algorithm slightly so that the model generalizes better. This, in turn, improves the model’s performance on unseen data as well.
How does Regularization help Reduce Overfitting?

Let’s consider a neural network that is overfitting on the training data, as shown in the image below:

Assume that our regularization coefficient is so high that some of the weight matrices are nearly equal to zero.

This will result in a much simpler linear network and slight underfitting of the training data.

Such a large value of the regularization coefficient is not that useful. We need to optimize the value of the regularization coefficient to obtain a well-fitted model, as shown in the image below:


Different Regularization Techniques in Deep Learning

Now that we understand how regularization helps reduce overfitting, we’ll learn a few different techniques for applying regularization in deep learning.

L1 and L2 Regularization

L1 and L2 are the most common types of regularization in deep learning. These update the general cost function by adding another term known as the regularization term.

Cost function = Loss (say, binary cross-entropy) + Regularization term

Due to the addition of this regularization term, the values of the weight matrices decrease, because the penalty assumes that a neural network with smaller weight matrices leads to a simpler model. Therefore, it also reduces overfitting to quite an extent.

However, this regularization term differs in L1 and L2.

For L2: Cost function = Loss + λ · ||w||^2

For L1: Cost function = Loss + λ · ||w||

In L2, we have: ||w||^2 = Σ w_i^2. This is known as ridge regression, where lambda is the regularization parameter. It is the hyperparameter whose value is optimized for better results. L2 regularization is also known as weight decay, as it forces the weights to decay towards zero (but not exactly zero).
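To see where the name comes from, consider a plain gradient-descent update once the L2 penalty λ · ||w||^2 has been added to the cost (here η is the learning rate):

w ← w − η · (∂Loss/∂w + 2λw) = (1 − 2ηλ) · w − η · ∂Loss/∂w

Every update multiplies the weight by a factor slightly smaller than one before applying the usual gradient step, so the weights steadily decay towards zero.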

In L1, we have: ||w|| = Σ |w_i|. Here, we penalize the absolute value of the weights. Unlike L2, the weights may be reduced to zero. L1 regularization is also called lasso regression. Hence, it is very useful when we are trying to compress our model. Otherwise, we usually prefer L2 over it.

In Keras, we can directly apply regularization to any layer using the regularizers module.

Below is a sample of applying L2 regularization to a Dense layer:
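A minimal sketch using the tf.keras API; the layer width (64 units) and penalty strength (0.01) are illustrative values, not tuned ones:

from tensorflow.keras import layers, regularizers

# Dense layer whose kernel (weight matrix) carries an L2 penalty.
# The term 0.01 * sum(w_i^2) is added to the loss during training.
dense = layers.Dense(
    64,
    activation="relu",
    kernel_regularizer=regularizers.l2(0.01),
)

The same argument also accepts regularizers.l1(...) if an L1 penalty is wanted instead.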


Dropout

This is one of the most interesting types of regularization techniques. It also produces very good results and is consequently the most frequently used regularization technique in the field of deep learning.

To understand dropout, let’s say our neural network structure is similar to the one shown below:

So what does dropout do? At every iteration, it randomly selects some nodes and removes them along with all of their incoming and outgoing connections, as shown below:

Each iteration has a different set of nodes, which results in a different set of outputs. This can also be thought of as an ensemble technique in machine learning.

Ensemble models usually perform better than a single model as they capture more randomness. Similarly, dropout models also perform better than normal neural network models.

The probability of dropping each node is the hyperparameter of the dropout function. As seen in the image above, dropout can be applied to both the hidden layers and the input layer.

Due to these reasons, dropout is usually preferred when we have a large neural network structure, in order to introduce more randomness.
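As a minimal sketch in tf.keras (the layer sizes and input shape here are illustrative), a Dropout layer is simply placed between layers with a drop probability of 0.25:

from tensorflow.keras import layers, models

# During training, Dropout randomly zeroes 25% of the previous layer's
# activations on each forward pass; at inference time it is a no-op.
model = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(20,)),
    layers.Dropout(0.25),
    layers.Dense(1, activation="sigmoid"),
])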

As you can see, we have defined 0.25 as the probability of dropping. We can tune it further for better results using the grid search method.

Data Augmentation

The simplest way to reduce overfitting is to increase the size of the training data. In machine learning, however, increasing the training data size is often not feasible because labeled data is too costly.

But now, let’s consider we are dealing with images. In this case, there are a few ways of increasing the size of the training data: rotating the image, flipping, scaling, shifting, and so on. In the image below, some transformations have been applied to the handwritten digits dataset.

This technique is known as data augmentation. It usually provides a big leap in improving the accuracy of the model, and it can be considered a mandatory trick to improve our predictions.

In Keras, we can perform all of these transformations using ImageDataGenerator. It has a big list of arguments that you can use to pre-process your training data.
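A minimal sketch of setting up such a generator; the specific ranges below are illustrative values, not tuned ones:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each argument turns on a random transformation applied on the fly.
datagen = ImageDataGenerator(
    rotation_range=20,        # rotate by up to 20 degrees
    width_shift_range=0.1,    # shift horizontally by up to 10% of the width
    height_shift_range=0.1,   # shift vertically by up to 10% of the height
    zoom_range=0.1,           # zoom in or out by up to 10%
    horizontal_flip=True,     # randomly flip images left to right
)

# datagen.flow(x_train, y_train, batch_size=32) then yields augmented
# batches that can be passed to model.fit()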

Early Stopping

Early stopping is a cross-validation strategy in which we keep one part of the training set as the validation set. When we see that the performance on the validation set is getting worse, we immediately stop training the model.

In the above image, we will stop training at the dotted line since, after
that, our model will start overfitting on the training data.
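In Keras, early stopping can be applied with the EarlyStopping callback. A minimal sketch that monitors the validation loss with a patience of 5 (the monitored metric and the epoch budget are illustrative choices):

from tensorflow.keras.callbacks import EarlyStopping

# Stop training once the validation loss has not improved for 5 epochs.
early_stopping = EarlyStopping(monitor="val_loss", patience=5)

# Used during training, for example:
# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stopping])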
Patience denotes the number of epochs with no further improvement
after which the training will be stopped. For a better understanding,
let’s look at the above image again. After the dotted line, each epoch
will result in a higher validation error value. Therefore, our model will
stop 5 epochs after the dotted line (since our patience equals 5) because
no further improvement is seen.
