
Regularization L1, L2

Early Stopping
Data Augmentation
Mr. Sivadasan E T
Associate Professor
Vidya Academy of Science and Technology, Thrissur
What is Regularization

Regularization is a method in machine learning used to reduce overfitting and improve the model's ability to generalize to unseen data.

It works by adding a penalty or constraint to the model's objective function (usually the loss function), discouraging it from fitting too closely to the training data.
Why Regularization

When a machine learning model is too complex, it can memorize the training data instead of learning the underlying patterns.

This leads to overfitting, where the model performs well on training data but poorly on new, unseen data. Regularization helps control this complexity.
How Does Regularization Work?
In a typical machine learning model, the goal is to minimize
the loss function, which measures the difference between the
model’s predictions and the actual values.

Regularization modifies this loss function by adding a penalty term based on the model's parameters.

New loss function:
Loss = Original Loss + Penalty Term
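As a minimal sketch (the slides themselves contain no code), the penalized loss for a simple linear model could be computed as below; the function name and the hyperparameter lam are illustrative, with lam playing the role of the regularization strength.

import numpy as np

# Illustrative only: the original loss is mean squared error for a linear
# model with weight vector w, and a penalty term is added on top of it.
def penalized_loss(w, X, y, lam=0.1, penalty="l2"):
    preds = X @ w                                # linear-model predictions
    original_loss = np.mean((preds - y) ** 2)    # original loss (MSE)
    if penalty == "l1":
        penalty_term = lam * np.sum(np.abs(w))   # L1: sum of absolute parameter values
    else:
        penalty_term = lam * np.sum(w ** 2)      # L2: sum of squared parameter values
    return original_loss + penalty_term          # Loss = Original Loss + Penalty Term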
L1 Regularization (Lasso):
Adds the sum of the absolute values of the model parameters
as a penalty.
Drives some parameters to zero, effectively removing them
(feature selection).
Formula for the penalty: λ Σ |wᵢ|  (λ controls the penalty strength, wᵢ are the model parameters)
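As an illustrative example of this sparsity effect (not part of the slides), scikit-learn's Lasso estimator applies an L1 penalty whose strength is set by alpha; with synthetic data where only two features matter, the remaining coefficients are driven to exactly zero.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features actually matter; the rest are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1)   # alpha sets the strength of the L1 penalty
lasso.fit(X, y)
print(lasso.coef_)         # coefficients of the irrelevant features are driven to 0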
L2 Regularization (Ridge):
Adds the sum of the squared values of the model parameters
as a penalty.
Reduces the magnitude of parameters but doesn't set them to
zero.
Formula for the penalty: λ Σ wᵢ²
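For comparison, a parallel sketch with scikit-learn's Ridge estimator (again illustrative, using the same kind of synthetic data): the L2 penalty shrinks all coefficients but does not set any of them to zero.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0)   # alpha sets the strength of the L2 penalty
ridge.fit(X, y)
print(ridge.coef_)         # all coefficients shrunk toward zero, none exactly zero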
Early Stopping

Neural networks are trained using variations of gradient-descent methods.

In most optimization models, gradient-descent methods are executed to convergence.

However, executing gradient descent to convergence optimizes the loss on the training data, but not necessarily on the out-of-sample test data.
Early Stopping

This is because the final few steps often overfit to the specific
nuances of the training data, which might not generalize well
to the test data.
Another common form of regularization is early stopping, in
which the gradient descent is ended after only a few
iterations.
One way to decide the stopping point is by holding out a part
of the training data, and then testing the error of the model on
the held-out set.
Early Stopping

The gradient-descent approach is terminated when the error on the held-out set begins to rise.

Early stopping essentially reduces the size of the parameter space to a smaller neighborhood around the initial values of the parameters.

From this point of view, early stopping acts as a regularizer because it effectively restricts the parameter space.
Early Stopping

In this method, a portion of the training data is held out as a validation set.

The backpropagation-based training is only applied to the portion of the training data that does not include the validation set.

At the same time, the error of the model on the validation set is continuously monitored.
Early Stopping

At some point, this error begins to rise on the validation set, even though it continues to decrease on the training set.
This is the point at which further training causes overfitting.
Therefore, this point can be chosen for termination.
Early Stopping

It is important to keep track of the best solution achieved so far in the learning process (as computed on the validation data).

This is because one does not perform early stopping after tiny increases in the out-of-sample error (which might be caused by noisy variations); rather, it is advisable to continue to train and check whether the error continues to rise.
Early Stopping

In other words, the termination point is chosen in hindsight, after the error on the validation set continues to rise and all hope of improving the error performance on the validation set is lost.
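The following is a minimal sketch of this procedure (not from the slides), using a simple SGD-trained linear model as a stand-in for a neural network; the synthetic data, patience value, and epoch budget are illustrative choices.

import numpy as np
from copy import deepcopy
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.5, size=1000)

# Hold out a portion of the training data as a validation set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = SGDRegressor(learning_rate="constant", eta0=0.01)
best_err, best_model = float("inf"), None
patience, bad_epochs = 5, 0

for epoch in range(200):
    model.partial_fit(X_train, y_train)                      # gradient-descent updates on the training portion only
    val_err = np.mean((model.predict(X_val) - y_val) ** 2)   # monitor the error on the validation set
    if val_err < best_err:
        best_err, best_model, bad_epochs = val_err, deepcopy(model), 0   # keep the best solution so far
    else:
        bad_epochs += 1                                       # tolerate small, possibly noisy increases
        if bad_epochs >= patience:
            break                                             # terminate once the validation error keeps rising

model = best_model    # the termination point is chosen in hindsight: keep the best validation-error model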
Data Augmentation

A common trick to reduce overfitting in convolutional neural networks is the idea of data augmentation.

In data augmentation, new training examples are generated by using transformations on the original examples.
Data Augmentation

It works better in some domains than others. Image processing is one domain to which data augmentation is very well suited.

This is because many transformations, such as translation, rotation, patch extraction, and reflection, do not fundamentally change the properties of the object in an image.
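As an illustrative sketch (the slides do not name a library), such transformations can be expressed as a torchvision augmentation pipeline; the specific parameter values below are arbitrary.

from torchvision import transforms

# Augmentation pipeline combining the transformations mentioned above.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),   # reflection
    transforms.RandomRotation(degrees=15),    # rotation
    transforms.RandomCrop(224, padding=8),    # translation / patch extraction
    transforms.ToTensor(),
])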
Data Augmentation

However, they do increase the generalization power of the model when it is trained with the augmented data set.

For example, if a data set is augmented with mirror images and reflected versions of all the bananas in it, then the model is able to better recognize bananas in different orientations.
Data Augmentation

Many of these forms of data augmentation require very little computation, and therefore the augmented images do not need to be explicitly generated up front.

Rather, they can be created at training time, when an image is being processed.
Data Augmentation

For example, while processing an image of a banana, it can be reflected into a modified banana at training time.

Similarly, the same banana might be represented in somewhat different color intensities in different images, and therefore it might be helpful to create representations of the same image in different color intensities.
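A sketch of this on-the-fly approach, again assuming PyTorch/torchvision and a hypothetical image folder "data/train": each time an image is drawn during training, a fresh random reflection and color-intensity change is applied, so the augmented images never need to be stored.

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),                      # reflection at training time
    transforms.ColorJitter(brightness=0.3, contrast=0.3),   # vary color intensities
    transforms.ToTensor(),
])

# "data/train" is a hypothetical folder of images arranged by class.
train_set = datasets.ImageFolder("data/train", transform=train_transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
# The augmented images are generated as batches are drawn, not stored up front.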
Thank You!
