Module 3: Introduction to Regularization (03-Aug-2020)

Regularization techniques are used to reduce overfitting in machine learning models. They modify the learning algorithm to favor simpler models by adding constraints or penalty terms to the objective function. Common regularization strategies include L2 regularization, which penalizes weights with large magnitudes, driving them closer to zero. This helps control model complexity and improve generalization to new data.


Regularization for Deep Learning

• Goal: the model should perform well on the training data and on new inputs

• Regularization strategies are designed to reduce the test error,
  possibly at the expense of increased training error

• Regularization: modifying the learning algorithm to reduce its
  generalization (test) error but not its training error

• General regularization strategies
• Adding extra constraints on the ML model, e.g. restrictions on parameter values
• Introducing extra (penalty) terms in the cost / objective function

• Other approaches – ensemble methods


Regularization Strategies
• Generally based on regularizing estimators
• An effective regularizer makes a profitable trade-off: it decreases
  variance significantly without overly increasing bias
• Generalization and overfitting – three regimes of the model relative to
  the true data-generating process
  – Excludes the data-generating process (underfitting)
  – Matches the true data-generating process
  – Includes the generating process but also many others (overfitting)
• Model complexity
  – Rather than finding the model of exactly the right size with the right
    number of parameters,
  – it is usually better to fit a large model that has been regularized properly
  – The intention is to create a large, deep, regularized model
Parameter Norm Penalties
• Limit the model's capacity (neural networks, linear regression, logistic regression)
  – Add a parameter norm penalty Ω(θ) to the objective function:
    J̃(θ; X, y) = J(θ; X, y) + α Ω(θ), where α ∈ [0, ∞) weights the relative
    contribution of the penalty term (a minimal sketch follows this slide)
• For NNs the parameter norm penalty (PNP) affects the weights of each layer,
  while the biases remain unregularized
• w – vector of weights affected by the penalty
• θ – vector of all parameters, comprising w and the unregularized parameters
• Alternatively, NNs may use a different α coefficient for each layer
  of the network.
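The slide does not give code, so here is a minimal Python/NumPy sketch of a penalized objective J̃ = J + α·Ω for a linear model. The loss choice (mean squared error), the function names, and the value of α are illustrative assumptions; the key point is that only the weight vector w enters the penalty, not the bias.

```python
import numpy as np

def unregularized_loss(w, b, X, y):
    # J(theta; X, y): mean squared error of a linear model (an assumed example loss)
    residual = X @ w + b - y
    return 0.5 * np.mean(residual ** 2)

def l2_penalty(w):
    # Omega(theta) = 0.5 * ||w||_2^2 -- the bias b is deliberately left unregularized
    return 0.5 * np.dot(w, w)

def regularized_loss(w, b, X, y, alpha=0.1):
    # J_tilde = J + alpha * Omega; alpha >= 0 weights the penalty's contribution
    return unregularized_loss(w, b, X, y) + alpha * l2_penalty(w)
```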
L2 Parameter Regularization
• The simplest and most commonly used parameter norm penalty
• The L2 PNP is known as weight decay
• At each update, the gradient ∇w J is combined with an extra term λ·w that is
  subtracted from the weights, so the weights decay towards zero – weight update
  w ← w − ε(∇w J + λ·w)   (sketched below)
• Drives the weights closer to the origin by adding a regularization term
  Ω(w) = ½‖w‖₂² to the objective function
• Also known as ridge regression / Tikhonov regularization
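A minimal sketch of one weight-decay gradient step, assuming `grad_J` is the gradient of the unregularized loss at the current weights; the learning rate and λ values are illustrative, not taken from the slides.

```python
import numpy as np

def weight_decay_step(w, grad_J, lr=0.01, lam=0.1):
    # w <- w - lr * (grad_J + lam * w)
    #    = (1 - lr * lam) * w - lr * grad_J
    # The multiplicative (1 - lr*lam) factor shrinks w toward the origin on
    # every step, which is why the L2 penalty is called "weight decay".
    return (1.0 - lr * lam) * w - lr * grad_J
```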
Regularization (revisited)
• Regularization refers to the act of modifying a learning
  algorithm to favor "simpler" prediction rules to avoid
  overfitting.
• Most commonly, regularization refers to modifying the
  loss function to penalize certain values of the weights being
  learned.
• Specifically, penalize weights that are large.
• Identify large weights using the L2 norm of w – the vector's length /
  Euclidean norm: ‖w‖₂ = √(Σᵢ wᵢ²)
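A one-line sketch of the Euclidean norm computation, with made-up numbers for illustration:

```python
import numpy as np

w = np.array([3.0, -4.0])            # illustrative weight vector
l2_norm = np.sqrt(np.sum(w ** 2))    # same as np.linalg.norm(w)
print(l2_norm)                       # 5.0
```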
L2 Regularization (ctd..)

• New goal for minimization: the loss function plus the penalty,
  L(w) + λ‖w‖₂²

• By minimizing this, we prefer solutions where w is closer to 0.

• λ – hyperparameter that adjusts the trade-off between having low training
  loss and having low weights (illustrated in the sketch below)
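The λ trade-off can be made concrete with a small experiment. The sketch below fits a linear model by gradient descent on the penalized objective 0.5·MSE + λ‖w‖₂² for several λ values; the data, learning rate, and step count are illustrative assumptions. Larger λ should give a smaller weight norm at the price of a slightly higher training error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

def fit(lmbda, lr=0.05, steps=2000):
    w = np.zeros(3)
    for _ in range(steps):
        # gradient of 0.5*MSE + lmbda*||w||^2
        grad = X.T @ (X @ w - y) / len(y) + 2.0 * lmbda * w
        w -= lr * grad
    return w

for lmbda in (0.0, 0.1, 1.0):
    w = fit(lmbda)
    mse = np.mean((X @ w - y) ** 2)
    print(f"lambda={lmbda:<4} ||w||={np.linalg.norm(w):.3f}  train MSE={mse:.4f}")
```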
L2 Regularization (ctd..)
• Assuming no bias (i.e. θ is just w), the regularized cost function is
  J̃(w; X, y) = (α/2)·wᵀw + J(w; X, y)
• And the gradient is
  ∇w J̃(w; X, y) = α·w + ∇w J(w; X, y)
• Updating the weights with learning rate ε:
  w ← w − ε(α·w + ∇w J) = (1 − εα)·w − ε·∇w J

• To analyse further, make a quadratic approximation to J around
  w* = arg min J(w), the weights that yield the minimal unregularized
  training cost
L2 Regularization (ctd..)
•  
• Then J becomes
• H – Hessian matrix of J
• Minimum of J occurs at
• Adding weight decay gradient

– When =0, approaches


– When  grows perform eigen decomposition of H
L2 Regularization (ctd..)
• H is decomposed into a diagonal matrix of eigenvalues Λ and an orthonormal
  basis of eigenvectors Q as H = Q Λ Qᵀ
• Therefore, w̃ becomes
  w̃ = Q (Λ + αI)⁻¹ Λ Qᵀ w*
  i.e. the component of w* along the i-th eigenvector of H is rescaled by
  the factor λᵢ / (λᵢ + α)  (verified numerically below)
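A small numerical check of the two expressions above, using a toy symmetric positive-definite matrix as the "Hessian" and an assumed unregularized minimizer w*; the numbers are made up for illustration.

```python
import numpy as np

H = np.array([[3.0, 1.0],
              [1.0, 2.0]])           # toy symmetric positive-definite Hessian
w_star = np.array([1.0, -2.0])       # assumed unregularized minimizer
alpha = 0.5

# Direct formula: w_tilde = (H + alpha*I)^{-1} H w*
w_tilde = np.linalg.solve(H + alpha * np.eye(2), H @ w_star)

# Via eigendecomposition H = Q diag(lam) Q^T
lam, Q = np.linalg.eigh(H)
w_tilde_eig = Q @ np.diag(lam / (lam + alpha)) @ Q.T @ w_star

print(np.allclose(w_tilde, w_tilde_eig))   # True: both forms agree
print(lam / (lam + alpha))                 # per-eigendirection shrinkage factors
```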
L2 Regularization (ctd..)
• Extending to linear regression – the cost function J is the sum of squared errors
  J = (Xw − y)ᵀ (Xw − y)

• Applying L2 regularization modifies J to
  (Xw − y)ᵀ (Xw − y) + (α/2) wᵀw

• Therefore the weight-decay solution becomes
  w = (XᵀX + αI)⁻¹ Xᵀ y
  instead of the ordinary least-squares solution w = (XᵀX)⁻¹ Xᵀ y
  (see the sketch below)