
Loss Functions

Dr. V. Sowmya,
Associate Professor,
Amrita School of Artificial
Intelligence,
Coimbatore,
Amrita Vishwa Vidyapeetham,
India.
27-01-2025.
Loss Functions

• During training, a loss function is used to optimize the model's parameters.
• Measures the difference between the predicted and expected outputs of the model.
• The objective of training is to minimize this difference.
Loss Functions - Properties
Mean Squared Error (MSE) / L2 Loss

Properties:
• Non-negative.
• Sensitive to outliers, since errors are squared.
• Differentiable everywhere.
• Convex in the predictions (the overall training objective becomes non-convex in deep learning because of the stacked non-linear activation functions).
• Usable both as a loss function and as a performance metric.
• Scale-dependent.
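For reference, MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)². A minimal NumPy sketch (function and variable names are illustrative, not from the slides):

import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# Example: a single large outlier dominates the loss.
print(mse([1.0, 2.0, 3.0], [1.1, 2.1, 9.0]))  # ~12.0, driven almost entirely by the outlier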
Mean Absolute Error (MAE) / L1 Loss

Properties:
• Non-negative.
• Robust to outliers.
• Non-differentiable at zero error.
• Convex in the predictions (the overall training objective becomes non-convex in deep learning because of the stacked non-linear activation functions).
• Usable both as a loss function and as a performance metric.
• Scale-dependent.
Use Mean Absolute Percentage Error (MAPE) or Normalized Mean Absolute Error (NMAE) to compare models across different scales or units.
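A minimal NumPy sketch of MAE = (1/n) Σᵢ |yᵢ − ŷᵢ| (names are illustrative):

import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_true - y_pred))

# The same outlier as in the MSE example has a much smaller effect here.
print(mae([1.0, 2.0, 3.0], [1.1, 2.1, 9.0]))  # ~2.07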
Huber Loss

Properties:
• Robust to outliers.
• Differentiable everywhere.
• Used in time series forecasting.

The threshold δ controls where the loss switches from quadratic to linear: a smaller δ treats more errors linearly, limiting the influence of noise and outliers, while a larger δ behaves more like MSE.
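A minimal NumPy sketch of the standard Huber loss, quadratic for |error| ≤ δ and linear beyond (names are illustrative):

import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic near zero error, linear for large errors."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    quadratic = 0.5 * err ** 2
    linear = delta * (np.abs(err) - 0.5 * delta)
    return np.mean(np.where(np.abs(err) <= delta, quadratic, linear))

print(huber([1.0, 2.0, 3.0], [1.1, 2.1, 9.0], delta=1.0))  # outlier is penalized only linearly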
Log-Cosh Loss

Properties:
• Smooth and differentiable everywhere.
• Less sensitive to outliers than MSE.
• More sensitive to small errors than the Huber loss.

Use Huber loss when there is a reason to define a specific point where the loss should switch from quadratic to linear, based on the noise characteristics of the data.
Use log-cosh loss when there is no clear reason to manually set a transition threshold as in Huber loss.
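A minimal NumPy sketch of log-cosh loss, L = (1/n) Σᵢ log(cosh(ŷᵢ − yᵢ)); computing it via logaddexp is a common numerical-stability trick, not something from the slides:

import numpy as np

def log_cosh(y_true, y_pred):
    """Log-Cosh loss: ~x^2/2 for small errors, ~|x| - log 2 for large ones."""
    err = np.asarray(y_pred) - np.asarray(y_true)
    # log(cosh(x)) computed stably as logaddexp(x, -x) - log(2)
    return np.mean(np.logaddexp(err, -err) - np.log(2.0))

print(log_cosh([1.0, 2.0, 3.0], [1.1, 2.1, 9.0]))  # behaves like MAE on the outlier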
Quantile Loss

• Used for predicting an interval instead of a single value.
• The loss is scaled by q for underestimations and by (1 − q) for overestimations.
• When q = 0.5, the quantile loss is proportional to the Mean Absolute Error (MAE), making it a generalization of MAE that allows asymmetric penalties for underestimations and overestimations.
• Applications: financial risk management, supply chain and inventory management, energy production, economic forecasting, weather forecasting, real estate pricing, healthcare.
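A minimal NumPy sketch of the quantile (pinball) loss under the definition above (names are illustrative):

import numpy as np

def quantile_loss(y_true, y_pred, q=0.9):
    """Pinball loss: penalizes underestimation by q, overestimation by (1 - q)."""
    err = np.asarray(y_true) - np.asarray(y_pred)  # positive => underestimation
    return np.mean(np.maximum(q * err, (q - 1) * err))

# q = 0.9 punishes underestimation 9x more than overestimation.
print(quantile_loss([10.0], [8.0], q=0.9))   # 1.8 (underestimated)
print(quantile_loss([10.0], [12.0], q=0.9))  # 0.2 (overestimated)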
Poisson Loss

• Used when the target variable represents count data.
• Applications: traffic modelling, healthcare, insurance, customer service, internet usage, manufacturing, crime analysis.
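A common formulation (the one used by Keras, for example) is L = (1/n) Σᵢ (ŷᵢ − yᵢ log ŷᵢ), i.e. the Poisson negative log-likelihood with the constant log(yᵢ!) term dropped. A minimal sketch (names illustrative):

import numpy as np

def poisson_loss(y_true, y_pred, eps=1e-7):
    """Poisson loss for count targets; y_pred must be positive predicted rates."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_pred - y_true * np.log(y_pred + eps))

# Predicted event rates vs. observed counts.
print(poisson_loss([2, 0, 5], [1.8, 0.3, 4.5]))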
Binary Cross Entropy (BCE) and Weighted BCE

• Weighted BCE assigns a higher weight to the minority class, helping to balance the influence of each class on the training process.
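BCE is −(1/n) Σᵢ [yᵢ log pᵢ + (1 − yᵢ) log(1 − pᵢ)]; the weighted variant multiplies the positive-class term by a weight. A minimal sketch (the pos_weight parameter and names are illustrative):

import numpy as np

def weighted_bce(y_true, p_pred, pos_weight=1.0, eps=1e-7):
    """Binary cross entropy; pos_weight > 1 up-weights the (minority) positive class."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(pos_weight * y * np.log(p) + (1 - y) * np.log(1 - p))

print(weighted_bce([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.1]))                # plain BCE
print(weighted_bce([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.1], pos_weight=3))  # positives count 3x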
Categorical Cross Entropy (CCE)

Sparse Categorical Cross Entropy (Sparse CCE)

• Both compute the same loss; sparse CCE takes integer class indices as targets instead of one-hot vectors.
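A minimal NumPy sketch of both variants (assumes each row of the predictions is already a probability distribution; names are illustrative):

import numpy as np

def cce(y_onehot, p_pred, eps=1e-7):
    """Categorical cross entropy with one-hot targets."""
    p = np.clip(p_pred, eps, 1.0)
    return -np.mean(np.sum(np.asarray(y_onehot) * np.log(p), axis=1))

def sparse_cce(y_idx, p_pred, eps=1e-7):
    """Same loss, but targets are integer class indices."""
    p = np.clip(p_pred, eps, 1.0)
    return -np.mean(np.log(p[np.arange(len(y_idx)), y_idx]))

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(cce([[1, 0, 0], [0, 1, 0]], probs))  # equals...
print(sparse_cce([0, 1], probs))           # ...this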
Cross-Entropy Loss with Label Smoothing

• This technique has been shown to improve the generalization of models, particularly in scenarios with many categories or when the dataset contains noisy labels.
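Label smoothing replaces the hard one-hot target with (1 − ε) on the true class plus ε/K spread over all K classes, then applies ordinary cross entropy. A minimal sketch (ε = 0.1 and all names are illustrative):

import numpy as np

def smooth_labels(y_onehot, eps=0.1):
    """Soften one-hot targets: true class keeps most mass, the rest share eps."""
    y = np.asarray(y_onehot, dtype=float)
    k = y.shape[1]  # number of classes
    return y * (1.0 - eps) + eps / k

def cce_label_smoothing(y_onehot, p_pred, eps=0.1, tiny=1e-7):
    """Cross entropy computed against smoothed targets."""
    y_smooth = smooth_labels(y_onehot, eps)
    return -np.mean(np.sum(y_smooth * np.log(np.clip(p_pred, tiny, 1.0)), axis=1))

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(cce_label_smoothing([[1, 0, 0], [0, 1, 0]], probs, eps=0.1))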
Negative Log Likelihood (NLL)
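NLL picks out the log-probability of the true class, so softmax followed by NLL equals cross entropy. A minimal sketch, following the convention (as in PyTorch's nll_loss) that the input is log-probabilities rather than raw scores; names are illustrative:

import numpy as np

def nll(log_probs, y_idx):
    """Negative log likelihood: expects log-probabilities, not raw scores."""
    log_probs = np.asarray(log_probs)
    return -np.mean(log_probs[np.arange(len(y_idx)), y_idx])

log_p = np.log([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(nll(log_p, [0, 1]))  # matches sparse categorical cross entropy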
Poly Loss

• When ϵ = 0, Poly-1 reduces to the standard cross-entropy loss. When ϵ > 0, the loss becomes more sensitive to confident predictions, reducing overfitting in imbalanced datasets or in tasks requiring higher precision.
• Useful when dealing with imbalanced datasets and for simplifying the hyperparameter optimization process.
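Poly-1 (Leng et al., 2022) adds a single polynomial term to cross entropy: L = CE + ϵ(1 − p_t), where p_t is the predicted probability of the true class. A minimal sketch (names are illustrative):

import numpy as np

def poly1_loss(y_idx, p_pred, epsilon=1.0, eps=1e-7):
    """Poly-1 loss: cross entropy plus epsilon * (1 - p_true_class)."""
    p = np.clip(p_pred, eps, 1.0)
    p_t = p[np.arange(len(y_idx)), y_idx]  # probability assigned to the true class
    ce = -np.log(p_t)                      # per-example cross entropy
    return np.mean(ce + epsilon * (1.0 - p_t))

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(poly1_loss([0, 1], probs, epsilon=0.0))  # reduces to cross entropy
print(poly1_loss([0, 1], probs, epsilon=1.0))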
Hinge Loss

• Used for maximum-margin classification tasks (e.g., SVMs).
• The product y · f(x) is the raw margin: it measures how far the predicted value is from the decision boundary, and its sign shows whether the prediction is on the correct side.

Squared Hinge Loss
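With labels y ∈ {−1, +1}, hinge loss is max(0, 1 − y·f(x)), and squared hinge squares that term so margin violations are penalized quadratically. A minimal sketch covering both (names are illustrative):

import numpy as np

def hinge(y_true, scores):
    """Hinge loss for labels in {-1, +1}; scores are raw model outputs f(x)."""
    margin = np.asarray(y_true) * np.asarray(scores)
    return np.mean(np.maximum(0.0, 1.0 - margin))

def squared_hinge(y_true, scores):
    """Squared hinge: smoother at the margin, harsher on large violations."""
    margin = np.asarray(y_true) * np.asarray(scores)
    return np.mean(np.maximum(0.0, 1.0 - margin) ** 2)

y, f = [1, -1, 1], [0.8, -2.0, -0.3]
print(hinge(y, f))          # 0.5 (the last example violates the margin)
print(squared_hinge(y, f))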
