
Loss Functions in Machine Learning
Submitted by:
Sumaira Rasool (Ph.D. Scholar)
Department of Computer Science
University of Peshawar.

Supervised by:
Dr. Muhammad Naeem
Outline

• Introduction to loss functions

• Categories of loss functions


– Regression losses
– Classification losses



Introduction to Loss Functions
• Machines learn by means of a loss function.
• A loss function measures how well a prediction model predicts the expected outcome.
• It is a way of evaluating how well a specific algorithm models the given data.
• If the predictions deviate too much from the actual results, the loss function produces a very large number.
Categories of Loss Functions
• Regression Losses
– Regression deals with predicting a continuous value.
» For example, given the floor area, number of rooms, and room sizes, predict the price of the house.
• Classification Losses
– In classification, we are trying to predict an output from a finite set of categorical values.
» For example, given a large dataset of images of handwritten digits, categorize each image as one of the digits 0–9.



Regression Losses

Loss functions for regression include:

Mean Squared Error (MSE) / Quadratic Loss
Mean Absolute Error (MAE)
Mean Squared Percentage Error (MSPE)
Mean Squared Logarithmic Error (MSLE)



Mean Squared Error (MSE) / Quadratic Loss

• Mean Squared Error (MSE) is the most commonly used regression loss function.
• MSE is the average of the squared differences between the target values and the predicted values.
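In formula form, MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)², where yᵢ is the target and ŷᵢ the prediction. A minimal NumPy sketch of this definition (the example arrays are illustrative):

import numpy as np

def mse(y_true, y_pred):
    # Average of squared differences between targets and predictions
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

print(mse([1000, 600], [900, 700]))  # 10000.0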



Example: Regression Analysis

• A technique concerned with predicting some variables from knowledge of others.
• The process of predicting variable Y using variable X.
• It calculates the "best-fit" line for a given set of data.
• The regression line makes the sum of the squared residuals smaller than for any other line; regression minimizes the residuals.
Linear Equations

Y = bX + a, where b is the slope (the change in Y divided by the change in X) and a is the Y-intercept.
Hours Studying and Grades

Regressing grades on hours (linear regression):
Final grade in course = 59.95 + 3.17 × (hours of study), R-Square = 0.88
[Scatter plot: final grade (70–90) against number of hours spent studying (2–10), with the fitted regression line.]

Predicted final grade in class = 59.95 + 3.17 × (hours of study per week)

Predict the final grade of…

• Someone who studies for 12 hours:
• Final grade = 59.95 + (3.17 × 12)
• Final grade = 97.99

• Someone who studies for 1 hour:
• Final grade = 59.95 + (3.17 × 1)
• Final grade = 63.12
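These predictions can be reproduced with a short sketch (the intercept 59.95 and slope 3.17 come from the fitted line above):

def predicted_grade(hours):
    # Fitted line from the regression slide: grade = 59.95 + 3.17 * hours
    return 59.95 + 3.17 * hours

print(predicted_grade(12))  # ≈ 97.99
print(predicted_grade(1))   # ≈ 63.12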
• Gradient Descent is a general procedure for minimizing a function, in this case the Mean Squared Error cost function.
• However, the squared loss tends to penalize outliers excessively, which can lead to slower convergence.
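A minimal sketch of gradient descent on the MSE cost for the one-variable linear model above (the learning rate, step count, and synthetic data are illustrative choices, not from the slides):

import numpy as np

def fit_line_gd(x, y, lr=0.01, steps=5000):
    # Fit y = b*x + a by gradient descent on MSE = (1/n) * sum((b*x + a - y)^2)
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        err = b * x + a - y
        a -= lr * (2.0 / n) * np.sum(err)      # dMSE/da
        b -= lr * (2.0 / n) * np.sum(err * x)  # dMSE/db
    return a, b

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = 59.95 + 3.17 * x  # synthetic points lying on the slide's fitted line
a, b = fit_line_gd(x, y)
print(round(a, 2), round(b, 2))  # ≈ 59.95 3.17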

Mean Absolute Error (MAE)
– Mean Absolute Error (MAE) is the average of the absolute differences between the target and predicted values.
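In formula form, MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|. A minimal NumPy sketch (the example arrays are illustrative):

import numpy as np

def mae(y_true, y_pred):
    # Average of absolute differences between targets and predictions
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

print(mae([1000, 600], [900, 700]))  # 100.0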



• MAE is more robust to outliers since it does not square the errors, but its derivative is not continuous, which can make finding the solution less efficient.



Mean Squared Percentage Error (MSPE)

• MSPE can be seen as a weighted version of MSE.
• In MSPE the squared difference is divided by the square of the target value, which gives the relative error. This division by the squared target can also be read as adding a weight to each term of the MSE.
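One common way to write this is MSPE = (100%/n) Σᵢ ((yᵢ − ŷᵢ)/yᵢ)². A minimal sketch following that definition (it assumes no target value is zero):

import numpy as np

def mspe(y_true, y_pred):
    # Mean Squared Percentage Error; undefined if any target is zero
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(((y_true - y_pred) / y_true) ** 2)

print(round(mspe([1000, 600], [900, 700]), 2))  # ≈ 1.89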



Mean Squared Logarithmic Error (MSLE)

• MSLE is, as the name suggests, a variation of the Mean Squared Error.
• The loss is the mean, over the observed data, of the squared differences between the log-transformed actual and predicted values. This loss can be interpreted as a measure of the ratio between the actual and predicted values.
• Formula: MSLE = (1/n) Σᵢ (log(1 + yᵢ) − log(1 + ŷᵢ))²
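A minimal NumPy sketch of this formula (the +1 inside the logarithm keeps it defined when a value is zero):

import numpy as np

def msle(y_true, y_pred):
    # Mean of squared differences between log(1 + y) terms
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

print(round(msle([1000], [600]), 4))  # ≈ 0.2603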



MSLE
• The introduction of the logarithm makes MSLE care only about the relative difference between the real and the predicted value; in other words, it cares only about the percentage difference between them. This means that MSLE treats small differences between small true and predicted values roughly the same as big differences between large true and predicted values.
• It can be used when you don't want to penalize huge differences when both values are huge numbers.
• It can also be used when you want to penalize underestimates more than overestimates.
Example

• Case a): Pi = 600, Ai = 1000
RMSE = 400, RMSLE = 0.5108
• Case b): Pi = 1400, Ai = 1000
RMSE = 400, RMSLE = 0.3365
• As is evident, the difference between actual and predicted is the same in both cases. RMSE treats them equally, whereas RMSLE penalizes the underestimate more than the overestimate.
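These numbers can be reproduced with a short check; for a single point the RMSLE used here reduces to |log(Ai/Pi)| (i.e., without the +1 term, which matches the values on the slide):

import numpy as np

def rmsle_single(p, a):
    # Single-point RMSLE without the +1 term, as in the slide's example
    return abs(np.log(a / p))

print(round(rmsle_single(600.0, 1000.0), 4))   # 0.5108
print(round(rmsle_single(1400.0, 1000.0), 4))  # 0.3365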



Classification Losses
• Loss functions for classification include:
– Log loss/Binary Cross Entropy Loss
– Negative Log Likelihood
– Hinge Loss



Log Loss / Binary Cross Entropy
• The log loss score is a kind of penalty for classification: for a very bad prediction, log loss penalizes heavily (expect a higher score).
• Minimizing log loss generally goes hand in hand with maximizing accuracy.
• Log loss returns high values for bad predictions and low values for good predictions.



Log Loss / Binary Cross Entropy
• The goal of our machine learning models is to minimize this value.
• A perfect model would have a log loss of 0.
• Formula: Log Loss = −(1/n) Σᵢ [yᵢ log(pᵢ) + (1 − yᵢ) log(1 − pᵢ)], where yᵢ ∈ {0, 1} is the true label and pᵢ is the predicted probability of class 1.
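A minimal NumPy sketch of this formula (the clipping constant is a common safeguard against log(0), not something from the slides):

import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    # Binary cross entropy; clip probabilities to avoid log(0)
    y = np.asarray(y_true, float)
    p = np.clip(np.asarray(p_pred, float), eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

print(round(log_loss([1, 0, 1], [0.9, 0.1, 0.8]), 4))  # ≈ 0.1446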



Negative Log-Likelihood (NLL)

• This is a widely used loss function in neural networks.
• It measures the accuracy of a classifier.
• It is used when the model outputs a probability for each class rather than just the most likely class.
• In practice, the softmax function is used in tandem with the negative log-likelihood (NLL). This loss function is very interesting when interpreted in relation to the behavior of softmax.
• Formula: L(y) = −log(y), where y is the predicted probability assigned to the correct class.
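A minimal sketch of softmax followed by NLL (the logits and class index are illustrative):

import numpy as np

def softmax(z):
    # Numerically stable softmax: subtract the max before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

def nll(logits, true_class):
    # Negative log of the probability assigned to the correct class
    probs = softmax(np.asarray(logits, float))
    return -np.log(probs[true_class])

print(round(nll([2.0, 1.0, 0.1], 0), 4))  # ≈ 0.4170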



Negative Log-Likelihood (NLL)

• When training a model, we try to find the minimum of a loss function given a set of parameters (in a neural network, these are the weights and biases). We can interpret the loss as the "unhappiness" of the network with respect to its parameters: the higher the loss, the higher the unhappiness. We don't want that; we actually want to make our model happy.
• So if we are using the negative log-likelihood as our loss function, the question is when it becomes unhappy and when it becomes happy.



The loss −log(y) approaches infinity as the input approaches 0, and it reaches 0 when the input is 1.





Hinge Loss
• Hinge loss is used for maximum-margin classification, most notably for support vector machines.
• The margin is the separation between the decision line and the closest points of each class.
• A good margin is one where this separation is large for both classes. The images below give a visual example of good and bad margins. A good margin allows the points to be in their respective classes without crossing into the other class.



Hinge Loss
• Formula: SVM Loss = max(0, 1 − y·f(x)), where y ∈ {−1, +1} is the true label and f(x) is the classifier's score.
• Although not differentiable everywhere, it is a convex function, which makes it easy to work with the usual convex optimizers used in the machine learning domain.
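A minimal NumPy sketch of this formula (the labels and scores are illustrative):

import numpy as np

def hinge_loss(y_true, scores):
    # y_true in {-1, +1}; scores are the raw outputs f(x)
    y = np.asarray(y_true, float)
    s = np.asarray(scores, float)
    return np.mean(np.maximum(0.0, 1.0 - y * s))

# Correct beyond the margin (loss 0), correct but inside the margin (0.5),
# and on the wrong side (1.3):
print(hinge_loss([1, 1, -1], [2.0, 0.5, 0.3]))  # 0.6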





• Correctly classified points lying outside the margin boundaries of the support vectors are not penalized, whereas points within the margin boundaries or on the wrong side of the hyperplane are penalized linearly in proportion to their distance from the correct boundary.



Optimal Hyperplane
[Figure: the optimal separating hyperplane, which maximizes the margin between the two classes.]


Loss Functions in Keras
1) mean_squared_error
keras.losses.mean_squared_error(y_true, y_pred)
2) mean_absolute_error
keras.losses.mean_absolute_error(y_true, y_pred)
3) mean_absolute_percentage_error
keras.losses.mean_absolute_percentage_error(y_true, y_pred)
4) mean_squared_logarithmic_error
keras.losses.mean_squared_logarithmic_error(y_true, y_pred)
5) hinge
keras.losses.hinge(y_true, y_pred)
6) categorical_crossentropy
keras.losses.categorical_crossentropy(y_true, y_pred)
7) binary_crossentropy
keras.losses.binary_crossentropy(y_true, y_pred)
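As a usage sketch, a loss from this list is passed to compile(); the model itself here is a hypothetical two-layer binary classifier, shown only for illustration:

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
# Any of the listed losses can be supplied as the loss argument
model.compile(optimizer="adam", loss=keras.losses.binary_crossentropy, metrics=["accuracy"])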


