
Loss Functions in Machine Learning
Submitted by:
Sumaira Rasool (Ph.D. Scholar)
Department of Computer Science
University of Peshawar.

Supervised by:
Dr. Muhammad Naeem
Outline

• Introduction to loss functions

• Categories of loss functions


– Regression losses
– Classification losses



Introduction to Loss Functions
• Machines learn by means of a loss function.
• A loss function measures how well a prediction model predicts the expected outcome.
• It is a way of evaluating how well a specific algorithm models the given data.
• If the predictions deviate too much from the actual results, the loss function produces a very large number.
Categories of Loss Functions
• Regression Losses
– Regression deals with predicting a continuous value.
» For example, given the floor area, number of rooms, and room sizes, predict the price of the house.
• Classification Losses
– In classification, we are trying to predict an output from a finite set of categorical values.
» For example, given a large dataset of images of handwritten digits, categorize each image as one of the digits 0–9.



Regression Losses

Loss functions for regression include:

Mean Squared Error (MSE) / Quadratic Loss
Mean Absolute Error (MAE)
Mean Squared Percentage Error (MSPE)
Mean Squared Logarithmic Error (MSLE)



Mean Squared Error (MSE) / Quadratic Loss

• Mean Squared Error (MSE) is the most commonly used regression loss function.
• MSE is the average of the squared differences between the target values and the predicted values.
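In formula form, MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)², where yᵢ is the target and ŷᵢ the prediction. A minimal NumPy sketch of this definition (the example arrays are illustrative):

import numpy as np

def mse(y_true, y_pred):
    # Average of squared differences between targets and predictions
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

print(mse([1000, 600], [900, 700]))  # 10000.0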



Example: Regression Analysis

• A technique concerned with predicting some variables from knowledge of others.
• The process of predicting variable Y using variable X.
• It calculates the "best-fit" line for a given set of data.
• The regression line makes the sum of the squared residuals smaller than for any other line; regression minimizes the residuals.
Linear Equations

Y = bX + a, where b is the slope (the change in Y divided by the change in X) and a is the Y-intercept.
Hours Studying and Grades

Regressing grades on hours (linear regression):
Final grade in course = 59.95 + 3.17 × (hours of study), R-Square = 0.88
[Scatter plot: final grade (70–90) against number of hours spent studying (2–10), with the fitted regression line.]

Predicted final grade in class = 59.95 + 3.17 × (hours of study per week)

Predict the final grade of…

• Someone who studies for 12 hours:
• Final grade = 59.95 + (3.17 × 12)
• Final grade = 97.99

• Someone who studies for 1 hour:
• Final grade = 59.95 + (3.17 × 1)
• Final grade = 63.12
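These predictions can be reproduced with a short sketch (the intercept 59.95 and slope 3.17 come from the fitted line above):

def predicted_grade(hours):
    # Fitted line from the regression slide: grade = 59.95 + 3.17 * hours
    return 59.95 + 3.17 * hours

print(predicted_grade(12))  # ≈ 97.99
print(predicted_grade(1))   # ≈ 63.12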
• Gradient Descent is a general procedure for minimizing a function, in this case the Mean Squared Error cost function.
• However, the squared loss tends to penalize outliers excessively, which can lead to slower convergence.
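A minimal sketch of gradient descent on the MSE cost for the one-variable linear model above (the learning rate, step count, and synthetic data are illustrative choices, not from the slides):

import numpy as np

def fit_line_gd(x, y, lr=0.01, steps=5000):
    # Fit y = b*x + a by gradient descent on MSE = (1/n) * sum((b*x + a - y)^2)
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        err = b * x + a - y
        a -= lr * (2.0 / n) * np.sum(err)      # dMSE/da
        b -= lr * (2.0 / n) * np.sum(err * x)  # dMSE/db
    return a, b

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = 59.95 + 3.17 * x  # synthetic points lying on the slide's fitted line
a, b = fit_line_gd(x, y)
print(round(a, 2), round(b, 2))  # ≈ 59.95 3.17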

Mean Absolute Error (MAE)
– Mean Absolute Error (MAE) is the average of the absolute differences between the target and predicted values.
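In formula form, MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|. A minimal NumPy sketch (the example arrays are illustrative):

import numpy as np

def mae(y_true, y_pred):
    # Average of absolute differences between targets and predictions
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

print(mae([1000, 600], [900, 700]))  # 100.0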



• MAE is more robust to outliers since it does not square the errors, but its derivative is not continuous, which can make finding the solution less efficient.



Mean Squared Percentage Error (MSPE)

• MSPE can be seen as a weighted version of MSE.
• In MSPE the squared difference is divided by the square of the target value, which gives the relative error. This division by the squared target can also be read as adding a weight to each term of the MSE.
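One common way to write this is MSPE = (100%/n) Σᵢ ((yᵢ − ŷᵢ)/yᵢ)². A minimal sketch following that definition (it assumes no target value is zero):

import numpy as np

def mspe(y_true, y_pred):
    # Mean Squared Percentage Error; undefined if any target is zero
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(((y_true - y_pred) / y_true) ** 2)

print(round(mspe([1000, 600], [900, 700]), 2))  # ≈ 1.89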



Mean Squared Logarithmic Error (MSLE)

• MSLE is, as the name suggests, a variation of the Mean Squared Error.
• The loss is the mean, over the observed data, of the squared differences between the log-transformed actual and predicted values. This loss can be interpreted as a measure of the ratio between the actual and predicted values.
• Formula: MSLE = (1/n) Σᵢ (log(1 + yᵢ) − log(1 + ŷᵢ))²
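A minimal NumPy sketch of this formula (the +1 inside the logarithm keeps it defined when a value is zero):

import numpy as np

def msle(y_true, y_pred):
    # Mean of squared differences between log(1 + y) terms
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

print(round(msle([1000], [600]), 4))  # ≈ 0.2603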



MSLE
• The introduction of the logarithm makes MSLE care only about the relative difference between the real and the predicted value; in other words, it cares only about the percentage difference between them. This means that MSLE treats small differences between small true and predicted values roughly the same as big differences between large true and predicted values.
• It can be used when you don't want to penalize huge differences when both values are huge numbers.
• It can also be used when you want to penalize underestimates more than overestimates.
Example

• Case a): Pi = 600, Ai = 1000
RMSE = 400, RMSLE = 0.5108
• Case b): Pi = 1400, Ai = 1000
RMSE = 400, RMSLE = 0.3365
• As is evident, the difference between actual and predicted is the same in both cases. RMSE treats them equally, whereas RMSLE penalizes the underestimate more than the overestimate.
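These numbers can be reproduced with a short check; for a single point the RMSLE used here reduces to |log(Ai/Pi)| (i.e., without the +1 term, which matches the values on the slide):

import numpy as np

def rmsle_single(p, a):
    # Single-point RMSLE without the +1 term, as in the slide's example
    return abs(np.log(a / p))

print(round(rmsle_single(600.0, 1000.0), 4))   # 0.5108
print(round(rmsle_single(1400.0, 1000.0), 4))  # 0.3365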



Classification Losses
• Loss functions for classification include:
– Log loss/Binary Cross Entropy Loss
– Negative Log Likelihood
– Hinge Loss



Log Loss / Binary Cross Entropy
• The log loss score is a kind of penalty for classification: for a very bad prediction, log loss penalizes heavily (expect a higher score).
• Minimizing log loss generally goes hand in hand with maximizing accuracy.
• Log loss returns high values for bad predictions and low values for good predictions.



Log Loss / Binary Cross Entropy
• The goal of our machine learning models is to minimize this value.
• A perfect model would have a log loss of 0.
• Formula: Log Loss = −(1/n) Σᵢ [yᵢ log(pᵢ) + (1 − yᵢ) log(1 − pᵢ)], where yᵢ ∈ {0, 1} is the true label and pᵢ is the predicted probability of class 1.
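A minimal NumPy sketch of this formula (the clipping constant is a common safeguard against log(0), not something from the slides):

import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    # Binary cross entropy; clip probabilities to avoid log(0)
    y = np.asarray(y_true, float)
    p = np.clip(np.asarray(p_pred, float), eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

print(round(log_loss([1, 0, 1], [0.9, 0.1, 0.8]), 4))  # ≈ 0.1446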



Negative Log-Likelihood (NLL)

• This is a widely used loss function in neural networks.
• It measures the accuracy of a classifier.
• It is used when the model outputs a probability for each class rather than just the most likely class.
• In practice, the softmax function is used in tandem with the negative log-likelihood (NLL). This loss function is very interesting when interpreted in relation to the behavior of softmax.
• Formula: L(y) = −log(y), where y is the predicted probability assigned to the correct class.
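A minimal sketch of softmax followed by NLL (the logits and class index are illustrative):

import numpy as np

def softmax(z):
    # Numerically stable softmax: subtract the max before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

def nll(logits, true_class):
    # Negative log of the probability assigned to the correct class
    probs = softmax(np.asarray(logits, float))
    return -np.log(probs[true_class])

print(round(nll([2.0, 1.0, 0.1], 0), 4))  # ≈ 0.4170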



Negative Log-Likelihood (NLL)

• When training a model, we try to find the minimum of a loss function given a set of parameters (in a neural network, these are the weights and biases). We can interpret the loss as the "unhappiness" of the network with respect to its parameters: the higher the loss, the higher the unhappiness. We don't want that; we actually want to make our model happy.
• So if we are using the negative log-likelihood as our loss function, the question is when it becomes unhappy and when it becomes happy.



The loss −log(y) approaches infinity as the input approaches 0, and it reaches 0 when the input is 1.





Hinge Loss
• Hinge loss is used for maximum-margin classification, most notably for support vector machines.
• The margin is the separation between the decision line and the closest points of each class.
• A good margin is one where this separation is large for both classes. The images below give a visual example of good and bad margins. A good margin allows the points to be in their respective classes without crossing into the other class.



Hinge Loss
• Formula: SVM Loss = max(0, 1 − y·f(x)), where y ∈ {−1, +1} is the true label and f(x) is the classifier's score.
• Although not differentiable everywhere, it is a convex function, which makes it easy to work with the usual convex optimizers used in the machine learning domain.
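A minimal NumPy sketch of this formula (the labels and scores are illustrative):

import numpy as np

def hinge_loss(y_true, scores):
    # y_true in {-1, +1}; scores are the raw outputs f(x)
    y = np.asarray(y_true, float)
    s = np.asarray(scores, float)
    return np.mean(np.maximum(0.0, 1.0 - y * s))

# Correct beyond the margin (loss 0), correct but inside the margin (0.5),
# and on the wrong side (1.3):
print(hinge_loss([1, 1, -1], [2.0, 0.5, 0.3]))  # 0.6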





• Correctly classified points lying outside the margin boundaries of the support vectors are not penalized, whereas points within the margin boundaries or on the wrong side of the hyperplane are penalized linearly in proportion to their distance from the correct boundary.



Optimal Hyperplane
[Figure: the optimal separating hyperplane, which maximizes the margin between the two classes.]


Loss Functions in Keras
1) mean_squared_error
keras.losses.mean_squared_error(y_true, y_pred)
2) mean_absolute_error
keras.losses.mean_absolute_error(y_true, y_pred)
3) mean_absolute_percentage_error
keras.losses.mean_absolute_percentage_error(y_true, y_pred)
4) mean_squared_logarithmic_error
keras.losses.mean_squared_logarithmic_error(y_true, y_pred)
5) hinge
keras.losses.hinge(y_true, y_pred)
6) categorical_crossentropy
keras.losses.categorical_crossentropy(y_true, y_pred)
7) binary_crossentropy
keras.losses.binary_crossentropy(y_true, y_pred)
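As a usage sketch, a loss from this list is passed to compile(); the model itself here is a hypothetical two-layer binary classifier, shown only for illustration:

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
# Any of the listed losses can be supplied as the loss argument
model.compile(optimizer="adam", loss=keras.losses.binary_crossentropy, metrics=["accuracy"])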


