
Loss Functions

• Neural networks use optimization strategies such as stochastic gradient descent to minimize the error of the model. This error is computed with a loss function, which quantifies how well or how poorly the model is performing.
• Loss functions fall into two major categories depending on the type of learning task: regression losses and classification losses.
• In classification, we predict an output from a finite set of categorical values. For example, given a large dataset of images of handwritten digits, we categorize each image as one of the digits 0–9.
• Regression, on the other hand, deals with predicting a continuous value. For example, given the floor area, the number of rooms, and the size of the rooms, we predict the price of the house.
NOTE
n - Number of training examples.
i - ith training example in a data set.
y(i) - Ground truth label for ith training example.
y_hat(i) - Prediction for ith training example.

Regression Losses
1. Mean Square Error/Quadratic Loss/L2 Loss
Mathematical formulation:

MSE = (1/n) * Σ_{i=1..n} (y(i) - y_hat(i))^2

As the name suggests, mean squared error is measured as the average of the squared differences between predictions and actual observations.

It is only concerned with the average magnitude of the error, irrespective of its direction. However, due to the squaring, predictions that are far away from the actual values are penalized much more heavily than less deviated predictions. The gradient of MSE is also easy to compute.

# calculate mean squared error
def mean_squared_error(actual, predicted):
    sum_square_error = 0.0
    for i in range(len(actual)):
        # accumulate the squared difference for each example
        sum_square_error += (actual[i] - predicted[i])**2.0
    # average over the number of examples
    mean_square_error = 1.0 / len(actual) * sum_square_error
    return mean_square_error

from sklearn.metrics import mean_squared_error


>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> mean_squared_error(y_true, y_pred)
0.375
2. Mean Absolute Error/L1 Loss

Mean absolute error, on the other hand, is measured as the average of the absolute differences between predictions and actual observations. Like MSE, it measures the magnitude of the error without considering its direction.
MAE is more robust to outliers since it does not square the errors.
Mathematical formulation:

MAE = (1/n) * Σ_{i=1..n} |y(i) - y_hat(i)|
from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> mean_absolute_error(y_true, y_pred)
0.5
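For comparison, a minimal hand-rolled version of MAE (mirroring the mean_squared_error sketch above; the function name is just illustrative) gives the same result on the values above:

# calculate mean absolute error by hand
def mean_absolute_error_manual(actual, predicted):
    sum_abs_error = 0.0
    for i in range(len(actual)):
        # accumulate the absolute difference for each example
        sum_abs_error += abs(actual[i] - predicted[i])
    # average over the number of examples
    return sum_abs_error / len(actual)

>>> mean_absolute_error_manual(y_true, y_pred)
0.5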
MAE loss is useful if the training data is corrupted with outliers (i.e. we erroneously receive
unrealistically huge negative/positive values in our training environment, but not our testing
environment).
Deciding which loss function to use

If the outliers represent anomalies that are important for the business and should be detected, then we should use MSE. On the other hand, if we believe that the outliers just represent corrupted data, then we should choose MAE as the loss.

• L1 loss is more robust to outliers, but its derivative is not continuous (it is undefined at zero), which makes finding the solution less efficient. L2 loss is sensitive to outliers, but gives a more stable and closed-form solution.

3. Huber Loss:
• Mean squared error (MSE) is great at learning the outliers in a dataset, while mean absolute error (MAE) is good at ignoring them.

• In some cases, however, points that look like outliers should neither be ignored nor be given too much weight. This is where Huber loss comes in.
Huber loss = combination of MSE and MAE

• Huber loss behaves quadratically (like MSE) when the error is small and linearly (like MAE) when the error is large. The threshold delta (δ) is a hyperparameter that defines where the loss switches from MSE-like to MAE-like behaviour, and it is usually tuned iteratively to find a suitable value (a sketch is given below).

Classification Losses

1. Cross Entropy Loss

• Cross-entropy loss is often simply referred to as “cross-entropy,” “logarithmic loss,” “logistic loss,” or “log loss” for short.

• It operates on predicted probabilities between 0 and 1 for a classification task. Cross-entropy measures the average difference between the predicted probabilities and the actual class probabilities.

• Each predicted probability is compared to the actual class output value (0 or 1), and a score is calculated that penalizes the probability based on its distance from the expected value. The penalty is logarithmic: small differences (0.1 or 0.2) incur a small score, while large differences (0.9 or 1.0) incur an enormous score.

• This is the most common choice for classification problems. Cross-entropy loss increases as the predicted probability diverges from the actual label.

Consider a 4-class classification task where an image is classified as either a dog, cat, horse, or cheetah. Softmax first converts the raw logits into probabilities: each logit z is mapped to e^z (with e ≈ 2.718) and then normalized so the probabilities sum to 1. The purpose of cross-entropy is then to take these output probabilities (P) and measure their distance from the true labels.

Cross-entropy for a single example is defined as

CE = - Σ_{c=1..C} y_c * log(p_c)

where C is the number of classes, y_c is 1 for the true class and 0 otherwise, and p_c is the predicted probability for class c.

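To make the dog/cat/horse/cheetah example concrete, here is a small sketch that applies softmax to a hypothetical logit vector and then computes the cross-entropy against a one-hot label (the logit values are made up purely for illustration):

from math import exp, log

def softmax(logits):
    # exponentiate each logit and normalize so the outputs sum to 1
    exps = [exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot_label, probs):
    # -sum over classes of y_c * log(p_c); only the true class contributes
    return -sum(y * log(p) for y, p in zip(one_hot_label, probs))

>>> logits = [3.2, 1.3, 0.2, 0.8]   # hypothetical scores for [dog, cat, horse, cheetah]
>>> label = [1, 0, 0, 0]            # the image is actually a dog
>>> probs = softmax(logits)
>>> cross_entropy(label, probs)
0.254...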
Binary Cross-Entropy Loss


For binary classification, we have binary cross-entropy defined as

BCE = -(1/n) * Σ_{i=1..n} [ y(i) * log(y_hat(i)) + (1 - y(i)) * log(1 - y_hat(i)) ]

Or, for a single example, it can be written as

loss = -log(y_hat(i))       if y(i) = 1
loss = -log(1 - y_hat(i))   if y(i) = 0
from sklearn.metrics import log_loss


>>> log_loss(["spam", "ham", "ham", "spam"],
... [[.1, .9], [.9, .1], [.8, .2], [.35, .65]])
0.21616...

from math import log

# calculate binary cross entropy
def binary_cross_entropy(actual, predicted):
    sum_score = 0.0
    for i in range(len(actual)):
        # add a tiny epsilon inside log() to avoid log(0)
        sum_score += actual[i] * log(1e-15 + predicted[i]) \
                     + (1 - actual[i]) * log(1e-15 + 1 - predicted[i])
    # average and negate: cross-entropy is the negative mean log-likelihood
    mean_sum_score = 1.0 / len(actual) * sum_score
    return -mean_sum_score
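Using the same labels and predicted probabilities as the log_loss example above (with "spam" treated as the positive class 1), the hand-rolled version gives the same result:

>>> y_true = [1, 0, 0, 1]
>>> y_prob = [0.9, 0.1, 0.2, 0.65]
>>> binary_cross_entropy(y_true, y_prob)
0.21616...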
