Loss Functions
We will discuss some of the loss functions that are widely used in neural networks.
Remember: the objective is to minimize the loss between the predictions and the actual outputs.
Mean Squared Error (L2 Loss)
Calculate the squared error (the squared difference between the actual output and the predicted output) for each sample, sum them up, and take their average.
MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
Example (n = 3, squared errors 144, 4 and 9):

MSE = \frac{1}{3} \sum_{i=1}^{3} (Y_i - \hat{Y}_i)^2 = \frac{1}{3}(144 + 4 + 9) = \frac{157}{3} = 52.3 → BIG! We want to MINIMIZE this.
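As a quick check of this example, the same value can be computed with PyTorch's nn.MSELoss; the targets and predictions below are made-up values chosen so that the squared errors come out to 144, 4 and 9:

```python
import torch
import torch.nn as nn

# Hypothetical targets and predictions whose differences are 12, 2 and 3,
# so the squared errors are 144, 4 and 9 (as in the example above).
targets = torch.tensor([20.0, 5.0, 10.0])
predictions = torch.tensor([8.0, 3.0, 7.0])

mse = nn.MSELoss()               # default reduction='mean' averages over samples
loss = mse(predictions, targets)
print(loss.item())               # ≈ 52.33 = (144 + 4 + 9) / 3
```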
Note on the side:
If there is more than one output neuron, we add up the error of each output neuron for a given training sample and take their average; then we average over all samples.
MSE = \frac{1}{n} \sum_{s=1}^{n} \frac{1}{j} \sum_{i=1}^{j} (Y_{s,i} - \hat{Y}_{s,i})^2

where j is the number of output neurons and n is the number of samples.
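A minimal sketch of this double average (the batch values below are made up, assuming n = 3 samples with j = 2 output neurons each):

```python
import torch

# Hypothetical batch: n = 3 samples, j = 2 output neurons each.
targets = torch.tensor([[1.0, 0.0],
                        [0.5, 2.0],
                        [3.0, 1.0]])
predictions = torch.tensor([[0.8, 0.2],
                            [0.0, 2.5],
                            [2.0, 1.5]])

# Average the squared error over the j output neurons of each sample...
per_sample = ((targets - predictions) ** 2).mean(dim=1)
# ...then average over the n samples.
mse = per_sample.mean()
print(mse.item())
```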
Negative of the Logarithmic Function
Binary Cross Entropy
BCE = -\frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{c} \left[ y_{ji} \log p_{ji} + (1 - y_{ji}) \log(1 - p_{ji}) \right]

where n is the number of samples and c is the number of outputs per sample.
If the label is 1, the loss for that output is -\log p_i.
For a two-class output: Class 1: p = 0.6, Class 2: p = 1 - 0.6 = 0.4.
We do this procedure for all n samples and then take the average.
Let’s see what’s happening
Consider a problem with two classes [1 or 0]:
BCE Loss = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]
Consider one sample (n = 1):
If the label y is 1 and the prediction p is 0.1: -y \log p = -\log 0.1 → loss is high.
If the label y is 1 and the prediction p is 0.9: -y \log p = -\log 0.9 → loss is low.
If the label y is 0 and the prediction p is 0.9: -(1 - y) \log(1 - p) = -\log(1 - 0.9) = -\log 0.1 → loss is high.
If the label y is 0 and the prediction p is 0.1: -(1 - y) \log(1 - p) = -\log(1 - 0.1) = -\log 0.9 → loss is low.
In every case, this is the quantity we want to minimize.
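A quick numeric check of these four cases (plain Python, natural logarithm):

```python
import math

# -log(p) for the "label is 1" cases, -log(1 - p) for the "label is 0" cases
print(-math.log(0.1))        # ≈ 2.303  -> loss is high
print(-math.log(0.9))        # ≈ 0.105  -> loss is low
print(-math.log(1 - 0.9))    # ≈ 2.303  -> loss is high
print(-math.log(1 - 0.1))    # ≈ 0.105  -> loss is low
```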
Ideal case:
When the label is 1 and the prediction is 1: -\log(1) = 0
When the label is 0 and the prediction is 0: -\log(1 - 0) = 0
PyTorch example on using BCE Loss
nn.BCELoss expects:
input – Tensor of arbitrary shape
target – Tensor of the same shape as input
torch.randn returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1 (also called the standard normal distribution); torch.rand returns a tensor filled with random numbers from a uniform distribution on the interval [0, 1).
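The slide's code screenshot is not reproduced here, but a minimal sketch following the standard nn.BCELoss usage (with the torch.randn / torch.rand tensors described above) looks like this:

```python
import torch
import torch.nn as nn

m = nn.Sigmoid()                              # squashes raw scores into (0, 1)
loss = nn.BCELoss()

input = torch.randn(3, requires_grad=True)    # scores from a standard normal distribution
target = torch.rand(3)                        # "labels" drawn uniformly from [0, 1)

output = loss(m(input), target)
output.backward()                             # gradients flow back to `input`
print(output.item())
```

The sigmoid is applied explicitly because nn.BCELoss expects probabilities in (0, 1), not raw scores.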
Multi-label Classification
Each output neuron passes through a sigmoid, and the BCE loss compares each sigmoid output against its target label:

Sigmoid Output   Label
0.6              1
0.9              0
0.2              0
0.8              1
Consider a single sample (n = 1). Summing the BCE terms over the c = 4 outputs:

BCE Loss = -\left[ \log 0.6 + \log(1 - 0.9) + \log(1 - 0.2) + \log 0.8 \right]
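A sketch that reproduces this number with nn.BCELoss; reduction='sum' is used so that PyTorch sums over the four outputs exactly as in the formula above (the default 'mean' would divide by 4):

```python
import torch
import torch.nn as nn

sigmoid_outputs = torch.tensor([0.6, 0.9, 0.2, 0.8])
labels = torch.tensor([1.0, 0.0, 0.0, 1.0])

loss = nn.BCELoss(reduction='sum')
print(loss(sigmoid_outputs, labels).item())
# -[log(0.6) + log(1 - 0.9) + log(1 - 0.2) + log(0.8)] ≈ 3.26
```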
Example of Multi-Label Classification
Predicted Attributes:
Beach
Dog
Brown
Sitting
Laying
People
Walking
Cross Entropy
CE = -\frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{c} y_i \log \hat{y}_i

where i = class number, c = number of classes, y_i = actual label, \hat{y}_i = predicted label (probability).
With one-hot labels, each class corresponds to one vector:
Class 1: [1 0 0 0]
Class 2: [0 1 0 0]
Class 3: [0 0 1 0]
Class 4: [0 0 0 1]
If the label is 0 (wrong class): no loss contribution for that term.
If the label is 1 (correct class): the loss is calculated.
The loss only penalizes the predicted probability of the correct class.
For example, suppose you have 4 different classes to classify. For a single training example:
The ground truth (actual) labels are: [1 0 0 0]
The predicted labels (after softmax) are: [0.1 0.4 0.2 0.3]  (wrong: the first probability should be close to 1)

Cross Entropy Loss = -\left[ 1 \times \log(0.1) + 0 + 0 + 0 \right] = -\log 0.1 = 2.303 → loss is high
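A minimal check of this value, using the one-hot labels and softmax probabilities from the example above:

```python
import torch

y = torch.tensor([1.0, 0.0, 0.0, 0.0])   # one-hot ground truth
p = torch.tensor([0.1, 0.4, 0.2, 0.3])   # softmax probabilities

ce = -(y * torch.log(p)).sum()
print(ce.item())                          # ≈ 2.303, i.e. -log(0.1)
```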
Sometimes, the cross entropy loss is averaged over the n training samples:
n = mini-batch size if using mini-batch training
n = number of training samples if not using mini-batch training

J = -\frac{1}{n} \sum_{i=1}^{n} y_i \log \hat{y}_i

In the case of a one-hot vector, each sample has only one correct class and all other classes are 0, so the summation over the c classes is eliminated.
For example, suppose you have 4 different classes to classify. For a single training example:
The ground truth (actual) labels are: [1 0 0 0]
The predicted labels are: [0.9 0.01 0.05 0.04]  (correct: the first probability is almost 1)

Cross Entropy Loss = -\left[ 1 \times \log(0.9) + 0 + 0 + 0 \right] = -\log 0.9 = 0.105 → loss is low
We ignore the loss terms for the 0 labels: the loss doesn't depend on the predicted probabilities of the incorrect classes!
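For completeness, here is a sketch of how this is typically computed in PyTorch. Note that nn.CrossEntropyLoss expects raw scores (logits) and a class index rather than one-hot labels, and applies log-softmax internally, so you do not pass softmax probabilities to it; the logit values below are made up.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Raw scores (logits) for one sample over 4 classes; made-up values.
logits = torch.tensor([[2.0, 0.5, 0.3, 0.1]])
target = torch.tensor([0])        # index of the correct class (Class 1 above)

loss = criterion(logits, target)  # equals -log(softmax(logits)[0, 0])
print(loss.item())
```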