
Loss Functions in Neural Networks

We will discuss some of the loss functions that are widely used in Neural Networks.

Remember:
The objective is to minimize the loss between the predictions and the actual outputs.
Mean Squared Error (L2 Loss)
Calculate the squared error (the squared difference between the actual output and the predicted output) for each sample, then sum them up and take the average.
$$MSE = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \frac{1}{n}\sum_{i=1}^{n}(\hat{Y}_i - Y_i)^2$$

$\hat{Y}_i$: Predicted Output
$Y_i$: Actual Output
$n$: Training samples in each minibatch (if not using minibatch training, then $n$ = number of training samples)

- Quadratic/Convex
- One Global Minimum to find
- Getting stuck at a local minimum is eliminated
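A minimal PyTorch sketch of this formula (the tensor values are made-up illustrations, not taken from the slides):

```python
import torch
import torch.nn as nn

# Made-up predictions and actual outputs for a minibatch of n = 4 samples
y_pred = torch.tensor([2.5, 0.0, 2.0, 8.0])
y_true = torch.tensor([3.0, -0.5, 2.0, 7.0])

# MSE written out directly: mean of squared differences
mse_manual = ((y_pred - y_true) ** 2).mean()

# The built-in loss gives the same number
mse_builtin = nn.MSELoss()(y_pred, y_true)

print(mse_manual.item(), mse_builtin.item())  # both 0.375
```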



Variations of Mean Squared Error
• Half of the Mean Squared Error:

$$\frac{1}{2n}\sum_{i=1}^{n}(\hat{Y}_i - Y_i)^2$$

• Root Mean Squared Error:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{Y}_i - Y_i)^2}$$
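The two variations, sketched on the same made-up tensors (only the scaling and the square root differ from plain MSE):

```python
import torch

y_pred = torch.tensor([2.5, 0.0, 2.0, 8.0])
y_true = torch.tensor([3.0, -0.5, 2.0, 7.0])

mse = ((y_pred - y_true) ** 2).mean()   # plain MSE = 0.375
half_mse = 0.5 * mse                    # half of the MSE = 0.1875
rmse = torch.sqrt(mse)                  # root MSE ≈ 0.612

print(mse.item(), half_mse.item(), rmse.item())
```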



Example of MSE
Sample Predicted Actual Error Squared Error
1 48 60 -12 144
2 51 53 -2 4
3 57 60 -3 9

$$\frac{1}{3}\sum_{i=1}^{3}(\hat{Y}_i - Y_i)^2 = \frac{144 + 4 + 9}{3} = \frac{157}{3} \approx 52.3$$

BIG!!! This is what we want to MINIMIZE.
Note on the side:
If there is more than one output neuron, you would add the squared error for each output neuron in each training sample, take the average over the output neurons, and then take the average over all samples:

$$MSE = \frac{1}{n}\sum_{s=1}^{n}\frac{1}{j}\sum_{i=1}^{j}(Y_i - \hat{Y}_i)^2$$

$j$: number of output neurons
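The table above, checked in plain Python:

```python
predicted = [48, 51, 57]
actual = [60, 53, 60]

squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
mse = sum(squared_errors) / len(squared_errors)

print(squared_errors)   # [144, 4, 9]
print(round(mse, 1))    # 52.3
```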
Negative of the Logarithmic Function

Binary Cross Entropy

• Usually used when the output labels have values of 0 or 1.


• It can also be used when the output labels have values between 0 and 1.
• It is also widely used when we have only two classes (0 or 1), for example yes or no.

$$-\frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{c}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]$$

$y$: actual class label (0 or 1)
$p$: predicted probability for the class
$c$: number of classes
$n$: number of samples
When we only have two classes (binary classification):

$$-\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]$$
For each sample:
If the label is 1: $-\log(p_i)$
If the label is 0: $-\log(1 - p_i)$

Example: a sigmoid output of 0.6 means
Class 1 $p$: 0.6
Class 2 $p$: 1 - 0.6 = 0.4

We do this procedure for all n samples and then take the average.
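A small sketch of this per-sample rule, assuming the sigmoid output of 0.6 from the slide:

```python
import math

p = 0.6                                # sigmoid output = probability of class 1
loss_if_label_is_1 = -math.log(p)      # -log(0.6) ≈ 0.511
loss_if_label_is_0 = -math.log(1 - p)  # -log(0.4) ≈ 0.916

print(loss_if_label_is_1, loss_if_label_is_0)
```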
Let's see what's happening.
Consider a problem with two classes [1 or 0]:

$$BCE\ Loss = -\frac{1}{n}\sum_{i=1}^{n}\left[y \log(p) + (1 - y)\log(1 - p)\right]$$

Consider one sample, n = 1:

If the label y is 1 and the prediction p is 0.1 → $-y\log(p) = -\log(0.1)$ → Loss is High
If the label y is 1 and the prediction p is 0.9 → $-y\log(p) = -\log(0.9)$ → Loss is Low
If the label y is 0 and the prediction p is 0.9 → $-(1-y)\log(1-p) = -\log(1-0.9) = -\log(0.1)$ → Loss is High
If the label y is 0 and the prediction p is 0.1 → $-(1-y)\log(1-p) = -\log(1-0.1) = -\log(0.9)$ → Loss is Low



What does that mean?

$$BCE\ Loss = -\frac{1}{n}\sum_{i=1}^{n}\left[y \log(p) + (1 - y)\log(1 - p)\right]$$

Consider n = 1:

If the label is 1 and the prediction is 0.1 → $-y\log(p) = -\log(0.1)$ → Loss is High → Minimize!
If the label is 1 and the prediction is 0.9 → $-y\log(p) = -\log(0.9)$ → Loss is Low
If the label is 0 and the prediction is 0.9 → $-(1-y)\log(1-p) = -\log(1-0.9) = -\log(0.1)$ → Loss is High → Minimize!
If the label is 0 and the prediction is 0.1 → $-(1-y)\log(1-p) = -\log(1-0.1) = -\log(0.9)$ → Loss is Low

Ideal case:
When the label is 1 and the prediction is 1 → $-\log(1) = 0$
When the label is 0 and the prediction is 0 → $-\log(1 - 0) = 0$
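The four cases and the ideal case, evaluated numerically (a sketch of the per-sample term, not a library call):

```python
import math

def bce_term(y, p):
    """Per-sample binary cross entropy: -[y*log(p) + (1-y)*log(1-p)]."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(bce_term(1, 0.1))   # ≈ 2.303 -> label 1, prediction 0.1: loss is high
print(bce_term(1, 0.9))   # ≈ 0.105 -> label 1, prediction 0.9: loss is low
print(bce_term(0, 0.9))   # ≈ 2.303 -> label 0, prediction 0.9: loss is high
print(bce_term(0, 0.1))   # ≈ 0.105 -> label 0, prediction 0.1: loss is low
print(-math.log(1.0))     # 0.0 -> ideal case: label 1, prediction 1 (likewise -log(1-0) for label 0)
```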
PyTorch example on using BCE Loss

input – Tensor of arbitrary shape
target – Tensor of the same shape as input

torch.rand returns a tensor filled with random numbers from a uniform distribution on the interval [0, 1).
torch.randn returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1 (also called the standard normal distribution).
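The code on the original slide is an image that did not survive extraction; the sketch below reconstructs the described setup (torch.randn for raw scores, a sigmoid to map them into (0, 1), torch.rand for a target in [0, 1)), with an arbitrary shape of 3 elements as an assumption:

```python
import torch
import torch.nn as nn

sigmoid = nn.Sigmoid()
loss_fn = nn.BCELoss()

# Raw scores from a standard normal distribution (mean 0, variance 1)
input = torch.randn(3, requires_grad=True)
# Targets from a uniform distribution on [0, 1), same shape as the input
target = torch.rand(3)

# BCELoss expects probabilities, so the sigmoid is applied first
loss = loss_fn(sigmoid(input), target)
loss.backward()
print(loss.item())
```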
Multi-label Classification

Sigmoid outputs and target labels for one sample:

Sigmoid Output   Label
0.6              1
0.9              0
0.2              0
0.8              1

Consider n = 1 sample:

$$BCE\ Loss = -\sum_{i=1}^{c}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]$$

$$BCE\ Loss = -\left[\log(0.6) + \log(1 - 0.9) + \log(1 - 0.2) + \log(0.8)\right]$$
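Computed directly for this sample (probabilities 0.6, 0.9, 0.2, 0.8 against labels 1, 0, 0, 1):

```python
import math

p = [0.6, 0.9, 0.2, 0.8]   # sigmoid outputs, one per label
y = [1, 0, 0, 1]           # multi-label targets

# Sum of the per-label BCE terms for this single sample (n = 1)
loss = -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
            for yi, pi in zip(y, p))
print(loss)   # -[log(0.6) + log(0.1) + log(0.8) + log(0.8)] ≈ 3.26
```

Note that PyTorch's nn.BCELoss averages over all elements by default (reduction='mean'); passing reduction='sum' matches the summed form above.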
Example of Multi-Label Classification

Predicted Attributes:
Beach
Dog
Brown
Sitting
Laying
People
Walking
Cross Entropy
$$CE = -\frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{c} y_i \log(\hat{y}_i)$$

$i$: class number
$c$: number of classes
$y_i$: actual label
$\hat{y}_i$: predicted label

The actual outputs should be in the form of a one-hot vector.

Suppose you have 4 labels (4 outputs); then:

Class 1: [1 0 0 0]
Class 2: [0 1 0 0]
Class 3: [0 0 1 0]
Class 4: [0 0 0 1]
If a label is 0 (a wrong class), its term contributes no loss. If a label is 1 (the correct class), its term is calculated. The loss penalizes only the predicted probability of the correct class!
For example, suppose you have 4 different classes to classify.
For a single training example:
The ground truth (actual) labels are: [1 0 0 0]
The predicted labels (after softmax) are: [0.1 0.4 0.2 0.3]
The predicted probability of the correct class is 0.1, which is wrong: it should be close to 1.

$$Cross\ Entropy\ Loss = -\left[1 \times \log(0.1) + 0 + 0 + 0\right] = -\log(0.1) = 2.303 \rightarrow \text{Loss is High}$$

Ignore the loss for the 0 labels.
The loss doesn't depend on the probabilities for the incorrect classes!
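The same calculation in code; the log is taken on the softmax probabilities directly, mirroring the slide:

```python
import math

y_true = [1, 0, 0, 0]           # one-hot ground truth
y_pred = [0.1, 0.4, 0.2, 0.3]   # softmax probabilities

# Only the term where the label is 1 contributes anything
ce = -sum(t * math.log(p) for t, p in zip(y_true, y_pred))
print(round(ce, 3))   # 2.303 -> loss is high
```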

Sometimes, the cross entropy loss is averaged over the n training samples:
n → mini-batch size if using mini-batch training
n → total number of training samples if not using mini-batch training

$$J = -\frac{1}{n}\sum_{i=1}^{n} y_i \log(\hat{y}_i)$$

In the case of a one-hot vector, each sample has only one correct class and all other classes are 0. Thus, the summation over the classes c is eliminated.
For example, suppose you have 4 different classes to classify.
For a single training example:
The ground truth (actual) labels are: [1 0 0 0]
The predicted labels are: [0.9 0.01 0.05 0.04]
The predicted probability of the correct class is 0.9, which is correct: almost 1.

$$Cross\ Entropy\ Loss = -\left[1 \times \log(0.9) + 0 + 0 + 0\right] = -\log(0.9) = 0.105 \rightarrow \text{Loss is Low}$$

Ignore the loss for the 0 labels.
The loss doesn't depend on the probabilities for the incorrect classes!
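The low-loss case in code, plus a hedged note on the library form: nn.CrossEntropyLoss takes raw logits and an integer class index rather than a one-hot vector and softmax probabilities, so the logit values below are made up for illustration:

```python
import math
import torch
import torch.nn as nn

# Manual cross entropy for this example: only the correct class matters
y_pred = [0.9, 0.01, 0.05, 0.04]
print(round(-math.log(y_pred[0]), 3))   # 0.105 -> loss is low

# Library form: raw logits (assumed values) and the target class index 0
logits = torch.tensor([[4.0, -0.5, 1.1, 0.9]])  # shape (batch=1, num_classes=4)
target = torch.tensor([0])                       # class index, not a one-hot vector
print(nn.CrossEntropyLoss()(logits, target).item())
```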
