Loss Functions in Neural Networks
Wed, Jun 7, 2017 · 10 min read
A loss function is an important part of artificial neural networks: it is used to measure the inconsistency between the predicted value $\hat{y}$ and the actual label $y$. It is a non-negative value, where the robustness of the model increases as the value of the loss function decreases. The loss function is the hard core of the empirical risk function as well as a significant component of the structural risk function. Generally, the structural risk function of a model consists of an empirical risk term and a regularization term, which can be represented as
$$
\begin{aligned}
\theta^* &= \arg\min_{\theta} \; L(\theta) + \lambda \cdot \Phi(\theta) \\
&= \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} L\big(y^{(i)}, \hat{y}^{(i)}\big) + \lambda \cdot \Phi(\theta) \\
&= \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} L\big(y^{(i)}, f(x^{(i)}, \theta)\big) + \lambda \cdot \Phi(\theta)
\end{aligned}
$$
where $\Phi(\theta)$ is the regularization term or penalty term, $\theta$ denotes the parameters of the model to be learned, $f(\cdot)$ represents the activation function, and $x^{(i)} = \big(x_1^{(i)}, x_2^{(i)}, \dots, x_M^{(i)}\big) \in \mathbb{R}^M$ denotes the $i$-th training sample.

Here we only concentrate on the empirical risk term (loss function)
$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} L\big(y^{(i)}, \hat{y}^{(i)}\big)$$
and introduce the mathematical expressions of several commonly-used loss functions as well as the corresponding expressions in DeepLearning4J (https://deeplearning4j.org).
Mean Squared Error
Mean Squared Error (MSE), or quadratic, loss function is widely used in linear regression as the performance measure, and the method of minimizing MSE is called Ordinary Least Squares (OLS) (https://en.wikipedia.org/wiki/Ordinary_least_squares). The basic principle of OLS is that the optimized fitting line should be the line which minimizes the sum of distances of each point to the regression line, i.e. minimizes the quadratic sum. The standard form of the MSE loss function is defined as
$$L = \frac{1}{N} \sum_{i=1}^{N} \big(y^{(i)} - \hat{y}^{(i)}\big)^2$$
where $\big(y^{(i)} - \hat{y}^{(i)}\big)$ is named the residual, and the target of the MSE loss function is to minimize the residual sum of squares. In DeepLearning4J, it is LossFunctions.LossFunction.MSE or LossFunctions.LossFunction.SQUARED_LOSS (they are the same in DL4J). However, if using Sigmoid (https://isaacchanghau.github.io/2017/0si2/Activation-Functions-in-Artificial-Neural-Networks/#Sigmoid-Units) as the activation function, the quadratic loss function suffers from the problem of slow convergence (learning speed); for other activation functions, it does not have such a problem.
For example, by using sigmoid, $\hat{y}^{(i)} = \sigma(z^{(i)}) = \sigma(\theta^{T} x^{(i)})$. For simplicity, we only consider one sample, say $(y - \sigma(z))^2$, and its derivative with respect to $\theta$ is proportional to
$$(y - \sigma(z)) \cdot \sigma'(z) \cdot x$$
According to the shape and features of Sigmoid, when $\sigma(z)$ tends to 0 or 1, $\sigma'(z)$ is close to zero, and when $\sigma(z)$ is close to 0.5, $\sigma'(z)$ reaches its maximum. In this case, when the difference between the predicted value and the true label $(y - \sigma(z))$ is large, $\sigma'(z)$ will be close to 0, which decreases the convergence speed. This is improper, since we expect the learning speed to be fast when the error is large.
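To make the argument concrete, here is a minimal NumPy sketch (the function name and toy values are my own, not part of DeepLearning4J) that evaluates the gradient factor $(y - \sigma(z)) \cdot \sigma'(z)$ for a single sample: when the sigmoid saturates, the gradient is tiny even though the error is close to 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_sigmoid_grad_factor(y, z):
    """Gradient factor of (y - sigmoid(z))^2 with respect to z,
    up to the constant -2: (y - sigmoid(z)) * sigmoid'(z)."""
    s = sigmoid(z)
    return (y - s) * s * (1.0 - s)  # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))

# Saturated sigmoid: the error is ~1, yet the gradient is tiny -> slow learning
print(mse_sigmoid_grad_factor(y=1.0, z=-6.0))  # ~0.0025
# Unsaturated sigmoid: smaller error, but a much larger gradient
print(mse_sigmoid_grad_factor(y=1.0, z=0.0))   # 0.125
```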
Mean Squared Logarithmic Error
Mean Squared Logarithmic Error (MSLE) loss function is a variant of MSE, which is defined as
$$L = \frac{1}{N} \sum_{i=1}^{N} \big(\log(y^{(i)} + 1) - \log(\hat{y}^{(i)} + 1)\big)^2$$
MSLE is also used to measure the difference between actual and predicted values. By taking the log of the predictions and actual values, what changes is the variance that you are measuring. It is usually used when you do not want to penalize huge differences in the predicted and the actual values when both predicted and actual values are huge numbers. Another thing is that MSLE penalizes under-estimates more than over-estimates.

1. If both predicted and actual values are small: MSE and MSLE are the same.
2. If either the predicted or the actual value is big: MSE > MSLE.
3. If both predicted and actual values are big: MSE > MSLE (MSLE becomes almost negligible).
It is expressed as LossFunctions.LossFunction.MEAN_SQUARED_LOGARITHMIC_ERROR in DeepLearning4J.
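As a quick illustration of the points above, here is a small NumPy sketch of the MSLE formula (the helper name and toy arrays are illustrative, not the DL4J API): a huge absolute gap between two huge values contributes roughly as much as a small gap between two small values.

```python
import numpy as np

def msle(y_true, y_pred):
    """Mean Squared Logarithmic Error: mean of (log(y + 1) - log(yhat + 1))^2."""
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

y_true = np.array([1_000_000.0, 3.0])
y_pred = np.array([1_100_000.0, 2.5])
# The absolute error on the first sample is 100000, yet its MSLE contribution
# (~0.009) is comparable to that of the second sample (~0.018).
print(msle(y_true, y_pred))  # ~0.013
```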
L2
The L2 loss function is the square of the L2 norm of the difference between actual value and predicted value. It is mathematically similar to MSE, only without the division by $N$. It is computed by
$$L = \sum_{i=1}^{N} \big(y^{(i)} - \hat{y}^{(i)}\big)^2$$
For more details, especially on the mathematics, please read the paper On Loss Functions for Deep Neural Networks in Classification (https://arxiv.org/abs/1702.05659), which gives a comprehensive explanation of several commonly-used loss functions, including the L2 and L1 loss functions. In DeepLearning4J, it is expressed as LossFunctions.LossFunction.L2.
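A one-line sketch of the L2 loss as defined above, i.e. the sum of squared residuals without the $1/N$ factor (names are illustrative, not the DL4J implementation):

```python
import numpy as np

def l2_loss(y_true, y_pred):
    """L2 loss: sum of squared residuals (MSE without the 1/N factor)."""
    return np.sum((y_true - y_pred) ** 2)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 3.5])
print(l2_loss(y_true, y_pred))                # 0.75
print(l2_loss(y_true, y_pred) / len(y_true))  # 0.25, which is the MSE
```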
Mean Absolute Error
Mean Absolute Error (MAE) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes, which is computed by
$$L = \frac{1}{N} \sum_{i=1}^{N} \big|y^{(i)} - \hat{y}^{(i)}\big|$$
where $|\cdot|$ denotes the absolute value. Albeit both MSE and MAE are used in predictive modeling, there are several differences between them. MSE has nice mathematical properties which make it easier to compute the gradient, whereas MAE requires more complicated tools such as linear programming to compute the gradient. Because of the square, large errors have relatively greater influence on MSE than do smaller errors; therefore, MAE is more robust to outliers since it does not make use of the square. On the other hand, MSE is more useful when large errors have consequences much bigger than equivalent smaller ones. MSE also corresponds to maximizing the likelihood of Gaussian random variables. In DeepLearning4J, it is expressed as LossFunctions.LossFunction.MEAN_ABSOLUTE_ERROR.
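The following sketch (with toy data of my own, not from the post) contrasts MAE and MSE on a series containing a single outlier, which is the robustness difference described above:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: mean of |y - yhat|."""
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    """Mean Squared Error, for comparison."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 2.0, 3.0, 100.0])  # the last point is an outlier
y_pred = np.array([1.1, 1.9, 3.2, 3.0])
print(mae(y_true, y_pred))  # ~24.35   (the outlier enters linearly)
print(mse(y_true, y_pred))  # ~2352.27 (the outlier enters quadratically)
```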
Mean Absolute Percentage Error
Mean Absolute Percentage Error (MAPE) is a variant of MAE; it is computed by
$$L = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y^{(i)} - \hat{y}^{(i)}}{y^{(i)}} \right| \cdot 100$$
Although the concept of MAPE sounds very simple and convincing, it has major drawbacks in practical application:

1. It cannot be used if there are zero values (which sometimes happens, for example, in demand data), because there would be a division by zero.
2. For forecasts which are too low the percentage error cannot exceed 100%, but for forecasts which are too high there is no upper limit to the percentage error.
3. When MAPE is used to compare the accuracy of prediction methods, it is biased in that it will systematically select a method whose forecasts are too low. This little-known but serious issue can be overcome by using an accuracy measure based on the ratio of the predicted to actual value (called the Accuracy Ratio); this approach leads to superior statistical properties and to predictions which can be interpreted in terms of the geometric mean.
It is expressed as LossFunctions.LossFunction.MEAN_ABSOLUTE_PERCENTAGE_ERROR in DeepLearning4J.
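A short sketch (helper name and values are mine) that reproduces two of the drawbacks listed above: zero actuals break the formula, and errors on too-low forecasts are capped at 100% while errors on too-high forecasts are not.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error: mean of |(y - yhat) / y| * 100."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

y_true = np.array([100.0, 100.0])
print(mape(y_true, np.array([0.0, 0.0])))      # 100.0 -> too-low forecasts are capped
print(mape(y_true, np.array([500.0, 500.0])))  # 400.0 -> too-high forecasts are unbounded
# mape(np.array([0.0]), np.array([1.0]))       # undefined: division by a zero actual value
```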
L1
The L1 loss function is the sum of absolute errors of the difference between actual value and predicted value. Similar to the relation between MSE and L2, L1 is mathematically similar to MAE, only without the division by $N$, and it is defined as
$$L = \sum_{i=1}^{N} \big|y^{(i)} - \hat{y}^{(i)}\big|$$
In DeepLearning4J, it is expressed as LossFunctions.LossFunction.L1.
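And the corresponding sketch for L1, which is simply MAE without the $1/N$ factor (again an illustrative helper, not the DL4J implementation):

```python
import numpy as np

def l1_loss(y_true, y_pred):
    """L1 loss: sum of absolute residuals (MAE without the 1/N factor)."""
    return np.sum(np.abs(y_true - y_pred))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 3.5])
print(l1_loss(y_true, y_pred))                # 1.5
print(l1_loss(y_true, y_pred) / len(y_true))  # 0.5, which is the MAE
```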
Kullback Leibler (KL) Divergence
KL Divergence, also known as relative entropy or information divergence/gain, is a measure of how one probability distribution diverges from a second, expected probability distribution. The KL divergence loss function is computed by
$$
\begin{aligned}
L &= \frac{1}{N} \sum_{i=1}^{N} D_{KL}\big(y^{(i)} \,\big\|\, \hat{y}^{(i)}\big) \\
&= \frac{1}{N} \sum_{i=1}^{N} \left[ y^{(i)} \cdot \log\!\left(\frac{y^{(i)}}{\hat{y}^{(i)}}\right) \right] \\
&= \frac{1}{N} \sum_{i=1}^{N} \big( y^{(i)} \cdot \log(y^{(i)}) \big) - \frac{1}{N} \sum_{i=1}^{N} \big( y^{(i)} \cdot \log(\hat{y}^{(i)}) \big)
\end{aligned}
$$
where the first term is the (negative) entropy of the labels and the remaining term, with its minus sign, is the cross entropy (another kind of loss function which will be introduced later). KL divergence is a distribution-wise asymmetric measure and thus does not qualify as a statistical metric of spread. In the simple case, a KL divergence of 0 indicates that we can expect similar, if not the same, behavior of two different distributions, while a KL divergence of 1 indicates that the two distributions behave in such a different manner that the expectation given the first distribution approaches zero. For more details, please visit the Wikipedia article (https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence).
In DeepLearning4J, it is expressed as LossFunctions.LossFunction.KL_DIVERGENCE. Moreover, the implementation of Reconstruction Cross Entropy (https://en.wikipedia.org/wiki/Cross_entropy) in DeepLearning4J is the same as Kullback-Leibler divergence, thus you can also use LossFunctions.LossFunction.RECONSTRUCTION_CROSSENTROPY.
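To make the entropy / cross-entropy decomposition above concrete, here is a small NumPy sketch for two discrete distributions (the distributions and helper names are my own, chosen for illustration):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])  # "true" label distribution y
q = np.array([0.5, 0.3, 0.2])  # predicted distribution yhat

cross_entropy = -np.sum(p * np.log(q))
entropy = -np.sum(p * np.log(p))
print(kl_divergence(p, q))      # ~0.085
print(cross_entropy - entropy)  # same value: D_KL = cross entropy - entropy
```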
Cross Entropy