
A survey of loss functions for semantic segmentation

Shruti Jadon
IEEE Member
[email protected]

Abstract—Image segmentation has been an active field of research as it has a wide range of applications, ranging from automated disease detection to self-driving cars. In the past five years, various papers have proposed different objective loss functions for different cases, such as biased data, sparse segmentation, etc. In this paper, we have summarized some of the well-known loss functions widely used for image segmentation and listed the cases where their usage can help in fast and better convergence of a model. Furthermore, we have also introduced a new log-cosh dice loss function and compared its performance on the NBFS skull-segmentation open-source data-set with widely used loss functions. We also showcase that certain loss functions perform well across all data-sets and can be taken as a good baseline choice in unknown data-distribution scenarios.

Index Terms—Computer Vision, Image Segmentation, Medical Image, Loss Function, Optimization, Healthcare, Skull Stripping, Deep Learning

I. INTRODUCTION

Deep learning has revolutionized various industries, ranging from software to manufacturing, and the medical community has also benefited from it. There have been multiple innovations in disease classification, for example, tumor segmentation using U-Net and cancer detection using SegNet. Image segmentation is one of the crucial contributions of the deep learning community to the medical field. Apart from telling that some disease exists, it also shows where exactly it exists. It has drastically helped in creating algorithms to detect tumors, lesions, etc. in various types of medical scans.

Image segmentation can be defined as a classification task at the pixel level. An image consists of various pixels, and these pixels, grouped together, define different elements in the image. A method of classifying these pixels into elements is called semantic image segmentation. The choice of the loss/objective function is extremely important while designing complex image-segmentation-based deep learning architectures, as it instigates the learning process of the algorithm. Therefore, since 2012, researchers have experimented with various domain-specific loss functions to improve results on their datasets.

In this paper, we have summarized fifteen such segmentation-based loss functions that have been proven to provide state-of-the-art results in different domains. These loss functions can be categorized into four types: distribution-based, region-based, boundary-based, and compounded (refer to Table I). We have also discussed the conditions that determine which objective/loss function might be useful in a given scenario. Apart from this, we have proposed a new log-cosh dice loss function for semantic segmentation. To showcase its efficiency, we compared the performance of all loss functions on the NBFS Skull-stripping dataset [1] and shared the outcomes in the form of Dice coefficient, sensitivity, and specificity. The code implementation is available on GitHub: https://github.com/shruti-jadon/Semantic-Segmentation-Loss-Functions.

Fig. 1. Sample brain lesion segmentation CT scan [2]. In this segmentation mask, the number of white pixels (the targeted lesion) is much smaller than the number of black pixels.

TABLE I
TYPES OF SEMANTIC SEGMENTATION LOSS FUNCTIONS [3]

Distribution-based Loss: Binary Cross-Entropy, Weighted Cross-Entropy, Balanced Cross-Entropy, Focal Loss, Distance map derived loss penalty term
Region-based Loss: Dice Loss, Sensitivity-Specificity Loss, Tversky Loss, Focal Tversky Loss, Log-Cosh Dice Loss (ours)
Boundary-based Loss: Hausdorff Distance Loss, Shape-aware Loss
Compounded Loss: Combo Loss, Exponential Logarithmic Loss

II. LOSS FUNCTIONS

Deep learning algorithms use a stochastic gradient descent approach to optimize and learn the objective. To learn an objective accurately and faster, we need to ensure that our mathematical representations of the objectives, also known as loss functions, are able to cover even the edge cases. The introduction of loss functions has roots in traditional machine learning, where loss functions were derived on the basis of the distribution of labels. For example, Binary Cross-Entropy is derived from the Bernoulli distribution and Categorical Cross-Entropy from the Multinoulli distribution. In this paper, we have focused on semantic segmentation instead of instance segmentation, so the number of classes at the pixel level is restricted to 2. Here, we will go over 15 widely used loss functions and understand their use-case scenarios.


A. Binary Cross-Entropy

Cross-entropy [4] is defined as a measure of the difference between two probability distributions for a given random variable or set of events. It is widely used as a classification objective, and since segmentation is pixel-level classification, it works well there too. Binary Cross-Entropy is defined as:

L_BCE(y, ŷ) = −(y log(ŷ) + (1 − y) log(1 − ŷ))    (1)

Here, ŷ is the value predicted by the model.

Fig. 2. Graph of the Binary Cross-Entropy loss function, with entropy on the Y-axis and the probability of the event on the X-axis.
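As an illustration, a minimal NumPy sketch of Eq. (1) is shown below; the variable names and the clipping epsilon are illustrative choices, and the trainable Keras implementation is available in the repository linked above.

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-7):
    """Eq. (1): mean binary cross-entropy over all pixels."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```

Here y and y_hat are flattened arrays holding the binary ground truth and the predicted probabilities.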


B. Weighted Binary Cross-Entropy

Weighted Binary Cross-Entropy (WCE) [5] is a variant of binary cross-entropy in which the positive examples are weighted by some coefficient. It is widely used in the case of skewed data [6], as shown in Figure 1. Weighted Cross-Entropy can be defined as:

L_W-BCE(y, ŷ) = −(β ∗ y log(ŷ) + (1 − y) log(1 − ŷ))    (2)

Note: the β value can be used to tune false negatives and false positives. For example, to reduce the number of false negatives, set β > 1; similarly, to decrease the number of false positives, set β < 1.

C. Balanced Cross-Entropy

Balanced Cross-Entropy (BCE) [7] is similar to Weighted Cross-Entropy. The only difference is that, apart from the positive examples [8], we also weight the negative examples. Balanced Cross-Entropy can be defined as follows:

L_BCE(y, ŷ) = −(β ∗ y log(ŷ) + (1 − β) ∗ (1 − y) log(1 − ŷ))    (3)

Here, β is defined as 1 − y/(H ∗ W).
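An illustrative NumPy sketch of Eqs. (2) and (3) follows; the function names and the way β is estimated from the foreground fraction are assumptions made for clarity rather than the repository's exact code.

```python
import numpy as np

def weighted_bce(y, y_hat, beta, eps=1e-7):
    """Eq. (2): only the positive term is scaled by beta."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(beta * y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def balanced_bce(y, y_hat, eps=1e-7):
    """Eq. (3): positives weighted by beta, negatives by (1 - beta)."""
    beta = 1.0 - np.mean(y)  # 1 - y/(H*W) for a binary ground-truth mask
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(beta * y * np.log(y_hat)
                    + (1 - beta) * (1 - y) * np.log(1 - y_hat))
```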
D. Focal Loss

Focal loss (FL) [9] can also be seen as a variation of Binary Cross-Entropy. It down-weights the contribution of easy examples and enables the model to focus more on learning hard examples. It works well for highly imbalanced class scenarios, such as the one shown in Fig. 1. Let us look at how focal loss is designed, starting from the binary cross-entropy loss:

CE = −log(p),        if y = 1
CE = −log(1 − p),    otherwise    (4)

For convenient notation, Focal Loss defines the estimated probability of the class as:

p_t = p,        if y = 1
p_t = 1 − p,    otherwise    (5)

Therefore, Cross-Entropy can be written as:

CE(p, y) = CE(p_t) = −log(p_t)    (6)

Focal Loss proposes to down-weight easy examples and focus training on hard negatives using a modulating factor, (1 − p_t)^γ, as shown below:

FL(p_t) = −α_t (1 − p_t)^γ log(p_t)    (7)

Here, γ > 0, and when γ = 0, Focal Loss reduces to the Cross-Entropy loss function. α generally ranges over [0, 1]; it can be set by inverse class frequency or treated as a hyperparameter.
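A NumPy sketch of Eq. (7) is given below; the default values of alpha and gamma are common choices from the focal loss literature, not values prescribed in this paper.

```python
import numpy as np

def focal_loss(y, y_hat, alpha=0.25, gamma=2.0, eps=1e-7):
    """Eq. (7): cross-entropy modulated by (1 - p_t)^gamma."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    p_t = np.where(y == 1, y_hat, 1 - y_hat)       # Eq. (5)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -np.mean(alpha_t * (1 - p_t) ** gamma * np.log(p_t))
```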
E. Dice Loss

The Dice coefficient is a widely used metric in the computer vision community to calculate the similarity between two images. In 2016, it was also adapted as a loss function, known as Dice Loss [10]:

DL(y, p̂) = 1 − (2y p̂ + 1) / (y + p̂ + 1)    (8)

Here, 1 is added to the numerator and denominator to ensure that the function is not undefined in edge-case scenarios such as y = p̂ = 0.
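A NumPy sketch of Eq. (8), written with the usual sums over all pixels and the +1 smoothing term, is shown below; inputs are assumed to be a binary mask and a probability map of the same shape.

```python
import numpy as np

def dice_loss(y, y_hat, smooth=1.0):
    """Eq. (8): 1 - Dice coefficient, with additive smoothing."""
    intersection = np.sum(y * y_hat)
    return 1.0 - (2.0 * intersection + smooth) / (np.sum(y) + np.sum(y_hat) + smooth)
```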
F. Tversky Loss

The Tversky index (TI) [11] can also be seen as a generalization of the Dice coefficient. It adds weights to FP (false positives) and FN (false negatives) with the help of a β coefficient:

TI(p, p̂) = p p̂ / (p p̂ + β(1 − p)p̂ + (1 − β)p(1 − p̂))    (9)

Here, when β = 1/2, it reduces to the regular Dice coefficient. Similar to Dice Loss, Tversky loss can be defined as:

TL(p, p̂) = 1 − (1 + p p̂) / (1 + p p̂ + β(1 − p)p̂ + (1 − β)p(1 − p̂))    (10)

G. Focal Tversky Loss

Similar to Focal Loss, which focuses on hard examples by down-weighting easy/common ones, Focal Tversky loss [12] also attempts to learn hard examples, such as those with small ROIs (regions of interest), with the help of a γ coefficient as shown below:

FTL = Σ_c (1 − TI_c)^γ    (11)

Here, TI indicates the Tversky index, and γ can range over [1, 3].
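A combined NumPy sketch of Eqs. (9)-(11) for the single-class case is shown below; the smoothing term and the default gamma are illustrative additions, and the sum over classes in Eq. (11) would become an outer loop.

```python
import numpy as np

def tversky_index(y, y_hat, beta=0.5, smooth=1.0):
    """Eq. (9) with smoothing; beta = 0.5 recovers the Dice coefficient."""
    tp = np.sum(y * y_hat)
    fp = np.sum((1 - y) * y_hat)
    fn = np.sum(y * (1 - y_hat))
    return (tp + smooth) / (tp + beta * fp + (1 - beta) * fn + smooth)

def tversky_loss(y, y_hat, beta=0.5):
    return 1.0 - tversky_index(y, y_hat, beta)              # Eq. (10)

def focal_tversky_loss(y, y_hat, beta=0.5, gamma=1.33):
    return (1.0 - tversky_index(y, y_hat, beta)) ** gamma   # Eq. (11), one class
```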
H. Sensitivity-Specificity Loss

Similar to the Dice coefficient, sensitivity and specificity are widely used metrics to evaluate segmentation predictions. In this loss function, the class imbalance problem can be tackled using the w parameter. The loss [13] is defined as:

SSL = w ∗ sensitivity + (1 − w) ∗ specificity    (12)

where

sensitivity = TP / (TP + FN)    (13)

and

specificity = TN / (TN + FP)    (14)
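The following NumPy sketch evaluates Eqs. (12)-(14) from a thresholded prediction; note that, as written, Eq. (12) is a score rather than a quantity to minimize, so in practice one would typically minimize 1 minus this value (or an error-weighted variant) on soft predictions.

```python
import numpy as np

def sensitivity_specificity(y, y_hat, w=0.5, threshold=0.5, eps=1e-7):
    """Eqs. (12)-(14) computed from a hard (thresholded) prediction."""
    pred = (y_hat >= threshold).astype(float)
    tp = np.sum(y * pred)
    tn = np.sum((1 - y) * (1 - pred))
    fp = np.sum((1 - y) * pred)
    fn = np.sum(y * (1 - pred))
    sensitivity = tp / (tp + fn + eps)
    specificity = tn / (tn + fp + eps)
    return w * sensitivity + (1 - w) * specificity  # Eq. (12)
```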
I. Shape-aware Loss

Shape-aware loss [14], as the name suggests, takes shape into account. Generally, all loss functions work at the pixel level; Shape-aware loss, however, calculates the average point-to-curve Euclidean distance between points on the curve of the predicted segmentation and the ground truth, and uses it as a coefficient for the cross-entropy loss function. It is defined as follows:

E_i = D(Ĉ, C_GT)    (15)

L_shape-aware = −Σ_i CE(y, ŷ) − Σ_i i E_i CE(y, ŷ)    (16)

Using E_i, the network learns to produce prediction masks similar to the training shapes.
J. Combo Loss

Combo loss [15] is defined as a weighted sum of Dice loss and a modified cross-entropy. It attempts to leverage the flexibility of Dice loss with respect to class imbalance while at the same time using cross-entropy for curve smoothing. It is defined as:

L_m-bce = −(1/N) Σ_i (β ∗ y log(ŷ) + (1 − β)(1 − y) log(1 − ŷ))    (17)

CL(y, ŷ) = α L_m-bce − (1 − α) DL(y, ŷ)    (18)

Here, DL is the Dice Loss.
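A NumPy sketch of Eqs. (17)-(18), reusing the Dice loss of Eq. (8), is shown below; the default alpha and beta are illustrative, and the combination follows Eq. (18) as printed.

```python
import numpy as np

def combo_loss(y, y_hat, alpha=0.5, beta=0.5, eps=1e-7):
    """Eqs. (17)-(18): weighted cross-entropy combined with Dice loss."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    # Eq. (17): beta-weighted positive term, (1 - beta)-weighted negative term
    m_bce = -np.mean(beta * y * np.log(y_hat)
                     + (1 - beta) * (1 - y) * np.log(1 - y_hat))
    dice = 1.0 - (2.0 * np.sum(y * y_hat) + 1.0) / (np.sum(y) + np.sum(y_hat) + 1.0)
    return alpha * m_bce - (1 - alpha) * dice  # Eq. (18) as printed
```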
K. Exponential Logarithmic Loss

The Exponential Logarithmic loss [16] focuses on less accurately predicted structures using a combined formulation of Dice loss and Cross-Entropy loss. Wong et al. [16] propose applying exponential and logarithmic transforms to both the Dice loss and the cross-entropy loss, so as to incorporate the benefits of finer decision boundaries and accurate data distribution. It is defined as:

L_Exp = w_Dice L_Dice + w_cross L_cross    (19)

where

L_Dice = E[(−ln(DC))^γ_Dice]    (20)

L_cross = E[w_l (−ln(p_l))^γ_cross]    (21)

Wong et al. [16] used γ_cross = γ_Dice for simplicity.
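A NumPy sketch of Eqs. (19)-(21) for the binary case is given below; the class weight w_l is omitted, a single gamma is used for both terms as in [16], and all default weights are illustrative.

```python
import numpy as np

def exp_log_loss(y, y_hat, w_dice=0.8, w_cross=0.2, gamma=0.3, eps=1e-7):
    """Eqs. (19)-(21): exponential-logarithmic transforms of Dice and CE."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    dice_coeff = (2.0 * np.sum(y * y_hat) + 1.0) / (np.sum(y) + np.sum(y_hat) + 1.0)
    l_dice = (-np.log(dice_coeff)) ** gamma                 # Eq. (20)
    p_true = np.where(y == 1, y_hat, 1 - y_hat)
    l_cross = np.mean((-np.log(p_true)) ** gamma)           # Eq. (21), w_l omitted
    return w_dice * l_dice + w_cross * l_cross              # Eq. (19)
```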

L. Distance map derived loss penalty term

Distance maps can be defined as the distance (Euclidean, absolute, etc.) between the ground truth and the predicted map. There are two ways to incorporate distance maps: either create a neural network architecture with a reconstruction head alongside the segmentation head, or induce them into the loss function. Following the latter, Caliva et al. [17] used distance maps derived from the ground-truth masks to create a custom penalty-based loss function. Using this approach, it is easy to guide the network's focus towards hard-to-segment boundary regions. The loss function is defined as:

L(y, p) = (1/N) Σ_{i=1}^{N} (1 + φ) L_CE(y, p)    (22)

Here, φ are the generated distance maps. Note that the constant 1 is added to avoid the vanishing-gradient problem in U-Net and V-Net architectures.
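A sketch of Eq. (22) is shown below, assuming the penalty map φ is precomputed from the ground-truth mask; deriving it with SciPy's Euclidean distance transform is one plausible choice, not necessarily the exact recipe of Caliva et al. [17].

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_map_penalty_loss(y, y_hat, eps=1e-7):
    """Eq. (22): pixel-wise cross-entropy scaled by (1 + phi)."""
    # One plausible phi: each pixel's distance to the object boundary
    phi = distance_transform_edt(y == 0) + distance_transform_edt(y == 1)
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    ce = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return np.mean((1.0 + phi) * ce)
```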

M. Hausdorff Distance Loss

The Hausdorff Distance (HD) is a metric used by segmentation approaches to track the performance of a model. It is defined as:

d(X, Y) = max_{x∈X} min_{y∈Y} ||x − y||_2    (23)

Fig. 3. Hausdorff Distance between point sets X and Y [18].

The objective of any segmentation model is to minimize the Hausdorff Distance [19], but due to its non-convex nature it is not widely used as a loss function. Karimi et al. [18] proposed 3 variants of Hausdorff-Distance-based loss functions which incorporate the metric's use case and ensure that the loss function is tractable. These 3 variants are designed on the basis of how the Hausdorff Distance can be used as part of a loss function: (i) taking the max of all HD errors, (ii) the minimum of all errors obtained by placing a circular structure of radius r, and (iii) the max of a convolutional kernel placed on top of mis-segmented pixels.

N. Correlation Maximized Structural Similarity Loss

A lot of semantic segmentation loss functions focus on the classification error at the pixel level while disregarding pixel-level structural information. Some other loss functions [20] have attempted to add such information using structural priors such as CRFs, GANs, etc. In this loss function, Zhao et al. [20] introduced a Structural Similarity Loss (SSL) to achieve a high positive linear correlation between the ground truth map and the predicted map. It is divided into 3 steps: structure comparison, cross-entropy weight coefficient determination, and mini-batch loss definition.

As part of the structure comparison, the authors calculate the e-coefficient, which measures the degree of linear correlation between the ground truth and the prediction:

e = |(y − µ_y + C4)/(σ_y + C4) − (p − µ_p + C4)/(σ_p + C4)|    (24)

Here, C4 is a stability factor set to 0.01 as an empirically observed value, and µ_y and σ_y are the local mean and standard deviation of the ground truth y, respectively; y is located at the center of the local region, and p is the predicted probability. After calculating the degree of correlation, Zhao et al. [20] use it as a coefficient for the cross-entropy loss function, defined as:

f_{n,c} = 1{e_{n,c} > β e_max}    (25)

Using this coefficient function, the SSL loss can be defined as:

Loss_ssl(y_{n,c}, p_{n,c}) = e_{n,c} f_{n,c} L_CE(y_{n,c}, p_{n,c})    (26)

Finally, for the mini-batch loss calculation, the SSL can be defined as:

L_ssl = (1/M) Σ_{n=1}^{N} Σ_{c=1}^{C} Loss_ssl(y_{n,c}, p_{n,c})    (27)

where M is Σ_{n=1}^{N} Σ_{c=1}^{C} f_{n,c}. Using the above formula, the loss function automatically abandons those pixel-level predictions which do not show correlation in terms of structure.

O. Log-Cosh Dice Loss

The Dice coefficient is a widely used metric to evaluate segmentation output. It has also been modified to be used as a loss function, since it fulfills the mathematical representation of the segmentation objective. But due to its non-convex nature, it might fail to achieve optimal results. The Lovász-Softmax loss [21] aimed to tackle the problem of non-convex loss functions by adding smoothing using the Lovász extension. The log-cosh approach has been widely used in regression problems for smoothing the curve.

Hyperbolic functions have been used by the deep learning community as non-linearities, e.g. the tanh layer. They are tractable as well as easily differentiable. cosh(x) is defined as (see Fig. 4):

cosh(x) = (e^x + e^{−x}) / 2    (28)

and

cosh′(x) = (e^x − e^{−x}) / 2 = sinh(x)    (29)

Fig. 4. cosh(x) is the average of e^x and e^{−x}.

However, the range of cosh(x) can go up to infinity. So, to keep it in range, log space is used, giving the log-cosh function:

L(x) = log(cosh(x))    (30)

and, using the chain rule,

L′(x) = sinh(x) / cosh(x) = tanh(x)    (31)

which is continuous and finite in nature, as tanh(x) ranges over [−1, 1].

Fig. 5. tanh(x) is continuous and finite; it ranges over [−1, 1].
TABLE II
TABULAR SUMMARY OF SEMANTIC SEGMENTATION LOSS FUNCTIONS

Binary Cross-Entropy — Bernoulli-distribution-based loss function; works best when data is distributed equally among classes.
Weighted Cross-Entropy — Widely used with skewed datasets; weighs positive examples by a β coefficient.
Balanced Cross-Entropy — Similar to weighted cross-entropy and widely used with skewed datasets; weighs both positive and negative examples, by β and 1 − β respectively.
Focal Loss — Works best with highly imbalanced datasets; down-weights the contribution of easy examples, enabling the model to learn hard examples.
Distance map derived loss penalty term — Variant of cross-entropy; used for hard-to-segment boundaries.
Dice Loss — Inspired by the Dice coefficient, a metric for evaluating segmentation results; since the Dice coefficient is non-convex in nature, it has been modified to make it more tractable.
Sensitivity-Specificity Loss — Inspired by the sensitivity and specificity metrics; used for cases where there is more focus on true positives.
Tversky Loss — Variant of the Dice coefficient; adds weights to false positives and false negatives.
Focal Tversky Loss — Variant of Tversky loss with a focus on hard examples.
Log-Cosh Dice Loss (ours) — Variant of Dice loss, inspired by the regression log-cosh approach for smoothing; variations can be used for skewed datasets.
Hausdorff Distance loss — Inspired by the Hausdorff Distance metric used for the evaluation of segmentation; tackles the non-convex nature of the distance metric by adding some variations.
Shape aware loss — Variation of cross-entropy loss obtained by adding a shape-based coefficient; used in cases of hard-to-segment boundaries.
Combo Loss — Combination of Dice loss and binary cross-entropy; used for lightly class-imbalanced data by leveraging the benefits of BCE and Dice loss.
Exponential Logarithmic Loss — Combined function of Dice loss and binary cross-entropy; focuses on less accurately predicted cases.
Correlation Maximized Structural Similarity Loss — Focuses on the segmentation structure; used in cases of structural importance, such as medical images.

TABLE III
COMPARISON OF SOME OF THE ABOVE-MENTIONED LOSS FUNCTIONS ON THE BASIS OF DICE SCORES, SENSITIVITY, AND SPECIFICITY FOR SKULL SEGMENTATION

Loss Function                  Dice Coefficient   Sensitivity   Specificity
Binary Cross-Entropy           0.968              0.976         0.998
Weighted Cross-Entropy         0.962              0.966         0.998
Focal Loss                     0.936              0.952         0.999
Dice Loss                      0.970              0.981         0.998
Tversky Loss                   0.965              0.979         0.996
Focal Tversky Loss             0.977              0.990         0.997
Sensitivity-Specificity Loss   0.957              0.980         0.996
Exp-Logarithmic Loss           0.972              0.982         0.997
Log Cosh Dice Loss             0.989              0.975         0.997
Fig. 6. Sample CT scan image from NBFS Skull Stripping Dataset [1]

Based on the above derivation, which shows that the log of the cosh function remains continuous and finite after first-order differentiation, we propose the Log-Cosh Dice Loss function for its tractable nature while encapsulating the features of the Dice coefficient. It can be defined as:

L_lc-dce = log(cosh(DiceLoss))    (32)
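A minimal NumPy sketch of Eq. (32), wrapping the Dice loss of Eq. (8) in the log-cosh function, is shown below; the trainable Keras version is available in the linked repository.

```python
import numpy as np

def log_cosh_dice_loss(y, y_hat, smooth=1.0):
    """Eq. (32): log(cosh(DiceLoss)) for a smoother, tractable loss surface."""
    dice = 1.0 - (2.0 * np.sum(y * y_hat) + smooth) / (np.sum(y) + np.sum(y_hat) + smooth)
    return np.log(np.cosh(dice))
```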
III. EXPERIMENTS

For the experiments, we have implemented a simple 2D U-Net model [2] architecture for segmentation, with 10 convolutional encoder layers and 8 transposed-convolution decoder layers. We have used the NBFS Skull-stripping dataset [1], which consists of 125 skull CT scans, each consisting of 120 slices (refer to Figure 6). For training, we used a batch size of 32 and the Adam optimizer with a learning rate of 0.001 and learning-rate reduction down to 10^{−8}. We split the data-set 60-20-20 into training, validation, and test data. We performed experiments using only 9 loss functions, as the other loss functions either resolved into one of the already chosen loss functions or were not a fit for the NBFS skull dataset. After training the model with the different loss functions, we evaluated them on the basis of well-known evaluation metrics: Dice coefficient, sensitivity, and specificity.
1) Evaluation Metrics: Evaluation metrics play an important role in assessing the outcomes of segmentation models. In this work, we have analyzed our results using the Dice coefficient, sensitivity, and specificity metrics. The Dice coefficient, also known as the overlap index, measures the overlap between the ground truth and the predicted output. Sensitivity gives more weight to true positives, while specificity measures the ratio of true negatives. Collectively, these metrics examine the model performance effectively.

DC = 2TP / (2TP + FP + FN)    (33)

Sensitivity (TPR) = TP / (TP + FN)    (34)

Specificity (TNR) = TN / (TN + FP)    (35)
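The three metrics of Eqs. (33)-(35) can be computed from a thresholded prediction as in the following NumPy sketch; the threshold and epsilon values are illustrative.

```python
import numpy as np

def segmentation_metrics(y, y_hat, threshold=0.5, eps=1e-7):
    """Eqs. (33)-(35): Dice coefficient, sensitivity (TPR), specificity (TNR)."""
    pred = (y_hat >= threshold).astype(float)
    tp = np.sum(y * pred)
    tn = np.sum((1 - y) * (1 - pred))
    fp = np.sum((1 - y) * pred)
    fn = np.sum(y * (1 - pred))
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    sensitivity = tp / (tp + fn + eps)
    specificity = tn / (tn + fp + eps)
    return dice, sensitivity, specificity
```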
In conclusion, by using 40,000 annotated segmented examples, we achieved an optimal dice coefficient of 0.98 using Focal Tversky Loss. The Log-Cosh Dice Loss function also achieved similar results, with a dice coefficient of 0.975, very close to the best result. In terms of sensitivity, i.e., the true positive rate, Focal Tversky Loss outperformed all other loss functions, whereas specificity (the true negative rate) remained consistent across all loss functions. We have also observed similar outcomes in our past research [2]: Focal Tversky loss and Tversky loss generally give optimal results with the right parameter values.

IV. CONCLUSION

Loss functions play an essential role in determining model performance. For complex objectives such as segmentation, it is not possible to decide on a universal loss function. The majority of the time, it depends on the properties of the data-set used for training, such as distribution, skewness, boundaries, etc. None of the mentioned loss functions has the best performance in all use cases. However, we can say that highly imbalanced segmentation works better with focus-based loss functions. Similarly, binary cross-entropy works best with balanced data-sets, whereas mildly skewed data-sets can work with a smoothed or generalized dice coefficient. In this paper, we have summarized 14 well-known loss functions for semantic segmentation and proposed a tractable variant of the dice loss function for better and more accurate optimization. In the future, we will use this work as a baseline implementation for few-shot segmentation [22] experiments.
REFERENCES

[1] Benjamin Puccio, James P. Pooley, John Pellman, Elise C. Taverna, and R. Cameron Craddock. The preprocessed connectomes project repository of manually corrected skull-stripped T1-weighted anatomical MRI data. GigaScience, 5, 2016.
[2] Shruti Jadon, Owen P. Leary, Ian Pan, Tyler J. Harder, David W. Wright, Lisa H. Merck, and Derek L. Merck. A comparative study of 2D image segmentation algorithms for traumatic brain lesions using CT data from the ProTECTIII multicenter clinical trial. In Po-Hao Chen and Thomas M. Deserno, editors, Medical Imaging 2020: Imaging Informatics for Healthcare, Research, and Applications, volume 11318, pages 195–203. International Society for Optics and Photonics, SPIE, 2020.
[3] Jun Ma. Segmentation loss odyssey. arXiv preprint arXiv:2005.13449, 2020.
[4] Ma Yi-de, Liu Qing, and Qian Zhi-Bai. Automated image segmentation using improved PCNN model based on cross-entropy. In Proceedings of the 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, pages 743–746. IEEE, 2004.
[5] Vasyl Pihur, Susmita Datta, and Somnath Datta. Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach. Bioinformatics, 23(13):1607–1615, 2007.
[6] Yaoshiang Ho and Samuel Wookey. The real-world-weight cross-entropy loss function: Modeling the costs of mislabeling. IEEE Access, 8:4806–4813, 2019.
[7] Saining Xie and Zhuowen Tu. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 1395–1403, 2015.
[8] Shiwen Pan, Wei Zhang, Wanjun Zhang, Liang Xu, Guohua Fan, Jianping Gong, Bo Zhang, and Haibo Gu. Diagnostic model of coronary microvascular disease combined with full convolution deep network with balanced cross-entropy cost function. IEEE Access, 7:177997–178006, 2019.
[9] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. arXiv preprint arXiv:1708.02002, 2017.
[10] Carole H. Sudre, Wenqi Li, Tom Vercauteren, Sebastien Ourselin, and M. Jorge Cardoso. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pages 240–248. Springer, 2017.
[11] Seyed Sadegh Mohseni Salehi, Deniz Erdogmus, and Ali Gholipour. Tversky loss function for image segmentation using 3D fully convolutional deep networks. In International Workshop on Machine Learning in Medical Imaging, pages 379–387. Springer, 2017.
[12] Nabila Abraham and Naimul Mefraz Khan. A novel focal Tversky loss function with improved attention U-Net for lesion segmentation. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pages 683–687. IEEE, 2019.
[13] Seyed Raein Hashemi, Seyed Sadegh Mohseni Salehi, Deniz Erdogmus, Sanjay P. Prabhu, Simon K. Warfield, and Ali Gholipour. Asymmetric loss functions and deep densely-connected networks for highly-imbalanced medical image segmentation: Application to multiple sclerosis lesion detection. IEEE Access, 7:1721–1735, 2018.
[14] Zeeshan Hayder, Xuming He, and Mathieu Salzmann. Shape-aware instance segmentation. arXiv preprint arXiv:1612.03129, 2016.
[15] Saeid Asgari Taghanaki, Yefeng Zheng, S. Kevin Zhou, Bogdan Georgescu, Puneet Sharma, Daguang Xu, Dorin Comaniciu, and Ghassan Hamarneh. Combo loss: Handling input and output imbalance in multi-organ segmentation. Computerized Medical Imaging and Graphics, 75:24–33, 2019.
[16] Ken C. L. Wong, Mehdi Moradi, Hui Tang, and Tanveer Syeda-Mahmood. 3D segmentation with exponential logarithmic loss for highly unbalanced object sizes. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 612–619. Springer, 2018.
[17] Francesco Caliva, Claudia Iriondo, Alejandro Morales Martinez, Sharmila Majumdar, and Valentina Pedoia. Distance map loss penalty term for semantic segmentation. arXiv preprint arXiv:1908.03679, 2019.
[18] Davood Karimi and Septimiu E. Salcudean. Reducing the Hausdorff distance in medical image segmentation with convolutional neural networks. IEEE Transactions on Medical Imaging, 39(2):499–513, 2019.
[19] Javier Ribera, David Güera, Yuhao Chen, and Edward J. Delp. Weighted Hausdorff distance: A loss function for object localization. arXiv preprint arXiv:1806.07564, 2018.
[20] Shuai Zhao, Boxi Wu, Wenqing Chu, Yao Hu, and Deng Cai. Correlation maximized structural similarity loss for semantic segmentation. arXiv preprint arXiv:1910.08711, 2019.
[21] Maxim Berman, Amal Rannen Triki, and Matthew B. Blaschko. The Lovász-softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks, 2017.
[22] S. Jadon. Hands-On One-Shot Learning with Python: A Practical Guide to Implementing Fast and Accurate Deep Learning Models with Fewer Training. Packt Publishing Limited, 2019.
[23] Jan Hendrik Moltz, Annika Hänsch, Bianca Lassen-Schmidt, Benjamin Haas, A. Genghi, J. Schreier, Tomasz Morgas, and Jan Klein. Learning a loss function for segmentation: A feasibility study. In 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), pages 357–360. IEEE, 2020.
