2022 - The Effect of Loss Function On Conditional Generative Adversarial Networks
A. Abu-Srhan, Mohammad A.M. Abushariah and O.S. Al-Kadi
Journal of King Saud University – Computer and Information Sciences 34 (2022) 6977–6988

Article history:
Received 4 August 2021
Revised 2 February 2022
Accepted 16 February 2022
Available online 4 March 2022

Keywords:
Generative adversarial network
Conditional generative adversarial network
Pixel2Pixel
Loss functions

Abstract

Conditional Generative Adversarial Network (cGAN) is a general-purpose approach for many image-to-image translation tasks, which aims to translate images from one form to another, resulting in high-quality translated images. In this paper, the loss function of the cGAN model is modified by combining the adversarial loss of state-of-the-art Generative Adversarial Network (GAN) models with a new combination of non-adversarial loss functions to enhance model performance and generate more realistic images. Specifically, the effects of the Wasserstein GAN (WGAN), the WGAN with Gradient Penalty (WGAN-GP), and least Squared GAN (lsGAN) adversarial loss functions are explored. Several comparisons are performed to select an optimized combination of L1 with structure, gradient, content-based, Kullback-Leibler divergence, and softmax non-adversarial loss functions. For experimentation purposes, the Facades dataset is used for the image-to-image translation task. Peak-signal-to-noise-ratio (PSNR), Structural Similarity Index (SSIM), Universal Quality Index (UQI), and Visual Information Fidelity (VIF) are used to quantitatively evaluate the translated images. Based on our experimental results, the best combination of loss functions for image-to-image translation on the facade dataset is the WGAN adversarial loss with the L1 and content non-adversarial loss functions. The model generates fine-structure images and captures both high- and low-frequency details of the translated images. Image in-painting and lesion segmentation are investigated to demonstrate the practicality of the proposed work.

© 2022 The Authors. Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
https://doi.org/10.1016/j.jksuci.2022.02.018
1319-1578
List of abbreviations.

Abbreviation   Meaning
cGAN           Conditional Generative Adversarial Network
GAN            Generative Adversarial Network
WGAN           Wasserstein Generative Adversarial Network
WGAN-GP        Wasserstein GAN with Gradient Penalty
lsGAN          least Squared GAN
PSNR           Peak-signal-to-noise-ratio
SSIM           Structural Similarity Index
UQI            Universal Quality Index
VIF            Visual Information Fidelity
Pix2Pix        pixel-to-pixel
PAN            Perceptual Adversarial Network
ID-CGAN        Image De-raining cGAN
ReLU           Rectified Linear Unit
GDL            Gradient Difference Loss Function
KLD            Kullback-Leibler Divergence
FR-IQA         Full-Reference Image Quality Assessment
MSE            Mean Square Error
NSS            Natural Scene Statistics
HVS            Human Visual System
Std            standard deviation
MR             Magnetic Resonance

The definition of the loss function is a very critical aspect in the design of GAN models. The loss function used by a GAN is called an adversarial loss function; it measures the distance between the distribution of the generated data and the distribution of the actual data. Any GAN model has two loss functions, one to train the generator network and the other to train the discriminator network. These two loss functions work together to form an adversarial loss function (Tzeng et al., 2017). It is beneficial to combine the GAN adversarial loss function of the generator network with traditional loss functions, as in the pixel-to-pixel (Pix2Pix) model (Isola et al., 2017), an interesting extension of the cGAN architecture in which the loss function is modified by adding an L1 loss function to produce more powerful results. It should be noted that both the cGAN and the Pix2Pix models need to be trained using paired images, where a mapping between the input and the target images exists.

The research done in the field of GAN can be divided into two categories: architecture improvements and loss function improvements. Many architecture variants and loss variants of GAN have been introduced as a result of this research. The architecture variants aim to improve performance by focusing on vanishing gradients, image quality, and mode diversity, while the loss variants aim to improve performance with respect to mode collapse, vanishing gradients, and image quality.
One of the most common failure modes in GAN is mode collapse. Pix2pix and cGAN models, as loss variants of GAN, are less vulnerable to the mode collapse problem, which can be minimized by combining multiple loss functions. Furthermore, using conditional generative models that explicitly maximize likelihood, such as cGAN, could avoid the mode collapse problem.

In this paper, we focus on the adversarial loss functions used to train the cGAN to improve its performance in terms of the quality of the generated images. The adversarial loss function of the cGAN model is replaced based on a comparison of a set of state-of-the-art adversarial loss functions. In addition, we implemented different effective loss functions from the literature and combined them with the cGAN adversarial loss function.

The main contributions of this research are as follows:

1. The effect of the loss function on the cGAN model has been investigated.
2. Both high and low frequency components of the output images are captured by enhancing the adversarial loss function with an optimized combination of non-adversarial loss functions.
3. Performance is quantitatively evaluated by four different well-known image quality assessment metrics.
4. The proposed approach has been experimented with image-to-image translation on the facade dataset, and applied to image segmentation and image in-painting tasks.

Fig. 1. A typical GAN architecture.

A GAN consists of two networks, a generator and a discriminator. The generator is used to generate images, whereas the function of the discriminator is to distinguish between real images and the generated images. Fig. 1 shows the architecture of GAN. The original GAN starts with generating new images from noise data. However, the idea then expanded to solve the problem of image-to-image translation with more promising results. Many works were done to enhance the performance of GAN, either by enhancing the GAN architecture or by modifying the loss function used to train the model, which improved the training process.

Mirza et al. proposed a novel model named cGAN, which is considered an extension of the original GAN, but with a modification to the input of the discriminator: the input image of the generator is also fed to the discriminator. They applied the model on MNIST digits, and obtained results that outperform
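To make the conditioning just described concrete, the minimal sketch below (our own PyTorch illustration, not the authors' architecture; the layer sizes are assumptions) shows a discriminator that receives the generator's input image concatenated channel-wise with either the real target or the generated image, which is the structural change that distinguishes cGAN from the unconditional GAN in Fig. 1.

```python
import torch
import torch.nn as nn

class ConditionalDiscriminator(nn.Module):
    """Sketch of a cGAN discriminator: the input image x is concatenated
    with the (real or generated) target y, so the critic judges pairs."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels * 2, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, padding=1),  # patch-wise real/fake scores
        )

    def forward(self, x, y):
        # Channel-wise concatenation implements the conditioning on x
        return self.net(torch.cat([x, y], dim=1))
```

During training the same module is asked to score (x, y) pairs as real and (x, G(x, z)) pairs as fake.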
1. The L1 Loss Function
The idea behind the L1 loss function is to minimize the absolute difference between the generated and ground truth images (Ma et al., 2020). L_{L1} is described in Eq. 1.

L_{L1}(G) = E_{x,y,z}[ \| y - G(x, z) \|_1 ]    (1)

where x is the input image, y is the ground truth image, z is the random noise, and G(x, z) is the generated image.
The main goal of using L1 is to capture the low-frequency details. Thus, using the L1 loss function will enforce low-frequency correctness. It is effective to combine this type of loss function with another loss function to improve the quality of the generated images (Isola et al., 2017).
2. The Structural Similarity Index (SSIM)
SSIM is a perception-based model that has been widely used to evaluate image processing algorithms and as a loss function for many image processing applications (Abobakr et al., 2019; Setiadi, 2021). The SSIM loss function L_{structure} is defined in Eq. 2.

L_{structure}(p_1, p_2) = \frac{1}{N} \sum_{p_1, p_2} [ 1 - SSIM(p_1, p_2) ]    (2)

where SSIM(p_1, p_2) is the SSIM for pixels p_1 and p_2 and is defined in Eq. 3.

SSIM(p_1, p_2) = \frac{(2 \mu_{p_1} \mu_{p_2} + c_1)(2 \sigma_{p_1 p_2} + c_2)}{(\mu_{p_1}^2 + \mu_{p_2}^2 + c_1)(\sigma_{p_1}^2 + \sigma_{p_2}^2 + c_2)}    (3)

where \mu_{p_1} is the average of p_1, \mu_{p_2} is the average of p_2, \sigma_{p_1}^2 is the variance of p_1, \sigma_{p_2}^2 is the variance of p_2, and \sigma_{p_1 p_2} is the covariance of p_1 and p_2.
3. Gradient Difference Loss Function (GDL)
The GDL penalizes differences in the image gradient prediction in order to sharpen the images (Hognon et al., 2020). In addition, it can be used for texture matching and robust features. This type of loss function is used to overcome the blurry-output-image problem (Bhattacharjee and Das, 2018).
The GDL loss function between the generated image G(X) and the ground truth image Y is defined in Eq. 4 (Hognon et al., 2020).

L_{gradient}(G(X), Y) = \sum_{i,j} \big| |Y_{i,j} - Y_{i-1,j}| - |G(X)_{i,j} - G(X)_{i-1,j}| \big| + \big| |Y_{i,j} - Y_{i,j-1}| - |G(X)_{i,j} - G(X)_{i,j-1}| \big|    (4)

The GDL loss function computes the average gradient difference loss between the generated and ground truth images; |Y_{i,j} - Y_{i-1,j}| and |Y_{i,j} - Y_{i,j-1}| are the components of the temporal difference loss.
4. Kullback-Leibler Divergence (KLD)
The KLD (relative entropy) measures the difference between two probability distributions (a distribution-wise measure). It is therefore necessary to transform the image into a probability distribution in order to apply this type of loss function to the image generation task. In simple terms, if the KLD between two distributions is 0, the two distributions are identical; otherwise, there are some differences between them. It is related to maximum likelihood estimation, which is easy to optimize and has become popular in many applications such as applied statistics, fluid mechanics, and machine learning (Bellemare et al., 2017). The definition of KLD as a loss function is shown in Eq. 5.

L_{KLD} = Y_{true} \log( Y_{true} / Y_{pred} )    (5)

where Y_{true} is the ground truth image, and Y_{pred} is the generated image.
5. The Content Loss
The content loss (reconstruction loss) was proposed by Gatys et al. (2016) and can be used together with an adversarial loss to form a perceptual loss function. It is a feature-domain, element-wise loss computed from a pre-trained network such as VGG, a network that has been pre-trained on the ImageNet dataset. For image generation, the content loss works on the content representations of the source and the generated images in order to minimize the difference between them. Given the source image p and the generated image x at a given layer l, the content loss is defined as in Eq. 6.

L_{content}(p, x, l) = \frac{1}{2} \sum_{i,j} ( F_{i,j}^{l} - P_{i,j}^{l} )^2    (6)

where P_{i,j}^{l} and F_{i,j}^{l} are the content representations of the p and x images in layer l, respectively.
The content loss is also known as the reconstruction loss; it offers the training stability required for convergence and therefore leads to powerful results.
6. Softmax Cross-entropy Loss Function
The softmax cross-entropy loss function is built on the softmax, a soft version of the max function that takes an N-dimensional real-valued vector and converts it into a vector whose entries lie in the range (0, 1). The output of this function is a probability distribution, which makes it suitable for use in many classification tasks and deep learning applications. Eq. 7 shows the log-softmax loss function for the generated image x and the real image y (Lin, 2017).

L_{softmax}(x, y) = - \sum_{i} y_i \log \frac{e^{x_i}}{\sum_j e^{x_j}}    (7)

As shown in Eq. 7, the log-softmax loss function can be considered an exponential loss function.
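Before turning to the adversarial terms, the following sketch gives a minimal PyTorch illustration of the non-adversarial losses above (Eqs. 1 and 4-6). It is our own reading, not the authors' code: the VGG layer index, the absence of input normalization, and the softmax used to turn images into distributions for the KLD term are all assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

def l1_loss(fake, real):
    # Eq. 1: mean absolute error between generated and ground-truth images
    return torch.mean(torch.abs(real - fake))

def gradient_difference_loss(fake, real):
    # Eq. 4: penalize mismatches between image gradients to sharpen outputs
    def grads(img):
        dy = torch.abs(img[:, :, 1:, :] - img[:, :, :-1, :])
        dx = torch.abs(img[:, :, :, 1:] - img[:, :, :, :-1])
        return dy, dx
    ry, rx = grads(real)
    fy, fx = grads(fake)
    return torch.mean(torch.abs(ry - fy)) + torch.mean(torch.abs(rx - fx))

def kld_loss(fake, real, eps=1e-8):
    # Eq. 5: turn each image into a distribution (softmax over pixels is one
    # possible choice) and compute KL(real || fake)
    p = real.flatten(1).softmax(dim=1)
    q = fake.flatten(1).softmax(dim=1)
    return torch.mean(torch.sum(p * torch.log((p + eps) / (q + eps)), dim=1))

class ContentLoss(torch.nn.Module):
    # Eq. 6: feature-space distance measured with a frozen pre-trained VGG
    def __init__(self, layer_index=16):        # layer choice is an assumption
        super().__init__()
        features = vgg16(pretrained=True).features[:layer_index].eval()
        for p in features.parameters():
            p.requires_grad = False
        self.features = features

    def forward(self, fake, real):
        # Mean squared error between feature maps (proportional to Eq. 6)
        return F.mse_loss(self.features(fake), self.features(real))
```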
7. Adversarial Loss Functions
Below is a description of the adversarial loss functions used in the cGAN, lsGAN, WGAN, and WGAN-GP models.

(a) Adversarial and Conditional Loss
The formulation of the conditional loss for the generative adversarial network is defined in Eq. 8 (Isola et al., 2017):

L_{cGAN}(G, D) = E_{x,y}[ \log D(x, y) ] + E_{x,z}[ \log(1 - D(x, G(x, z))) ]    (8)

where G is the generator, D is the discriminator, x is the input image, y is the ground truth image, and z is the random noise vector. The cGAN loss function differs from the original GAN adversarial loss in that the discriminator observes the input image in the cGAN, but not in the original GAN. The formulation of the original GAN adversarial loss function is shown in Eq. 9:

L_{GAN}(G, D) = E_{y}[ \log D(y) ] + E_{x,z}[ \log(1 - D(G(x, z))) ]    (9)

where \log D(x) denotes the likelihood that the discriminator correctly classifies the real image, and maximizing \log(1 - D(G(z))) helps it correctly label the fake images that come from the generator.

(b) The Wasserstein GAN (WGAN)
The WGAN model uses the Wasserstein distance, which calculates the difference between the generated and the target distributions, instead of the loss function used by the original GAN model. The WGAN model is easier to train than the original GAN model and has achieved impressive results (Alotaibi, 2020). Eq. 10 and Eq. 11 show the adversarial losses used by WGAN, i.e., the generator and the discriminator loss functions, respectively.

L_{Wasserstein-G} = - \frac{1}{m} \sum_{i=1}^{m} f(G(z^{(i)}))    (10)

L_{Wasserstein-D} = \frac{1}{m} \sum_{i=1}^{m} \left[ f(x^{(i)}) - f(G(z^{(i)})) \right]    (11)

where m is the number of pixels in the image, x is the input image, and z is the random noise.

(c) The WGAN with Gradient Penalty (WGAN-GP)
The WGAN-GP model (Gulrajani et al., 2017) is an extension of the WGAN model that overcomes its drawbacks: the WGAN sometimes fails to converge and can generate low-quality images. WGAN-GP uses a gradient penalty instead of weight clipping, leading to performance improvements over the WGAN model and high-quality image generation.

(d) least Squared GAN (lsGAN)
The lsGAN model (Mao et al., 2017) uses the least squared distance (ls) as an adversarial loss function for both the generator and the discriminator networks, where ls, or L2, is the average squared difference between the predicted and the ground truth images. The results of this loss function are always positive, so it can be used in a minimization optimization process, and it is also a stable alternative to the original GAN loss function. Eq. 12 shows the L2 loss for the generated images G(z) and real images y (Anas et al., 2020).

L_2 = \frac{1}{2} E_{x,y,z}\left[ ( D(G(z)) - 1 )^2 \right]    (12)

where x is the input image, y is the output image, z is the random noise, and D(G(z)) is the output of the discriminator when its input is the image produced by the generator G.

A short description of the loss functions used in our experiments is summarized in Table 2.
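A compact sketch of how these adversarial terms can be written in PyTorch is given below. It is our own illustration rather than the authors' implementation: `critic` stands for the function f (here assumed to take the conditioning image and a target, as in the conditional setup), and the gradient-penalty coefficient of 10 follows the common WGAN-GP setting rather than a value reported in this paper.

```python
import torch

def wgan_losses(critic, x, y_real, y_fake):
    # Eqs. 10-11 written as minimization objectives for critic and generator
    d_loss = torch.mean(critic(x, y_fake)) - torch.mean(critic(x, y_real))
    g_loss = -torch.mean(critic(x, y_fake))
    return d_loss, g_loss

def gradient_penalty(critic, x, y_real, y_fake, lambda_gp=10.0):
    # WGAN-GP: penalize the critic's gradient norm on interpolated samples
    alpha = torch.rand(y_real.size(0), 1, 1, 1, device=y_real.device)
    interp = (alpha * y_real + (1 - alpha) * y_fake).requires_grad_(True)
    scores = critic(x, interp)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    return lambda_gp * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

def lsgan_generator_loss(critic, x, y_fake):
    # Eq. 12: least-squares term pushes generated pairs toward the real label 1
    return 0.5 * torch.mean((critic(x, y_fake) - 1) ** 2)
```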
3.3. Evaluation metrics

Automatic perceptual quality evaluation of a distorted image in comparison with a reference image is called Full-Reference Image Quality Assessment (FR-IQA). PSNR and SSIM are FR-IQA's state-of-the-art evaluation metrics for evaluating image performance over test sets (Saha and Wu, 2016). In the case of image synthesis, these metrics calculate the amount of distortion in the generated images. The simplest way to assess image quality is to calculate PSNR. However, PSNR does not always correlate with human visual perception and image quality. An additional metric, SSIM, is recommended to resolve this constraint of PSNR. We also used UQI and VIF to evaluate the results.

1. PSNR
PSNR is one of the most widely used measurements for evaluating synthesised images; many researchers use it for image comparison and image synthesis because it is simple and easy to implement (Setiadi, 2021; Sara et al., 2019). PSNR is a pixel-loss-based evaluation metric that measures how far the generated image pixels are from the ground truth. The testing dataset consists of paired images, which call for pixel-loss-based metrics such as PSNR and SSIM. The higher the PSNR, the better the quality of the generated image. Eq. 13 shows the formula of PSNR:

PSNR = 10 \log_{10} \left( \frac{R^2}{MSE} \right)    (13)

Based on Eq. 13, R is the maximum fluctuation in the input image data type. If the input image has a double-precision floating-point data type, then R is 1. If it has an 8-bit unsigned integer data type, then R is 255.
In addition, the Mean Square Error (MSE) is formulated as the cumulative squared error between the generated image X(m, n) and the ground truth image Y(m, n). The lower the value of MSE, the lower the error. Eq. 14 shows the MSE formula:

MSE = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} ( X(m, n) - Y(m, n) )^2    (14)

where M and N denote the number of rows and columns in the image, respectively.

2. UQI
UQI is mathematically determined without the use of any human visual system model, and it is designed to provide a comparison of the distortion information between the original image and the distorted image. UQI is a combination of three factors, namely loss of correlation, distortion of luminance, and distortion of contrast. This metric is easy to calculate and can be used in various image processing applications (Fadl et al., 2018). UQI is defined in Eq. 15:

UQI = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \cdot \frac{2 \bar{x} \bar{y}}{\bar{x}^2 + \bar{y}^2} \cdot \frac{2 \sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}    (15)

The three factors of the equation represent the loss of correlation, distortion of luminance, and distortion of contrast, respectively, where x is the original image and y is the generated image. The means \bar{x} and \bar{y} are defined in Eq. 16 and Eq. 17, respectively.

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i    (16)

\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i    (17)
3. VIF
VIF quantifies how much of the information in the reference image can be extracted from the distorted image by the human visual system (Saha and Wu, 2016). VIF has three components, namely the source, the distortion, and the Human Visual System (HVS), as shown in Fig. 3, where C is the source image, D is the distorted image, E is the output of the HVS for the source image, and F is the output of the HVS for the distorted image. VIF is expressed in Eq. 18, where E is the Reference Image Information and F is the Distorted Image Information.

VIF = \frac{\text{Distorted Image Information}}{\text{Reference Image Information}}    (18)
4. Experimental results and discussion

4.1. Dataset

1. Image-to-Image Translation Dataset
Our model is applied to a facade dataset that contains 606 images of facades collected from different sources. The images in this dataset are from various cities around the world, with a variety of architectural styles. The facade dataset is made up of 12 basic classes and sub-classes: facade, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, and background. This dataset has been manually annotated (Tylecek, 2012). We divide the dataset into training and testing sets, where the training set contains 506 aligned pairs of images and the testing set contains 100 aligned image pairs.
2. Image In-painting Dataset
We modified the facade dataset to be used for image in-painting. The images in this dataset are paired, with each pair consisting of a modified facade image and its corresponding original facade image. The modified facade image has been prepared with a white rectangle indicating the lost area (a construction sketch is given after this list). A sample of the image in-painting dataset used is shown in Fig. 4.
3. Lesion Segmentation Dataset
For the lesion segmentation dataset, we used our prepared Magnetic Resonance (MR) dataset. The images of the MR dataset are paired; each pair contains an MR image and its corresponding manual segmentation mask. Our dataset consists of 179 paired images (an MR image and its corresponding mask). The manual segmentation mask is prepared such that the mask is a black-and-white image with the white area indicating the lesion. Fig. 5 shows an example of the MR lesion segmentation dataset.
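As an illustration of how such a paired in-painting sample could be produced, the sketch below overlays a white rectangle on a facade image; the rectangle position and size are arbitrary illustrative values, since the paper does not specify how the lost regions were chosen. The original facade image serves as the ground truth and its masked copy as the input.

```python
import numpy as np

def make_inpainting_pair(image, top=80, left=96, height=64, width=64):
    """Return a (corrupted, original) pair: the corrupted copy carries a white
    rectangle marking the region the generator must fill in. The rectangle
    position and size here are hypothetical, chosen only for illustration."""
    corrupted = image.copy()
    corrupted[top:top + height, left:left + width, :] = 255  # white "lost" area
    return corrupted, image
```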
4.2. Experiments
Table 3
Image quality evaluation metrics for cGAN and Pix2Pix models with an adversarial loss replacement.

GAN model               PSNR Mean (Std)   SSIM Mean (Std)   UQI Mean (Std)   VIF Mean (Std)
cGAN + L_L1 (Pix2Pix)   24.966 (2.015)    0.269 (0.025)     0.718 (0.064)    0.012 (0.003)
WGAN + L_L1             27.991 (2.623)    0.332 (0.047)     0.754 (0.068)    0.269 (0.179)
WGAN-GP + L_L1          25.165 (2.594)    0.205 (0.065)     0.738 (0.074)    0.149 (0.107)
lsGAN + L_L1            28.089 (2.561)    0.352 (0.090)     0.900 (0.074)    0.283 (0.091)
Fig. 6. Results of the cGAN model with an adversarial loss replacement (a) input, (b) ground truth, (c) cGAN, (d) lsGAN, (e) WGAN, (f) WGAN-gp, respectively.
Fig. 7. Results of the Pix2Pix model with an adversarial loss replacement (a) input, (b) ground truth, (c) cGAN, (d) lsGAN, (e) WGAN, (f) WGAN-gp, respectively.
Table 4 shows the image evaluation metric (PSNR, SSIM, UQI, and VIF) results after adding non-adversarial loss functions to the cGAN and Pix2Pix adversarial loss. Fig. 8 and Fig. 9 show the results of four images from the cGAN and Pix2Pix models, respectively, after modifying their loss function by adding the non-adversarial loss functions mentioned earlier.

The results show that adding L1 to any of the applied non-adversarial loss functions gives better results. For instance, SSIM loss preserves contrast in high-frequency regions, while L1 maintains the low frequencies. This indicates that the combination of the SSIM and L1 loss functions produces better results than using SSIM alone. In addition, adding a non-adversarial loss function to the Pix2Pix model produces better results than using L1 alone. However, adding a non-adversarial loss function to the cGAN model will not always help, because cGAN does not use the L1 loss function and thus will not capture the low-frequency components of the image correctly. For example, the SSIM values of cGAN + L_gradient, cGAN + L_content, and cGAN + L_structural are lower than the cGAN value. This highlights the significance of combining non-adversarial loss with the L1 loss function. Furthermore, we notice that the Pix2Pix model outperforms the cGAN model.

We proceed by adding the non-adversarial loss functions to the WGAN and lsGAN adversarial loss functions in addition to the L1 loss function. One loss function is added at a time. We compared the results to determine the best combination of loss functions, as shown in Table 5.
Table 4
Image quality evaluation metrics for cGAN and Pix2Pix after adding non-adversarial loss function to the generator network.
Fig. 8. Results of cGAN model after adding non-adversarial loss function to the generator network: (a) input, (b) ground truth, (c) L_gradient, (d) L_KLD, (e) L_structural, (f) L_softmax, (g) L_content, respectively.
Fig. 9. Results of Pix2Pix model after adding non-adversarial loss function to the generator network: (a) input, (b) ground truth, (c) L_gradient, (d) L_KLD, (e) L_structural, (f) L_softmax, (g) L_content, respectively.
Table 5
Image quality evaluation metrics for lsGAN and WGAN after adding non-adversarial loss function to the generator network.

GAN model    PSNR Mean (Std)    SSIM Mean (Std)    UQI Mean (Std)    VIF Mean (Std)
Fig. 10. Results of lsGAN model after adding non-adversarial loss function to the generator network: (a) input, (b) ground truth, (c) L_gradient + L1, (d) L_KLD + L1, (e) L_structural + L1, (f) L_softmax + L1, (g) L_content + L1, respectively.
Fig. 11. Results of WGAN model after adding non-adversarial loss function to the generator network: (a) input, (b) ground truth, (c) L_gradient + L1, (d) L_KLD + L1, (e) L_structural + L1, (f) L_softmax + L1, (g) L_content + L1, respectively.
Table 6
Image in-painting quality evaluation metrics for lsGAN and WGAN after adding non-adversarial loss function to the generator network.

GAN model    PSNR Mean (Std)    SSIM Mean (Std)    UQI Mean (Std)    VIF Mean (Std)
Fig. 12. Image in-painting results of lsGAN model after adding non-adversarial loss function to the generator network: (a) input, (b) ground truth, (c) L_gradient + L1, (d) L_KLD + L1, (e) L_structural + L1, (f) L_softmax + L1, (g) L_content + L1, respectively.
Fig. 13. Image in-painting results of WGAN model after adding non-adversarial loss function to the generator network: (a) input, (b) ground truth, (c) L_gradient + L1, (d) L_KLD + L1, (e) L_structural + L1, (f) L_softmax + L1, (g) L_content + L1, respectively.
Fig. 10 and Fig. 11 show the results of four images using the combination of lsGAN and WGAN with the non-adversarial loss functions, respectively. The results show that the best loss function is WGAN + L_content + L1 or WGAN + L_gradient + L1. Utilizing the WGAN adversarial loss results in the highest possible performance. Furthermore, the use of the content or gradient loss with the L1 loss function results in an overall improved performance. Output samples are blurred and lack high-frequency structure when the L1 loss function is used on its own, while the content loss offers the training stability required for convergence; a sketch of the best-performing combination is given below.
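The following sketch shows how this best-performing combination could be assembled into a single generator objective. It is our own illustration: the weighting factors lambda_l1 and lambda_content are hypothetical, as their values are not restated here, and content_loss refers to a VGG-feature loss such as the ContentLoss sketch given earlier.

```python
import torch

def generator_loss(critic, G, x, y_real, content_loss, z=None,
                   lambda_l1=100.0, lambda_content=1.0):
    # Hypothetical weights; the paper reports the combination, not the values.
    y_fake = G(x) if z is None else G(x, z)
    # WGAN generator term: raise the critic's score of generated pairs
    adv = -torch.mean(critic(x, y_fake))
    # L1 term keeps the low-frequency structure close to the ground truth
    l1 = torch.mean(torch.abs(y_real - y_fake))
    # Content (VGG-feature) term encourages high-frequency/perceptual detail
    content = content_loss(y_fake, y_real)
    return adv + lambda_l1 * l1 + lambda_content * content
```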
Fig. 14. Pix2pix Segmentation Results with its Loss Function Replacement on MR Images.
The L1 loss function handles the low-frequency components of the image, and the content loss function deals with the high-frequency image components. Thus, the combination of these two functions can handle both low and high frequency components. The content loss is used to detect the features in images, which allows the loss function to know what features are in the target ground truth image rather than merely comparing pixel differences. This allows a model trained with this loss function to produce much finer detail in the generated features and outputs.

2. Image In-painting Results
cGAN and Pix2pix models can be used for image in-painting. We also evaluate the proposed method on the in-painting problem of the facade dataset. We perform the combination of L1 and non-adversarial loss functions when the WGAN and lsGAN adversarial loss functions are used. Table 6 shows the comparison that we performed to determine the best combination of loss functions. Fig. 12 and Fig. 13 show the results of three images using the combination of lsGAN + L1 and WGAN + L1 with the non-adversarial loss functions, respectively. The results show that the best loss function is WGAN + L_content + L1. Furthermore, the results show that incorporating non-adversarial loss functions alongside the L1 loss function improves the results.

3. Lesion Segmentation Results
Pix2pix can be used for image segmentation. We evaluate the proposed method on the segmentation problem of brain MR images. We modify the pix2pix model's loss function to improve its segmentation performance and to show the effect of loss function replacement on this task. We replace L1 with the Dice, Hausdorff Distance (HD), and cross-entropy loss functions. The segmentation results of the pix2pix model after its loss function replacement are shown in Fig. 14. The Dice and Jaccard results of pix2pix after loss function replacement are shown in Table 7. The results show that using the Dice loss function with the pix2pix model outperforms the use of other loss functions.

5. Conclusions

The results of cGAN as a general-purpose image-to-image translation model are very impressive. They allow greater control over the final output from the generator. We showed that the quality of cGAN results improves significantly with the use of an effective adversarial loss function, even when the network architecture remains unchanged. In addition, it is beneficial to combine the cGAN adversarial loss with non-adversarial loss functions, as this enhances the generation power. Content and gradient loss functions are two of the most powerful functions, giving better results than other loss functions, especially when combined with the L1 loss function. In this paper, we concentrated on modifying the loss function in order to obtain more accurate results. Therefore, additional research to improve the proposed model's architecture could be conducted in the future. Furthermore, the proposed model could be combined with another unsupervised GAN model to construct a model that takes advantage of both. This paper demonstrates the effect of loss functions on cGAN with a single output; this work can be extended to multi-modeling, in which the model produces multiple outputs.
References

International Conference on Systems, Man and Cybernetics (SMC). IEEE. pp. 1234–1238.
Alotaibi, A., 2020. Deep generative adversarial networks for image-to-image translation: A review. Symmetry 12 (10), 1705.
Anas, E.R., Onsy, A., Matuszewski, B.J., 2020. Ct scan registration with 3d dense motion field estimation using lsgan. In: Annual Conference on Medical Image Understanding and Analysis. Springer. pp. 195–207.
Andreini, P., Bonechi, S., Bianchini, M., Mecocci, A., Scarselli, F., 2020. Image generation by gan and style transfer for agar plate image segmentation. Comput. Methods Programs Biomed. 184, 105268.
Bellemare, M.G., Danihelka, I., Dabney, W., Mohamed, S., Lakshminarayanan, B., Hoyer, S., Munos, R., 2017. The cramer distance as a solution to biased wasserstein gradients. arXiv preprint arXiv:1705.10743.
Bhattacharjee, P., Das, S., 2018. Context graph based video frame prediction using locally guided objective. In: Proceedings of the European Conference on Computer Vision (ECCV).
Chrysos, G.G., Kossaifi, J., Zafeiriou, S., 2018. Robust conditional generative adversarial networks. arXiv preprint arXiv:1805.08657.
Emami, H., Aliabadi, M.M., Dong, M., Chinnam, R.B., 2020. Spa-gan: Spatial attention gan for image-to-image translation. IEEE Trans. Multimedia 23, 391–401.
Fadl, S., Han, Q., Li, Q., 2018. Surveillance video authentication using universal image quality index of temporal average. In: International Workshop on Digital Watermarking. Springer. pp. 337–350.
Gatys, L.A., Ecker, A.S., Bethge, M., 2016. Image style transfer using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2414–2423.
Goel, T., Murugan, R., Mirjalili, S., Chakrabartty, D.K., 2021. Automatic screening of covid-19 using an optimized generative adversarial network. Cogn. Comput., 1–16.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680.
Padalkar, G.R., Patil, S.D., Hegadi, M.M., Jaybhaye, N.K., 2021. Drug discovery using generative adversarial network with reinforcement learning. In: 2021 International Conference on Computer Communication and Informatics (ICCCI). IEEE. pp. 1–3.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C., 2017. Improved training of wasserstein gans. In: Advances in Neural Information Processing Systems, pp. 5767–5777.
Han, C., Rundo, L., Murao, K., Noguchi, T., Shimahara, Y., Milacski, Z.Á., Koshino, S., Sala, E., Nakayama, H., Satoh, S., 2021. Madgan: unsupervised medical anomaly detection gan using multiple adjacent brain mri slice reconstruction. BMC Bioinf. 22 (2), 1–20.
Hognon, C., Tixier, F., Colin, T., Gallinato, O., Visvikis, D., Jaouen, V., 2020. Influence of gradient difference loss on mr to pet brain image synthesis using gans.
Isola, P., Zhu, J.-Y., Zhou, T., Efros, A.A., 2017. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134.
Johnson, J., Alahi, A., Fei-Fei, L., 2016. Perceptual losses for real-time style transfer and super-resolution. In: European Conference on Computer Vision. Springer. pp. 694–711.
Lin, M., 2017. Softmax gan. arXiv preprint arXiv:1704.06191.
Liu, F., Jiao, L., Tang, X., 2019. Task-oriented gan for polsar image classification and clustering. IEEE Trans. Neural Networks Learn. Syst. 30 (9), 2707–2719.
Liu, R., Wang, X., Lu, H., Wu, Z., Fan, Q., Li, S., Jin, X., 2021. Sccgan: Style and characters inpainting based on cgan. Mobile Networks Appl., 1–10.
Liu, H., Wan, Z., Huang, W., Song, Y., Han, X., Liao, J., 2021. Pd-gan: Probabilistic diverse gan for image inpainting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9371–9381.
Ma, Y., Wei, B., Feng, P., He, P., Guo, X., Wang, G., 2020. Low-dose ct image denoising using a generative adversarial network with a hybrid loss function for noise learning. IEEE Access 8, 67519–67529.
Mao, X., Li, Q., Xie, H., Lau, R.Y., Wang, Z., Paul Smolley, S., 2017. Least squares generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2794–2802.
Saha, A., Wu, Q.J., 2016. Full-reference image quality assessment by combining global and local distortion measures. Signal Process. 128, 186–197.
Sara, U., Akter, M., Uddin, M.S., 2019. Image quality assessment through fsim, ssim, mse and psnr—a comparative study. J. Comput. Commun. 7 (3), 8–18.
Setiadi, D.R.I.M., 2021. Psnr vs ssim: imperceptibility quality assessment for image steganography. Multimedia Tools Appl. 80, 8423–8444.
Tang, Y., Yang, X., Wang, N., Song, B., Gao, X., 2020. Cgan-tm: A novel domain-to-domain transferring method for person re-identification. IEEE Trans. Image Process. 29, 5641–5651.
Tang, H., Liu, H., Xu, D., Torr, P.H., Sebe, N., 2021. Attentiongan: Unpaired image-to-image translation using attention-guided generative adversarial networks. IEEE Trans. Neural Networks Learn. Syst., 1–16, early access.
Tylecek, R., 2012. The cmp facade database. Research Report CTU–CMP–2012–24 (Tech. Rep.). Czech Technical University.
Tzeng, E., Hoffman, J., Saenko, K., Darrell, T., 2017. Adversarial discriminative domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7167–7176.
Waheed, A., Goyal, M., Gupta, D., Khanna, A., Al-Turjman, F., Pinheiro, P.R., 2020. Covidgan: data augmentation using auxiliary classifier gan for improved covid-19 detection. IEEE Access 8, 91916–91923.
Wang, C., Xu, C., Wang, C., Tao, D., 2018. Perceptual adversarial networks for image-to-image transformation. IEEE Trans. Image Process. 27 (8), 4066–4079.
Yu, Y., Gong, Z., Zhong, P., Shan, J., 2017. Unsupervised representation learning with deep convolutional neural network for remote sensing images. In: International Conference on Image and Graphics. Springer. pp. 97–108.
Zhang, H., Sindagi, V., Patel, V.M., 2017. Image de-raining using a conditional generative adversarial network. arXiv preprint arXiv:1701.05957.