
Neural Computing and Applications (2020) 32:15119–15129
https://doi.org/10.1007/s00521-020-04863-1

ORIGINAL ARTICLE

A generative adversarial network with adaptive constraints for multi-focus image fusion

Jun Huang · Zhuliang Le · Yong Ma · Xiaoguang Mei · Fan Fan (corresponding author)

Electronic Information School, Wuhan University, Wuhan 430072, China

Received: 17 January 2020 / Accepted: 14 March 2020 / Published online: 30 March 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
In this paper, we propose a novel end-to-end model for multi-focus image fusion based on generative adversarial networks, termed ACGAN. Because the corresponding pixels of the two source images have different gradient distributions, an adaptive weight block is proposed to determine whether source pixels are focused or not based on the gradient. Under this guidance, we design a special loss function that forces the fused image to have the same distribution as the focused regions in the source images. In addition, a generator and a discriminator are trained to form a stable adversarial relationship. The generator is trained to generate a real-like fused image that is expected to fool the discriminator; correspondingly, the discriminator is trained to distinguish the generated fused image from the ground truth. Finally, the fused image is very close to the ground truth in probability distribution. Qualitative and quantitative experiments are conducted on publicly available datasets, and the results demonstrate the superiority of our ACGAN over the state of the art, in terms of both visual effect and objective evaluation metrics.

Keywords Multi-focus image fusion · Adaptive weight block · Generative adversarial networks · End-to-end

1 Introduction

Due to the limitations of optical lenses, it is often difficult for an imaging device to take an image in which all the objects are captured in focus [13]. Thus, only the objects within the depth-of-field (DOF) have a sharp appearance in the photograph, while other objects are likely to be blurred. Multi-focus image fusion is known as a valuable technique to obtain an all-in-focus image by fusing multiple images of the same scene taken with different focal settings, which is beneficial for human or computer operators and for further image-processing tasks, e.g., segmentation, feature extraction and object recognition [15, 18]. Therefore, multi-focus image fusion has become a significant research topic in the field of image processing [10].

In the past few decades, many methods for multi-focus image fusion have been proposed, and they can be attributed to two categories: spatial domain methods and transform domain methods. The spatial domain methods can be further divided into three groups according to different fusion rules [2, 9, 10]: pixel-based, block-based and region-based fusion methods. Among them, the activity level measurements generally adopt the gradient information as a reference. In the transform domain methods, the source images are first transferred into other transform domains, and the fusion process is mainly implemented in the transformed domains according to the characteristics of those domains. The transform domain methods include the discrete wavelet transform (DWT) [30], nonsubsampled contourlet transform (NSCT) [9], sparse representation (SR) [28, 33], subspace methods [29], etc.


The existing fusion methods present excellent performance in some respects. However, there are still some shortcomings. First, existing methods often require the manual design of activity level measurements and fusion rules, which has become increasingly complex and inadequate. Second, generating a decision map is a very common step in existing multi-focus fusion methods, which essentially turns fusion into a classification problem based on sharpness detection. Although these methods can classify correctly in most regions, it is often difficult to accurately determine the focused and defocused regions near the boundary lines.

With the unprecedented success of deep learning, some deep learning-based fusion methods have been proposed; a detailed exposition of these methods is given later in Sect. 2.1. These works have provided new ideas for multi-focus image fusion and achieved promising performance. Nevertheless, there are still some aspects that need to be improved. On the one hand, the deep learning framework is generally only applied to a small part of the pipeline, e.g., feature processing, while the overall fusion process still follows traditional frameworks. On the other hand, almost all deep learning-based methods require post-processing, such as consistency checks and decision map optimization, and are therefore not strictly end-to-end.

To address the above challenges, in this paper we propose a novel end-to-end deep learning model for multi-focus image fusion, termed a generative adversarial network with adaptive constraints (ACGAN). Because the corresponding pixels of the two source images have different gradient distributions (clear or blurred), direct fusion will result in a fused image between clarity and blur, i.e., neutralization, which is not the result we expect. Therefore, to obtain a clear fused image, an adaptive weight block is employed in our model to determine whether source pixels are focused or not based on the gradient. Concretely, the focused pixel has the larger gradient and is selected as the optimization target of the generator when producing the fused image. In other words, two score maps are generated for the source images, which serve as the reference for our specific loss function. As a result, the generator is forced to generate a fused image that is consistent with the focused regions in the source images. In addition, to make the fused image more similar to the ground truth, a discriminator network is applied to assess whether the fused image is indistinguishable from the ground truth. In the stable adversarial process between the generator and discriminator, more information, e.g., texture details and spatial information, can be preserved to meet this high-level goal. In general, the advantages of our ACGAN are as follows. First, our method is an end-to-end model that requires neither manually designed activity level measurements and fusion rules nor any post-processing. Second, our method does not need to generate a decision map in the intermediate process but extracts and reconstructs pixel information into a fused image in pixel units, so there is no blurring near the boundary line. Finally, the adaptive weight block in our method guides the generator to generate a fused image that is consistent with the focused regions in the source images, so the result does not suffer from the neutralization phenomenon.

The major contributions of this paper involve the following three aspects. First, the proposed ACGAN is an end-to-end deep learning-based method, which gets rid of manually designing complex activity level measurements and fusion rules and does not require any post-processing. Second, an adaptive weight block based on the gradient is proposed, guiding the generator to adaptively learn the distribution of the focused pixels. Third, our fused results have a good visual effect, avoiding both the blurring near the boundary line of decision map-based methods and the neutralization phenomenon of non-decision map-based methods.

The remainder of this paper is arranged as follows. Sect. 2 describes related work, including an overview of existing deep learning-based fusion methods and a theoretical introduction to GANs and LSGAN. In Sect. 3, we introduce our method, i.e., ACGAN, with the problem formulation, loss functions and network architectures. Qualitative and quantitative comparisons and ablation experiments are presented in Sect. 4. We conclude in Sect. 5.

2 Related work

In this section, a brief introduction to the existing deep learning-based image fusion methods is given. Moreover, we also present a brief explanation of generative adversarial networks (GAN) and an improved variant, LSGAN, which is employed in our work.

2.1 Multi-focus image fusion based on deep learning

The deep learning-based methods are mainly based on convolutional neural networks (CNN) and GAN. Among the CNN-based methods, Liu et al. [13] applied a convolutional neural network to the multi-focus image fusion task for the first time, where the CNN is used to classify focused and defocused regions in order to generate a decision map for fusion. Du et al. [4] regarded the detection of the decision map as an image segmentation problem between the focused and defocused regions of the source images, and achieved segmentation through a multi-scale convolutional neural network. Ma et al. [15] proposed an unsupervised encoder-decoder model, termed SESF-Fuse. In contrast to previous works, SESF-Fuse analyses sharp appearance in deep features instead of the original images.

As for the GAN-based methods, Ma et al. [17, 20] applied GAN to the image fusion task for the first time with an unsupervised framework named FusionGAN. Innovatively, Xu et al. [19, 26] addressed the multi-resolution image fusion problem with an additional discriminator, establishing two adversarial games between one generator and two discriminators to generate a fused image. Then, Guo et al. [6] proposed FuseGAN for multi-focus image fusion, using a least square GAN to enhance training stability. In addition, Xu et al. [27, 32] proposed two frameworks for unified image fusion, which can address multi-focus, multi-modal and multi-exposure image fusion.

2.2 Least square GAN

The GAN was first proposed by Goodfellow et al. [5] in 2014 and is one of the generative models. The generator G and the discriminator D included in a GAN are two adversarial models: the generative model G captures the data distribution, while the discriminative model D is used to determine whether an input is a generated sample or a real sample. An adversarial game is thus established between G and D. In particular, the generator aims to generate samples that fool the discriminator, while the discriminator tries to determine whether a sample is real or not. Finally, the samples generated by the generator cannot be distinguished by the discriminator.

In the years following the advent of GAN, many variants have been proposed [12, 31]. Specifically, in 2017, Mao et al. [22] proposed the least square GAN, i.e., LSGAN, to improve the stability of the training process. The sigmoid cross-entropy loss function for the discriminator adopted in the regular GAN may lead to the gradient-vanishing problem during training. Therefore, the least squares loss function for the discriminator is introduced in LSGAN to address this problem. The optimization functions of LSGAN are as follows:
$$\min_{D} V_{\mathrm{LSGAN}}(D) = \frac{1}{2}\,\mathbb{E}_{x \sim P_{\mathrm{data}}(x)}\big[(D(x) - b)^2\big] + \frac{1}{2}\,\mathbb{E}_{z \sim P_z(z)}\big[(D(G(z)) - c)^2\big], \qquad (1)$$

$$\min_{G} V_{\mathrm{LSGAN}}(G) = \frac{1}{2}\,\mathbb{E}_{z \sim P_z(z)}\big[(D(G(z)) - a)^2\big], \qquad (2)$$
where b and c denote the labels for real data and fake data, respectively, and a is the label that the generator expects the discriminator to believe for fake data. One optimization strategy is to set b − a = 1 and b − c = 2, which minimizes the Pearson χ2 divergence between Pdata + Pg and 2Pg. The other is to set a = b, which can force the generated samples to be more similar to the real ones.
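To make the adversarial objective concrete, the following is a minimal PyTorch-style sketch of the least-squares objectives in Eqs. (1) and (2). It is illustrative only: the networks net_d and the label values passed for a, b and c are assumptions, not the exact implementation used in the paper.

```python
import torch

def lsgan_d_loss(net_d, real, fake, b=1.0, c=0.0):
    # Eq. (1): push D(real) toward the real label b and D(fake) toward the fake label c.
    loss_real = 0.5 * torch.mean((net_d(real) - b) ** 2)
    loss_fake = 0.5 * torch.mean((net_d(fake.detach()) - c) ** 2)
    return loss_real + loss_fake

def lsgan_g_loss(net_d, fake, a=1.0):
    # Eq. (2): push D(fake) toward the label a that the generator wants D to believe.
    return 0.5 * torch.mean((net_d(fake) - a) ** 2)
```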


3 Proposed method

In this section, after analysing the characteristics of multi-focus images, we provide the problem formulation with the proposed adaptive weight block and the definition and design of the loss functions. At the end of this section, we present the design of the network architectures in detail.

3.1 Problem formulation

Multi-focus images are images with different focused regions. The essence of multi-focus image fusion is to extract and integrate the most important information in the source images, i.e., the focused regions, into a single image. The focused region can be characterized by the intensity distribution and texture details. The entire fusion procedure is shown in Fig. 1.

Fig. 1 The fusion procedure of the proposed ACGAN

To extract and integrate the focused regions in the source images, we propose an adaptive weight block, which is employed to evaluate the sharpness of each pixel based on the gradient, as presented in Fig. 2. The focused regions have larger gradients. Specifically, at each pair of corresponding pixel positions in the two source images, the pixel with the larger gradient is selected as the optimization target, while the smaller one is abandoned. Therefore, the specific content loss function designed with the adaptive weight block can adaptively guide the fused image to approximate the intensity distribution and gradient distribution of the focused regions of the source images at the pixel level. The ablation experiment of the adaptive weight block is conducted later in Sect. 4.5.1. In addition, since our optimization goal is defined per pixel, we add an SSIM loss term to avoid chromatic aberrations in the fused image and to ensure its overall naturalness. Based on statistics, the mean of the larger scores in each source image patch is calculated as the weight of the corresponding SSIM loss term. The effect of the SSIM loss term is verified later in Sect. 4.5.2. Working together with the adaptive weight block on the content loss, our ACGAN can achieve a fused image that is simultaneously clear and natural.

Fig. 2 The source images and corresponding gradient maps. From left to right: source image 1, source image 2, the gradient map of source image 1, and the gradient map of source image 2
tional discriminator will be analyzed later in Sect. 4.5.3. score of the corresponding pixel in the source images.
Similarly, the Lgrad is employed to guide the fused
3.2 Loss function image to have the same gradient distribution, i.e., texture
details, as the focused regions in source images. Lgrad is
The loss function in our work can be divided into the loss formalized as follows:
of generator LG and the loss of discriminator LD . 1 XX
Lgrad ¼ ðS1i;j  minðS1i;j ; S2i;j ÞÞ
HW i j
3.2.1 Generator loss ð6Þ
 ðrIf i;j  rI1i;j Þ2 þ ðS2i;j  minðS1i;j ; S2i;j ÞÞ
The loss function of generator LG consists of content loss  ðrIf i;j  rI2i;j Þ2 :
LGcon and adversarial loss LGadv . Due to the instability of
GAN, the introduction of content loss adds a series of On this basis, we employ the SSIM loss term to avoid
constraints to the generator to achieve the fusion goal, chromatic aberrations in the fused image and ensure the
while the adversarial loss allows the fused image to meet overall naturalness of it. It is worth noting that, for each
stricter requirements. LG is defined as follows: overall source image, structural information with a larger
average gradient is preserved. Specifically, the LSSIM is
LG ¼ LGcon þ LGadv : ð3Þ
defined as follows:
Among them, LGcon includes intensity loss, gradient loss
and SSIM loss, which can be expressed as follows:
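The following PyTorch-style sketch shows one way Eqs. (5) and (6) could be written with the score maps acting as adaptive per-pixel weights; the finite-difference gradient operator and the function names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def image_gradient(img):
    # Simple finite-difference gradient magnitude (an illustrative choice for the nabla operator).
    dx = img[:, :, :, 1:] - img[:, :, :, :-1]
    dy = img[:, :, 1:, :] - img[:, :, :-1, :]
    dx = F.pad(dx, (0, 1, 0, 0))
    dy = F.pad(dy, (0, 0, 0, 1))
    return torch.sqrt(dx ** 2 + dy ** 2 + 1e-8)

def content_losses(i_f, i1, i2, s1, s2):
    # Adaptive per-pixel weights from the score maps: only the sharper source contributes
    # at each position, with strength given by the score difference.
    w1 = s1 - torch.minimum(s1, s2)
    w2 = s2 - torch.minimum(s1, s2)
    # torch.mean over the batch and all pixels plays the role of the 1/(HW) double sum.
    l_int = torch.mean(w1 * (i_f - i1) ** 2 + w2 * (i_f - i2) ** 2)                      # Eq. (5)
    l_grad = torch.mean(w1 * (image_gradient(i_f) - image_gradient(i1)) ** 2
                        + w2 * (image_gradient(i_f) - image_gradient(i2)) ** 2)          # Eq. (6)
    return l_int, l_grad
```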

On this basis, we employ the SSIM loss term to avoid chromatic aberrations in the fused image and to ensure its overall naturalness. It is worth noting that, for each source image as a whole, the structural information with the larger average gradient is preserved. Specifically, LSSIM is defined as follows:

$$L_{SSIM} = \frac{1}{HW}\sum_{i}\sum_{j}\Big[\big(S_{1_{i,j}} - \min(S_{1_{i,j}}, S_{2_{i,j}})\big)\big(1 - SSIM_{I_f, I_1}\big) + \big(S_{2_{i,j}} - \min(S_{1_{i,j}}, S_{2_{i,j}})\big)\big(1 - SSIM_{I_f, I_2}\big)\Big], \qquad (7)$$

where SSIM stands for structural similarity and is an indicator of the similarity between a source image and the fused image. The larger the SSIM, the more similar the structure of the fused image is to that of the source image. Mathematically, SSIM is defined as follows:

$$SSIM_{X,F} = \sum_{x,f} \frac{2\mu_x \mu_f + C_1}{\mu_x^2 + \mu_f^2 + C_1} \cdot \frac{2\sigma_x \sigma_f + C_2}{\sigma_x^2 + \sigma_f^2 + C_2} \cdot \frac{\sigma_{xf} + C_3}{\sigma_x \sigma_f + C_3}, \qquad (8)$$

where the three terms on the right-hand side reflect comparisons of brightness, contrast and structure, respectively; x and f denote corresponding image patches in the source image X and the fused image F, μ denotes the mean value, and σ denotes the standard deviation/covariance.
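For illustration, a simplified version of Eqs. (7) and (8) computed from global image statistics (rather than sliding patches) might look as follows; the constants and the global-statistics simplification are assumptions made to keep the sketch short.

```python
import torch

def ssim_global(x, f, c1=0.01 ** 2, c2=0.03 ** 2, c3=0.03 ** 2 / 2):
    # Simplified SSIM: luminance, contrast and structure terms from global statistics.
    mu_x, mu_f = x.mean(), f.mean()
    sd_x, sd_f = x.std(), f.std()
    cov_xf = ((x - mu_x) * (f - mu_f)).mean()
    luminance = (2 * mu_x * mu_f + c1) / (mu_x ** 2 + mu_f ** 2 + c1)
    contrast = (2 * sd_x * sd_f + c2) / (sd_x ** 2 + sd_f ** 2 + c2)
    structure = (cov_xf + c3) / (sd_x * sd_f + c3)
    return luminance * contrast * structure

def ssim_loss(i_f, i1, i2, s1, s2):
    # Eq. (7): weight the (1 - SSIM) terms by the mean adaptive score-map differences.
    w1 = (s1 - torch.minimum(s1, s2)).mean()
    w2 = (s2 - torch.minimum(s1, s2)).mean()
    return w1 * (1 - ssim_global(i_f, i1)) + w2 * (1 - ssim_global(i_f, i2))
```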
The adversarial loss of the generator, LGadv, is used to force the fused image to achieve a higher quality, and it is formalized as follows:

$$L_{G_{adv}} = \frac{1}{N}\sum_{n=1}^{N}\big(D(I_f^n) - a\big)^2, \qquad (9)$$

where N denotes the number of fused images and a is the probability label that the generator expects the discriminator to assign to the fused image.

3.2.2 Discriminator loss

The discriminator in ACGAN plays the role of discriminating between the ground truth and the generated fused image. The adversarial loss of the discriminator uses the least square loss to identify whether the distribution of the fused image is unrealistic and to encourage the fused image to match the realistic distribution. The discriminator loss LD is defined as follows:

$$L_D = \frac{1}{N}\sum_{n=1}^{N}\Big[\big(D(I_f^n) - b\big)^2 + \big(D(I_g^n) - c\big)^2\Big], \qquad (10)$$

where b is the random label of the fused image, which is expected to be small enough, while c is the random label of the ground truth, which is expected to be large enough, as the fused image is expected to be judged by the discriminator as fake data, while the ground truth is expected to be real data.
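A minimal sketch of Eqs. (9) and (10) with randomly perturbed ("soft") labels is given below; the particular sampling ranges for a, b and c are assumptions chosen only to illustrate the idea of small fake labels and large real labels.

```python
import torch

def g_adv_loss(net_d, fused, a_low=0.7, a_high=1.2):
    # Eq. (9): the generator wants D(fused) to look like a 'real' score a.
    a = torch.empty(fused.size(0), 1, device=fused.device).uniform_(a_low, a_high)
    return torch.mean((net_d(fused) - a) ** 2)

def d_loss(net_d, fused, ground_truth):
    # Eq. (10): random small labels b for the fused (fake) images and
    # random large labels c for the ground-truth (real) images.
    b = torch.empty(fused.size(0), 1, device=fused.device).uniform_(0.0, 0.3)
    c = torch.empty(ground_truth.size(0), 1, device=ground_truth.device).uniform_(0.7, 1.2)
    return (torch.mean((net_d(fused.detach()) - b) ** 2)
            + torch.mean((net_d(ground_truth) - c) ** 2))
```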
3.3 Network architecture

3.3.1 Generator architecture

The network architecture of the generator is illustrated in Fig. 3. The design of our generator draws on the idea of the pseudo-siamese network. For the two different source images, we use two branches with different parameters to extract different features, which is suitable for processing source images with different focused regions. Adequate information exchange is the biggest characteristic of our generator, and it is reflected in the following three parts. First, the information exchange on each branch (the red, green and purple arrows in Fig. 3): similar to DenseNet [22], each layer is given short direct connections to the other layers in a feed-forward fashion. Avoiding vanishing gradients, strengthening feature propagation and reducing the number of parameters are the main advantages of this design. In particular, the convolution kernel of the first convolutional layer is 5 × 5, while those of the next three convolutional layers are 3 × 3. Second, the information exchange between branches (the blue arrows): information is also exchanged between the branches by concatenation and convolution, which can be seen as a "pre-fusion". Third, the final fusion: the outputs of the two branches are concatenated together and form the input of the last convolutional layer, whose kernel size is 1 × 1 and whose output is the fused image. It is worth noting that throughout the network we use "SAME" padding to keep the size of the feature maps consistent with the source images.

Fig. 3 The network architecture of the generator
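Based on this description, a possible PyTorch sketch of the generator is shown below. The Leaky ReLU activations and the tanh output follow the layer labels in Fig. 3 of the original figure, while the channel widths and the exact cross-branch connection pattern are assumptions rather than the authors' released code.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, k):
    # 'SAME' padding keeps the spatial size equal to that of the source images.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
                         nn.LeakyReLU(0.2, inplace=True))

class Generator(nn.Module):
    """Two densely connected branches with cross-branch ('pre-fusion') concatenation."""

    def __init__(self, ch=16):
        super().__init__()
        # Branch 1: first layer 5x5, the next three 3x3; inputs are densely concatenated.
        self.b1_c1 = conv_block(1, ch, 5)
        self.b1_c2 = conv_block(2 * ch, ch, 3)
        self.b1_c3 = conv_block(3 * ch, ch, 3)
        self.b1_c4 = conv_block(4 * ch, ch, 3)
        # Branch 2 mirrors branch 1 but has its own parameters (pseudo-siamese design).
        self.b2_c1 = conv_block(1, ch, 5)
        self.b2_c2 = conv_block(2 * ch, ch, 3)
        self.b2_c3 = conv_block(3 * ch, ch, 3)
        self.b2_c4 = conv_block(4 * ch, ch, 3)
        # Final fusion: concatenate both branches and map to one channel with a 1x1 conv.
        self.fuse = nn.Sequential(nn.Conv2d(8 * ch, 1, 1), nn.Tanh())

    def forward(self, i1, i2):
        f1, f2 = [self.b1_c1(i1)], [self.b2_c1(i2)]
        # Each layer sees all previous features of its own branch plus the other branch's
        # output from the previous layer (cross-branch information exchange).
        f1.append(self.b1_c2(torch.cat([f1[0], f2[0]], dim=1)))
        f2.append(self.b2_c2(torch.cat([f2[0], f1[0]], dim=1)))
        f1.append(self.b1_c3(torch.cat(f1 + [f2[-1]], dim=1)))
        f2.append(self.b2_c3(torch.cat(f2 + [f1[-2]], dim=1)))
        f1.append(self.b1_c4(torch.cat(f1 + [f2[-1]], dim=1)))
        f2.append(self.b2_c4(torch.cat(f2 + [f1[-2]], dim=1)))
        return self.fuse(torch.cat(f1 + f2, dim=1))
```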
3.3.2 Discriminator architecture

The discriminator is designed to establish an adversarial relationship with the generator. In particular, it aims to distinguish the generated images from the ground truth, as illustrated in Fig. 4. There are four convolution layers with a kernel size of 3 × 3 and one linear layer in the discriminator. The leaky ReLU activation function is employed in all four convolution layers, which use a stride of 2. We use the last linear layer to acquire the probability scalar.

Fig. 4 The network architecture of the discriminator
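A corresponding sketch of the discriminator is given below; the channel widths and the flattening step before the linear layer are assumptions, since only the layer types, kernel size, stride and activation are stated in the text.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Four stride-2 3x3 convolutions with Leaky ReLU, followed by a linear layer."""

    def __init__(self, ch=32, patch_size=60):  # 60x60 matches the training patch size
        super().__init__()
        layers, in_ch = [], 1
        for i in range(4):
            layers += [nn.Conv2d(in_ch, ch * 2 ** i, 3, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            in_ch = ch * 2 ** i
        self.features = nn.Sequential(*layers)
        # Each stride-2 convolution roughly halves the spatial size (60 -> 30 -> 15 -> 8 -> 4).
        side = patch_size
        for _ in range(4):
            side = (side + 1) // 2
        self.linear = nn.Linear(in_ch * side * side, 1)  # scalar 'probability' score

    def forward(self, x):
        h = self.features(x)
        return self.linear(h.flatten(1))
```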
4 Experimental results and analysis

In this section, we validate the effectiveness of our ACGAN by comparing it with several state-of-the-art methods on publicly available datasets. Both qualitative and quantitative comparisons are implemented in our work. We also conduct ablation experiments on the adaptive weight block, the SSIM loss term and the discriminator. Moreover, the analysis of a1 and a2 is also performed.

4.1 Experimental settings

The dataset on which we train our network comes from a public dataset website (https://sites.google.com/view/durgaprasadbavirisetti/datasets). In order to verify the generalization ability of our model, we test our network on different public multi-focus image datasets, i.e., the Lytro dataset [23] and some standard images for multi-focus image fusion (https://www.mathworks.com/matlabcentral/fileexchange/45992-standard-images-for-multifocus-image-fusion). The image pairs have been accurately aligned; image registration techniques are required for unaligned images [16, 21]. For training, an expansion strategy of tailoring and decomposition is employed to obtain a larger data set: the training set is cropped into 23,714 groups of size 60 × 60, each consisting of two source images and one ground truth. We employ 30 image pairs from the two datasets for testing.

The detailed training procedure is summarized in Alg. 1. We train the generator and discriminator iteratively to establish an adversarial relationship. The total number of epochs is m, each epoch takes n steps, the generator is trained x times as often as the discriminator, and the batch size is set to k. Concretely, m, x and k are set to 20, 2 and 32, respectively, and n is the ratio between the total number of patches and the batch size. We update all parameters with AdamOptimizer in our ACGAN. Moreover, we set a1 = 3 and a2 = 10 in Eq. (4).
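The training schedule described above might be sketched as follows, reusing the helper functions sketched earlier. The data loader, learning rate and the exact combination of the content and adversarial losses are assumptions for illustration (the loss terms follow Eqs. (3), (4) and (9)).

```python
import torch

def train(gen, disc, loader, epochs=20, g_steps_per_d_step=2, a1=3.0, a2=10.0, lr=1e-4):
    opt_g = torch.optim.Adam(gen.parameters(), lr=lr)
    opt_d = torch.optim.Adam(disc.parameters(), lr=lr)
    for epoch in range(epochs):
        for step, (i1, i2, gt) in enumerate(loader):   # batches of (source1, source2, ground truth)
            s1, s2 = score_maps(i1, i2)                # adaptive weight block
            fused = gen(i1, i2)
            # Train the discriminator only every g_steps_per_d_step-th step,
            # so the generator is updated twice as often as the discriminator.
            if step % g_steps_per_d_step == 0:
                opt_d.zero_grad()
                lsgan_d_loss(disc, real=gt, fake=fused).backward()
                opt_d.step()
            # Generator update: weighted content loss (Eqs. 4-7) plus adversarial loss (Eq. 9).
            opt_g.zero_grad()
            l_int, l_grad = content_losses(fused, i1, i2, s1, s2)
            loss_g = (l_int + a1 * l_grad + a2 * ssim_loss(fused, i1, i2, s1, s2)
                      + lsgan_g_loss(disc, fused))
            loss_g.backward()
            opt_g.step()
```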

In addition, the images in the training data are grayscale images with a single channel, while the images in the testing data are color images with RGB channels. In order to fuse the testing images with the trained model, the YCbCr color space is employed in our work. The Y (luminance) channel represents structural details and brightness variation and is the channel that participates in fusion, while the Cb and Cr chrominance channels are left unchanged. Finally, the fused image is transferred back to the RGB color space together with the Cb and Cr channels to obtain the final result.
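A sketch of this color-handling step using OpenCV is shown below; the use of OpenCV's YCrCb conversion (note the channel order) and the averaging of the two sources' chrominance channels are assumptions, since the paper does not state how the Cb/Cr channels of the two inputs are combined.

```python
import cv2
import numpy as np

def fuse_color_pair(rgb1, rgb2, fuse_y):
    # Convert both RGB sources to YCrCb (OpenCV orders the channels Y, Cr, Cb).
    ycc1 = cv2.cvtColor(rgb1, cv2.COLOR_RGB2YCrCb).astype(np.float32)
    ycc2 = cv2.cvtColor(rgb2, cv2.COLOR_RGB2YCrCb).astype(np.float32)
    # Only the luminance channels are fused by the trained network (fuse_y is a placeholder).
    y_fused = fuse_y(ycc1[..., 0], ycc2[..., 0])
    # Assumption: average the chrominance channels of the two sources.
    crcb = 0.5 * (ycc1[..., 1:] + ycc2[..., 1:])
    fused = np.concatenate([y_fused[..., None], crcb], axis=-1)
    fused = np.clip(fused, 0, 255).astype(np.uint8)
    return cv2.cvtColor(fused, cv2.COLOR_YCrCb2RGB)
```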

4.2 Comparative methods and evaluation metrics

We select five state-of-the-art methods to evaluate our ACGAN on publicly available datasets: GFDF [24], DSIFT [14], S-A [11], CNN [13] and SESF [15], in order to have a comprehensive assessment. CNN and SESF are deep learning-based methods, while the others are traditional methods; GFDF, DSIFT, CNN and SESF are methods based on a decision map.

In order to evaluate the experimental results more accurately, we utilize six metrics: the sum of the correlations of differences (SCD) [1], visual information fidelity (VIF) [8], the correlation coefficient (CC) [3] and QAB/F, which measure the relationship between the fused image and the source images, and entropy (EN) [25] and standard deviation (SD) [25], which measure the fused image itself.
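As a small illustration of the two reference-free metrics, EN and SD could be computed as follows for an 8-bit grayscale fused image; this is a generic sketch of the standard definitions, not the evaluation code used by the authors.

```python
import numpy as np

def entropy(img_u8):
    # Shannon entropy of the grayscale histogram (EN).
    hist = np.bincount(img_u8.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def standard_deviation(img):
    # Standard deviation of the pixel intensities (SD).
    return float(np.std(img.astype(np.float64)))
```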
4.3 Qualitative comparisons

The intuitive results on four typical image pairs are shown in Fig. 5. Our ACGAN not only performs well on the overall image but also in local details, especially at the boundary between focused and defocused regions. As can be seen from the enlarged regions in the red boxes of the upper two groups of results, the results of GFDF, DSIFT, CNN and SESF, which are all based on a decision map, cannot accurately retain details near the junction of the focused and defocused regions and lose details due to misclassification, e.g., the pipe on the ceiling and the details between the fingers. On the contrary, our ACGAN accurately preserves the details in the focused regions. In addition, the remaining comparative method that is not based on a decision map, S-A, suffers from the neutralization phenomenon and blurring near the boundary line, e.g., the details between the fingers in the upper right group, the edge of the hat in the bottom left group and the building behind the monkey in the bottom right group. By comparison, our ACGAN preserves these details better.

Fig. 5 Qualitative results on the Lytro dataset. In each group, the first column shows the source images; the second column shows the results of GFDF (2019) and CNN (2017); the third column shows the results of DSIFT (2015) and SESF (2019); and the fourth column shows the results of S-A (2018) and our ACGAN

4.4 Quantitative comparisons

The quantitative comparisons of our ACGAN with the competitors on the 30 image pairs in the dataset are also reported and are summarized in Fig. 6. As can be seen from the statistical results, our ACGAN achieves the largest mean values on all six metrics. These results demonstrate that our method has the greatest correlation with the source images and the best contrast, and that the edge information is preserved to the greatest extent. In addition, our method produces the best visual effect.

Fig. 6 Quantitative comparison of our ACGAN with five state-of-the-art methods on the 30 image pairs (the original figure shows per-pair curves). The mean values of the metrics are:

Metric   GFDF     DSIFT    S-A      CNN      SESF     ACGAN
SCD      0.6987   0.7000   0.5266   0.7021   0.6981   1.4362
VIF      1.1860   1.1903   1.1282   1.1825   1.1903   1.2274
CC       0.9712   0.9711   0.9717   0.9715   0.9709   0.9770
QAB/F    0.6375   0.5826   0.5058   0.5864   0.6354   0.6409
EN       7.3793   7.3776   7.3530   7.3761   7.3813   7.4661
SD       0.2191   0.2192   0.2142   0.2190   0.2193   0.2383

In order to verify the convenience of our method, the mean and standard deviation of the running time of our ACGAN and the competitors are presented in Table 1. SESF and our ACGAN are tested on an RTX 2080Ti GPU, while the other methods are tested on an i7-8750H CPU (the testing environments of the competitors are consistent with their original papers). Clearly, our ACGAN also achieves comparable efficiency.

Table 1 The mean and standard deviation of the running time of different methods (unit: second)

         GFDF [7]   DSIFT [14]   S-A [11]   CNN [13]    SESF [15]   ACGAN
Mean     0.2816     5.8540       0.2435     116.9590    0.3396      0.0421
STD      0.1251     3.5241       0.1032     54.3958     0.2245      0.0221

4.5 Ablation experiments

4.5.1 Adaptive weight block analysis

The adaptive weight block is employed in our model to guide the generator to adaptively learn the distribution of the focused pixels, avoiding the neutralization phenomenon. To show its effect, we perform the following comparative experiments: (a) the adaptive weight block is not employed; (b) the adaptive weight block is employed. The experimental settings of the two comparative experiments are otherwise the same, and the results are shown in Fig. 7. By comparison, the fused result without the adaptive weight block suffers from the neutralization phenomenon, while the fused result with the adaptive weight block presents the focused regions well. This proves that the adaptive weight block can effectively avoid the neutralization phenomenon.

Fig. 7 Ablation experiment of the adaptive weight block. From left to right: source image 1, source image 2, the fused result without the adaptive weight block and the result with the adaptive weight block

4.5.2 SSIM loss term analysis

The SSIM loss term is introduced to make the fused image more similar to the focused regions in the source images in terms of color distribution and overall naturalness. Its effect is verified by the following comparative experiments: (c) the SSIM loss term is not employed; (d) the SSIM loss term is employed. The experimental settings of the two comparative experiments are otherwise the same, and the results are shown in Fig. 8. The fused image with the SSIM loss term has almost the same color distribution as the focused area in the source image. On the other hand, the fused image without the SSIM loss term suffers from chromatic aberrations with darker colors, and its overall naturalness is also worse. Therefore, it can be concluded that the SSIM loss term has a positive impact on the fused image.

Fig. 8 Ablation experiment of the SSIM loss term. From left to right: source image 1, source image 2, the fused result without the SSIM loss term and the result with the SSIM loss term

4.5.3 Discriminator analysis

We use the discriminator to establish a stable adversarial relationship with the generator, forcing the fused image to be more similar to the ground truth, i.e., the focused regions in the source images. To show the effect of the discriminator, the following comparative experiments are performed: (e) the discriminator is not employed; (f) the discriminator is employed. The experimental settings of the two comparative experiments are otherwise the same, and the results are shown in Fig. 9. The result with the discriminator is more similar to the focused regions in the source image, whereas the result without the discriminator suffers from more blurred details. It can be seen that the discriminator plays an important role in the fusion process.

Fig. 9 Ablation experiment of the discriminator. From left to right: source image 1, source image 2, the fused result without the discriminator and the result with the discriminator

4.5.4 Parameter analysis

In our work, Lint and Lgrad are employed to guide the fused image to have the same intensity and gradient distribution as the focused regions in the source images, and LSSIM is used to avoid chromatic aberrations in the fused image and to ensure its overall naturalness on top of Lint and Lgrad. Therefore, to obtain the optimal values of a1 and a2, we first analyze a1 without LSSIM. We select five values (0.3, 1.5, 3, 4.5 and 6) for a1 and determine its optimal value by comparing the quantitative results, which are summarized in Fig. 10. As can be seen from the statistical results, the quantitative comparison is optimal overall when a1 = 3. Therefore, parameter a1 is set to 3.

Fig. 10 Quantitative comparison of different a1 values (evaluated without LSSIM; the original figure shows per-pair curves). The mean values of the metrics are:

Metric   a1=0.3   a1=1.5   a1=3     a1=4.5    a1=6
SCD      0.3981   1.0150   1.3505   -0.4875   0.7995
VIF      1.0076   1.2106   1.3842   0.8673    1.0692
CC       0.9450   0.9721   0.9728   0.9489    0.9430
QAB/F    0.3983   0.5226   0.6176   0.5715    0.4815
EN       7.3669   7.4366   7.4527   7.1677    7.3907
SD       0.2225   0.2474   0.2607   0.1916    0.2249

Next, based on a1 = 3, we add the LSSIM loss term for higher fusion quality. Similarly, we select five values (1, 5, 10, 15 and 20) for a2 and determine its optimal value by comparing the quantitative results, which are summarized in Fig. 11. As can be seen from the statistical results, the quantitative comparison is optimal overall when a2 = 10. Therefore, parameter a2 is set to 10.

Fig. 11 Quantitative comparison of different a2 values (with a1 = 3; the original figure shows per-pair curves). The mean values of the metrics are:

Metric   a2=1     a2=5     a2=10    a2=15    a2=20
SCD      0.8820   1.3296   1.4362   1.3392   1.3378
VIF      1.1541   1.1492   1.2274   1.1549   1.1593
CC       0.9712   0.9704   0.9770   0.9773   0.9774
QAB/F    0.6099   0.6496   0.6409   0.6343   0.6503
EN       7.4236   7.4055   7.4661   7.4333   7.4587
SD       0.2285   0.2364   0.2383   0.2384   0.2373

5 Conclusion and future work

In this paper, we propose a new end-to-end model for multi-focus image fusion based on generative adversarial networks, termed ACGAN. Our ACGAN overcomes the difficulties of the neutralization phenomenon and of blurring near the boundary line by means of an adaptive weight block. In addition, an adversarial relationship between the generator and the discriminator is established to generate fused images of higher quality. In the qualitative experiments, our ACGAN performs well not only on the overall image but also in local details, especially at the boundary between focused and defocused regions. The quantitative experiments verify that our method performs better than the existing state-of-the-art methods on six widely used metrics.

There may be a potential limitation in our work, namely that our method is not based on a decision map. In the existing decision map-based methods, the pixels of the fused image are completely consistent with the pixels of the source images. In contrast, the pixels in our fused image are obtained by learning the pixels in the focused regions of the source images. Although this overcomes the blurring near the boundary line of the existing decision map-based methods and presents a good visual effect, it is difficult for the pixels in our fused image to be completely identical to the pixels in the focused regions of the source images. Therefore, in our future work, we will be committed to solving the problem of blurring near the boundary line on the basis of a decision map.

Acknowledgements This work was supported by the National Natural Science Foundation of China under Grant No. 61903279.

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.
References

1. Aslantas V, Bendes E (2015) A new image quality metric for image fusion: the sum of the correlations of differences. AEU Int J Electron Commun 69(12):1890–1896
2. Chen J, Li X, Luo L, Mei X, Ma J (2020) Infrared and visible image fusion based on target-enhanced multiscale transform decomposition. Inf Sci 508:64–78
3. Deshmukh M, Bhosale U (2010) Image fusion and image quality assessment of fused images. Int J Image Process 4(5):484
4. Du C, Gao S (2017) Image segmentation-based multi-focus image fusion through multi-scale convolutional neural network. IEEE Access 5:15750–15761
5. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680
6. Guo X, Nie R, Cao J, Zhou D, Mei L, He K (2019) FuseGAN: learning to fuse multi-focus image via conditional generative adversarial network. IEEE Trans Multimed 21:1982–1996
7. Haghighat MBA, Aghagolzadeh A, Seyedarabi H (2011) Multi-focus image fusion for visual sensor networks in DCT domain. Comput Electr Eng 37(5):789–797
8. Han Y, Cai Y, Cao Y, Xu X (2013) A new image fusion performance metric based on visual information fidelity. Inf Fusion 14(2):127–135
9. Li H, Chai Y, Li Z (2013) Multi-focus image fusion based on nonsubsampled contourlet transform and focused regions detection. Optik Int J Light Electron Opt 124(1):40–51
10. Li S, Kang X, Hu J, Yang B (2013) Image matting for fusion of multi-focus images in dynamic scenes. Inf Fusion 14(2):147–162
11. Li W, Xie Y, Zhou H, Han Y, Zhan K (2018) Structure-aware image fusion. Optik 172:1–11
12. Liu L, Zhang H, Xu X, Zhang Z, Yan S (2019) Collocating clothes with generative adversarial networks cosupervised by categories and attributes: a multidiscriminator framework. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2019.2944979
13. Liu Y, Chen X, Peng H, Wang Z (2017) Multi-focus image fusion with a deep convolutional neural network. Inf Fusion 36:191–207
14. Liu Y, Liu S, Wang Z (2015) Multi-focus image fusion with dense SIFT. Inf Fusion 23:139–155
15. Ma B, Ban X, Huang H, Zhu Y (2019) SESF-Fuse: an unsupervised deep model for multi-focus image fusion. arXiv preprint arXiv:1908.01703
16. Ma J, Jiang X, Jiang J, Zhao J, Guo X (2019) LMR: learning a two-class classifier for mismatch removal. IEEE Trans Image Process 28(8):4045–4059
17. Ma J, Liang P, Yu W, Chen C, Guo X, Wu J, Jiang J (2020) Infrared and visible image fusion via detail preserving adversarial learning. Inf Fusion 54:85–98
18. Ma J, Ma Y, Li C (2019) Infrared and visible image fusion methods and applications: a survey. Inf Fusion 45:153–178
19. Ma J, Xu H, Jiang J, Mei X, Zhang XP (2020) DDcGAN: a dual-discriminator conditional generative adversarial network for multi-resolution image fusion. IEEE Trans Image Process 29:4980–4995
20. Ma J, Yu W, Liang P, Li C, Jiang J (2019) FusionGAN: a generative adversarial network for infrared and visible image fusion. Inf Fusion 48:11–26
21. Ma J, Zhao J, Jiang J, Zhou H, Guo X (2019) Locality preserving matching. Int J Comput Vis 127(5):512–531
22. Mao X, Li Q, Xie H, Lau RY, Wang Z, Paul Smolley S (2017) Least squares generative adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 2794–2802
23. Nejati M, Samavi S, Shirani S (2015) Multi-focus image fusion using dictionary-based sparse representation. Inf Fusion 25:72–84
24. Qiu X, Li M, Zhang L, Yuan X (2019) Guided filter-based multi-focus image fusion through focus region detection. Signal Process Image Commun 72:35–46
25. Roberts JW, Van Aardt JA, Ahmed FB (2008) Assessment of image fusion procedures using entropy, image quality, and multispectral classification. J Appl Remote Sens 2(1):023522
26. Xu H, Liang P, Yu W, Jiang J, Ma J (2019) Learning a generative model for fusing infrared and visible images via conditional generative adversarial network with dual discriminators. In: Proceedings of the twenty-eighth international joint conference on artificial intelligence (IJCAI-19), pp 3954–3960
27. Xu H, Ma J, Le Z, Jiang J, Guo X (2020) FusionDN: a unified densely connected network for image fusion. In: Proceedings of the thirty-fourth AAAI conference on artificial intelligence
28. Yang B, Li S (2009) Multifocus image fusion and restoration with sparse representation. IEEE Trans Instrum Meas 59(4):884–892
29. Yang L, Guo B, Ni W (2007) Multifocus image fusion algorithm based on contourlet decomposition and region statistics. In: Fourth international conference on image and graphics (ICIG 2007), IEEE, pp 707–712
30. Yang Y, Huang S, Gao J, Qian Z (2014) Multi-focus image fusion using an effective discrete wavelet transform based algorithm. Meas Sci Rev 14(2):102–108
31. Zhang H, Sun Y, Liu L, Wang X, Li L, Liu W (2018) ClothingOut: a category-supervised GAN model for clothing segmentation and retrieval. Neural Comput Appl. https://doi.org/10.1007/s00521-018-3691-y
32. Zhang H, Xu H, Xiao Y, Guo X, Ma J (2020) Rethinking the image fusion: a fast unified image fusion network based on proportional maintenance of gradient and intensity. In: Proceedings of the thirty-fourth AAAI conference on artificial intelligence
33. Zhang Q, Liu Y, Blum RS, Han J, Tao D (2018) Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: a review. Inf Fusion 40:57–75

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


