Hu2021 Article AnImprovedMulti-focusImageFusi

Applied Intelligence (2021) 51:4453–4469
https://doi.org/10.1007/s10489-020-02066-8
An improved multi-focus image fusion algorithm based

on multi-scale weighted focus measure
Zhanhui Hu1 · Wei Liang1 · Derui Ding1 · Guoliang Wei2
Accepted: 4 November 2020 / Published online: 4 January 2021

© Springer Science+Business Media, LLC, part of Springer Nature 2021
Abstract
This paper focuses on developing an improved multi-focus image fusion (MFIF) algorithm. Existing spatial domain
algorithms dependent on the obtained fusion decision map still lead to unexpected ghosting, blurred, edges as well as
blocking effects such that the visual effect of image fusion is seriously degraded. To overcome these shortages, an improved
MFIF algorithm is developed with the help of a novel multi-scale weighted focus measure and a decision map optimization
technique. First, a novel multi-scale measurement template is designed in order to effectively extract the gradient information
of rich texture regions, smooth regions as well as transitional regions between the aforementioned regions simultaneously.
Then, an improved calculation scheme of the focus score matrix is designed based on the weighted sum of the focus measure
maps in each region window centered on a concerned pixel, under which the advantage of pixel-by-pixel weighting is
employed. In what follows, an initial decision map is obtained in light of the focus score matrix combined with threshold
filtering, which is employed to eliminate the small isolated regions caused by some misclassified pixels. Furthermore, an
accurate decision map is received with the help of the optimization capability of guided filtering to avoid edge unexpected
artificial textures. In comparison with block-based fusion algorithms, our algorithm developed in this paper extracts the
focus regions pixel-by-pixel, thereby helping to reduce the blocking effects that appear in the fusion image. Finally, some
intensive comparison analysis based on common datasets is performed to verify the superiority over state-of-the-art methods
in both visual qualitative and quantitative evaluations.
Keywords Multi-focus image fusion · Multi-scale weighted focus measure · Focus score matrix · Decision map ·
Guided filtering
1 Introduction
This work was supported in part by the National Natural Science Image fusion technologies received ever-increasing research
Foundation of China under Grants 61973219 and 61933007.
concern due to various practical applications including
Derui Ding medical image fusion [1], infrared and visible light image
[email protected] fusion [2, 3], remote sensing image fusion [4–7] as well
Zhanhui Hu
as multi-focus image fusion (MFIF). As its name implies,
[email protected] image fusion is essential to merge information from two
or more images into one image, which can improve the
Wei Liang limitations and differences of a single sensor image and
[email protected]
obtain a more comprehensive and understandable scene
Guoliang Wei description [5, 8, 9]. The fusion image contains more
[email protected] abundant information, and therefore better satisfies the
needs of human subjective visual perception and post-
1 Department of Control Science and Engineering,
processing of computers.
University of Shanghai for Science and Technology, Clear panoramic pictures could not be easily obtained
Shanghai 200093, China due probably to the limitation of depth of field of optical
2 College of Science, University of Shanghai for Science lens. Usually, images with different focus points, named
and Technology, Shanghai 200093, China as multi-focus images, could simultaneously contain clear
4454 Z. Hu et al.
regions and blurred regions, as shown in Fig. 1. Natural by reorganizing their image blocks based on the larger
inspiration is to integrate all clear targets from different measurement. It should be pointed out that the fusion per-
images into one image, where clear targets come from the formance of block-based algorithms is usually dependent
multi-focus feature of images. Benefiting from effective on the selected block size, Since image blocks may con-
feature extraction [10–12], the target in the fusion image is tain both focus and defocus regions. In order to improve the
clearer than that in the source image and the corresponding fusion performance, Bai et al. developed a (MFIF) algorithm
texture information is more abundant too. In the past few based on quadtree decomposition [26], which can flexibly
years , various MFIF algorithms have been reported in decompose the original image into image blocks of different
the literature and can be roughly generalized into two sizes, and to a certain extent compensates for the shortcom-
categories: the one is the transform-domain-based fusion ings of manual block. Another important branch of spatial
algorithm and the other is the spatial-domain-based fusion domain algorithms is the ones based on the focus region
algorithm [13]. extraction. The representative algorithms include the algo-
The transform domain algorithms are usually performed rithm based on boundary finding [27], the algorithm based
via inverse transformation which reconstructs the fusion on dense SIFT [28] as well as the one based on multi-scale
image in light of the fusion transform coefficients in weighted gradient [29]. Similar to that of block-based algo-
the transform domain. Initially, image fusion algorithms rithms, the fusion performance is highly dependent on the
based on multi-scale geometric analysis were very popular, segmentation accuracy and is also sensitive to the size of the
such as Laplace pyramid transformation (LAP) [14, 15], segmentation unit.
non-subsampled contourlet transformation (NSCT) [16], On another research frontier, neural networks undertake
and discrete wavelet transformation (DWT) [17, 18]. a more and more important role in image processing due
With the development of sparse representation theory, mainly to their outstanding feature extraction capability.
some algorithms based on sparse representation have Among them, the pulse-coupled neural network (PCNN)
been proposed and showed good performance [19, 20]. have been employed to deal with image fusion issues by
Furthermore, it is worth mentioning that the image feature adopting visual signals as the processing mechanism. The
space, which can be regarded as a special transform domain improved versions for MFIF issues have been reported in
[21], can be utilized via independent component analysis [30] by combining guided filtering and in [31] by utilizing
(ICA) [22], sparse representation (SR) [19], as well as orientation information. In recent years, MFIF algorithms
high-order singular value decomposition (HOSVD) [23]. As based on convolution neural networks (CNNs) have been
such, the image fusion algorithms based on feature space firstly developed in [32] in light of the conception of taking
have received wide research, and their core is to seek a such a fusion issue as a binary classification problem.
reliable feature space to reflect the activity level of image Inspired by this original work, a CNN architecture based on
blocks [21]. ensemble learning (named as ECNN) has been constructed
Spatial domain algorithms do not require some conver- in [33] where the three different data sets are used to train
sion or mapping of an image to other spaces. The simplest three CNN models to improve the fusion performance. It is
algorithm is the one based on the weighted average of worth mentioning that the utilization of CNNs effectively
pixels, which only considers a single-pixel or only uses deals with the difficulty of scheme design on focus measure
local neighborhood information, resulting in low contrast through trial and error. However, these two algorithms
of the fusion image [24]. Block-based algorithms [25] have are not end-to-end, and their operation procedure is to
been developed subsequently, under which the source image construct the fusion decision map and then reconstruct
is first divided into blocks, and then the focus measure- the fusion image, which results in a very time-consuming
ment on the corresponding image blocks is calculated. In calculation. To enhance the computational efficiency, a
light of their measurements, a fusion image is constructed general image fusion framework based on CNNs, denoted
Fig. 1 An example of MFIF (a)

An image focused on the small
clock (b) An image focused on
the big clock (c) The fusion
image
An improved multi-focus image fusion algorithm based on multi-scale weighted focus measure 4455
briefly as IFCNN, has been established in [13] by means 2 Proposed algorithm

of an end-to-end idea. In this framework, the fusion image
is reconstructed by the features extracted from the source The purpose of MFIF is to obtain a clear panoramic
image and, unfortunately, the resulting no direct reference image by fusing two or more source images with different
to the source image is that the obtained fusion image easily focal points. The main technologies of almost all existing
occurs artificial textures. Besides, there is currently no algorithms based on the spatial domain is to divide a
unified large-scale sets of multi-focus images for training. group of source images into multiple regions, and then
In summary, there exist really some interesting and calculate and compare the focus measure of each pair
effective algorithms to realize MFIF in the literature. regions. Furthermore, the regions with larger focus measure
However, the fusion performance still needs to be further from different source images are combined in order to
improved. For instance, blocking effects and edge blurring produce a desired fusion image [27]. Obviously, the fusion
are usually inevitable in the fusion results due probably performance is highly dependent on the fusion decision
to the inaccurate fusion decision map. Some artificial map to decide which pixels in different source images
textures could be introduced when adopting transform are selected. In order to obtain an ideal decision map,
domain algorithms and seriously affect the visual effect of the following innovative schemes are proposed: 1) a novel
image fusion. Furthermore, there could also exist blocking multi-scale measurement template is first designed to
effects of some isolated regions coming from classification effectively extract more gradient information for regions
errors. As such, the purpose of this paper is to develop an with different characteristics, 2) an improved calculation
effective and reliable fusion algorithm to produce a more scheme of the focus score matrix is then designed based
accurate fusion decision map while reducing the impaction on the weighted sum of the focus measure maps in each
of blockiness in the fusion image. To this end, this paper region window centered on a concerned pixel, under which
proposes an improved MFIF algorithm based on multi-scale the advantage of pixel-by-pixel weighting is employed, 3)
weighted focus measure and decision map optimization. a desired binary segmentation map is obtained in light
The main contributions of this paper are highlighted as of threshold filtering, which is employed to reduce some
follows: isolated regions, and 4) the final decision map is received
with the help of guided filtering with an initial fusion image
– A novel multi-scale measurement template is designed and an initial decision map as inputs, where the blocking
to effectively extract gradient information of rich tex- effects can be effectively removed.
ture regions, smooth regions as well as transitional Without loss of generality, the presentation of the
regions between aforementioned regions simultane- proposed algorithm in this paper will be laid out based
ously; on two source images, and the developed algorithm is
– An improved calculation scheme of the focus score not difficult to be applied to the case of three or more
matrix, named as multi-scale weighted focus measure, input source images. Furthermore, the fusion flowchart,
is designed based on the weighted sum of the focus as shown in Fig. 2, involves four steps: 1) obtaining the
measure maps combining with the above multi-scale binary segmentation map with the help of the focus score
measurement template; matrices of source images and the designed multi-scale
– An accurate fusion decision map is obtained by measurement template (see the processing P1, P2, and
optimizing binary segmentation map with the help P3); 2) producing the initial decision map with the aid of
of the optimization capability of guided filtering and threshold filtering removing unnecessary small holes (see
threshold filtering; the processing P4); 3) obtaining the final decision map with
– Comparison analysis based on common data set is the guided filtering (see the processing P5 and P6); and 4)
implemented to verify the superiority over state-of-the- fusing multi-focus images via the obtained final decision
art methods in both visual qualitative and quantitative map (see the processing P7).
evaluations.
2.1 Focus measure
The organization of this paper is described as follows.
First, Section 2 proposes the framework and specific A focus measure is usually employed to quantize the image
implementation process of the developed algorithm in sharpness and takes on one of the most critical roles in
detail. Then, Section 3 performs the experimental results MFIF issues. For an image, more sharp edges in the focus
and analysis of both visual qualitative and quantitative regions are usually retained in comparison with ones in
evaluations. Finally, Section 4 presents the conclusion of our the defocus regions, and the sharpness of images can be
paper. evaluated via its gradient information [26, 34]. As we all
4456 Z. Hu et al.
Fig. 2 The flowchart of the

proposed fusion algorithm
know, most focus measures are dependent on the high- calculation, it can be solved directly by the image matrix. As
frequency information of images, such as gradients and such, the ML gradient map of the source image is defined
edges. Classical focus measures include, but are not limited as follows:
to, Energy of Laplacian (EOL), Energy of Gradient (EOG)
IML = |2I0 − I1 − I2 | + |2I0 − I3 − I4 | (2)
as well as the Sum of the Modified Laplacian (SML) [34,
35]. It should be pointed out that SML is usually superior to where I0 is the image matrix of source image, I1 and I2
others, which has been verified via lots of experiments in the are two matrices obtained by shifting a pixel to the left or
literature [34]. In this paper, an improved version of SML, right by I0 , and I3 and I4 are also two matrices obtained by
regarded as the sum of the multi-scale weighted modified shifting a pixel up or down by I0 .
Laplacian (denoted as MSWML for simplicity), is proposed
to further enhance the detection accuracy of focus regions. Step 2 Construction of a multi-scale measurement tem-
The main inspiration is that a single measurement template plate. A measurement template can be intuitively regarded
is extremely coarse and unapplicable for the source image as a filter to retain the needed regions. Without loss of
with very complex or large smooth regions. To overcome generality, the multi-scale measurement template will be
this shortage, a new multi-scale weighted template, whose produced by three single templates with different scales,
critical core is the calculation of MSWML, is introduced to denoted as H1 , H2 and H3 . For this purpose, the critical core
reduce the impact of the complexity of the source image. is to assign a set of suitable weights to form such a uniform
template denoted as H .
Step 1 The gradient calculation of source images. The same In practice, any single measurement template is also a
calculation formula proposed in [35] is adopted in this weight matrix in essence, whose elements are usually cal-
paper: culated according to the distance from the center element so
∇ 2 ML(i, j ) = |2C(i, j ) − C(i + step, j ) as to ensure that the impaction on the center element mainly
comes from its nearby elements [36, 37]. More specifically,
−C(i − step, j )| + |2C(i, j )
the element hs of Hs (s = 1, 2, 3) are calculated as follows:
−C(i, j + step) − C(i, j − step)| (1)
1
where C(i, j ) stands for a function on the pixel (i, j ), “step” hs (p, q) = , ∀(p, q) ∈ Rs
1 + [(p − i)2 + (q − j )2 ]1/2
means the variable space between pixels and is usually
(3)
selected as 1. Additionally, C(i±step, j ) and C(i, j ±step)
mean that the function shifts “step” pixels along left or right where the pixel (i, j ) is the center of the local window,
directions or up or down directions. In order to facilitate (p, q) is the index coordinate of other pixels, and Rs is the
neighborhood window of sizes (2n + 1) × (2n + 1) centered Up to now, we develop the multi-scale weighted template,
on (i, j ). which will be utilized to produce an accurate decision map
According to the internal features of complex source in the following subsections.
images, the small-scale and large-scale measurement In what follows, let us check the superiority of the
templates should be assigned relatively small weights in improved focus measure (i.e. MSWML) in comparison
order to effectively integrate the gradient information of with SML. As mentioned in [34], the effectiveness of
rich texture regions and smooth regions respectively, and focus measure should be evaluated by two aspects: 1) the
the medium-scale measurement template should be assigned monotony of focus measure, that is, it should decrease
larger weights than aforementioned two templates in order monotonically when the blur level of images increases
to integrate the gradient information of transitional regions monotonically, and 2) the discrimination capability, that
between aforementioned regions. In light of obtained is, the larger the variation of this indicator, the better
templates with different scales, the desired multi-scale the discrimination performance is for a set of images
measurement template H is constructed as follows: with close sharpness. In other words, such a measure is
sensitive for images whose sharpness is very close. To
H = (w1 H1 ) ⊕ (w2 H2 ) ⊕ (w3 H3 ) (4)
this end, performing Gaussian filtering to the selected
where w1 , w2 and w3 are weight coefficients. Here, “⊕” reference images (see Fig. 3) generates three sets of
stands for the center alignment overlay operation. image sequences, each of which consists of 20 images
with gradually decreasing sharpness. The curves and the
Step 3 The calculation of MSWML for source images. With whole dispersion coefficient of SML and MSWML under
the aid of the obtained image matrix IML and multi-scale a uniform standardization are plotted in Figs. 4 and 5,
measurement template, one can receive SMSW ML of pixel respectively.
(p, q) via the following formula It can be found from Fig. 4 that all curves are
SMSW ML (p, q) = I˜ML (p, q) ∗ H (5) nonincreasing monotonically and the distinction capability
our developed measure is more senstive than SML when the
with blur level is bigger than 10 for “Monkey” and “Einstein” and

IML (p, q), IML (p, q) ≥ β 15 for “Temple”. In other words, the developed MSWML
I˜ML (p, q) = (6)
0, otherwise measure is more suitable to evaluate blur images. There
where β is a predetermined threshold and the notation “∗” is no doubt that the fusion quality will be definitely
stands for the convolution operation. enhanced when blur images can be effectively identified
The effectiveness of feature fusion via the proposed with help of this measure, which is the main concern of
MSWML is closely dependent on three kinds of parameters: multi-focus image fusion issues. Furthermore, dispersion
the scales of measurement templates, the predetermined coefficients, reflecting the discrimination capability of the
threshold, and the weight coefficients. In our experiment, focus measure, are all the biggest for our developed
the selected parameters are 9, 13, and 17 for template’ measure. Taking both the rate of change in Fig. 4 and the
scales, 3 for the threshold, and 0.2, 0.6, and 0.2 for weight dispersion coefficients in Fig. 5 into consideration, we can
coefficients according to lots of tests. Note that the selection conclude that the discrimination capability of MSWML is
of measurement template scales is a two-stage process: pre- superior to that of SML.
experiments to determine the potential range of the desired
template scales, and optimization to determine the best ones. 2.2 Binary segmentation map
The principle of selecting weight coefficients is to optimize
the function of information extraction and the selection of Recalling MFIF, its purpose is to extract all clear focus
thresholds abides by the recommendation proposed in [34]. regions of two given source images, and then reorganize
Fig. 3 Reference images a-c

Images of “Temple”, “Monkey”
and “Einstein”
4458 Z. Hu et al.
Fig. 4 The focus measure of

image sequences (a)-(c) The
results of “Temple”, “Monkey”
and “Einstein”, where the
horizontal axis represents the
blur level of the image, and the
vertical axis represents the focus
measure value
these focus regions to generate a fusion image. In order a paired source images IA and IB are with different focus
to segment the focus and defocus regions more accurately points. According to each pixel (p, q), the image can be
while obtaining a smooth dividing boundary, a binary divided into a set of regions with predetermined sizes, where
segmentation map with accurately dividing boundary needs the center of regions is just the pixel (p, q). Additionally,
to be constructed with the help of the above developed regions are denoted as {DA } for the image IA and {DB } for
multi-scale weighted template. The process of obtaining this the image IB .
map consists of four steps. For the presentation, suppose that
Step 1 In light of the developed MSWML, calculate the
focus measure maps of source images IA and IB .
Step 2 Because a single pixel is often affected by noises, the

sum of the regional coefficients for every pixel S̄(p, q) is
adopted to further improve the focus measure of the source
image

S̄(p, q) = SMSW ML (i, j ), (i, j ) ∈ D (7)
(i,j )
where the region window D is set as 11×11. For the purpose

of presentation in the following context, denote S̄(p, q) as
S̄A (p, q) for image IA and S̄B (p, q) for image IB .
Step 3 Obtain the focus score matrix of source images. For

a paired region windows DA and DB , if S̄A (p, q) is larger
than S̄B (p, q), the score of all pixels in region DA of image
IA is added one, that is
Fig. 5 The dispersion coefficient of focus measure value of image
sequences, where the vertical axis stands for the dispersion coefficient MA (p, q) = MA (p, q) + 1, (p, q) ∈ DA (8)
Fig. 6 Demonstration of getting binary segmentation map (a)-(b) Source images (c)-(d) Focus score maps (e) Binary segmentation map
where the initial values of MA are set as zeros. Otherwise, is adopted to remove these small holes when the input of
the score of all pixels in region DB increases one for image the filter is smaller than a given threshold, and therefore this
IB and denoted as MB . It should be pointed out that the process can be regarded as threshold filtering. Assuming
region DA and DB constitute a sliding window pair moving that the sizes of source images are set as H × W , regions
pixel-by-pixel in the calculation processing. with the size smaller than (H × W )/80 can be regarded
as small regions. Experimental results disclose that this
Step 4 Produce the desired binary segmentation map. To threshold can reliably remove the overwhelming majority of
this end, we first create an initial binary segmentation small holes. In light of this filter, an initial decision map can
map Q, whose all elements are zero. In general, the focus be obtained, see Fig. 7b for an example.
measure of focus regions is larger than the focus metric of The two most important rules of image fusion are 1) to
a defocus regions. According to this rule, the ideal binary retain information from source images as much possible
segmentation map can be obtained by comparing the focus and 2) not to introduce artificial textures. However, artificial
scores MA and MB : textures, which seriously affect the visual effect of image
fusion, cannot be effectively discarded for the current image
1, MA (p, q) > MB (p, q)
Q(p, q) = (9) fusion algorithms via the spatial domain due mainly to the
0, otherwise omitted spatial continuity in the edge fusion of the focus
An acquisition process of the binary segmentation map regions. Fortunately, an edge-preserving filtering algorithm,
is shown in Fig. 6, where the white regions represents the named as guided filtering, has been developed in [38] to
focus regions detected from the source image IA , and the provide good edge-preserving capabilities while has a linear
black regions represents the focus regions detected from the running time independent of the filter size. In other words,
source image IB . guide filtering can be used to smooth weight coefficients in
Up to now, the desired binary segmentation map Q is the fusion decision map so that the fusion image has better
obtained in light of the developed weighted sum of the focus spatial smoothness [32]. In what follows, a guided filter is
measure map. In what follows, an optimized decision map attempted to optimize the decision map.
is developed in light of such a binary segmentation map. Guided filtering depends on a guide image and two
parameters, that is, the filter radius r and the regularization
2.3 Decision map optimization parameter ε. In this paper, the initial fusion image I0F and
the initial decision map D0 are, respectively, regarded as the
As shown in Fig. 7a, there are still some small holes guide image and the input image, see Fig. 7c and b. The
surrounded by focus regions in the obtained binary resultant output image is the desired final decision map DF :
segmentation map although the proposed focus scheme is
effective. To handle such a problem, a small region filtering DF = Gr,ε (I0F , D0 ) (10)
Fig. 7 Demonstration of optimized decision map (a) Binary segmentation map (b) Initial decision map (c) Initial fusion image (d) Final decision
map (e) Final fusion image
4460 Z. Hu et al.
where two parameters in the guide image filter are selected gradient (MWGF) [29], the algorithm based on dense
as r = 10 and ε = 0.01. It should be pointed that the value SIFT (DSIFT) [28], the algorithm based on ensemble of
range of each element in the grayscale shown in Fig. 7d is convolution neural network (ECNN) [33] as well as a gen-
0-255, and the value range of elements should be adjusted eral image fusion framework based on convolution neural
to 0-1 in the fusion stage. network (IFCNN) [13]. Besides, the source codes can
Finally, in light of the final decision map, the desired be downloaded from the website “(http://www.escience.
fusion image can be obtained via pixel-by-pixel weights for cn/people/liuyu1/Codes.html)” for SR, DSIFT, and CNN,
source images and from the website “(https://github.com/lsauto/MWGF-
Fusion)” “(https://github.com/uzeful/Quadtree-Based-Multi-
IF (p, q) = IA (p, q)∗DF (p, q)+I B (p, q)∗[1−DF (p, q)] focus-Image-Fusion)” “(https://github.com/mostafaaminnaji)”
(11) “(https://github.com/uzeful/IFCNN)” for MWGF,
QUADTREE, ECNN as well as IFCNN. Note that all
An experiment of the optimization process is shown in parameters of these algorithms in our experiments are kept
Fig. 7, Compared by Fig. 7a and b, we can find that the the same ones adopted in the original literature.
decision map after filtering out small holes has a better
consistency. As shown in Fig. 7c, there are some unexpected 3.2 Visual qualitative evaluation
artificial textures around the boundaries between focus
regions and defocus regions. Compared by Fig. 7c and Qualitative evaluation of MFIF is commonly realized by
Fig. 7e, we can find that the artificial textures around evaluating the visual effect of the fusion images, and the
the blade are significantly reduced, which verifies that the effectiveness of the fusion algorithm is directly reflected by
optimized decision map Fig. 7d is effective and feasible to a visual effect. For the sake of simplicity, the comparison
enhance the spatial continuity of the fusion image through tests are performed via grayscale images and the results are
guided filtering. demonstrated in Figs. 9, 10, 11 and 12.
Specifically, Figs. 9 and 10 show the fusion results
of the “Temple” image set, where Fig. 10 provides the
3 Experimental results and analysis enlarged local regions of corresponding images in Fig. 9.
We cannot difficult to see that the clear regions of source
In this section, the effectiveness of the developed algorithm images are successfully merged but the performance of
will be verified via some systematic experiments combined local details are differences. More specifically, there are
with the comparison with both recently proposed algorithms obviously ghosting around the Chinese characters, labeled
and classical algorithms from the subjective and objective by the orange rectangles, in fusion images obtained by the
aspects. SR based algorithm (Fig. 10c), the MWGF based algorithm
(Fig. 10d), and the IFCNN based algorithm (Fig. 10i).
3.1 Experimental settings Although retaining most details in the images, blocking
effects cannot be effectively avoided for the results obtained
In this experiment, 38 pairs of multi-focus images, which by QUADTREE, DSIFT, and ECNN algorithms, see the
are shown in Fig. 8, are employed to test our algorithm and top regions of the stone lion in Fig. 10e, f and h, marked
other existing ones. Among them, 20 pairs are from the with a red border. Furthermore, fusion results obtained by
latest “Lytro” data set, which can be obtained online from the CNN-based algorithm and the developed algorithm in
the website “(http://mansournejati.ece.iut.ac.ir/content/ this paper are, respectively, plotted in Fig. 10g and j, which
lytro-multi-focus-dataset)”, and the other 18 pairs are the are ideal visually and involve neither ghosting nor blocking
same as the images used in the literature [27]. All experi- effects. It is worth mentioning that our algorithm is even
ments are carried out with the help of Matlab and Python better against the CNN-based algorithm in the contrast ratio
platforms with a 64-bit Windows10 operation system. Fur- of the stone lion edge.
thermore, the configuration of the computer is with a 2.20 Figure 11 draws the fusion results of the “Clock”
GHz Intel Core i7-8750H, and a 16GB memory. It should be image set via different algorithms. It follows from both
pointed out that lots of contrastive experiments via existing the upper left corner of the large clock and the regions
algorithms are executed to effectively show the superior- around the number 8 that the SR-based algorithm and the
ity of the proposed algorithm. These algorithms include MWGF-based algorithm produces some artificial textures
the sparse representation based algorithm (SR) [20], the in the obtained fusion images, see Fig. 11c and d. Then,
algorithm based on quadtree decomposition (QUADTREE) the performance of QUADTREE-based algorithm, DSIFT-
[26], the algorithm via deep convolutional neural network based algorithm as well as ECNN-based algorithm are even
(CNN) [32], the algorithm based on multi-scale weighted worse, and their fusion images are obviously incomplete on
Fig. 8 Multi-focus images used in the experiment
the right edge of the small clock, which are been directly and the better the performance of the fusion algorithm is.
observed by Fig. 11e, f and h. Furthermore, the entire right It can be found that the proposed algorithm in our paper
edge of the small clock is relatively blurred in Fig. 11g contains the least image residual features, which means
obtained by the CNN-based algorithm. Compared with the that the proposed algorithm transfers more features from
fusion results of the above six algorithms, both the proposed the source image to the final fusion image, and hence the
algorithm in our paper and the IFCNN-based algorithm give fusion effect is the best. In order to disclose the fusion
rise to less artificial textures on the left side of the number performance more intuitively, we need to further compare
8, more abundant details and clearer edges. the decision maps of these algorithms. The corresponding
In what follows, Fig. 12 plots the difference map of results are shown in Figs. 13 and 14. It should be pointed
various algorithms, which is essentially a residual image out that, among the selected comparison algorithms, the
obtained by subtracting some source image from the fusion images for SR and IFCNN algorithms are directly
proposed fusion image [32]. There is no doubt that the less generated and therefore no decision map can be constructed.
the residual image features are, the more the features of Considering the algorithms’ differences, which usually lead
source images transferred into the final fusion image are to different representation, the decision maps closest to
4462 Z. Hu et al.
Fig. 9 The fusion results on the

“Temple” image set (a)-(b)
Source images (c)-(j) Fusion
images of SR, MWGF,
QUADTREE, DSIFT, CNN,
ECNN, IFCNN and the
proposed algorithm
Fig. 10 Magnified regions of

fusion results from Fig. 9 (a)-(j)
Fig. 11 The fusion results on

the “Clock” image set (a)-(b)
Source images (c)-(j) Fusion
images of SR, MWGF,
QUADTREE, DSIFT, CNN,
ECNN, IFCNN and the
proposed algorithm
Fig. 12 The differences between the fusion image and source images (a)-(h) The difference images of SR, MWGF, QUADTREE, DSIFT, CNN,
ECNN, IFCNN and the proposed algorithm
4464 Z. Hu et al.
Fig. 13 Decision maps on the “Flower” image set (a)-(b) Source images (c)-(h) Decision maps of MWGF, QUADTREE, DSIFT, CNN, ECNN
and the proposed algorithm
the binary segmentation maps are employed to execute the algorithm is smoother and without jagged edges, see the
comparison analysis. local amplification regions in Fig. 13e, f and h. Another
Now, let us analyze the obtained results in detail. comparison is executed via the image “Lytro-05” in the
Fig. 13 describes the decision maps of the “Flower” image set Lytro. As shown in Fig. 14a and b, this pair of
image set. Obviously, the decision maps obtained by images contain lots of multiple scattered focus regions. We
both the QUADTREE-based algorithm and the ECNN- can find from Fig. 14c that the MWGF algorithm cannot
based algorithm are incomplete, see Fig. 13d and g, and effectively identify the focus regions in Fig. 14a and b, at
the map is still unsatisfactory in spite of performing the same time, the decision maps obtained by QUADTREE,
slightly better than the previous two algorithms. The overall DSIFT and ECNN algorithms are obviously incomplete in
performance seems to be the same for other three ones, the reticular region, which have been confirmed by Fig. 14d,
but the edge of the decision map obtained by the proposed e and g. It is of interest to note that both the CNN-based
Fig. 14 Decision maps on the “Lytro-05” image set (a)-(b) Source images (c)-(h) Decision maps of MWGF, QUADTREE, DSIFT, CNN, ECNN
and the proposed algorithm
algorithm and the proposed algorithm can accurately extract Through the comparison of the above sets of fusion
the focus regions of source images to form the desired examples, we conclude that the proposed algorithm not
decision maps, see Fig. 14f and h. Furthermore, the smooth only can accurately extract the focus region but also
effect of edges of our algorithm is slightly better than suppresses the generation of artificial textures, and therefore
CNN’s, which can be verified via the regions marked with the obtained fusion image has a better visual effect than
red border. other algorithms.
Finally, the intermediate results of the other five groups
of source images are shown in Fig. 15 to further demonstrate 3.3 Quantitative comparison
the effectiveness of the developed algorithm in our paper.
It follows from the third column that there are some small In this subsection, objective evaluation indicators, mutual
holes in the binary segmentation maps due mainly to the information (MI), structural similarity (SSIM), spatial
misclassification of pixels. However, these small holes are frequency (SF), and average gradient (AVG), are employed
few and usually located in the smooth regions of the image, to objectively evaluate the fusion results obtained by
so their impact on the fusion result is relatively small. The different fusion algorithms.
fourth column verifies that a filter with a given threshold Let us briefly provide the evaluation rule of these
can effectively remove the undesired small holes to extract indicators. MI reflects how much information is transferred
the complete focus regions and form an ideal initial decision to the fusion image [39], SSIM describes the structural
map. Furthermore, the final decision map in the fifth column similarity between the final fusion image and source images
is obtained by the guided filtering to ensure a good visual [40], SF evaluates the capability of describing small details
effect. on the final fusion image in the spatial domain [39], as
Fig. 15 Intermediate results of other fusion examples (a-b) Source images (c) Binary segmentation maps (d) Initial decision maps (e) Final
decision maps (f) Final fusion images
4466 Z. Hu et al.
well as AVG reflects the capability of texture changes [41]. according to their roles described above. Our algorithm is
Furthermore, the larger the value of these four indicators concerned with the MI indictor with the hope to eliminate
is, the better the image fusion effect is. On the other hand, artificial textures while keeping more original information.
image quality evaluation can be generalized into two ways In what follows, in the aspect of capturing small details of
based on whether or not a reference image exists [42]. the textures, the resultant indicator SF of the proposed algo-
Among the above four evaluation indexes, the calculation rithm is 19.8071 and attains the best performance than other
of the first two indexes requires a reference image, while algorithms, which means that the fusion images obtained
the latter two indexes do not. The objective evaluation by our algorithm have the best overall activity. This bene-
indicators of the fusion images obtained by the proposed fits from the more accurate fusion decision map obtained
algorithm and the comparison algorithm are measured by the proposed algorithm, which ensures the fusion image
respectively, and their average values of these 38 pairs of collects more focus regions from the source image. Finally,
images are sumarrized in Table 1. In this table, the bold in the aspect of small details contrast and texture changes,
numbers highlight the optimal values in each row, which the resultant indicator AVG (6.9074) confirms that the per-
means that the corresponding algorithm is the best one. formance of our algorithm is the second best and very close
It is not difficult to see from this table that the score of MI to the best one (i.e. the DSIFT-based algorithm).
index obtained by the proposed algorithm is 6.1993 and the Different from other fields of image processing, the
highest, which reflects that our algorithm achieves the best contribution of performance enhancements of quantitative
performance in information transmission. Furthermore, the indicators mainly derives from the improvement in fusion
MI performance of the six algorithms based on the spatial images about edges between focus and defocus regions.
domain (i.e. MWGF, QUADTREE, DSIFT, CNN, ECNN, In a general way, the proportion of these edges’ pixels in
and ours) are better than the SR-based and IFCNN-based the whole source image is relatively small and thereby the
algorithms. The main reason is that, from the perspective absolute increment is also small. In summary, experimental
of information theory, algorithms based on the transform results demonstrate that our algorithm can guarantee state-
domain usually lose more original image information than of-the-art fusion performance in terms of both visual quality
algorithms based on the spatial domain. In the structural and objective assessments.
similarity aspect, the score of the IFCNN algorithm achieves In what follows, let us record the average running
0.8423 and takes the first position. Now, let us roughly dis- time of various algorithms to analyze their computational
close the inherent reason. In the training process, the model efficiency. It is not difficult to see from Table 2 that the
of the IFCNN algorithm is first pre-trained and then fine differences of computational efficiency are very enormous
trained by adding the perceptual loss into the loss func- from the biggest 453.6130 for ECNN to the smallest 0.1920
tion. The introduction of the perceptual loss is beneficial for for IFCNN. The fundamental reason lies in the structural
the regularization of the network, resulting in that a fusion differences of algorithms themselves. Among them, both
image is more structurally similar to the source image. On the CNN algorithm and the ECNN algorithm make use
the other hand, the fusion image is generated via the extrac- of a fashionable deep network to generate a decision
tive features but does not refer to source images in the map, where a large number of convolutional filters are
fusion stages for the IFCNN algorithm. As such, the capa- exploited to extract various features and elaborate post-
bility of keeping original information is relatively weak, processing is indispensable. Although adopting also CNNs,
and artificial textures could occur. Furthermore, MI and the computational efficiency of IFCNN is very high because
SSIM exist a mutually restricted relationship intuitively the employed CNN is an end-to-end shallow network
(only two layers) and linear elementwise fusion rules are
Table 1 The average objective assessments of different fusion utilized to fuse convolutional features of multiple source
algorithms images [13]. Furthermore, the efficiency of the proposed
Algorithm SR MWGF QUADTREE DSIFT algorithm is 5.0790 and takes the third position in all 8
algorithms. It should be declared that the fusion of the
MI 5.6843 5.7162 6.1981 6.1832 IFCNN algorithm can be regarded as the reorganization
SSIM 0.8372 0.8361 0.8337 0.8341
SF 19.5914 19.3327 19.8012 19.7948
Table 2 Average running time of different fusion algorithms
AVG 6.7832 6.6935 6.9053 6.9105
Algorithm CNN ECNN IFCNN Proposed Algorithm SR MWGF QUADTREE DSIFT
MI 5.9713 6.1783 4.7914 6.1993
SSIM 0.8368 0.8345 0.8423 0.8343 Time(s) 123.8320 6.6370 3.6470 24.5240
SF 19.5311 19.7869 19.7426 19.8071 Algorithm CNN ECNN IFCNN Proposed
AVG 6.8165 6.8773 6.8948 6.9074 Time(s) 227.4180 453.6130 0.1920 5.0790
Fig. 16 An example of multiple image fusion
via extracted features and our algorithm is dependent on time, a composite decision map with three gray scales (see
an accurate fusion decision map. Generally speaking, the Fig. 16d) is provided for the easy display of the two fusion
time consumption of establishing and optimizing the desired processes, where the black regions, the dark gray regions,
decision map is higher than that of extraction features. and the light gray regions indicate the corresponding focus
In summary, our algorithm can effectively capture more regions in sources 1, 2 and 3, respectively. As can be seen
original information while eliminating artificial textures by from Fig. 16e, the focus regions in the source images are
losing part calculation efficiency. accurately detected and integrated into the final result, the
edge contrast is also perfect especially for the seal’s outline
3.4 Application of the proposed algorithm on and the curved edge of the sphere, and no obvious artificial
multiple source images textures can be found. In summary, the proposed algorithm
has good extendibility for multi-source image fusion.
The developed scheme in this paper is also applicable
when more than two multi-focus images are available.
For the fusion of three source images, two source images 4 Conclusion
are first fused and then the corresponding fused result
combined with the third source image generates the final In this paper, in light of multi-scale weighted focus mea-
fusion image via our algorithm again. This demonstration is sure and decision map optimization, an improved MFIF
performed via a typical image set shown in Fig. 16a-c and algorithm has been developed to increase the fusion per-
downloaded from the website “(http://mansournejati.ece. formance including the elimination of unexpected ghosting,
iut.ac.ir/content/lytro-multi-focus-dataset)”. At the same blurring edges as well as blocking effects. Specifically, a
4468 Z. Hu et al.
novel multi-scale measurement template has been designed 16. Zhang Q, Guo BL (2009) Multifocus image fusion using the
to effectively extract the gradient information of rich tex- nonsubsampled contourlet transform. Signal Process 89(7):1334–
1346
ture regions, smooth regions as well as transitional regions 17. Li H, Manjunath BS, Mitra SK (1995) Multisensor image
between aforementioned regions. An improved scheme of fusion using the wavelet transform. Graph Mod Image Process
the accurate decision map has been developed in light of the 57(3):235–245
optimized binary segmentation map with threshold filtering 18. Lewis JJ, O’Callaghan RJ, Nkiolov SG, Bull DR, Canagarajah
N (2007) Pixel-and region-based image fusion with complex
and guided filtering. From the experimental results of visual
wavelets. Inf Fusion 8(2):119–130
effects and quantitative indicators, we have drawn a con- 19. Yang B, Li S (2010) Multifocus image fusion and restoration with
clusion that the developed algorithm performs very well on sparse representation. IEEE Trans Instrum Meas 59(4):884–892
MFIF and demonstrates the superiority over the latest fusion 20. Liu Y, Liu S, Wang Z (2015) A general framework for image
fusion based on multi-scale transform and sparse representation.
algorithms. Inf Fusion 24:147–164
21. Tang H, Xiao B, Li W, Wang W (2018) Pixel convolutional neural
network for multi-focus image fusion. Inf Sci 433-434:125–141
22. Mitianoudis N, Stathaki T (2007) Pixel-based and region-
References based image fusion schemes using ICA bases. Inf Fusion 8(2):
131–142
1. Li H, He X, Tao D, Tang Y, Wang R (2018) Joint medical image 23. Liang J, He Y, Liu D, Zeng X (2012) Image fusion using higher
fusion, denoising and enhancement via discriminative low-rank order singular value decomposition. IEEE Trans Image Process
sparse dictionaries learning. Pattern Recongn 79:130–146 21(5):2898–2909
2. Ma J, Ma Y, Li C (2019) Infrared and visible image fusion 24. Saha A, Bhatnagar G, Wu QMJ (2013) Mutual spectral residual
methods and applications: a survey. Inf Fusion 45:153–178 approach for multifocus image fusion. Digit Signal Process
3. Yin M, Duan P, Liu W, Liang X (2017) A novel infrared 23(4):1121–1135
and visible image fusion algorithm based on shift-invariant 25. De I, Chanda B (2013) Multi-focus image fusion using a morphology-
dual-tree complex shearlet transform and sparse representation. based focus measure in a quad-tree structure. Inf Fusion 14(2):136–
Neurocomputing 226:182–191 146
4. Tian J, Liu G, Liu J (2018) Multi-focus image fusion based on 26. Bai X, Zhang Y, Zhou F, Xue B (2015) Quadtree-based multi-
edges and focused region extraction. Optik 171:611–624 focus image fusion using a weighted focus-measure. Inf Fusion
5. Meher B, Agrawal S, Panda R, Abraham A (2019) A survey on 22:105–118
region based image fusion methods. Inf Fusion 48:119–132 27. Zhang Y, Bai X, Wang T (2017) Boundary finding based multi-
6. Jin H, Xing B, Wang L, Wang Y (2015) Fusion of remote focus image fusion through multi-scale morphological focus-
sensing images based on pyramid decomposition with Baldwinian measure. Inf Fusion 35:81–101
Clonal Selection Optimization. Infrared Phys Technol 73: 28. Liu Y, Liu S, Wang Z (2015) Multi-focus image fusion with dense
204–211 SIFT. Inf Fusion 23:139–155
7. Yang Z, Mu X, Zhao F (2018) Scene classification of remote 29. Zhou Z, Li S, Wang B (2014) Multi-scale weighted gradient-based
sensing image based on deep network and multi-scale features fusion for multi-focus image. Inf Fusion 20:60–72
fusion. Optik 171:287–293 30. Wang Z, Wang S, Zhu Y (2017) Multi-focus image fusion based
8. Gupta K, Walia GS, Sharma K (2020) Quality based adaptive on the improved PCNN and guided filter. Neural Process Lett
score fusion approach for multimodal biometric system. Appl 45(1):75–94
Intell 50(4):1086–1099 31. Du C, Gao S (2018) Multi-focus image fusion algorithm based on
9. Li S, Kang X, Fang L, Hu J, Yin H (2017) Pixel-level image pulse coupled neural networks and modified decision map. Optik
fusion: a survey of the state of the art. Inf Fusion 33:100–112 157:1003–1015
10. Zhao C, Wang X, Zuo W, Shen F et al (2020) Similarity learning 32. Liu Y, Chen X, Peng H, Wang Z (2017) Multi-focus image
with joint transfer constraints for person re-identification. Pattern fusion with a deep convolutional neural network. Inf Fusion 36:
Recogn 97:107014 191–207
11. Zhao C, Chen K, Zang D, Zhang Z et al (2019) Uncertainty- 33. Naji MA, Aghagolzadeh A, Ezoji M (2019) Ensemble of CNN for
optimized deep learning model for small-scale person re- multi-focus image fusion. Inf Fusion 51:201–214
identification. Sci China Inf Sci 62:220102 34. Huang W, Jing Z (2007) Evaluation of focus measures in multi-
12. Zhao C, Chen K, Wei Z, Chen Y et al (2018) Multilevel triplet focus image fusion. Pattern Recogn Lett 28(4):493–500
deep learning model for person re-identification. Pattern Recogn 35. Nayar SK, Nakagawa Y (1994) Shape from focus. IEEE Trans
Lett 117:161–168 Pattern Anal Mach Intell 16(8):824–831
13. Zhang Y, Liu Y, Sun P, Yan H et al (2020) IFCNN: A general 36. Ahmed KT, Trtaza A, Iqbal MA (2017) Fusion of local and global
image fusion framework based on convolutional neural network. features for effective image extraction. Appl Intell 47:526–543
Inf Fusion 54:99–118 37. Zhang P, Gao W, Liu G (2018) Feature selection considering
14. Burt P, Adelson E (1983) The Laplacian pyramid as a compact weighted relevancy. Appl Intell 48:4615–4625
image code. IEEE Trans Commun 31(4):532–540 38. He K, Sun J, Tang X (2010) Guided image filtering. In: European
15. Zhao W, Xu Z, Zhao J (2016) Gradient entropy metric Conference on Computer Vision. Heraklion, Greece, pp 1–14
and p-Laplace diffusion constraint-based algorithm for noisy 39. Qu G, Zhang D, Yan P (2002) Information measure for
multispectral image fusion. Inf Fusion 27:131–142 performance of image fusion. Electron Lett 38(7):313
40. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image Derui Ding received both the
quality assessment: from error visibility to structural similarity. B.Sc. degree in Industry Engi-
IEEE Trans Image Process 13:600–612 neering in 2004 and the M.Sc.
41. Zhao W, Wang D, Lu H (2019) Multi-focus image fusion with degree in Detection Technol-
a natural enhancement via joint multi-level deeply supervised ogy and Automation Equip-
convolutional neural network. IEEE Trans Circ Syst Video ment in 2007 from Anhui
Technol 29(4):1102–1115 Polytechnic University, Wuhu,
42. Nizami IF, Majid M, Khurshid K (2018) New feature selection China, and the Ph.D. degree
algorithms for no-reference image quality assessment. Appl Intell in Control Theory and Con-
48:3482–3501 trol Engineering in 2014 from
Donghua University, Shang-
hai, China. From July 2007
Publisher’s note Springer Nature remains neutral with regard to to December 2014, he was a
jurisdictional claims in published maps and institutional affiliations. teaching assistant and then a
lecturer in the Department of
Mathematics, Anhui Polytech-
nic University, Wuhu, China. He is currently a senior research fellow
with the School of Software and Electrical Engineering, Swinburne
University of Technology, Melbourne, Australia. From June 2012 to
Zhanhui Hu received a B.E. September 2012, he was a research assistant in the Department of
degree in Electrical Engineer- Mechanical Engineering, the University of Hong Kong, Hong Kong.
ing and Automation in 2017 From March 2013 to March 2014, he was a visiting scholar in the
from Luoyang Normal Uni- Departmentof Information Systems and Computing, Brunel University
versity, Luoyang, China. He London, UK. His research interests include nonlinear stochastic con-
is currently pursuing the M.S trol and filtering, as well as multi-agent systems and sensor networks.
degree in Control Engineering He has published around 80 papers in refereed international journals.
from University of Shanghai He is serving as an Associate Editor for Neurocomputing and IET Con-
for Science and Technology, trol Theory & Applications. He is also a very active reviewer for many
Shanghai, China. His research international journals.
interests include image pro-
cessing, computer vision, and
information fusion. Guoliang Wei received the
B.Sc. degree in mathemat-
ics from Henan Normal Uni-
versity, Xinxiang, China, in
1997 and the M.Sc. degree in
applied mathematics and the
Ph.D. degree in control engi-
Wei Liang received the B.E. neering, both from Donghua
degree in Automation in 2018 University, Shanghai, China,
from University of Shang- in 2005 and 2008, respec-
hai for Science and Technol- tively. He is currently a Pro-
ogy, Shanghai, China. He is fessor with the Department of
currently pursuing the Ph.D. Control Science and Engineer-
degree in Control Science and ing, University of Shanghai
Engineering from University for Science and Technology,
of Shanghai for Science and Shanghai, China. From March
Technology, Shanghai, China. 2010 to May 2011, he was an Alexander von Humboldt Research
His research interest is deep Fellow in the Institute for Automatic Control and Complex Sys-
learning and visual tracking. tems, University of Duisburg-Essen, Germany. From March 2009 to
He is a very active reviewer February 2010, he was a post doctoral research fellow in the Depart-
for many international jour- ment of Information Systems and Computing, Brunel University,
nals and conferences. Uxbridge, UK, sponsored by the Leverhulme Trust of the UK From
June to August 2007, he was a Research Assistant at the University of
HongKong. From March to May 2008, he was a Research Assistant
at the City University of Hong Kong. His research interests include
nonlinear systems, stochastic systems, and bioinformatics. He has pub-
lished over 100 papers in refereed international journals. His current
research interests include nonlinear systems, stochastic systems, and
bioinformatics. Dr. Wei is a very active reviewer for many international
journals.

Hu2021 Article AnImprovedMulti-focusImageFusi

Uploaded by

Document Informationclick to expand document information

Copyright:

Available Formats

Hu2021 Article AnImprovedMulti-focusImageFusi

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Hu2021 Article AnImprovedMulti-focusImageFusi

Uploaded by

Copyright:

Available Formats

Applied Intelligence (2021) 51:4453–4469

An improved multi-focus image fusion algorithm based

Accepted: 4 November 2020 / Published online: 4 January 2021

Fig. 1 An example of MFIF (a)

briefly as IFCNN, has been established in [13] by means 2 Proposed algorithm

Fig. 2 The flowchart of the

Fig. 3 Reference images a-c

Fig. 4 The focus measure of

Step 2 Because a single pixel is often affected by noises, the

where the region window D is set as 11×11. For the purpose

Step 3 Obtain the focus score matrix of source images. For

Fig. 8 Multi-focus images used in the experiment

Fig. 9 The fusion results on the

Fig. 10 Magnified regions of

Fig. 11 The fusion results on

Fig. 16 An example of multiple image fusion

You might also like