Image Segmentation-Based Multi-Focus Image Fusion
Abstract: A decision map contains complete and clear information about the image to be fused, and detecting the decision map is crucial to various image fusion issues, especially multi-focus image fusion. Nevertheless, obtaining a decision map that yields a satisfactory fusion result is necessary and always difficult. In this paper, we address this problem with a novel image segmentation-based multi-focus image fusion algorithm, in which the task of detecting the decision map is treated as image segmentation between the focused and defocused regions in the source images. The proposed method achieves segmentation through a multi-scale convolutional neural network, which performs a multi-scale analysis on each input image to derive the respective feature maps on the region boundaries between the focused and defocused regions. The feature maps are then inter-fused to produce a fused feature map. Afterward, the fused map is post-processed using initial segmentation, morphological operation and the watershed transform to obtain the segmentation map/decision map. We show that the decision map obtained from the multi-scale convolutional neural network is trustworthy and that it can lead to high-quality fusion results. Experimental results validate that the proposed algorithm can achieve optimum fusion performance in terms of both qualitative and quantitative evaluations.
INDEX TERMS: convolutional neural network, multi-focus image, decision map, image fusion
Ⅰ. INTRODUCTION
In digital imaging, the imaging equipment usually has difficulty in capturing an image in which all of the objects are effectively in focus. Normally, by setting a fixed focal length for the optical lens, only the objects within the depth of field (DOF) are clear in the picture, while other objects can be indistinct. Fortunately, multi-focus image fusion technology has emerged to address the above-mentioned problem by integrating the significant sharp information from multiple images of the same scene. Over the past several years, a variety of image fusion algorithms have emerged. These fusion algorithms can be divided into two categories [1]: spatial domain algorithms and transform domain algorithms.

The image fusion algorithm based on the transform domain usually converts the source image to another feature domain, where the source image can be effectively fused. The most popular transform domain fusion algorithms are founded on multi-scale transform (MST) methods. Some representative examples include the Laplacian pyramid (LP) [2], the morphological pyramid (MP) [3], the discrete wavelet transform (DWT) [4], the dual-tree complex wavelet transform (DTCWT) [5] and the non-subsampled contourlet transform (NSCT) [6]. These methods fuse the image through three steps: decomposition, fusion and reconstruction [7]. Many studies have also been conducted while taking this approach [8-9], in which the input image is first transformed into a multi-resolution representation by a multi-resolution decomposition; the different spectral information is then selected and combined to reconstruct the fused images.

A new class of transform domain fusion approaches [10-14] has become a compelling branch of the field. Unlike the MST-based approach described above, these fusion algorithms transform the image into a single-scale feature domain through superior signal theories, for example, Sparse Representation (SR) and Independent Component Analysis (ICA). This type of approach usually uses the sliding window method to approximate a translation-invariant fusion process. The most important problem with these approaches is in exploring a valid feature domain to obtain the focus map.

The block-based image fusion technique decomposes the input images into blocks; for example, an interesting fusion scheme based on a pulse-coupled neural
network (PCNN) is presented in [15]; then, every pair of corresponding blocks is fused according to a designed activity level measurement such as the sum-modified-Laplacian (SML) [16]. Obviously, the size of the block has a large impact on the fusion results. Recently, many improved algorithms have emerged to replace the use of an artificially fixed block size in the previous block-based algorithms [17-18]. For example, a new adaptive block-based method [19] uses a differential evolution algorithm to obtain an optimum block size. Using the recently introduced methods based on the quad-tree [20-21], the input images can be adaptively split into blocks of different sizes according to the information in the image itself. Some spatial domain-based fusion algorithms are founded on image segmentation, and the impact of the segmentation accuracy on the fusion quality is critical [22-23].

With regard to both the transform domain and spatial domain image fusion methods, the fusion map is the crucial factor. To further enhance the quality of the image fusion, many of the recently proposed methods have become increasingly complex. In recent years, the multi-focus image fusion algorithm based on the spatial domain has been widely discussed. The simplest pixel-based image fusion algorithm directly averages the pixel values of all of the source images. The advantages of the direct averaging method are simplicity and speed, but its fused images tend to produce blurring effects, thus losing some of the original image information. To overcome the shortcomings of the direct averaging algorithm, several state-of-the-art pixel-based image fusion algorithms have been proposed, such as guided filtering [24] and dense SIFT [25]. Guided filtering and dense SIFT first generate the fusion map by detecting the focused pixels from each source image; then, based on the modified decision map, the final fused image is obtained by selecting the pixels in the focus areas. The decision map is the focus region detection map, in which the white region indicates the focus region of source A, whereas the black region indicates the focus region of source B. Using the detected focused regions as a fusion decision map to guide the multi-focus image fusion process not only increases the robustness and reliability of the fusion results, but also reduces the complexity of the procedure. The multi-scale weighted gradient method reconstructs the fused image I_F by making its gradient as close as possible to I, rather than according to the decision map [26]. Although these new algorithms can improve the visual quality of the fused images, they can lose some of the original image information due to inaccurate fusion decision maps.

It has been proposed in the literature [18] that multi-focus image fusion can be treated as an image segmentation (binary classification) problem; that is, the generation of the decision map in multi-focus image fusion can be treated as a binary segmentation problem. Specifically, the role of the multi-focus image fusion rule is analogous to that of the segmentation rule used in general image segmentation tasks. Thus, it is feasible in theory to use a CNN for image fusion.

In this paper, we introduce convolutional neural networks to this problem. Although convolutional neural networks have been successfully applied in the fields of face recognition, license plate recognition [27], behaviour recognition [28], speech recognition [29] and image classification [30], there are few applications to image fusion. We solve the problem mentioned above with a novel image segmentation-based multi-focus image fusion method, in which the task of detecting the decision map is treated as image segmentation between the focused and defocused regions in the source images. The proposed method achieves segmentation through a multi-scale convolutional neural network (MSCNN), which conducts a multi-scale analysis on each input image to derive the individual feature maps on the region boundaries between the focused and defocused regions. The feature maps are then inter-fused to produce a fused feature map. Additionally, the fused map is post-processed using initial segmentation, morphological operation and the watershed transform to obtain the segmentation map/decision map. We illustrate that the decision map obtained from the MSCNN is trustworthy and that it can lead to high-quality fusion results. Experimental results validate that the proposed algorithm can obtain optimum fusion performance in terms of both qualitative and quantitative evaluations.

The remainder of this paper is organized as follows. The related theory of the convolutional neural network (CNN) is introduced in Section Ⅱ. In Section Ⅲ, the proposed MSCNN-based fusion method is discussed in detail. In Section Ⅳ, the detailed results and discussions of the experiments are presented. Finally, in Section Ⅴ, we conclude the paper.
Ⅱ. CONVOLUTIONAL NEURAL NETWORK MODEL
A CNN is a representative deep learning model that attempts to learn a hierarchical representation of an image at different abstraction levels [31]. As shown in Fig. 1, a typical CNN is mainly composed of an input layer, convolution layers, max-pooling (subsampling) layers, a fully connected layer and an output layer.
Fig. 1 The typical structure of a CNN: input layer, convolution layer, max-pooling layer, convolution layer, max-pooling layer, fully connected layer and output layer.
The input of the convolutional neural network is usually the original image X. In this paper, we use H_i to represent the feature map of the i-th layer of the network (H_0 = X). Assuming that H_i is a convolution layer, max-pooling of the feature map is then performed according to a certain max-pooling rule. Through the alternation of multiple convolution and max-pooling layers, the convolutional neural network relies on a fully connected network to classify the extracted features and to obtain the probability distribution Y based on the input. The convolutional neural network is essentially a mathematical model that makes the original matrix H_0 pass through multiple levels of data transformation or dimension reduction, mapping it to a new feature expression Y:

Y(i) = P(L = l_i | H_0; (W, b))    (2)

The training objective of the CNN is to minimize the loss function L(W, b) of the network. The residuals are propagated backward by gradient descent, and the training parameters (W and b) of the individual layers of the convolutional neural network are updated layer by layer:

W_i = W_i - η · ∂E(W, b) / ∂W_i    (3)

Given a pair of patches p_A and p_B of the same scene, our goal is to learn a CNN whose output is a scalar ranging from 0 to 1. Specifically, when p_A is focused while p_B is defocused, the output value should be close to 1, and when p_B is focused while p_A is defocused, the output value should be close to 0. That is to say, the output value represents the focus property of the patch pair. Therefore, using the CNN to fuse the image is feasible in theory.
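As a minimal illustration of this patch-pair convention (not the authors' implementation), the following sketch shows how a trained scalar-output network, represented here by a placeholder callable `net`, would be used to decide which source a patch pair is focused in:

```python
def focus_decision(net, patch_a, patch_b):
    # `net` is a placeholder for a trained patch-pair CNN whose scalar output
    # lies in [0, 1]: close to 1 means patch_a is focused, close to 0 means
    # patch_b is focused (the convention described above).
    score = float(net(patch_a, patch_b))
    return 1 if score >= 0.5 else 0   # 1: take content from source A, 0: from source B
```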
Ⅲ. MULTI-SCALE CONVOLUTIONAL NEURAL NETWORK METHOD

A. Method Formulation

The conceptual work flow of the proposed MSCNN method is demonstrated in Fig. 2. The schematic diagram of the proposed multi-focus image fusion algorithm is shown in Fig. 3. From Fig. 3, we can see that the proposed multi-focus image fusion algorithm consists of four steps: MSCNN, initial segmentation, morphological operation and watershed, and the last step, fusion. In the first step, the two source images are fed into a pretrained MSCNN model to produce a feature map, and this map includes the most focused information from the source images. Notably, each coefficient in the map represents the focus property of the patch pair that corresponds to the two source images [37]. Through averaging the overlapping patches, we obtain a feature map of the focus information with the same size as the source image. In the second step, the feature map is split into a binary map with a threshold of 0.9. In the third step, we extract and smoothen the binary segmented map with the morphological operation and watershed to generate the final decision map (the filter bwareaopen is employed as the morphological operation to remove small black areas, and the area threshold of bwareaopen depends on the image size, that is, 0.01 * I_H * I_W, where I_H and I_W are the length and width of the image, respectively). In the last step, the fused image is obtained from the final decision map by the pixel-wise weighted-average strategy.
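The post-processing and fusion steps can be sketched as follows. This is a minimal sketch assuming greyscale NumPy arrays, in which scikit-image's remove_small_objects stands in for MATLAB's bwareaopen and the watershed refinement is omitted for brevity:

```python
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.morphology import remove_small_objects

def fuse_with_decision_map(score_map, img_a, img_b, thr=0.9):
    """Sketch of steps 2-4: threshold, clean up small regions, fuse."""
    h, w = score_map.shape
    binary = score_map >= thr                            # step 2: initial segmentation
    min_area = int(0.01 * h * w)                         # area threshold 0.01 * I_H * I_W
    binary = remove_small_objects(binary, min_area)      # remove small spurious regions
    binary = binary_fill_holes(binary)                   # fill small holes
    decision = binary.astype(np.float64)                 # decision map D (1: source A focused)
    return decision * img_a + (1.0 - decision) * img_b   # step 4: pixel-wise weighted average
```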
Fig. 2 The conceptual work flow of the proposed method: T is the total number of CNNs, Mc is the feature map of the location of boundary pixels for index c (c = 1, …, T), Mf is the fused feature map, and Sf is the decision map.
Fig. 3 The schematic diagram of the proposed algorithm: source images A and B are fed into the MSCNN to produce the fused feature (focus) map; initial segmentation yields the binary segmented map, the morphological operation and watershed yield the decision map, and fusion under this decision map yields the fused image.
Fig. 4 The process of extracting multi-scale input patches from the input image
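As a concrete illustration of the extraction in Fig. 4, the following sketch gathers the multi-scale patches centred at one pixel location, using the patch sizes (16, 32, 64) and bicubic resampling described later in this section; scikit-image is used purely for illustration, and the pixel is assumed to lie far enough from the image border:

```python
import numpy as np
from skimage.transform import resize

def multiscale_patches(image, row, col, sizes=(16, 32, 64), base=16):
    """Extract patches of several sizes around (row, col) and rescale the larger
    ones to the base size with bicubic interpolation (order=3), so that all
    patches share the same size but reveal different amounts of context.
    `image` is assumed to be a float greyscale array."""
    patches = []
    for s in sizes:
        half = s // 2
        patch = image[row - half: row + half, col - half: col + half]
        if s != base:
            patch = resize(patch, (base, base), order=3)   # bicubic downscaling
        patches.append(patch)
    return np.stack(patches)   # one base-size patch per scale
```

During training, each such patch could additionally be flipped and rotated (e.g., with np.fliplr and np.rot90) to provide the 90°/180° augmentation mentioned below.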
[Fig. 5 layout: 16×16 input patches taken from source images A and B (480×640); C1: 3×3 filters, 64 feature maps (C1: 478×638 on the full image); C2: 3×3 filters, 128 feature maps (C2: 476×636); max-pooling 2×2 (M1: 238×318); C3: 3×3 filters, 256 feature maps (C3: 236×316); full connection with 256 units; the branch outputs Oa and Ob are combined by a soft-max layer to produce Mc.]
Fig. 5 The CNN model used in the proposed method. The source images A and B are first subjected to multi-scale extraction; then, they are fed into the CNN method as shown in Fig. 3.
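A sketch of the branch network suggested by Fig. 5 is given below; the layer sizes are read from the figure, weight sharing between the two branches is assumed, and PyTorch is used purely for illustration (the paper's implementation uses Caffe):

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One branch of the patch-pair network suggested by Fig. 5."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3), nn.ReLU(inplace=True),     # C1: 3x3 filters, 64 feature maps
            nn.Conv2d(64, 128, 3), nn.ReLU(inplace=True),   # C2: 3x3 filters, 128 feature maps
            nn.MaxPool2d(2),                                # 2x2 max-pooling
            nn.Conv2d(128, 256, 3), nn.ReLU(inplace=True),  # C3: 3x3 filters, 256 feature maps
        )
        self.fc = nn.Linear(256 * 4 * 4, 256)               # fully connected layer, 256 units
        self.out = nn.Linear(256, 1)                        # scalar output O_a or O_b

    def forward(self, patch):                               # patch: N x 1 x 16 x 16
        x = self.features(patch).flatten(1)
        return self.out(torch.relu(self.fc(x)))

def score(branch, patch_a, patch_b):
    """2-way soft-max over (O_a, O_b): Mc = exp(O_a) / (exp(O_a) + exp(O_b))."""
    o = torch.cat([branch(patch_a), branch(patch_b)], dim=1)
    return torch.softmax(o, dim=1)[:, 0]
```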
Further, a 2-way soft-max layer takes the 2-dimensional vector as input and obtains a probability distribution Mc over two classes [32-34], where Mc = e^{O_a} / (e^{O_a} + e^{O_b}); this is the processing of the soft-max layer. The base patch size w_b is the input size of the CNN and is set to 16. Because all multi-scale patches are adjusted to the same base size, all of the CNNs in Fig. 2 share the same structure.

The overlapping patches from the input image are extracted with three different sizes: 16×16, 32×32 and 64×64. Then, the two larger patch sizes, 32×32 and 64×64, are downsized to 16×16 through a bicubic transformation. Therefore, the patches fed into the CNN are of the same size but reveal different contexts. During training, the patches are rotated by 90°/180° and flipped across the vertical and horizontal axes. The purpose of this step is to introduce invariance to such changes into the CNN. After these pre-processing steps, the patches are fed into the CNN framework for training.

E. Training

Similar to other CNN-based tasks [33-34], the soft-max loss function is employed as the objective of the proposed CNN framework. In this article, stochastic gradient descent is employed to minimize the loss function. The weight decay and the momentum are set to 0.0005 and 0.9, respectively, in our CNN training procedure. The weights are updated using the following rule:

v_{i+1} = 0.9 · v_i − 0.0005 · η · w_i − η · ∂L/∂w_i    (10)

where v is the momentum variable, η is the learning rate, i is the iteration index, L is the loss function, and ∂L/∂w_i is the derivative of the loss function at w_i. In this paper, the proposed CNN framework uses the prevalent deep learning framework Caffe [35]. The parameters of each convolutional layer in the CNN are initialized using the Xavier algorithm [36]. The biases in every convolutional layer are initially set to 0. The learning rate of all of the convolutional layers is equal and is initialized to 0.0001. When the loss reaches a steady state, we manually drop the learning rate by a factor of 10; throughout the training process, the learning rate is dropped once.

To better understand the MSCNN model, we offer a typical output feature map for each convolution layer. The source images A and B shown in Fig. 6 are employed as the inputs. The four corresponding feature maps of each convolution layer are shown in Fig. 6. From the first convolutional layer, we can see that some feature maps catch the high-frequency information of the input image, as illustrated in the first, third and fourth columns, while the second column is similar to the input images. This finding indicates that the first layer cannot adequately characterize the spatial details of the image. The feature maps obtained from the second layer are mainly focused on the extracted spatial details, covering different gradient directions. The output feature maps from the third convolutional layer successfully capture the focus information of the different source images. We can see from the feature map obtained by the fully connected layer that the focus area has become relatively clear.
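Returning to the training procedure, the update rule in Eq. (10) can be written as a short sketch; the array names are illustrative and this is not the Caffe implementation:

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.0001, momentum=0.9, weight_decay=0.0005):
    """One update following Eq. (10): v_{i+1} = 0.9*v_i - 0.0005*lr*w_i - lr*dL/dw_i."""
    v_next = momentum * v - weight_decay * lr * w - lr * grad
    # the weight step w_{i+1} = w_i + v_{i+1} follows the usual momentum convention
    # (an assumption; the corresponding equation is not shown in the extracted text)
    w_next = w + v_next
    return w_next, v_next

# illustrative usage with arrays standing in for one layer's weights and gradient
w = np.zeros((256, 128)); v = np.zeros_like(w); grad = np.random.randn(*w.shape)
w, v = sgd_momentum_step(w, v, grad)
```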
Ⅳ. EXPERIMENTAL RESULTS AND ANALYSIS

In this section, we introduce the evaluation index system and analyse the experimental results as follows.

To verify the validity of the proposed MSCNN-based fusion algorithm, eight pairs of multi-focus images (including colour images and greyscale images) are used in our experiments. The proposed fusion method is compared with four state-of-the-art multi-focus image fusion methods, namely, MWGF [44], SSDI [45], CNN [37] and DSIFT [46]. The detailed analysis and discussions are given below.
Fig. 6 Some typical output feature maps of each convolutional layer. Here, “lay 1”, “lay 2”, “lay 3” and the fully connected layer denote the C1, C2 and C3 layers and the fully connected layer, respectively.
Fig. 7 (a) and (b) show the difference images obtained by subtracting source image A and source image B from each fused image, respectively, and (c) shows the fused image.
A. Comparison with several other algorithms

We compare the validity of the different fusion algorithms in terms of subjectivity first. For this purpose, we mainly provide the “Lab” source image pair as an example to show the difference between the different methods. Fig. 7(c) shows the fused images obtained with the different fusion algorithms. In each of the fused images, the area around the boundary between the focused and defocused regions is magnified and displayed in the lower left corner. The CNN, MWGF and DSIFT based algorithms produce some undesirable artefacts in the fused image (as shown at the right border of the clock), and the artefact is particularly pronounced for the CNN-based and MWGF-based methods. The fusion results based on the MWGF and SSDI fusion methods are blurred in the upper right corner of the clock.

To make a better comparison, Fig. 7(a) and (b) show the difference images obtained by subtracting source image A and source image B from each of the fused images, respectively, and the values of each difference image in Fig. 7 are normalized to the range of 0 to 1. The difference images CNN (b) and DSIFT (b) displayed in Fig. 7 clearly show that the CNN and DSIFT-based methods have a partial residual in the upper right corner. The SSDI-based approach is not sufficient in the integration of the head of the character; the difference image displayed in Fig. 7 SSDI (b) also reveals this limitation. According to Fig. 7 MWGF (b), one can observe that the MWGF-based approach performs well in terms of extracting details, except for the border area. In summary, the fused image obtained by the proposed method has the highest visual quality among all five of these methods, which can be further proved by the difference images displayed in Fig. 7 (a) and (b).

The fused results of the “Temple” image set are shown in Fig. 8. To clearly demonstrate the details of the fused results, partial regions of the fused results are magnified, as shown in Fig. 9. As seen, the fusion results of each algorithm can achieve the goal of image fusion. However, fused images of different quality were produced by the different fusion algorithms, depending on the performances of the various fusion methods. The MWGF-based method produces a fused image with blurring effects, such as the boundary between the focused and defocused parts around the stone lion (see Fig. 9 (c)). Compared with the other algorithms, we can clearly see that the erosion of the stone lion is more serious in the boundary area obtained by the MWGF-based method. Thus, the MWGF-based algorithm often cannot achieve the ideal fused image from the source images.

It can be seen from Fig. 9 (d) that there are many jagged phenomena in the lower right corner of the boundary area. At the same time, the fused image shows two black spots on the left side of the stone (see Fig. 9 (d)), which indicates that the integration of the SSDI-based method is not sufficient. Similarly, we can see from Fig. 9 (e) that the fused image obtained by the DSIFT-based method has many jagged phenomena in the lower right corner of the boundary region. Fig. 9 (a) and (b) show that the fused images of the MSCNN and CNN-based methods appear very good, and the boundaries shown in Fig. 9 (a) and (b) are relatively smooth with respect to the other methods. However, compared with Fig. 9 (b), the contour of the boundary in Fig. 9 (a) is closer to the stone lion.

Finally, because of the superiority of the proposed method, the MSCNN could accurately find the multi-focus boundary between the focused and defocused parts and then obtain a more accurate decision map from the source images than the other four fusion algorithms in this paper. The fused image of the MSCNN-based model shows satisfactory visual quality compared to the other algorithms.

B. Decision map of the proposed method

Since the fused images are difficult to categorize as good or bad, to further prove the validity of the MSCNN model for multi-focus image fusion, we mainly compare the decision maps produced by the various methods. According to the decision map, one can clearly see the advantages and disadvantages of the various fusion methods. The comparison results on eight pairs of input source images are shown in Fig. 10. The “choose-max” strategy is employed as the binary segmentation approach of the proposed fusion algorithm to obtain a binary segmented map from the feature map (as shown in Fig. 3) with a fixed threshold. Thus, for multi-focus image fusion, the binary segmented map/decision map can be considered to be the actual output of our MSCNN model. From the binary segmented map in Fig. 3, we can conclude that the segmented maps gained from the MSCNN model are highly efficacious in that most pixels are classified correctly, which illustrates the effectiveness of the learned MSCNN model.

However, there still exist some defects in the binary segmented map. First, a number of pixels are sometimes misclassified, which leads to the emergence of
holes or small regions in the segmented maps. Therefore, we use mathematical morphology to refine the binary segmentation map and to obtain the final decision map. The final decision maps displayed in the fifth column of Fig. 10, obtained from the MSCNN-based method, are very precise at the boundary (which has been proven to be correct in Fig. 9), which results in the higher visual quality fusion results shown in the last column of Fig. 10.
Fig. 8. The fusion results of all of the algorithms on the “Temple” image set. (a) and (b) are the source images; (c)-(g) are the fusion results of MWGF, SSDI, CNN, DSIFT and MSCNN; (h) clearly shows the boundary between the focused and defocused parts overlaid on the fusion image of MSCNN.
Fig. 9. Magnified regions containing the boundary of the fusion results of all of the algorithms on the “Temple” image set. (a)-(e) are the magnified regions extracted from the fused images obtained by the MSCNN, CNN, MWGF, SSDI and DSIFT-based methods.
Fig. 10 The decision maps obtained by MWGF, SSDI, CNN, DSIFT and MSCNN. (a) Lab. (b) Temple. (c) Seascape. (d) Book.
C. Objective criteria of the proposed method

To prove the validity and practicability of the proposed algorithm, the three indexes of mutual information (MI), QAB/F and Q(A,B,F) are used as the objective evaluation indexes of information fusion performance [40-43]. Q(A,B,F) is a similarity-based quality metric [47]; it is an objective evaluation index of information fusion performance that is based on the structural similarity (SSIM) metric [48] and does not require a reference image. The objective performance on the fused images using the five fusion methods is listed in Table Ⅰ, from which we observe that the MSCNN-based method provides the best fusion results considering the metrics MI and Q(A,B,F). According to the scores of the metric QAB/F, one can observe that the MSCNN-based
method provides satisfactory fusion results for the “Lab”, “Book” and “Leopard” images, while DSIFT outperforms the MSCNN-based method for the “Temple” and “Seascape” images, and CNN outperforms the MSCNN-based method for the “Children” and “Flower” images. The above results indicate that the MSCNN-based method needs to be improved in protecting the edge information during the fusion process, since the metric QAB/F regards a fused image containing all of the input edge information as the ideal fusion result.
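As a rough illustration of how the mutual information criterion can be evaluated, the following is a histogram-based sketch; the exact definition and normalization used in [40-43] may differ:

```python
import numpy as np

def mutual_information(x, y, bins=256):
    """Histogram-based mutual information (in bits) between two greyscale images."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)     # marginal over x
    py = pxy.sum(axis=0, keepdims=True)     # marginal over y
    nz = pxy > 0                            # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])))

# The fusion MI score is commonly taken as MI(A, F) + MI(B, F) for source images A, B
# and fused image F.
```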
classification with deep convolutional neural network. AASRI Procedia, vol. 6, no. 1, pp. 89-94, 2014.
[31] https://en.wikipedia.org/wiki/Deep_learning. 01-Jan-2017.
[32] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun. Overfeat: integrated recognition, localization and detection using convolutional networks. arXiv, vol. 62, pp. 1-16, 2014.
[33] S. Farfade, M. Saberian, L. Li. Multi-view face detection using deep convolutional neural networks. in: Proceedings of the 5th ACM International Conference on Multimedia Retrieval, 2015, pp. 643-650.
[34] J. Long, E. Shelhamer, T. Darrell. Fully convolutional networks for semantic segmentation. in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.
[35] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell. Caffe: convolutional architecture for fast feature embedding. in: Proceedings of the ACM International Conference on Multimedia, 2014, pp. 675-678.
[36] X. Glorot, Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. in: International Conference on Artificial Intelligence and Statistics, 2010.
[37] Liu Y, Chen X, Peng H, Wang Z. Multi-focus image fusion with a deep convolutional neural network. Information Fusion, vol. 36, pp. 191-207, 2017.
[38] Zhang Y, Bai X, Wang T. Boundary finding based multi-focus image fusion through multi-scale morphological focus-measure. Information Fusion, vol. 35, pp. 81-101, 2017.
[39] Z. Wang, S. Wang, Y. Zhu. Multi-focus image fusion based on the improved PCNN and guided filter. Neural Processing Letters, vol. 45, pp. 75-94, 2017.
[40] Yong Yang, Song Tong, Shuying Huang, and Pan Lin. Multi-focus image fusion based on NSCT and focused area detection. IEEE Sensors Journal, vol. 15, no. 5, pp. 2824-2838, 2015.
[41] G. Piella, H. Heijmans. A new quality metric for image fusion. in: Proc. IEEE Int. Conf. Image Process., Amsterdam, The Netherlands, 2003, pp. 173-176.
[42] G. Bhatnagar, Q. M. J. Wu. Directive contrast based multimodal medical image fusion in NSCT domain. IEEE Trans. Multimedia, vol. 15, no. 5, pp. 1014-1024, 2013.
[43] Yang B, Li S. Pixel-level image fusion with simultaneous orthogonal matching pursuit. Information Fusion, vol. 13, pp. 10-19, 2012.
[44] Zhou Z.Q, Li S, Wang B. Multi-scale weighted gradient-based fusion for multi-focus images. Information Fusion, vol. 20, pp. 60-72, 2014.
[45] Guo D, Yan J.W, Qu X. High quality multi-focus image fusion using self-similarity and depth information. Optics Communications, vol. 338, pp. 138-144, 2015.
[46] Liu Y, Liu S, Wang Z. Multi-focus image fusion with dense SIFT. Information Fusion, vol. 23, pp. 139-155, 2015.
[47] C. Yang, J.Q. Zhang, X.R. Wang, X. Liu. A novel similarity based quality metric for image fusion. Information Fusion, vol. 9, no. 2, pp. 156-160, 2008.
[48] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process., vol. 13, no. 4, pp. 600-612, 2004.