Attention-Guided CNN For Image Denoising
Chunwei Tian, Yong Xu, Zuoyong Li, Wangmeng Zuo, Lunke Fei,
Hong Liu
PII: S0893-6080(19)30424-1
DOI: https://doi.org/10.1016/j.neunet.2019.12.024
Reference: NN 4356
Please cite this article as: C. Tian, Y. Xu, Z. Li et al., Attention-guided CNN for image denoising.
Neural Networks (2020), doi: https://doi.org/10.1016/j.neunet.2019.12.024.
d School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
e School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
f Engineering Lab on Intelligent Perception for Internet of Things, Shenzhen Graduate School, Peking University, Shenzhen, 518055, Guangdong, China
Abstract
Deep convolutional neural networks (CNNs) have attracted considerable interest in low-level
computer vision. Research is usually devoted to improving performance via very deep CNNs.
However, as the depth increases, the influence of the shallow layers on the deep layers weakens.
Motivated by this fact, we propose an attention-guided denoising convolutional neural network
(ADNet), mainly comprising a sparse block (SB), a feature enhancement block (FEB), an attention
block (AB) and a reconstruction block (RB) for image denoising. Specifically, the SB makes a
tradeoff between performance and efficiency by using dilated and common convolutions to remove
the noise. The FEB integrates global and local feature information via a long path to enhance the
expressive ability of the denoising model. The AB is used to finely extract the noise information
hidden in the complex background, which is very effective for complex noisy images, especially
real noisy images and blind denoising. Also, the FEB is integrated with the AB to improve the
efficiency and reduce the complexity of training a denoising model. Finally, the RB constructs
the clean image from the obtained noise mapping and the given noisy image. Comprehensive
experiments show that the proposed ADNet performs very well on three tasks (i.e. synthetic and
real noisy images, and blind denoising) in terms of both quantitative and qualitative
evaluations. The code of ADNet is accessible at http://www.yongxu.org/lunwen.html.
Keywords: Image denoising, CNN, Sparse block, Feature enhancement block, Attention block
1. Introduction
Image denoising is a typical problem for low-level vision applications in the real world [52].
Since image denoising is ill-posed and has important practical significance, it has become
a hot topic in the field of image processing and computer vision [53]. Specifically, typical
image denoising methods [28] usually apply the model y = x + υ to recover the latent clean
image, x, where y and υ represent the given noisy image and additive white Gaussian noise with
standard deviation σ, respectively. From the Bayesian point of view, the image prior with
a given likelihood is key to overcoming the ill-posed denoising problem. Inspired by that fact,
many methods based on the image degradation model have been developed to suppress the noise in
noisy images over the past 20 years [8]. Specifically, sparse methods were utilized to improve the
performance and reduce the computational cost [56]. Dong et al. used nonlocal self-similarity to
compute the sparse coefficients, and then centralized the obtained coefficients to recover the
clean image [9]. Taking efficiency into account, a dictionary learning embedded sparse method was
employed to remove the noise from the given image [32]. To deal with practical applications,
Gu et al. [13] extracted more information under different conditions as weights to address the
nuclear norm minimization problem for image denoising. Using detailed image information
to obtain the latent clean image was also a good tool in image denoising [2]. Based on this idea, a
Markov random field (MRF) was combined with the wavelet transform to smooth the noisy image and
filter the noise [33]. Additionally, there were other excellent methods such as total variation
(TV) [4] for image denoising.
∗ Corresponding author
Email addresses: [email protected] (Yong Xu), [email protected] (Zuoyong Li)
Although the methods above have obtained competitive performance in image denoising, most
of them face three major problems: (1) manually selected parameters, (2) complex optimization
algorithms, and (3) restriction to a certain noise task.
Due to their strong representation ability, deep learning methods have become the dominant
techniques for overcoming these drawbacks. Specifically, deep CNNs with flexible modular
architectures are also very popular for image applications, especially image denoising [14, 45].
Zhang et al. [57] first proposed a denoising convolutional neural network (DnCNN), comprising
residual learning (RL) [16] and batch normalization (BN) [18], to recover the corrupted image.
It is noted that DnCNN can use a single model to deal with multiple denoising tasks, such as
Gaussian noisy, JPEG and low-resolution images. However, there is a challenge: as the depth
grows, deep networks may suffer from performance degradation. To resolve this problem, iterative
and skip connection operations in CNNs were developed for image restoration [62]. For example, a
deep recursive residual network (DRRN) utilized global and local RL techniques to enhance the
representation ability of the trained model in image restoration [43]. Similarly, a
deeply-recursive convolution network (DRCN) fused information of each layer into the final layer
by utilizing the RL technique to improve the denoising performance [19]. Also, increasing the
width of the network is useful for mining more information in image denoising [27]. Additionally,
combining the prior and a CNN is also very beneficial for accelerating training in image
restoration [26]. Although the methods above achieve good visual effects in image restoration,
they still suffer from the following drawbacks: (1) Some of the methods, such as the residual
dense network [62], have high computational cost and memory consumption. (2) Some of these deep
networks do not make full use of the effect of shallow layers on the deep layers. (3) Most of
these methods neglect that a complex background can hide some key features.
In this paper, we propose a denoising network, ADNet, which is composed of an
SB, an FEB, an AB and an RB. Specifically, the SB, based on dilated and common convolutions, is
used to extract useful features from the given noisy image. That can improve both the denoising
performance and efficiency. Then, the FEB fuses global and local features via a long path to
enhance the expressive ability of the denoising model. It is noted that a complex background in
the given image or video can easily hide features, which increases the difficulty of training
[24]. Thus, we use an attention mechanism to guide a CNN for image denoising. That is, the
AB can finely extract noise information hidden in the complex background to deal with complex
noisy images, i.e. real noisy images and blind denoising. The FEB is also combined with the
AB to improve the efficiency and reduce the complexity of the denoising model. Finally, the
RB is used to construct the clean image. Additionally, extensive experiments illustrate that our
ADNet outperforms state-of-the-art denoising methods such as block-matching and 3-D filtering
(BM3D) [8] and DnCNN in terms of both quantitative and qualitative analysis. Besides this, the
model is small, which makes it suitable for practical applications, e.g. mobile phones and cameras.
The main contributions of this work can be summarized as follows:
(1) The SB, comprising dilated and common convolutions, is proposed to improve the denoising
performance and the efficiency. It also reduces the depth of the network.
(2) The FEB uses a long path to fuse information from shallow and deep layers to enhance
the expressive ability of the denoising model.
(3) The AB is used to deeply mine noise information hidden in the complex background of
the given noisy images, which is very useful for handling complex noisy images, such as real noisy
images and blind denoising.
(4) The FEB is integrated with the AB to improve the efficiency and reduce the complexity of
training a denoising model.
(5) The ADNet outperforms state-of-the-art methods on six benchmark datasets for synthetic
2. Related Work
number of training samples is also a good choice to suppress the noise. For example, a generative
adversarial network (GAN) [48] comprising a generative network and a discriminative network
was used to generate virtual samples from the training samples in image denoising. The
two sub-networks followed a game-theoretic scheme. That is, the generative network was in charge
of increasing the training samples, and the other network was utilized to verify the reliability
of the generated samples. Additionally, using the skip connection operation in CNNs to enhance
the expressive ability of the denoising model was effective. For example, a very deep persistent
memory network (MemNet) [44] was proposed to enhance the effects of shallow layers on deep layers
through recursive and gate units. A very deep residual encoder-decoder network (RED30) [35]
utilized skip connections between convolutions and deconvolutions to eliminate the noise or
corruption. Further, the combination of a CNN and a signal processing technique was beneficial to
image denoising. A multi-level wavelet CNN (MWCNN) [28] fused the wavelet transform and a U-Net to
extract detailed information of the corrupted image.
To improve the efficiency of the denoising task, deep CNNs can be regarded as a modular part that
is plugged into classical optimization methods for recovering the latent clean image, which is
very effective for coping with noisy images [58]. An image restoration CNN (IRCNN) [58] integrated
a CNN and half quadratic splitting (HQS) to accelerate the training speed for image denoising.
Reducing the number of parameters was also a popular way to improve the efficiency of
training a denoising model. Additionally, small filter sizes have been adopted to reduce the
computational cost and memory consumption. An information distillation network (IDN) [17],
including feature extraction, stacked information distillation and reconstruction blocks, employed
partial filters of size 1 × 1 to distill more useful information for image super-resolution. That
is also very suitable for image denoising. Also, combining the nature of the given task and a CNN
can ease the difficulty of training. For example, a fast and flexible denoising convolutional
neural network (FFDNet) [59] used the noisy image patches and the noise mapping patches as input
of the network to efficiently tackle blind denoising. Further, separating the convolutions can
improve the denoising efficiency [7]. From these descriptions, it can be seen that deep CNNs are
very competitive in both performance and efficiency in image denoising. Motivated by this fact,
we use a deep CNN
the receptive field, i.e. increasing the depth and width of the deep networks. However, the first
way, using deeper CNNs, suffers from difficulty of training. The second way may involve
more parameters and increase the complexity of the denoising model. For these reasons, dilated
convolutions [55] were developed. The principle can be shown with an example. A stack of n
dilated 3 × 3 convolutions with a dilation factor of 2 has a receptive field of size
(4n + 1) × (4n + 1), where n denotes the depth of the given deep CNN. If n is 10, the receptive
field size of the given CNN is 41 × 41. Thus, it can map context information of 41 × 41.
Moreover, it has the same effect as a 20-layer CNN with standard convolutions of 3 × 3 and a
dilation factor of 1. Thus, dilated convolutions make a tradeoff between increasing the depth and
width of deep CNNs. For these reasons, some scholars used dilated convolutions in CNNs for image
processing. To resolve the difficulty of training, Peng et al. [39] proposed to use symmetric
skip connections and dilated convolutions rather than batch normalization [18] to improve the
denoising speed and performance. To reduce the complexity, Wang et al. [50] combined dilated
convolutions and the BN technique to reduce the computational cost in image denoising. For
artifact removal, Zhang et al. [61] utilized a multi-scale loss and dilated convolutions to
eliminate the effect of artifacts in the JPEG compression task. These methods verify the
effectiveness of dilated convolutions in image applications. Thus, we propose a sparse mechanism
based on dilated convolutions to reduce the complexity and improve the denoising performance.
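The receptive-field arithmetic in this paragraph can be checked with a short sketch. It assumes stride-1 convolutions with 3 × 3 filters, so that a layer with dilation factor d widens the receptive field by 2d pixels; the helper name `receptive_field` is introduced here only for illustration.

```python
def receptive_field(dilations, kernel_size=3):
    """Side length of the receptive field of a stack of stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    size = 1
    for d in dilations:
        size += (kernel_size - 1) * d
    return size


n = 10
dilated_stack = receptive_field([2] * n)        # n layers, dilation factor 2
standard_stack = receptive_field([1] * 2 * n)   # 2n layers, dilation factor 1

print(dilated_stack, 4 * n + 1)   # 41 41
print(standard_stack)             # 41
```

Under these assumptions the 10-layer dilated stack and the 20-layer standard stack cover the same 41 × 41 context, which is the tradeoff the paragraph describes.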
features [24]. An attention mechanism was proposed for this reason [63]. There are usually
two kinds of attention mechanisms: attention from different sub-networks and attention from
different branches in the same network.
The first method emphasizes the effect of different perspectives to obtain the features of the
background. For example, Zhu et al. [63] extracted saliency features via different sub-networks
for video tracking.
The second method focuses on the effects of different branches in the same network to guide
the previous stage in image applications. Wang et al. [49] used higher-order statistics to direct
the deep network to improve the performance and efficiency of object detection. Specifically, the
proposed CNN utilized the latter stage to offer complementary information for the previous
network. The two methods can quickly find the object. However, the first method relies on
multiple sub-networks, which may increase the computational cost of training. Moreover, there is
little research on the attention mechanism for image denoising, especially for real noisy images.
For these reasons, we integrate the second method into a CNN for image denoising in this paper.
The SB uses dilated and standard convolutions to enlarge the receptive field size for improving
denoising performance. The FEB integrates the global and local features of ADNet via a long path
to enhance the expressive ability in image denoising. The attention block can quickly capture the
key noisy features hidden in the complex background for complex noisy tasks, such as real noisy
images and blind denoising. The efficiency is improved in three ways. Firstly, the SB
allows the ADNet to have a shallow network architecture. Secondly, the FEB compresses the
output of the sixteenth layer to c channels. Thirdly, the AB uses a convolutional filter of 1 × 1
to reduce the number of parameters. These techniques are introduced in the following sub-sections.
$$O_{SB} = f_{SB}(I_N), \quad (1)$$
where $f_{SB}$ represents the function of the SB. $O_{SB}$ is the output of the SB and serves as
the input of the FEB. The 4-layer FEB makes full use of global and local features of ADNet to
enhance the expressive ability in image denoising, where the global features are the input noisy
image, $I_N$, and $O_{SB}$ is regarded as the local features. This block can be expressed as
$$O_{FEB} = f_{FEB}(I_N, O_{SB}), \quad (2)$$
where $f_{FEB}$ and $O_{FEB}$ denote the function and output of the FEB, respectively. $O_{FEB}$
is fed into the AB. It is noted that a complex background in the given image or video can
easily hide features, which increases the difficulty of extracting key features in the training
process [24]. To overcome this problem, the 1-layer AB is proposed to predict the noise. The AB
can be expressed as
$$I_R = f_{AB}(O_{FEB}), \quad (3)$$
where $f_{AB}$ and $I_R$ stand for the function and output of the AB, respectively. Specifically,
$I_R$ is used as the input of the RB. The RB is utilized to reconstruct the clean image via an RL
technique. This process is illustrated in Eq. (4).
$$\begin{aligned}
I_{LC} &= I_N - I_R \\
       &= I_N - f_{AB}(f_{FEB}(I_N, f_{SB}(I_N))) \\
       &= I_N - f_{ADNet}(I_N),
\end{aligned} \quad (4)$$
where $f_{ADNet}$ is the function of ADNet to predict the residual image. Further, the ADNet is
[Fig. 1: Architecture of ADNet, showing the SB, FEB, AB and RB. Layer labels in the figure: Dilated Conv+BN+ReLU, Conv+BN+ReLU, Conv, Tanh, Cat, ⊗.]
denote the $i$-th given clean and noisy images, respectively. This process
can be formulated as
$$l(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| f_{ADNet}(I_N^i) - (I_N^i - I_C^i) \right\|^2, \quad (5)$$
where $\theta$ stands for the parameters learned in training the denoising model.
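A minimal sketch of Eq. (5) in plain Python may help make the training target concrete: the network output is compared with the true noise, i.e. the noisy image minus the clean image. Representing each image as a flat list of pixel values and the function name `adnet_loss` are simplifications assumed here, not part of the original implementation.

```python
def adnet_loss(predicted_residuals, noisy_images, clean_images):
    """Eq. (5): 1/(2N) * sum_i || f(I_N^i) - (I_N^i - I_C^i) ||^2.

    Every image is a flat list of pixel values (an illustrative simplification).
    """
    n = len(noisy_images)
    total = 0.0
    for pred, noisy, clean in zip(predicted_residuals, noisy_images, clean_images):
        for p, y, x in zip(pred, noisy, clean):
            target = y - x              # true noise (residual) at this pixel
            total += (p - target) ** 2
    return total / (2 * n)


# Two tiny made-up "images" with two pixels each.
clean = [[0.2, 0.4], [0.6, 0.8]]
noisy = [[0.3, 0.3], [0.6, 1.0]]
perfect = [[y - x for y, x in zip(ny, cl)] for ny, cl in zip(noisy, clean)]
zeros = [[0.0, 0.0], [0.0, 0.0]]

print(adnet_loss(perfect, noisy, clean))  # 0.0 when the residual is predicted exactly
print(adnet_loss(zeros, noisy, clean))    # positive when the noise is ignored
```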
It is noted that dilated convolutions can map more context [55]. Based on this idea, these layers
can be regarded as high-energy points. The Conv+BN+ReLU is set at the first, third, fourth, sixth,
seventh, eighth, tenth and eleventh layers in ADNet, which can be treated as low-energy points.
Specifically, the convolution filter size of layers 1-12 is 3 × 3. The number of input channels of
the first layer is c, which is the number of channels of the input noisy image. If the given noisy
image is color, c is 3; otherwise, c is 1. Layers 2-12 have 64 input and 64 output channels. The
combination of several high-energy points and some low-energy points can be considered as
sparsity. Additionally, the sparse block uses a few high-energy points rather than many
high-energy points to capture more useful information, which can not only improve the denoising
performance and efficiency of training, but also reduce the complexity. This is verified in
Section 4.3. The implementation of the sparse block can be written as a formula. First, we define
some symbols as follows. D stands for the function of a dilated convolution. R and B represent
the functions of ReLU and BN, respectively. CBR is the function of Conv+BN+ReLU. Then, according
to the previous descriptions, we use the following equation to show the SB.
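The layer layout of the SB can be written down as a small table-building sketch. It assumes the dilated layers are the second, fifth, ninth and twelfth (as stated in Section 4.3), a dilation factor of 2 for those layers, and 64 output channels for the first layer; the function name and tuple layout are hypothetical.

```python
DILATED_LAYERS = {2, 5, 9, 12}   # high-energy points (assumed dilation factor 2)


def sparse_block_plan(in_channels):
    """Return one (layer, kind, dilation, in_ch, out_ch) tuple per SB layer."""
    plan = []
    for layer in range(1, 13):
        dilated = layer in DILATED_LAYERS
        kind = "DilatedConv+BN+ReLU" if dilated else "Conv+BN+ReLU"
        dilation = 2 if dilated else 1
        in_ch = in_channels if layer == 1 else 64
        plan.append((layer, kind, dilation, in_ch, 64))
    return plan


plan = sparse_block_plan(in_channels=1)   # c = 1 for a grey-scale input
low_energy = [layer for layer, kind, *_ in plan if kind == "Conv+BN+ReLU"]
print(low_energy)                         # [1, 3, 4, 6, 7, 8, 10, 11]

# Receptive field for 3 x 3 filters with stride 1: each layer adds 2 * dilation.
rf = 1 + sum(2 * d for _, _, d, _, _ in plan)
print(rf)                                 # 33
```

With these assumptions the eight low-energy layers match the positions listed in the text, and the resulting 33 × 33 receptive field agrees with the value quoted for the 12-layer SB in Section 4.3.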
is fitted at layers 13-15 of the ADNet, and the filter sizes are 64 × 3 × 3 × 64. The Conv is
used for the sixteenth layer in ADNet and its filter size is 64 × 3 × 3 × c, where the setting of
c is the same as in Section 3.3. Finally, we use a concatenation operation to fuse the input noisy
image and the output of the sixteenth layer to enhance the representation ability of the denoising
model. Thus, the final output size is 64 × 3 × 3 × 2c. Additionally, a Tanh is applied to the
obtained features to introduce nonlinearity. This process is expressed as
where $C$, $Cat$ and $T$ are the functions of a convolution, concatenation and Tanh, respectively.
Cat is used to denote the function of concatenation in Fig. 1. Further, $O_{FEB}$ is used as the
input of the AB.
to guide the CNN for training a denoising model. The AB uses the current stage to guide the
previous stage in learning the noise information, which is very useful for unknown noisy images,
i.e. blind denoising and real noisy images. The 1-layer AB only includes a Conv of size
2c × 1 × 1 × c, where c is the number of channels of the given corrupted image and its value
is the same as in Section 3.3. The AB implements the attention mechanism in two steps.
The first step uses a convolution of size 1 × 1 in the seventeenth layer to compress
the obtained features into a vector that serves as the weights for adjusting the previous stage,
which can also improve the denoising efficiency. The second step multiplies the output
of the sixteenth layer by the obtained weights to extract more prominent noise features. The
procedure can be expressed by the following formulas:
$$O_t = C(O_{FEB}), \quad (8)$$
$$I_R = O_t \times O_{FEB}, \quad (9)$$
where $O_t$ is the output of the convolution in the seventeenth layer of ADNet. Specifically, ⊗ is
used to express the multiplication operator in Fig. 1 rather than '×' in Eq. (9).
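The chain from concatenation to the reconstructed pixel can be traced at a single pixel location with the sketch below. It follows the prose description, in which the 1 × 1 convolution output multiplies the output of the sixteenth layer, and then subtracts the result from the noisy input as in Eq. (4). The concatenation order, the presence of a bias term, and all numeric values are assumptions made for illustration.

```python
import math


def attention_pixel(noisy, layer16, weights, bias):
    """Apply Cat -> Tanh -> 1x1 Conv -> multiply -> subtract at one pixel.

    noisy, layer16: lists of c channel values at the pixel.
    weights: c rows of 2c values (the 1x1 convolution); bias: c values.
    """
    # Cat followed by Tanh gives 2c values (concatenation order is assumed).
    fused = [math.tanh(v) for v in noisy + layer16]
    # A 1x1 convolution is a per-pixel linear map from 2c to c channels.
    o_t = [sum(w * f for w, f in zip(row, fused)) + b
           for row, b in zip(weights, bias)]
    # Attention weights scale the sixteenth-layer output to give the noise estimate.
    residual = [t * f for t, f in zip(o_t, layer16)]
    # RB: latent clean pixel = noisy pixel - predicted noise.
    clean = [y - r for y, r in zip(noisy, residual)]
    return o_t, residual, clean


# Grey-scale pixel (c = 1) with made-up values and weights.
o_t, residual, clean = attention_pixel([0.5], [0.2], weights=[[1.0, 0.0]], bias=[0.0])
print(o_t, residual, clean)
```

Because the convolution is 1 × 1, the same per-pixel map is applied independently at every spatial position, which is why the AB adds very few parameters.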
4. Experiments
4.1. Datasets
4.1.1. Training datasets
We use 400 images with size of 180 × 180 from the Berkeley Segmentation Dataset (BSD)
[36] and 3,859 images from the Waterloo Exploration Database [31] to train the Gaussian synthetic
denoising model. It is noticeable that different areas of an image contain different detailed
information [64]. Inspired by that fact, we divide the training noisy images into 1,348,480
patches of size 50 × 50. Using patches helps learn more robust features and improves the
efficiency of training a denoising model. In addition, noise is varied and complex in the real
world. For this reason, we use 100 real noisy images with size of 512 × 512 from the benchmark
dataset [52] to train a real-noisy denoising model. This dataset was captured by five cameras,
namely Canon 5D Mark II, Canon 80D, Canon 600D, Nikon D800 and Sony A7 II, with different
sensor sizes (e.g. 1,600, 3,200 and 6,400). To accelerate training, the 100 real-noisy
images are also divided into 211,600 patches of size 50 × 50. Additionally, each training image
above is randomly transformed in one of eight ways: the original image, rotations of 90°, 180°
and 270°, the horizontally flipped original image, and rotations of 90°, 180° and 270° combined
with a horizontal flip.
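The eight augmentation modes can be sketched with nested lists standing in for image patches. The rotation direction (counter-clockwise) and the order "rotate, then flip" are assumptions, since the text does not specify them; the function names are illustrative.

```python
import random


def rot90(img):
    """Rotate a 2-D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def flip_h(img):
    """Mirror a 2-D list left to right."""
    return [row[::-1] for row in img]


def augment(img, mode):
    """mode 0-3: rotate by mode * 90 degrees; mode 4-7: the same, then flip horizontally."""
    out = [row[:] for row in img]
    for _ in range(mode % 4):
        out = rot90(out)
    return flip_h(out) if mode >= 4 else out


patch = [[1, 2], [3, 4]]
variants = [augment(patch, m) for m in range(8)]
print(len({str(v) for v in variants}))        # 8 distinct variants

# During training, one mode is drawn at random for each image.
random_variant = augment(patch, random.randrange(8))
```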
shown in Fig. 2. The other datasets are color images. Also, the BSD68 and CBSD68 have the
same scenes. The real-noisy cc dataset is captured by three different cameras (i.e. Nikon D800,
Nikon D600 and Canon 5D Mark III) with different ISO settings (1,600, 3,200 and 6,400). The size
of each real-noisy image is 512 × 512. The nine real-noisy images from the cc dataset are shown in
Fig. 3.
Specifically, the epsilon, beta 1 and beta 2 are parameters of the BN. Also, the batch size and the
number of epochs are 128 and 70, respectively. The learning rate varies from 1e-3 to 1e-5 over the
70 epochs. That is, the learning rate is 1e-3 for the 1st-30th epochs, 1e-4 for the
31st-60th epochs and 1e-5 for the final ten epochs. Further, Adam
[20] is used to optimize the loss function.
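The step schedule above can be expressed as a small lookup function. The 1-based epoch indexing and the function name `learning_rate` are conventions chosen for this sketch.

```python
def learning_rate(epoch):
    """Learning rate for a 1-based epoch index under the 70-epoch schedule."""
    if not 1 <= epoch <= 70:
        raise ValueError("epoch must be in 1..70")
    if epoch <= 30:
        return 1e-3
    if epoch <= 60:
        return 1e-4
    return 1e-5


schedule = [learning_rate(e) for e in range(1, 71)]
print(schedule.count(1e-3), schedule.count(1e-4), schedule.count(1e-5))  # 30 30 10
```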
We apply PyTorch 1.0.1 [38] and Python 2.7 to train and test the proposed ADNet in image
denoising. Specifically, all the experiments are conducted on Ubuntu 14.04 on a PC with an Intel
Core i7-6700 CPU, 16 GB of RAM and an Nvidia GeForce GTX 1080 Ti GPU. Finally, Nvidia
CUDA 8.0 and cuDNN 7.5 are employed to accelerate training on the GPU.
sparse block, feature enhancement block and attention block.
Sparse block: As shown in [54], a sparse mechanism generally meets the following
requirements: 1) fewer high-energy points and more low-energy points; 2) irregular high-energy
points. Motivated by these ideas, we propose a novel sparse mechanism in a CNN for image
denoising. Specifically, because dilated convolutions map more context information, the
second, fifth, ninth and twelfth layers are regarded as high-energy points. The other layers in
the SB are treated as low-energy points. The combination of several high-energy and many
low-energy points can be considered as sparsity. However, how to choose the high-energy points is
important. Here we choose the high-energy points based on the design of the network architecture.
It is noted that as the depth grows, the shallow layers have a weaker effect on the deep layers
in a CNN [44]. If successive dilated convolutions are placed at the shallow layers, the deep
layers cannot fully map the information of the shallow layers. Additionally, if a dilated
convolution is set at the first layer, the first layer would be padded with zeros. That might
reduce the denoising performance. This is verified by Table 1, where 'SB with extended layers' has
a higher peak signal-to-noise ratio (PSNR) value [43] than 'SHE with extended layers'. Here, 'SHE'
denotes successive high-energy points, which consist of dilated convolutions with a dilation
factor of 2 at the second, third, fourth and fifth layers. $PSNR = 10 \log_{10}(MAX^2 / MSE)$,
where $MAX$ is the maximum pixel value of each image and $MSE$ denotes the mean squared error
between a real clean image and a latent clean image [46]. It is noted that the depth of the ADNet
is 17. To keep the same depth as the ADNet, we add the same 3-layer Conv+BN+ReLU and 2-layer
convolutions as the final five layers to conduct the comparative experiments shown in Tables 1
and 3, respectively. Inspired by the idea, we do not apply equidistant points as high-energy
points. That is verified through 'SB with extended layers' and 'EHE with extended layers' as shown
in Table 1, where 'EHE' denotes equidistant high-energy points, i.e. dilated convolutions with a
dilation factor of 2 are used at the second, fifth, eighth and eleventh layers. Combining the
illustrations above, using the second, fifth, ninth and twelfth layers with dilated convolutions
as high-energy points is reasonable. Additionally, the SB is designed according to both the
denoising performance and efficiency.
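The PSNR definition used in these comparisons can be computed as in the following sketch. Images are represented as flat lists of pixel values and the maximum pixel value defaults to 255, both of which are assumptions for illustration.

```python
import math


def psnr(clean, restored, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat lists."""
    mse = sum((a - b) ** 2 for a, b in zip(clean, restored)) / len(clean)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


clean = [0.0, 128.0, 255.0, 64.0]
# Every pixel is off by exactly 25.5, i.e. one tenth of the maximum value.
restored = [v + 25.5 if v < 200 else v - 25.5 for v in clean]
print(round(psnr(clean, restored), 2))  # 20.0
```

An error of one tenth of the dynamic range gives a ratio MAX²/MSE of 100 and hence 20 dB, so the differences of a few hundredths of a dB in Table 1 correspond to very small changes in MSE.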
To improve the denoising performance, the 12-layer SB has a receptive field of size 33 × 33,
which has the same effect as a 16-layer network of standard 3 × 3 convolutions. That reduces the
depth of the network. Also, irregular high-energy points can promote the diversity of the network
architecture, which is useful for improving the performance in low-level tasks [62]. Further, this
is verified by Table 2, where 'ADNet' achieves a greater PSNR value [43] than 'FEB+AB'.
Specifically, 'FEB+AB' is ADNet without the SB. Additionally, the fact that 'SB with extended
layers' outperforms the conventional block ('CB'), as illustrated in Table 1, also supports this,
where the 'CB' is 15-layer Conv+BN+ReLU and two convolutions with size 3 × 3.

Table 1: PSNR (dB) results of several networks on BSD68 with σ = 25.
Methods                     PSNR (dB)
SB with extended layers     29.016
EHE with extended layers    28.886
SHE with extended layers    28.999
ADC with extended layers    28.842
CB                          28.891

Table 2: PSNR (dB) results of different methods on BSD68 with a noise level of 25.
Methods     PSNR (dB)
ADNet       29.119
FEB+AB      29.088
FEB         29.045
RL+BN       28.988
RL          20.477

Table 3: Running time of different networks on noisy images of sizes 256 × 256, 512 × 512 and
1024 × 1024 with σ = 25.
To improve the efficiency, the SB only uses four high-energy points and eight low-energy
points to reduce the running time for each noisy image, which is very competitive
compared with using many high-energy points. Many high-energy points are less efficient because
the dilated convolution of each layer requires additional padding to capture more context
information. Thus, the 'SB with extended layers' is faster than the 'ADC with extended layers' as
shown in Table 3. The ADC uses dilated convolutions for all twelve layers. Additionally, the
receptive field size of 58 × 58 of 'ADC with extended layers' is greater than the given patch
size of 50 × 50, so a patch cannot fully cover the receptive field of the 'ADC with extended
layers'. Thus, the proposed SB has better denoising performance than the ADC as shown in Table 1.
In summary, our proposed SB is reasonable and advantageous in terms of network architecture,
performance and efficiency.
Feature enhancement block: The influence of the network is very important for mining features in
image processing [44]. However, as the network depth grows, the effects of the shallow
layers on the deep layers are weakened. To resolve this problem, Zhang et al. [62] combined
local and global features to enhance the performance of image restoration. Inspired by that, the
feature enhancement block concatenates the input noisy image (regarded as global features) and
the output of the sixteenth layer (treated as local features) to enhance the effect of the shallow
layer, where the concatenation operation is called a long path operation in this paper. The
input noisy image includes strong noise information. Thus, we choose the original input to
complement the deep layer for the denoising task. Finally, the performance of the FEB is verified
in Table 2, where 'FEB' has a higher PSNR than 'RL+BN', and 'RL+BN'
is the combination of the RL and BN techniques. Additionally, the sixteenth layer in
ADNet has c output channels, which is useful for improving the denoising efficiency.
Attention block: The environment (e.g. low light) may increase the noise of an image captured by
a camera. Thus, how to extract useful information from a noisy image with a complex background
is very important. To address this problem, the attentive idea was developed to extract saliency
features for image applications [51]. Specifically, the attentive idea uses the current stage to
guide the previous stage and obtain high-frequency features. From this point of view, we propose
an attention mechanism to extract the latent noise hidden in the complex background. The
detailed information of the AB is given in Section 3.5. Additionally, we can see that the noise in
Fig. 4 is more obvious than that of Fig. 3, where a redder area denotes a higher weight from the
attention mechanism (also regarded as a smaller value in Fig. 4). Moreover, 'FEB+AB' also achieves
a higher PSNR than 'FEB' as illustrated in Table 2. These results show that the proposed AB is
very effective for complex noisy images. Also, the convolution size of 1 × 1 in the AB is combined
with the c output channels of the sixteenth layer in the FEB to reduce the complexity of the
denoising model as shown in Table 4. The 'ADNet' has fewer parameters and FLOPs than the
'Uncompressed ADNet', where the 'Uncompressed ADNet' has 64 output channels in the sixteenth layer
and a filter size of 3 × 3 in the seventeenth layer. Thus, taking both task characteristics
and complexity into account, we consider the proposed ADNet competitive in image denoising,
especially for complex noisy images. Also, the BN is useful for promoting the denoising
performance of ADNet as shown in Table 2.
Table 5: Average PSNR (dB) of different methods on BSD68 with noise levels of 15, 25 and 50.
Methods   BM3D   WNNM   EPLL   MLP    CSF    TNRD   DnCNN  IRCNN  ECNDNet  ADNet  ADNet-B
σ = 15    31.07  31.37  31.21  -      31.24  31.42  31.72  31.63  31.71    31.74  31.56
σ = 25    28.57  28.83  28.68  28.96  28.74  28.92  29.23  29.15  29.22    29.25  29.14
σ = 50    25.62  25.87  25.67  26.03  -      25.97  26.23  26.19  26.23    26.29  26.24
and qualitative analysis. The quantitative analysis compares the PSNR, running time and
complexity of ADNet with those of competing denoising methods, such as BM3D [8], weighted nuclear
norm minimization (WNNM) [13], multi-layer perceptron (MLP) [3], trainable nonlinear
reaction diffusion (TNRD) [6], expected patch log likelihood (EPLL) [64], cascade of shrinkage
fields (CSF) [42], DnCNN [57], IRCNN [58], FFDNet [59], enhanced CNN denoising network
(ECNDNet) [45], CBM3D [8], noise clinic (NC) [22], neat image (NI) [1], RED30 [35], MemNet
Table 6: Average PSNR (dB) results of different methods on Set12 with noise levels of 15, 25 and 50.
Images C.man House Peppers Starfish Monarch Airplane Parrot Lena Barbara Boat Man Couple Average
Noise Level σ = 15
BM3D [8] 31.91 34.93 32.69 31.14 31.85 31.07 31.37 34.26 33.10 32.13 31.92 32.10 32.37
WNNM [13] 32.17 35.13 32.99 31.82 32.71 31.39 31.62 34.27 33.60 32.27 32.11 32.17 32.70
EPLL [64] 31.85 34.17 32.64 31.13 32.10 31.19 31.42 33.92 31.38 31.93 32.00 31.93 32.14
CSF [42] 31.95 34.39 32.85 31.55 32.33 31.33 31.37 34.06 31.92 32.01 32.08 31.98 32.32
TNRD [6] 32.19 34.53 33.04 31.75 32.56 31.46 31.63 34.24 32.13 32.14 32.23 32.11 32.50
DnCNN [57] 32.61 34.97 33.30 32.20 33.09 31.70 31.83 34.62 32.64 32.42 32.46 32.47 32.86
IRCNN [58] 32.55 34.89 33.31 32.02 32.82 31.70 31.84 34.53 32.43 32.34 32.40 32.40 32.77
FFDNet [59] 32.43 35.07 33.25 31.99 32.66 31.57 31.81 34.62 32.54 32.38 32.41 32.46 32.77
ECNDNet [45] 32.56 34.97 33.25 32.17 33.11 31.70 31.82 34.52 32.41 32.37 32.39 32.39 32.81
ADNet 32.81 35.22 33.49 32.17 33.17 31.86 31.96 34.71 32.80 32.57 32.47 32.58 32.98
ADNet-B 31.98 35.12 33.34 32.01 33.01 31.63 31.74 34.62 32.55 32.48 32.34 32.43 32.77
Noise Level σ = 25
BM3D [8] 29.45 32.85 30.16 28.56 29.25 28.42 28.93 32.07 30.71 29.90 29.61 29.71 29.97
WNNM [13] 29.64 33.22 30.42 29.03 29.84 28.69 29.15 32.24 31.24 30.03 29.76 29.82 30.26
EPLL [64] 29.26 32.17 30.17 28.51 29.39 28.61 28.95 31.73 28.61 29.74 29.66 29.53 29.69
MLP [3] 29.61 32.56 30.30 28.82 29.61 28.82 29.25 32.25 29.54 29.97 29.88 29.73 30.03
CSF [42] 29.48 32.39 30.32 28.80 29.62 28.72 28.90 31.79 29.03 29.76 29.71 29.53 29.84
TNRD [6] 29.72 32.53 30.57 29.02 29.85 28.88 29.18 32.00 29.41 29.91 29.87 29.71 30.06
DnCNN [57] 30.18 33.06 30.87 29.41 30.28 29.13 29.43 32.44 30.00 30.21 30.10 30.12 30.43
IRCNN [58] 30.08 33.06 30.88 29.27 30.09 29.12 29.47 32.43 29.92 30.17 30.04 30.08 30.38
FFDNet [59] 30.10 33.28 30.93 29.32 30.08 29.04 29.44 32.57 30.01 30.25 30.11 30.20 30.44
ECNDNet [45] 30.11 33.08 30.85 29.43 30.30 29.07 29.38 32.38 29.84 30.14 30.03 30.03 30.39
ADNet 30.34 33.41 31.14 29.41 30.39 29.17 29.49 32.61 30.25 30.37 30.08 30.24 30.58
ADNet-B 29.94 33.38 30.99 29.22 30.38 29.16 29.41 32.59 30.05 30.28 30.01 30.15 30.46
Noise Level σ = 50
BM3D [8] 26.13 29.69 26.68 25.04 25.82 25.10 25.90 29.05 27.22 26.78 26.81 26.46 26.72
WNNM [13] 26.45 30.33 26.95 25.44 26.32 25.42 26.14 29.25 27.79 26.97 26.94 26.64 27.05
EPLL [64] 26.10 29.12 26.80 25.12 25.94 25.31 25.95 28.68 24.83 26.74 26.79 26.30 26.47
MLP [3] 26.37 29.64 26.68 25.43 26.26 25.56 26.12 29.32 25.24 27.03 27.06 26.67 26.78
TNRD [6] 26.62 29.48 27.10 25.42 26.31 25.59 26.16 28.93 25.70 26.94 26.98 26.50 26.81
DnCNN [57] 27.03 30.00 27.32 25.70 26.78 25.87 26.48 29.39 26.22 27.20 27.24 26.90 27.18
IRCNN [58] 26.88 29.96 27.33 25.57 26.61 25.89 26.55 29.40 26.24 27.17 27.17 26.88 27.14
FFDNet [59] 27.05 30.37 27.54 25.75 26.81 25.89 26.57 29.66 26.45 27.33 27.29 27.08 27.32
ECNDNet [45] 27.07 30.12 27.30 25.72 26.82 25.79 26.32 29.29 26.26 27.16 27.11 26.84 27.15
ADNet 27.31 30.59 27.69 25.70 26.90 25.88 26.56 29.59 26.64 27.35 27.17 27.07 27.37
ADNet-B 27.22 30.43 27.70 25.63 26.92 26.03 26.56 29.53 26.51 27.22 27.19 27.05 27.33
[44] and MWCNN [28] for synthetic images and real noisy images. Specifically, the synthetic
noisy images are of two kinds: gray and color Gaussian synthetic noisy images. Both kinds
include noisy images with a certain noise level and noisy images with varying noise levels
from 0 to 55. A noisy image with varying noise levels is called an image with blind noise. For
the first kind of noisy image, we conduct experiments on BSD68 and Set12, respectively. As shown in
Table 5, ADNet achieves the best performance at noise levels of 15, 25 and 50.
Also, ADNet for blind denoising (ADNet-B) outperforms state-of-the-art denoising methods,
such as DnCNN, at a noise level of 50. Table 6 shows the denoising result for each image
from Set12. From that, we can see that ADNet is superior to DnCNN, IRCNN and FFDNet
in average PSNR for different noise levels (i.e., 15, 25 and 50). Moreover, ADNet-B obtains
the second-best average performance for σ = 25 and σ = 50. This shows that our proposed ADNet is
robust both for images with a certain noise level and for blind denoising. Specifically, red and
blue lines are used to denote the best and second-best denoising results in Tables 5 and 6,
respectively. For the second noisy
Figure 5: Denoising results of different methods on one image from BSD68 with σ = 25. (a) Original
image (b) Noisy image/20.23dB (c) EPLL/37.21dB (d) WNNM/37.26dB (e) TNRD/37.83dB
(f) ECNDNet/38.31dB (g) DnCNN/38.35dB (h) ADNet/38.48dB.
image, we compare several popular denoising methods, e.g., CBM3D, FFDNet, IRCNN, ADNet and ADNet-B,
on three benchmark datasets (i.e., CBSD68, Kodak24 and McMaster) with different noise levels
of 15, 25, 35, 50 and 75. As shown in Table 7, we can see
that ADNet achieves excellent results on color Gaussian synthetic noisy images. Additionally,
our proposed ADNet performs well on real noisy images. As shown in Table 8, ADNet
obtains an improvement of 1.83 dB over DnCNN in PSNR, where popular denoising
Figure 6: Denoising results of different methods on one image from Set12 when σ = 15. (a) Original
image (b) Noisy image/24.64dB (c) EPLL/32.64dB (d) WNNM/32.99dB (e) TNRD/33.04dB
(f) ECNDNet/33.25dB (g) DnCNN/33.30dB (h) ADNet/33.49dB.
methods such as CBM3D, DnCNN, NI and NC are used for comparison in removing
the noise from real noisy images. This indicates that the proposed AB is useful for complex noisy
images, such as real noisy images. Specifically, red and blue lines mark the highest and
second-highest denoising performance in Tables 7, 8 and 9, respectively.
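The evaluation protocol described above (Gaussian noise with a certain level, blind noise with a level between 0 and 55, and PSNR as the metric) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' evaluation code: the function names are hypothetical, the blind noise level is assumed to be drawn uniformly, pixel values are assumed to lie on a 0 to 255 scale, and the noisy image is not clipped.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` (0-255 scale)."""
    noise = rng.normal(0.0, sigma, size=image.shape)
    return image.astype(np.float64) + noise


def add_blind_noise(image, rng, max_sigma=55.0):
    """Draw sigma uniformly from [0, max_sigma] (an assumption), then add Gaussian noise."""
    sigma = rng.uniform(0.0, max_sigma)
    return add_gaussian_noise(image, sigma, rng), sigma


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the two images are identical."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(estimate, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random array used as a stand-in for a clean test image.
    clean = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
    noisy = add_gaussian_noise(clean, 25.0, rng)
    print(f"PSNR of the noisy image: {psnr(clean, noisy):.2f} dB")
```

A denoiser is then scored by calling `psnr(clean, denoised)` on each test image and averaging over the dataset, which is the kind of figure reported in Tables 5 and 6.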
For running time, we choose 11 state-of-the-art denoising methods to conduct the experiments
Figure 7: Denoising results of one color Gaussian noisy image from Kodak24 with σ = 35. (a)
Original image (b) Noisy image/17.40dB (c) CBM3D/32.56dB (d) ADNet/33.71dB.
Table 7: PSNR (dB) results of different methods on CBSD68, Kodak24 and McMaster datasets
for testing the denoising running time on noisy images of different sizes (i.e., 256 × 256, 512 ×
512 and 1024 × 1024), as illustrated in Table 9. From that, we can see that although the proposed
ADNet obtains the second-best result, its running time is very competitive in contrast to other
popular methods. Further, ADNet has lower complexity than state-of-the-art methods, such as
DnCNN and RED30, as shown in Table 4. Thus, ADNet is very effective for image denoising
Figure 8: Denoising results of one color Gaussian noisy image from McMaster with σ = 75. (a)
Original image (b) Noisy image/12.34dB (c) CBM3D/29.64dB (d) ADNet/30.44dB.
Table 9: Running time of 11 popular denoising methods for the noisy images of sizes 256 × 256, 512 × 512 and 1024 × 1024.
Methods       Device  256 × 256  512 × 512  1024 × 1024
BM3D [8]      CPU     0.59       2.52       10.77
WNNM [13]     CPU     203.1      773.2      2536.4
EPLL [64]     CPU     25.4       45.5       422.1
MLP [3]       CPU     1.42       5.51       19.4
TNRD [6]      CPU     0.45       1.33       4.61
CSF [42]      CPU     -          0.92       1.72
DnCNN [57]    GPU     0.0344     0.0681     0.1556
RED30 [35]    GPU     1.362      4.702      15.77
MemNet [44]   GPU     0.8775     3.606      14.69
MWCNN [28]    GPU     0.0586     0.0907     0.3575
ADNet         GPU     0.0467     0.0798     0.2077
denoising method is more robust. Figs. 5 and 6 show the visual results of the gray noisy image
denoising model. Figs. 7 and 8 show the visual results of the color noisy image denoising
model. From these figures, we can see that the results of ADNet are clearer than those of
state-of-the-art denoising methods, such as DnCNN, ECNDNet and CBM3D. Also, red and blue lines are
used to mark the best and second-best denoising results in these figures, respectively. This shows
that our proposed ADNet is effective in qualitative analysis. Based on the presentations above, we
conclude that ADNet is very suitable for image denoising.
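The running-time protocol behind Table 9 (one noisy image at each of the sizes 256 × 256, 512 × 512 and 1024 × 1024) can be sketched as follows. This is only an illustration under stated assumptions: `mean_filter_3x3` is a hypothetical stand-in denoiser, not one of the compared methods, and `time_denoiser`, its repeat count and the choice of reporting the best run are assumptions rather than the measurement procedure used for the table.

```python
import time

import numpy as np


def mean_filter_3x3(image):
    """Stand-in denoiser (3 x 3 mean filter); NOT one of the methods in Table 9."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Sum the nine shifted copies of the image, then average.
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def time_denoiser(denoise, sizes=(256, 512, 1024), repeats=3, seed=0):
    """Return the best wall-clock time in seconds for each square image size."""
    rng = np.random.default_rng(seed)
    timings = {}
    for size in sizes:
        noisy = rng.normal(128.0, 25.0, size=(size, size))
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            denoise(noisy)
            best = min(best, time.perf_counter() - start)
        timings[size] = best
    return timings


if __name__ == "__main__":
    for size, seconds in time_denoiser(mean_filter_3x3).items():
        print(f"{size} x {size}: {seconds:.4f} s")
```

When the denoiser runs on a GPU, kernel launches are typically asynchronous, so the device would need to be synchronized before the clock is read for the timing to be meaningful.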
5. Conclusion
In this paper, we propose an attention-guided denoising CNN, referred to as ADNet, for image
denoising. The main components of ADNet play the following roles. The SB is based on dilated and
common convolutions and makes a tradeoff between denoising performance and efficiency.
The FEB enhances the representation capability of the model on noisy images through
the cooperation of global and local information. The AB extracts the latent noise
information hidden in the complex background, which is very effective for complex noisy images,
such as images with blind noise and real noisy images. After the components above are carried
out, the RB is applied to obtain the resultant clean image. Since the FEB is consolidated by the AB,
the efficiency of training a denoising model is improved and its complexity is reduced. The
experimental results show that our proposed ADNet is very effective for image denoising in terms of
both quantitative and qualitative evaluations.
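The roles of the last two components can be illustrated with a small NumPy sketch. It rests on assumptions that are not spelled out in this section: the AB is taken to apply a 1 × 1 convolution whose output rescales the last FEB feature map element-wise, and the RB is taken to subtract the resulting noise map from the noisy input. All names, shapes and the random, untrained weights are hypothetical.

```python
import numpy as np


def conv1x1(features, weight):
    """1 x 1 convolution: mix channels independently at every pixel.

    features: (in_channels, H, W); weight: (out_channels, in_channels).
    """
    return np.einsum("oc,chw->ohw", weight, features)


def attention_block(fused, last_features, weight):
    """Assumed AB: a 1 x 1 convolution yields per-pixel weights that rescale `last_features`."""
    guidance = conv1x1(fused, weight)
    return guidance * last_features


def reconstruction_block(noisy, predicted_noise):
    """Assumed RB: clean estimate = noisy input - predicted noise map."""
    return noisy - predicted_noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, h, w = 1, 8, 8                                   # gray image; sizes are illustrative
    noisy = rng.normal(size=(c, h, w))
    last_features = rng.normal(size=(c, h, w))          # stand-in for the last FEB convolution output
    # Stand-in for the FEB output: input and features stacked along the channel axis.
    fused = np.tanh(np.concatenate([noisy, last_features]))
    weight = rng.normal(size=(c, 2 * c))                # random, untrained 1 x 1 filter
    noise_map = attention_block(fused, last_features, weight)
    clean = reconstruction_block(noisy, noise_map)
    print(clean.shape)
```

Under these assumptions the attention step adds only `c × 2c` weights, which is consistent with the claim that pairing the FEB with the AB keeps the model compact.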
Acknowledgments
This paper is supported in part by the National Natural Science Foundation of China under
Grant No. 61972187, in part by the Shenzhen Municipal Science and Technology Innovation
Council under Grant No. JCYJ20170811155725434 and in part by the Shenzhen Municipal Science
and Technology Innovation Council under Grant No. GJHZ20180419190732022.
References
[1] ABSoft, N., 2017. Neat image.
[2] Barbu, A., 2009. Learning real-time mrf inference for image denoising. In: 2009 IEEE Conference on Computer
Vision and Pattern Recognition. IEEE, pp. 1574–1581.
[3] Burger, H. C., Schuler, C. J., Harmeling, S., 2012. Image denoising: Can plain neural networks compete with
bm3d? In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp. 2392–2399.
[4] Chambolle, A., 2004. An algorithm for total variation minimization and applications. Journal of Mathematical
imaging and vision 20 (1-2), 89–97.
[5] Chen, C., Xiong, Z., Tian, X., Wu, F., 2018. Deep boosting for image denoising. In: Proceedings of the European
Conference on Computer Vision (ECCV). pp. 3–18.
[6] Chen, Y., Pock, T., 2016. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective
image restoration. IEEE transactions on pattern analysis and machine intelligence 39 (6), 1256–1272.
[7] Cho, S. I., Kang, S.-J., 2018. Gradient prior-aided cnn denoiser with separable convolution-based optimization
of feature dimension. IEEE Transactions on Multimedia 21 (2), 484–493.
[8] Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K., 2007. Image denoising by sparse 3-d transform-domain
collaborative filtering. IEEE Transactions on Image Processing 16 (8), 2080–2095.
[9] Dong, W., Zhang, L., Shi, G., Li, X., 2012. Nonlocally centralized sparse representation for image restoration.
IEEE transactions on Image Processing 22 (4), 1620–1630.
[10] Du, B., Wei, Q., Liu, R., 2019. An improved quantum-behaved particle swarm optimization for endmember
extraction. IEEE Transactions on Geoscience and Remote Sensing.
[11] Ephraim, Y., Malah, D., 1984. Speech enhancement using a minimum-mean square error short-time spectral
amplitude estimator. IEEE Transactions on acoustics, speech, and signal processing 32 (6), 1109–1121.
[12] Franzen, R., 1999. Kodak lossless true color image suite. source: http://r0k.us/graphics/kodak 4.
[13] Gu, S., Zhang, L., Zuo, W., Feng, X., 2014. Weighted nuclear norm minimization with application to image
denoising. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2862–2869.
[14] Guo, S., Yan, Z., Zhang, K., Zuo, W., Zhang, L., 2019. Toward convolutional blind denoising of real
photographs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1712–1722.
[15] He, K., Zhang, X., Ren, S., Sun, J., 2015. Delving deep into rectifiers: Surpassing human-level performance
on imagenet classification. In: Proceedings of the IEEE international conference on computer vision. pp. 1026–
1034.
[16] He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the
IEEE conference on computer vision and pattern recognition. pp. 770–778.
[17] Hui, Z., Wang, X., Gao, X., 2018. Fast and accurate single image super-resolution via information distillation
network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 723–731.
[18] Ioffe, S., Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal
[23] Li, J., Lu, G., Zhang, B., You, J., Zhang, D., 2019. Shared linear encoder-based multikernel gaussian process
latent variable model for visual classification. IEEE transactions on cybernetics.
[24] Li, Y., Chen, X., Zhu, Z., Xie, L., Huang, G., Du, D., Wang, X., 2019. Attention-guided unified network for
panoptic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
pp. 7026–7035.
[25] Liang, X., Zhang, D., Lu, G., Guo, Z., Luo, N., 2019. A novel multicamera system for high-speed touchless
palm recognition. IEEE Transactions on Systems, Man, and Cybernetics: Systems.
[26] Liu, J., Sun, Y., Xu, X., Kamilov, U. S., 2019. Image restoration using total variation regularized deep
image prior. In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP). IEEE, pp. 7715–7719.
[27] Liu, P., Fang, R., 2017. Learning pixel-distribution prior with wider convolution for image denoising. arXiv
preprint arXiv:1707.09135.
[28] Liu, P., Zhang, H., Zhang, K., Lin, L., Zuo, W., 2018. Multi-level wavelet-cnn for image restoration. In: Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 773–782.
[29] Lu, Y., Lai, Z., Li, X., Wong, W. K., Yuan, C., Zhang, D., 2018. Low-rank 2-d neighborhood preserving projec-
tion for enhanced robust image representation. IEEE transactions on cybernetics 49 (5), 1859–1872.
[30] Lu, Y., Lai, Z., Xu, Y., Li, X., Zhang, D., Yuan, C., 2015. Low-rank preserving projections. IEEE transactions
on cybernetics 46 (8), 1900–1913.
[31] Ma, K., Duanmu, Z., Wu, Q., Wang, Z., Yong, H., Li, H., Zhang, L., 2016. Waterloo exploration database: New
challenges for image quality assessment models. IEEE Transactions on Image Processing 26 (2), 1004–1016.
[32] Mairal, J., Bach, F. R., Ponce, J., Sapiro, G., Zisserman, A., 2009. Non-local sparse models for image restoration.
In: ICCV. Vol. 29. Citeseer, pp. 54–62.
[33] Malfait, M., Roose, D., 1997. Wavelet-based image denoising using a markov random field a priori model. IEEE
Transactions on image processing 6 (4), 549–565.
[34] Malfliet, W., Hereman, W., 1996. The tanh method: I. exact solutions of nonlinear evolution and wave equations.
Physica Scripta 54 (6), 563.
[35] Mao, X., Shen, C., Yang, Y.-B., 2016. Image restoration using very deep convolutional encoder-decoder net-
works with symmetric skip connections. In: Advances in neural information processing systems. pp. 2802–2810.
[36] Martin, D., Fowlkes, C., Tal, D., Malik, J., et al., 2001. A database of human segmented natural images and its
application to evaluating segmentation algorithms and measuring ecological statistics. Iccv Vancouver:.
[37] Nam, S., Hwang, Y., Matsushita, Y., Joo Kim, S., 2016. A holistic approach to cross-channel image noise
modeling and its application to image denoising. In: Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition. pp. 1683–1691.
[38] Paszke, A., Gross, S., Chintala, S., Chanan, G., 2017. Pytorch. Computer software. Vers. 0.3 1.
[39] Peng, Y., Zhang, L., Liu, S., Wu, X., Zhang, Y., Wang, X., 2019. Dilated residual networks with symmetric skip
connection for image denoising. Neurocomputing 345, 67–76.
[40] Ren, D., Zuo, W., Hu, Q., Zhu, P., Meng, D., 2019. Progressive image deraining networks: a better and simpler
baseline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3937–3946.
[41] Roth, S., Black, M. J., 2005. Fields of experts: A framework for learning image priors. In: 2005 IEEE Computer
Society Conference on Computer Vision and Pattern Recognition (CVPR’05). Vol. 2. Citeseer, pp. 860–867.
[42] Schmidt, U., Roth, S., 2014. Shrinkage fields for effective image restoration. In: Proceedings of the IEEE
[44] Tai, Y., Yang, J., Liu, X., Xu, C., 2017. Memnet: A persistent memory network for image restoration. In:
Proceedings of the IEEE international conference on computer vision. pp. 4539–4547.
[45] Tian, C., Xu, Y., Fei, L., Wang, J., Wen, J., Luo, N., 2019. Enhanced cnn for image denoising. CAAI Transac-
tions on Intelligence Technology 4 (1), 17–23.
[46] Tian, C., Xu, Y., Zuo, W., 2020. Image denoising using deep cnn with batch renormalization. Neural Networks
121, 461–473.
[47] Tian, C., Zhang, Q., Sun, G., Song, Z., Li, S., 2018. Fft consolidated sparse and collaborative representation for
image classification. Arabian Journal for Science and Engineering 43 (2), 741–758.
[48] Tripathi, S., Lipton, Z. C., Nguyen, T. Q., 2018. Correction by projection: Denoising images with generative
adversarial networks. arXiv preprint arXiv:1803.04477.
[49] Wang, H., Wang, Q., Gao, M., Li, P., Zuo, W., 2018. Multi-scale location-aware kernel representation for object
detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1248–1257.
[50] Wang, T., Sun, M., Hu, K., 2017. Dilated deep residual network for image denoising. In: 2017 IEEE 29th
International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, pp. 1272–1279.
[51] Wang, X., Girshick, R., Gupta, A., He, K., 2018. Non-local neural networks. In: Proceedings of the IEEE
iv:1511.07122.
[56] Zha, Z., Liu, X., Huang, X., Shi, H., Xu, Y., Wang, Q., Tang, L., Zhang, X., 2017. Analyzing the group sparsity
based on the rank minimization methods. In: 2017 IEEE International Conference on Multimedia and Expo
(ICME). IEEE, pp. 883–888.
[57] Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L., 2017. Beyond a gaussian denoiser: Residual learning of
deep cnn for image denoising. IEEE Transactions on Image Processing 26 (7), 3142–3155.
[58] Zhang, K., Zuo, W., Gu, S., Zhang, L., 2017. Learning deep cnn denoiser prior for image restoration. In:
Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3929–3938.
[59] Zhang, K., Zuo, W., Zhang, L., 2018. Ffdnet: Toward a fast and flexible solution for cnn-based image denoising.
IEEE Transactions on Image Processing 27 (9), 4608–4622.
[60] Zhang, L., Wu, X., Buades, A., Li, X., 2011. Color demosaicking by local directional interpolation and nonlocal
adaptive thresholding. Journal of Electronic imaging 20 (2), 023016.
[61] Zhang, X., Yang, W., Hu, Y., Liu, J., 2018. Dmcnn: Dual-domain multi-scale convolutional neural network for
compression artifacts removal. In: 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE,
pp. 390–394.
[62] Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y., 2018. Residual dense network for image super-resolution. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2472–2481.
[63] Zhu, Z., Wu, W., Zou, W., Yan, J., 2018. End-to-end flow correlation tracking with spatial-temporal attention.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 548–557.
[64] Zoran, D., Weiss, Y., 2011. From learning models of natural image patches to whole image restoration. In: 2011
International Conference on Computer Vision. IEEE, pp. 479–486.
Dear Editors:
I would like to submit the manuscript entitled “Attention-guided CNN for
Image Denoising”, which I wish to be considered for publication in “Neural
Networks”. No conflict of interest exists in the submission of this manuscript, and
the manuscript is approved by all authors for publication. I would like to declare on
behalf of my co-authors that the work described is original research that has not
been published previously and is not under consideration for publication elsewhere, in
whole or in part. All the authors listed have approved the manuscript that is enclosed.
In this paper, we propose an attention-guided denoising
convolutional neural network (ADNet), mainly including a sparse
block (SB), a feature enhancement block (FEB), an attention block
(AB) and a reconstruction block (RB) for image denoising.
Specifically, the SB makes a tradeoff between performance and
efficiency by using dilated and common convolutions to remove the
noise. The FEB integrates global and local feature information via a
long path to enhance the expressive ability of the denoising model.
The AB is used to finely extract the noise information hidden in the
complex background, which is very effective for complex noisy
images, especially real noisy images and blind denoising. Also, the
FEB is integrated with the AB to improve the efficiency and reduce the
complexity of training a denoising model. Finally, the RB aims to
construct the clean image through the obtained noise mapping and
receiving comments from the reviewers. If you have any queries, please don’t hesitate