NERNet: Noise Estimation and Removal Network for Image Denoising

J. Vis. Commun. Image R. 71 (2020) 102851
https://doi.org/10.1016/j.jvcir.2020.102851

Article history: Received 20 January 2020; Revised 8 May 2020; Accepted 27 June 2020; Available online 20 July 2020.

Keywords: Image denoising; Convolutional neural networks; Attention mechanism; Dilated convolution; Dilation rate selecting

Abstract

While some denoising methods based on deep learning achieve superior results on synthetic noise, they are far from able to deal with photographs corrupted by realistic noise. Denoising real-world noisy images poses greater challenges because the sources of realistic noise are more complicated than those of synthetic noise. To address this issue, we propose a novel network comprising a noise estimation module and a removal module (NERNet). The noise estimation module automatically estimates the noise level map corresponding to the information extracted by a symmetric dilated block and a pyramid feature fusion block. The removal module focuses on removing the noise from the noisy input with the help of the estimated noise level map. The dilation selective block with an attention mechanism in the removal module not only adaptively fuses features from convolution layers with different dilation rates, but also aggregates global and local information, which benefits the preservation of details and textures. Experiments on two datasets of synthetic noise and three datasets of realistic noise show that NERNet achieves competitive results in comparison with other state-of-the-art methods.

© 2020 Elsevier Inc. All rights reserved.
Fig. 1. Comparison of different noises. Realistic noise is far different from synthetic noise.
by adding a fusion sub-network that concatenates features of different levels in the backbone.

It is well known that deeper networks have better learning capabilities. The above methods have simple architectures but still achieve better performance than traditional methods. To effectively stimulate the potential of deep networks, Mao et al. [14] designed a very deep network with an encoder-decoder architecture (RED), which has a strong learning capacity. Thanks to its deep structure, RED can use a single model to handle different levels of noise. However, it is challenging to recover edges and textures from noisy images; some details are inevitably lost during the forward propagation as the number of convolution layers increases. To address this, Tai et al. [15] proposed MemNet, consisting of a recursive unit and a gate unit, to adaptively learn the weights of different features. This novel architecture, modeled after human memory, can take advantage of multi-scale information and addresses the long-term memory problem in DnCNN, IRCNN and RED. Inspired by MemNet, Jia et al. [16] developed FOCNet, which includes a multi-scale memory system that fuses features from previous layers and different scales, by solving a fractional-order optimal control (FOC) problem.
Regardless of whether they are shallow or deep, these learning-based networks all focus on AWGN without spatial variance. Zhang et al. [21] proposed FFDNet, which performs better on spatially variant noise by transforming the noise into a noise level map and taking it as input. This conversion introduces more information about the noise, which is beneficial to estimating it. However, FFDNet is not efficient when the noise standard deviation is unknown, and this increases the difficulty of training the model.
2.3. Denoising of real images

Because the sources of realistic noise are far from AWGN, denoising of real images is more difficult, and more significant, than removing synthetic noise. Some PCA-based methods [22,23] provide motivation to deal with this issue; they comprise two stages, noise estimation and denoising. Similarly, Foi et al. [24] proposed local estimation of multiple expectation/standard-deviation pairs, with a global parametric model realizing the denoising task. Guo et al. [25] designed CBDNet, which also contains two sub-networks, i.e., a noise estimation sub-network and a blind denoising sub-network. What's more, a novel loss function helps avoid inaccurate noise levels predicted by the noise estimation sub-network.
3. Proposed method

This section presents our NERNet, consisting of the noise estimation module and the removal module. To begin with, we introduce the general network architecture of NERNet; we then describe each module in detail.
3.1. Network architecture

As illustrated in Fig. 2, the proposed NERNet includes the noise estimation module and the removal module. First, the noise estimation module produces the estimated noise level map \hat{\sigma}(y) from a noisy observation x, which can be formulated as:

\hat{\sigma}(y) = M_e(x; W_e)    (1)

where M_e(\cdot) denotes the noise estimation module and W_e is the network parameters of the noise estimation module.

Then, the removal module takes both x and \hat{\sigma}(y) as input to predict the clean image \hat{y}:

\hat{y} = M_r(\mathrm{Concat}(x, \hat{\sigma}(y)); W_r)    (2)

where M_r(\cdot) denotes the removal module, the Concat operation concatenates the noisy input x and the estimated noise level map \hat{\sigma}(y), and W_r is the network parameters of the removal module.

In the noise estimation module, the mean absolute error (MAE) between the real noise level map \sigma(y) and the estimated noise level map \hat{\sigma}(y) serves as the loss function:

Loss_e = \frac{1}{N} \sum_{i=1}^{N} \| \hat{\sigma}(y) - \sigma(y) \|_1    (3)

It should be noted that the ground truth \sigma(y) used for synthetic noise and for realistic noise is different. For synthetic noise, the ground truth \sigma(y) is the noise with the specific standard deviation. For realistic noise, the ground truth \sigma(y) is the difference x - y obtained by subtracting the clean image y from the noisy image x.

What's more, in the removal module, the mean absolute error (MAE) between the clean image y and the estimated clean image \hat{y} serves as the loss function:

Loss_r = \frac{1}{N} \sum_{i=1}^{N} \| \hat{y} - y \|_1    (4)
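For illustration, a minimal PyTorch sketch of Eqs. (1)-(4) is given below. The two sub-modules are simplified stand-ins (a few plain convolutions), not the actual NERNet layers described in Sections 3.2 and 3.3, and all names are ours.

```python
import torch
import torch.nn as nn

class TwoStageSketch(nn.Module):
    """Minimal stand-in for the two-stage pipeline of Eqs. (1)-(2);
    the real estimation/removal modules are described in Sections 3.2-3.3."""
    def __init__(self, channels=3):
        super().__init__()
        # Placeholder noise estimation module M_e (Eq. (1)).
        self.estimate = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, 3, padding=1))
        # Placeholder removal module M_r; its input is Concat(x, sigma_hat) (Eq. (2)).
        self.remove = nn.Sequential(
            nn.Conv2d(2 * channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, 3, padding=1))

    def forward(self, x):
        sigma_hat = self.estimate(x)                           # Eq. (1)
        y_hat = self.remove(torch.cat([x, sigma_hat], dim=1))  # Eq. (2)
        return sigma_hat, y_hat

# MAE losses of Eqs. (3)-(4); sigma_gt and y denote the ground-truth noise
# level map and the clean image (for realistic noise, sigma_gt = x - y).
mae = nn.L1Loss()
# loss_e, loss_r = mae(sigma_hat, sigma_gt), mae(y_hat, y)
```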
3.2. Noise estimation module

In the task of predicting the noise level map, sufficient contextual information is beneficial for acquiring accurate noise and inferring high-frequency components. To make full use of features from different layers and scales, we design a symmetric dilated convolution block and a pyramid feature fusion block. In the remainder of this subsection, we introduce the design idea and function of each block in detail.

3.2.1. Symmetric dilated convolution block

Currently, the effective way to expand the receptive field is dilated convolution [26], which covers more spatial information of the feature map without adding extra parameters or computational pressure. However, stacking convolution layers with the same dilation rate causes the gridding phenomenon. Inspired by multiple dilated convolutional blocks [27], we find that combining different dilation rates improves the receptive field and avoids the gridding phenomenon.

To extract more discriminative features from noisy input images, we propose a symmetric dilated convolutional block including five DCRs, as shown in Fig. 2. In each convolution layer of a DCR, the kernel size is 3×3, the dilation rates are set to 1, 2, 3, 2, 1, and the number of channels is set to 64. We can calculate that the receptive field of each layer is 3, 10, 21, 28, 31 according to the formula:

l_k = l_{k-1} + (f_k - 1) \sum_{i=1}^{k} s_i    (5)

where l_k and f_k denote the receptive field and the kernel size of layer k, and s_i represents the stride of pooling layer i.

What's more, we concatenate the feature maps extracted from each convolution layer of the symmetric dilated convolutional block:

F_0 = \mathrm{Concat}(f_1, \ldots, f_5)    (6)

where the Concat operation autonomously learns the weight value of each feature, f_i is the feature map extracted from each layer, and F_0 is the feature sent to the next block.
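A minimal PyTorch sketch of this dilation pattern follows; the 3×3 kernels, 64 channels and the 1, 2, 3, 2, 1 rates come from the text, while the exact composition of a DCR (e.g., the placement of the activation) is our assumption.

```python
import torch
import torch.nn as nn

class SymmetricDilatedBlock(nn.Module):
    """Sketch of the symmetric dilated block: five 3x3 conv layers (64 channels)
    with dilation rates 1,2,3,2,1; the five outputs are concatenated (Eq. (6))."""
    def __init__(self, in_ch=64, ch=64, rates=(1, 2, 3, 2, 1)):
        super().__init__()
        self.layers = nn.ModuleList()
        c = in_ch
        for r in rates:
            # padding=dilation keeps the spatial size constant for a 3x3 kernel
            self.layers.append(nn.Sequential(
                nn.Conv2d(c, ch, 3, padding=r, dilation=r),
                nn.ReLU(inplace=True)))
            c = ch

    def forward(self, x):
        feats = []
        for layer in self.layers:
            x = layer(x)
            feats.append(x)           # f_1, ..., f_5
        return torch.cat(feats, 1)    # F_0 = Concat(f_1, ..., f_5), Eq. (6)
```

Setting the padding equal to the dilation rate keeps the spatial size fixed, so the five outputs f_1, ..., f_5 can be concatenated directly as in Eq. (6).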
(Legend of Fig. 2: DCR-N is a dense convolutional block with dilation factor N; other elements are 1×1 convolution, ReLU, N×N average pooling, 2×2 and 4×4 upsampling, concatenation, and the dilation selective block.)
Fig. 2. The architecture of NERNet. Given a noisy image, the noise estimation module produces the noise-level feature, and the removal module predicts the final clean image.
3.2.2. Pyramid feature fusion block

As one of the classic network architectures, the pyramid pooling module [28] has been widely applied to image segmentation and object detection, but it had not been used for image denoising. Its foremost advantage is that spatial information at different scales can be fully extracted, which improves robustness to different object sizes. On the other hand, image details are lost through the multiple pooling operations. To address this issue while taking advantage of the pyramid pooling module, we design the pyramid feature fusion block. Our module consists of three branches with different pooling kernel sizes, set to 1×1, 2×2 and 4×4. The smaller feature obtained after pooling is then concatenated with the upper-layer feature; this operation makes full use of feature information from multiple scales and reduces the loss of details as much as possible. Last but not least, we estimate the noise-level feature with a 1×1 convolution layer.

The features of the different branches and the final output feature of the noise level can be expressed as follows:

A_i = \mathrm{Avg}_i(F_0)    (7)

F_1 = w_c \cdot \mathrm{Concat}(A_1, A_2, A_4) + b_c    (8)

where Avg_i means the average pooling layer with kernel size i, A_i is the feature after the Avg_i operation, the Concat operation concatenates the features extracted from the different branches, w_c is a filter of size 1×1, b_c is the bias, and F_1 is the noise-level feature.
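The sketch below shows one plausible PyTorch realization of this block (Eqs. (7)-(8)); the upsampling mode and the channel counts are assumptions based on our reading of Fig. 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidFeatureFusion(nn.Module):
    """Sketch of the pyramid feature fusion block: average pooling at kernel
    sizes 1/2/4 (Eq. (7)), upsampling back to the input size, concatenation,
    and a 1x1 conv (w_c, b_c) producing the noise-level feature F_1 (Eq. (8))."""
    def __init__(self, in_ch=320, out_ch=1):  # 5 x 64 channels assumed from F_0
        super().__init__()
        self.fuse = nn.Conv2d(3 * in_ch, out_ch, kernel_size=1)  # w_c, b_c

    def forward(self, f0):
        h, w = f0.shape[2:]
        branches = []
        for k in (1, 2, 4):                                     # A_i = Avg_i(F_0)
            a = F.avg_pool2d(f0, kernel_size=k)
            a = F.interpolate(a, size=(h, w), mode='nearest')   # back to input size
            branches.append(a)
        return self.fuse(torch.cat(branches, 1))                # F_1, Eq. (8)
```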
3.3. Removal module

We acquire the noise-level feature after the noisy input has passed through the noise estimation module. The next step is to build a removal module to remove the noise from the noisy input. We choose U-Net [29] as the basic framework and modify it by using a dense block and a dilation selective block. In the remainder of this subsection, we introduce the structure of each block in detail.

3.3.1. Dense block

Our dense block follows the dense connection structure in DenseNet [30]. As shown by the lines in Fig. 3, we concatenate the features obtained from the former layers with the feature acquired by the current layer, which can be expressed as:

x_i = w_c \cdot \mathrm{Concat}(x_0, \ldots, x_{i-1}) + b_c, \quad i \in \{1, 2, 3, 4\}    (9)

where w_c is a filter of size 1×1 and b_c is the bias; the Concat operation concatenates the features extracted from the different convolution layers of the dense block. Furthermore, each convolution layer has a 3×3 kernel size and padding 1, ensuring that the size of the feature map stays consistent during the forward propagation.
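As a sketch of the dense connectivity in Eq. (9) (the layer count and width are our assumptions, and the 3×3 convolution and the activation are folded into one step):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Sketch of the dense block in the removal module: each 3x3 conv
    (padding 1, so feature size is preserved) sees the concatenation of all
    earlier features, in the spirit of DenseNet [30] and Eq. (9)."""
    def __init__(self, ch=64, layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch * (i + 1), ch, 3, padding=1),
                          nn.ReLU(inplace=True))
            for i in range(layers))

    def forward(self, x0):
        feats = [x0]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, 1)))  # x_i from Concat(x_0..x_{i-1})
        return feats[-1]
```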
Fig. 4. The structure of our dilation selective block. 'FC' denotes the fully connected layer.
3.3.2. Dilation selective block

As described in Section 3.2.1, dilated convolution is good for enlarging the receptive field. To select among the features produced by convolution layers with different dilation rates, we propose the dilation selective block, inspired by SKNet [31].

As shown in Fig. 4, the input of our dilation selective block is the feature F_2 output by two dense blocks and two average pooling layers. It is divided into three parts by three parallel convolution layers with dilation rates 1, 3 and 5, respectively, giving the middle features F_2^{I}, F_2^{II} and F_2^{III}. We add the three parts element-wise to get the feature F_3:

F_3 = F_2^{I} + F_2^{II} + F_2^{III}    (10)

Then we feed F_3 into an attention mechanism that includes a local attention block and a global attention block, as shown in Fig. 5. In addition to the local attention, which yields the local information F_{local}, we also extract global information F_{global} by using a global average pooling (GAP) layer:

F_{global} = \mathrm{GAP}(F_3)    (12)

We combine the local information F_{local} and the global information F_{global} to acquire the attention weight F_{fusion}. Then F_{fusion} is shrunk and expanded by passing through two fully connected layers, and afterwards a Softmax produces three weight outputs k', l' and m'. Specifically, the Softmax operation is applied to the channel-wise logits:

k_c = \frac{e^{k'_c}}{e^{k'_c} + e^{l'_c} + e^{m'_c}}    (13)

with l_c and m_c defined analogously.
Fig. 5. The operation of the attention mechanism in the dilation selective block. 'GAP' denotes the global average pooling operation.
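A compact PyTorch sketch of the selection mechanism (Eqs. (10), (12), (13)) follows; the local attention branch is reduced here to a single 1×1 convolution, and the reduction ratio of the fully connected layers is an assumption.

```python
import torch
import torch.nn as nn

class DilationSelectiveBlock(nn.Module):
    """Sketch of the dilation selective block in the SKNet style [31]: three
    parallel dilated convs (rates 1/3/5), element-wise sum (Eq. (10)), global
    average pooling (Eq. (12)), FC shrink/expand, and a channel-wise softmax
    over the three branch weights (Eq. (13))."""
    def __init__(self, ch=64, reduction=4):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 3, 5))
        self.local = nn.Conv2d(ch, ch, 1)       # simplified local attention
        self.gap = nn.AdaptiveAvgPool2d(1)      # global average pooling, Eq. (12)
        self.fc = nn.Sequential(                # shrink, then expand to 3*ch logits
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, 3 * ch))

    def forward(self, f2):
        feats = [b(f2) for b in self.branches]  # F_2^I, F_2^II, F_2^III
        f3 = feats[0] + feats[1] + feats[2]     # Eq. (10)
        fusion = self.gap(self.local(f3)).flatten(1)  # local + global information
        w = self.fc(fusion).view(-1, 3, f3.shape[1], 1, 1)
        w = torch.softmax(w, dim=1)             # channel-wise softmax, Eq. (13)
        return sum(w[:, i] * feats[i] for i in range(3))
```

As in SKNet, the softmax couples the three branch weights per channel, so each output channel is a convex combination of the three dilated responses.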
Table 1
Average PSNR (dB) achieved by various algorithms on two synthetic noise datasets.

Dataset  Noise level  BM3D [2]  IRCNN [12]  DnCNN [11]  FFDNet [21]  HRLNet [13]  FOCNet [16]  FFDNet*  NERNet*  NERNet
BSD68    15           33.52     33.86       33.89       33.87        33.83        33.07        33.85    34.08    34.03
BSD68    25           30.69     31.16       31.23       31.21        31.14        30.73        31.20    31.39    31.37
BSD68    50           27.37     27.86       27.92       27.96        27.86        27.43        27.96    28.22    28.12
Set14    15           32.37     32.77       32.86       32.77        32.96        32.98        32.74    33.34    33.32
Set14    25           29.97     30.38       30.43       30.44        30.51        30.53        30.41    30.59    30.57
Set14    50           26.72     27.14       27.18       27.32        27.34        27.33        27.31    27.37    27.36

Bold font indicates the best result in each comparison or ablation experiment.
Fig. 6. Comparison of our model with other methods on an image from Set14. The region in the blue box shows a more detailed part of the result. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 7. Comparison of our model with other methods on an image from BSD68. The region in the orange box shows a more detailed part of the result. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
with other state-of-the-art methods on synthetic-noise and realistic-noise denoising. Finally, we illustrate the function of each module through ablation studies.

4.1. Datasets

It is well known that the quality of the datasets is vital to the performance of an image denoising model. We select source-image datasets according to the different tasks.

For denoising synthetic noise, we collected 1600 images from Waterloo [33], 1600 images from MIT-Adobe FiveK [34] and 500 images from ImageNet [35] as training data. We use two datasets, namely BSD68 [37] and Set14 [36], to test the ability to remove synthetic noise. The BSD68 dataset consists of 68 color images from the BSD300 dataset [37]. The Set14 dataset includes 14 images and is widely used for evaluating denoising ability. What's more, the images are neither resized nor cropped in the training and testing phases.

For denoising realistic noise, we selected RENOIR [38] as training data. Three datasets of real-world noisy images, i.e., NC12 [39], Nam [40] and SIDD [41], are adopted for testing. The NC12 dataset includes 12 realistic noise images without ground truth. The Nam dataset consists of 11 static scenes, and images of the same scene mostly contain similar objects and textures; for each scene, 500 temporal images were captured to compute the ground-truth image and the noise image. The SIDD dataset contains 320 image pairs for training, 1280 images for validation and 1280 images for testing. We crop images to 350×350 in the training and testing phases.
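For the synthetic-noise experiments, training pairs are produced by corrupting clean images with AWGN; a sketch is shown below, where the blind sampling range is our assumption rather than a value stated in the paper.

```python
import torch

def add_awgn(clean, sigma=None, blind_range=(0, 55)):
    """Corrupt a [0,1] image tensor with AWGN. A fixed sigma gives the
    non-blind setting; sigma=None samples a random level for blind training
    (the (0, 55) range is an assumption)."""
    if sigma is None:
        sigma = float(torch.empty(1).uniform_(*blind_range))
    noise = torch.randn_like(clean) * (sigma / 255.0)
    return clean + noise, sigma
```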
4.2. Parameter settings for network training

We adopt the ADAM algorithm with the exponential decay rate set to 0.9 to train our model. The method described in [42] is used for weight initialization. The batch size is set to 8, and the number of epochs is set to 80. For the symmetric dilated convolutional module and the spatial feature pyramid module, we set the initial learning rate of the feature extraction module to 10^{-3}; the initial learning rate of the inference module is set to 5×10^{-3}. Both learning rates are multiplied by 0.1 every 20 epochs.

We implement our model with the PyTorch package. All experiments are carried out in an Ubuntu 16.04 + Python 3.6 environment on a PC with an Intel i7-9700K CPU, an Nvidia GeForce RTX 2080Ti GPU and 32 GB of Galaxy GAMER RAM. It takes about 36 h to train our model on the GPU with CUDA 8.0.61 + cuDNN v6.1 acceleration.
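In PyTorch, the schedule above can be reproduced roughly as follows; the two stand-in modules merely mark the two parameter groups, while the optimizer and decay settings mirror the stated configuration.

```python
import torch
import torch.nn as nn

# Stand-in modules; in NERNet these would be the estimation and removal modules.
estimation = nn.Conv2d(3, 3, 3, padding=1)
removal = nn.Conv2d(6, 3, 3, padding=1)

optimizer = torch.optim.Adam([
    {'params': estimation.parameters(), 'lr': 1e-3},  # feature extraction modules
    {'params': removal.parameters(), 'lr': 5e-3},     # inference module
], betas=(0.9, 0.999))  # exponential decay rate 0.9, as in the paper

# Multiply both learning rates by 0.1 every 20 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(80):  # 80 epochs, batch size 8 in the paper
    # ... one training pass over the data would go here ...
    scheduler.step()
```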
4.3. Comparison with state-of-the-art methods

4.3.1. Experiment on synthetic noisy images

In this subsection, we evaluate our model on synthetic noisy images corrupted by spatially invariant AWGN. Unlike other methods, we train our model with both fixed-level AWGN (non-blind) and random-level AWGN (blind), rather than only fixed-level AWGN. We then evaluate the denoising ability at different AWGN levels.
Fig. 8. Comparison of our model with other methods on an image from NC12. The region in the green box shows a more detailed part of the result. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
As the PSNR results on BSD68 and Set14 in Table 1 show, our model achieves very competitive results at all noise levels. Our network exceeds BM3D by about 0.7 dB over a wide range of noise levels on BSD68 and Set14, which proves the powerful performance of deep learning. Furthermore, our model produces better images, as shown in Figs. 6 and 7.

Compared with FFDNet, the advantages of NERNet are twofold: (i) thanks to the noise estimation module, our method does not need to know the noise level in advance, i.e., it performs blind denoising; (ii) the structure of the removal module is more complex than FFDNet's and performs better on denoising. To further prove that our method is superior to FFDNet, we conduct relevant experiments. First, we remove the noise estimation module and keep only the removal module, calling it NERNet*. Then, we feed the specific noise level used in FFDNet into NERNet*. As shown in Table 1, NERNet* performs better than both FFDNet and NERNet, which demonstrates that the architecture of the removal module is better than FFDNet's; the contrast with NERNet also confirms that a more accurate noise level map yields better results. Last but not least, we feed the noise level map estimated by our noise estimation module into FFDNet, calling it FFDNet*. The result is very close to FFDNet, as shown in Table 1, which proves that our noise estimation module predicts an accurate noise level map.

4.4. Experiment on realistic noisy images

Because realistic image noise comes from multiple sources, denoising realistic noisy images is difficult. In this subsection, we further verify the robustness of our model towards realistic noise, which is more valuable in practical applications. We select NC12, Nam and SIDD as testing datasets, as they have been widely used by other denoising methods.

NC12: We only report the denoising results through the denoised images, because the ground-truth images of NC12 are unavailable. We compare different methods, including a traditional method (BM3D) and deep learning methods (DnCNN, FFDNet and CBDNet). The images produced by the different methods are shown in Fig. 8. The result of BM3D is still fuzzy. DnCNN trained on synthetic noise cannot effectively remove the noise, and several light spots appear on some patches in the image.
Fig. 9. Comparison of our model with other methods on an image from SIDD. The result predicted by our model shows the best visual effects.
Table 2
The PSNR/SSIM achieved by various methods on the SIDD dataset.
Bold font indicates the best result in each comparison or ablation experiment.
Table 3
Average PSNR (dB) achieved by different methods on the Nam dataset.

Method   BM3D [2]   DnCNN [11]   FFDNet [21]   CBDNet [25]   NERNet
PSNR     37.30      35.55        38.70         39.01         40.10

Bold font indicates the best result in each comparison or ablation experiment.
The image produced by FFDNet is darker than the original noisy image, which worsens the visual effect. CBDNet performs poorly on some details of the image. Compared with the above methods, our method achieves better visual effects and preserves more information across the whole image.

SIDD: We compare the PSNR and SSIM of different methods on the SIDD validation dataset, as shown in Table 2. Our model clearly achieves the best result. Moreover, we select three subtle regions of the image to fully reveal the abilities of the different methods, as shown in Fig. 9. One can see that NERNet generates more accurate details and textures than the other models.

Nam: In CBDNet, 25 patches of the cropped images are randomly selected for testing rather than using the whole images. This testing protocol cannot completely reflect denoising ability. To address this issue and reflect the results more accurately, we crop images into 1/16 of the original size and use all of them for evaluation. The average PSNR and SSIM scores are presented in Table 3, and we can conclude that our method performs better than the other methods under the different evaluation indicators. The visual denoising results produced by our model have more accurate details, as shown in Fig. 10. What's more, our model is a blind denoising method and thus more flexible than non-blind methods.
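For reference, PSNR, the main evaluation indicator used in Tables 1-3, can be computed as follows for images scaled to [0, 1]:

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for [0, max_val] image tensors,
    the metric reported in Tables 1-3."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```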
Fig. 10. Comparison of our model with other methods on an image from Nam. The regions in the different color boxes show more detailed parts of the results.

4.5. Ablation study

Table 5 reports the evaluation results of the ablation study on the BSD68 and SIDD datasets. Two observations can be made:

(i) The network containing all three modules achieves the best result among the tested architectures, which proves the effectiveness of our designed modules.
(ii) The dilation selective module improves the PSNR more than the other modules do.
Table 4
The receptive fields of the plain convolution block (PC) and the symmetric dilated convolution block (SDC).

Layer   1   2    3    4    5
PC      3   5    7    9    11
SDC     3   10   21   28   31

4.6. Comparison on testing time

In addition to PSNR or SSIM quality evaluation, another important aspect is testing time. We select 50 images from BSD68 with noise level 15 for the speed evaluation among the different ablation variants. We compare the different models on both CPU and GPU to evaluate their speed. Table 6 shows the testing times of DnCNN, FFDNet, FOCNet, CBDNet and NERNet for images of 256×256, 512×512
and 1024×1024. We can conclude that our model consumes less time. What's more, we also compare the testing times of the different structures in NERNet, as shown in Table 7. As modules are added, the testing time does not increase much; it is clear that the time cost of each module is similar.
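A timing harness in the spirit of Table 6 might look like the sketch below; averaging over repeated runs and synchronizing CUDA before reading the clock are the important details, while the function itself is an assumption rather than the paper's actual script.

```python
import time
import torch

def time_model(model, size, device, runs=50):
    """Rough per-image runtime. CUDA executes asynchronously, so we
    synchronize before reading the clock on GPU."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, size, size, device=device)
    with torch.no_grad():
        model(x)  # warm-up
        if device == 'cuda':
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(runs):
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()
    return (time.time() - start) / runs
```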
Table 5
The evaluation results of the ablation study on the BSD68 and SIDD datasets.
Bold font indicates the best result in each comparison or ablation experiment.
Table 6
The testing time (in seconds) for different image sizes.

Size        Device   DnCNN [11]   FFDNet [21]   HRLNet [13]   FOCNet [16]   CBDNet [25]   NERNet
256×256     CPU      2.44         0.62          0.55          2.32          1.24          0.58
            GPU      0.011        0.008         0.007         0.009         0.008         0.007
512×512     CPU      9.85         2.51          2.17          9.11          3.56          2.08
            GPU      0.040        0.017         0.018         0.039         0.021         0.15
1024×1024   CPU      38.11        10.17         9.21          35.11         11.59         9.19
            GPU      0.057        0.057         0.058         0.059         0.059         0.056

Bold font indicates the best result in each comparison or ablation experiment.
Table 7
The testing time (in seconds) of the different structures among the ablation studies.

SDC   PFF   DUDS   CPU    GPU
✓     ✓            0.58   0.007
✓           ✓      0.60   0.008
      ✓     ✓      0.61   0.008
✓     ✓     ✓      0.58   0.007
5. Conclusion

In this paper, we proposed a novel denoising method based on deep learning, namely NERNet, which performs better on both synthetic noise and realistic noise than other state-of-the-art methods. The proposed network comprises two sequential stages. The first stage estimates the noise level map from the noisy input. Benefiting from the dilated convolution block and the pyramid feature fusion block, the noise estimation module naturally learns multi-scale information about the noise and effectively estimates an accurate noise level map. In the second stage, with the help of the noise level map estimated by the noise estimation module, the removal module combines multiple branches with different dilation rates to enhance the denoising performance. Extensive experiments on two synthetic noise datasets and three realistic noise datasets demonstrate the performance of our model.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work is supported by the National Natural Science Foundation of China (51805078), the National Key Research and Development Program of China (2017YFB0304200), the Fundamental Research Funds for the Central Universities (N2003021), and the Taizhou Science and Technology Planning Project (1902GY09).

References

[1] B.K. Shreyamsha Kumar, Image denoising based on non-local means filter and its method noise thresholding, SIViP 7 (6) (2013) 1211-1227.
[2] W.J. Li et al., Fast combination filtering based on weighted fusion, J. Vis. Commun. Image Represent. 62 (2019) 226-233.
[3] J. Liu et al., Image denoising searching similar blocks along edge directions, Signal Processing-Image Commun. 57 (2017) 33-45.
[4] S.H. Gu et al., Weighted nuclear norm minimization with application to image denoising, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2862-2869.
[5] Y.X. Liu et al., Patch based image denoising using the finite ridgelet transform for less artifacts, J. Vis. Commun. Image Represent. 25 (2014) 1006-1017.
[6] Y. Niu et al., Model-based adaptive resolution upconversion of degraded images, J. Vis. Commun. Image Represent. 23 (2012) 1144-1157.
[7] W.H. Li et al., Total variation blind deconvolution employing split Bregman iteration, J. Vis. Commun. Image Represent. 23 (2012) 409-417.
[8] Z.X. Cui et al., A nonconvex nonsmooth regularization method with structure tensor total variation, J. Vis. Commun. Image Represent. 43 (2017) 30-40.
[9] D. Cho, T.D. Bui, Multivariate statistical modeling for image denoising using wavelet transforms, Signal Processing-Image Commun. 20 (1) (2005) 77-89.
[10] K.Q. Huang et al., Color image denoising with wavelet thresholding based on human visual system model, Signal Processing-Image Commun. 20 (2) (2005) 115-127.
[11] K. Zhang et al., Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising, IEEE Trans. Image Process. 26 (7) (2017) 3142-3155.
[12] K. Zhang et al., Learning deep CNN denoiser prior for image restoration, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2808-2817.
[13] W.Z. Shi et al., Hierarchical residual learning for image denoising, Signal Processing-Image Commun. 76 (2019) 243-251.
[14] X.J. Mao, C.H. Shen, Y.B. Yang, Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections, in: Advances in Neural Information Processing Systems 29 (NIPS 2016), 2016.
[15] Y. Tai et al., MemNet: a persistent memory network for image restoration, in: IEEE International Conference on Computer Vision (ICCV), 2017, pp. 4549-4557.
[16] X. Jia et al., FOCNet: a fractional optimal control network for image denoising, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 6054-6063.
[17] L. Fan, F. Zhang, H. Fan, et al., Brief review of image denoising techniques, Visual Computing for Industry, Biomedicine, and Art 2 (1) (2019) 7.
[18] A. Ben Hamza, H. Krim, Image denoising: a nonlinear robust statistical approach, IEEE Trans. Signal Process. 49 (12) (2001) 3045-3054.
[19] J. Benesty, J.D. Chen, Y.T. Huang, Study of the widely linear Wiener filter for noise reduction, in: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010, pp. 205-208.
[20] R.K. Yang et al., Optimal weighted median filtering under structural constraints, IEEE Trans. Signal Process. 43 (3) (1995) 591-604.
[21] K. Zhang, W.M. Zuo, L. Zhang, FFDNet: toward a fast and flexible solution for CNN-based image denoising, IEEE Trans. Image Process. 27 (9) (2018) 4608-4622.
[22] X.H. Liu, M. Tanaka, M. Okutomi, Single-image noise level estimation for blind denoising, IEEE Trans. Image Process. 22 (12) (2013) 5226-5237.
[23] S. Pyatykh, J. Hesser, L. Zheng, Image noise level estimation by principal component analysis, IEEE Trans. Image Process. 22 (2) (2013) 687-699.
[24] A. Foi et al., Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data, IEEE Trans. Image Process. 17 (10) (2008) 1737-1754.
[25] S. Guo et al., Toward convolutional blind denoising of real photographs, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 1712-1722.
[26] F. Yu, V. Koltun, Multi-scale context aggregation by dilated convolutions, arXiv preprint arXiv:1511.07122, 2015.
[27] Y. Wei et al., Revisiting dilated convolution: a simple approach for weakly- and semi-supervised semantic segmentation, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7268-7277.
[28] K.M. He et al., Spatial pyramid pooling in deep convolutional networks for visual recognition, in: Computer Vision - ECCV 2014, Pt III, vol. 8691, 2014, pp. 346-361.
[29] O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Pt III, vol. 9351, 2015, pp. 234-241.
[30] G. Huang et al., Densely connected convolutional networks, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2261-2269.
[31] X. Li et al., Selective kernel networks, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 510-519.
[32] P. Drineas, M.W. Mahoney, On the Nyström method for approximating a Gram matrix for improved kernel-based learning, J. Mach. Learn. Res. 6 (2005) 2153-2175.
[33] K.D. Ma et al., Waterloo Exploration Database: new challenges for image quality assessment models, IEEE Trans. Image Process. 26 (2) (2017) 1004-1016.
[34] V. Bychkovsky et al., Learning photographic global tonal adjustment with a database of input/output image pairs, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 97-104.
[35] J. Deng et al., ImageNet: a large-scale hierarchical image database, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248-255.
[36] R. Zeyde, M. Elad, M. Protter, On single image scale-up using sparse-representations, in: International Conference on Curves and Surfaces, 2010, pp. 711-730.
[37] S. Roth, M.J. Black, Fields of experts, Int. J. Comput. Vision 82 (2) (2009) 205-229.
[38] J. Anaya, A. Barbu, RENOIR - a dataset for real low-light image noise reduction, J. Vis. Commun. Image Represent. 51 (2018) 144-154.
[39] M. Lebrun, M. Colom, J.M. Morel, The Noise Clinic: a universal blind denoising algorithm, in: IEEE International Conference on Image Processing (ICIP), 2014, pp. 2674-2678.
[40] S. Nam et al., A holistic approach to cross-channel image noise modeling and its application to image denoising, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1683-1691.
[41] A. Abdelhamed, S. Lin, M.S. Brown, A high-quality denoising dataset for smartphone cameras, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 1692-1700.
[42] K.M. He et al., Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, in: IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026-1034.