KBNet
Abstract
How to aggregate spatial information plays an essential role in learning-based image restoration. Most existing CNN-based networks adopt static convolutional kernels to encode spatial information and therefore cannot aggregate spatial information adaptively. Recent transformer-based architectures achieve adaptive spatial aggregation, but they lack the desirable inductive biases of convolutions and require heavy computational costs. In this paper, we propose a kernel basis attention (KBA) module, which introduces learnable kernel bases to model representative image patterns for spatial information aggregation. Different kernel bases are trained to model different local structures. At each spatial location, they are linearly and adaptively fused by predicted pixel-wise coefficients to obtain aggregation weights. Based on the KBA module, we further design a multi-axis feature fusion (MFF) block to encode and fuse channel-wise, spatial-invariant, and pixel-adaptive features for image restoration. Our model, named kernel basis network (KBNet), achieves state-of-the-art performance on more than ten benchmarks over image denoising, deraining, and deblurring tasks while requiring less computational cost than previous SOTA methods. Code will be released at https://github.com/zhangyi-3/kbnet.
kernels for all spatial locations and therefore have limited capacity to handle different types of image structures and textures. While a few works [7, 47, 66] in the low-level task of burst image denoising were proposed to adaptively encode local neighborhood features, they require heavy computational costs to predict adaptive convolutional kernels for each output pixel location.

Recently, vision transformers have shown great progress, where the attention mechanism uses dot products between pairwise positions to obtain the linear aggregation weights for encoding local information. A few efforts [12, 40, 61, 63] have been made in image restoration. In IPT [12], global attention layers are adopted to aggregate information from all spatial positions, which, however, face challenges in handling high-resolution images. A series of window-based self-attention solutions [40, 61, 63] have been proposed to alleviate the quadratic computational cost with respect to the input size. While self-attention enables each pixel to adaptively aggregate spatial information from its assigned window, it lacks the inductive biases possessed by convolutions (e.g., locality, translation equivariance), which are useful for modeling local structures of images. Some works [18, 26, 44] also reported that using convolutions to aggregate spatial information can produce more satisfying results than self-attention.

In this paper, to tackle the challenge of effectively and efficiently aggregating neighborhood information for image restoration, we propose a kernel basis network (KBNet) with a novel kernel basis attention (KBA) module to adaptively aggregate spatial neighborhood information, which takes advantage of both CNNs and transformers. Since natural images generally share similar patterns across different spatial locations, we first introduce a set of learnable kernel bases to learn various local patterns. The kernel bases specify how the neighborhood information should be aggregated for each pixel, and different learnable kernel bases are trained to capture various image patterns. Intuitively, similar local neighborhoods would use similar kernel bases, so that the bases are learned to capture more representative spatial patterns. Given the kernel bases, we adopt a separate lightweight convolution branch to predict the linear combination coefficients of the kernel bases for each pixel. The learnable kernel bases are linearly fused by the coefficients to produce adaptive and diverse aggregation weights for each pixel.

Unlike the window-based self-attention in previous works [40, 61, 63], which uses dot products between pairwise positions to generate spatial aggregation weights, the KBA module generates the weights by adaptively combining the learnable kernel bases. It naturally keeps the inductive biases of convolutions while handling various spatial contexts adaptively. The KBA module is also different from existing dynamic convolutions [27, 32] or kernel prediction networks [7, 47, 62, 66], which predict all the kernel weights directly. The aggregation weights of the KBA module are predicted by spatially adaptive fusion of the shared kernel bases. Thus, our KBA module is more lightweight and easier to optimize. The ablation study validates the proposed design choices.

Based on the KBA module, we design a multi-axis feature fusion (MFF) block to extract and fuse diverse features for image restoration. We combine operators over the spatial and channel dimensions to better capture image context. In the MFF block, three branches, namely channel attention, spatial-invariant, and pixel-wise adaptive feature extraction, are performed in parallel and then fused by a point-wise product (see the sketch after the contribution list).

By integrating the proposed KBA module and MFF block into the widely used U-Net architectures with two variants of feed-forward networks, our proposed KBNet achieves state-of-the-art performances on more than ten benchmarks of image denoising, deraining, and deblurring.

The main contributions of this work are summarized as follows:

1. We propose a novel kernel basis attention (KBA) module that effectively aggregates spatial information via a set of learnable kernel bases and their linear fusion.
2. We propose an effective multi-axis feature fusion (MFF) block, which extracts and fuses channel-wise, spatial-invariant, and pixel-wise adaptive features for image restoration.
3. Our kernel basis network (KBNet) demonstrates its generalizability and state-of-the-art performance on three image restoration tasks: denoising, deblurring, and deraining.
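To make the three-branch fusion concrete, below is a minimal PyTorch sketch of an MFF-style block under our own assumptions: we use a squeeze-and-excitation-style channel attention [28], a depthwise 3x3 convolution as the spatial-invariant branch, and a placeholder `kba` module standing in for the pixel-wise adaptive branch. All names, layer widths, and the exact branch implementations are illustrative, not the released KBNet code.

```python
# A minimal sketch of the three-branch MFF fusion described above.
# Branch choices (SE-style channel attention, depthwise conv) and all
# names are our assumptions for illustration, not the official code.
import torch
import torch.nn as nn

class MFFSketch(nn.Module):
    def __init__(self, channels: int, kba: nn.Module):
        super().__init__()
        # Channel-attention branch (squeeze-and-excitation style [28],
        # simplified here to a single 1x1 conv).
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial-invariant branch: one static depthwise convolution.
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Pixel-wise adaptive branch, e.g. a KBA module (assumed given).
        self.kba = kba

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Run the three branches in parallel; fuse by point-wise product.
        return self.ca(x) * self.dw(x) * self.kba(x)
```

Multiplying all three branch outputs is one reading of "fused by a point-wise product"; in practice one branch may instead act as a multiplicative gate on the sum of the others.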
[Figure: illustration of the KBA module. The input feature map X is transformed and then convolved, at each position (i, j), with per-pixel kernels obtained by linear fusion of the learnable kernel bases, producing the enhanced output feature map X'.]
R^{N×C×4×K²} contains N grouped convolution kernels. C and K² denote the channel number and kernel size, respectively. The group size is set to C/4 to balance the performance-efficiency tradeoff.

Fusion Coefficients Prediction. To adaptively combine the N learnable kernel bases at each spatial location, given the input feature map X ∈ R^{H×W×C}, a lightweight convolution branch is used to predict the N kernel-basis fusion coefficients F ∈ R^{H×W×N} at each location. The lightweight convolution branch contains two layers. The first 3×3 grouped convolution layer reduces the feature map channels to N with group size N. To further transform the features, a SimpleGate activation function [13] followed by another 3×3 convolution layer is adopted.

Here, a natural choice is to normalize the fusion coefficients at each location by the softmax function so that they sum up to 1. However, we experimentally find that softmax normalization hinders the final performance, since it tends to select the most important kernel basis instead of fusing multiple kernel bases for capturing spatial contexts.

Kernel Basis Fusion. With the predicted fusion coefficient map F ∈ R^{H×W×N} and kernel bases W ∈ R^{N×C×4×K²}, the fused weights M[i, j] for the spatial position (i, j) can be obtained by the linear combination of learnable kernel bases:

    M[i, j] = \sum_{t=0}^{N} F[i, j, t] \, W[t],

where W[t] ∈ R^{C×4×K²} denotes the t-th learnable kernel basis, and F[i, j, t] and M[i, j] ∈ R^{C×4×K²} are the t-th kernel fusion coefficient and the fused kernel weights at position (i, j), respectively. Besides, the input feature map X is transformed by a 1×1 convolution to obtain the feature map X̃ for adaptive convolution with the fused kernel weights M. The output feature map X' at position (i, j) is therefore obtained as

    X'[i, j] = \mathrm{GroupConv}(\tilde{X}[i, j], M[i, j]),

where X' ∈ R^{H×W×C} is the output feature map and maintains the input spatial resolution.

Discussion. Previous kernel prediction methods [7, 32, 47, 66] also predict pixel-wise kernels for convolution, but they adopt a heavy design that predicts all kernel weights directly. Specifically, even predicting K × K depthwise kernels requires producing a (C × K × K)-channel feature map, which is quite costly in terms of both computational cost and memory.
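For concreteness, the following is a minimal PyTorch sketch of the KBA computation described above, written under our own assumptions: tensor layouts, the 2N-channel pre-gate width, the initialization scale, and all names are illustrative, not the released code. The coefficient branch predicts F without softmax, the bases are fused per pixel, and the per-pixel grouped convolution is realized via im2col plus an einsum. It also illustrates the saving the Discussion points to: the branch only outputs N channels per pixel (e.g. N = 8) instead of the C × K × K channels (e.g. 64 × 3 × 3 = 576) that direct kernel prediction would need.

```python
# A sketch of KBA under stated assumptions; not the official implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGate(nn.Module):
    """SimpleGate from NAFNet [13]: split channels in half and multiply."""
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return a * b


class KBASketch(nn.Module):
    def __init__(self, channels: int, num_bases: int = 8, kernel_size: int = 3):
        super().__init__()
        assert channels % 4 == 0 and channels % num_bases == 0
        self.g, self.n, self.k = channels // 4, num_bases, kernel_size
        # N learnable kernel bases, each a grouped-conv kernel with C output
        # channels, 4 input channels per group, and K*K taps (flattened).
        self.bases = nn.Parameter(
            0.02 * torch.randn(num_bases, channels, 4, kernel_size ** 2))
        # Lightweight coefficient branch: grouped 3x3 conv -> SimpleGate -> 3x3 conv.
        # We emit 2N channels before the gate so it halves them back to N (assumption).
        self.coeff = nn.Sequential(
            nn.Conv2d(channels, 2 * num_bases, 3, padding=1, groups=num_bases),
            SimpleGate(),
            nn.Conv2d(num_bases, num_bases, 3, padding=1),
        )
        self.proj = nn.Conv2d(channels, channels, 1)  # 1x1 conv producing X~

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        f = self.coeff(x)      # fusion coefficients F: (B, N, H, W), no softmax
        xt = self.proj(x)      # transformed features X~: (B, C, H, W)
        # Per-pixel fused kernels M[i,j] = sum_t F[i,j,t] * W[t].
        m = torch.einsum('bnhw,ncik->bhwcik', f, self.bases)
        m = m.view(b, h, w, self.g, 4, 4, self.k ** 2)  # split C into (group, out-ch)
        # im2col the features, then apply the per-pixel grouped convolution.
        p = F.unfold(xt, self.k, padding=self.k // 2)   # (B, C*K*K, H*W)
        p = p.view(b, self.g, 4, self.k ** 2, h, w)     # (B, G, in-ch, taps, H, W)
        out = torch.einsum('bhwgoik,bgikhw->bgohw', m, p)
        return out.reshape(b, c, h, w)                  # X': same shape as input
```

For example, `KBASketch(channels=64, num_bases=8)(torch.randn(1, 64, 32, 32))` returns a (1, 64, 32, 32) tensor. The two einsums favor readability; a production version would trade memory for speed, e.g. by batching the fusion as a matrix multiply over unfolded patches.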
Table 1 Denoising results of color images with Gaussian noise on four testing datasets (CBSD68 [46], Kodak24 [20], McMaster [78], and Urban100 [29]) and three noise levels. The best results are marked in bold.

Method | CBSD68 σ=15/25/50 | Kodak24 σ=15/25/50 | McMaster σ=15/25/50 | Urban100 σ=15/25/50 | MACs
IRCNN [76] | 33.86 / 31.16 / 27.86 | 34.69 / 32.18 / 28.93 | 34.58 / 32.18 / 28.91 | 33.78 / 31.20 / 27.70 | -
FFDNet [77] | 33.87 / 31.21 / 27.96 | 34.63 / 32.13 / 28.98 | 34.66 / 32.35 / 29.18 | 33.83 / 31.40 / 28.05 | -
DnCNN [75] | 33.90 / 31.24 / 27.95 | 34.60 / 32.14 / 28.95 | 33.45 / 31.52 / 28.62 | 32.98 / 30.81 / 27.59 | 37G
DSNet [50] | 33.91 / 31.28 / 28.05 | 34.63 / 32.16 / 29.05 | 34.67 / 32.40 / 29.28 | - | -
DRUNet [74] | 34.30 / 31.69 / 28.51 | 35.31 / 32.89 / 29.86 | 35.40 / 33.14 / 30.08 | 34.81 / 32.60 / 29.61 | 144G
RPCNN [65] | - / 31.24 / 28.06 | - / 32.34 / 29.25 | - / 32.33 / 29.33 | - / 31.81 / 28.62 | -
BRDNet [60] | 34.10 / 31.43 / 28.16 | 34.88 / 32.41 / 29.22 | 35.08 / 32.75 / 29.52 | 34.42 / 31.99 / 28.56 | -
RNAN [82] | - / - / 28.27 | - / - / 29.58 | - / - / 29.72 | - / - / 29.08 | 496G
RDN [83] | - / - / 28.31 | - / - / 29.66 | - | - / - / 29.38 | 1.4T
IPT [12] | - / - / 28.39 | - / - / 29.64 | - / - / 29.98 | - / - / 29.71 | 512G
SwinIR [40] | 34.42 / 31.78 / 28.56 | 35.34 / 32.89 / 29.79 | 35.61 / 33.20 / 30.22 | 35.13 / 32.90 / 29.82 | 759G
Restormer [69] | 34.40 / 31.79 / 28.60 | 35.47 / 33.04 / 30.01 | 35.61 / 33.34 / 30.30 | 35.13 / 32.96 / 30.02 | 141G
KBNet_s (Ours) | 34.41 / 31.80 / 28.62 | 35.46 / 33.05 / 30.04 | 35.56 / 33.31 / 30.27 | 35.15 / 32.96 / 30.04 | 69G
Table 2 Gaussian denoising results of gray images on three testing datasets (Set12, BSD68, and Urban100 [29]) and three noise levels. The best results are marked in bold.

Method | Set12 σ=15/25/50 | BSD68 σ=15/25/50 | Urban100 σ=15/25/50 | MACs
DnCNN [75] | 32.67 / 30.35 / 27.18 | 31.62 / 29.16 / 26.23 | 32.28 / 29.80 / 26.35 | 37G
FFDNet [77] | 32.75 / 30.43 / 27.32 | 31.63 / 29.19 / 26.29 | 32.40 / 29.90 / 26.50 | -
IRCNN [76] | 32.76 / 30.37 / 27.12 | 31.63 / 29.15 / 26.19 | 32.46 / 29.80 / 26.22 | -
DRUNet [74] | 33.25 / 30.94 / 27.90 | 31.91 / 29.48 / 26.59 | 33.44 / 31.11 / 27.96 | 144G
FOCNet [30] | 33.07 / 30.73 / 27.68 | 31.83 / 29.38 / 26.50 | 33.15 / 30.64 / 27.40 | -
MWCNN [43] | 33.15 / 30.79 / 27.74 | 31.86 / 29.41 / 26.53 | 33.17 / 30.66 / 27.42 | -
NLRN [41] | 33.16 / 30.80 / 27.64 | 31.88 / 29.41 / 26.47 | 33.45 / 30.94 / 27.49 | -
RNAN [82] | - / - / 27.70 | - / - / 26.48 | - / - / 27.65 | 496G
DeamNet [53] | 33.19 / 30.81 / 27.74 | 31.91 / 29.44 / 26.54 | 33.37 / 30.85 / 27.53 | 146G
DAGL [48] | 33.28 / 30.93 / 27.81 | 31.93 / 29.46 / 26.51 | 33.79 / 31.39 / 27.97 | 256G
SwinIR [40] | 33.36 / 31.01 / 27.91 | 31.97 / 29.50 / 26.58 | 33.70 / 31.30 / 27.98 | 759G
Restormer [69] | 33.42 / 31.08 / 28.00 | 31.96 / 29.52 / 26.62 | 33.79 / 31.46 / 28.29 | 141G
KBNet_s (Ours) | 33.40 / 31.08 / 28.04 | 31.98 / 29.54 / 26.65 | 33.77 / 31.45 / 28.33 | 69G
[Figure panels: Full Image, Noisy, SwinIR [40], Restormer [69], Ours, Ground Truth]
Fig. 4 Visualization results of Gaussian denoising of color images on the Urban100 dataset [29]. KBNet can recover more fine textures.
adaptive aggregation, KBNet can recover more textures, even for some very thin edges.

For Gaussian denoising of gray images, as shown in Tab. 2, KBNet shows performance consistent with color image denoising. It slightly outperforms the previous state-of-the-art method Restormer [69] while using less than half of its MACs.

4.3 Raw Image Denoising Results

For real-world denoising, we conduct experiments on the SIDD dataset [1] and the SenseNoise dataset [79] to evaluate our method on both indoor and outdoor scenes. Besides KBNet_s, we also provide a heavier model, KBNet_l, which adopts the FFN design of Restormer [69] to perform position-wise non-linear transformations on the SenseNoise dataset.

SIDD: The SIDD dataset is collected from indoor scenes. Five smartphones are used to capture the scenes at different noise levels. SIDD contains 320 image pairs for training and 1,280 for validation. As shown in Tab. 3, KBNet achieves state-of-the-art results on the SIDD dataset. It outperforms very recent transformer-based methods, including Restormer [69], Uformer [63], and MAXIM [61], as well as the CNN-based NAFNet [13], with fewer MACs. Fig. 6 shows the performance-efficiency comparison of our method; KBNet achieves the best trade-off.
Table 3 Real-world image denoising results on the SIDD dataset [1].

Method | PSNR ↑ | SSIM ↑ | MACs (G) ↓
DnCNN [75] | 23.66 | 0.583 | -
MLP [9] | 24.71 | 0.641 | -
FoE [55] | 25.58 | 0.792 | -
BM3D [17] | 25.65 | 0.685 | -
WNNM [24] | 25.78 | 0.809 | -
NLM [8] | 26.76 | 0.699 | -
KSVD [4] | 26.88 | 0.842 | -
EPLL [84] | 27.11 | 0.870 | -
CBDNet [25] | 30.78 | 0.754 | -
RIDNet [6] | 38.71 | 0.914 | 89
VDN [68] | 39.28 | 0.909 | -
MIRNet [70] | 39.72 | 0.959 | 786
NBNet [15] | 39.75 | 0.959 | 88.8
Uformer [63] | 39.89 | 0.960 | 89.5
MAXIM [61] | 39.96 | 0.960 | 169.5
Restormer [69] | 40.02 | 0.960 | 140
NAFNet [13] | 40.30 | 0.962 | 65
KBNet_s (Ours) | 40.35 | 0.972 | 57.8
Fig. 6 PSNR vs. MACs of different methods for real-world image denoising on the SIDD dataset [1].
Fig. 7 PSNR vs. MACs of different methods for real-world denoising on the SenseNoise dataset [79].
SenseNoise: The SenseNoise dataset contains 500 diverse scenes, where each image is of high resolution (e.g. 4000 × 3000). It contains both indoor and outdoor scenes with high-quality ground truth. We train the existing methods on the SenseNoise dataset [79] under the same training setting as NAFNet [13] but use 100k iterations. Since some of the models are too heavy, we scale down their channel numbers for fast training. The performance and MACs are reported in Tab. 4. Our method not only outperforms all other methods but also achieves the best performance-efficiency trade-off, as shown in Fig. 7. Some visualizations are shown in Fig. 5. KBNet produces sharper edges and recovers more vivid colors than previous methods.
Table 4 Denoising results on the SenseNoise dataset [79].

Method | PSNR ↑ | SSIM ↑ | MACs (G) ↓
DnCNN [75] | 34.06 | 0.904 | 37
RIDNet [6] | 34.88 | 0.915 | 89
MIRNet [70] | 35.30 | 0.919 | 130
MPRNet [71] | 35.43 | 0.922 | 120
Uformer [63] | 35.43 | 0.920 | 90
Restormer [69] | 35.52 | 0.924 | 80
NAFNet [13] | 35.55 | 0.923 | 65
KBNet_s (Ours) | 35.60 | 0.924 | 57.8
KBNet_l (Ours) | 35.69 | 0.924 | 104
Table 5 Comparison of defocus deblurring results on the DPDD test set [3], containing 37 indoor and 39 outdoor scenes.

Method | Indoor: PSNR ↑ / SSIM ↑ / MAE ↓ / LPIPS ↓ | Outdoor: PSNR ↑ / SSIM ↑ / MAE ↓ / LPIPS ↓ | MACs ↓
EBDB [33] | 25.77 / 0.772 / 0.040 / 0.297 | 21.25 / 0.599 / 0.058 / 0.373 | -
DMENet [34] | 25.50 / 0.788 / 0.038 / 0.298 | 21.43 / 0.644 / 0.063 / 0.397 | 1173G
JNB [57] | 26.73 / 0.828 / 0.031 / 0.273 | 21.10 / 0.608 / 0.064 / 0.355 | -
DPDNet [3] | 26.54 / 0.816 / 0.031 / 0.239 | 22.25 / 0.682 / 0.056 / 0.313 | 991G
KPAC [59] | 27.97 / 0.852 / 0.026 / 0.182 | 22.62 / 0.701 / 0.053 / 0.269 | -
IFAN [35] | 28.11 / 0.861 / 0.026 / 0.179 | 22.76 / 0.720 / 0.052 / 0.254 | 363G
Restormer [69] | 28.87 / 0.882 / 0.025 / 0.145 | 23.24 / 0.743 / 0.050 / 0.209 | 141G
KBNet_s | 28.42 / 0.872 / 0.026 / 0.159 | 23.10 / 0.736 / 0.050 / 0.233 | 69G
KBNet_l | 28.89 / 0.883 / 0.024 / 0.143 | 23.32 / 0.749 / 0.049 / 0.205 | 108G
4.4 Deraining and Defocus Deblurring Results

To demonstrate the generalization and effectiveness of our KBNet, we follow the state-of-the-art image restoration method Restormer [69] to conduct experiments on deraining and defocus deblurring. The channels of our MFF module are adjusted to ensure that our model uses fewer MACs than Restormer [69]. The training settings are kept the same as those of Restormer [69].

Deraining. The two largest datasets (Test2800 and Test1200) are used for testing deraining performance. The results in Tab. 6 indicate that our KBNet generalizes well to deraining. KBNet_l outperforms Restormer [69] using only 76% of its MACs. On the Test1200 dataset, KBNet_l produces more than a 0.5 dB improvement. Some visualization results can be found in Fig. 9.

Defocus Deblurring. As shown in Tab. 5, we test our model on both indoor and outdoor scenes for deblurring. KBNet_s outperforms most previous methods using only 69G MACs. KBNet_l outperforms the previous state-of-the-art Restormer [69] while having 24% fewer MACs. Some visualization results are shown in Fig. 9.

4.5 Ablation Studies

We conduct extensive ablation studies to validate the effectiveness of the components of our method and compare it with existing methods. All ablation studies are conducted on Gaussian denoising with
[9] Harold C. Burger, Christian J. Schuler, and Stefan Roth. Image denoising: Can plain neural networks compete with BM3D? In CVPR, 2012.

[10] Jianrui Cai, Shuhang Gu, Radu Timofte, and Lei Zhang. NTIRE 2019 challenge on real image super-resolution: Methods and results. In CVPR Workshops, 2019.

[11] Meng Chang, Qi Li, Huajun Feng, and Zhihai Xu. Spatial-adaptive network for single image denoising. In ECCV, 2020.

[12] Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, and Wen Gao. Pre-trained image processing transformer. In CVPR, 2021.

[13] Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration. arXiv preprint arXiv:2204.04676, 2022.

[14] Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, and Zicheng Liu. Dynamic convolution: Attention over convolution kernels. In CVPR, 2020.

[15] Shen Cheng, Yuzhi Wang, Haibin Huang, Donghao Liu, Haoqiang Fan, and Shuaicheng Liu. NBNet: Noise basis learning for image denoising with subspace projection. In CVPR, 2021.

[16] Sung-Jin Cho, Seo-Won Ji, Jun-Pyo Hong, Seung-Won Jung, and Sung-Jea Ko. Rethinking coarse-to-fine approach in single image deblurring. In ICCV, 2021.

[17] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. TIP, 2007.

[18] Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang Ding, and Jian Sun. Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs. arXiv preprint arXiv:2203.06717, 2022.

[19] Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process., 15(12):3736–3745, 2006.

[20] Rich Franzen. Kodak lossless true color image suite. http://r0k.us/graphics/kodak/, 1999. Online; accessed 24 Oct 2021.

[21] Xueyang Fu, Jiabin Huang, Xinghao Ding, Yinghao Liao, and John Paisley. Clearing the skies: A deep network architecture for single-image rain removal. TIP, 2017.

[22] Xueyang Fu, Jiabin Huang, Delu Zeng, Yue Huang, Xinghao Ding, and John Paisley. Removing rain from single images via a deep detail network. In CVPR, 2017.

[23] Shuhang Gu, Yawei Li, Luc Van Gool, and Radu Timofte. Self-guided network for fast image denoising. In ICCV, 2019.

[24] Shuhang Gu, Lei Zhang, Wangmeng Zuo, and Xiangchu Feng. Weighted nuclear norm minimization with application to image denoising. In CVPR, 2014.

[25] Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo, and Lei Zhang. Toward convolutional blind denoising of real photographs. In CVPR, 2019.

[26] Qi Han, Zejia Fan, Qi Dai, Lei Sun, Ming-Ming Cheng, Jiaying Liu, and Jingdong Wang. On the connection between local attention and dynamic depth-wise convolution. In ICLR, 2022.

[27] Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, and Yulin Wang. Dynamic neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 44(11):7436–7456, 2022.

[28] Jie Hu, Li Shen, Samuel Albanie, Gang Sun, and Enhua Wu. Squeeze-and-excitation networks. IEEE TPAMI, 2019.
[29] Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja. Single image super-resolution from transformed self-exemplars. In CVPR, 2015.

[30] Xixi Jia, Sanyang Liu, Xiangchu Feng, and Lei Zhang. FOCNet: A fractional optimal control network for image denoising. In CVPR, 2019.

[31] Kui Jiang, Zhongyuan Wang, Peng Yi, Baojin Huang, Yimin Luo, Jiayi Ma, and Junjun Jiang. Multi-scale progressive fusion network for single image deraining. In CVPR, 2020.

[32] Yifan Jiang, Bart Wronski, Ben Mildenhall, Jonathan T. Barron, Zhangyang Wang, and Tianfan Xue. Fast and high-quality image denoising via malleable convolutions. CoRR, abs/2201.00392, 2022.

[33] Ali Karaali and Claudio Rosito Jung. Edge-based defocus blur estimation with adaptive scale selection. TIP, 2017.

[34] Junyong Lee, Sungkil Lee, Sunghyun Cho, and Seungyong Lee. Deep defocus map estimation using domain adaptation. In CVPR, 2019.

[35] Junyong Lee, Hyeongseok Son, Jaesung Rim, Sunghyun Cho, and Seungyong Lee. Iterative filter adaptive network for single image defocus deblurring. In CVPR, 2021.

[36] Dasong Li, Xiaoyu Shi, Yi Zhang, Xiaogang Wang, Hongwei Qin, and Hongsheng Li. No attention is needed: Grouped spatial-temporal shift for simple and efficient video restorers. CoRR, abs/2206.10810, 2022.

[37] Dasong Li, Yi Zhang, Ka Chun Cheung, Xiaogang Wang, Hongwei Qin, and Hongsheng Li. Learning degradation representations for image deblurring. In ECCV, 2022.

[38] Dasong Li, Yi Zhang, Ka Lung Law, Xiaogang Wang, Hongwei Qin, and Hongsheng Li. Efficient burst raw denoising with variance stabilization and multi-frequency denoising network. IJCV, 2022.

[39] Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, and Hongbin Zha. Recurrent squeeze-and-excitation context aggregation net for single image deraining. In ECCV, 2018.

[40] Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. SwinIR: Image restoration using swin transformer. In ICCV Workshops, 2021.

[41] Ding Liu, Bihan Wen, Yuchen Fan, Chen Change Loy, and Thomas S. Huang. Non-local recurrent network for image restoration. In NeurIPS, 2018.

[42] Ding Liu, Bihan Wen, Xianming Liu, Zhangyang Wang, and Thomas S. Huang. When image denoising meets high-level vision tasks: A deep learning approach. In IJCAI, pages 842–848, 2018.

[43] Pengju Liu, Hongzhi Zhang, Kai Zhang, Liang Lin, and Wangmeng Zuo. Multi-level wavelet-CNN for image restoration. In CVPR Workshops, 2018.

[44] Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A ConvNet for the 2020s. In CVPR, 2022.

[45] Julien Mairal, Michael Elad, and Guillermo Sapiro. Sparse representation for color image restoration. TIP, 2007.

[46] David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.

[47] Ben Mildenhall, Jonathan T. Barron, Jiawen Chen, Dillon Sharlet, Ren Ng, and Robert Carroll. Burst denoising with kernel prediction networks. In CVPR, 2018.
[48] Chong Mou, Jian Zhang, and Zhuoyuan Wu. Dynamic attentive graph learning for image restoration. In ICCV, 2021.

[49] Seungjun Nah, Sanghyun Son, Suyoung Lee, Radu Timofte, and Kyoung Mu Lee. NTIRE 2021 challenge on image deblurring. In CVPR Workshops, 2021.

[50] Yali Peng, Lu Zhang, Shigang Liu, Xiaojun Wu, Yu Zhang, and Xili Wang. Dilated residual networks with symmetric skip connection for image denoising. Neurocomputing, 2019.

[51] Tobias Plötz and Stefan Roth. Neural nearest neighbors networks. In NeurIPS, 2018.

[52] Kuldeep Purohit, Maitreya Suin, A. N. Rajagopalan, and Vishnu Naresh Boddeti. Spatially-adaptive image restoration using distortion-guided networks. In ICCV, 2021.

[53] Chao Ren, Xiaohai He, Chuncheng Wang, and Zhibo Zhao. Adaptive consistency prior based deep network for image denoising. In CVPR, 2021.

[54] Dongwei Ren, Wangmeng Zuo, Qinghua Hu, Pengfei Zhu, and Deyu Meng. Progressive image deraining networks: A better and simpler baseline. In CVPR, 2019.

[55] Stefan Roth and Michael J. Black. Fields of experts. IJCV, 2009.

[56] Leonid I. Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1):259–268, 1992.

[57] Jianping Shi, Li Xu, and Jiaya Jia. Just noticeable defocus blur detection and estimation. In CVPR, 2015.

[58] Eero P. Simoncelli and Edward H. Adelson. Noise removal via bayesian wavelet coring. In ICIP, pages 379–382, 1996.

[59] Hyeongseok Son, Junyong Lee, Sunghyun Cho, and Seungyong Lee. Single image defocus deblurring using kernel-sharing parallel atrous convolutions. In ICCV, 2021.

[60] Chunwei Tian, Yong Xu, and Wangmeng Zuo. Image denoising using deep CNN with batch renormalization. Neural Networks, 2020.

[61] Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li. MAXIM: Multi-axis MLP for image processing. In CVPR, 2022.

[62] Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, and Dahua Lin. CARAFE: Content-aware reassembly of features. In ICCV, 2019.

[63] Zhendong Wang, Xiaodong Cun, Jianmin Bao, and Jianzhuang Liu. Uformer: A general U-shaped transformer for image restoration. arXiv:2106.03106, 2021.

[64] Wei Wei, Deyu Meng, Qian Zhao, Zongben Xu, and Ying Wu. Semi-supervised transfer learning for image rain removal. In CVPR, 2019.

[65] Zhihao Xia and Ayan Chakrabarti. Identifying recurring patterns with deep neural networks for natural image denoising. In WACV, 2020.

[66] Zhihao Xia, Federico Perazzi, Michaël Gharbi, Kalyan Sunkavalli, and Ayan Chakrabarti. Basis prediction networks for effective burst denoising with large kernels. In CVPR, pages 11844–11853, 2020.

[67] Rajeev Yasarla and Vishal M. Patel. Uncertainty guided multi-scale residual learning using a cycle spinning CNN for single image de-raining. In CVPR, 2019.

[68] Zongsheng Yue, Hongwei Yong, Qian Zhao, Deyu Meng, and Lei Zhang. Variational denoising network: Toward blind noise modeling and removal. In NeurIPS, 2019.
[69] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In CVPR, 2022.

[70] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Learning enriched features for real image restoration and enhancement. In ECCV, 2020.

[71] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In CVPR, 2021.

[72] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In CVPR, 2021.

[73] He Zhang and Vishal M. Patel. Density-aware single image de-raining using a multi-stream dense network. In CVPR, 2018.

[74] Kai Zhang, Yawei Li, Wangmeng Zuo, Lei Zhang, Luc Van Gool, and Radu Timofte. Plug-and-play image restoration with deep denoiser prior. TPAMI, 2021.

[75] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. TIP, 2017.

[76] Kai Zhang, Wangmeng Zuo, Shuhang Gu, and Lei Zhang. Learning deep CNN denoiser prior for image restoration. In CVPR, 2017.

[77] Kai Zhang, Wangmeng Zuo, and Lei Zhang. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. TIP, 2018.

[78] Lei Zhang, Xiaolin Wu, Antoni Buades, and Xin Li. Color demosaicking by local directional interpolation and nonlocal adaptive thresholding. JEI, 2011.

[79] Yi Zhang, Dasong Li, Ka Lung Law, Xiaogang Wang, Hongwei Qin, and Hongsheng Li. IDR: Self-supervised image denoising via iterative data refinement. In CVPR, 2022.

[80] Yi Zhang, Hongwei Qin, Xiaogang Wang, and Hongsheng Li. Rethinking noise synthesis and modeling in raw denoising. In ICCV, 2021.

[81] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In ECCV, 2018.

[82] Yulun Zhang, Kunpeng Li, Kai Li, Bineng Zhong, and Yun Fu. Residual non-local attention networks for image restoration. In ICLR, 2019.

[83] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image restoration. TPAMI, 2020.

[84] Daniel Zoran and Yair Weiss. From learning models of natural image patches to whole image restoration. In ICCV, 2011.