
KBNet: Kernel Basis Network for Image Restoration

Yi Zhang1, Dasong Li1, Xiaoyu Shi1, Dailan He1, Kangning Song2, Xiaogang Wang1, Hongwei Qin2 and Hongsheng Li1

1 Multimedia Laboratory, The Chinese University of Hong Kong.
2 SenseTime Research.

Contributing authors: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

arXiv:2303.02881v1 [cs.CV] 6 Mar 2023

Abstract
How to aggregate spatial information plays an essential role in learning-based image restoration. Most existing CNN-based networks adopt static convolutional kernels to encode spatial information, and therefore cannot aggregate spatial information adaptively. Recent transformer-based architectures achieve adaptive spatial aggregation, but they lack the desirable inductive biases of convolutions and require heavy computational costs. In this paper, we propose a kernel basis attention (KBA) module, which introduces learnable kernel bases to model representative image patterns for spatial information aggregation. Different kernel bases are trained to model different local structures. At each spatial location, they are linearly and adaptively fused by predicted pixel-wise coefficients to obtain the aggregation weights. Based on the KBA module, we further design a multi-axis feature fusion (MFF) block to encode and fuse channel-wise, spatial-invariant, and pixel-adaptive features for image restoration. Our model, named kernel basis network (KBNet), achieves state-of-the-art performance on more than ten benchmarks over image denoising, deraining, and deblurring tasks while requiring less computational cost than previous SOTA methods. Code will be released at https://github.com/zhangyi-3/kbnet.

Keywords: Image Restoration, Dynamic Kernel, Feature Fusion

1 Introduction

Image restoration is one of the most foundational tasks in computer vision, which aims to remove the unavoidable degradation of the input images and produce clean outputs. Image restoration is highly challenging as it is an ill-posed problem. It not only plays an important role in a wide range of low-level applications (e.g. night sight on smartphones) but also benefits many high-level vision tasks [42].
Linearly aggregating spatial neighborhood information is a common component and plays a key role in deep neural networks for feature encoding in image restoration. Convolutional neural networks (CNNs) are one of the dominant choices for local information aggregation. They use globally-shared convolution kernels in the convolution operator for aggregating neighboring information, such as using dilated convolutions [6, 11] to increase the receptive fields and adopting multi-stage [71] or multi-scale features [23, 70] for better encoding spatial context. While CNN-based methods show clear performance gains over traditional handcrafted methods [5, 8, 17, 58], the convolutions utilize static and spatially invariant kernels for all spatial locations and therefore have limited capacity to handle different types of image structures and textures.
While a few works [7, 47, 66] in the low-level task of burst image denoising were proposed to adaptively encode local neighborhood features, they require heavy computational costs to predict adaptive convolutional kernels for each output pixel location.
Recently, vision transformers have shown great progress, where the attention mechanism uses dot products between pairwise positions to obtain the linear aggregation weights for encoding local information. A few efforts [12, 40, 61, 63] have been made in image restoration. In IPT [12], global attention layers are adopted to aggregate information from all spatial positions, which, however, face challenges in handling high-resolution images. A series of window-based self-attention solutions [40, 61, 63] have been proposed to alleviate the quadratic computational cost with respect to the input size. While self-attention enables each pixel to adaptively aggregate spatial information from the assigned window, it lacks the inductive biases possessed by convolutions (e.g. locality, translation equivalence, etc.), which are useful in modeling local structures of images. Some works [18, 26, 44] also reported that using convolutions to aggregate spatial information can produce more satisfying results than self-attention.
In this paper, to tackle the challenge of effectively and efficiently aggregating neighborhood information for image restoration, we propose a kernel basis network (KBNet) with a novel kernel basis attention (KBA) module to adaptively aggregate spatial neighborhood information, which takes advantage of both CNNs and transformers. Since natural images generally share similar patterns across different spatial locations, we first introduce a set of learnable kernel bases to learn various local patterns. The kernel bases specify how the neighborhood information should be aggregated for each pixel, and different learnable kernel bases are trained to capture various image patterns. Intuitively, similar local neighborhoods would use similar kernel bases so that the bases are learned to capture more representative spatial patterns. Given the kernel bases, we adopt a separate lightweight convolution branch to predict the linear combination coefficients of the kernel bases for each pixel. The learnable kernel bases are linearly fused by the coefficients to produce adaptive and diverse aggregation weights for each pixel.
Unlike the window-based self-attention in previous works [40, 61, 63] that uses dot products between pairwise positions to generate spatial aggregation weights, the KBA module generates the weights by adaptively combining the learnable kernel bases. It naturally keeps the inductive biases of convolutions while handling various spatial contexts adaptively. The KBA module is also different from existing dynamic convolutions [27, 32] or kernel prediction networks [7, 47, 62, 66], which predict all the kernel weights directly. The aggregation weights of the KBA module are predicted by spatially adaptively fusing the shared kernel bases. Thus, our KBA module is more lightweight and easier to optimize. The ablation study validates the proposed design choice.
Based on the KBA module, we design a multi-axis feature fusion (MFF) block to extract and fuse diverse features for image restoration. We combine operators along the spatial and channel dimensions to better capture image context. In the MFF block, three branches, including channel attention, spatial-invariant, and pixel-wise adaptive feature extraction, are performed in parallel and then fused by point-wise multiplication.
By integrating the proposed KBA module and MFF block into the widely used U-Net architecture with two variants of feed-forward networks, our proposed KBNet achieves state-of-the-art performance on more than ten benchmarks of image denoising, deraining, and deblurring.
The main contributions of this work are summarized as follows:
1. We propose a novel kernel basis attention (KBA) module that effectively aggregates spatial information via a series of learnable kernel bases and a linear fusion of the kernel bases.
2. We propose an effective multi-axis feature fusion (MFF) block, which extracts and fuses channel-wise, spatial-invariant, and pixel-wise adaptive features for image restoration.
3. Our kernel basis network (KBNet) demonstrates its generalizability and state-of-the-art performance on three image restoration tasks including denoising, deblurring, and deraining.

2 Related Work

Deep image restoration has been a popular topic in the field of computer vision and has been extensively studied for many years. In this section, we review some of the most relevant works.

2.1 Traditional Methods

Since image restoration is a highly ill-posed problem, many priors or noise models [80] are adopted to help image restoration. Many properties of natural images have been discussed in traditional methods, like sparsity [19, 45], non-local self-similarity [8, 17], and total variation [56]. Self-similarity is one of the natural image properties used in image restoration. Traditional methods like non-local means [8] and BM3D [17] leverage self-similarity to denoise images by averaging the intensities of similar patches in the image. While traditional methods have achieved good performance, they tend to produce blurry results and fail in more challenging cases.

2.2 CNNs for Image Restoration

Static networks. With the popularity of deep neural networks, learning-based methods have become the mainstream of image restoration [2, 10, 13, 16, 36, 37, 49, 70, 71, 83]. One of the earliest works on deep image denoising using CNNs is DnCNN [75]. Since then, many CNN-based models have been proposed from different design perspectives: residual learning [6, 81], multi-scale features [70], multi-stage design [38, 71], and non-local information [51, 82]. While they improve the learning capacity of CNN models, they use normal convolutions as the basic components, which are static and spatially invariant. As a result, plain and textural areas cannot be identified and processed adaptively, which is crucial for image restoration.
Dynamic networks for image restoration. Another line of research has focused on applying dynamic networks to image restoration tasks. Kernel prediction networks (KPNs) [7, 32, 47, 66] use a kernel-based attention mechanism in a CNN architecture to aggregate spatial information adaptively. However, KPNs predict the kernels directly, which incurs significant memory and computing requirements and makes them difficult to optimize.

2.3 Transformers for Image Restoration

Transformers have shown great progress in natural language and high-level vision tasks. For image restoration, IPT [12] is the first method to adopt a transformer (both encoder and decoder). However, it leads to heavy computational costs and can only operate on a fixed patch size of 48 × 48. Most of the following works only utilize the encoder and tend to reduce the computational cost. A series of window-based self-attention methods [40, 61, 63] has been proposed, in which each pixel aggregates spatial information through dot products between all pairwise positions within a window. More recently, some works [13, 18, 26, 44] indicate that self-attention is not necessary to achieve state-of-the-art results.

3 Method

In this section, we aim to develop a novel kernel basis network (KBNet) for image restoration. We first describe the kernel basis attention (KBA) module, which adaptively aggregates spatial information. Then, the multi-axis feature fusion (MFF) block is introduced to encode and fuse diverse features for image restoration. Finally, we describe the integration of the MFF block into the U-Net.

3.1 Kernel Basis Attention Module

How to gather spatial information at each pixel plays an essential role in feature encoding for low-level vision tasks. Most CNN-based methods [6, 15, 75] utilize spatial-invariant kernels to encode spatial information of the input image, which cannot adaptively process the local spatial context of each pixel. While self-attention [40, 61, 63] can process spatial context adaptively according to the attention weights from the dot products between pairwise positions, it lacks the inherent inductive biases of convolutions. To tackle these challenges, we propose the kernel basis attention (KBA) module, which encodes spatial information by fusing learnable kernel bases adaptively.
As shown in Fig. 1, given an input feature map X ∈ R^{H×W×C}, our KBA module learns a set of kernel bases W shared across all spatial locations and all images to capture common spatial patterns.

Fig. 1 An overview of the kernel basis attention (KBA) module. With the input feature map X, the KBA module first predicts the fusion coefficient map F to linearly fuse the learnable kernel bases W at each location. Then, the fused kernel weights M adaptively encode the local neighborhood of the enhanced feature map X_e to produce the output feature map X'.

The learnable kernel bases W ∈ R^{N×C×4×K²} contain N grouped convolution kernels, where C and K² denote the channel number and kernel size, respectively. The group size is set to C/4 for balancing the performance-efficiency tradeoff.
Fusion coefficients prediction. To adaptively combine the N learnable kernel bases at each spatial location, given the input feature map X ∈ R^{H×W×C}, a lightweight convolution branch is used to predict the N kernel basis fusion coefficients F ∈ R^{H×W×N} at each location. The lightweight convolution branch contains two layers. The first 3 × 3 grouped convolution layer reduces the feature map channels to N with group size N. To further transform the features, a SimpleGate activation function [13] followed by another 3 × 3 convolution layer is adopted.
Here, a natural choice is to normalize the fusion coefficients at each location by the softmax function so that the fusion coefficients sum up to 1. However, we experimentally find that softmax normalization hinders the final performance since it tends to select the most important kernel basis instead of fusing multiple kernel bases for capturing spatial contexts.
Kernel basis fusion. With the predicted fusion coefficient map F ∈ R^{H×W×N} and kernel bases W ∈ R^{N×C×4×K²}, the fused weights M[i, j] for the spatial position (i, j) can be obtained by the linear combination of learnable kernel bases:

    M[i, j] = Σ_{t=1}^{N} F[i, j, t] W[t],

where W[t] ∈ R^{C×4×K²} denotes the t-th learnable kernel basis, F[i, j, t] is the t-th kernel fusion coefficient at position (i, j), and M[i, j] ∈ R^{C×4×K²} is the fused kernel weight at that position. Besides, the input feature map X is transformed by a 1 × 1 convolution to obtain the feature map X_e for adaptive convolution with the fused kernel weights M. The output feature map X' at position (i, j) is therefore obtained as

    X'[i, j] = GroupConv(X_e[i, j], M[i, j]),

where X' ∈ R^{H×W×C} is the output feature map and maintains the input spatial resolution.
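For illustration, the computation above can be written as a short PyTorch sketch. This is a minimal re-implementation under stated assumptions, not the authors' released code: the kernel size is fixed to 3 with 4 channels per group, the coefficient branch is simplified (a GELU stands in for the SimpleGate activation), and the per-pixel grouped convolution is realized with a naive unfold-and-einsum that favors clarity over memory efficiency. All class and argument names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KBASketch(nn.Module):
    def __init__(self, channels=64, num_bases=32, kernel_size=3, group_width=4):
        super().__init__()
        assert channels % group_width == 0 and channels % num_bases == 0
        self.c, self.n, self.k, self.g = channels, num_bases, kernel_size, group_width
        # Learnable kernel bases W: (N, C, g, K*K), shared by all pixels and all images.
        self.bases = nn.Parameter(
            0.02 * torch.randn(num_bases, channels, group_width, kernel_size * kernel_size))
        # Lightweight branch predicting fusion coefficients F: (B, N, H, W); no softmax.
        self.coef = nn.Sequential(
            nn.Conv2d(channels, num_bases, 3, padding=1, groups=num_bases),
            nn.GELU(),  # stand-in for the SimpleGate activation used in the paper
            nn.Conv2d(num_bases, num_bases, 3, padding=1),
        )
        self.enhance = nn.Conv2d(channels, channels, 1)  # produces the enhanced feature X_e

    def forward(self, x):
        b, c, h, w = x.shape
        q, g, kk = c // self.g, self.g, self.k * self.k    # q groups of g channels each
        coef = self.coef(x)                                # (B, N, H, W)
        # M[i, j] = sum_t F[i, j, t] * W[t]: per-pixel fused kernels.
        fused = torch.einsum('bnhw,ncgk->bhwcgk', coef, self.bases)
        fused = fused.reshape(b, h, w, q, g, g, kk)        # split C into (group, channel-in-group)
        # Gather K*K neighborhoods of X_e and apply the fused grouped kernels at every pixel.
        xe = self.enhance(x)
        patches = F.unfold(xe, self.k, padding=self.k // 2).reshape(b, q, g, kk, h, w)
        out = torch.einsum('bhwqijk,bqjkhw->bqihw', fused, patches)
        return out.reshape(b, c, h, w)

# x = torch.randn(1, 64, 48, 48); assert KBASketch()(x).shape == x.shape
```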

Discussion. Previous kernel prediction methods [7, 32, 47, 66] also predict pixel-wise kernels for convolution, but they adopt a heavy design that predicts all kernel weights directly. Specifically, even predicting K × K depthwise kernels requires producing a (C × K × K)-channel feature map, which is quite costly in terms of both computation and memory. In contrast, our method trains a series of learnable kernel bases and only needs to predict an N-channel fusion coefficient map (where N ≪ C × K²). Such a design avoids predicting a large number of kernel weights for each pixel, and the representative kernel bases shared across all locations are still efficiently optimized. Compared with window-based self-attention [61, 63], our spatial aggregation weights are linearly combined from the shared kernel bases instead of being produced individually through dot products between pairwise positions. The KBA module adopts a set of learnable convolution kernels for modeling different local structures and fuses those kernel weights adaptively for each location. Thus, it benefits from the inductive bias of convolutions while achieving spatially-adaptive processing effectively.

Fig. 2 An overview of the multi-axis feature fusion (MFF) block. Channel attention, depthwise convolution, and our KBA module process the input features in parallel. The outputs of the three operations are fused by point-wise multiplication.

3.2 Multi-axis Feature Fusion Block

To encode diverse features for image restoration, based on the KBA module, we design a multi-axis feature fusion (MFF) block to handle channel and spatial information. As shown in Fig. 2, the MFF block first adopts a layer normalization to stabilize the training and then performs spatial information aggregation. A residual shortcut is used to facilitate training convergence. Following the normalization layer, three operators are adopted in parallel. The first operator is a 3 × 3 depthwise convolution to capture spatially-invariant features. The second operator is channel attention [28] to modulate the feature channels. The third one is our KBA module to adaptively handle spatial features. The three branches output feature maps of the same size. Point-wise multiplication is used to fuse the diverse features from the three branches directly, which also serves as the non-linear activation [13] of the MFF block.
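A sketch of the MFF block along these lines is given below. It reuses KBASketch from the previous snippet, uses GroupNorm as a stand-in for the layer normalization, a simple squeeze-and-excitation style channel attention, and follows the 1 × 1 convolution placement of Fig. 2 only loosely; the released block may differ in these details.

```python
import torch
import torch.nn as nn

class MFFBlockSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.norm = nn.GroupNorm(1, channels)          # stand-in for the layer normalization
        self.proj_in = nn.Conv2d(channels, channels, 1)
        # Branch 1: 3x3 depthwise convolution (spatially-invariant features).
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Branch 2: squeeze-and-excitation style channel attention.
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        # Branch 3: the pixel-adaptive KBA module (KBASketch from the previous snippet).
        self.kba = KBASketch(channels)
        self.proj_out = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        y = self.proj_in(self.norm(x))
        # The three branch outputs are fused by point-wise multiplication,
        # which also acts as the non-linearity of the block.
        fused = self.dw(y) * (y * self.ca(y)) * self.kba(y)
        return x + self.proj_out(fused)                # residual shortcut
```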
3.3 Integration of the MFF Block into U-Net

We adopt the widely used U-shaped architecture. The U-shaped architecture processes the input noisy image I ∈ R^{H×W×3} to generate a clean image of the same resolution. The first convolution layer transforms the input image into the feature map F ∈ R^{H×W×C}. Then, the feature map F is processed by an encoder-decoder architecture, each of which has four stages. At each encoder stage, the input resolution is downsampled by a convolution layer with stride 2 and the channels are expanded by a factor of 2. Within each stage, MFF blocks are stacked in both the encoder and decoder. In each decoder stage, the feature map resolution and the channels are reversed by the pixel-shuffle operation. Besides, the shortcuts from the encoder are passed to the decoder layer of the same stage and fused simply by an addition operation. The last convolution transforms the feature map to the same shape as the input. Inspired by the design of transformer blocks, each MFF block is also followed by a feed-forward network (FFN) at each stage. We present two variants of our method, KBNets and KBNetl. KBNets adopts the normal FFN block with the SimpleGate [13] activation function to perform the position-wise non-linear transformation. KBNetl utilizes the heavier transposed attention of Restormer [69] to achieve a more complex position-wise transformation. The computational costs of KBNets and KBNetl are around 70G and 108G MACs, respectively. Detailed network architectures can be found in Sec. 4.1.
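The overall layout can be sketched as follows. This is a rough skeleton under assumptions rather than the released architecture: it reuses MFFBlockSketch from the previous snippet, pairs each block with a NAFNet-style SimpleGate FFN (the KBNets variant), picks an illustrative base width of 32, omits any bottleneck stage, and assumes a global residual connection; the per-stage block counts follow the defaults reported later in Sec. 4.1.

```python
import torch
import torch.nn as nn

class SimpleGateFFN(nn.Module):
    """Feed-forward block with the SimpleGate activation (split channels, multiply halves)."""
    def __init__(self, channels, expand=2):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels * expand * 2, 1)
        self.conv2 = nn.Conv2d(channels * expand, channels, 1)

    def forward(self, x):
        a, b = self.conv1(x).chunk(2, dim=1)
        return x + self.conv2(a * b)

def stage(channels, num_blocks):
    return nn.Sequential(*[nn.Sequential(MFFBlockSketch(channels), SimpleGateFFN(channels))
                           for _ in range(num_blocks)])

class KBNetSketch(nn.Module):
    def __init__(self, width=32, enc_blocks=(2, 2, 2, 2), dec_blocks=(4, 2, 2, 2)):
        super().__init__()
        self.intro = nn.Conv2d(3, width, 3, padding=1)
        self.encoders, self.downs, self.ups, self.decoders = (nn.ModuleList() for _ in range(4))
        c = width
        for n in enc_blocks:
            self.encoders.append(stage(c, n))
            self.downs.append(nn.Conv2d(c, c * 2, 2, stride=2))   # halve resolution, double channels
            c *= 2
        for n in dec_blocks:
            self.ups.append(nn.Sequential(nn.Conv2d(c, c * 2, 1), nn.PixelShuffle(2)))  # 2x resolution, halve channels
            c //= 2
            self.decoders.append(stage(c, n))
        self.ending = nn.Conv2d(width, 3, 3, padding=1)

    def forward(self, x):                       # x: (B, 3, H, W) with H, W divisible by 16
        f = self.intro(x)
        skips = []
        for enc, down in zip(self.encoders, self.downs):
            f = enc(f)
            skips.append(f)
            f = down(f)
        for up, dec, skip in zip(self.ups, self.decoders, reversed(skips)):
            f = up(f) + skip                    # encoder shortcut fused by simple addition
            f = dec(f)
        return x + self.ending(f)               # global residual assumed for illustration
```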
4 Results

In this section, we first describe the implementation details of our method. Then, we evaluate our KBNet on popular benchmarks over synthetic denoising, real-world denoising, deraining, and deblurring tasks. Finally, we conduct ablation studies on Gaussian denoising to validate important designs of our method and compare KBNet with existing methods.

Table 1 Denoising results of color images with Gaussian noise on four testing datasets and three noise levels. The best
results are marked by bold fonts.

CBSD68 [46] Kodak24 [20] McMaster [78] Urban100 [29]

Method σ=15 σ=25 σ=50 σ=15 σ=25 σ=50 σ=15 σ=25 σ=50 σ=15 σ=25 σ=50 MACs

IRCNN [76] 33.86 31.16 27.86 34.69 32.18 28.93 34.58 32.18 28.91 33.78 31.20 27.70 -
FFDNet [77] 33.87 31.21 27.96 34.63 32.13 28.98 34.66 32.35 29.18 33.83 31.40 28.05 -
DnCNN [75] 33.90 31.24 27.95 34.60 32.14 28.95 33.45 31.52 28.62 32.98 30.81 27.59 37G
DSNet [50] 33.91 31.28 28.05 34.63 32.16 29.05 34.67 32.40 29.28 - - -
DRUNet 34.30 31.69 28.51 35.31 32.89 29.86 35.40 33.14 30.08 34.81 32.60 29.61 144G
RPCNN [65] - 31.24 28.06 - 32.34 29.25 - 32.33 29.33 - 31.81 28.62 -
BRDNet [60] 34.10 31.43 28.16 34.88 32.41 29.22 35.08 32.75 29.52 34.42 31.99 28.56 -
RNAN [82] - - 28.27 - - 29.58 - - 29.72 - - 29.08 496G
RDN [83] - - 28.31 - - 29.66 - - - - - 29.38 1.4T
IPT [12] - - 28.39 - - 29.64 - - 29.98 - - 29.71 512G
SwinIR [40] 34.42 31.78 28.56 35.34 32.89 29.79 35.61 33.20 30.22 35.13 32.90 29.82 759G
Restormer [69] 34.40 31.79 28.60 35.47 33.04 30.01 35.61 33.34 30.30 35.13 32.96 30.02 141G
KBNets 34.41 31.80 28.62 35.46 33.05 30.04 35.56 33.31 30.27 35.15 32.96 30.04 69G

4.1 Implementation Details

The default settings of our method are as follows unless otherwise specified. The block numbers for each stage in the encoder and decoder of our KBNet are {2, 2, 2, 2} and {4, 2, 2, 2}, respectively. For the KBA module, the number of kernel bases is set to 32 by default. For each kernel basis, we use a grouped convolution kernel with kernel size 3 × 3 and 4 channels for each group. We train our model for 300k iterations for each noise level. The patch size is 256 × 256, the batch size is 32, and the learning rate is 10^-3, following the training settings of NAFNet [13].
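For reference, the stated defaults can be collected in one place as below; the field names are illustrative rather than the authors' configuration schema.

```python
KBNET_S_DEFAULTS = dict(
    enc_blocks=(2, 2, 2, 2),     # MFF blocks per encoder stage
    dec_blocks=(4, 2, 2, 2),     # MFF blocks per decoder stage
    num_kernel_bases=32,         # N in the KBA module
    kernel_size=3,               # K x K grouped kernel bases
    channels_per_group=4,        # group width of each kernel basis
    iterations=300_000,          # training iterations per noise level
    patch_size=256,
    batch_size=32,
    learning_rate=1e-3,          # following the NAFNet training settings
)
```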

4.2 Gaussian Denoising Results

Color and gray image denoising with Gaussian noise is widely used as a benchmark for evaluating denoising methods. We follow previous methods [69] and report color image denoising results on the CBSD68 [46], Kodak24 [20], McMaster [78], and Urban100 [29] datasets. For gray image denoising, we use Set12 [75], BSD68 [46], and Urban100 [29] as the testing datasets. We train our Gaussian denoising models KBNets on the ImageNet validation dataset over 3 noise levels (σ ∈ {15, 25, 50}) for both color and gray images.

Fig. 3 PSNR vs. MACs of different methods on Gaussian denoising of color images. PSNRs are tested on the Urban100 dataset with noise level σ = 50.

The results of Gaussian denoising on color images are shown in Tab. 1. Our method outperforms the previous state-of-the-art Restormer, but only requires half of its computational cost. It is also worth noticing that we do not use progressive patch sampling to improve training performance as Restormer [69] does. Compared with the window-based transformer SwinIR [40], we train KBNet for 300k iterations, which is much fewer than the 1.6M iterations used in SwinIR [40]. The performance-efficiency comparisons with previous methods are shown in Fig. 3. Our KBNet achieves state-of-the-art results while requiring half of the computational cost. Some visual results can be found in Fig. 4. Thanks to the pixel-wise adaptive aggregation, KBNet can recover more textures even for some very thin edges.
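The synthetic-noise protocol used by these benchmarks can be sketched as follows: add white Gaussian noise at σ ∈ {15, 25, 50} (on the 0-255 scale) and report PSNR against the clean image. This is an illustration of the setup, not the authors' evaluation script; details such as seeding or clipping conventions may differ across benchmarks.

```python
import numpy as np

def add_gaussian_noise(img, sigma, seed=0):
    """img: float32 array in [0, 1]; sigma is given on the 0-255 intensity scale."""
    rng = np.random.default_rng(seed)
    return img + rng.normal(0.0, sigma / 255.0, size=img.shape).astype(np.float32)

def psnr(pred, target, max_val=1.0):
    mse = np.mean((np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# for sigma in (15, 25, 50):
#     noisy = add_gaussian_noise(clean, sigma)      # 'clean' is an H x W x 3 image in [0, 1]
#     print(sigma, psnr(model(noisy), clean))       # 'model' is the denoiser under test
```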

Table 2 Gaussian denoising results of gray images on three testing datasets and three noise levels. The best results are marked in bold.

Set12 [75] BSD68 [46] Urban100 [29]

Method σ=15 σ=25 σ=50 σ=15 σ=25 σ=50 σ=15 σ=25 σ=50 MACs

DnCNN [75] 32.67 30.35 27.18 31.62 29.16 26.23 32.28 29.80 26.35 37G
FFDNet [77] 32.75 30.43 27.32 31.63 29.19 26.29 32.40 29.90 26.50 -
IRCNN [76] 32.76 30.37 27.12 31.63 29.15 26.19 32.46 29.80 26.22 -
DRUNet [74] 33.25 30.94 27.90 31.91 29.48 26.59 33.44 31.11 27.96 144G
FOCNet [30] 33.07 30.73 27.68 31.83 29.38 26.50 33.15 30.64 27.40 -
MWCNN [43] 33.15 30.79 27.74 31.86 29.41 26.53 33.17 30.66 27.42 -
NLRN [41] 33.16 30.80 27.64 31.88 29.41 26.47 33.45 30.94 27.49 -
RNAN [82] - - 27.70 - - 26.48 - - 27.65 496G
DeamNet [53] 33.19 30.81 27.74 31.91 29.44 26.54 33.37 30.85 27.53 146G
DAGL [48] 33.28 30.93 27.81 31.93 29.46 26.51 33.79 31.39 27.97 256G
SwinIR [40] 33.36 31.01 27.91 31.97 29.50 26.58 33.70 31.30 27.98 759G
Restormer [69] 33.42 31.08 28.00 31.96 29.52 26.62 33.79 31.46 28.29 141G
KBNets 33.40 31.08 28.04 31.98 29.54 26.65 33.77 31.45 28.33 69G

Fig. 4 Visualization results on Gaussian denoising of color images from the Urban100 dataset [29] (panels: full image, noisy, SwinIR [40], Restormer [69], ours, ground truth). KBNet can recover more fine textures.

For Gaussian denoising of gray images, as shown in Tab. 2, KBNet shows performance consistent with the color image denoising. It outperforms the previous state-of-the-art method Restormer [69] slightly, while using less than half of its MACs.

4.3 Raw Image Denoising Results

For real-world denoising, we conduct experiments on the SIDD dataset [1] and the SenseNoise dataset [79] to evaluate our method on both indoor and outdoor scenes. Besides KBNets, we also provide a heavier model, KBNetl, that adopts the FFN design of Restormer [69] to perform the position-wise non-linear transformation on the SenseNoise dataset.
SIDD: The SIDD dataset is collected on indoor scenes. Five smartphones are used to capture scenes at different noise levels. SIDD contains 320 image pairs for training and 1,280 for validation. As shown in Tab. 3, KBNet achieves state-of-the-art results on the SIDD dataset. It outperforms very recent transformer-based methods including Restormer [69], Uformer [63], and MAXIM [61], as well as the CNN-based NAFNet [13], with fewer MACs. Fig. 6 shows the performance-efficiency comparison of our method. KBNet achieves the best trade-offs.

Table 3 Denoising comparisons on SIDD [1] dataset.

Method DnCNN MLP FoE BM3D WNNM NLM KSVD EPLL CBDNet
[75] [9] [55] [17] [24] [8] [4] [84] [25]

PSNR ↑ 23.66 24.71 25.58 25.65 25.78 26.76 26.88 27.11 30.78
SSIM ↑ 0.583 0.641 0.792 0.685 0.809 0.699 0.842 0.870 0.754

Method RIDNet VDN MIRNet NBNet Uformer MAXIM Restormer NAFNet KBNets
[6] [68] [70] [15] [63] [61] [69] [13] (Ours)

PSNR ↑ 38.71 39.28 39.72 39.75 39.89 39.96 40.02 40.30 40.35
SSIM ↑ 0.914 0.909 0.959 0.959 0.960 0.960 0.960 0.962 0.972
MACs ↓ 89 - 786 88.8 89.5 169.5 140 65 57.8

Fig. 5 Visualization of denoising results on the SenseNoise dataset [79] (panels: full image, noisy, RIDNet [6], MIRNet [70], Uformer [63], NAFNet [13], Restormer [69], KBNet (ours), GT). Our method produces clearer edges and more faithful colors.

Fig. 6 PSNR vs. MACs of different methods on real-world image denoising on the SIDD dataset [1].

Fig. 7 PSNR vs. MACs of different methods on real-world denoising on the SenseNoise dataset [79].

SenseNoise: The SenseNoise dataset contains 500 diverse scenes, where each image is of high resolution (e.g. 4000 × 3000). It contains both indoor and outdoor scenes with high-quality ground truth. We train the existing methods on the SenseNoise dataset [79] under the same training setting as NAFNet [13] but use 100k iterations. Since some of the models are too heavy, we scale down their channel numbers for fast training. The performance and MACs are reported in Tab. 4. Our method not only outperforms all other methods but also achieves the best performance-efficiency trade-offs, as shown in Fig. 7. Some visualizations are shown in Fig. 5. KBNet produces sharper edges and recovers more vivid colors than previous methods.

Table 4 Denoising comparisons on SenseNoise [79] dataset.

Method DnCNN RIDNet MIRNet MPRNet Uformer Restormer NAFNet KBNets KBNetl
[75] [6] [70] [71] [63] [69] [13] (Ours) (Ours)

PSNR ↑ 34.06 34.88 35.30 35.43 35.43 35.52 35.55 35.60 35.69
SSIM ↑ 0.904 0.915 0.919 0.922 0.920 0.924 0.923 0.924 0.924
MACs ↓ 37 89 130 120 90 80 65 57.8 104

Table 5 Comparison of defocus deblurring results on DPDD testset [3] containing 37 indoor and 39 outdoor scenes.

Indoor Scenes Outdoor Scenes

Method PSNR ↑ SSIM ↑ MAE ↓ LPIPS ↓ PSNR ↑ SSIM ↑ MAE ↓ LPIPS ↓ MACs ↓

EBDB [33] 25.77 0.772 0.040 0.297 21.25 0.599 0.058 0.373 -
DMENet [34] 25.50 0.788 0.038 0.298 21.43 0.644 0.063 0.397 1173
JNB [57] 26.73 0.828 0.031 0.273 21.10 0.608 0.064 0.355 -
DPDNet [3] 26.54 0.816 0.031 0.239 22.25 0.682 0.056 0.313 991G
KPAC [59] 27.97 0.852 0.026 0.182 22.62 0.701 0.053 0.269 -
IFAN [35] 28.11 0.861 0.026 0.179 22.76 0.720 0.052 0.254 363G
Restormer [69] 28.87 0.882 0.025 0.145 23.24 0.743 0.050 0.209 141G

KBNets 28.42 0.872 0.026 0.159 23.10 0.736 0.050 0.233 69G
KBNetl 28.89 0.883 0.024 0.143 23.32 0.749 0.049 0.205 108G

Fig. 8 Visualization results of defocus deblurring.

Fig. 9 Visualization results of deraining (panels: rainy, Restormer [69], ours, GT).

4.4 Deraining and Defocus Deblurring Results

To demonstrate the generalization and effectiveness of our KBNet, we follow the state-of-the-art image restoration method Restormer [69] to conduct experiments on deraining and defocus deblurring. The channels of our MFF module are adjusted to make sure our model uses fewer MACs than Restormer [69]. The training settings are kept the same as Restormer [69].
Deraining. The two largest datasets (Test2800 and Test1200) are used for testing deraining performance. The results in Tab. 6 indicate that our KBNet generalizes well to deraining. KBNetl outperforms Restormer [69] using only 76% of its MACs. On the Test1200 dataset, KBNetl produces more than 0.5dB improvement. Some visualization results can be found in Fig. 9.
Defocus deblurring. As shown in Tab. 5, we test our model on both indoor and outdoor scenes for deblurring. KBNets outperforms most previous methods using only 69G MACs. KBNetl outperforms the previous state-of-the-art Restormer [69] while having 24% fewer MACs. Some visualization results are shown in Fig. 8.

4.5 Ablation Studies

We conduct extensive ablation studies to validate the effectiveness of the components of our method and compare it with existing methods. All ablation studies are conducted on Gaussian denoising with noise level σ = 50. We train models for 100k iterations. Other training settings are kept the same as in the main experiment on Gaussian denoising of color images.

Table 6 Image deraining results.

Test2800 [22] Test1200 [73]


Method PSNR ↑ SSIM ↑ PSNR ↑ SSIM ↑ MACs ↓

DerainNet [21] 24.31 0.861 23.38 0.835 -


SEMI [64] 24.43 0.782 26.05 0.822 -
DIDMDN [73] 28.13 0.867 29.65 0.901 -
UMRL [67] 29.97 0.905 30.55 0.910 -
RESCAN [39] 31.29 0.904 30.51 0.882 -
PreNet [54] 31.75 0.916 31.36 0.911 66.2G
MSPFN [31] 32.82 0.930 32.39 0.916 -
MPRNet [72] 33.64 0.938 32.91 0.916 1.4T
SPAIR [52] 33.34 0.936 33.04 0.922 -
Restormer [69] 34.18 0.944 33.19 0.926 141G

KBNets 33.10 0.938 32.29 0.912 69G


KBNetl 34.19 0.944 33.82 0.931 108G

Table 7 Ablation studies on the dynamic spatial aggregation design choices.

Method PSNR SSIM MACs
Dynamic conv [14] 29.31 0.881 12G
Kernel prediction module [47] 29.29 0.881 55.1G
Shifted window attention [63] 29.33 0.882 21.9G
KBA w/ softmax 29.40 0.883 15.3G
KBA (Ours) 29.47 0.884 15.3G

Table 8 The influence of the kernel basis number.

Kernel Bases Number PSNR
N = 4 29.36
N = 8 29.41
N = 16 29.44
N = 32 29.47
N = 64 29.51
N = 128 29.54

Table 9 The effectiveness of different branches in the MFF block.

Method PSNR
DW 3 × 3 29.12
DW 3 × 3 + CA 29.22
DW 3 × 3 + CA + KBA 29.47

Comparison with dynamic spatial aggregation methods. We compare our method with existing dynamic aggregation solutions, including the window-based self-attention [63], dynamic convolution [14], and kernel prediction module [47, 62], by using them to replace our KBA module in the proposed MFF blocks. As shown in Tab. 7, directly adopting the kernel prediction module of [47] requires heavy computational cost. While the dynamic convolution [14] predicts spatial-invariant dynamic kernels, it slightly outperforms the kernel prediction module. This demonstrates that the kernel prediction module is difficult to optimize, as it requires predicting a large number of kernel weights directly. Most existing works [7, 47, 66] need to adopt a heavy computational branch to realize the kernel prediction. The shifted window attention requires more MACs while only improving the performance marginally over the dynamic convolution. Our method is more lightweight and improves performance significantly.
Using softmax in the KBA module. To linearly combine the kernel bases, a natural choice is to use the softmax function to normalize the kernel fusion coefficients. However, we find that using the softmax hinders the performance, as shown in Tab. 7. The softmax function may encourage the fused kernel to focus on a specific kernel basis and reduces the diversity of the fused kernel weights.
The impact of the number of kernel bases. We also validate the influence of the number of kernel bases. As shown in Tab. 8, more kernel bases bring consistent performance improvements since they capture more image patterns and increase the diversity of the spatial information aggregation. In our experiments, we select N = 32 for a better performance-efficiency trade-off.
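The normalization choice ablated above amounts to a one-line switch in the fusion step; the hypothetical helper below makes the two options explicit (names and shapes are illustrative, following the KBA sketch given earlier).

```python
import torch
import torch.nn.functional as F

def fuse_kernel_bases(coef, bases, use_softmax=False):
    """coef: (B, N, H, W) fusion coefficients; bases: (N, C, g, K*K) kernel bases."""
    if use_softmax:
        # Normalizing the N coefficients per pixel tends to concentrate on one basis,
        # which the ablation above finds slightly worse (29.40 vs. 29.47 dB).
        coef = F.softmax(coef, dim=1)
    return torch.einsum('bnhw,ncgk->bhwcgk', coef, bases)   # per-pixel fused kernels
```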

Fig. 10 Visualization of kernel indices in different stages (panels: Noisy, GT, 64 × 64, 256 × 256). The fusion coefficient map is transformed into a 3D RGB map by random projection.

The impact of different branches in the MFF block. As shown in Tab. 9, a single 3 × 3 depthwise convolution branch produces 29.12 dB on Gaussian denoising. Adding the channel attention branch and the KBA module branch successively leads to gains of 0.1 dB and 0.25 dB, respectively. The KBA module brings the largest improvement, which indicates the importance of the proposed pixel-adaptive spatial information processing.
Visualization of kernel indices. We visualize the kernel indices of different regions in different stages of our KBNet. We project the fusion coefficients to a 3D space by random projection and visualize them as an RGB map. As shown in Fig. 10, similar colors indicate that the pixels fuse kernel bases having similar patterns. KBNet can identify different regions and share similar kernel bases for similar textural or plain areas. Different kernels are learned to be responsible for different regions, and they can be optimized jointly during training.
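The visualization in Fig. 10 can be reproduced in spirit with a few lines of NumPy; the function below is an illustrative sketch (the names, projection matrix, and normalization are ours), projecting the per-pixel coefficients to three channels with a fixed random matrix.

```python
import numpy as np

def coeffs_to_rgb(coef_map, seed=0):
    """coef_map: (H, W, N) fusion coefficients taken from one stage of the network."""
    h, w, n = coef_map.shape
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(n, 3))                    # fixed random projection to 3-D
    rgb = coef_map.reshape(-1, n) @ proj
    span = rgb.max(axis=0) - rgb.min(axis=0)
    rgb = (rgb - rgb.min(axis=0)) / (span + 1e-8)     # normalize each channel to [0, 1]
    return rgb.reshape(h, w, 3)
```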
5 Conclusion

In this paper, we introduce a kernel basis network (KBNet) for image restoration. The key designs of our KBNet are the kernel basis attention (KBA) module and the multi-axis feature fusion (MFF) block. The KBA module adopts learnable kernel bases to model local image patterns and fuses the kernel bases linearly to aggregate spatial information efficiently. Besides, the MFF block fuses diverse features from multiple axes for image denoising, which includes channel-wise, spatial-invariant, and pixel-adaptive processing. In the experiments, KBNet achieves state-of-the-art results on popular synthetic noise datasets and two real-world noise datasets (SIDD and SenseNoise) with less computational cost. It also presents good generalization and state-of-the-art results on deraining and defocus deblurring.

References

[1] Abdelrahman Abdelhamed, Stephen Lin, and Michael S Brown. A high-quality denoising dataset for smartphone cameras. In CVPR, 2018.
[2] Abdelrahman Abdelhamed, Radu Timofte, and Michael S Brown. NTIRE 2019 challenge on real image denoising: Methods and results. In CVPR Workshops, 2019.
[3] Abdullah Abuolaim and Michael S Brown. Defocus deblurring using dual-pixel data. In ECCV, 2020.
[4] Michal Aharon, Michael Elad, and Alfred Bruckstein. K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. Trans. Sig. Proc., 2006.
[5] Michal Aharon, Michael Elad, and Alfred M. Bruckstein. K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process., 54(11):4311-4322, 2006.
[6] Saeed Anwar and Nick Barnes. Real image denoising with feature attention. ICCV, 2019.
[7] Steve Bako, Thijs Vogels, Brian McWilliams, Mark Meyer, Jan Novák, Alex Harvill, Pradeep Sen, Tony DeRose, and Fabrice Rousselle. Kernel-predicting convolutional networks for denoising Monte Carlo renderings. ACM Trans. Graph., 36(4):97:1-97:14, 2017.
[8] Antoni Buades, Bartomeu Coll, and J-M Morel. A non-local algorithm for image denoising. In CVPR, 2005.
[9] Harold C Burger, Christian J Schuler, and Stefan Harmeling. Image denoising: Can plain neural networks compete with BM3D? In CVPR, 2012.

[10] Jianrui Cai, Shuhang Gu, Radu Timofte, and Lei Zhang. NTIRE 2019 challenge on real image super-resolution: Methods and results. In CVPR Workshops, 2019.
[11] Meng Chang, Qi Li, Huajun Feng, and Zhihai Xu. Spatial-adaptive network for single image denoising. In ECCV, 2020.
[12] Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, and Wen Gao. Pre-trained image processing transformer. In CVPR, 2021.
[13] Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration. arXiv preprint arXiv:2204.04676, 2022.
[14] Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, and Zicheng Liu. Dynamic convolution: Attention over convolution kernels. In CVPR, 2020.
[15] Shen Cheng, Yuzhi Wang, Haibin Huang, Donghao Liu, Haoqiang Fan, and Shuaicheng Liu. NBNet: noise basis learning for image denoising with subspace projection. In CVPR, 2021.
[16] Sung-Jin Cho, Seo-Won Ji, Jun-Pyo Hong, Seung-Won Jung, and Sung-Jea Ko. Rethinking coarse-to-fine approach in single image deblurring. In ICCV, 2021.
[17] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. TIP, 2007.
[18] Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang Ding, and Jian Sun. Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs. arXiv preprint arXiv:2203.06717, 2022.
[19] Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process., 15(12):3736-3745, 2006.
[20] Rich Franzen. Kodak lossless true color image suite. http://r0k.us/graphics/kodak/, 1999. Online; accessed 24 Oct 2021.
[21] Xueyang Fu, Jiabin Huang, Xinghao Ding, Yinghao Liao, and John Paisley. Clearing the skies: A deep network architecture for single-image rain removal. TIP, 2017.
[22] Xueyang Fu, Jiabin Huang, Delu Zeng, Yue Huang, Xinghao Ding, and John Paisley. Removing rain from single images via a deep detail network. In CVPR, 2017.
[23] Shuhang Gu, Yawei Li, Luc Van Gool, and Radu Timofte. Self-guided network for fast image denoising. In ICCV, 2019.
[24] Shuhang Gu, Lei Zhang, Wangmeng Zuo, and Xiangchu Feng. Weighted nuclear norm minimization with application to image denoising. In CVPR, 2014.
[25] Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo, and Lei Zhang. Toward convolutional blind denoising of real photographs. In CVPR, 2019.
[26] Qi Han, Zejia Fan, Qi Dai, Lei Sun, Ming-Ming Cheng, Jiaying Liu, and Jingdong Wang. On the connection between local attention and dynamic depth-wise convolution. In International Conference on Learning Representations, 2022.
[27] Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, and Yulin Wang. Dynamic neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 44(11):7436-7456, 2022.
[28] Jie Hu, Li Shen, Samuel Albanie, Gang Sun, and Enhua Wu. Squeeze-and-excitation networks. IEEE TPAMI, 2019.

[29] Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja. Single image super-resolution from transformed self-exemplars. In CVPR, 2015.
[30] Xixi Jia, Sanyang Liu, Xiangchu Feng, and Lei Zhang. FOCNet: A fractional optimal control network for image denoising. In CVPR, 2019.
[31] Kui Jiang, Zhongyuan Wang, Peng Yi, Baojin Huang, Yimin Luo, Jiayi Ma, and Junjun Jiang. Multi-scale progressive fusion network for single image deraining. In CVPR, 2020.
[32] Yifan Jiang, Bart Wronski, Ben Mildenhall, Jonathan T. Barron, Zhangyang Wang, and Tianfan Xue. Fast and high-quality image denoising via malleable convolutions. CoRR, abs/2201.00392, 2022.
[33] Ali Karaali and Claudio Rosito Jung. Edge-based defocus blur estimation with adaptive scale selection. TIP, 2017.
[34] Junyong Lee, Sungkil Lee, Sunghyun Cho, and Seungyong Lee. Deep defocus map estimation using domain adaptation. In CVPR, 2019.
[35] Junyong Lee, Hyeongseok Son, Jaesung Rim, Sunghyun Cho, and Seungyong Lee. Iterative filter adaptive network for single image defocus deblurring. In CVPR, 2021.
[36] Dasong Li, Xiaoyu Shi, Yi Zhang, Xiaogang Wang, Hongwei Qin, and Hongsheng Li. No attention is needed: Grouped spatial-temporal shift for simple and efficient video restorers. CoRR, abs/2206.10810, 2022.
[37] Dasong Li, Yi Zhang, Ka Chun Cheung, Xiaogang Wang, Hongwei Qin, and Hongsheng Li. Learning degradation representations for image deblurring. In ECCV, 2022.
[38] Dasong Li, Yi Zhang, Ka Lung Law, Xiaogang Wang, Hongwei Qin, and Hongsheng Li. Efficient burst raw denoising with variance stabilization and multi-frequency denoising network. IJCV, 2022.
[39] Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, and Hongbin Zha. Recurrent squeeze-and-excitation context aggregation net for single image deraining. In ECCV, 2018.
[40] Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. SwinIR: Image restoration using swin transformer. In ICCV Workshops, 2021.
[41] Ding Liu, Bihan Wen, Yuchen Fan, Chen Change Loy, and Thomas S Huang. Non-local recurrent network for image restoration. In NeurIPS, 2018.
[42] Ding Liu, Bihan Wen, Xianming Liu, Zhangyang Wang, and Thomas S. Huang. When image denoising meets high-level vision tasks: A deep learning approach. In IJCAI, pages 842-848. ijcai.org, 2018.
[43] Pengju Liu, Hongzhi Zhang, Kai Zhang, Liang Lin, and Wangmeng Zuo. Multi-level wavelet-CNN for image restoration. In CVPR Workshops, 2018.
[44] Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A ConvNet for the 2020s. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[45] Julien Mairal, Michael Elad, and Guillermo Sapiro. Sparse representation for color image restoration. TIP, 2007.
[46] David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
[47] Ben Mildenhall, Jonathan T Barron, Jiawen Chen, Dillon Sharlet, Ren Ng, and Robert Carroll. Burst denoising with kernel prediction networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[48] Chong Mou, Jian Zhang, and Zhuoyuan Wu. Dynamic attentive graph learning for image restoration. In ICCV, 2021.
[49] Seungjun Nah, Sanghyun Son, Suyoung Lee, Radu Timofte, and Kyoung Mu Lee. NTIRE 2021 challenge on image deblurring. In CVPR Workshops, 2021.
[50] Yali Peng, Lu Zhang, Shigang Liu, Xiaojun Wu, Yu Zhang, and Xili Wang. Dilated residual networks with symmetric skip connection for image denoising. Neurocomputing, 2019.
[51] Tobias Plötz and Stefan Roth. Neural nearest neighbors networks. In NeurIPS, 2018.
[52] Kuldeep Purohit, Maitreya Suin, AN Rajagopalan, and Vishnu Naresh Boddeti. Spatially-adaptive image restoration using distortion-guided networks. In ICCV, 2021.
[53] Chao Ren, Xiaohai He, Chuncheng Wang, and Zhibo Zhao. Adaptive consistency prior based deep network for image denoising. In CVPR, 2021.
[54] Dongwei Ren, Wangmeng Zuo, Qinghua Hu, Pengfei Zhu, and Deyu Meng. Progressive image deraining networks: A better and simpler baseline. In CVPR, 2019.
[55] Stefan Roth and Michael J Black. Fields of experts. IJCV, 2009.
[56] Leonid I. Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1):259-268, 1992.
[57] Jianping Shi, Li Xu, and Jiaya Jia. Just noticeable defocus blur detection and estimation. In CVPR, 2015.
[58] Eero P. Simoncelli and Edward H. Adelson. Noise removal via bayesian wavelet coring. In ICIP (1), pages 379-382. IEEE Computer Society, 1996.
[59] Hyeongseok Son, Junyong Lee, Sunghyun Cho, and Seungyong Lee. Single image defocus deblurring using kernel-sharing parallel atrous convolutions. In ICCV, 2021.
[60] Chunwei Tian, Yong Xu, and Wangmeng Zuo. Image denoising using deep CNN with batch renormalization. Neural Networks, 2020.
[61] Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li. MAXIM: Multi-axis MLP for image processing. CVPR, 2022.
[62] Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, and Dahua Lin. CARAFE: Content-aware reassembly of features. In The IEEE International Conference on Computer Vision (ICCV), October 2019.
[63] Zhendong Wang, Xiaodong Cun, Jianmin Bao, and Jianzhuang Liu. Uformer: A general u-shaped transformer for image restoration. arXiv:2106.03106, 2021.
[64] Wei Wei, Deyu Meng, Qian Zhao, Zongben Xu, and Ying Wu. Semi-supervised transfer learning for image rain removal. In CVPR, 2019.
[65] Zhihao Xia and Ayan Chakrabarti. Identifying recurring patterns with deep neural networks for natural image denoising. In WACV, 2020.
[66] Zhihao Xia, Federico Perazzi, Michaël Gharbi, Kalyan Sunkavalli, and Ayan Chakrabarti. Basis prediction networks for effective burst denoising with large kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11844-11853, 2020.
[67] Rajeev Yasarla and Vishal M Patel. Uncertainty guided multi-scale residual learning-using a cycle spinning CNN for single image de-raining. In CVPR, 2019.
[68] Zongsheng Yue, Hongwei Yong, Qian Zhao, Deyu Meng, and Lei Zhang. Variational denoising network: Toward blind noise modeling and removal. In NeurIPS, 2019.

[69] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In CVPR, 2022.
[70] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Learning enriched features for real image restoration and enhancement. In ECCV, 2020.
[71] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In CVPR, 2021.
[72] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In CVPR, 2021.
[73] He Zhang and Vishal M Patel. Density-aware single image de-raining using a multi-stream dense network. In CVPR, 2018.
[74] Kai Zhang, Yawei Li, Wangmeng Zuo, Lei Zhang, Luc Van Gool, and Radu Timofte. Plug-and-play image restoration with deep denoiser prior. TPAMI, 2021.
[75] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a gaussian denoiser: Residual learning of deep CNN for image denoising. TIP, 2017.
[76] Kai Zhang, Wangmeng Zuo, Shuhang Gu, and Lei Zhang. Learning deep CNN denoiser prior for image restoration. In CVPR, 2017.
[77] Kai Zhang, Wangmeng Zuo, and Lei Zhang. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. TIP, 2018.
[78] Lei Zhang, Xiaolin Wu, Antoni Buades, and Xin Li. Color demosaicking by local directional interpolation and nonlocal adaptive thresholding. JEI, 2011.
[79] Yi Zhang, Dasong Li, Ka Lung Law, Xiaogang Wang, Hongwei Qin, and Hongsheng Li. IDR: Self-supervised image denoising via iterative data refinement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[80] Yi Zhang, Hongwei Qin, Xiaogang Wang, and Hongsheng Li. Rethinking noise synthesis and modeling in raw denoising. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021.
[81] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In ECCV, 2018.
[82] Yulun Zhang, Kunpeng Li, Kai Li, Bineng Zhong, and Yun Fu. Residual non-local attention networks for image restoration. In ICLR, 2019.
[83] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image restoration. TPAMI, 2020.
[84] Daniel Zoran and Yair Weiss. From learning models of natural image patches to whole image restoration. In ICCV, 2011.