
Journal of Signal Processing, Vol. 26, No. 6, pp. 183-187, November 2022

RESEARCH NOTE

Two-Stage Filter Response Normalization Network for Real Image Denoising

Tai Yuwen, Yosuke Sugiura, Nozomiko Yasui and Tetsuya Shimamura

Graduate School of Science and Engineering, Saitama University


255 Shimo-Okubo, Sakura-ku, Saitama City, Saitama 338-8570, Japan
E-mail: [email protected]

Abstract In this paper, we propose a two-stage network for real image denoising with filter response normalization, named the two-stage filter response normalization network (TFRNet). In TFRNet, we propose a filter response normalization (FRN) block to extract features and accelerate the training of the network. TFRNet consists of two stages, each of which uses an encoder-decoder structure based on U-Net. We also use a coordinate attention block (CAB), a double-channel downsampling module, a double-skip connection module, and a convolutional (Conv) block in TFRNet. With the help of these modules, TFRNet provides excellent results on both the SIDD and DND datasets for real image denoising.

Keywords: real image denoising, filter response normalization, coordinate attention

1. Introduction

During the formation or transmission of an image, the quality of the image may be affected by external or internal factors that cause a noisy image. Therefore, image denoising is a basic task in image processing and has many applications in the real world. A large number of methods have been proposed for synthetic noise such as additive white Gaussian noise (AWGN) [1]. In recent years, with the development of deep convolutional neural networks and optimization methods, many methods have been proposed for image denoising tasks, and the denoising target has gradually shifted from AWGN to real image noise. However, real image noise is much more complex than synthetic noise; therefore, deep convolutional neural networks for real image denoising remain a challenging topic.

In this paper, we propose a two-stage network for real image denoising with filter response normalization (FRN) [2], named the two-stage filter response normalization network (TFRNet). We also propose the FRN block as a component of TFRNet. We use an encoder-decoder structure based on U-Net at both Stage 1 and Stage 2. At the beginning of each stage, the coordinate attention block (CAB) [3], which embeds positional information into channel attention, is used to extract features at each scale. In the encoder unit of each stage, the double-channel downsampling module is used to maintain spatial detail when extracting deep contextualized features. The proposed FRN block is added in the double-channel downsampling module. The FRN block consists of FRN and the thresholded linear unit (TLU), which helps extract contextualized features and accelerate training. Between Stage 1 and Stage 2, we use the double-skip connection module, in which the skip connections from Stage 1 are also connected to Stage 2; this helps maintain spatial detail and makes TFRNet stable. The supervised attention module (SAM) from the multi-stage progressive restoration network (MPRNet) [4] is also used at the end of Stage 1, so that useful features from Stage 1 are propagated to Stage 2 to enhance learning in Stage 2. With these modules, TFRNet extracts deep contextualized features while preserving spatial detail, which helps promote denoising performance.

The performance of TFRNet is validated through experiments, which show that TFRNet significantly reduces the computational cost compared with MPRNet.
2. Related Work

2.1 MPRNet

MPRNet consists of three stages. Stage 1 and Stage 2 are based on U-Net. In the last stage, a sub-network consisting of several original-resolution blocks, each built from several channel attention blocks without any downsampling operations, is used in order to preserve the desired fine texture in the final output image.


In addition, MPRNet incorporates a SAM between every two stages; this suppresses useless information and transforms useful information to the next stage. Therefore, we also use the SAM in our TFRNet.

MPRNet achieves strong performance in image restoration tasks. However, the network structure of MPRNet is very complicated because of the many channel attention blocks in the last stage. Therefore, MPRNet is computationally intensive and time-consuming.

2.2 Normalization

Normalization is essential in high-level computer vision tasks. Batch normalization (BN) [7] is widely used in image classification tasks. However, a small mini-batch size is commonly used to train an image denoising network, which makes BN unstable. Many other normalization methods have also been proposed. Group normalization (GN) [8] outperforms BN in image classification tasks with a very small mini-batch size such as 4 or 8. Chen et al. [9] used instance normalization (IN) [10] for the image denoising task and achieved excellent results.

FRN was proposed to eliminate the batch dependence in the training of deep neural networks. In addition, adding TLU after FRN can significantly improve the performance of FRN, making it outperform GN and BN when the mini-batch size is small. Therefore, we use FRN and TLU to construct the FRN block in our TFRNet.

2.3 Attention

Fig. 1 Coordinate attention from [3]

Attention mechanisms have been shown to be effective in low-level computer vision tasks. In image denoising tasks, attention is mainly divided into spatial attention and channel attention. MPRNet uses channel attention, and in [11], both channel attention and spatial attention were used to improve image denoising accuracy.

Coordinate attention, which is shown in Fig. 1, is used in high-level computer vision tasks. It embeds pixel positional information into channel attention. Therefore, it captures dependences with precise positional information while inheriting the benefits of channel attention, and it achieves excellent results in image classification tasks. Because of these benefits, we add a CAB at the beginning of each stage in our TFRNet.
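To make the mechanism concrete, below is a minimal PyTorch sketch of coordinate attention as described in [3]; the bottleneck width, reduction ratio, and use of plain ReLU are illustrative assumptions, not the exact CAB configuration used in TFRNet.

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)  # assumed bottleneck width
        self.encode = nn.Sequential(nn.Conv2d(channels, mid, 1),
                                    nn.ReLU(inplace=True))
        self.attn_h = nn.Conv2d(mid, channels, 1)
        self.attn_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # pool along each spatial axis separately to keep positional information
        pool_h = x.mean(dim=3, keepdim=True)                       # (n, c, h, 1)
        pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (n, c, w, 1)
        # encode the two pooled maps jointly, then split them again
        y = self.encode(torch.cat([pool_h, pool_w], dim=2))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                      # (n, c, h, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        return x * a_h * a_w  # per-position, per-channel reweighting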

3. Two-Stage Filter Response Normalization Network

Fig. 2 TFRNet with depth = 2

3.1 TFRNet and loss function

For convenience, TFRNet with a depth equal to 2 is shown in Fig. 2. The depth of U-Net refers to the number of downsampling operations in U-Net, where multiple downsampling operations extract deeper contextual information from a noisy image. Increasing the number of downsampling operations in U-Net can extract deeper features, but too many can also lead to overfitting. Thus, we trained our TFRNet with a depth equal to 5 to extract deeper contextual features without causing overfitting.

The loss function of TFRNet is based on the loss function of MPRNet. Let I denote the input noisy image and i denote stage i. The output of each stage is the predicted residual image R_i; therefore, the denoised image X_i of each stage can be calculated as

X_i = I + R_i    (1)



X_1 is only used to calculate the loss function, while X_2 is used as the result of TFRNet. The Charbonnier loss is used as the metric of the loss function, and we optimize the end-to-end TFRNet as

Loss = \sum_{i=1}^{2} L_{ch}(X_i, Y)    (2)

where Y denotes the ground truth image and L_{ch} is the Charbonnier loss defined by

L_{ch}(X_i, Y) = \sqrt{\|X_i - Y\|^2 + \varepsilon^2}    (3)

where \varepsilon is a constant set to 10^{-3}.
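This summed Charbonnier objective might be implemented as follows in PyTorch; this is a minimal sketch assuming per-tensor summation as written in Eq. (3), and the function names are our own.

import torch

def charbonnier_loss(x: torch.Tensor, y: torch.Tensor,
                     eps: float = 1e-3) -> torch.Tensor:
    # Eq. (3): Charbonnier loss with eps = 10^-3
    return torch.sqrt(torch.sum((x - y) ** 2) + eps ** 2)

def tfrnet_loss(residuals, noisy, target):
    # Eqs. (1)-(2): X_i = I + R_i, losses summed over both stages
    return sum(charbonnier_loss(noisy + r, target) for r in residuals)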


3.2 Double-channel downsampling

In the encoder unit, there are two channels for downsampling. In the first downsampling channel, we use the residual resizing module from MIRNet [11], which is designed with recursive residuals and uses skip connections to ease the flow of information. Before the downsampling operation, we add an FRN block to extract contextual features. In the second downsampling channel, we use only convolution for downsampling. Compared with the residual resizing module, convolution is a simple operation in which spatial information can be effectively preserved. The result of the second channel is concatenated with the first channel, similarly to a skip connection, which helps maintain spatial information when extracting deeper features. A sketch of this two-channel structure is given below.
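In this illustrative sketch, the module name, channel split, and kernel sizes are our assumptions, and the strided convolutions are stand-ins rather than MIRNet's actual residual resizing module; the feature-extraction block (the FRN block of Sect. 3.3) is passed in as an argument.

import torch
import torch.nn as nn

class DoubleChannelDownsample(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, feature_block: nn.Module):
        super().__init__()
        self.feature_block = feature_block  # e.g. the FRN block of Sect. 3.3
        # First channel: stand-in for the MIRNet residual resizing module [11]
        self.residual_down = nn.Conv2d(in_ch, out_ch // 2, kernel_size=3,
                                       stride=2, padding=1)
        # Second channel: plain convolutional downsampling
        self.conv_down = nn.Conv2d(in_ch, out_ch // 2, kernel_size=3,
                                   stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        first = self.residual_down(self.feature_block(x))
        second = self.conv_down(x)
        # Concatenation acts like a skip connection, preserving spatial detail
        return torch.cat([first, second], dim=1)

A caller would pass the feature-extraction block once defined, e.g. DoubleChannelDownsample(64, 128, feature_block=FRNBlock(64)) with the FRNBlock sketched in Sect. 3.3.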
3.3 FRN and FRN block

FRN is a method of normalizing the responses of each batch element independently by dividing the responses of each filter by the square root of its non-central second moment, without any mean subtraction. Therefore, the FRN layer does not rely on other batch elements or channels for normalization. However, this leaves FRN without mean centering, which causes the activations to deviate from zero. Therefore, TLU is used instead of ReLU to prevent detrimental effects on learning. TLU is a pointwise activation parameterized by a learned rectification threshold. The effect of TLU is the same as that of having a shared bias before and after ReLU; that is, the deviation of the activations from zero is prevented. Specifically, in FRN, let x denote the input and v^2 be the mean squared norm of x, that is,

v^2 = \frac{1}{N} \sum_i x_i^2    (4)

Then the output of FRN is

y_i = \gamma \frac{x_i}{\sqrt{v^2 + \epsilon}} + \beta    (5)

where \gamma and \beta are learned parameters, and y is used as the input of TLU, that is,

z_i = \max(y_i, \tau)    (6)

where \tau is a learned threshold.
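Eqs. (4)-(6) translate almost directly into a layer. The following PyTorch sketch follows [2]; the per-channel parameter shapes and the fixed epsilon of 1e-6 are assumptions.

import torch
import torch.nn as nn

class FRN(nn.Module):
    def __init__(self, num_channels: int, eps: float = 1e-6):
        super().__init__()
        shape = (1, num_channels, 1, 1)
        self.gamma = nn.Parameter(torch.ones(shape))   # learned scale (gamma)
        self.beta = nn.Parameter(torch.zeros(shape))   # learned shift (beta)
        self.tau = nn.Parameter(torch.zeros(shape))    # learned TLU threshold (tau)
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Eq. (4): mean squared norm over the spatial positions of each filter
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)
        # Eq. (5): normalize without mean subtraction, then scale and shift
        y = self.gamma * x * torch.rsqrt(nu2 + self.eps) + self.beta
        # Eq. (6): thresholded linear unit
        return torch.max(y, self.tau)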
In encoder units, the FRN block is added before the residual downsampling operation to extract features at each scale. The FRN block consists of a convolutional layer, FRN with TLU, and another convolutional layer with ReLU. The first convolutional layer extracts the initial features at this scale; then FRN is used to accelerate training, TLU is used to prevent the activations of FRN from deviating from zero, and the second convolutional layer with ReLU is used to further extract features. There is also a skip connection in the FRN block, making it a residual structure.

If the FRN block were also used in the decoder unit, it might lead to overfitting. Therefore, the convolutional (Conv) block is used in place of the FRN block in the decoder unit, and it is added after the residual upsampling operation. The Conv block comprises a stack of two convolutional layers with ReLU; similarly to the FRN block, there is a skip connection in the Conv block.

The FRN block and Conv block are shown in Fig. 3, and a sketch of both follows the caption below.

Fig. 3 FRN block and Conv block
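Under the same caveats (kernel sizes and padding are assumptions), the two blocks just described might be sketched as follows, reusing the FRN layer from the sketch above.

import torch.nn as nn

class FRNBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),  # initial features
            FRN(channels),                                # FRN + TLU, sketched above
            nn.Conv2d(channels, channels, 3, padding=1),  # further features
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)  # skip connection: residual structure

class ConvBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)  # skip connection, as in the FRN block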
3.4 Double-skip connection

The double-skip connection module is used between Stage 1 and Stage 2. The skip connections from Stage 1 are also connected to Stage 2, which transfers the contextual information from Stage 1 to Stage 2, where the contextual features of Stage 1 help consolidate the intermediate features of Stage 2. In addition, the double-skip connection eases the flow of information, enabling a more stable network structure. Therefore, we can construct a deeper TFRNet.
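The fusion operator of the double-skip connection is not spelled out here; one hedged reading, with hypothetical names, merges the stage-1 features into stage 2 by channel-wise concatenation at each matching scale:

import torch

def fuse_double_skip(x2: torch.Tensor, skip_stage1: torch.Tensor) -> torch.Tensor:
    # x2: stage-2 features at some scale; skip_stage1: the stage-1 features
    # at the same scale, carried over by the double-skip connection.
    return torch.cat([x2, skip_stage1], dim=1)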



4. Experiments

4.1 Dataset and optimization methods

We use the SIDD-Medium Dataset [5] as the training set, which consists of 320 high-resolution noisy and ground-truth image pairs. We also use image flips and image rotations for data amplification. Evaluation is conducted on 1,280 validation patches of size 256 × 256 from the SIDD dataset. For testing, we use two datasets: the SIDD benchmark and the DND [6] benchmark.

The network is trained with the Adam optimizer. In the first five epochs, we use a warm-up operation to slowly increase the learning rate to 2 × 10^{-4}; then, the learning rate is decreased to 1 × 10^{-6} with the cosine annealing strategy. In addition, TFRNet is trained on 256 × 256 patches with a batch size of 16 for 80 epochs; these settings are the same as those in MPRNet.
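For illustration, the described schedule could be computed per epoch as below; the linear shape of the warm-up and the epoch-level granularity are our assumptions, as they are not specified above.

import math

def learning_rate(epoch: int, total_epochs: int = 80, warmup_epochs: int = 5,
                  peak_lr: float = 2e-4, min_lr: float = 1e-6) -> float:
    if epoch < warmup_epochs:
        # assumed linear warm-up to the peak learning rate
        return peak_lr * (epoch + 1) / warmup_epochs
    # cosine annealing toward min_lr over the remaining epochs
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * t))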
4.2 Evaluation indicators and denoising results

The denoising performance is measured by the PSNR and SSIM metrics. In addition, we compare the computational complexity of TFRNet and MPRNet in terms of MACs at an input size of 1 × 3 × 256 × 256. Specifically, a MAC is a multiply-accumulate operation, which comprises one multiplication and one addition. MACs are therefore used to measure the computational complexity of the tensor calculations in image denoising tasks; the smaller the number of MACs, the lower the computational complexity of the model.
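As a back-of-envelope illustration (the layer shape here is hypothetical, not a layer of TFRNet), the MAC count of a single convolution follows directly from its output size and kernel support:

# MACs of one KxK convolution producing an out_ch x H x W output:
# each output value needs in_ch * K * K multiply-accumulate operations.
in_ch, out_ch, K, H, W = 3, 64, 3, 256, 256  # hypothetical first layer
macs = out_ch * H * W * in_ch * K * K
print(f"{macs / 1e9:.3f} GMACs")             # about 0.113 GMACs

Summing such per-layer counts over a whole network at the 1 × 3 × 256 × 256 input gives totals of the kind reported in Table 1.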
Table 1 PSNR, SSIM and MACs for MPRNet and TFRNet (depth = 5) on the SIDD and DND datasets

Dataset  Method   PSNR   SSIM   MACs (G)
SIDD     MPRNet   39.71  0.958  573.88
         TFRNet   39.67  0.958  279.23
DND      MPRNet   39.80  0.954  573.88
         TFRNet   39.79  0.954  279.23

As can be seen from the results shown in Table 1, TFRNet yields a PSNR 0.04 dB lower than that of MPRNet on the SIDD dataset; the results for the DND dataset are almost the same. In addition, the SSIM values are almost identical on the two datasets. On the other hand, the number of MACs of TFRNet is reduced to about 48% of that of MPRNet. Thus, TFRNet roughly halves the number of MACs with almost the same denoising effect.

Fig. 4 Noisy images and denoised images using TFRNet on the SIDD dataset

Examples of noisy images and images denoised by TFRNet on the SIDD dataset are shown in Fig. 4, in which the top images are the noisy images and the bottom images are the images denoised by our TFRNet.

5. Conclusion

In this paper, we proposed the two-stage filter response normalization network (TFRNet) for real image denoising tasks, in which FRN with TLU is used to extract features and accelerate training, a CAB is used to extract features at each scale, a double-skip connection module is used to stabilize TFRNet, and a double-channel downsampling module is used to maintain spatial information when extracting deeper features.

Compared with MPRNet, which delivers strong performance in image denoising tasks, TFRNet provides almost identical results on both the SIDD and DND datasets, while the MACs of TFRNet are only 48% of those of MPRNet. Therefore, TFRNet significantly reduces the computational cost.

TFRNet still has a shortcoming: its PSNR is slightly lower than that of MPRNet. Modifying the construction of the double-channel downsampling module and changing the position of the CAB may be two ways to improve the accuracy of TFRNet.

There are many other tasks for future research in the field of image restoration, such as image deraining, image deblurring and image super-resolution. We expect that the use of TFRNet in these tasks will also achieve good results.

References

[1] K. Dabov, A. Foi, V. Katkovnik and K. Egiazarian: Image denoising with block-matching and 3D filtering, Image Processing: Algorithms and Systems, Neural Networks, and Machine Learning, Vol. 6064, 2006.

[2] S. Singh and S. Krishnan: Filter response normalization layer: Eliminating batch dependence in the training of deep neural networks, Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11237–11246, 2020.

[3] Q. Hou, D. Zhou and J. Feng: Coordinate attention for efficient mobile network design, Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13713–13722, 2021.



[4] S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M. H. Yang and L. Shao: Multi-stage progressive image restoration, Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14821–14831, 2021.

[5] A. Abdelhamed, S. Lin and M. S. Brown: A high-quality denoising dataset for smartphone cameras, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1692–1700, 2018.

[6] T. Plotz and S. Roth: Benchmarking denoising algorithms with real photographs, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1586–1595, 2017.

[7] S. Ioffe and C. Szegedy: Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of International Conference on Machine Learning, pp. 448–456, 2015.

[8] Y. Wu and K. He: Group normalization, Proceedings of European Conference on Computer Vision, pp. 3–19, 2018.

[9] L. Chen, X. Lu, J. Zhang, X. Chu and C. Chen: HINet: Half instance normalization network for image restoration, Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 182–192, 2021.

[10] D. Ulyanov, A. Vedaldi and V. Lempitsky: Instance normalization: The missing ingredient for fast stylization, arXiv:1607.08022, 2016.

[11] S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M. H. Yang and L. Shao: Learning enriched features for real image restoration and enhancement, Proceedings of 16th European Conference on Computer Vision, pp. 492–511, 2020.

Tai Yuwen received his B.E. and M.E. degrees from Northeast Forestry University, Harbin, China in 2017 and Saitama University, Saitama, Japan in 2022, respectively. His research interests include digital signal processing and image denoising.

Yosuke Sugiura received his B.E., M.E. and Ph.D. degrees from Osaka University, Osaka, Japan in 2009, 2011, and 2013, respectively. In 2013, he joined Tokyo University of Science, Tokyo, Japan. In 2015, he joined Saitama University, Saitama, Japan, where he is currently an Assistant Professor. His research interests include digital signal processing, adaptive filter theory and speech information processing.

Nozomiko Yasui received her B.E., M.E. and Ph.D. degrees in engineering from Ryukoku University, Japan, in 2007, 2009 and 2012, respectively. In 2012, she joined National Institute of Technology, Matsue College, Japan. In 2018, she joined Saitama University, Saitama, Japan and is currently an Assistant Professor. She received the Awaya-Kiyoshi Award from the ASJ in 2011. Her research interests include human informatics and digital signal processing. She is a member of ASJ, JSMPC and INCE/J.

Tetsuya Shimamura received his B.E., M.E. and Ph.D. degrees in electrical engineering from Keio University, Yokohama, Japan, in 1986, 1988, and 1991, respectively. In 1991, he joined Saitama University, Saitama, Japan, where he is currently a Professor. He was a visiting researcher at Loughborough University, U.K. in 1995 and at Queen's University of Belfast, U.K. in 1996. Prof. Shimamura is an author and coauthor of six books. He serves as an editorial member of several international journals and is a member of the organizing and program committees of various international conferences. His research interests are in digital signal processing and its application to speech, image, and communication systems.

(Received May 6, 2022; revised July 8, 2022)
