0% found this document useful (0 votes)
119 views16 pages

Image Restoration Via Frequency Selection

AI
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
119 views16 pages

Image Restoration Via Frequency Selection

AI
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 16

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 46, NO.

2, FEBRUARY 2024 1093

Image Restoration via Frequency Selection


Yuning Cui , Wenqi Ren , Member, IEEE, Xiaochun Cao , Senior Member, IEEE,
and Alois Knoll , Fellow, IEEE

Abstract—Image restoration aims to reconstruct the latent sharp Recently, deep neural networks have witnessed the rapid
image from its corrupted counterpart. Besides dealing with this development of image restoration and obtained favorable per-
long-standing task in the spatial domain, a few approaches seek formance compared to conventional methods. A flurry of con-
solutions in the frequency domain by considering the large dis-
crepancy between spectra of sharp/degraded image pairs. How- volutional neural networks (CNN) based methods have been
ever, these algorithms commonly utilize transformation tools, e.g., developed for diverse image restoration tasks by inventing or
wavelet transform, to split features into several frequency parts, borrowing advanced modules from other domains, e.g., dilated
which is not flexible enough to select the most informative frequency convolution [4], [5], U-Net [6], residual learning [7], multi-stage
component to recover. In this paper, we exploit a multi-branch
pipeline [8], and attention mechanisms [9], [10], [11], [12].
and content-aware module to decompose features into separate
frequency subbands dynamically and locally, and then accentuate However, with convolution units, these methods have limited
the useful ones via channel-wise attention weights. In addition, to receptive fields, and thus they cannot capture long-range depen-
handle large-scale degradation blurs, we propose an extremely sim- dencies. This requirement is essential for restoration tasks since
ple decoupling and modulation module to enlarge the receptive field a single pixel needs information from its surrounding region
via global and window-based average pooling. Furthermore, we to be recovered. More recently, many researchers have tailored
merge the paradigm of multi-stage networks into a single U-shaped
network to pursue multi-scale receptive fields and improve effi- Transformer [13] for image restoration tasks, such as image
ciency. Finally, integrating the above designs into a convolutional motion deblurring [14] and image dehazing [15], [16].
backbone, the proposed Frequency Selection Network (FSNet) per- Nevertheless, the above mentioned approaches mainly per-
forms favorably against state-of-the-art algorithms on 20 different form restoration in the spatial domain, which do not sufficiently
benchmark datasets for 6 representative image restoration tasks, leverage frequency discrepancies between sharp/degraded im-
including single-image defocus deblurring, image dehazing, image
motion deblurring, image desnowing, image deraining, and image age pairs. To this end, a few works utilize the transformation
denoising. tools, e.g., wavelet transform or Fourier transform, to decom-
pose features into different frequency components and then
Index Terms—Frequency selection, image restoration, multi-
scale learning.
treat separate parts individually to reconstruct the corresponding
feature [5], [17], [18], [19]. Nevertheless, wavelet transform
I. INTRODUCTION decouples the feature map into different subbands in a fixed
MAGE restoration aims to recover a high-quality image by manner, and thus it is not capable of distinguishing the most
I removing degradations, e.g., noise, blur, and snowflakes. In
view of its important role in surveillance, self-driving tech-
informative or useless frequency components to enhance or
suppress. In addition, these methods need corresponding inverse
niques, and remote sensing, image restoration has gathered con- Fourier/wavelet transform, leading to additional computation
siderable attention from industrial and academic communities. overhead.
However, due to its ill-posed property, many conventional ap- To overcome the above drawbacks and select the most in-
proaches address this problem based on various assumptions [1] formative frequency component to reconstruct, we propose a
or hand-crafted features [2], which are incapable of generating novel decoupling and recalibration module for image restoration
faithful results in real-world scenarios [3]. tasks, named Multi-branch Dynamic Selective Frequency mod-
ule (MDSF). Specifically, we utilize the multi-branch learnable
Manuscript received 31 March 2023; revised 4 September 2023; accepted filters to generate high- and low-frequency maps dynamically
20 October 2023. Date of publication 6 November 2023; date of current and locally. We then leverage the channel-wise attention mech-
version 8 January 2024. This work was supported in part by the National
Natural Science Foundation of China under Grants 62322216, 62025604, and
anism to emphasize or attenuate the resulting frequency com-
62172409, and in part by the Shenzhen Science and Technology Program under ponents. Our module has two key advantages. First, according
Grants 20220016, RCYX20221008092849068, and JCYJ20220530145209022. to the input and task, the decoupling step dynamically generates
Recommended for acceptance by D. Meng. (Corresponding author: Wenqi Ren.)
Yuning Cui and Alois Knoll are with the School of Computation, Information
filters to decompose feature maps. Second, our module does not
and Technology, Technical University of Munich, 85748 Munich, Germany introduce extra inverse transform.
(e-mail: [email protected]; [email protected]). Receptive field is another critical factor for image restoration
Wenqi Ren and Xiaochun Cao are with the School of Cyber Sci-
ence and Technology, Shenzhen Campus of Sun Yat-sen University, Shen-
tasks due to the various sizes of degradation blurs [26], [27].
zhen, Guangdong 518107, China (e-mail: [email protected]; caoxi- To complement the above dynamic module, MDSF, that pro-
[email protected]). cesses features locally, we further propose a simple yet effective
This article has supplementary downloadable material available at
https://doi.org/10.1109/TPAMI.2023.3330416, provided by the authors.
module, dubbed Multi-branch Compact Selective Frequency
Digital Object Identifier 10.1109/TPAMI.2023.3330416 module (MCSF), to enhance the helpful frequency signals based

0162-8828 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
1094 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 46, NO. 2, FEBRUARY 2024

TABLE I
COMPARISONS BETWEEN SFNET [24] AND FSNET AMONG 7 REPRESENTATIVE DATASETS FOR 5 TASKS

Fig. 1. Comparisons between the proposed models and other state-of-the-art algorithms. (a): PSNR versus FLOPs on SOTS-Indoor [20] for image dehazing;
(b): PSNR versus parameters on CSD [21] for image desnowing; (c) PSNR versus parameters on RSBlur [22] for image motion deblurring; and (d) PSNR versus
parameters on DPDD [23] for single-image defocus deblurring.

on multiple and relatively global receptive fields. Specifically, This study is an extension of the conference paper [24]. The
we utilize global and window-based average pooling techniques main improvements over the preliminary version are: i) We
to attain disparate frequency maps, and then use learnable pa- introduce a small U-Net into a large U-shaped network to provide
rameters to modulate the resulting maps without resorting to any multi-scale representation learning for the highest-resolution
convolution layers. Compared to MDSF, besides the enlarged features. This modification improves efficiency while achieving
receptive fields, MCSF is lightweight enough to be embedded comparable or better performance (Table I). For example, our
in multiple positions of the backbone. model obtains 0.02 dB PSNR improvement on the GoPro [25]
In addition, inspired by the multi-stage networks for image dataset with 11% fewer FLOPs over SFNet [24] and runs
restoration [28], [29], we embed a small U-Net into the first faster. Moreover, FSNet achieves 40.40 dB PSNR on SOTS-
scale1 of our model to improve efficiency while obtaining com- Outdoor [20], which is 0.35 dB higher than that of SFNet [24].
parable or better performance. The main contributions of this ii) We discuss the influence of computational complexity for
study are summarized as follows: image dehazing/desnowing by providing two versions of our
r We propose a multi-branch dynamic selective frequency model, i.e., FSNet-S and FSNet. As illustrated in Fig. 1(a),
module (MDSF) capable of dynamically decoupling fea- the small version, FSNet-S, achieves the state-of-the-art result
ture maps into different frequency components via the the- with lower complexity on SOTS-Indoor [20]. FSNet further
oretically proved filters, and selecting the most informative improves the performance to 42.45 dB PSNR, which is 1.21 dB
components to recover. higher than SFNet [24]. iii) Our model is evaluated on more
r We develop a multi-branch compact selective frequency datasets. Specifically, we provide additional experimental results
module (MCSF) that performs frequency decoupling and on Haze4K [30], NHR [31] and NH-HAZE [32] for image
recalibration using multi-scale average pooling operations dehazing and RealBlur [33] for image motion deblurring. Fur-
to pursue a large receptive field for large-scale degradation thermore, the model is extended to the image denoising task and
blurs. shows the strong capability for noise removal.
r We merge a small U-Net into the large U-shaped network
to provide multi-scale representation learning (MSL) for
II. RELATED WORK
the highest-resolution features and improve efficiency.
r Incorporating the above designs into a convolutional back- Image Restoration: Images captured with low-end cameras
bone, the proposed frequency selection network (FSNet) or in bad weather will degrade the visibility and impact the
performs favorably against state-of-the-art algorithms on robustness of downstream tasks. In this regard, image restoration
20 datasets for 6 image restoration tasks, including image is beneficial to recover a sharp image from those undesired
defocus/motion deblurring, dehazing, deraining, desnow- degradations, e.g., haze, snowflakes, noise, and rain streaks.
ing, and denoising. Because of its ill-posed nature, many conventional methods have
been proposed based on various assumptions and hand-designed
features to get plausible solutions [34], [35], [36]. However,
1 In this study, the scale with a smaller number involves larger resolution these methods are not robust enough for more complicated
features. real-world scenarios [3].

Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
CUI et al.: IMAGE RESTORATION VIA FREQUENCY SELECTION 1095

In recent years, with the rapid development of convolutional beyond regional ranges. Guo et al. [56] designed dilated self-
neural networks (CNNs), many pioneering deep frameworks attention to model long-range dependence for image denoising.
have been proposed and made significant advances in image These attention mechanisms mainly focus on the spatial domain.
restoration tasks, e.g., image motion deblurring [28], [37], [38], Our frequency mechanism is based on attention mechanisms to
[39], defocus deblurring [23], [27], [40], desnowing [21], [41], emphasize informative frequency signals.
dehazing [9], [42], [43], [44], and deraining [1], [45], pro- Frequency Learning in Image Restoration: According to the
ducing more favorable performance than those conventional spectral convolution theorem, fast Fourier transform (FFT) can
methods. Ren et al. [43] presented one of the first learning-based be used to model global information [57]. Moreover, high-
methods for image dehazing by learning the mapping between frequency signals represent image details and textures, while
hazy inputs and their corresponding transmission maps. Nah et low-frequency represents smooth and flat areas. Thus it is easy to
al. [25] proposed a multi-scale CNN to restore sharp images treat different frequency subbands individually in the frequency
in an end-to-end manner for blurs caused by various sources. domain. Considering these merits, a few deep frameworks have
Thereafter, a flurry of advanced CNN-based functional units been proposed for image restoration in the spectral domain.
have been proposed or borrowed from other realms. Among Specifically, Mao et al. [19] used Fourier transform to integrate
these designs, the encoder-decoder pipeline [6] is a popular both high- and low-frequency residual information for motion
solution to learning hierarchical representations efficiently for deblurring. Guo et al. [56] proposed a window-based frequency
image restoration. Similarly, skip connection based methods channel attention based on FFT to model global information
have been proven effective for residual signals learning [46]. Di- and keep the model consistency between training and inference
lated convolution has been introduced to provide large receptive phases. Li et al. [58] embedded Fourier into the model to address
fields [5]. Furthermore, various attention mechanisms have also low-light image enhancement by processing the amplitude and
been incorporated to focus on relevant information [28], [47]. phase separately. FFT can also be used to design loss functions
Nonetheless, the basic element of these CNN-based methods, to enrich high-frequency details [24], [38], [46], [59].
i.e., convolution operator, has two issues that are not applicable In addition, there are works using wavelet transform for image
to image restoration. First, limited receptive fields of convolution restoration. Chen et al. [21] proposed a hierarchical desnowing
impede the pursing of long-range dependencies. Second, after network based on dual-tree complex wavelet representation [17].
training, convolution filters have fixed parameters, which are not Yang et al. [18] developed the wavelet-based U-Net to replace
flexible enough to manage non-uniform blurs. up-sampling and down-sampling. Zou et al. [5] utilized a wavelet
To eliminate the above issues, Transformer [13] models have transform-based module to help recover texture details. Yang et
been introduced into low-level tasks [14], [48], [49], [50], [51], al. [60] devised a wavelet structure similarity loss function for
[52] and yield promising performance for image restoration. training. Moreover, a few algorithms adopted other techniques
Chen et al. [48] proposed an image processing transformer to ex- to generate different frequency signals, such as convolutional
cavate the capability of Transformer for low-level tasks by train- layer [61] and conventional filter [62].
ing on synthesized ImageNet [53]. Nevertheless, the quadratic However, the above methods focus more on the post-
complexity of self-attention makes these methods expensive for processing stage, while ignoring the frequency separation pro-
image restoration, which always involves high-resolution inputs. cess, which is vital to accurately generate the frequency that
Recently, a few modifications have been proposed to improve matters to restoration and avoid amplifying the harmful fre-
efficiency of Transformer-based methods. Liang et al. [49] and quency when enhancing the useful subbands. In this study, we
Wang et al. [51] established transformer-based architectures on dynamically decompose input features into different frequencies
window-based self-attention. Zamir et al. [50] presented trans- and utilize simple attention weights to highlight informative
posed attention that implements self-attention across channels frequencies.
rather than the spatial dimension. However, reducing computa-
tional complexity of self-attention is still intractable for practical III. PROPOSED METHOD
applications. In this study, instead of exploring advanced modifi-
A schematic of the proposed FSNet is depicted in Fig. 2. In
cation for Transformer, we propose to address image restoration
this section, we first present an overall pipeline of our FSNet. We
from the frequency perspective and apply our modules to a
then delineate details of our designs: MSL, MDSF, and MCSF.
CNN-based backbone for high efficiency.
Finally, we introduce the used loss functions.
Attention Mechanisms in Image Restoration: Inspired by their
successful applications in high-level tasks, attention mecha-
nisms have been introduced into low-level tasks to selectively A. Overall Architecture
attend to important information. Qin et al. [47] proposed a fea- In this part, we first introduce the architecture of FSNet,
ture attention module that combines pixel attention and channel and then detail the pipeline. FSNet adopts the encoder-decoder
attention for image dehazing. Zamir et al. [28] developed a architecture to learn hierarchical representations. Specifically,
supervised attention module to control information transmission FSNet consists of a three-scale decoder and a three-scale en-
between two stages in a multi-stage framework. Chen et al. [54] coder. The first scales of both encoder and decoder are comprised
introduced simplified channel attention and a simple gate to of SUNet, a small UNet [6] to provide multi-scale learning
simplify the baseline network. Recently, Li et al. [55] proposed and reduce complexity. Other scales are mainly composed of
anchored stripe self-attention to efficiently model dependencies ResBlock (Fig. 2(c)). MDSF is only deployed in the last residual

Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
1096 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 46, NO. 2, FEBRUARY 2024

Fig. 2. (a) Overall architecture of the proposed FSNet. (b) Shallow Layer extracts shallow features for low-resolution input images. (c) ResBlock contains N + 1
residual-type blocks that accommodate the proposed modules: MDSF (Decoupler (d) and Modulator(e)) and MCSF. MDSF is shown in the one-branch case for
clarity. Invert depicts the operation of subtracting the low-pass filter from the identity filter. SUNet represents a small U-Net inserted into the first scales of both
encoder and decoder.

block of ResBlock/SUNet, while our lightweight MCSF exists


in all residual-type blocks. Following previous methods [19],
[46], multi-input and multi-output mechanisms are used to ease
training difficulty.
Given a degraded image of shape H × W × 3, where 3 is the
number of channels and H × W denotes spatial coordinates, a
3 × 3 convolution layer is used to extract shallow layers with the
size of H × W × C. Then these features pass through the three- Fig. 3. Architecture of SUNet. Features are reduced to a quarter of the
input resolution to provide multi-scale learning and improve efficiency. Colored
scale encoder sub-network to generate in-depth features. During arrows represent downsampling/upsampling. MRes denotes the residual block
this process, the encoder gradually reduces the resolution while with our MCSF while MMRes represents the last residual block including our
expanding the channels. In addition, low-resolution degraded MDSF and MCSF.
images are merged into the main path via the Shallow Layer
(Fig. 2(b)) and concatenation, followed by 3 × 3 convolution to
adjust channels. Next, the resulting deepest features are fed into multi-scale representation learning and improves efficiency by
a three-scale decoder to yield sharp features by progressively reducing the resolution of features. The architecture of SUNet
restoring features to the original size. In this process, the de- is illustrated in Fig. 3. The deployment manner of our modules,
coder features are concatenated with encoder features to assist i.e., MDSF and MCSF, are identical to that in ResBlock, where
reconstruction and a 1 × 1 convolution is used to reduce the MCSF is used in all residual-type blocks, while MDSF only
number of channels by half. Three restored images are produced exists in the last one. SUNet has n residual-type blocks, which
after two ResBlock and the final SUNet via 3 × 3 convolutions is the same as ResBlock, i.e., n = N + 1 (Fig. 2(c)). Given
and image-level skip connections. Two low-resolution results input features of size H × W × C, n4 blocks are first used to
are only used for training. In Fig. 2(a), we only showcase the extract features, and then the resolution of resulting features are
top-level image skip connection for brevity. The up-sampling reduced to H W n
2 × 2 . After passing through 2 blocks, features
and down-sampling layers are implemented by transposed and are up-sampled to the input size for the subsequent process.
strided convolutions, respectively. The down-sampling and up-sampling are realized by depth-wise
convolution (stride = 2, kernel = 2) and bilinear interpolation,
respectively.
B. Multi-Scale Learning
Inspired by the multi-stage networks, we insert a small
C. Multi-Branch Dynamic Selective Frequency Module
U-Net (SUNet) into the first scales of the encoder and de-
coder, which involve the highest-resolution features. Besides the To select the informative frequency component to reconstruct,
adopted large U-shaped backbone, our SUNet provides further MDSF mainly contains two elements: frequency decoupler
Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
CUI et al.: IMAGE RESTORATION VIA FREQUENCY SELECTION 1097

(Fig. 2(d)) and modulator (Fig. 2(e)). Decoupler dynamically layers; [·, ·] denotes concatenation; and c is the channel index of
decomposes features into separate frequency parts based on concatenated features. Then, the final weights can be obtained
learned filters, and then the modulator utilizes channel-wise at- by split operation.
tention to accentuate the useful frequency. Additionally, MDSF Based on the above one-branch case, the multiple branches
splits features among the channel dimension to provide various with varied filter sizes can be express as:
local receptive fields, then applies different filter sizes to separate
X̂ = [M1 (D1 (X1 )), . . ., Mm (Dm (Xm ))] (6)
parts. For simplicity, we only show the one-branch case in
Fig. 2(d). where D and M denote decoupler and modulator, respectively,
To dynamically decompose feature maps, we utilize the learn- and Xm is the evenly split feature.
able and theoretically proven low-pass filter (refer to Appendix
for the proof, available online) and the corresponding high-pass D. Multi-Branch Compact Selective Frequency Module
filter to generate low- and high-frequency maps. The learned
Since receptive field plays a critical role in image restoration,
filters are shared across the group dimension to balance com-
where degradation blurs always differ in size [19], [27], we
plexity and feature diversity. Specifically, given any feature map
develop MCSF to efficiently enlarge the receptive field of FSNet.
X ∈ RH×W ×C , where H × W denotes the spatial dimension
MCSF has two branches with different receptive fields, i.e.,
and C is the number of channels, we first leverage the filter-
the global branch and the window-based branch. Considering
generating layer to produce the low-pass filter for each group of
these branches share a similar paradigm, we only detail the
the input, formulated as
window-based one, which is inspired by the idea of window-
F L = Softmax(BN(W (GAP(X)))) (1) based attention [63].
C
L 1×1×gk2 g×k×k Specifically, given the split feature X ∈ RH×W × 2 , it is par-
where F is reshaped from R to R , k × k is the
titioned into four windows, each with the size of H2 × W 2 × 2.
C
kernel size of low-pass filters, g denotes the number of groups,
To get the low-frequency part, global average pooling is applied
BN, W , and GAP are Batch Normalization, the parameters
to the resulting windows. The corresponding high-frequency
of convolution, and global average pooling, respectively. The
part can be obtained by subtracting the low-frequency map from
Softmax function is imposed on each k × k filter. The group-
the partitioned feature. To select the useful frequency subbands,
based operation has fewer parameters and lower complexity
we rescale these two maps by learnable channel-wise weights,
than generating filters for each pixel. The number of groups
which are directly optimized by backpropagation. Finally, the
is discussed in Section IV-C. To attain the high-pass filter, we
updated frequency maps are reversed to the original resolution.
subtract the resulting low-pass filter from the identity kernel
The global branch has a similar pipeline, yet with a global
with the central value as one and everywhere else as zero. Next,
receptive field.
for each group feature Xi ∈ RH×W ×Ci , where i is the group
Compared to MDSF, besides the enlarged receptive field,
index and Ci = Cg , its low- and high-frequency components
MCSF does not accomplish frequency decoupling and modulat-
can be obtained by using the corresponding filter F L and F H ing with convolution layers, resulting in fewer parameters and
(∈ Rg×k×k ), which is expressed as: lower complexity (see Table XIV for details). Hence, MCSF can

l L be embedded in multiple positions.
Xi,h,w,c = Fi,p,q Xi,h+p,w+q,c (2)
p,q
 E. Loss Function
h H
Xi,h,w,c = Fi,p,q Xi,h+p,w+q,c (3)
p,q
To facilitate the frequency selection process, we adopt L1 loss
in both spatial and frequency domains:
where c is the index of a channel, h and w denote spatial
3
coordinates; and p, q ∈ {−1, 0, 1}. 1 ˆ
Lspatial = Ys − Ys 1 (7)
After decoupling the feature map into different frequency
s=1
Es
components, we leverage the frequency modulator to emphasize
3
the genuinely useful part for reconstruction, as illustrated in 1
Fig. 2(e). Formally, given two frequency maps, X l and X h , Lfrequency = F(Yˆs ) − F(Ys )1 (8)
s=1
E s
we first generate the fused feature by,
where s denotes the index of input/output images of different
Z = Wf c (GAP(X l + X h )) (4)
scales; F represents fast Fourier transform; Es is the number
where Wf c is the parameters of a fully connected layer. To attain of elements for normalization; and Yˆs , Ys are output and target
channel-wise weights, we use two other fully connected layers images, respectively. The final loss function is given by L =
followed by concatenation and Softmax function, formulated as: Lspatial + λLfrequency , where λ is set as 0.1.
e[Wl (Z),Wh (Z)]c
[W l , W h ]c = 2 C (5) IV. EXPERIMENTS
[Wl (Z),Wh (Z)]j
j e
In this section, we perform quantitative and qualitative as-
where W l and W h are channel-wise attention weights for two sessments of the results produced by the proposed FSNet and
frequency parts; Wl and Wh are parameters of fully connected other state-of-the-art methods. We first describe implementation

Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
1098 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 46, NO. 2, FEBRUARY 2024

details and universal hyper-parameters. Then we individually TABLE II


IMAGE DEHAZING RESULTS ON RESIDE/SOTS [20] DATASET
introduce the datasets, specific training setups, and results for
six tasks: image hazing, image defocus deblurring, image mo-
tion deblurring, image deraining, image desnowing, and image
denoising. The best and second-best results in tables are bold
and underlined, respectively.

A. Experimental Setup
1) Implementation Details: We train separate models for
different tasks. Unless specified otherwise, the following pa-
rameters are adopted. The batch size is set as 4 with patch
size of 256 × 256. Each patch is randomly flipped horizontally
for data augmentation with a probability of 0.5. The initial
learning rate is 1e−4 and gradually reduced to 1e−6 with the
cosine annealing [64]. Adam (β1 = 0.9, β2 = 0.999) is used and corresponding ground-truth images. The training strategy
for training. N (see Fig. 2(c)) is set to 3 and 15 for FSNet-S follows that of [40].
and FSNet, respectively. Due to the enormous complexity of Image Motion Deblurring: Consistent with recent meth-
other tasks, we only evaluate FSNet-S on image dehazing and ods [28], [51], we train FSNet using the GoPro dataset [25],
desnowing. MDSF has two branches with filter kernel sizes of which consists of 2103 blurry/sharp image pairs for training and
3 × 3 and 5 × 5 and the number of groups is 8. We use PyTorch to 1111 pairs for evaluation. To validate the generalization capacity
implement our models on an NVIDIA Tesla V100 GPU. FLOPs of our method, we directly apply the GoPro-trained model to
are computed on patch size of 256 × 256. the HIDE dataset [67], which contains 2025 image pairs for
2) Datasets: In this part, we introduce the used datasets and evaluation. The images in GoPro and HIDE are both generated
specify training configurations. synthetically. To evaluate the performance of our method on
Image Dehazing: We evaluate our models on both daytime real-world images, we further assess on the RSBlur [22] dataset.
and nighttime datasets. For daytime scenes, both synthetic It has 8878 and 3360 image pairs for training and evaluation, re-
(RESIDE [20], Haze4K [30]) and real-world datasets (NH- spectively. We train our networks on GoPro and RSBlur for 3000
HAZE [32], Dense-Haze [65]) are used for evaluation. RESIDE and 710 epochs, respectively. In addition, we directly apply the
consists of two training subsets, indoor training set (ITS) and GoPro-trained model to another widely used real-world dataset
outdoor training set (OTS), and a synthetic objective testing RealBlur-R [33], which has 980 paired images for evaluation.
set (SOTS). We train our models on ITS and OTS separately, Image Deraining: Following prior arts [68], [69], we lever-
and test on the corresponding test sets, i.e., SOTS-Indoor and age a composite training dataset containing 13712 image pairs
SOTS-Outdoor. Models are trained for 1000 epochs on ITS collected from several datasets [70], [71], [72], [73]. FSNet
and 30 for OTS, respectively, and the batch size is set as 8. is evaluated on five datasets: Rain100H [71], Rain100L [71],
The initial learning rate is 4e−4 . We further train FSNet on the Test100 [72], Test1200 [74], and Test2800 [70]. The network is
more realistically synthesized dataset Haze4K [30]. The model trained for 300 epochs.
is trained for 1000 epochs with the batch size as 8 and the initial Image Desnowing: We adopt CSD [21], SRRS [41], and
learning rate as 4e−4 . Moreover, we also include real-world Snow100K [75] datasets for the desnowing task. The dataset
datasets Dense-Haze [65] and NH-HAZE [32] for evaluation, settings follow previous works [24], [41], where we randomly
both of which contain 55 paired images. NH-HAZE contains sample 2500 image pairs from the training set for training and
nonhomogeneous haze scenes, whereas Dense-Haze contains 2000 images from the testing set for evaluation. Models are
homogeneous and dense hazy scenes. The model is trained for trained for 800 epochs on each dataset.
5000 epochs following the previous method [15] with patch size Image Denoising: Following [49], [50], we train the proposed
as 600 × 800. The initial learning rate and batch size are set to model on a compound dataset for image denoising. The noisy
2e−4 and 2, respectively. In addition, we evaluate our models images are produced by adding additive white Gaussian noise
on the nighttime dataset NHR [31], which contains 16146 and with different levels (σ ∈ {15, 25, 50}) to sharp images. The
1794 image pairs for training and testing. Models are trained for trained model is evaluated on the BSD68 [76] dataset. We train
300 epochs with batch size of 8. separate models for 300 epochs for different noise levels with
Single-Image Defocus Deblurring: We use DPDD [23] to the batch size as 8.
verify the effectiveness of our method following recent meth-
ods [40], [50], [66]. This dataset contains images in 500 in-
B. Experimental Results
door/outdoor scenes, each with four images, labeled as right
view, left view, center view, and the all-in-focus ground truth. 1) Image Dehazing: The quantitative results on the synthetic
DPDD is split into training, validation, and testing sets with RESIDE [20] dataset are shown in Table II. Our FSNet and
350, 74, and 76 scenes. FSNet is trained by taking center view FSNet-S obtain the highest and second-highest scores on all
images as input and computing loss values between outputs metrics, respectively. Particularly on the outdoor scene, FSNet

Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
CUI et al.: IMAGE RESTORATION VIA FREQUENCY SELECTION 1099

Fig. 4. Visual comparisons on the SOTS [20] test set. The top image is obtained from SOTS-Indoor while the bottom one is from SOTS-Outdoor.

TABLE III TABLE V


IMAGE DEHAZING RESULTS ON THE HAZE4K [30] DATASET NIGHTTIME DEHAZING RESULTS ON THE NHR [31] DATASET

TABLE IV
IMAGE DEHAZING RESULS ON REAL-WORLD DATASETS

Fig. 5. Image dehazing comparisons on the Haze4K [30] dataset.

on the nighttime dehazing dataset, our two versions significantly


outperform other algorithms, as shown in Table V. The dehazed
results in Figs. 4, 5, and 6 illustrate that our method is more
generates a substantial gain of 5.22 dB PSNR over DeHamer [15] effective in removing haze than other frameworks.
with only 10% parameters. Compared to DehazeFormer-L [16], 2) Single-Image Defocus Deblurring: Table VI shows nu-
FSNet-S receives 0.42 dB higher PSNR on the SOTS-Indoor test merical comparisons between defocus deblurring methods on
set with 88% lower complexity. Our FSNet achieves 42.45 dB DPDD [23]. Our FSNet surpasses other state-of-the-art methods
PSNR on SOTS-Indoor, which is much higher than other state- on most metrics. Particularly on the combined scene, FSNet
of-the-art methods with comparable FLOPs. Furthermore, the obtains 0.24 dB PSNR improvement compared to the recent
results on the more realistically synthesized dataset Haze4K [30] Transformer-based method Restormer [50] with only 51% pa-
are shown in Table III. Our FSNet significantly outperforms rameters, as illustrated in Fig. 1(d). In addition, our method
PMNet [78] by 0.63 dB PSNR and 0.01 SSIM with 30% fewer provides a significant gain of 0.5 dB over the CNN-based
parameters. DRBNet [40]. The visual results in Fig. 7 show that our method
In addition, we validate the performance of our approach on recovers more details than other algorithms.
the real hazy datasets of Dense-Haze [65] and NH-HAZE [32]. 3) Image Motion Deblurring: We evaluate our method on
The results are shown in Table IV. As we can see, our FSNet- both synthetic (GoPro [25], HIDE [67]) and real-world (RS-
S performs favorably against state-of-the-art algorithms on the Blur [22], RealBlur [33]) datasets. The numerical compar-
real-world dehazing problem, receiving a gain of 0.21 dB PSNR isons on GoPro [25] and HIDE [67] are reported in Table
and 0.14 SSIM over PMNet [78] on Dense-Haze. Furthermore, VII. On GoPro, FSNet shows 0.37 dB PSNR performance

Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
1100 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 46, NO. 2, FEBRUARY 2024

TABLE VI
QUANTITATIVE COMPARISONS WITH PREVIOUS LEADING SINGLE-IMAGE DEFOCUS DEBLURRING METHODS ON DPDD TEST SET [23]

TABLE VIII
OVERALL COMPARISONS BETWEEN MOTION DEBLURRING METHODS ON THE
GOPRO [25] TEST SET

TABLE IX
IMAGE MOTION DEBLURRING RESULTS ON THE RSBLUR DATASET [22]
Fig. 6. Nighttime dehazing comparisons on the NHR [31] dataset.

TABLE VII
IMAGE MOTION DEBLURRING RESULTS ON GOPRO [25] AND HIDE [67]
DATASETS

TABLE X
IMAGE MOTION DEBLURRING RESULTS ON THE REAL-WORLD
REALBLUR-R [33]

improvement over Restormer [50] with 3.4× faster inference real-world scenes, following [69], we directly apply the GoPro-
speed (Table VIII). Moreover, our method receives a perfor- trained model to another real-world dataset RealBlur-R [33]. The
mance gain of 0.21 dB compared to Stripformer [14]. Notably, results are presented in Table X. Compared to the MLP-based
our FSNet demonstrates stronger generalization capability to method MAXIM-3S [69], our model provides a performance
the HIDE dataset than Stripformer on all metrics. In addition to gain of 0.06 dB PSNR and 0.005 SSIM with 38% fewer param-
the synthetic datasets, we further evaluate the effectiveness of eters and 34% lower complexity. Figs. 8, 9, 10, and 11 illustrate
our network on the real-world dataset. Table IX shows quanti- the visual results on the GoPro, HIDE, RSBlur, and RealBlur-R
tative comparisons on the newly proposed RSBlur [22] dataset. datasets, respectively. The results demonstrate that the proposed
FSNet obtains the state-of-the-art performance on this dataset FSNet produces more visually pleasant results than comparable
with fewer parameters (see Fig. 1(c)). Specifically, our model algorithms.
provides a substantial gain of 0.33 dB PSNR over the strong 4) Image Deraining: Following recent works [28], [68],
Transformer-based algorithm Uformer-B [51]. Additionally, for we compare PSNR/SSIM scores on the Y channel in YCbCr

Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
CUI et al.: IMAGE RESTORATION VIA FREQUENCY SELECTION 1101

Fig. 7. Single-image defocus deblurring results on the DPDD dataset [23]. FSNet is more effective in removing defocus blur than other approaches.

Fig. 8. Image motion deblurring comparisons on the GoPro [25] dataset. Our method recovers more fine details from the blurry images.

Fig. 9. Image motion deblurring comparisons on the HIDE [67] dataset. The image produced by our method is visually closer to ground truth.

color space. Table XI shows that our method performs favor- further performance boost in terms of PSNR on all datasets.
ably against other deraining approaches. Specifically, on the Specifically, FSNet outperforms TransWeather [97] by 6.61 dB
Rain100H dataset [71], the proposed FSNet obtains a perfor- PSNR on the CSD [21] dataset. The performance gain produced
mance boost of 1.12 dB PSNR over HINet [89]. Visual results by deepening the network is remarkable for CSD due to its more
in Fig. 12 illustrate that our model produces faithful images complicated snowy scenes. The visual results in Fig. 13 show
without artifacts. that our method is more effective in removing spatially varying
5) Image Desnowing: We evaluate our models on CSD [98], snowflakes than competitors.
SRRS [41], and Snow100K [75] datasets for image desnowing. 6) Image Denoising: The Gaussian grayscale image denois-
As shown in Table XII, our FSNet-S performs better than other ing results on the BSD68 [76] dataset are presented in Table XIII.
state-of-the-art algorithms. The large version FSNet obtains As seen, the proposed FSNet outperforms Restormer [50] for

Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
1102 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 46, NO. 2, FEBRUARY 2024

Fig. 10. Visual comparisons between motion deblurring algorithms on the real-world RSBlur [22]. Our model recovers more structural details than other methods.

TABLE XI
IMAGE DERAINING RESULTS ON FIVE DERAINING DATASETS: RAIN100H [71], RAIN100L [71], TEST100 [72], TEST1200 [74] AND TEST2800 [70]

TABLE XIII
GAUSSIAN GRAYSCALE IMAGE DENOISING COMPARISONS ON THE BSD68 [76]
DATASET

Fig. 11. Image motion deblurring comparisons on RealBlur-R [33].

TABLE XII
IMAGE DESNOWING RESULTS ON THREE WIDELY USED DATASETS

C. Ablation Studies
In this section, we first demonstrate the effectiveness of the
proposed modules, and then investigate the effects of different
designs for each module. Finally, we delve into the mechanism
of MDCF to demonstrate its validity. Unless specified otherwise,
the models are trained on the GoPro [25] dataset for 1000
epochs, and N is set to 7 in Fig. 2. The baseline network is
obtained by removing MDSF, MCSF, and MSL from our model.
Other training settings are identical to that of the final motion
deblurring model.
all noise levels with lower complexity. Furthermore, compared Effectiveness of each module: Table XIV(a) shows that the
to SwinIR [49], FSNet yields a 0.07 dB gain for σ = 50 with baseline model obtains 31.20 dB PSNR. MDSF (Table XIV(b))
85% fewer FLOPs. The visual comparisons in Fig. 14 illustrate and MCSF (Table XIV(c)) yield performance gains of 0.22 dB
that our results are much closer to the ground-truth images. The and 0.25 dB over the baseline model with low introduced com-
results verify the effectiveness of our method. plexity. Deployed only in a single position in each scale, MDSF

Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
CUI et al.: IMAGE RESTORATION VIA FREQUENCY SELECTION 1103

Fig. 12. Visual comparisons between image deraining approaches on the Rain100H [71] dataset. Our method produces more faithful images without artifacts.

Fig. 13. Image desnowing comparisons on CSD [21]. Our methods remove more snowy blurs while preserving fine texture and structural patterns.

TABLE XIV
ABLATION STUDIES OF EACH MODULE

Fig. 14. Gaussian denoising results of gray images on the BSD68 [76] dataset Fig. 15. Variance difference between the ground truth and input/results of
with the noise level as 50. several methods for motion deblurring (Left) and dehazing (Right). Dehazing
models are trained on OTS and tested on SOTS-Outdoor [20].

performs similarly to MCSF, demonstrating the effectiveness


of the dynamic frequency selection mechanism. Employing
both MDSF and MCSF, the model obtains 31.68 dB PSNR In addition, in Fig. 15, we plot the variance difference between
(Table XIV(d)). When equipped with MSL (Table XIV(e)), the the ground truth and results of three methods on motion de-
model achieves a further performance gain of 0.07 dB PSNR blurring (GoPro [25]) and dehazing (SOTS-Outdoor [20]). The
with 11% fewer FLOPs, verifying the efficacy of our multi-scale results are obtained by computing the average on 100 images
learning mechanism. randomly sampled from the test sets. Our results are produced

Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
1104 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 46, NO. 2, FEBRUARY 2024

TABLE XV TABLE XVIII


ABLATION STUDIES FOR THE NUMBER OF MCSF RESULTS OF ALTERNATIVES TO MDSF

TABLE XVI
ABLATION STUDIES FOR MDSF

TABLE XIX
COMPARISONS BETWEEN SKNET [105] AND OUR MDSF/MODULATOR

TABLE XVII
ABLATION STUDIES FOR MSL

method (Table XVIII(a)) by using strided convolution to gener-


ate different frequency parts with reduced resolution [61]. The
Octconv (Table XVIII(b) ) version [106] shares the similar idea
with Conv, which utilizes down-sampling to reduce network
redundancy. These variants only introduce extra low-frequency
signals to the network and are inferior to our MDSF. Gaussian
(Table XVIII(c)) and Wavelet (Table XVIII(b)) produce similar
by FSNet. With the frequency selection mechanism, the statistics
results, much lower than our MDSF. Additionally, Wavelet needs
of our results are closer to that of ground truth.
more parameters to deal with its multiple branches.
Number of MCDF: We study the influence of the number of
Since our filter kernel is generated by learning, we further
MCDF in Table XV, where 2 MCSF means that we employ the
compare our MDSF with two attention approaches to verify
proposed MCSF in the last two residual blocks of each ResBlock
the validity of the proposed selection mechanism. Specifically,
in Net (c) (Table XIV). As can be seen, using more MCSF
we utilize the widely used window-based self-attention [51]
leads to the consistently increasing performance from 31.22 to
(Table XVIII(e)) and dynamic convolution [107], [108] (Table
31.45 dB PSNR while only introducing 0.01 M parameters and
XVIII(f)) to conduct comparisons. As we can see from the
0.04 G FLOPs. Due to its few introduced parameters and low
table, our method has obvious advantages over these methods,
complexity, we insert MCSF in each residual block for frequency
demonstrating the effectiveness of MDSF.
learning.
We further compare our modulator with SKNet [105] which
Number of MDSF: To understand the impact of the number
imposes the Softmax function on the channels with the same
of groups in MDSF, we conduct experiments by changing the
index in different features. Table XIX shows that our method
number of groups in Net (b) (Table XIV). The results are shown
is superior to SKNet version by 0.01 dB PSNR, suggesting the
in Table XVI. Generally, the increasing number of groups leads
efficacy of our design.
to higher PSNR, demonstrating the effectiveness of the filter
diversity. However, the accuracy saturates at group 8, which is
D. Qualitative Analyses of MDSF
probably caused by overfitting. Therefore, we finally picked 8
groups for better performance. We provide qualitative analyses of MDSF using discrete
Design choices for MSL: MSL improves efficiency and per- Fourier transform. Results are obtained from the branch of 3 × 3
formance on GoPro [25] as shown in Table XIV. To explore the filter in Net (b) (Table XIV). The input image is sampled from
potential of this mechanism, we additionally employ MSL in GoPro [25], as shown in Fig. 18. The features are obtained from
the middle scales of both the decoder and encoder. As presented the last residual block in the last ResBlock of decoder.
in Table XVII, despite lower complexity, more MSL leads to We first verify the properties of the alleged low-/high-pass
performance degradation. The reason is probably that the disad- filters in MDSF. To this end, we iteratively apply the produced
vantage of losing spatial information outweighs the advantage of filters to the image. The variance and corresponding spectral
employing multi-scale learning on smaller resolution features. features of intermediate images are provided in Fig. 16. Taking
Therefore, we deploy MSL only in the first scale of our model the low-pass filter as an example, with the increasing of iteration
for a better trade-off between performance and complexity. times, the variance of the image decreases constantly, and the
Alternatives to MDSF: To examine the advantage of our high-frequency signals in spectral features are reduced drasti-
design, we compare our decoupler with several alternatives in cally. The high-pass filter exhibits the opposite properties. These
Table XVIII. We first substitute the learning-based and fixed results demonstrate the effectiveness of our filters. It is remark-
frequency separation methods for our decoupler. We form Conv able that the high-pass filter produces large variance with fewer

Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
CUI et al.: IMAGE RESTORATION VIA FREQUENCY SELECTION 1105

Fig. 16. Variance and discrete Fourier transform of the resulting images as we iteratively impose the produced filters on the image. Left: The low-pass filter.
Right: The high-pass filter.

Fig. 17. Discrete Fourier transform results of group-wise features generated by our MDSF decoupler. The results are sampled from the last decoder. Top:
High-frequency. Bottom: Low-frequency.

Fig. 18. Internal features of MDSF. With our frequency selection mechanism, MDSF produces more fine details than the initial feature, e.g., the number plate.
Zoom in for the best view.

iterations, hence it is more effective than the low-pass filter. As


a result, it is easy for MDSF to introduce more high-frequency
signals into the network for reconstruction.
In MDSF, we generate different filters for each group to
enhance the diversity of frequency features. To delve into this
mechanism, we visualize the group-wise spectral features in
Fig. 17. As expected, different groups focus on learning dis-
parate low-/high-frequency signals, enriching the diversity of
Fig. 19. Our image dehazing result on the Dense-Haze [65] dataset.
frequency representations for selection. We further compare the
feature maps before and after our MDSF in Fig. 18. Using
the attained filters, the decoupler of MDSF produces different
frequency components. The high-frequency feature contains
much edge information. The resulting feature after modulator
recovers more details of the number plate that is blurry in the
initial feature.

V. LIMITATIONS
Despite the superior performance, our model struggles to
Fig. 20. Failure case of our model for defocus deblurring.
completely recover the sharp images from severe motion blurs,
such as the number plate in Figs. 8 and 10. Furthermore, we
illustrate our result on the real-world dehazing dataset Dense-
Haze [65] in Fig. 19. As seen, there is a noticeable gap between In addition, we show a failure case of our model in Fig. 20.
our result and the ground truth in terms of color and textual Despite the higher score, FSNet is inferior to DeepRFT [19] in
details, which is due to the deficiency of real-world training recovering details of iron gauze with large area background. This
data. In future work we will develop the advanced domain is probably because our MCSF performs reconstruction only us-
adaptation method to sufficiently leverage synthetic datasets for ing binary frequency decomposition and cannot strike a balance
the high-quality reconstruction of real-world degraded images. between overlapped scenes, i.e., background and thin objects

Authorized licensed use limited to: Ritsumeikan University. Downloaded on June 24,2024 at 03:34:40 UTC from IEEE Xplore. Restrictions apply.
1106 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 46, NO. 2, FEBRUARY 2024

(iron wire/string). By contrast, the spectral features generated [15] C.-L. Guo, Q. Yan, S. Anwar, R. Cong, W. Ren, and C. Li, “Im-
in DeepRFT contain more spectra than ours. On the other hand, age dehazing transformer with transmission-aware 3D position em-
bedding,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2022,
MCSF concentrates on modulating lowest frequency and the pp. 5812–5820.
complementary high frequency, and thus it yields high-quality [16] Y. Song, Z. He, H. Qian, and X. Du, “Vision transformers for single image
global information like illumination, leading to the higher score. dehazing,” 2022, arXiv:2204.03883.
[17] I. W. Selesnick, R. G. Baraniuk, and N. C. Kingsbury, “The dual-tree
complex wavelet transform,” IEEE Signal Process. Mag., vol. 22, no. 6,
pp. 123–151, Nov. 2005.
VI. CONCLUSION [18] H.-H. Yang and Y. Fu, “Wavelet U-net and the chromatic adaptation
transform for single image dehazing,” in Proc. IEEE Int. Conf. Image
The authors present an image restoration framework, FSNet, Process., 2019, pp. 2736–2740.
[19] X. Mao, Y. Liu, W. Shen, Q. Li, and Y. Wang, “Deep residual fourier
which is built on the proposed frequency selection mechanism. transformation for single image deblurring,” 2021, arXiv:2111.11745.
We develop two key modules, MDSF and MCSF, to conduct [20] B. Li et al., “Benchmarking single-image dehazing and beyond,” IEEE
frequency decomposition and recalibration with different re- Trans. Image Process., vol. 28, no. 1, pp. 492–505, Jan. 2019.
[21] W.-T. Chen et al., “All snow removed: Single image desnowing algo-
ceptive fields. Specifically, our multi-branch dynamic selective rithm using hierarchical dual-tree complex wavelet representation and
frequency module (MDSF) builds dynamic filters to decompose contradict channel loss,” in Proc. IEEE Int. Conf. Comput. Vis., 2021,
feature maps into various frequency parts and utilizes channel pp. 4196–4205.
[22] J. Rim, G. Kim, J. Kim, J. Lee, S. Lee, and S. Cho, “Realistic blur
attention to perform accentuation, thus effectively selecting synthesis for learning image deblurring,” in Proc. Eur. Conf. Comput.
the most informative frequency to recover. Furthermore, the Vis., 2022, pp. 487–503.
proposed multi-branch compact selective frequency module [23] A. Abuolaim and M. S. Brown, “Defocus deblurring using dual-pixel
data,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 111–126.
(MCSF) introduces a simple yet effective manner to enlarge [24] Y. Cui et al., “Selective frequency network for image restoration,” in Proc.
the receptive field and conduct frequency selection. In addition, Int. Conf. Learn. Representations, 2023.
we insert a small U-Net into our model to provide multi-scale [25] S. Nah, T. Hyun Kim, and K. Mu Lee, “Deep multi-scale convolutional
neural network for dynamic scene deblurring,” in Proc. IEEE Conf.
learning (MSL) and improve efficiency. With these designs, our Comput. Vis. Pattern Recognit., 2017, pp. 3883–3891.
models perform favorably against state-of-the-art algorithms on [26] M. Suin, K. Purohit, and A. Rajagopalan, “Spatially-attentive patch-
20 benchmark datasets for six image restoration tasks. hierarchical network for adaptive motion deblurring,” in Proc. IEEE Conf.
Yuning Cui received the BEng degree from Central South University, China, in 2016, and the MEng degree from the National University of Defense Technology, China, in 2018. He is currently working toward the PhD degree with the Chair of Robotics, Artificial Intelligence and Real-time Systems within the School of Computation, Information and Technology at the Technical University of Munich. His research interest lies in image restoration.

Wenqi Ren (Member, IEEE) received the PhD degree from Tianjin University, Tianjin, China, in 2017. From 2015 to 2016, he was supported by the China Scholarship Council as a joint-training PhD student with the Electrical Engineering and Computer Science Department, University of California, Merced, working with Prof. Ming-Hsuan Yang. He is currently an assistant professor with the School of Cyber Science and Technology, Shenzhen Campus, Sun Yat-sen University, Shenzhen, China. His research interests include image processing and related high-level vision problems. He received the Tencent Rhino Bird Elite Graduate Program Scholarship in 2017 and the MSRA Star Track Program in 2018.

Xiaochun Cao (Senior Member, IEEE) received the BE and ME degrees in computer science from Beihang University, Beijing, China, and the PhD degree in computer science from the University of Central Florida, Orlando, FL, USA. After graduation, he spent about three years with ObjectVideo Inc. as a research scientist. From 2008 to 2012, he was a professor with Tianjin University, Tianjin, China. He is currently a professor with the School of Cyber Science and Technology, Shenzhen Campus, Sun Yat-sen University, Shenzhen, China. He is a fellow of IET and serves on the Editorial Board of the IEEE Transactions on Image Processing. His dissertation was nominated for the University of Central Florida's University-Level Outstanding Dissertation Award.

Alois Knoll (Fellow, IEEE) received the MSc degree in electrical/communications engineering from the University of Stuttgart, Germany, in 1985, and the PhD (summa cum laude) degree in computer science from the Technical University of Berlin, Germany, in 1988. He served on the faculty of the Computer Science Department at TU Berlin until 1993, then joined the University of Bielefeld, Germany, as a full professor and served as the director of the Technical Informatics research group until 2001. Since 2001, he has been a professor with the Department of Informatics, Technical University of Munich (TUM), Germany. He was also on the board of directors of the Central Institute of Medical Technology, TUM (IMETUM), and, from 2004 to 2006, executive director of the Institute of Computer Science, TUM. Between 2007 and 2009, he was a member of ISTAG, the Information Society Technology Advisory Group, the EU's highest advisory board on information technology, and a member of its subgroup on Future and Emerging Technologies (FET). His research interests include cognitive, medical, and sensor-based robotics, multi-agent systems, data fusion, adaptive systems, multimedia information retrieval, model-driven development of embedded systems with applications to automotive software and electric transportation, as well as simulation systems for robotics and traffic.
