MADNet: A Fast and Lightweight Network for Single-Image Super Resolution

Abstract—Recently, deep convolutional neural networks (CNNs) have been successfully applied to the single-image super-resolution (SISR) task with great improvement in terms of both peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). However, most of the existing CNN-based SR models require high computing power, which considerably limits their real-world applications. In addition, most CNN-based methods rarely explore the intermediate features that are helpful for final image recovery. To address these issues, in this article, we propose a dense lightweight network, called MADNet, for stronger multiscale feature expression and feature correlation learning. Specifically, a residual multiscale module with an attention mechanism (RMAM) is developed to enhance the informative multiscale feature representation ability. Furthermore, we present a dual residual-path block (DRPB) that utilizes the hierarchical features from the original low-resolution images. To take advantage of the multilevel features, dense connections are employed among blocks. The comparative results demonstrate the superior performance of our MADNet model while employing considerably fewer multiadds and parameters.

Index Terms—Channel attention, dense connections, image super resolution, lightweight, multiscale mechanism.

Manuscript received June 23, 2019; revised November 5, 2019; accepted January 17, 2020. Date of publication March 4, 2020; date of current version February 17, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 61702129, Grant 61772149, Grant U1701267, and Grant 61866009; in part by the National Key Research and Development Program of China under Grant 2018AAA0100305; in part by the China Postdoctoral Science Foundation under Grant 2018M633047; in part by the Guangxi Science and Technology Project under Grant 2019GXNSFAA245014, Grant AD18281079, Grant AD18216004, Grant 2017GXNFDA198025, and Grant AA18118039; and in part by the Innovation Project of GUET Graduate Education under Grant 2019YCXS048. This article was recommended by Associate Editor H. Lu. (Corresponding author: Zhenbing Liu.)

Rushi Lan is with the Guangxi Key Laboratory of Image and Graphic Intelligent Processing, Guilin University of Electronic Technology, Guilin 541004, China, and also with the School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China.

Long Sun, Zhenbing Liu, and Cheng Pang are with the Guangxi Key Laboratory of Images and Graphics Intelligent Processing, Guilin University of Electronic Technology, Guilin 541004, China (e-mail: [email protected]).

Huimin Lu is with the Department of Mechanical and Control Engineering, Kyushu Institute of Technology, Kitakyushu 8048550, Japan.

Xiaonan Luo is with the National Local Joint Engineering Research Center of Satellite Navigation and Location Service, Guilin University of Electronic Technology, Guilin 541004, China.

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCYB.2020.2970104

I. INTRODUCTION

SINGLE-IMAGE super resolution (SISR) is an essential and classical problem in low-level computer vision that is concerned with reconstructing a visually high-resolution (HR) image from its low-resolution (LR) input. In practice, SISR is generally difficult to solve because of its ill-posed nature, wherein multiple HR images can map to the same LR version. Addressing SISR has proven to be useful in many practical cases, such as video streaming [44], [50]; remote sensing [16], [58]; and medical imaging [31], [45], [48].

To mitigate this problem, numerous SR approaches have been proposed from different perspectives, including interpolation-based [17], reconstruction-based [33], and example-based methods [23], [25], [40], [41], [49]. The former two kinds of methods are simple and efficient but suffer a dramatic drop in restoration performance as the scale factor increases, whereas the example-based methods, which analyze the relationships between LR and HR pairs, achieve satisfactory performance but involve time-consuming operations.

Recently, owing to the powerful feature representation capability of the deep convolutional neural network (CNN), CNN-based methods have been proposed to learn a nonlinear mapping from an interpolated or LR version to its corresponding high-quality output. By entirely utilizing the inherent relations among images in training datasets, these models have provided outstanding performance in SR tasks [5], [7], [18], [22], [27], [30], [56], [57]. Ranging from the SRCNN [5], which has only three convolution layers (Conv layers), to the recent RCAN [56], which has over 400 layers, these approaches clearly illustrate that as the model becomes deeper, the performance improves.

Although CNN-based models have achieved state-of-the-art performance, these methods face some limitations.
1) Most CNN-based frameworks gain improvement by substantially increasing the depth or width of the network; this means that they rely heavily on computation to produce the HR images, limiting their real-world applications.
2) Most existing CNN-based SR models seldom utilize multiscale representations for image super resolution and do not fully use the hierarchical features.

Consequently, it is important to design a lightweight architecture that is practical enough to solve the aforementioned problems. The general way to build a lightweight network is to reduce the number of model parameters and computational operations (multiadds). Based on this concept, we provide a feasible solution that combines the multiscale mechanism and dense connections. Specifically, an efficient feature extraction network (EFEN) is proposed for exploring feature maps, and an upsampling network (UN) is used for enlarging
Fig. 2. Architecture of our proposed model (MADNet), which contains two subnetworks: an EFEN and a UN. The former includes three DRPBs; the latter
is constructed by three sets of Conv layers and a pixel-shuffle layer.
In the Res2Net module, the input features are divided into several groups, and each of the parallel groups utilizes a smaller filter to extract features and is connected with the others via residual shortcuts.

Recently, Li et al. [26] introduced a multiscale residual network to exploit the image features and achieve a significant performance gain for image super resolution. However, they simply concatenate the information obtained with two different filter sizes while ignoring the granular-level multiscale features; as a result, their block cannot cover a large range of receptive fields and causes a considerable computational burden. Importantly, for image SR, features with more multiscale information are more accurate for reconstruction, while an SR model with fewer parameters is more feasible for real applications.

C. Attention Mechanism

Attention in human perception refers to how visual systems adaptively exploit a sequence of information items and selectively focus on salient areas [12]. Recently, several attempts have introduced attention processing to improve the performance of CNNs for various computer vision tasks [12], [43], [47], [56].

Hu et al. [12] employed an attention module to exploit the interchannel relationship. In their work, the squeeze-and-excitation (SE) module utilizes global average-pooled features to calculate channelwise attention and achieves considerable improvement for image classification. Woo et al. [47] further exploited this schema for both spatial and channelwise attention. In addition, Wang et al. [43] proposed a novel attention block for video classification in which nonlocal operations are used to capture spatial attention.

III. METHODOLOGY

In this section, we first present the network framework of MADNet in detail and then describe the multiscale module, which is the core of the proposed method. After that, the loss functions are illustrated, and discussions relating the proposed method to other related algorithms are provided at the end of this section.

A. Network Framework

As shown in Fig. 2, the proposed MADNet consists of two components: 1) an EFEN and 2) a UN.

The EFEN utilizes two successive Conv layers with kernel sizes of 3 × 3 and 1 × 1 for simply detecting low-level features from the input image. Then, to extract the global and local image features, the output is fed to the DRPBs, and all the results of the intermediary blocks are connected to the following block as dense connections. Let I_LR represent the original input image and I_SR be the output; then, this stage can be formulated as

F_{FEA} = H_{EFEN}(I_{LR}) = H_{DRPB}(H_{LL}(I_{LR}))    (1)

where H_EFEN(·) is the feature extraction function and can be divided into the shallow feature extraction step H_LL(·) and the representation learning step H_DRPB(·). F_FEA denotes the output feature map from the EFEN.

Finally, we concatenate all of the feature maps for further fusion. After fusing, these features are processed by two Conv layers and a pixel-shuffle layer to generate the HR image

I_{SR} = H_{UP}(F_{FEA}) = H_{GEN}(H_{CON}(F_{FEA}))    (2)

where H_UP(·) denotes the upsampling procedure and contains two stages: 1) H_CON(·) means feature concatenation and fusion and 2) H_GEN(·) represents the subsequent processing.

B. Efficient Feature Extraction Network

We now describe our EFEN (see Fig. 2) in detail. It is stacked with two Conv layers and three DRPBs, and a single DRPB contains a sequence of our proposed residual modules, that is, it operates with the multiscale module and the attention mechanism. The details regarding this structure are presented as follows.

DRPB: The DRPB contains M = 3 proposed multiscale modules. To utilize different-level features and enhance the representation capability of our model, we adopt a dense connection structure for the EFEN, that is, the dth DRPB relays intermediate features to all of the subsequent blocks. The mth multiscale module [see Fig. 3(c)] in the dth DRPB can be represented as

F_{d,m} = H_{d,m}(F_{d,m-1})    (3)
where F_{d-1} and F_d are the outputs of the (d − 1)th and dth DRPB, respectively. Such a connection schema allows more low-frequency information to be bypassed during training. In fact, to confirm the effectiveness of this combination form, we compare several types of residual blocks and elaborate on the details in Section IV.
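For illustration, a minimal PyTorch sketch of the overall EFEN + UN data flow described above is given below. The channel width, the placeholder block used when no DRPB implementation is supplied, and the exact layer counts inside the UN are assumptions of this sketch rather than the exact configuration of MADNet.

```python
import torch
import torch.nn as nn

class MADNetSkeleton(nn.Module):
    """Sketch of the pipeline: shallow 3x3/1x1 convs, three densely
    connected DRPBs, feature fusion, and a pixel-shuffle upsampler."""
    def __init__(self, channels=64, num_blocks=3, scale=3, drpb=None):
        super().__init__()
        # Shallow feature extraction H_LL: a 3x3 conv followed by a 1x1 conv.
        self.shallow = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 1),
        )
        # Each DRPB receives the concatenation of H_LL and all preceding DRPB outputs.
        blocks = []
        for d in range(num_blocks):
            in_ch = channels * (d + 1)
            blocks.append(drpb(in_ch, channels) if drpb else
                          nn.Sequential(nn.Conv2d(in_ch, channels, 3, padding=1), nn.PReLU()))
        self.blocks = nn.ModuleList(blocks)
        # UN: fuse the concatenated features, then map to scale^2 channels per
        # output plane and rearrange them into HR pixels with a pixel-shuffle layer.
        self.fuse = nn.Conv2d(channels * (num_blocks + 1), channels, 1)
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x_lr):
        feats = [self.shallow(x_lr)]              # H_LL(I_LR)
        for block in self.blocks:                 # dense connections among DRPBs
            feats.append(block(torch.cat(feats, dim=1)))
        return self.upsample(self.fuse(torch.cat(feats, dim=1)))
```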
RMAM: Multiscale representations are essential for various vision tasks [9], such as semantic segmentation, object detection, and image classification. The multiscale feature extraction ability of CNNs leads to effective representations. In addition, we focus on solving the efficiency limitation that is essentially present in real-world SR applications. To balance the performance and the computational budget, the channel split strategy is introduced in the residual layer. Meanwhile, the channel attention mechanism [12] is employed to learn discriminative representations. It was empirically found that our multiscale module is not only efficient but also accurate.

Multiscale Structure: Most previous CNN-based SR models do not consider multiscale representations. To exploit such information, MSRN [26] was introduced to detect features at different scales for accurate super-resolution reconstruction. However, the receptive fields within MSRN are limited, and its computational complexity is fairly high. Inspired by Inception [34] and RFB [28], we propose a multiscale module [see Fig. 4(d)] to learn the multiscale representation ability. First, we apply a 1 × 1 Conv to reduce the dimension of the input data for lessening the computational burden and then send the result to the following four parallel branches. Except for the leftmost branch (which includes a single 3 × 3 convolution layer), the other branches contain two normal convolutional layers (e.g., 1 × 1 and 3 × 3) and a depthwise convolution with a dilation rate r = 2, 3, and 5, respectively, denoted by MS(·). These smaller filters first obtain features from the processed input feature maps f_i and then use a large range of receptive fields to describe the information. Specifically, the output of the previous branch is connected to the next branch via an elementwise sum. This procedure is repeated several times until the outputs from all branches are processed. This procedure can be defined as

F_i = \begin{cases} \mathrm{Conv}_3(f_i), & i = 1 \\ \mathrm{MS}_i(f_i), & 1 < i \le 4 \end{cases}    (5)

where Conv_3(·) denotes the process of the left branch, and F_i is the output. Then

FU_i = \begin{cases} F_i, & i = 1 \\ F_1 + \cdots + F_{i-1}, & 1 < i \le 4 \end{cases}    (6)

where FU_i means the mixed features that potentially receive feature information from all preceding feature splits.

Fig. 4. Comparison of different multiscale modules. From top to bottom: (a) inception module (simplified form) [34], (b) RFB module [28], (c) MSRB module [26], and (d) RMAM module.

After extracting the feature maps, we fuse these features at different scale spaces. The feature maps from all branches are concatenated and sent to the SE block for exploring discriminative representations among channels. For better preserving the inherent information, the output features are then fused with the original input tensors in a residual-like manner. From our observation, this schema is useful for utilizing features at different scale spaces.
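One way to read (5) and (6) is as a cascade of four branches whose running partial sums feed the next branch before all branch outputs are concatenated, recalibrated by the SE block, and added back to the input. The sketch below illustrates this flow; the channel widths, the way the reduced features are routed to each branch, and the layer ordering inside a branch are assumptions made here for illustration rather than the exact published configuration.

```python
import torch
import torch.nn as nn

class MultiscaleBranch(nn.Module):
    """Branch i > 1 of the sketch: 1x1 and 3x3 convs followed by a
    depthwise convolution with a given dilation rate (MS_i in (5))."""
    def __init__(self, ch, dilation):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 1),
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation, groups=ch),
            nn.PReLU(),
        )

    def forward(self, x):
        return self.body(x)

class RMAMSketch(nn.Module):
    """Illustrative RMAM: 1x1 reduction, four parallel branches mixed as in
    (5)-(6), concatenation, SE attention, and a residual connection."""
    def __init__(self, channels=64, reduced=32, se_block=None):
        super().__init__()
        self.reduce = nn.Conv2d(channels, reduced, 1)
        self.branch1 = nn.Conv2d(reduced, reduced, 3, padding=1)       # Conv_3 in (5)
        self.branches = nn.ModuleList(
            [MultiscaleBranch(reduced, r) for r in (2, 3, 5)]          # MS_2 .. MS_4
        )
        self.fuse = nn.Conv2d(4 * reduced, channels, 1)
        self.se = se_block if se_block is not None else nn.Identity()  # SE block

    def forward(self, x):
        f = self.reduce(x)
        outputs = [self.branch1(f)]                   # F_1
        mixed = outputs[0]                            # FU_2 = F_1
        for branch in self.branches:                  # elementwise sums feed the next branch
            out = branch(mixed)
            outputs.append(out)
            mixed = mixed + out
        fused = self.fuse(torch.cat(outputs, dim=1))  # concatenate all branch outputs
        return x + self.se(fused)                     # residual-like fusion with the input
```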
Channel Attention Mechanism: The attention mechanism is popular in numerous vision tasks since it adaptively recalibrates the channelwise feature responses by explicitly modeling interdependencies between channels [12]. Recently, this strategy was introduced to further improve CNN-based SR performance [56].

Let V = [v_1, ..., v_n] denote the input data that contain n feature maps, and let the spatial shape of each feature map be H × W. Then, the statistic S_c of the cth feature map f_c is calculated as

S_c = H_{AVGP}(f_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} f_c(i, j)    (7)

where H_AVGP(·) means the global average pooling operation, and f_c(i, j) represents the corresponding value of f_c. The attention statistic of the feature f_c is

A_c = F(w_1 \, \delta(w_2 S_c))    (8)

where F(·) is the ReLU activation function, and δ(·) represents the sigmoid function, which can be treated as a gating mechanism. w_1 is the weight of a dimension-increasing layer (i.e., a 1 × 1 convolution layer for upscaling), and w_2 denotes the weight of a dimension-reduction layer (i.e., a 1 × 1 convolution layer for downscaling). The downscaling layer first reduces the number of input channels by a reduction factor r with w_2, activates the result with the activation function δ, and then upscales it back to the original channel space with w_1. The attention statistic A_c is then used to rescale the input feature map f_c:

\hat{f}_c = A_c \cdot f_c.    (9)
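The SE-style recalibration in (7)-(9) can be sketched as follows. The sketch follows the common squeeze-and-excitation ordering [12] (ReLU after the channel reduction and a sigmoid gate after the expansion); reading (8) literally would swap the two activations. The channel width and the reduction factor r = 16 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """SE-style channel attention: squeeze with global average pooling (7),
    excite with two 1x1 convolutions (8), and rescale the input (9)."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)               # S_c = H_AVGP(f_c)
        self.excite = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),   # w_2: reduce by factor r
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),   # w_1: restore the channel count
            nn.Sigmoid(),                                    # gating mechanism
        )

    def forward(self, x):
        a = self.excite(self.squeeze(x))   # A_c, one weight per channel
        return x * a                       # \hat{f}_c = A_c * f_c

# Usage with the RMAM sketch above (illustrative only):
# rmam = RMAMSketch(channels=64, se_block=SEAttention(64))
```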
Densely Connected Structure: Due to our DRPB and the multiscale module, the information can be perceived at very different scales. To go a step further in assimilating multilevel features, we densely connect each DRPB. The mth block DRPB_m (see Fig. 2) can be represented as

DRPB_m = \mathrm{Concate}(H_{LL}, DRPB_1, \ldots, DRPB_{m-1}).    (10)

Concatenating the preceding features as the input of DRPB_m, the output is also connected to the subsequent blocks employing the same process. Such a dense connection structure [13] allows more abundant low-frequency information to be bypassed during training.
C. Upsampling Network

As stated in Section II, our proposed model directly processes the original input images so that it can extract features efficiently. The final high-quality image I_SR is reconstructed in the UN, and all of the features from the EFEN are concatenated at the input layer of the UN; thus, the dimension of the input data is rather large. Therefore, we use a 1 × 1 convolution to reduce the input dimension before generating the HR pixels.

Then, the magnification layer reshapes the feature maps to a high-level space and outputs nine channels, where each channel represents a real-valued tensor of the upsampled pixels.
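A minimal sketch of this upsampling path is given below. With scale = 3 and a single output plane, the magnification conv emits the nine channels mentioned above; the channel widths and the single-plane output are assumptions of this sketch, not the exact UN configuration.

```python
import torch
import torch.nn as nn

class UpsamplingNetwork(nn.Module):
    """Sketch of the UN: 1x1 dimension reduction of the concatenated EFEN
    features, a refinement conv, and a sub-pixel (pixel-shuffle) layer [32]."""
    def __init__(self, in_channels=256, mid_channels=64, out_channels=1, scale=3):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, mid_channels, 1)   # shrink the concatenated input
        self.refine = nn.Conv2d(mid_channels, mid_channels, 3, padding=1)
        self.expand = nn.Conv2d(mid_channels, out_channels * scale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)                    # rearrange channels into HR pixels

    def forward(self, concatenated_features):
        x = self.refine(self.reduce(concatenated_features))
        return self.shuffle(self.expand(x))                      # (N, out_channels, sH, sW)
```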
D. Loss Function

We consider two types of loss functions that measure the difference between the HR output I_SR and its corresponding ground truth I_GT. The first one is the mean absolute error (MAE), also called the l_1-norm, which is formulated as follows:

L_1 = \| I_{SR} - I_{GT} \|_1.    (11)

Alternatively, the mean-square error (MSE) can be used; however, in previous work [27], it was experimentally found to be a poor choice for recovering clear images.

Given that MAE or MSE tends to lead to a smooth result, we additionally introduce a total variation (TV) regularizer [10], [29] to constrain the smoothness of I_SR:

L_{TV} = \| \nabla_h(I_{SR}) \|_2^2 + \| \nabla_v(I_{SR}) \|_2^2 = \sum_{i,j} \big[ (I_{SR}^{i,j+1} - I_{SR}^{i,j})^2 + (I_{SR}^{i+1,j} - I_{SR}^{i,j})^2 \big]    (12)

where ∇_h(·) and ∇_v(·) denote the gradient operators along the horizontal and vertical directions, respectively.

Thus, the second loss function is defined as follows:

L_F = L_1 + \lambda L_{TV}.    (13)

We train our model with these losses, empirically finding that the L_F loss can obtain better performance than the L_1 loss and that λ = 1e−5 works well. As shown in Fig. 7, the L_F loss enables our model to produce sharper SR results.
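The combined objective in (11)-(13) can be written compactly as in the short PyTorch sketch below; the per-pixel reductions (mean for the MAE term, sum for the TV term) and λ = 1e−5 are choices of this illustration, not the authors' released training code.

```python
import torch

def tv_regularizer(sr: torch.Tensor) -> torch.Tensor:
    """Squared-gradient TV term of (12) for an (N, C, H, W) batch."""
    grad_h = sr[:, :, :, 1:] - sr[:, :, :, :-1]   # horizontal differences I(i, j+1) - I(i, j)
    grad_v = sr[:, :, 1:, :] - sr[:, :, :-1, :]   # vertical differences   I(i+1, j) - I(i, j)
    return grad_h.pow(2).sum() + grad_v.pow(2).sum()

def lf_loss(sr: torch.Tensor, gt: torch.Tensor, lam: float = 1e-5) -> torch.Tensor:
    """L_F = L_1 + lambda * L_TV, combining (11) and (12) as in (13)."""
    l1 = torch.abs(sr - gt).mean()                # mean absolute error (MAE)
    return l1 + lam * tv_regularizer(sr)
```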
E. Relation to Other CNN Methods

Relation to Res2Net: The motivation for exploiting the multiscale potential is similar between the Res2Net [9] module and our RMAM. However, there are three main differences in our mechanism.
1) In general, Res2Net is used in high-level computer vision tasks (e.g., semantic segmentation and object recognition), and some inherent operations of this model are not suitable for image SR, such as batch normalization (BN) layers, which increase the computational complexity and hinder the reconstruction performance of the network. Thus, we remove these layers.
2) The procedure of extracting features is different. In Res2Net, the input features are evenly split into several groups, and each group is processed by a corresponding 3 × 3 convolution except for the first part, where the convolutional output is added to the preceding feature and then fed into the next. However, in our model, we stack three convolutional layers with different kernel sizes and dilation rates for effectively extracting information. All of the previous outputs are added to the following group for integrating multiscale features.
3) For learning the discriminative representation, the SE block [12] is embedded to recalibrate the channelwise features.

Relation to MSRN: We summarize the main differences between MSRN [26] and our MADNet. The first one is the design of the basic module. In MSRN, the multiscale residual block (MSRB) mainly combines parallel convolutions with multiscale feature fusion by residual learning [11], operating on all feature channels. Such an approach leads to heavy computation. However, our multiscale module is based on several convolutional branches and introduces a split-and-concatenation strategy to effectively process features and reduce the number of parameters. The second one is the activation function. MSRN uses the ReLU function, whereas we utilize the PReLU activation function. As the comparison in Fig. 5 shows, in the negative part, PReLU introduces a learnable parameter that can counterweigh the positive mean of the ReLU, making it slightly symmetric; moreover, previous experiments have proven that PReLU converges faster than ReLU and obtains better performance [55]. Thus, our proposed multiscale module possesses more powerful representational ability.

Relation to MemNet: MemNet [38] uses a dense block and various shortcuts. The differences in our method are listed as follows. First, Lim et al. trained the network with the L2 loss, while it was empirically found that training with the L1 loss
TABLE I
EFFECTS OF DIFFERENT RESIDUAL STRUCTURES MEASURED ON THE SET14 ×3 DATASET IN 200 EPOCHS

TABLE II
RESULTS OF AN ABLATION STUDY ON THE EFFECT OF THE SE BLOCK. THE EVALUATION IS ON THE SET5 AND B100 TEST SETS

Fig. 5. (a) ReLU versus (b) PReLU. PReLU introduces a learnable parameter that can counterweigh the positive mean of the ReLU, making it slightly symmetric.

Fig. 6. Effect of MADNet with different residual structures. The curves are based on the PSNR (dB) on DIV2K (val) with an upsampling factor of 3 in 200 epochs.

IV. EXPERIMENTAL RESULTS

In this section, we first briefly depict the experimental implementation as well as the training and testing datasets; the ablation studies follow this step. Finally, we compare our

A. Training Details

As shown in Fig. 2, the input and output data of our network are RGB images. During training, in each mini-batch, we randomly crop 16 color patches with a specific size (i.e., 96 × 96 for ×2, 144 × 144 for ×3, and 192 × 192 for ×4) from the LR
TABLE IV
QUANTITATIVE COMPARISONS OF THE STATE-OF-THE-ART SUPER-RESOLUTION MODELS ON PUBLIC BENCHMARKS. RED/BLUE TEXT MEANS THE BEST/SECOND-BEST PERFORMANCE

TABLE V
AVERAGE INFERENCE TIME (SECONDS) AND RECONSTRUCTION PERFORMANCE. THE RESULTS ARE EVALUATED ON THE SET14, B100, AND DIV2K DATASETS FOR ×4 SR
different scenes; the Urban100 set includes 100 urban building images in the real world. Both peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [46] results are calculated on the final SR images on the Y channel of the transformed YCbCr color space. The LR images are downscaled from the corresponding HR ones using bicubic downsampling.
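For reference, the evaluation protocol just described (PSNR computed on the Y channel of the YCbCr conversion of the SR and HR images) can be sketched as follows; the BT.601 conversion coefficients and the 8-bit peak value of 255 are assumptions of this illustration.

```python
import numpy as np

def rgb_to_y(img: np.ndarray) -> np.ndarray:
    """Luminance (Y) channel of an 8-bit RGB image using ITU-R BT.601 weights."""
    img = img.astype(np.float64)
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]) / 255.0

def psnr_y(sr: np.ndarray, hr: np.ndarray) -> float:
    """PSNR between the Y channels of the SR result and the HR ground truth."""
    mse = np.mean((rgb_to_y(sr) - rgb_to_y(hr)) ** 2)
    return float(10.0 * np.log10(255.0 ** 2 / mse))
```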
C. Ablation Study

To provide a better understanding of the proposed method, an ablation study is first conducted here from the following perspectives: the residual-path block, the SE block, and the loss function.

1) Study of the Residual-Path Block: Fig. 3 illustrates three different residual structures. We first conduct the ablation experiment on these structures, and the corresponding results are presented in Fig. 6 and Table I. In Table I, the baseline is a plain structure without any shortcuts, RPB1 utilizes residual learning between the first and last modules, RPB2 connects the first two modules via additive shortcuts, and the DRPB is as illustrated in the previous section.

It can be seen that the blocks with residual learning show better performance than the baseline because the residual path allows earlier features to pass into later layers. It can also be observed that the DRPB form exhibits better and more stable performance as the training epochs increase. This result mainly occurs because the dual residual path effectively promotes information propagation.

2) Study of the SE Block: To evaluate the performance of the SE block components in the RMAM, we remove the SE block such that the entire network does not take account of the attention mechanism. Observing the results shown in Table II, the attention schema brings absolute improvements, and the PSNR value improves by approximately 0.9 and 0.8 dB on Set5 and B100, respectively.

3) Study of the Loss Function: To examine the effect of the mentioned loss functions, we trained two versions of our network. Expressed formally, let the first model be "L1" (i.e., using the L1 loss for training) and the other be "LF" (i.e., using the enhanced LF loss for training). We tried different linear combinations of L1 and LF with different weights. Moreover, it was found that λ = 1e−5 achieves a tradeoff between PSNR and visual quality. Fig. 7 shows this perception that the LF loss leads to sharper images with more details. In addition, we test the performance on benchmarks. The corresponding results are illustrated in Table III. The LF loss achieves better results with regard to both PSNR and SSIM. For example, LF gains a PSNR improvement of 0.05 dB on the Set14 dataset with a scaling factor of 4.

D. Comparison With State-of-the-Art Methods

We compare the proposed method with benchmark SR models on two commonly used image quality metrics, namely, PSNR and SSIM. Note that we use the number of parameters and multiadds to measure the model size. The multiadds metric is defined as in [1], that is, the number of multiply-accumulate operations, and we assume the SR output size to be 1280 × 720 when calculating multiadds. The geometric self-ensembling strategy [27], [41] is used for further evaluation and marked with "+" in this article. Note that we reimplement IDN [15] with PyTorch; the official TensorFlow implementation is at https://github.com/Zheng222/IDN-tensorflow.

As shown in Fig. 1, we compare our model against various state-of-the-art algorithms in terms of multiadds on the Urban100 dataset with an upscaling factor of 3. Here, our MADNet method outperforms all state-of-the-art lightweight models that have less than 2M parameters. Specifically, MADNet has a model size similar to those of DRCN [19], MemNet [38], and SRMDNF [54], while achieving better performance than all of them.

The quantitative comparisons with several state-of-the-art methods are listed in Table IV. Our model outperforms the existing models by a large margin on different scaling factors, except for CARN [1]. It can be seen that although our method has considerably fewer parameters and multiadds, it achieves comparable or even better performance. Considering the GPU runtime, we mainly compare the proposed method with the latest CARN model and use the official codes to test the running time. As shown in Table V, our proposed model spends an average of 0.0455, 0.0162, and 0.1117 s to reconstruct an image on the Set14, B100, and DIV2K (100 validation pictures in total) datasets for scale factor 4, respectively, running roughly as fast as the CARN series.

Fig. 8 presents the visual comparisons on the B100 and Urban100 datasets for the ×4 scale. The figure shows that our method works better than the other comparative ones, and the reconstructed SR images are closer to the HR ones in detail.
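As an aside on the multiadds convention above, the cost of a plain convolution layer for a fixed 1280 × 720 output can be estimated with the short helper below. The counting rule (kernel area × input channels × output channels × output pixels) is the usual one and is only meant to illustrate the reporting convention, not to reproduce the exact figures in Table IV.

```python
def conv_multiadds(in_ch: int, out_ch: int, kernel: int,
                   out_h: int = 720, out_w: int = 1280) -> int:
    """Multiply-accumulate operations of one conv layer for a fixed output size."""
    return kernel * kernel * in_ch * out_ch * out_h * out_w

# Example: a single 3x3 conv with 64 input and 64 output channels evaluated
# on a 1280 x 720 output costs roughly 34 G multiadds.
print(conv_multiadds(64, 64, 3) / 1e9)
```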
Fig. 8. Visual qualitative comparisons with the bicubic degradation model for ×4 SR on benchmarks.
REFERENCES

[1] N. Ahn, B. Kang, and K.-A. Sohn, "Fast, accurate, and lightweight super-resolution with cascading residual network," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 256–272.
[2] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, "Contour detection and hierarchical image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 898–916, May 2011.
[3] M. Bevilacqua, A. Roumy, C. Guillemot, and M.-L. A. Morel, "Low-complexity single-image super-resolution based on nonnegative neighbor embedding," in Proc. Brit. Mach. Vis. Conf., 2012, pp. 1–10.
[4] L. Chen, J. Pan, and Q. Li, "Robust face image super-resolution via joint learning of subdivided contextual model," IEEE Trans. Image Process., vol. 28, no. 12, pp. 5897–5909, Dec. 2019.
[5] C. Dong, C. C. Loy, K. He, and X. Tang, "Learning a deep convolutional network for image super-resolution," in Proc. Eur. Conf. Comput. Vis., 2014, pp. 184–199.
[6] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, Jan. 2016.
[7] C. Dong, C. C. Loy, and X. Tang, "Accelerating the super-resolution convolutional neural network," in Computer Vision—ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham, Switzerland: Springer Int., 2016, pp. 391–407.
[8] Y. Fan et al., "Balanced two-stage residual networks for image super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) Workshops, Jul. 2017, pp. 1157–1164.
[9] S. Gao, M. Cheng, K. Zhao, X. Zhang, M. Yang, and P. H. S. Torr, "Res2Net: A new multi-scale backbone architecture," CoRR, vol. abs/1904.01169, pp. 1–10, Sep. 2019. [Online]. Available: http://arxiv.org/abs/1904.01169
[10] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang, "Toward convolutional blind denoising of real photographs," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 1712–1722.
[11] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[12] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 7132–7141.
[13] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2261–2269.
[14] J.-B. Huang, A. Singh, and N. Ahuja, "Single image super-resolution from transformed self-exemplars," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 5197–5206.
[15] Z. Hui, X. Wang, and X. Gao, "Fast and accurate single image super-resolution via information distillation network," in Proc. Conf. Comput. Vis. Pattern Recognit., 2018, pp. 723–731.
[16] K. Jiang, Z. Wang, P. Yi, G. Wang, T. Lu, and J. Jiang, "Edge-enhanced GAN for remote sensing image super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5799–5812, Aug. 2019.
[17] R. Keys, "Cubic convolution interpolation for digital image processing," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 6, pp. 1153–1160, Dec. 1981.
[18] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1646–1654.
[19] J. Kim, J. K. Lee, and K. M. Lee, "Deeply-recursive convolutional network for image super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1637–1645.
[20] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Represent., 2015.
[21] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, "Deep Laplacian pyramid networks for fast and accurate super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 624–632.
[22] R. Lan et al., "Cascading and enhanced residual networks for accurate single image super-resolution," IEEE Trans. Cybern., early access, doi: 10.1109/TCYB.2019.2952710.
[23] R. Lan, Y. Zhou, Z. Liu, and X. Luo, "Prior knowledge-based probabilistic collaborative representation for visual recognition," IEEE Trans. Cybern., early access, doi: 10.1109/TCYB.2018.2880290.
[24] C. Ledig et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 105–114.
[25] B. Li, R. Liu, J. Cao, J. Zhang, Y.-K. Lai, and X. Liu, "Online low-rank representation learning for joint multi-subspace recovery and clustering," IEEE Trans. Image Process., vol. 27, no. 1, pp. 335–348, Jan. 2018.
[26] J. Li, F. Fang, K. Mei, and G. Zhang, "Multi-scale residual network for image super-resolution," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 527–542.
[27] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced deep residual networks for single image super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) Workshops, Jul. 2017, pp. 1132–1140.
[28] S. Liu, D. Huang, and Y. Wang, "Receptive field block net for accurate and fast object detection," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 404–419.
[29] A. Marquina and S. J. Osher, "Image super-resolution by TV-regularization and Bregman iteration," J. Sci. Comput., vol. 37, no. 3, pp. 367–382, 2008.
[30] J. Pan et al., "Learning dual convolutional neural networks for low-level vision," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 3070–3079.
[31] S. C. Park, M. K. Park, and M. G. Kang, "Super-resolution image reconstruction: A technical overview," IEEE Signal Process. Mag., vol. 20, no. 3, pp. 21–36, May 2003.
[32] W. Shi et al., "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1874–1883.
[33] J. Sun, Z. Xu, and H.-Y. Shum, "Image super-resolution using gradient profile prior," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2008, pp. 1–8.
[34] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, Inception-ResNet and the impact of residual connections on learning," in Proc. ICLR Workshop, 2016, pp. 4278–4284. [Online]. Available: https://arxiv.org/abs/1602.07261
[35] C. Szegedy et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.
[36] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2818–2826.
[37] Y. Tai, J. Yang, and X. Liu, "Image super-resolution via deep recursive residual network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2790–2798.
[38] Y. Tai, J. Yang, X. Liu, and C. Xu, "MemNet: A persistent memory network for image restoration," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 4549–4557.
[39] R. Timofte et al., "NTIRE 2017 challenge on single image super-resolution: Methods and results," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) Workshops, Jul. 2017, pp. 1110–1121.
[40] R. Timofte, V. De Smet, and L. Van Gool, "A+: Adjusted anchored neighborhood regression for fast super-resolution," in Computer Vision—ACCV 2014, D. Cremers, I. Reid, H. Saito, and M.-H. Yang, Eds. Cham, Switzerland: Springer Int., 2015, pp. 111–126.
[41] R. Timofte, R. Rothe, and L. Van Gool, "Seven ways to improve example-based single image super resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1865–1873.
[42] C. Wang, Z. Li, and J. Shi, "Lightweight image super-resolution with adaptive weighted learning network," 2019. [Online]. Available: arXiv:1904.02358.
[43] X. Wang, R. B. Girshick, A. Gupta, and K. He, "Non-local neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 7794–7803.
[44] Z. Wang et al., "Multi-memory convolutional neural network for video super-resolution," IEEE Trans. Image Process., vol. 28, no. 5, pp. 2530–2544, May 2019.
[45] Z. Wang, J. Chen, and S. C. H. Hoi, "Deep learning for image super-resolution: A survey," CoRR, vol. abs/1902.06068, pp. 1–24, Feb. 2019. [Online]. Available: http://arxiv.org/abs/1902.06068
[46] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[47] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 3–19.
[48] B. Wronski et al., "Handheld multi-frame super-resolution," ACM Trans. Graph., vol. 38, no. 4, pp. 1–18, Jul. 2019. [Online]. Available: http://doi.acm.org/10.1145/3306346.3323024
[49] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, Nov. 2010.
[50] P. Yi, Z. Wang, K. Jiang, Z. Shao, and J. Ma, "Multi-temporal ultra dense memory network for video super-resolution," IEEE Trans. Circuits Syst. Video Technol., early access, doi: 10.1109/TCSVT.2019.2925844.
[51] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," in Proc. Int. Conf. Learn. Represent., 2016.
[52] M. D. Zeiler, G. W. Taylor, and R. Fergus, "Adaptive deconvolutional networks for mid and high level feature learning," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Nov. 2011, pp. 2018–2025.
[53] R. Zeyde, M. Elad, and M. Protter, "On single image scale-up using sparse-representations," in Curves and Surfaces, J.-D. Boissonnat et al., Eds. Heidelberg, Germany: Springer, 2012, pp. 711–730.
[54] K. Zhang, W. Zuo, and L. Zhang, "Learning a single convolutional super-resolution network for multiple degradations," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 3262–3271.
[55] Y. Zhang, L. Sun, C. Yan, X. Ji, and Q. Dai, "Adaptive residual networks for high-quality image restoration," IEEE Trans. Image Process., vol. 27, no. 7, pp. 3150–3163, Jul. 2018.
[56] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, "Image super-resolution using very deep residual channel attention networks," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 294–310.
[57] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, "Residual dense network for image super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 2472–2481.
[58] L. Zhou, Z. Wang, Y. Luo, and Z. Xiong, "Separability and compactness network for image recognition and superresolution," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 11, pp. 3275–3286, Nov. 2019.

Zhenbing Liu received the B.S. degree from Qufu Normal University, Qufu, China, and the M.S. and Ph.D. degrees from the Huazhong University of Science and Technology, Wuhan, China. He was a Visiting Scholar with the Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA, in 2015. He is currently a Professor and a Doctoral Supervisor with the School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, China. His main research interests include image processing, machine learning, and pattern recognition.

Huimin Lu received the M.S. degrees in electrical engineering from the Kyushu Institute of Technology, Kitakyushu, Japan, and Yangzhou University, Yangzhou, China, in 2011, and the Ph.D. degree in electrical engineering from the Kyushu Institute of Technology in 2014. From 2013 to 2016, he was a JSPS Research Fellow with the Kyushu Institute of Technology, where he is currently an Associate Professor. He is an Excellent Young Researcher with MEXT, Tokyo, Japan. His research interests include computer vision, robotics, artificial intelligence, and ocean observation.