Li 2019
https://doi.org/10.1007/s11554-019-00925-3
Abstract
Super-resolution is generally defined as the process of obtaining high-resolution images from inputs of low-resolution observations, and it has attracted considerable attention from researchers in the image-processing community. In this paper, we aim to analyze, compare, and contrast the technical problems, methods, and performance of super-resolution research, especially real-time super-resolution methods based on deep learning structures. Specifically, we first summarize the fundamental problems, categorize the algorithms, and analyze the possible application scenarios that should be considered. Since increasing attention has been drawn to utilizing convolutional neural networks (CNN) or generative adversarial networks (GAN) to predict the high-frequency details lost in low-resolution images, we provide a general overview of the background technologies and pay special attention to super-resolution methods built on deep learning architectures for real-time super-resolution, which not only produce desirable reconstruction results, but also enlarge the possible application scenarios of super-resolution to systems like cell phones, drones, and embedded systems. Afterwards, benchmark datasets are enumerated with descriptions, and the performance of the most representative super-resolution approaches is reported to offer a fair and comparative view of current approaches. Finally, we conclude the paper and suggest ways to improve the usage of deep learning methods for real-time image super-resolution.
Keywords Image super-resolution · Real-time processing · Deep learning · Convolutional neural network · Generative adversarial network
1 Introduction
scene. It has been proved that resolution raising via SR methods can largely increase the amount of available information and thus leads to accurate and robust vision-based machine learning systems [70, 75]. Therefore, SR methods have achieved great success in multiple domains, such as aerial imaging [138, 139], medical image processing [47, 118], automated mosaicking [13, 54], compressed image/video enhancement [11, 62], action recognition [73, 89], pose estimation [43, 44], face [18, 132], iris [4, 71], fingerprint [61, 101] and gait recognition [3, 140], scene text image improvement and reading [121, 122], and so on.

For the past two decades, numerous methods have been proposed to perform the SR task. One of the most classical ways to categorize SR methods relies on the number of LR images involved: single-image or multiple-image based SR methods [74]. Essentially, there are fundamental differences in how the SR problem is approached depending on the number of LR images. Specifically, single-image based SR methods tend to hallucinate missing image details with relationships learned from training datasets. Multiple-image based SR methods generally utilize global/local geometric or photometric relations between multiple LR images to reconstruct HR images.

We must note that several surveys [39, 77, 107, 135] have been conducted on single-image based SR, multiple-image based SR, or both, and they form the basis on which our paper builds. For example, Nasrollahi and Moeslund [74] provide a comprehensive overview of most related papers published up to 2012, which offers plenty of reading resources for beginners to learn the fundamentals and development history of this field. Yue et al. [135] review SR methods and applications based on machine learning techniques up to 2016. Most recently, Hayat [41] focuses on deep learning-based progress in three aspects of multimedia, i.e., image, video and multi-dimensional data. Nguyen et al. [75] comprehensively survey the super-resolution approaches proposed for a special application domain, i.e., biometrics, including fingerprint, iris, gait, face, etc. Built on but differing from these survey papers, our paper focuses on reviewing deep learning-based SR methods with real-time response, which could enlarge the possible application scenarios of SR methods to embedded systems, cell phones, drones and so on through effective performance and low computation cost.

In this paper, we attempt to establish a baseline for future work by providing a comprehensive literature survey of deep learning methods in real-time SR research. Incremental advances or new ideas beyond the state of the art could thus be made or inspired on our provided baseline. Specifically, we first summarize the fundamental problems, categorize the existing methods and review the applications for the benefit of beginners. We then briefly introduce significant progress in deep learning methods. As deep learning structures have been brought to bear on the SR problem, a quantity of deep learning-based SR methods have been proposed. We highlight deep learning methods for real-time SR by analyzing, comparing, and summarizing their core ideas. Afterwards, the performance of the most representative deep learning approaches on benchmark datasets is compared and analyzed. Finally, we conclude by bringing out open questions and future trends for improving the real-time SR performance of deep learning methods.

The rest of the paper is organized as follows: Sect. 2 presents background concepts on super-resolution and deep learning methods for the benefit of beginners. Section 3 highlights and analyzes deep learning-based methods for real-time SR. Benchmark datasets, evaluation methods and performance comparisons of a quantity of methods are presented in Sect. 4. Finally, the paper is summarized with discussions about remaining problems and future trends in Sect. 5.

2 Super-resolution formation

To offer an overall view of deep learning-based SR methods, it is useful to provide background information about the underlying problems. In this section, we first introduce the mathematical definition of the SR problem by reviewing the relevant imaging model. Then, we categorize the existing methods, which helps readers not only know the history of SR, but also comprehend deep learning-based SR methods by comparing them with other types of SR methods. Afterwards, we review SR applications in different domains and focus on applications requesting real-time performance, which should be considered during the design of real-time SR methods. Finally, we explain the fundamental ideas of CNN and GAN structures for readers' convenience, since this paper mainly focuses on real-time SR methods built on CNN and GAN structures.

2.1 Definition of super-resolution problem

Although SR methods give rise to many applications, the fundamental goal of SR is to recover missing detail information by reconstructing super-resolved images from LR observations. In the literature, SR is regarded as a heavily ill-posed problem, since the information represented in LR images is often insufficient to complete the reconstruction task. Therefore, SR methods need to complete three main tasks: up-sampling of LR images to increase image resolution, removing artifacts including blur and noise during the SR process, and registration or fusion of multiple input LR images for a better representation of the target HR image.
Based on these three tasks, we first describe the most common imaging model [51, 95] used to generate LR images in the simplest case, where the process can be modeled linearly as

$$L_i(x', y') = \frac{1}{q^2} \sum_{x=qm}^{(q+1)m-1} \sum_{y=qn}^{(q+1)n-1} R(x, y), \tag{1}$$

where L_i is an observed LR image, R is the original HR scene, q is a decimation factor or sub-sampling parameter which is assumed to be equal for the x and y directions, x and y are the coordinates of the HR image, and m and n those of the LR images. The imaging model in Eq. 1 states that an observed LR image is obtained by averaging the HR intensities over a neighborhood of q² pixels.

This model becomes more realistic when the other parameters involved in the imaging process are taken into account, as shown in Fig. 1, including stepwise representations of warping, down-sampling, noise adding, and blurring. It is noted that SR methods can generally be considered an inverse processing workflow that generates HR images based on LR images. Supposing the real-world image R is captured by n differently located cameras to form multi-view images, the generated LR images L_i are constructed by the following formulas:

$$\begin{cases} L_i(x', y') = \mu_i(\delta_i(\omega_i R(x, y))) + \eta_i(x', y') \\ L_j(x', y') = \rho_{i,j}(L_i(\tau_{i,j}(x', y'))), \end{cases} \tag{2}$$

where x, y and x', y' refer to the coordinates of one pixel in the real-world image R and in the generated LR images L_i, L_j, respectively; ω_i(·) is a warping function determined by the locations and rotations of the multiple cameras; δ_i(·) is a down-sampling function; μ_i(·) is a blurring function; η_i(·) is additive noise; τ_i,j(·) is a coordinate transformation function covering horizontal shift, vertical shift and rotation; and ρ_i,j(·) is a pixel intensity transformation function. Specifically, if the LR image L_i is displaced from the HR scene R by a translational vector (a, b) and a rotational angle θ, the warping function in homogeneous coordinates can be represented as

$$\omega\!\left(\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}\right) = \left(\begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}\right)^{-1} \begin{bmatrix} m \\ n \\ 1 \end{bmatrix} \tag{3}$$

In matrix form, Eq. 2 can be written as follows:

$$\begin{cases} L_i = AR + \eta \\ L_j = \rho L_i, \end{cases} \tag{4}$$

in which A stands for the above-mentioned degradation factors. This imaging model has been used in many SR works. It is noted that the first line of Eq. 2 refers to warping, down-sampling, noise adding, and blurring of a single HR image, and corresponds to the inverse operations of tasks 1 and 2; meanwhile, the second line refers to the multi-view transformation between two different LR images and corresponds to the inverse operation of task 3. Moreover, the process of image registration [48, 80, 102], comprising the functions τ_i,j(·) and ρ_i,j(·), is often defined to handle multi-view images during SR. Essentially, this process geometrically aligns multiple images of the same scene onto a common reference plane, where the images can be captured at different times, from different views, or by multiple sensors.

Fig. 1 Illustration of the most commonly used imaging model, where the forward pipeline represents the process of generating multiple LR images {L_1, …, L_n} from a real-world image R by warping (ω), down-sampling (δ), blurring (μ), and adding noise (η), and the backward pipeline refers to a basic method of reverse super-resolution that reconstructs the real-world or HR image from one input LR image or multiple LR images by de-blurring, up-sampling, aligning, and image registration (τ and ρ)
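As a concrete companion to Eqs. (1)-(2), the following minimal NumPy sketch generates one LR observation from an HR scene. The integer shift, box blur, and noise level are illustrative stand-ins for ω, μ and η; they are not parameters taken from any surveyed method.

```python
import numpy as np

def degrade(hr, q=2, shift=(0, 0), blur=3, noise_std=0.01, rng=None):
    """Warp (integer shift), blur (box filter), q x q average
    down-sampling, and additive Gaussian noise, mirroring Fig. 1."""
    rng = rng or np.random.default_rng(0)
    # warp: a crude integer translation stands in for omega()
    warped = np.roll(hr, shift, axis=(0, 1))
    # blur: a box filter of size `blur` stands in for mu()
    pad = blur // 2
    padded = np.pad(warped, pad, mode="edge")
    blurred = np.zeros_like(warped)
    for dy in range(blur):
        for dx in range(blur):
            blurred += padded[dy:dy + warped.shape[0], dx:dx + warped.shape[1]]
    blurred /= blur * blur
    # decimation: average over q^2-pixel neighborhoods, as in Eq. (1)
    h, w = blurred.shape
    lr = blurred[:h - h % q, :w - w % q].reshape(h // q, q, w // q, q).mean(axis=(1, 3))
    # additive noise eta()
    return lr + rng.normal(0.0, noise_std, lr.shape)

hr = np.random.rand(64, 64)          # stand-in HR scene R
lr = degrade(hr, q=2, shift=(1, 1))  # one observed LR image L_i
print(lr.shape)                      # (32, 32)
```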
2.2 Categorization of super-resolution methods

Following the description of the imaging model and the basic categorization in [74], we further categorize the existing SR methods into single-image SR (SISR) or multiple-image SR (MISR), as shown in Fig. 2. In the following discussion, we follow Fig. 2 to describe the categorization of current SR methods.

Facing a highly ill-posed problem without sufficient information about the original images, early SISR methods tended to utilize analytical interpolation to reconstruct HR images. Well-known interpolation-based SISR methods include linear, bicubic, and cubic-spline interpolation, Lanczos up-sampling [29], New Edge Directed Interpolation (NEDI) [63] and so on. These methods are very simple and effective in smooth image regions and offer real-time performance. However, the simple rules behind interpolation bring in overly smooth and blurred details, which harm the visual quality of image discontinuities like edges, boundaries, and corners. Hence, more sophisticated ways to recover image details are required.

Besides interpolation-based methods, researchers have proposed two further categories of SISR methods, i.e., reconstruction-based and learning-based methods. Reconstruction-based methods suppose there exist certain priors or constraints, in the form of a distribution, energy function or score function, between the HR and the original LR images. Therefore, researchers perform SR tasks by establishing reconstruction priors such as sharpening of edge details [21], regularization [5] or de-convolution [97].

Learning-based methods try to restore missing high-frequency image details by establishing an implicit relationship between LR patches and their corresponding HR patches via machine learning models. This category of methods has attracted more and more attention from researchers due to its promising and visually desirable reconstruction results. The general idea is to enhance SR quality by learning relationships from a large quantity of training data. However, excessive training data might introduce spurious high frequencies, resulting in noise and blurred details. Therefore, it is important to keep a balance between the size of the training data and the visual quality of reconstruction.

With the development of machine learning technologies, researchers have tried a variety of learning models to solve the SR problem. We further classify learning-based methods into five groups based on the differences in their core ideas: neighbor embedding methods [7, 15], sparse coding methods [14, 28], self-exemplar methods [34, 46], locally linear regression methods [38, 129], and deep learning methods [26, 52, 67, 104]. In this paper, we focus on utilizing deep learning-based methods to solve the SR problem, due to their significant HR reconstruction results. Deep learning-based methods will be comprehensively discussed in Sect. 3. In the following, we briefly introduce the other four groups of learning-based methods for comparison with deep learning methods.

Neighbor embedding (NE) methods consider that similar local geometry is shared between LR patches and their corresponding HR patches. Due to this similar local geometry of the LR and HR feature spaces, patches in the HR feature domain can be computed as a weighted average of local neighbors. After constructing the weighting scheme, the whole SR computation process can share the same weights within the LR feature domain. Based on this idea, Chang et al. [15] propose an SR method by applying a typical kind of manifold learning, i.e., locally linear embedding (LLE) [87], to weight learning. Their proposed method assumes each sample and its neighbors lie on or near a locally linear patch of the manifold, an idea that greatly influenced the subsequent early coding-based methods.

Sparse coding methods consider image patches as a sparse linear combination of elements selected from a pre-constructed and sufficiently sparse dictionary. By exploiting a reasonable and sufficiently sparse representation for each patch of the low-resolution inputs, the process of generating high-resolution outputs can be
images. After years of effort, there exist successful and applicable examples in the remote sensing area [138]. For example, the Skybox Imaging plan utilizes SR techniques to help provide real-time remote sensing images at sub-meter resolution, which shows that SR methods can help improve the outputs of remote sensing applications. The main challenge for remote sensing image SR lies in two aspects: (1) how to deal with scene variations in the case of temporal differences, and (2) how to modify the existing and successful SR methods to handle the large amounts of remote sensing images captured by satellites every day.

Based on the former discussions, we can observe the variety of application scenarios of SR methods. Among these applications, medical diagnosis requires processing with fewer errors and higher robustness, while remote sensing requires overcoming temporal differences and handling massive amounts of input. Besides, both categories of applications share a common property: their users, i.e., doctors and scientists, can buy computation-abundant devices to handle SR via edge computing or big data services [35, 127]. In other words, they can easily access better equipment for more satisfying SR results while keeping computing time low. Biometrics and text image reading applications, by contrast, mainly run on embedded devices and have strict requirements on real-time response to improve user experience. Therefore, keeping a balance between SR quality and computation cost is a key problem for such applications. In this paper, we emphasize applications like biometrics and text image reading, and thus review SR methods offering real-time computing that fulfils their latency requirements.

2.4 Deep learning background

The most popular structures related to SR can be roughly categorized into two groups: convolutional neural networks (CNN) and generative adversarial networks (GAN). These deep structures have a high capability to represent information-abundant and distinctive features through self-learning strategies. By comparison, traditional learning structures require humans to observe and design manual features to perform classification tasks. In this subsection, we focus on explaining the fundamental ideas of the CNN and GAN structures.

Convolutional neural network Inspired by the promising classification results achieved by a typical CNN, i.e., AlexNet [55], a quantity of trials on structures, learning strategies and applications have been made, among which are VGG [100], GoogLeNet [103], ResNet [42], R-CNN [33] and so on. Several common types of layers are involved in constructing a CNN structure: convolutional layers, pooling layers, and fully connected layers.

Convolutional layers are designed to gather information from neighboring pixels. In fact, each pixel is closely associated with its neighboring pixels and nearly irrelevant to pixels at long range, a property known as the local receptive field. We show the comparison between the local connection and the full connection adopted by fully connected layers in Fig. 3. Essentially, a convolution kernel can only extract one specific feature in a local sense. Researchers thus design multiple convolution kernels to extract a variety of features from the input images. With the different feature maps produced by multiple convolution kernels, convolutional layers lead to a better understanding of image content.

Unlike convolutional layers, pooling layers are defined without parameters. Pooling layers utilize a down-sampling operation to extract features from feature maps, which reduces the data size without modifying the data characteristics. The resulting abstract features not only retain the generalization ability of the feature maps, but also gain a certain degree of invariance to translation, rotation, and scaling. Pooling layers thus help improve the robustness and generalization performance of the whole network.

Fully connected layers are located at the end of a CNN, where each neuron is connected to all neurons in the preceding layer. With such processing, locally extracted features are globally combined to output the final result.
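To make the interplay of these three layer types concrete, here is a toy PyTorch classifier; the layer sizes and the 28 × 28 input are illustrative assumptions, not a configuration from the surveyed literature.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # local receptive fields
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                              # parameter-free down-sampling
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # more kernels, more feature maps
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        # fully connected layer: every neuron sees all remaining features
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

logits = TinyCNN()(torch.randn(1, 1, 28, 28))
print(logits.shape)  # torch.Size([1, 10])
```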
Above all, a CNN structure combines the abilities provided by convolutional, pooling and fully connected layers, which help approximate any continuous function and ensure that difficult classification and recognition tasks can be performed successfully.

Generative adversarial network As one of the most significant improvements in the research of deep generative models, GAN [36] provides a novel way to learn deep representations of features without large labeled training datasets. With this power for distribution modeling, GAN is extremely suitable for unsupervised tasks [69, 148] including image generation [24], image editing [150], and representation learning [84].

The key idea of GAN stems from a two-player game performed by a generator and a discriminator; we show the basic structure of a typical GAN for generating hand-written digit images in Fig. 4. Specifically, the discriminator is responsible for judging whether an input image produced by the generator appears natural or exhibits artifacts; meanwhile, the generator aims to create images that make the discriminator believe the created image is a natural image without any artifacts. After rounds of training, a Nash equilibrium is reached, where the trained generator has the ability to capture the inherent, internal representation of real images, thus generating sufficiently realistic images.

GAN is still developing, with great strides in structures, training algorithms and so on. With the development of the GAN structure, more related applications have been explored by researchers. However, training a GAN for data augmentation is challenging, since the training process can easily be trapped in the mode collapsing problem. Essentially, the mode collapsing problem is defined as the situation where the generator only concentrates on producing samples lying on a few modes, instead of covering the whole data space [16, 92]. It is noted that this problem exists in SR applications with GAN models as well, which will be further explained in Sect. 3.2.

3 Deep learning methods for real-time super-resolution

Despite there being a quantity of review papers on deep learning methods for SR [39, 77, 107, 135], there is a lack of reviews of methods for real-time SR for the benefit of researchers. In this section, we attempt to survey the deep learning literature, including CNN, GAN and other deep learning methods, with a view to real-time super-resolution.

In the following discussion of each subsection, we first introduce several typical methods, which have achieved significant HR reconstruction results but failed to obtain real-time performance. Then, we highlight fast and real-time deep learning methods, where we carefully explain their core ideas, innovations, algorithm steps and performance. Finally, we summarize the current state and discuss future development trends. Furthermore, it is noted that we fuse SISR and MISR methods in this section, since many deep learning-based SR methods have the ability to perform both SISR and MISR.

3.1 CNN-based methods in real-time image super-resolution

3.1.1 CNN-based methods for SR
CNN-based SR methods are quite numerous, owing to their impressive HR image reconstruction results. The first work to solve the SISR problem with a CNN structure was proposed by Dong et al. [26], who construct a three-layer CNN named the Super-Resolution Convolutional Neural Network (SRCNN) to learn the mapping between LR patches and their corresponding HR patches. Utilizing bicubic interpolation for pre-processing, SRCNN optimizes and learns a nonlinear mapping in manifold space based on the information-abundant feature maps produced by its convolutional layers. With the high distinguishing power of its deep structure, SRCNN has achieved promising reconstruction results, outperforming the majority of former methods, such as Self-Ex [46], A+ [110] and kernel-based learning. Although SRCNN claims efficiency with a lightweight structure, its performance is still far from real-time response due to its time-consuming pre-processing step, i.e., bicubic interpolation.

Essentially, SRCNN is an important work that offers inspiration for utilizing deep structures for SR purposes. After its publication, SRCNN appeared in many other works as a baseline method for comparison or as a basic structure for new learners to modify. More importantly, its structure with only convolutional layers has greatly influenced later CNN-based SR methods, which successfully avoid the down-sampling effects brought in by pooling or subsampling layers. However, additional convolutional layers largely increase the number of parameters, making a network more likely to overfit the training dataset and harder to drive to real-time response. How to keep a balance between desirable reconstruction performance and real-time computing speed thus becomes a major challenge in utilizing deep structures for SR.

With the remarkable success in constructing very deep structures achieved by ResNet, its core idea of residual learning has been adopted by researchers to perform SR tasks. Residual learning not only offers the capability to construct a larger number of layers for better HR reconstruction results, but also reduces the difficulty of the training process, with fast convergence in a small number of epochs. For example, Kim et al. [52] first try residual learning for SR with their novel Very Deep Convolutional Network (VDSR), which imitates the VGG-net structure [100] to build 20 convolutional layers as a very deep network. Following the trend of applying residual learning to SR, Tai et al. [104] propose the Deeply Recursive Residual Network (DRRN), which first utilizes global residual learning for the identity branch during inference and then proposes the new concept of local residual learning to optimize the local residual branch.

Recently, the main focus of CNN-based SR research has been to utilize proper technologies for either improved HR reconstruction results or fast computing speed. For example, Deep Back-Projection Networks (DBPN) [40] propose iterative up- and down-sampling layers that form an error feedback scheme, which helps transmit projection errors among different layers. With such a scheme, the process of image degradation and super-resolution can be represented by simply connecting up- and down-sampling layers, thus improving HR reconstruction results at large scaling factors.

Related to the topic of real-time SR, Lim et al. [65] develop an enhanced deep super-resolution network (EDSR) with performance exceeding the current state-of-the-art SISR methods. Their proposed method performs optimization by removing unnecessary modules in conventional residual networks and expands the model depth with a stable training procedure. Inspired by the development of attention models, Zhang et al. [142] adopt an existing channel attention mechanism to construct very deep residual channel attention networks (RCAN). Their proposed residual-in-residual (RIR) structure is specially designed to bypass abundant low-frequency information in favor of learning high-frequency information. Hu et al. [45] propose a channel-wise and spatial feature modulation (CSFM) network, which connects feature-modulation memory (FMM) modules with stacked connections to transform low-resolution features into highly informative features. Considering that most CNN-based methods have not fully exploited all the features of the original low-resolution image, Shamsolmoali et al. [96] propose an effective model based on dilated dense network operations to accelerate deep networks for image SR, which supports exponential growth of the receptive field in parallel with increasing filter size.

3.1.2 CNN-based methods for real-time SR

CNN-based SR methods have demonstrated remarkable performance in the quality of reconstructed HR images compared with previous non-deep-learning models. However, their high computation cost and long computing time prevent further practical usage, especially on phones or wearable devices that demand a small computing burden and real-time performance. There have thus been many attempts to accelerate networks to real-time performance, in order to enlarge the possible application scenarios of CNN-based SR methods.

One of the most famous, successful and inspiring attempts is devised by Shi et al. [99], who find that utilizing a single filter, usually bicubic interpolation, to up-scale input LR images before reconstruction is sub-optimal and time-consuming. They thus avoid such a pre-interpolation operation by utilizing an end-to-end, unified CNN structure named the Efficient Sub-pixel CNN (ESPCN) for SR tasks, which directly learns an up-scaling filter, i.e., a sub-pixel convolution layer, and integrates it into the CNN structure. We show the structure of ESPCN in Fig. 5, where we can notice that the feature maps used to fill in image detail are extracted in LR space, rather than in HR space as in most SR methods. Afterwards, the feature maps extracted by the different layers are fed into the sub-pixel convolution layer for further processing.

As far as we know, ESPCN is the first CNN-based SR method with real-time performance; it is reported to perform real-time SR on 1080p videos using a K2 GPU device. Besides, the authors report that the reconstruction results achieved by ESPCN are better than SRCNN by +0.15 dB on the Set14 dataset images.

Fig. 5 Network structure of Shi et al. [99] for real-time SR, where they propose sub-pixel convolutional layers to perform up-sampling operations
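The sub-pixel layer at the heart of ESPCN maps conveniently onto PyTorch's nn.PixelShuffle. The sketch below follows the spirit of Shi et al. [99]; the filter sizes and channel counts are close to, but not guaranteed to match, the published configuration.

```python
import torch
import torch.nn as nn

class SubPixelSR(nn.Module):
    def __init__(self, scale=3, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, 5, padding=2), nn.Tanh(),
            nn.Conv2d(64, 32, 3, padding=1), nn.Tanh(),
            # produce scale^2 feature channels per output channel ...
            nn.Conv2d(32, channels * scale ** 2, 3, padding=1),
        )
        # ... then rearrange them into an (H*scale, W*scale) image
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, lr):
        # all convolutions run in LR space; up-scaling happens last
        return self.shuffle(self.body(lr))

sr = SubPixelSR(scale=3)(torch.randn(1, 1, 24, 24))
print(sr.shape)  # torch.Size([1, 1, 72, 72])
```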
At nearly the same time as ESPCN [99], Dong et al. [27] successfully accelerate SRCNN by constructing a compact hourglass-shaped CNN structure named FSRCNN. As shown in Fig. 6, their structural modifications for accelerating SRCNN lie in three aspects: (1) they replace the bicubic interpolation operation of SRCNN with a de-convolution layer located at the end of the network, thus avoiding bicubic pre-interpolation with its visible reconstruction artifacts and unnecessary computational cost; (2) they train four convolution layers in a joint optimization manner to complete three tasks during feature extraction, i.e., shrinking, mapping, and expanding, where placing the mapping layer after the shrinking layer greatly reduces feature dimensions, leading to a smaller computation cost; (3) they utilize smaller filter sizes and more mapping layers, in order to achieve desirable reconstruction results and a lower computing burden at the same time.

Fig. 6 Network structure of FSRCNN [27], which successfully accelerates SRCNN by several modifications

FSRCNN is reported to achieve real-time performance (> 24 fps) on the test images of all benchmark datasets, which is almost a 40-fold improvement over SRCNN in computing speed. Compared with ESPCN [99], which achieves real-time performance on a GPU, FSRCNN [27] can process LR images in real time on a CPU-based platform, which largely expands its possible application scenarios.

Following the idea of replacing time-consuming up-sampling with a dedicated part of the neural network, as in ESPCN [99] and FSRCNN [27], Yamanaka et al. [128] integrate the network-in-network (NIN) structure [66], i.e., parallelized 1 × 1 CNNs, into the whole network as a post-processing step for efficient up-sampling. Specifically, they first employ deep CNN layers and skip connection layers to extract feature maps that gather information from both local and global areas. Afterwards, they utilize NIN layers to perform the up-sampling operation for the reconstruction of HR images. The authors report a computation cost 10 times lower than that of a typical deep residual network for SR tasks, thus ensuring real-time performance. However, the simple structure design results in relatively worse reconstruction results compared with those achieved by FSRCNN.

To achieve real-time performance, ESPCN [99], FSRCNN [27] and Yamanaka et al. [128] replace the bicubic up-sampling operation with sub-pixel convolution layers, de-convolution layers, and NIN layers, respectively.
However, Lai et al. [56] argue that a sub-pixel convolution, de-convolution or NIN layer amounts to a small network, which cannot be guaranteed to describe complicated mappings with its limited representation capacity. They thus propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct sub-band residuals for visually desirable HR images; we show its structure in Fig. 7. At each pyramid level, the feature extraction branch first predicts the missing high-frequency residuals of HR images based on coarse-resolution feature map inputs. Afterwards, the image reconstruction branch adopts transposed convolutions to perform the up-sampling operation, thus generating finer feature maps as input for the next level. Furthermore, they design recursive layers as a parameter-sharing scheme across and within pyramid levels, which greatly reduces the number of parameters. With the careful design of its pyramid structure, LapSRN can adaptively build models with different up-sampling scales, thus reducing computational complexity and achieving real-time computing speed on public testing datasets.

Fig. 7 Network structure of Lai et al. [56], where a careful design with a pyramid structure and two working branches is adopted to progressively reconstruct the sub-band residuals of HR images

Besides the pyramid structure, another solution for improving small networks that lack representation ability is constructing neural networks with skip connections, which not only help deepen neural networks by preventing gradient vanishing, but also relieve the computation burden of unnecessary computing steps. Tong et al. [112] thus introduce dense skip connections in a very deep neural network for SR tasks; we show its structure in Fig. 8. In each dense block, we can notice that the input low-level features and the generated high-level features are combined in a reasonable way to boost reconstruction performance. To properly fuse low- and high-level features, the dense block structure propagates the feature maps generated by each layer into all subsequent layers, and the dense skip connections allow for a deeper structure. Since a deep network generally leads to a large computation cost, they further integrate de-convolution layers to reduce the number of parameters and boost the speed of the reconstruction process. Deploying the algorithm on a platform with a Titan X GPU, their proposed method achieves an average running time of 36.8 ms to reconstruct a single image from the B100 dataset, thus guaranteeing real-time SR.

Fig. 8 Network structure of Tong et al. [112], which represents that all levels of features are combined via skip connections as input to reconstruct HR images
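A dense block of the kind shown in Fig. 8 can be sketched in a few lines; the growth rate and depth below are illustrative choices, not the configuration of Tong et al. [112].

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch=16, growth=16, layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch + i * growth, growth, 3, padding=1)
            for i in range(layers)
        )

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            # dense skip connection: concatenate all previous feature maps
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return torch.cat(feats, dim=1)  # low- and high-level features fused

out = DenseBlock()(torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 80, 32, 32])
```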
Unlike the former CNN-based methods for real-time SR, which utilize up-sampling operations and residual learning, Johnson et al. [50] consider SR as an image transformation problem between input LR and output HR images. They thus model SR as a global optimization problem under a given objective function, formed as the sum of a per-pixel loss between the HR and ground-truth images and a perceptual loss based on high-level features extracted from pre-trained neural networks. Such an optimization problem can be solved following Gatys et al. [32] in real time, achieving similar qualitative results but with a three times faster running speed than SRCNN and other comparative methods. Essentially, their usage of perceptual loss functions to train feed-forward networks for SR and other image transformation tasks is novel and has offered inspiration to many similar works.

Inspired by A+ [110] and ARN [1], which build mappings between the input LR space and the output HR space by learning proper local linear functions, Li et al. [64] construct a convolutional anchored regression network (CARN) to learn a representative mapping function for fast and accurate SR. Different from A+ and ARN, the regression blocks inside CARN are built on feature maps automatically extracted by convolutional filters, rather than the limited, hand-crafted features used by A+ and ARN. Besides, CARN transforms all the key steps of the SR operation, such as feature extraction, anchor detection, and regressor construction, into convolution operations with different parameters, so that all steps can be jointly optimized in an end-to-end manner. Such an end-to-end design relieves the burden of complicated step-wise optimization and thus decreases the running time without loss of accuracy. CARN is reported to achieve a computation cost 10 times lower than SRCNN, thus reaching real-time performance on most platforms.

Due to their heavy computation requirements, deep learning methods cannot easily be applied to real-world applications. To address this issue, Ahn et al. [2] propose an accurate and lightweight deep network for image super-resolution. Specifically, they design an architecture that implements cascading connections starting from each intermediary layer to the others upon a residual network; we show its special structure design in Fig. 9. Such connections are made at both the local and global levels, which allows for the efficient flow of information and gradients.

Fig. 9 Network structure of Ahn et al. [2], where a, b represent the plain ResNet and the CARN structure, respectively. In the CARN model, each residual block is changed to a cascading block

To explore the feature correlations of intermediate layers rather than focusing on wider or deeper architecture design, Dai et al. [22] seek to enhance the representational power of CNNs for more powerful feature expression and feature correlation learning. Specifically, they propose a second-order attention network (SAN) to adaptively rescale the channel-wise features by using second-order feature statistics for more discriminative representations. Furthermore, they present a non-locally enhanced residual group (NLRG) structure, which not only incorporates non-local operations to capture long-distance spatial contextual information, but also contains repeated local-source residual attention groups (LSRAG) to learn increasingly abstract feature representations. All these structural improvements are shown in Fig. 10.

Fig. 10 Framework of the second-order attention network (SAN) [22] and its sub-modules
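For orientation, the following is a minimal first-order channel-attention unit of the kind RCAN [142] popularized; SAN [22] replaces the global average pooling here with second-order (covariance) statistics. The reduction ratio is an illustrative choice.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # per-channel statistic
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # rescaling weights in (0, 1)
        )

    def forward(self, x):
        return x * self.gate(x)  # adaptively rescale channel-wise features

y = ChannelAttention()(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32])
```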
To solve the problems of a lack of realistic training data and the information loss of the model input, Xu et al. [124] propose a new pipeline that generates realistic training data by simulating the imaging process of digital cameras. To further remedy the information loss of the input, they develop a dual convolutional neural network to exploit the originally captured radiance information in raw images. They obtain favorable reconstruction results both quantitatively and qualitatively, and their proposed method is claimed to enable super-resolution for real captured images.

After this detailed description of a quantity of works on CNN-based methods for real-time SR, we can summarize their general structure. Specifically, in order to widen the receptive field, increasing the network depth is the route adopted by a quantity of methods: the network is built from convolutional layers with filter sizes larger than 1 × 1 or pooling layers that reduce the dimension of the intermediate representation. However, such a design may have major drawbacks: a convolutional layer introduces more parameters, and a pooling layer typically discards some pixel-wise information.

Regarding the first issue, each convolutional layer represents a new weight layer, so a deep structure with a quantity of convolutional layers brings the disadvantage of too many parameters. This problem can lead to over-fitting concerns, difficulty in achieving real-time performance, and a huge trained model to store. Therefore, a light CNN structure without additional, time-consuming designs is required and pursued by researchers, and this is the main trend of CNN-based methods for real-time SR.

Regarding image restoration problems such as super-resolution and denoising, image details are very important. Therefore, most deep learning approaches for such problems do not use pooling or sub-sampling layers, so that important image details are preserved during the SR process. For example, DRCN [53] repeatedly applies the same convolutional layer as many times as desired and does not apply pooling layers in its network architecture. The number of parameters does not increase as more recursions are performed, which offers a promising idea for network structure.

3.2 GAN and other deep learning-based super-resolution methods

Due to the unsupervised training property of GAN [90], GAN-based SR methods can use a large dataset of unlabeled images and work without any prior knowledge linking the input LR and HR images, which is essentially their main feature. Since GAN was originally designed to generate images, GAN-based SR methods can achieve superb performance in generating photo-realistic SR images. However, we find that no real-time GAN-based methods for SR have been proposed at the current time. In fact, the most common use of GAN is to regard its generator part as a CNN network that performs the low-to-high SR task.
Without special designs like those described in the last subsection, it is hard for such GAN-based methods to achieve real-time performance. In our view, researchers working on GAN are still focusing on solving several of the most important problems of the GAN structure, such as mode collapsing and the difficulty of training.

Last but not least, we introduce other deep learning methods for SR, including deep auto-encoders and deep reinforcement learning.

Ledig et al. [60] find that the most commonly used measurements to evaluate SR performance, such as MSE and PSNR, are designed with a pixel-wise property. Since human perception often relies on evaluation in a global sense, some SR methods with high PSNR and MSE values can produce poor visual effects. Inspired by this significant property of GAN, they thus propose the super-resolution generative adversarial network (SRGAN) with a novel measurement named perceptual similarity. We show the structure of SRGAN in Fig. 11. Specifically, perceptual similarity is measured by a perceptual loss function, which takes the form of a sum of an adversarial loss and a content loss. It is noted that the adversarial loss ensures SRGAN can generate high-quality, photo-realistic SR images with the help of a discriminator network, which is trained to classify whether the generated images are super-resolved images or original natural scene images.

We show sample images from SRGAN and other comparison samples in Fig. 12. From Fig. 12, we can notice that SRGAN, designed with the perceptual loss function, can recover photo-realistic textures from low-resolution images sampled from public benchmarks. Meanwhile, SR methods designed with MSE-based measurements generate visually poor HR images, even while achieving high PSNR values. Therefore, we can conclude that SRGAN focuses on global contextual information to generate HR images, thus achieving lower pixel-wise PSNR values than comparative methods, as shown in Fig. 12. However, human perception concerns visual effect in a global sense; we thus observe photo-realistic textures in the samples generated by SRGAN.

After this first successful trial in SRGAN, Johnson et al. [50] further modify SRGAN in the design of the loss function, which is improved into a sum of a pixel-wise loss, a perceptual loss, and a texture matching loss. In the context of combining GAN and CNN, Sajjadi et al. [91] propose EnhanceNet for automated texture synthesis, which utilizes feed-forward fully convolutional neural networks in an adversarial training setting. Their proposed network successfully creates realistic textures, rather than optimizing for a pixel-accurate reproduction of ground-truth images during training.

By leveraging an extension of the basic GAN framework [149], Yuan et al. [133] propose an unsupervised SR algorithm with a Cycle-in-Cycle network structure, which is inspired by the recent successful image-to-image translation applications. They further expand the algorithm into a modified version, i.e., MCinCGAN [143], which utilizes a multiple Cycle-in-Cycle network structure to deal with the more general case of SR tasks, using multiple generative adversarial networks (GAN) as the basic components.
More precisely, their proposed first network cycle aims at mapping the noisy and blurry LR input to a noise-free LR space. On the basis of the first network cycle, a new cycle with a well-trained ×2 network model is then introduced to super-resolve the intermediate output of the former cycle. In this way, the number of total cycles depends on the different up-sampling factors (×2, ×4, ×8), as presented in Fig. 13. Finally, users can obtain the desired HR images at different scale factors by training all modules in an end-to-end manner.

Fig. 12 Comparisons of the SR quality, PSNR, and SSIM of different SR methods. It is noted that the four images refer to HR images generated by bicubic interpolation, a deep residual network optimized by the MSE measurement, a deep residual generative adversarial network designed with a human perception loss [60], and the original HR image, respectively

Fig. 13 The framework of the proposed MCinCGAN [143], where G1, G2, G3, G4, and G5 are generators and D1, D2, D3, and D4 are discriminators. It is noted that a–c show the frameworks for ×2, ×4 and ×8, respectively
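To make the SRGAN-style objective discussed above concrete, the sketch below combines a VGG-space content loss with an adversarial term, following the sum form of [60]. The generator G and discriminator D are assumed to be defined elsewhere; the VGG layer cut-off and the 1e-3 adversarial weight are illustrative assumptions, and the torchvision call downloads pre-trained VGG19 weights.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen VGG19 feature extractor for the content loss. `pretrained=True`
# is the older torchvision API (newer versions use `weights=`); the
# slice [:18] keeps the early convolutional blocks.
vgg_feat = vgg19(pretrained=True).features[:18].eval()
for p in vgg_feat.parameters():
    p.requires_grad_(False)

def generator_loss(G, D, lr, hr, adv_weight=1e-3):
    """Content loss in VGG feature space plus a weighted adversarial
    term, in the spirit of the SRGAN objective [60]."""
    sr = G(lr)
    # content loss: distance between VGG feature maps of SR and HR images
    content = F.mse_loss(vgg_feat(sr), vgg_feat(hr))
    # adversarial loss: push the discriminator to rate SR outputs as natural
    logits = D(sr)
    adversarial = F.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))
    return content + adv_weight * adversarial
```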
Bulat et al. [10] propose a two-stage process built around the idea of first using a GAN to learn how to perform image degradation and then learning image super-resolution with the trained GAN; we show its network structure in Fig. 14. Specifically, they train a High-to-Low GAN to learn degradation and down-sampling operations on HR images during the first stage. Once the training of the High-to-Low GAN is finished, they utilize pairs of low- and high-resolution images computed by the High-to-Low GAN as training samples for a new Low-to-High GAN. After training on enough varied training pairs generated by the High-to-Low GAN, the resulting Low-to-High GAN can output desirable HR images. The most interesting point of Bulat et al. [10] lies in the fact that their proposed method only requires unpaired image data for training, so that the tedious work of pairing low- and high-resolution images can be avoided. By applying this two-stage process, the proposed unsupervised model effectively increases the quality of super-resolving real-world LR images and obtains a large improvement over previous state-of-the-art works. Although Bulat et al. [10] can simulate more complex degradation, there is no guarantee that such simulated degradation can approximate the authentic degradation in practical scenarios, which is usually very complicated. Therefore, Zhao et al. [146] improve on it by exploring the relationship between reconstruction and degradation with a bi-cycle structure, which jointly stabilizes the training of the SR reconstruction and degradation networks. Most importantly, their degradation model is trained in an unsupervised way, i.e., without using paired images, since there are no pairs of LR-HR images in practice.

Fig. 14 Architecture design of Bulat et al. [10]. It is noted that LR and HR images are not paired in the training dataset

Bulat and Tzimiropoulos [9] propose Super-FAN to complete two tasks simultaneously, i.e., improving the resolution of face images and detecting facial landmarks inside the improved face images. Essentially, Super-FAN constructs two sub-networks to first optimize a loss function for constructing a convincing heat map and then perform face alignment through heat map regression. By jointly training both sub-networks, they report desirable HR and detection results based not only on input LR images, but also on real-world images affected by varying factors.

To further enhance the visual quality, Wang et al. [116] propose the enhanced Super-Resolution Generative Adversarial Network (ESRGAN) to generate realistic textures, which introduces the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Furthermore, they successfully modify the network structure with a relativistic GAN and improve the perceptual loss by using features before activation. Benefiting from these improvements, ESRGAN achieves consistently better visual quality, with more realistic and natural textures, and won first place in the PIRM2018-SR Challenge.

Deep Reinforcement Learning (DRL) has also been introduced recently. Following the idea of reinforcement learning, DRL generally designs a learning policy to guide spatial attention.
In other words, such methods utilize the reward scheme of reinforcement learning to select the regions to be up-scaled, which results in an adaptive optimization for SR based on the characteristics of the input images. For instance, Cao et al. [12] propose a novel attention-aware face hallucination framework, which follows the principles of DRL to first sequentially discover the patches that require up-scaling. Afterwards, they follow the resulting optimization sequence to perform facial patch enhancement, exploiting the global characteristics of the input facial image. The work of Cao et al. [12] is quite new in concept and provides a novel idea for how to adaptively process high-resolution images. However, the computation of [12] is much larger than that of SRCNN (reported to be four times larger), due to the high computation cost of its deep structure.

4 Comparisons

4.1 Datasets and measurements

Many image datasets are commonly adopted to prove and compare the effectiveness of different super-resolution methods in the SR community. We list most of them with citations and descriptions in Table 1, where we can notice that their original usages differ, covering segmentation, classification, etc. There are therefore no fixed construction rules for organizing a dataset specifically for the SR topic. Among all these datasets, four, i.e., SET5, SET14, B100, and URBAN100, are the most commonly used for comparison in the SR community. The following performance comparison is performed on these four datasets.

There are two standard quality measures, i.e., the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [37], which have mostly been used for measuring the quality of super-resolution methods. Since the definition of PSNR relies on the MSE, we define these measures as follows:

$$\mathrm{MSE} = \frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n}\bigl(I(i,j) - P(i,j)\bigr)^2 \tag{5}$$

$$\mathrm{PSNR} = 10 \times \log_{10}\!\left(\frac{255^2}{\mathrm{MSE}}\right) \tag{6}$$
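For reference, Eqs. (5)-(6) transcribe directly into NumPy. The SSIM variant below is a deliberately simplified single-window (global statistics) form of the standard definition [37], for illustration only; the standard SSIM is computed over local windows.

```python
import numpy as np

def psnr(I, P):
    mse = np.mean((I.astype(np.float64) - P.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)  # Eq. (6), 8-bit images

def ssim_global(I, P, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    I, P = I.astype(np.float64), P.astype(np.float64)
    mu_i, mu_p = I.mean(), P.mean()
    var_i, var_p = I.var(), P.var()
    cov = ((I - mu_i) * (P - mu_p)).mean()
    return ((2 * mu_i * mu_p + c1) * (2 * cov + c2) /
            ((mu_i ** 2 + mu_p ** 2 + c1) * (var_i + var_p + c2)))

hr = np.random.randint(0, 256, (64, 64))
sr = np.clip(hr + np.random.randint(-5, 6, hr.shape), 0, 255)
print(psnr(hr, sr), ssim_global(hr, sr))
```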
Table 1 Image datasets for super-resolution and their corresponding descriptions

1. SET5 [7]: Contains five images named baby, bird, butterfly, head, and woman. Scale factors used include 2×, 3× and 4×.
2. SET14 [137]: Includes 14 commonly used images for super-resolution evaluation. Compared with Set5, more diversity is introduced by SET14, including bridge, comic, poster and so on.
3. B100 [68]: Full name is the Berkeley Segmentation Dataset and Benchmark, originally designed for natural scene image segmentation. Due to its high quality in covering varied attributes of natural scene images, it has been widely adopted to evaluate the performance of SR approaches on natural scene images. It is noted that BSD300 uses 100 testing images (named here 'B100') and 200 training images. The latest version, named BSD500, includes 200 additional fresh test images.
4. URBAN100 [46]: Has 100 HR images of diverse real-world scenes and is widely used to examine methods against its self-similarities.
5. ImageNet [88]: A large visual database for visual object recognition and classification research. To date, more than 14 million images in over 20,000 categories have been hand-annotated. Since 2010, a famous challenge named the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been run on ImageNet to compete on object or scene classification and detection.
6. Super texture dataset [20]: The main concern of this dataset is SR operation on texture. It provides 136 texture images in total.
7. Timofte dataset [110]: One of the most widely used datasets by SISR researchers, consisting of 91 training images plus testing images from BSD500, Set5, Set14 and the super texture dataset.
8. DIV2K [108]: The DIVerse 2K resolution image dataset served as a benchmark for the NTIRE 2017 Challenge. It is designed to evaluate SR performance on high-resolution images. It consists of 800 training, 100 validation, and 100 test images.
9. MNIST [59]: The MNIST database contains 60,000 training and 10,000 testing images. It is a large database of handwritten digits used to evaluate the performance of SR on hand-written images.
10. Manga109 [31]: Consists of numerous comic sketches from 109 Japanese comic books and is often adopted as a basic dataset to evaluate the performance of SR on drawn images.
11. LIVE [98]: All images were originally captured by another project for the purpose of generic shape matching and recognition. It now serves as a dataset of images with visual distortion effects to examine the extreme-case performance of SR methods.
12. L20 [111]: Focuses on large images. Common sizes in L20 range from 3M to 29M pixels, while most of the other datasets contain images below 0.5M pixels.
Table 2 Quantitative evaluation of a quantity of SR algorithms, where the average PSNR/SSIM at scale factors ×2, ×3 and ×4 is listed; information not provided by the original authors is marked with [–]
Scale Method Set5 Set14 B100 Urban100 Manga109
×2 Bicubic 33.66/0.9299 30.24/0.8688 29.56/0.8431 26.88/0.8403 30.80/0.9339
A+ [110] 36.54/0.9544 32.28/0.9056 31.21/0.8863 29.20/0.8938 −/−
SRCNN [26] 36.66/0.9542 32.45/0.9067 31.36/0.8879 29.50/0.8946 35.60/0.9663
VDSR [52] 37.53/0.9590 33.05/0.9130 31.90/0.8960 30.77/0.9140 37.22/0.9750
EDSR [65] 38.11/0.9602 33.92/0.9195 32.32/0.9013 32.93/0.9351 39.10/0.9773
LapSRN [58] 37.52/0.9591 33.08/0.9130 31.08/0.8950 30.41/0.9101 37.27/0.9740
GuideAE [17] 37.52/0.9591 33.08/0.9130 31.08/0.8950 30.41/0.9101 37.27/0.9740
SRGAN [60] 37.22/0.9263 32.14/0.8862 31.89/0.8761 31.02/0.8955 −/−
ESRGAN [116] 37.81/0.9531 33.62/0.9152 31.99/0.8873 32.01/0.9131 −/−
SRMDNF [141] 37.79/0.9601 33.32/0.9154 32.05/0.8984 31.33/0.9204 38.07/0.976
IDN [147] 37.83/0.9600 33.30/0.9148 32.08/0.8985 31.27/0.9196 38.02/0.9749
CSFM [45] 38.26/0.9615 34.07/0.9213 32.37/0.9021 33.12/0.9366 39.40/0.9785
CARN [2] 37.76/0.9590 33.52/0.9166 32.09/0.8978 31.51/0.9312 −/−
RDN [144] 38.24/0.9614 34.01/0.9212 32.34/0.9017 32.89/0.9353 39.18/0.9780
SAN [22] 38.35/0.9619 34.44/0.9244 32.50/0.9038 33.73/0.9416 39.72/0.9797
×3 Bicubic 30.39/0.8682 27.55/0.7742 27.21/0.7385 24.46/0.7349 26.95/0.8556
A+ [110] 32.58/0.9088 29.13/0.8188 28.29/0.7835 26.03/0.7973 −/−
SRCNN [26] 32.75/0.9090 29.30/0.8215 28.41/0.7863 26.24/0.7989 30.48/0.9117
VDSR [52] 33.67/0.9210 29.78/0.8320 28.83/0.7990 27.14/0.8290 32.01/0.9340
EDSR [65] 34.65/0.9280 30.52/0.8462 29.25/0.8093 28.80/0.8653 34.17/0.9476
LapSRN [58] 33.82/0.9227 29.87/0.8320 28.82/0.7980 27.07/0.8280 32.21/0.9350
GuideAE [17] 33.82/0.9227 29.87/0.8320 28.82/0.7980 27.07/0.8280 32.21/0.9350
SRMDNF [141] 34.12/0.9254 30.04/0.8371 28.97/0.8025 27.57/0.8398 33.00/0.9403
IDN [147] 34.11/0.9253 29.99/0.8354 28.95/0.8013 27.42/0.8359 32.69/0.9378
CSFM [45] 34.76/0.9301 30.63/0.8477 29.30/0.8105 28.98/0.8681 34.52/0.9502
CARN [2] 34.29/0.9255 30.29/0.8407 29.06/0.8034 27.38/0.8404 −/−
RDN [144] 34.71/0.9296 30.57/0.8468 29.26/0.8093 28.80/0.8653 34.13/0.9484
SAN [22] 34.89/0.9306 30.77/0.8498 29.38/0.8121 29.29/0.8730 34.74/0.9512
×4 Bicubic 28.42/0.8104 26.00/0.7027 25.96/0.6675 23.14/0.6577 24.89/0.7866
A+ [110] 30.28/0.8603 27.32/0.7491 26.82/0.7087 24.32/0.7183 −/−
SRCNN [26] 30.48/0.8628 27.50/0.7513 26.90/0.7101 24.52/0.7221 27.58/0.8555
VDSR [52] 31.35/0.8830 28.02/0.7680 27.29/0.0726 25.18/0.7540 28.83/0.8870
EDSR [65] 32.46/0.8968 28.80/0.7876 27.71/0.7420 26.64/0.8033 31.02/0.9148
LapSRN [58] 31.54/0.8850 28.19/0.7720 27.32/0.7270 25.21/0.7560 29.09/0.8900
GuideAE [17] 31.54/0.8850 28.19/0.7720 27.32/0.7270 25.21/0.7560 29.09/0.8900
SRMDNF [141] 31.96/0.8925 28.35/0.7772 27.49/0.7337 25.68/0.7731 30.09/0.9024
IDN [147] 31.82/0.8903 28.25/0.7730 27.41/0.7297 25.41/0.7632 29.41/0.8936
CSFM [45] 32.61/0.9000 28.87/0.7886 27.76/0.7432 26.78/0.8065 31.32/0.9183
RDN [144] 32.47/0.8990 28.81/0.7871 27.72/0.7419 26.61/0.8028 31.00/0.9151
CARN [2] 32.13/0.8937 28.60/0.7806 27.58/0.7349 26.07/0.7837 −/−
SRGAN [60] 29.40/0.8472 26.64/0.7101 25.16/0.6682 25.11/0.7253 −/−
ESRGAN [116] 31.40/0.8713 27.98/0.7624 27.21/0.7123 31.99/0.8874 −/−
SAN [22] 32.70/0.9013 29.05/0.7921 27.86/0.7457 27.23/0.8169 31.66/0.9222
Bold text indicates the best performance among all comparative methods
Table 3 Comparisons of the FPS (frames per second) on 5 benchmark datasets with scale factors ×2, ×4, and ×8
Scale Method Set5 Set14 B100 Urban100 Manga109
×2 A+ [110] 1.12 0.52 0.74 0.15 0.10
SRCNN [26] 24.70 22.92 39.50 9.03 6.53
FSRCNN [27] 31.04 53.86 98.20 47.23 34.48
RFL [94] 0.65 0.45 0.52 0.13 0.15
SCN [119] 1.19 0.85 1.19 0.24 0.17
VDSR [52] 11.01 6.46 10.00 2.12 1.71
DRCN [53] 0.70 0.37 0.59 0.10 0.08
LapSRN [56] 30.20 40.00 97.36 16.81 85.32
ProGAN [117] 25.88 34.80 71.59 14.97 73.48
D2GAN [76] 46.21 61.72 141.34 24.86 132.12
×4 A+ [110] 2.86 1.62 2.43 0.49 0.41
SRCNN [26] 21.74 22.27 40.13 9.95 7.13
FSRCNN [27] 31.61 56.58 101.54 53.95 55.23
RFL [94] 1.97 1.21 1.64 0.42 0.34
SCN [119] 1.38 0.87 1.19 0.31 0.25
VDSR [52] 10.71 6.59 9.91 2.15 1.76
DRCN [53] 0.80 0.37 0.59 0.10 0.08
LapSRN [56] 25.49 25.46 54.35 12.40 47.63
ProGAN [117] 20.67 20.18 43.05 9.92 37.29
D2GAN [76] 38.33 38.28 87.95 16.08 75.71
×8 A+ [110] 5.79 2.84 4.31 0.80 0.64
SRCNN [26] 20.92 17.69 40.13 9.81 7.17
FSRCNN [27] 34.10 63.28 104.46 71.67 71.95
RFL [94] 2.54 1.61 2.25 0.47 0.33
SCN [119] 0.79 0.53 0.68 0.21 0.19
VDSR [52] 10.58 6.50 10.13 2.15 1.77
LapSRN [56] 24.02 23.40 50.44 10.54 33.09
ProGAN [117] 17.97 17.48 38.76 7.66 25.11
D2GAN [76] 23.70 23.08 50.97 10.28 34.41
13
Journal of Real-Time Image Processing
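As a rough sketch of how per-dataset FPS figures such as those in Table 3 can be obtained (the exact protocol of the compared methods is not specified here; `model` and `lr_images` are placeholders), one averages wall-clock inference time over a dataset and reports its reciprocal:

    import time

    def measure_fps(model, lr_images, warmup=5):
        # Warm-up runs are excluded so one-time setup costs do not
        # distort the average; on a GPU, synchronize the device
        # before reading the clock.
        for img in lr_images[:warmup]:
            model(img)
        start = time.perf_counter()
        for img in lr_images:
            model(img)
        elapsed = time.perf_counter() - start
        return len(lr_images) / elapsed  # frames per second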
original HR images to obtain LR images, and then using the LR images as input to pursue HR results. However, such a collection process is not exactly what happens in real application scenarios, since the patterns that induce quality loss can vary widely, including image transmission, different compression algorithms, and so on. Simply using down-sampling to generate LR images therefore greatly harms the flexibility of CNN-based SR methods across different scenarios.
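A minimal sketch of this commonly criticized pipeline, assuming Pillow and a bicubic kernel (the exact degradation model varies across papers), is:

    from PIL import Image

    def make_lr(hr_path, scale=4):
        # Synthesize an LR training input by bicubic down-sampling an
        # HR image; real degradations (transmission loss, compression
        # artifacts, ...) are not modeled by this step.
        hr = Image.open(hr_path)
        w, h = hr.size
        return hr.resize((w // scale, h // scale), resample=Image.BICUBIC)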
In contrast, GAN-based methods require less image data for SR tasks, since they adopt a largely unsupervised training process on real images; consequently, they require neither labels nor prior conditions relating LR and HR images. The major difficulty in constructing GAN-based SR methods lies in the design of the architecture and the loss function. It is noted that the most commonly used measurement, i.e., MSE, is adopted in favor of maximizing PSNR. However, the hallucinated details of generated HR images are often accompanied by unpleasant artifacts, even when high PSNR values are achieved. In other words, traditional measurements reflect human perception only poorly, whereas GANs are able to produce better results by optimizing an integrated perceptual assessment rather than a single, simple measurement. To further improve visual quality, it is thus necessary to improve the key components of GAN models for SR, i.e., the network architecture and the loss function, which are being progressively explored and developed by researchers. All of this implies that GANs are still developing and remain promising for achieving better reconstruction results than CNN-based methods. On the other hand, GANs are hard to train due to the mode-collapse problem, which leads to early stopping by concentrating on only a few modes instead of the whole data space. In contrast, CNN-based methods have been refined over several generations and already deliver visually desirable reconstruction results. With back-propagation training and well-developed CNN construction software, it is much easier to implement and train a CNN-based SR method than a GAN model.
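To make this trade-off concrete, the following sketch (PyTorch assumed; the `discriminator`, the generated output `sr`, and the weighting are illustrative, not the formulation of any specific paper) contrasts the PSNR-oriented MSE term with an added adversarial term of the kind GAN-based SR methods use:

    import torch
    import torch.nn.functional as F

    def sr_generator_loss(sr, hr, discriminator, adv_weight=1e-3):
        # Pixel-wise MSE favors high PSNR but tends to over-smooth.
        mse = F.mse_loss(sr, hr)
        # Adversarial term: push the discriminator's logits on the
        # generated image toward the "real" label.
        logits_fake = discriminator(sr)
        adv = F.binary_cross_entropy_with_logits(
            logits_fake, torch.ones_like(logits_fake))
        return mse + adv_weight * adv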
Above all, we can conclude that GAN-based SR methods have high potential to achieve better reconstruction results than CNN-based SR methods, owing to their ability to describe a variety of SR patterns with less labeled data. However, GAN models need to be further developed before they can be trained and implemented as easily.

5 Summary

Super-resolution is a hot and important research topic in the computer vision and image-processing community. By applying SR technologies, users can not only improve the resolution and visual appearance of input images, but also improve the accuracy and effectiveness of vision-based machine learning systems, which generally expect high-resolution images as input. Inspired by the significant performance of deep learning methods, this paper focuses on reviewing current deep learning methods for real-time image super-resolution. As the first comprehensive survey on this topic, it has analyzed recent approaches, classified them according to multiple criteria, and illustrated the performance of the most representative approaches.

In the past decade, research in this field has progressed as improved methods have emerged. However, the small number of deep learning-based SR methods achieving real-time performance shows that ample room remains for future research. Essentially, real-time SR methods remain scarce because deep learning methods can hardly achieve fast processing speeds without substantial computation resources. Nevertheless, applying deep learning methods to SR is a reasonable way to achieve high performance on the current large datasets, known as “big data”. We thus identify the main challenge of deep learning for real-time SR: how to combine deep learning-based SR methods with acceleration strategies to handle the “big data” situation. Furthermore, low-resolution images captured under extreme imaging conditions require sufficiently robust algorithms to deal with such real-life complexities.

It is thus essential to develop novel methods that are not only effective and efficient for “big data” processing, but also robust enough to handle extreme imaging conditions. Although abundant optimization methods have been proposed for real-time SR with deep learning structures, as described in Sect. 3, high efficiency, effectiveness, and robustness for SR in specific application areas are still demanded by industry and require further improvement. Besides designing alternative and highly effective neural network structures, cloud computing is another simple and efficient way to improve the effectiveness of SR. By providing sufficient computing and storage services over the Internet [82, 123, 125, 126] for local SR tasks, a powerful computing platform with easy access and high scalability could be utilized locally to help users accomplish their SR goals. There also exist other novel methods to improve SR from different aspects; for example, deep compression methods could help prune neural networks for SR, resulting in lower storage and computation consumption.

Based on all these analyses, we believe that this review is useful for developers who are willing to improve the performance of their SR solutions in both running time and accuracy. Our review will serve as a guide and dictionary for further research activities in this area, especially in the deployment of real-time super-resolution with deep learning methods.

Acknowledgments This work was supported by the National Key R&D Program of China under Grant 2018YFC0407901, the Natural Science Foundation of China under Grant 61702160, the Natural Science Foundation of Jiangsu Province under Grant BK20170892, and the Open Project of the National Key Lab for Novel Software Technology in NJU under Grant K-FKT2017B05.
References

1. Agustsson, E., Timofte, R., Van Gool, L.: Anchored regression networks applied to age estimation and super resolution. In: Proceedings of International Conference on Computer Vision, pp. 1652–1661 (2017)
2. Ahn, N., Kang, B., Sohn, K.: Fast, accurate, and lightweight super-resolution with cascading residual network. In: Proceedings of European Conference on Computer Vision, pp. 256–272 (2018)
3. Akae, N., Makihara, Y., Yagi, Y.: Gait recognition using periodic temporal super resolution for low frame-rate videos. In: Proceedings of International Joint Conference on Biometrics, pp. 1–7 (2011)
4. Alonso-Fernandez, F., Farrugia, R.A., Bigun, J., Fierrez, J., Gonzalez-Sosa, E.: A survey of super-resolution in iris biometrics with evaluation of dictionary-learning. IEEE Access (2018)
5. Aly, H.A., Dubois, E.: Image up-sampling using total-variation regularization with a new observation model. IEEE Trans. Image Process. 14(10), 1647–1659 (2005)
6. Belekos, S.P., Galatsanos, N.P., Katsaggelos, A.K.: Maximum a posteriori video super-resolution using a new multichannel image prior. IEEE Trans. Image Process. 19(6), 1451–1464 (2010)
7. Bevilacqua, M., Roumy, A., Guillemot, C., Alberi-Morel, M.L.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: Proceedings of British Machine Vision Conference (2012)
8. Bose, N.K., Ahuja, N.A.: Superresolution and noise filtering using moving least squares. IEEE Trans. Image Process. 15(8), 2239–2248 (2006)
9. Bulat, A., Tzimiropoulos, G.: Super-fan: Integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with gans. In: Proceedings of Computer Vision and Pattern Recognition, pp. 109–117 (2018)
10. Bulat, A., Yang, J., Tzimiropoulos, G.: To learn image super-resolution, use a gan to learn how to do image degradation first. In: Proceedings of the European Conference on Computer Vision, pp. 185–200 (2018)
11. Caballero, J., Ledig, C., Aitken, A., Acosta, A., Totz, J., Wang, Z., Shi, W.: Real-time video super-resolution with spatio-temporal networks and motion compensation. In: Proceedings of Computer Vision and Pattern Recognition, pp. 2848–2857 (2017)
12. Cao, Q., Lin, L., Shi, Y., Liang, X., Li, G.: Attention-aware face hallucination via deep reinforcement learning. CoRR arXiv:abs/1708.03132 (2017)
13. Capel, D., Zisserman, A.: Automated mosaicing with super-resolution zoom. In: Proceedings of Computer Vision and Pattern Recognition, pp. 885–891 (1998)
14. Chai, Y., Ren, J., Zhao, H., Li, Y., Ren, J., Murray, P.: Hierarchical and multi-featured fusion for effective gait recognition under variable scenarios. Pattern Anal. Appl. 19(4), 905–917 (2016)
15. Chang, H., Yeung, D.Y., Xiong, Y.: Super-resolution through neighbor embedding. Proc. Comput. Vis. Pattern Recognit. 1, 275–282 (2004)
16. Che, T., Li, Y., Jacob, A.P., Bengio, Y., Li, W.: Mode regularized generative adversarial networks. CoRR arXiv:abs/1612.02136 (2016)
17. Chen, R., Qu, Y., Li, C., Zeng, K., Xie, Y., Li, C.: Single-image super-resolution via joint statistical models-guided deep auto-encoder network. Neural Comput. Appl. pp. 1–11 (2019)
18. Chen, Y., Tai, Y., Liu, X., Shen, C., Yang, J.: Fsrnet: End-to-end learning face super-resolution with facial priors. In: Proceedings of Computer Vision and Pattern Recognition, pp. 2492–2501 (2018)
19. Cristóbal, G., Gil, E., Šroubek, F., Flusser, J., Miravet, C., Rodríguez, F.d.B.: Superresolution imaging: a survey of current techniques. In: Advanced Signal Processing Algorithms, Architectures, and Implementations XVIII, vol. 7074, p. 70740C (2008)
20. Dai, D., Timofte, R., Van Gool, L.: Jointly optimized regressors for image super-resolution. Comput. Graph. Forum 34(2), 95–104 (2015)
21. Dai, S., Han, M., Xu, W., Wu, Y., Gong, Y.: Soft edge smoothness prior for alpha channel super resolution. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
22. Dai, T., Cai, J., Zhang, Y., Xia, S.T., Zhang, L.: Second-order attention network for single image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11065–11074 (2019)
23. Demirel, H., Anbarjafari, G.: Image resolution enhancement by using discrete and stationary wavelet decomposition. IEEE Trans. Image Process. 20(5), 1458–1460 (2011)
24. Denton, E.L., Chintala, S., Fergus, R., et al.: Deep generative image models using a laplacian pyramid of adversarial networks. In: Proceedings of Advances in Neural Information Processing Systems, pp. 1486–1494 (2015)
25. Dey, N., Li, S., Bermond, K., Heintzmann, R., Curcio, C.A., Ach, T., Gerig, G.: Multi-modal image fusion for multispectral super-resolution in microscopy. In: Proceedings of Medical Imaging 2019: Image Processing, p. 109490D (2019)
26. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Proceedings of European Conference on Computer Vision, pp. 184–199 (2014)
27. Dong, C., Loy, C.C., Tang, X.: Accelerating the super-resolution convolutional neural network. In: Proceedings of European Conference on Computer Vision, pp. 391–407 (2016)
28. Dong, W., Zhang, L., Shi, G., Wu, X.: Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Trans. Image Process. 20(7), 1838–1857 (2011)
29. Duchon, C.E.: Lanczos filtering in one and two dimensions. J. Appl. Meteorol. 18(8), 1016–1022 (1979)
30. Fahmy, G.: Super-resolution construction of iris images from a visual low resolution face video. In: National Radio Science Conference, pp. 1–6 (2007)
31. Fujimoto, A., Ogawa, T., Yamamoto, K., Matsui, Y., Yamasaki, T., Aizawa, K.: Manga109 dataset and creation of metadata. In: Proceedings of International Workshop on Comics Analysis, Processing and Understanding, pp. 2–3 (2016)
32. Gatys, L.A., Ecker, A.S., Bethge, M.: A neural algorithm of artistic style. CoRR arXiv:abs/1508.06576 (2015)
33. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
34. Glasner, D., Bagon, S., Irani, M.: Super-resolution from a single image. In: Proceedings of IEEE International Conference on Computer Vision, pp. 349–356 (2009)
35. Gong, W., Qi, L., Xu, Y.: Privacy-aware multidimensional mobile service quality prediction and recommendation in distributed fog environment. Wireless Commun. Mobile Comput. 2018 (2018)
36. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.C., Bengio, Y.: Generative adversarial nets. In: Proceedings of Neural Information Processing Systems, pp. 2672–2680 (2014)
37. Greenspan, H.: Super-resolution in medical imaging. Comput. J. 52(1), 43–63 (2009)
38. Gu, S., Sang, N., Ma, F.: Fast image super resolution via local regression. In: Proceedings of International Conference on Pattern Recognition, pp. 3128–3131 (2012)
39. Ha, V.K., Ren, J., Xu, X., Zhao, S., Xie, G., Vargas, V.M.: Deep learning based single image super-resolution: A survey. In: International Conference on Brain Inspired Cognitive Systems, pp. 106–119 (2018)
40. Haris, M., Shakhnarovich, G., Ukita, N.: Deep back-projection networks for super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1664–1673 (2018)
41. Hayat, K.: Multimedia super-resolution via deep learning: a survey. Digit. Signal Process. (2018)
42. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
43. Hong, C., Yu, J., Tao, D., Wang, M.: Image-based three-dimensional human pose recovery by multiview locality-sensitive sparse retrieval. IEEE Trans. Ind. Electr. 62(6), 3742–3751 (2015)
44. Hong, C., Yu, J., Wan, J., Tao, D., Wang, M.: Multimodal deep autoencoder for human pose recovery. IEEE Trans. Image Process. 24(12), 5659–5670 (2015)
45. Hu, Y., Li, J., Huang, Y., Gao, X.: Channel-wise and spatial feature modulation network for single image super-resolution. CoRR arXiv:abs/1809.11130 (2018)
46. Huang, J.B., Singh, A., Ahuja, N.: Single image super-resolution from transformed self-exemplars. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5197–5206 (2015)
47. Huang, Y., Shao, L., Frangi, A.F.: Simultaneous super-resolution and cross-modality synthesis of 3d medical images using weakly-supervised joint convolutional sparse coding. arXiv preprint arXiv:1705.02596 (2017)
48. Hung, K.W., Siu, W.C.: New motion compensation model via frequency classification for fast video super-resolution. In: Proceedings of IEEE International Conference on Image Processing, pp. 1193–1196 (2009)
49. Ji, H., Fermüller, C.: Robust wavelet-based super-resolution reconstruction: theory and algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 31(4), 649–660 (2009)
50. Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Proceedings of European Conference on Computer Vision, pp. 694–711 (2016)
51. Joshi, M.V., Chaudhuri, S., Panuganti, R.: Super-resolution imaging: use of zoom as a cue. Image Vis. Comput. 22(14), 1185–1196 (2004)
52. Kim, J., Kwon Lee, J., Mu Lee, K.: Accurate image super-resolution using very deep convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646–1654 (2016)
53. Kim, J., Kwon Lee, J., Mu Lee, K.: Deeply-recursive convolutional network for image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1637–1645 (2016)
54. Krämer, P., Benois-Pineau, J., Domenger, J.P.: Local object-based super-resolution mosaicing from low-resolution video. Signal Process. 91(8), 1771–1780 (2011)
55. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
56. Lai, W., Huang, J., Ahuja, N., Yang, M.: Deep laplacian pyramid networks for fast and accurate super-resolution. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 5835–5843 (2017)
57. Lai, W., Huang, J., Ahuja, N., Yang, M.: Fast and accurate image super-resolution with deep laplacian pyramid networks. CoRR arXiv:abs/1710.01992 (2017)
58. Lai, W.S., Huang, J.B., Ahuja, N., Yang, M.H.: Deep laplacian pyramid networks for fast and accurate super-resolution. In: Proceedings of Computer Vision and Pattern Recognition (2017)
59. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
60. Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A.P., Tejani, A., Totz, J., Wang, Z., Shi, W.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 105–114 (2017)
61. Li, J., Feng, J., Kuo, C.C.J.: Deep convolutional neural network for latent fingerprint enhancement. Signal Process. Image Commun. 60, 52–63 (2018)
62. Li, K., Zhu, Y., Yang, J., Jiang, J.: Video super-resolution using an adaptive superpixel-guided auto-regressive model. Pattern Recognit. 51, 59–71 (2016)
63. Li, X., Orchard, M.T.: New edge-directed interpolation. IEEE Trans. Image Process. 10(10), 1521–1527 (2001)
64. Li, Y., Agustsson, E., Gu, S., Timofte, R., Van Gool, L.: Carn: convolutional anchored regression network for fast and accurate single image super-resolution. In: Proceedings of European Conference on Computer Vision, pp. 166–181 (2018)
65. Lim, B., Son, S., Kim, H., Nah, S., Lee, K.M.: Enhanced deep residual networks for single image super-resolution. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1132–1140 (2017)
66. Lin, M., Chen, Q., Yan, S.: Network in network. CoRR arXiv:abs/1312.4400 (2013)
67. Mao, X., Shen, C., Yang, Y.: Image restoration using convolutional auto-encoders with symmetric skip connections. CoRR arXiv:abs/1606.08921 (2016)
68. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. Proc. Int. Conf. Comput. Vis. 2, 416–423 (2001)
69. Metz, L., Poole, B., Pfau, D., Sohl-Dickstein, J.: Unrolled generative adversarial networks. CoRR arXiv:abs/1611.02163 (2016)
70. Milanfar, P.: Super-resolution imaging. CRC Press, Boca Raton (2010)
71. Minaee, S., Abdolrashidi, A.: Iris-gan: Learning to generate realistic iris images using convolutional gan. CoRR arXiv:abs/1812.04822 (2018)
72. Minetto, R., Thome, N., Cord, M., Leite, N.J., Stolfi, J.: Snoopertext: a text detection system for automatic indexing of urban scenes. Comput. Vis. Image Underst. 122, 92–104 (2014)
73. Nasrollahi, K., Escalera, S., Rasti, P., Anbarjafari, G., Baro, X., Escalante, H.J., Moeslund, T.B.: Deep learning based super-resolution for improved action recognition. In: Proceedings of Image Processing Theory, Tools and Applications, pp. 67–72 (2015)
74. Nasrollahi, K., Moeslund, T.B.: Super-resolution: a comprehensive survey. Mach. Vis. Appl. 25(6), 1423–1468 (2014)
75. Nguyen, K., Fookes, C., Sridharan, S., Tistarelli, M., Nixon, M.: Super-resolution for biometrics: a comprehensive survey. Pattern Recognit. 78, 23–42 (2018)
76. Nguyen, T., Le, T., Vu, H., Phung, D.Q.: Dual discriminator generative adversarial nets. In: Proceedings of Advances in Neural Information Processing Systems, pp. 2670–2680 (2017)
77. Park, S.C., Park, M.K., Kang, M.G.: Super-resolution image reconstruction: a technical overview. IEEE Signal Process. Mag. 20(3), 21–36 (2003)
78. Patti, A.J., Sezan, M.I., Tekalp, A.M.: Super-resolution video reconstruction with arbitrary sampling lattices and nonzero aperture time. IEEE Trans. Image Process. 6(8), 1064–1076 (1997)
79. Pérez-Pellitero, E., Salvador, J., Ruiz-Hidalgo, J., Rosenhahn, B.: Psyco: manifold span reduction for super resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1837–1845 (2016)
80. Petrou, M., Jaward, M.H., Chen, S., Briers, M.: Super-resolution in practice: the complete pipeline from image capture to super-resolved subimage creation using a novel frame selection method. Mach. Vis. Appl. 23(3), 441–459 (2012)
81. Qi, L., Wang, R., Hu, C., Li, S., He, Q., Xu, X.: Time-aware distributed service recommendation with privacy-preservation. Inf. Sci. 480, 354–364 (2019)
82. Qi, L., Zhang, X., Dou, W., Hu, C., Yang, C., Chen, J.: A two-stage locality-sensitive hashing based approach for privacy-preserving mobile service recommendation in cross-platform edge environment. Future Gener. Comput. Syst. 88, 636–643 (2018)
83. Qi, L., Zhang, X., Dou, W., Ni, Q.: A distributed locality-sensitive hashing-based approach for cloud service recommendation from multi-source data. IEEE J. Sel. Areas Commun. 35(11), 2616–2624 (2017)
84. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR arXiv:abs/1511.06434 (2015)
85. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: Proceedings of 4th International Conference on Learning Representations (2016)
86. Robinson, M.D., Chiu, S.J., Toth, C.A., Izatt, J.A., Lo, J.Y., Farsiu, S.: New applications of super-resolution in medical imaging. In: Super-Resolution Imaging, pp. 401–430. CRC Press (2017)
87. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000)
88. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
89. Ryoo, M.S., Rothrock, B., Fleming, C., Yang, H.J.: Privacy-preserving human activity recognition from extreme low resolution. In: Proceedings of AAAI Conference on Artificial Intelligence, pp. 4255–4262 (2017)
90. Sajjadi, M.S., Schölkopf, B., Hirsch, M.: Enhancenet: Single image super-resolution through automated texture synthesis. In: Proceedings of IEEE International Conference on Computer Vision, pp. 4501–4510. IEEE (2017)
91. Sajjadi, M.S.M., Schölkopf, B., Hirsch, M.: Enhancenet: Single image super-resolution through automated texture synthesis. In: Proceedings of IEEE International Conference on Computer Vision, pp. 4501–4510 (2017)
92. Salimans, T., Goodfellow, I.J., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training gans. In: Proceedings of Neural Information Processing Systems, pp. 2226–2234 (2016)
93. Salvador, J., Perez-Pellitero, E.: Naive bayes super-resolution forest. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 325–333 (2015)
94. Schulter, S., Leistner, C., Bischof, H.: Fast and accurate image upscaling with super-resolution forests. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3791–3799 (2015)
95. Schultz, R.R., Stevenson, R.L.: A bayesian approach to image expansion for improved definition. IEEE Trans. Image Process. 3(3), 233–242 (1994)
96. Shamsolmoali, P., Zhang, J., Yang, J., et al.: Image super resolution by dilated dense progressive network. Image Vis. Comput. 88, 9–18 (2019)
97. Shan, Q., Li, Z., Jia, J., Tang, C.K.: Fast image/video upsampling. ACM Trans. Graph. 27(5), 153 (2008)
98. Sheikh, H.R., Wang, Z., Cormack, L., Bovik, A.C.: Live image quality assessment database release 2 (2005)
99. Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1874–1883 (2016)
100. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
101. Singh, K., Gupta, A., Kapoor, R.: Fingerprint image super-resolution via ridge orientation-based clustered coupled sparse dictionaries. J. Electr. Imaging 24(4), 043015 (2015)
102. Song, H., Zhang, L., Wang, P., Zhang, K., Li, X.: An adaptive l1–l2 hybrid error model to super-resolution. In: Proceedings of IEEE International Conference on Image Processing, pp. 2821–2824 (2010)
103. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
104. Tai, Y., Yang, J., Liu, X.: Image super-resolution via deep recursive residual network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2790–2798 (2017)
105. Takeda, H., Milanfar, P., Protter, M., Elad, M.: Super-resolution without explicit subpixel motion estimation. IEEE Trans. Image Process. 18(9), 1958–1975 (2009)
106. Tian, J., Ma, K.K.: Stochastic super-resolution image reconstruction. J. Vis. Commun. Image Represent. 21(3), 232–244 (2010)
107. Tian, J., Ma, K.K.: A survey on super-resolution imaging. Signal Image Video Process. 5(3), 329–342 (2011)
108. Timofte, R., Agustsson, E., Van Gool, L., Yang, M.H., Zhang, L.: Ntire 2017 challenge on single image super-resolution: Methods and results. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 114–125 (2017)
109. Timofte, R., De Smet, V., Van Gool, L.: Anchored neighborhood regression for fast example-based super-resolution. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1920–1927 (2013)
110. Timofte, R., De Smet, V., Van Gool, L.: A+: Adjusted anchored neighborhood regression for fast super-resolution. In: Proceedings of Asian Conference on Computer Vision, pp. 111–126 (2014)
111. Timofte, R., Rothe, R., Van Gool, L.: Seven ways to improve example-based single image super resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1865–1873 (2016)
112. Tong, T., Li, G., Liu, X., Gao, Q.: Image super-resolution using dense skip connections. In: Proceedings of IEEE International Conference on Computer Vision, pp. 4809–4817 (2017)
113. Trinh, D.H., Luong, M., Dibos, F., Rocchisani, J.M., Pham, C.D., Nguyen, T.Q.: Novel example-based method for super-resolution and denoising of medical images. IEEE Trans. Image Process. 23(4), 1882–1895 (2014)
114. Wallach, D., Lamare, F., Kontaxakis, G., Visvikis, D.: Super-resolution in respiratory synchronized positron emission tomography. IEEE Trans. Med. Imaging 31(2), 438–448 (2012)
115. Wang, X., Tang, X.: Hallucinating face by eigentransformation. IEEE Trans. Syst. Man Cybern. Part C 35(3), 425–434 (2005)
116. Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., Loy, C.C.: ESRGAN: enhanced super-resolution generative adversarial networks. In: Proceedings of European Conference on Computer Vision Workshops, pp. 63–79 (2018)
117. Wang, Y., Perazzi, F., McWilliams, B., Sorkine-Hornung, A., Sorkine-Hornung, O., Schroers, C.: A fully progressive approach to single-image super-resolution. In: Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 864–873 (2018)
118. Wang, Y.H., Qiao, J., Li, J.B., Fu, P., Chu, S.C., Roddick, J.F.: Sparse representation-based mri super-resolution reconstruction. Measurement 47, 946–953 (2014)
119. Wang, Z., Liu, D., Yang, J., Han, W., Huang, T.: Deep networks for image super-resolution with sparse prior. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 370–378 (2015)
120. Wu, H., Zheng, S., Zhang, J., Huang, K.: GP-GAN: towards realistic high-resolution image blending. CoRR arXiv:abs/1703.07195 (2017)
121. Wu, Y., Shivakumara, P., Lu, T., Tan, C.L., Blumenstein, M., Kumar, G.H.: Contour restoration of text components for recognition in video/scene images. IEEE Trans. Image Process. 25(12), 5622–5634 (2016)
122. Wu, Y., Shivakumara, P., Wei, W., Lu, T., Pal, U.: A new ring radius transform-based thinning method for multi-oriented video characters. Int. J. Doc. Anal. Recognit. 18(2), 137–151 (2015)
123. Xu, X., Fu, S., Qi, L., Zhang, X., Liu, Q., He, Q., Li, S.: An iot-oriented data placement method with privacy preservation in cloud environment. J. Netw. Comput. Appl. 124, 148–157 (2018)
124. Xu, X., Ma, Y., Sun, W.: Towards real scene super-resolution with raw images. CoRR arXiv:abs/1905.12156 (2019)
125. Xu, X., Zhang, X., Khan, M., Dou, W., Xue, S., Yu, S.: A balanced virtual machine scheduling method for energy-performance trade-offs in cyber-physical cloud systems. Future Gener. Comput. Syst. (2017)
126. Xu, X., Zhao, X., Ruan, F., Zhang, J., Tian, W., Dou, W., Liu, A.X.: Data placement for privacy-aware applications over big data in hybrid clouds. Secur. Commun. Netw. 2017, 2376484:1–2376484:15 (2017)
127. Xu, Y., Qi, L., Dou, W., Yu, J.: Privacy-preserving and scalable service recommendation based on simhash in a distributed cloud environment. Complexity 2017, 3437854:1–3437854:9 (2017)
128. Yamanaka, J., Kuwashima, S., Kurita, T.: Fast and accurate image super resolution by deep cnn with skip connection and network in network. In: Proceedings of Neural Information Processing, pp. 217–225 (2017)
129. Yang, C.Y., Yang, M.H.: Fast direct super-resolution by simple functions. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 561–568 (2013)
130. Yang, J., Wright, J., Huang, T.S., Ma, Y.: Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)
131. Ye, Q., Doermann, D.S.: Text detection and recognition in imagery: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 37(7), 1480–1500 (2015)
132. Yu, X., Fernando, B., Ghanem, B., Porikli, F., Hartley, R.: Face super-resolution guided by facial component heatmaps. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 217–233 (2018)
133. Yuan, Y., Liu, S., Zhang, J., Zhang, Y., Dong, C., Lin, L.: Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In: Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 701–710 (2018)
134. Yuan, Z., Wu, J., Kamata, S.I., Ahrary, A., Yan, P.: Fingerprint image enhancement by super resolution with early stopping. In: Proceedings of IEEE International Conference on Intelligent Computing and Intelligent Systems, vol. 4, pp. 527–531 (2009)
135. Yue, L., Shen, H., Jie, L., Yuan, Q., Zhang, H., Zhang, L.: Image super-resolution: the techniques, applications, and future. Signal Process. 128, 389–408 (2016)
136. Zareapoor, M., Zhou, H., Yang, J.: Perceptual image quality using dual generative adversarial network. Neural Comput. Appl. pp. 1–11 (2019)
137. Zeyde, R., Elad, M., Protter, M.: On single image scale-up using sparse-representations. In: Proceedings of International Conference on Curves and Surfaces, pp. 711–730 (2010)
138. Zhang, H., Yang, Z., Zhang, L., Shen, H.: Super-resolution reconstruction for multi-angle remote sensing images considering resolution differences. Rem. Sens. 6(1), 637–657 (2014)
139. Zhang, H., Zhang, L., Shen, H.: A super-resolution reconstruction algorithm for hyperspectral images. Signal Process. 92(9), 2082–2096 (2012)
140. Zhang, J., Pu, J., Chen, C., Fleischer, R.: Low-resolution gait recognition. IEEE Trans. Syst. Man Cybern. Part B 40(4), 986–996 (2010)
141. Zhang, K., Zuo, W., Zhang, L.: Learning a single convolutional super-resolution network for multiple degradations. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3262–3271 (2018)
142. Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y.: Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European Conference on Computer Vision, pp. 286–301 (2018)
143. Zhang, Y., Liu, S., Dong, C., Zhang, X., Yuan, Y.: Multiple cycle-in-cycle generative adversarial networks for unsupervised image super-resolution. IEEE Trans. Image Process. (2019)
144. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image super-resolution. In: Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2472–2481 (2018)
145. Zhao, S., Han, H., Peng, S.: Wavelet-domain hmt-based image super-resolution. In: Proceedings of International Conference on Image Processing, vol. 2, p. 953 (2003)
146. Zhao, T., Ren, W., Zhang, C., Ren, D., Hu, Q.: Unsupervised degradation learning for single image super-resolution. CoRR arXiv:abs/1812.04240 (2018)
147. Hui, Z., Wang, X., Gao, X.: Fast and accurate single image super-resolution via information distillation network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 723–731 (2018)
148. Zheng, Z., Zheng, L., Yang, Y.: Unlabeled samples generated by gan improve the person re-identification baseline in vitro. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3754–3762 (2017)
149. Zhu, J., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of IEEE International Conference on Computer Vision, pp. 2242–2251 (2017)
150. Zhu, J.Y., Krähenbühl, P., Shechtman, E., Efros, A.A.: Generative visual manipulation on the natural image manifold. In: Proceedings of European Conference on Computer Vision, pp. 597–613. Springer (2016)
151. Zhuang, Y., Zhang, J., Wu, F.: Hallucinating faces: Lph super-resolution and neighbor reconstruction for residue compensation. Pattern Recognit. 40(11), 3178–3194 (2007)
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

College of Computer and Information Engineering, Hohai University, Nanjing, China. His research areas include pattern recognition, multimedia systems, and smart hydrology modeling.