Accurate Image Super-Resolution Using Very Deep Convolutional Networks
Abstract

We present a highly accurate single-image super-resolution (SR) method. Our method uses a very deep convolutional network inspired by VGG-net used for ImageNet classification [19]. We find that increasing our network depth shows a significant improvement in accuracy. Our final model uses 20 weight layers. By cascading small filters many times in a deep network structure, contextual information over large image regions is exploited in an efficient way. With very deep networks, however, convergence speed becomes a critical issue during training. We propose a simple yet effective training procedure: we learn residuals only and use extremely high learning rates enabled by adjustable gradient clipping.

[Figure 1: PSNR (dB) versus running time (s), on a log scale from slow to fast, for VDSR (Ours), SRCNN, A+, SelfEx, and RFL. VDSR attains the highest PSNR among the compared methods while running fast.]
[Figure 2 diagram: ILR (x) -> Conv.1 -> ReLU.1 -> ... -> Conv.D-1 -> ReLU.D-1 -> Conv.D (residual r) -> addition -> HR (y)]

Figure 2: Our Network Structure. We cascade a pair of layers (convolutional and nonlinear) repeatedly. An interpolated low-resolution (ILR) image goes through the layers and is transformed into a high-resolution (HR) image. The network predicts a residual image, and the addition of the ILR image and the residual gives the desired output. We use 64 filters for each convolutional layer, and some sample feature maps are drawn for visualization. Most features after applying rectified linear units (ReLU) are zero.
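To make the described structure concrete, here is a minimal sketch of such a network in PyTorch (our choice of framework here; the paper's implementation uses MatConvNet). The depth, the 64 filters of size 3x3, the zero padding, and the residual addition follow the text; details such as bias terms are assumptions.

```python
import torch
import torch.nn as nn

class VDSRSketch(nn.Module):
    """Sketch of the described structure: stacked 3x3 conv + ReLU pairs with 64 channels,
    zero padding so feature maps keep their size, and a final conv predicting the residual,
    which is added back to the ILR input."""
    def __init__(self, depth=20, channels=64):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]  # Conv.D: residual prediction
        self.body = nn.Sequential(*layers)

    def forward(self, ilr):
        residual = self.body(ilr)
        return ilr + residual  # HR estimate = ILR + predicted residual
```

With depth = 20 this gives 20 weight layers and 19 rectified linear units, matching the counts mentioned in the text.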
with receptive field size n × n, the output image is 1 × 1. This is in accordance with other super-resolution methods, since many require surrounding pixels to infer center pixels correctly. This center-surround relation is useful, since the surrounding region provides more constraints to this ill-posed problem (SR). For pixels near the image boundary, this relation cannot be exploited to the full extent, and many SR methods crop the result image.

This methodology, however, is not valid if the required surround region is very big. After cropping, the final image is too small to be visually pleasing.

To resolve this issue, we pad zeros before convolutions to keep the sizes of all feature maps (including the output image) the same. It turns out that zero-padding works surprisingly well. For this reason, our method differs from most other methods in the sense that pixels near the image boundary are also correctly predicted.

Once image details are predicted, they are added back to the input ILR image to give the final image (HR). We use this structure for all experiments in our work.

3.2. Training

We now describe the objective to minimize in order to find the optimal parameters of our model. Let x denote an interpolated low-resolution image and y a high-resolution image. Given a training dataset {x^(i), y^(i)}_{i=1}^N, our goal is to learn a model f that predicts values ŷ = f(x), where ŷ is an estimate of the target HR image. We minimize the mean squared error 1/2 ||y − f(x)||² averaged over the training set.

Residual-Learning. In SRCNN, the network must preserve all input detail, since the input image is discarded and the output is generated from the learned features alone. With many weight layers, this becomes an end-to-end relation requiring very long-term memory. For this reason, the vanishing/exploding gradients problem [2] can be critical. We can solve this problem simply with residual-learning.

As the input and output images are largely similar, we define a residual image r = y − x, where most values are likely to be zero or small. We want to predict this residual image. The loss function now becomes 1/2 ||r − f(x)||², where f(x) is the network prediction.

In networks, this is reflected in the loss layer as follows. Our loss layer takes three inputs: the residual estimate, the network input (ILR image), and the ground-truth HR image. The loss is computed as the Euclidean distance between the reconstructed image (the sum of network input and output) and the ground truth.

Training is carried out by optimizing the regression objective using mini-batch gradient descent based on back-propagation (LeCun et al. [14]). We set the momentum parameter to 0.9. The training is regularized by weight decay (an L2 penalty multiplied by 0.0001).
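As a sketch of how this objective and the optimizer settings fit together, assuming the PyTorch-style model from the earlier sketch (which already adds the predicted residual to its input); the actual implementation uses MatConvNet, and the 0.1 learning rate is the initial value reported later in Section 5.2:

```python
import torch
import torch.nn.functional as F

model = VDSRSketch(depth=20)  # from the earlier sketch; outputs ILR + predicted residual
# SGD with momentum 0.9 and weight decay 0.0001 as stated above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

def training_step(ilr, hr):
    """One mini-batch step. The loss compares the reconstruction (ILR + residual) against
    the ground-truth HR image, which equals 1/2 ||r - f(x)||^2 for the residual r = hr - ilr."""
    optimizer.zero_grad()
    reconstruction = model(ilr)
    loss = 0.5 * F.mse_loss(reconstruction, hr, reduction='sum')
    loss.backward()
    optimizer.step()
    return loss.item()
```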
High Learning Rates for Very Deep Networks. Training deep models can fail to converge within a realistic limit of time. SRCNN [6] fails to show superior performance with more than three weight layers. While there can be various reasons, one possibility is that they stopped their training procedure before the networks converged. Their learning rate of 10^-5 is too small for a network to converge within a week on a common GPU. Looking at Fig. 9 of [6], it is not easy to say that their deeper networks have converged and that their performances were saturated. While more training would eventually resolve the issue, increasing the depth to 20 does not seem practical with SRCNN.

It is a basic rule of thumb to make the learning rate high to boost training. But simply setting the learning rate high can also lead to vanishing/exploding gradients [2]. For this reason, we suggest adjustable gradient clipping for a maximal boost in speed while suppressing exploding gradients.
Adjustable Gradient Clipping. Gradient clipping is a technique that is often used in training recurrent neural networks [17]. But, to our knowledge, its usage is limited in training CNNs. While there exist many ways to limit gradients, one of the common strategies is to clip individual gradients to the predefined range [−θ, θ].

With clipping, gradients are kept in a certain range. With stochastic gradient descent, which is commonly used for training, the learning rate is multiplied in to adjust the step size. If a high learning rate is used, it is likely that θ is tuned to be small to avoid exploding gradients in the high-learning-rate regime. But as the learning rate is annealed to get smaller, the effective gradient (gradient multiplied by learning rate) approaches zero, and training can take exponentially many iterations to converge if the learning rate is decreased geometrically.

For maximal speed of convergence, we clip the gradients to [−θ/γ, θ/γ], where γ denotes the current learning rate. We find that adjustable gradient clipping makes our convergence procedure extremely fast. Our 20-layer network training is done within 4 hours, whereas the 3-layer SRCNN takes several days to train.
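A minimal sketch of this adjustable clipping rule, again as PyTorch-style code rather than the paper's MatConvNet implementation; the value of θ used below is illustrative, as the text does not fix it:

```python
def adjustable_gradient_clip(parameters, theta, lr):
    """Clip each individual gradient to [-theta/lr, theta/lr]. The effective step
    (gradient times learning rate) then stays within [-theta, theta] even as the
    learning rate is annealed, avoiding a vanishing effective gradient."""
    bound = theta / lr
    for p in parameters:
        if p.grad is not None:
            p.grad.clamp_(-bound, bound)

# Inside a training step, after loss.backward() and before optimizer.step():
# adjustable_gradient_clip(model.parameters(), theta=0.01, lr=current_lr)  # theta value is illustrative
```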
Multi-Scale. While very deep models can boost performance, more parameters are now needed to define a network. Typically, one network is created for each scale factor. Considering that fractional scale factors are often used, we need an economical way to store and retrieve networks. For this reason, we also train a multi-scale model. With this approach, parameters are shared across all predefined scale factors. Training a multi-scale model is straightforward: training datasets for the specified scales are combined into one big dataset.

Data preparation is similar to SRCNN [5] with some differences. The input patch size is now equal to the size of the receptive field, and images are divided into sub-images with no overlap. A mini-batch consists of 64 sub-images, where sub-images from different scales can be in the same batch.

We implement our model using the MatConvNet package [23] (http://www.vlfeat.org/matconvnet/).
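A sketch of this data preparation under our own assumptions (OpenCV bicubic resizing stands in for the framework's interpolation, and patch extraction is omitted; the paper's pipeline is in MATLAB/MatConvNet):

```python
import numpy as np
import cv2  # assumed here for bicubic resizing

def make_multiscale_batch(hr_patches, scales=(2, 3, 4), batch_size=64, rng=None):
    """Assemble one mini-batch of (ILR, residual) pairs, mixing scale factors.
    `hr_patches` is a list of square HR sub-images (float32 luminance patches whose
    size equals the receptive field, e.g. 41x41 for the 20-layer model)."""
    rng = rng or np.random.default_rng()
    inputs, targets = [], []
    for _ in range(batch_size):
        hr = hr_patches[rng.integers(len(hr_patches))]
        s = int(rng.choice(scales))
        h, w = hr.shape[:2]
        lr = cv2.resize(hr, (w // s, h // s), interpolation=cv2.INTER_CUBIC)   # downscale
        ilr = cv2.resize(lr, (w, h), interpolation=cv2.INTER_CUBIC)            # interpolated LR input
        inputs.append(ilr)
        targets.append(hr - ilr)   # residual label r = y - x
    return np.stack(inputs), np.stack(targets)
```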
Table 1: Performance table (PSNR) for residual and non-residual networks ('Set5' dataset, ×2). Residual networks rapidly approach their convergence within 10 epochs.

(a) Initial learning rate 0.1
Epoch        | 10    | 20    | 40    | 80
Residual     | 36.90 | 36.64 | 37.12 | 37.05
Non-Residual | 27.42 | 19.59 | 31.38 | 35.66
Difference   | 9.48  | 17.05 | 5.74  | 1.39

(b) Initial learning rate 0.01
Epoch        | 10    | 20    | 40    | 80
Residual     | 36.74 | 36.87 | 36.91 | 36.93
Non-Residual | 30.33 | 33.59 | 36.26 | 36.42
Difference   | 6.41  | 3.28  | 0.65  | 0.52

(c) Initial learning rate 0.001
Epoch        | 10    | 20    | 40    | 80
Residual     | 36.31 | 36.46 | 36.52 | 36.52
Non-Residual | 33.97 | 35.08 | 36.11 | 36.11
Difference   | 2.35  | 1.38  | 0.42  | 0.40

4. Understanding Properties

In this section, we study three properties of our proposed method. First, we show that large depth is necessary for the task of SR. A very deep network utilizes more contextual information in an image and models complex functions with many nonlinear layers. We experimentally verify that deeper networks give better performances than shallow ones.

Second, we show that our residual-learning network converges much faster than the standard CNN. Moreover, our network gives a significant boost in performance.

Third, we show that our method with a single network performs as well as a method using multiple networks trained for each scale. We can effectively reduce the model capacity (the number of parameters) of multi-network approaches.

4.1. The Deeper, the Better

Convolutional neural networks exploit spatially-local correlation by enforcing a local connectivity pattern between neurons of adjacent layers [1]. In other words, hidden units in layer m take as input a subset of units in layer m−1. They form spatially contiguous receptive fields.

Each hidden unit is unresponsive to variations outside of the receptive field with respect to the input. The architecture thus ensures that the learned filters produce the strongest response to a spatially local input pattern.

However, stacking many such layers leads to filters that become increasingly global (i.e., responsive to a larger region of pixel space). In other words, a filter of very large support can be effectively decomposed into a series of small filters.
[Figure 3: PSNR (dB) as network depth varies (see Section 4.1); panels (a) Test Scale Factor 2, (b) Test Scale Factor 3, (c) Test Scale Factor 4.]

[Figure 4 plots: PSNR (dB) versus training epochs for Residual, Non-Residual, and Bicubic, under initial learning rates 0.1, 0.01, and 0.001.]

Figure 4: Performance curves for residual and non-residual networks. The two networks are tested on the 'Set5' dataset with scale factor 2. Residual networks quickly reach state-of-the-art performance within a few epochs, whereas non-residual networks (which model the high-resolution image directly) take many epochs to reach maximum performance. Moreover, the final accuracy is higher for residual networks.
In this work, we use filters of the same size, 3×3, for all layers. For the first layer, the receptive field is of size 3×3. For the next layers, the size of the receptive field increases by 2 in both height and width. For a depth-D network, the receptive field has size (2D + 1) × (2D + 1). Its size is proportional to the depth.
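A one-line check of this relation (2 pixels of growth per 3×3 layer), evaluated for the first layer and for our final 20-layer model:

```python
def receptive_field(depth, kernel=3):
    """Receptive field of `depth` stacked kernel x kernel convolutions with stride 1:
    each layer adds (kernel - 1) pixels, giving 2*depth + 1 for 3x3 filters."""
    return depth * (kernel - 1) + 1

print(receptive_field(1))   # 3   (first layer: 3 x 3)
print(receptive_field(20))  # 41  (depth 20: 41 x 41, i.e. (2D + 1) x (2D + 1))
```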
In the task of SR, this corresponds to the amount of contextual information that can be exploited to infer high-frequency components. A large receptive field means the network can use more context to predict image details. As SR is an ill-posed inverse problem, collecting and analyzing more neighbor pixels gives more clues. For example, if there are some image patterns entirely contained in a receptive field, it is plausible that this pattern is recognized and used to super-resolve the image.

In addition, very deep networks can exploit high nonlinearities. We use 19 rectified linear units, and our networks can model very complex functions with a moderate number of channels (neurons). The advantages of making a thin deep network are well explained in Simonyan and Zisserman [19].

We now experimentally show that very deep networks significantly improve SR performance. We train and test networks of depth ranging from 5 to 20 (counting only weight layers and excluding nonlinearity layers). In Figure 3, we show the results. In most cases, performance increases as depth increases, and the improvement with depth is rapid.

4.2. Residual-Learning

As we already have a low-resolution image as the input, predicting high-frequency components is enough for the purpose of SR. Although the concept of predicting residuals has been used in previous methods [21, 22, 26], it has not been studied in the context of a deep-learning-based SR framework.

In this work, we have proposed a network structure that learns residual images. We now study the effect of this modification to a standard CNN structure in detail.
Test \ Train | ×2    | ×3    | ×4    | ×2,3  | ×2,4  | ×3,4  | ×2,3,4 | Bicubic
×2           | 37.10 | 30.05 | 28.13 | 37.09 | 37.03 | 32.43 | 37.06  | 33.66
×3           | 30.42 | 32.89 | 30.50 | 33.22 | 31.20 | 33.24 | 33.27  | 30.39
×4           | 28.43 | 28.73 | 30.84 | 28.70 | 30.86 | 30.94 | 30.95  | 28.42

Table 2: Scale Factor Experiment. Several models are trained with different scale sets. Quantitative evaluation (PSNR) on dataset 'Set5' is provided for scale factors 2, 3, and 4. Red color indicates that the test scale is included during training. Models trained with multiple scales perform well on the trained scales.
Figure 5: (Top) Our results using a single network for all scale factors. Super-resolved images over all scales are clean and sharp. (Bottom)
Results of Dong et al. [5] (×3 model used for all scales). Result images are not visually pleasing. To handle multiple scales, existing
methods require multiple networks.
First, we find that this residual network converges much faster. Two networks are compared experimentally: the residual network and the standard non-residual network. We use depth 10 (weight layers) and scale factor 2. Performance curves for various learning rates are shown in Figure 4. All use the same learning rate scheduling mechanism that has been mentioned above.

Second, at convergence, the residual network shows superior performance. In Figure 4, residual networks give higher PSNR when training is done.

Another remark is that if small learning rates are used, networks do not converge in the given number of epochs. If the initial learning rate 0.1 is used, the PSNR of a residual-learning network reaches 36.90 within 10 epochs. But if 0.001 is used instead, the network never reaches the same level of performance (its performance is 36.52 after 80 epochs). In a similar manner, residual and non-residual networks show dramatic performance gaps after 10 epochs (36.90 vs. 27.42 for rate 0.1).

In short, this simple modification to a standard non-residual network structure is very powerful, and one can explore the validity of the idea in other image restoration problems where input and output images are highly correlated.

4.3. Single Model for Multiple Scales

Scale augmentation during training is a key technique to equip a network with super-resolution machines of multiple scales. Many SR processes for different scales can be executed with our multi-scale machine with much smaller capacity than that of single-scale machines combined.

We start with an interesting experiment as follows: we train our network with a single scale factor s_train and test it under another scale factor s_test. Here, factors 2, 3, and 4, which are widely used in SR comparisons, are considered. Possible pairs (s_train, s_test) are tried for the dataset 'Set5' [15]. Experimental results are summarized in Table 2.

Performance is degraded if s_train ≠ s_test. For scale factor 2, the model trained with factor 2 gives a PSNR of 37.10 (in dB), whereas models trained with factors 3 and 4 give 30.05 and 28.13, respectively. A network trained over single-scale data is not capable of handling other scales. In many tests, it is even worse than bicubic interpolation, the method used for generating the input image.

We now test if a model trained with scale augmentation is capable of performing SR at multiple scale factors. The same network used above is trained with multiple scale factors s_train = {2, 3, 4}. In addition, we experiment with the cases s_train = {2, 3}, {2, 4}, {3, 4} for more comparisons.
Figure 6: Super-resolution results of “148026” (B100) with scale factor ×3. VDSR recovers sharp lines. (PSNR, SSIM): A+ [22] (22.92, 0.7379); RFL [18] (22.90, 0.7332); SelfEx [11] (23.00, 0.7439); SRCNN [5] (23.15, 0.7487); VDSR (Ours) (23.50, 0.7777).

Figure 7: Super-resolution results of “38092” (B100) with scale factor ×3. The horn in the image is sharp in the result of VDSR. (PSNR, SSIM): A+ [22] (27.08, 0.7514); RFL [18] (27.08, 0.7508); SelfEx [11] (27.02, 0.7513); SRCNN [5] (27.16, 0.7545); VDSR (Ours) (27.32, 0.7606).
Dataset  Scale  Bicubic  A+ [22]  RFL [18]  SelfEx [11]  SRCNN [5]  VDSR (Ours)
(each entry: PSNR / SSIM / running time in seconds)
×2 33.66/0.9299/0.00 36.54/0.9544/0.58 36.54/0.9537/0.63 36.49/0.9537/45.78 36.66/0.9542/2.19 37.53/0.9587/0.13
Set5 ×3 30.39/0.8682/0.00 32.58/0.9088/0.32 32.43/0.9057/0.49 32.58/0.9093/33.44 32.75/0.9090/2.23 33.66/0.9213/0.13
×4 28.42/0.8104/0.00 30.28/0.8603/0.24 30.14/0.8548/0.38 30.31/0.8619/29.18 30.48/0.8628/2.19 31.35/0.8838/0.12
×2 30.24/0.8688/0.00 32.28/0.9056/0.86 32.26/0.9040/1.13 32.22/0.9034/105.00 32.42/0.9063/4.32 33.03/0.9124/0.25
Set14 ×3 27.55/0.7742/0.00 29.13/0.8188/0.56 29.05/0.8164/0.85 29.16/0.8196/74.69 29.28/0.8209/4.40 29.77/0.8314/0.26
×4 26.00/0.7027/0.00 27.32/0.7491/0.38 27.24/0.7451/0.65 27.40/0.7518/65.08 27.49/0.7503/4.39 28.01/0.7674/0.25
×2 29.56/0.8431/0.00 31.21/0.8863/0.59 31.16/0.8840/0.80 31.18/0.8855/60.09 31.36/0.8879/2.51 31.90/0.8960/0.16
B100 ×3 27.21/0.7385/0.00 28.29/0.7835/0.33 28.22/0.7806/0.62 28.29/0.7840/40.01 28.41/0.7863/2.58 28.82/0.7976/0.21
×4 25.96/0.6675/0.00 26.82/0.7087/0.26 26.75/0.7054/0.48 26.84/0.7106/35.87 26.90/0.7101/2.51 27.29/0.7251/0.21
×2 26.88/0.8403/0.00 29.20/0.8938/2.96 29.11/0.8904/3.62 29.54/0.8967/663.98 29.50/0.8946/22.12 30.76/0.9140/0.98
Urban100 ×3 24.46/0.7349/0.00 26.03/0.7973/1.67 25.86/0.7900/2.48 26.44/0.8088/473.60 26.24/0.7989/19.35 27.14/0.8279/1.08
×4 23.14/0.6577/0.00 24.32/0.7183/1.21 24.19/0.7096/1.88 24.79/0.7374/394.40 24.52/0.7221/18.46 25.18/0.7524/1.06
Table 3: Average PSNR/SSIM for scale factor ×2, ×3 and ×4 on datasets Set5, Set14, B100 and Urban100. Red color indicates the best
performance and blue color indicates the second best performance.
We observe that the network copes with any scale used during training. When s_train = {2, 3, 4} (×2, 3, 4 in Table 2), its PSNR for each scale is comparable to that achieved by the corresponding single-scale network: 37.06 vs. 37.10 (×2), 33.27 vs. 32.89 (×3), 30.95 vs. 30.86 (×4).

Another pattern is that for large scales (×3, 4), our multi-scale network outperforms the single-scale network: our models (×2, 3), (×3, 4) and (×2, 3, 4) give PSNRs of 33.22, 33.24 and 33.27 for test scale 3, respectively, whereas (×3) gives 32.89. Similarly, (×2, 4), (×3, 4) and (×2, 3, 4) give 30.86, 30.94 and 30.95 (vs. 30.84 by the ×4 model), respectively. From this, we observe that training on multiple scales boosts the performance for large scales.
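Because the network always operates on an input that is bicubic-interpolated to the target size, a single set of weights can be reused for every test scale. A minimal inference sketch under that assumption, using OpenCV resizing and the earlier PyTorch model sketch (our choices, not the paper's implementation):

```python
import cv2
import numpy as np
import torch

def super_resolve(model, lr_y, scale):
    """Upscale a single-channel (luminance) LR image by `scale` with one multi-scale model:
    bicubic-interpolate to the target size, then let the network add the predicted residual."""
    h, w = lr_y.shape
    ilr = cv2.resize(lr_y, (w * scale, h * scale), interpolation=cv2.INTER_CUBIC)
    x = torch.from_numpy(ilr).float()[None, None]   # shape (1, 1, H, W)
    with torch.no_grad():
        sr = model(x)                               # model returns ILR + residual
    return sr[0, 0].numpy()

# The same weights serve all trained scales:
# for s in (2, 3, 4):
#     result = super_resolve(model, lr_y, s)
```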
5. Experimental Results

In this section, we evaluate the performance of our method on several datasets. We first describe the datasets used for training and testing our method. Next, parameters necessary for training are given.

After outlining our experimental setup, we compare our method with several state-of-the-art SISR methods.
5.1. Datasets for Training and Testing

Training dataset. Different learning-based methods use different training images. For example, RFL [18] has two methods, where the first one uses 91 images from Yang et al. [25] and the second one uses 291 images with the addition of 200 images from the Berkeley Segmentation Dataset [16]. SRCNN [6] uses a very large ImageNet dataset.

We use 291 images as in [18] for benchmarking with other methods in this section. In addition, data augmentation (rotation or flip) is used. For the results in previous sections, we used 91 images to train the network fast, so performances can be slightly different.

Test dataset. For benchmark, we use four datasets. Datasets 'Set5' [15] and 'Set14' [26] are often used for benchmark in other works [22, 21, 5]. Dataset 'Urban100', a dataset of urban images recently provided by Huang et al. [11], is very interesting as it contains many challenging images failed by many of the existing methods. Finally, dataset 'B100', natural images in the Berkeley Segmentation Dataset used in Timofte et al. [22] and Yang and Yang [24] for benchmark, is also employed.
5.2. Training Parameters

We provide the parameters used to train our final model. We use a network of depth 20. Training uses batches of size 64. Momentum and weight decay parameters are set to 0.9 and 0.0001, respectively.

For weight initialization, we use the method described in He et al. [10]. This is a theoretically sound procedure for networks utilizing rectified linear units (ReLU).

We train all experiments over 80 epochs (9,960 iterations with batch size 64). The learning rate was initially set to 0.1 and then decreased by a factor of 10 every 20 epochs. In total, the learning rate was decreased 3 times, and the learning is stopped after 80 epochs. Training takes roughly 4 hours on a GPU Titan Z.
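Putting these settings together, here is a hedged PyTorch rendering of the schedule; the original training uses MatConvNet, the data loader is assumed, and the clipping parameter θ is illustrative:

```python
import torch

model = VDSRSketch(depth=20)                      # from the earlier sketch
for m in model.modules():                         # He et al. [10] initialization for ReLU networks
    if isinstance(m, torch.nn.Conv2d):
        torch.nn.init.kaiming_normal_(m.weight, nonlinearity='relu')

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)  # /10 every 20 epochs
theta = 0.01                                      # clipping parameter; value not specified in the text

for epoch in range(80):                           # 80 epochs, so the rate is decreased 3 times
    for ilr, hr in train_loader:                  # train_loader: mini-batches of 64 sub-images (assumed)
        optimizer.zero_grad()
        loss = 0.5 * torch.sum((model(ilr) - hr) ** 2)
        loss.backward()
        bound = theta / scheduler.get_last_lr()[0]           # adjustable clipping (Section 3.2)
        torch.nn.utils.clip_grad_value_(model.parameters(), bound)
        optimizer.step()
    scheduler.step()
```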
5.3. Benchmark

For benchmarking, we follow the publicly available framework of Huang et al. [11]. It enables the comparison of many state-of-the-art results with the same evaluation procedure.

The framework applies bicubic interpolation to the color components of an image and sophisticated models to the luminance component, as in other methods [4], [9], [26]. This is because human vision is more sensitive to details in intensity than in color.

This framework crops pixels near the image boundary. For our method, this procedure is unnecessary as our network outputs the full-sized image. For fair comparison, however, we also crop pixels to the same amount.
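As a sketch of this evaluation protocol (PSNR computed on the luminance channel with boundary cropping), using OpenCV for the color conversion; the helper below is ours and not part of the framework, and the exact cropping amount follows the compared methods:

```python
import cv2
import numpy as np

def benchmark_psnr(sr_bgr, hr_bgr, border):
    """PSNR on the luminance (Y) channel with `border` pixels cropped on each side,
    mirroring the evaluation framework described above (8-bit pixel range assumed)."""
    def to_y(img):
        return cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)[..., 0].astype(np.float64)
    y_sr, y_hr = to_y(sr_bgr), to_y(hr_bgr)
    if border > 0:
        y_sr = y_sr[border:-border, border:-border]
        y_hr = y_hr[border:-border, border:-border]
    mse = np.mean((y_sr - y_hr) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```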
5.4. Comparisons with State-of-the-Art Methods

We provide quantitative and qualitative comparisons. Compared methods are A+ [22], RFL [18], SelfEx [11] and SRCNN [5]. In Table 3, we provide a summary of quantitative evaluation on several datasets. Our method outperforms all previous methods on these datasets. Moreover, our method is relatively fast. The public code of SRCNN based on a CPU implementation is slower than the code used by Dong et al. [6] in their paper, which is based on a GPU implementation.

In Figures 6 and 7, we compare our method with top-performing methods. In Figure 6, only our method perfectly reconstructs the line in the middle. Similarly, in Figure 7, contours are clean and vivid in our method, whereas they are severely blurred or distorted in other methods.

6. Conclusion

In this work, we have presented a super-resolution method using very deep networks. Training a very deep network is hard due to a slow convergence rate. We use residual-learning and extremely high learning rates to optimize a very deep network fast. Convergence speed is maximized, and we use gradient clipping to ensure training stability. We have demonstrated that our method outperforms the existing methods by a large margin on benchmarked images. We believe our approach is readily applicable to other image restoration problems such as denoising and compression artifact removal.

References

[1] Y. Bengio, I. J. Goodfellow, and A. Courville. Deep learning. Book in preparation for MIT Press, 2015.
[2] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
[3] M. Bevilacqua, A. Roumy, C. Guillemot, and M.-L. Morel. Super-resolution using neighbor embedding of back-projection residuals. In Digital Signal Processing (DSP), 2013 18th International Conference on, pages 1–8. IEEE, 2013.
[4] H. Chang, D.-Y. Yeung, and Y. Xiong. Super-resolution through neighbor embedding. In CVPR, 2004.
[5] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In ECCV, 2014.
[6] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. TPAMI, 2015.
[7] C. E. Duchon. Lanczos filtering in one and two dimensions. Journal of Applied Meteorology, 18(8):1016–1022, 1979.
[8] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael. Learning low-level vision. International Journal of Computer Vision, 40(1):25–47, 2000.
[9] D. Glasner, S. Bagon, and M. Irani. Super-resolution from a single image. In ICCV, 2009.
[10] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. CoRR, abs/1502.01852, 2015.
[11] J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution using transformed self-exemplars. In CVPR, 2015.
[12] M. Irani and S. Peleg. Improving resolution by image registration. CVGIP: Graphical Models and Image Processing, 53(3):231–239, 1991.
[13] K. I. Kim and Y. Kwon. Single-image super-resolution using sparse regression and natural image prior. TPAMI, 2010.
[14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[15] M. Bevilacqua, A. Roumy, C. Guillemot, and M.-L. A. Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In BMVC, 2012.
[16] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
[17] R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. In ICML, 2013.
[18] S. Schulter, C. Leistner, and H. Bischof. Fast and accurate image upscaling with super-resolution forests. In CVPR, 2015.
[19] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[20] J. Sun, Z. Xu, and H.-Y. Shum. Image super-resolution using gradient profile prior. In CVPR, 2008.
[21] R. Timofte, V. De Smet, and L. Van Gool. Anchored neighborhood regression for fast example-based super-resolution. In ICCV, 2013.
[22] R. Timofte, V. De Smet, and L. Van Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In ACCV, 2014.
[23] A. Vedaldi and K. Lenc. MatConvNet – convolutional neural networks for MATLAB. CoRR, abs/1412.4564, 2014.
[24] C.-Y. Yang and M.-H. Yang. Fast direct super-resolution by simple functions. In ICCV, 2013.
[25] J. Yang, J. Wright, T. S. Huang, and Y. Ma. Image super-resolution via sparse representation. TIP, 2010.
[26] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In Curves and Surfaces, pages 711–730. Springer, 2012.