Deep Wavelet Prediction for Image Super-resolution

Tiantong Guo, Hojjat Seyed Mousavi, Tiep Huu Vu, Vishal Monga
School of Electrical Engineering and Computer Science
The Pennsylvania State University, State College, PA, 16803
http://signal.ee.psu.edu

Abstract

Recent advances have seen a surge of deep learning approaches for image super-resolution. Invariably, a network, e.g. a deep convolutional neural network (CNN) or auto-encoder, is trained to learn the relationship between low and high-resolution image patches. Recognizing that a wavelet transform provides a “coarse” as well as “detail” separation of image content, we design a deep CNN to predict the “missing details” of wavelet coefficients of the low-resolution images to obtain the Super-Resolution (SR) results, which we name Deep Wavelet Super-Resolution (DWSR). Our network is trained in the wavelet domain with four input and four output channels. The input comprises 4 sub-bands of the low-resolution wavelet coefficients and the outputs are residuals (missing details) of 4 sub-bands of high-resolution wavelet coefficients. Wavelet coefficients and wavelet residuals are used as inputs and outputs of our network to further enhance the sparsity of activation maps. A key benefit of such a design is that it greatly reduces the training burden of learning the network that reconstructs low frequency details. The output prediction is added to the input to form the final SR wavelet coefficients. Then the inverse 2D discrete wavelet transformation is applied to transform the predicted details and generate the SR results. We show that DWSR is computationally simpler and yet produces competitive and often better results than state-of-the-art alternatives.

Figure 1: DWSR and other state-of-the-art methods reported PSNR with scale factor of 3 on Set5. For experimental setup see Section 4.4. [Scatter plot: PSNR (dB) versus running time (s) for DWSR, VDSR, FSRCNN, SRCNN, SelfEx and A+.]

1. Introduction

In image processing, reconstructing a High-Resolution (HR) image from its corresponding Low-Resolution (LR) image is known as Super-Resolution (SR). The methods accomplishing this task are usually classified into two categories: multi-frame super-resolution and single image super-resolution (SISR). In multi-frame super-resolution, multiple LR images that are captured from the same scene are combined to generate the corresponding HR image [1, 2]. In SISR, it is very common to utilize examples from the historic data and form dictionaries of LR and HR image patches [3, 4]. These dictionaries are then used to transform each LR patch to the HR domain. For instance, [5, 6, 7, 8, 9] explored the similarity of self-examples, while others mapped the LR to HR patches with use of external samples [10, 11, 12, 13, 14, 15, 16, 17].

In this paper, we address the problem of single image super resolution, and we propose to apply super resolution in the wavelet domain for the reasons that we will justify later. Wavelet coefficients prediction for super-resolution has been applied successfully to multi-frame SR. For instance, [18, 19, 20, 21] used multi-frame images to interpolate the missing details in the wavelet sub-bands to enhance the resolution. Several different interpolation methods for wavelet coefficients in SISR were studied as well. [22] used straightforward bicubic interpolation to enlarge the wavelet sub-bands to produce SR results in spatial domain. [23] explored interlaced sampling structure in the low-resolution data for wavelet coefficients interpolation. [24] formed a minimization problem to learn the suitable wavelet interpolation with a smooth prior. Since the detailed wavelet sub-bands are often sparse, it is suitable to apply sparse coding methods to estimate detailed

wavelet coefficients and can significantly refine image details. Methods [25, 26, 27] used different interpolations related to sparse coding. Other attempts [28, 29] utilize Markov chains and [30] used nearest neighbor to interpolate wavelet coefficients. However, due to limited training and straightforward prediction procedures, these methods are not powerful enough to process general input images and fail to deliver state-of-the-art SR results, especially compared to more recent deep learning based methods for super resolution.

Deep learning promotes the design of large scale networks [31, 32, 33] for a variety of problems including SR. To this end, deep neural networks were applied to the super resolution task. Among the first deep learning based super resolution methods, Dong et al. [34] trained a deep convolutional neural network (SRCNN) to accomplish the image super-resolution task. In this work, the training set comprises example LR inputs and their corresponding HR output images, which were fed as training data to the SRCNN network. Combined with sparse coding methods, [35] proposed a coupled network structure utilizing middle layer representations for generating SR results, which reduced training and testing time. In different approaches, Cui et al. [9] proposed a cascade network to gradually upscale LR images after each layer, while [17] trained a high complexity convolutional auto-encoder called Deep Joint Super Resolution (DJSR) to obtain the SR results. Self examples of images were explored in [36], where training sets exploit self-example similarity, which leads to enhanced results. However, similar to SRCNN, DJSR suffers from expensive computation in training and processing to generate the SR images.

Recently, residual net [37] has shown great ability at reducing training time and speeding up convergence. Based on this idea, a Very Deep Super-Resolution (VDSR) [38] method is proposed which emphasizes reconstructing the residuals (differences) between LR and HR images rather than putting too much effort on reconstructing low frequency details of HR images. VDSR uses 20 convolutional layers producing state-of-the-art results in super resolution and takes significantly shorter training time for convergence; however, VDSR is massively parameterized with these 20 layers.

Motivations: Most of the deep learning based image super resolution methods work on spatial domain data and aim to reconstruct pixel values as the output of the network. In this work we explore the advantages of exploiting transform domain data in the SR task, especially for capturing more structural information in the images to avoid artifacts. In addition to this, and motivated by the promising performance of VDSR and residual nets in the super resolution task, we propose our Deep Wavelet network for super resolution (DWSR). Residual networks benefit from sparsity of input and output, and the fact that learning networks with sparse activations is much easier and more robust. This motivates us to exploit spatial wavelet coefficients which are naturally sparse. More importantly, using residuals (differences) of wavelet coefficients as training data pairs further enhances the sparsity of training data, resulting in more efficient learning of filters and activations. In other words, using wavelet coefficients encourages activation sparsity in middle layers as well as the output layer. Consequently, residuals for wavelet coefficients themselves become sparser and therefore easier for the network to learn. In addition to this, wavelet coefficients decompose the image into sub-bands which provide structural information depending on the types of wavelets used. For example, Haar wavelets provide vertical, horizontal and diagonal edges in wavelet sub-bands which can be used to infer more structural information about the image. Essentially our network uses complementary structural information from other sub-bands to predict the desired high-resolution structure in each sub-band.

The main contributions of this paper are the following:
1) To the best of our knowledge, the proposed DWSR is the first approach to combine the complementarity of information (into low and high frequency sub-bands) in the wavelet domain with a deep CNN. Specifically, wavelets promote sparsity and also provide structural information about the image.
2) In addition to a wavelet prediction network, we build on residual networks, which fit the wavelet coefficients well due to their sparsity-promoting nature, and we further enhance that sparsity by inferring residuals.
3) Our network has multiple input and output channels, which allows it to learn different structures at different levels of the image. This complementary structural information in wavelet coefficients helps in better reconstruction of SR results with fewer artifacts.
Extensive experimental results validate that our approach produces fewer artifacts around edges and outperforms many state-of-the-art methods.

2. 2D Discrete Wavelet Transformation (2dDWT)

To perform a 1D Discrete Wavelet Transformation, a signal x[n] ∈ R^N is first passed through a half band high-pass filter G_H[n] and a low-pass filter G_L[n], which are defined as (for Haar (“db1”) wavelet):

\[
G_H[n] = \begin{cases} 1, & n = 0 \\ -1, & n = 1 \\ 0, & \text{otherwise} \end{cases}
\qquad
G_L[n] = \begin{cases} 1, & n = 0, 1 \\ 0, & \text{otherwise} \end{cases}
\tag{1}
\]

After filtering, half of the samples can be eliminated according to the Nyquist rule, since the signal now has a frequency bandwidth of π/2 radians instead of π.

Any digital image x can be viewed as a 2D signal with index [n, m] where x[n, m] is the pixel value located at nth

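Figure 2: The procedure of 1-level 2dDWT decomposition.

Figure 3: The 2dDWT and 2dIDWT. A, B, C, D are four example pixels located in a 2×2 grid at the top left corner of HR image. a, b, c, d are four pixels from the top left corner of four sub-bands correspondingly. [Diagram: the HR image with pixels A, B (top row) and C, D (bottom row) is mapped by 2dDWT to the sub-bands LL, HL (top row) and LH, HH (bottom row), whose top left pixels are a, b and c, d; 2dIDWT maps the sub-bands back to the HR image.]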

column and mth row. The 2D signal x[n, m] can be treated as 1D signals among the rows x[n, :] at a given nth column and among the columns x[:, m] at a given mth row. A 1-level 2D wavelet transform of an image can be captured by following the procedure in Figure 2 along rows and columns, respectively. As mentioned earlier, we are using Haar kernels in this work.

An example of 1-level 2dDWT decomposition with Haar kernels is shown in Figure 3. The right part of Figure 3 is the notation of each sub-band of wavelet coefficients. It is clear that the 2dDWT captures the image details in four sub-bands: average (LL), vertical (HL), horizontal (LH) and diagonal (HH) information, which correspond to the coefficients of each wavelet sub-band. Note that after 2dDWT decomposition, the combination of the four sub-bands always has the same dimension as the original input image.

The 2D Inverse DWT (2dIDWT) can trace back the 2dDWT procedure by inverting the steps in Figure 2. This allows the prediction of wavelet coefficients to generate SR results. A detailed introduction to wavelet decomposition can be found in [39].

3. Proposed Method: Deep Wavelet Prediction for Super-resolution (DWSR)

SR can be viewed as the problem of restoring the details of the image given an input LR image. This viewpoint can be combined with wavelet decomposition. As shown in Figure 3, if we treat the input image as an LL output of 1-level 2dDWT, predicting the HL, LH and HH sub-bands of the 2dDWT will give us the missing details of the LL image. Then one can use 2dIDWT to gather the predicted details and generate the SR results. With Haar wavelet, the coefficients of 2dIDWT can be computed as:

\[
\begin{cases}
A = a + b + c + d \\
B = a - b + c - d \\
C = a + b - c - d \\
D = a - b - c + d
\end{cases}
\tag{2}
\]

where A, B, C, D and a, b, c, d represent the pixel values from the corresponding image/sub-bands.

Therefore, with the help of wavelet transformation, the SR problem becomes a wavelet coefficients prediction problem. In this paper, we propose a new deep learning based method to predict details of wavelet sub-bands from the input LR image. To the best of our knowledge, DWSR is the first deep learning based wavelet SR method.

3.1. Network Structure

The structure of the proposed network is illustrated in Figure 4. The proposed network has a deep structure similar to the residual network [37], with input and output layers of 4 channels each. While most deep learning based SR methods have only one channel for input and output, our network takes four input channels into consideration and produces four corresponding channels at the output. There are 64 filters of size 4 × 3 × 3 in the first layer and 4 filters of size 64 × 3 × 3 in the last layer. In the middle part, the network has N same-sized hidden layers with 64 × 3 × 3 × 64 filters each. The output of each layer, except the output layer, is fed into a ReLU activation function to generate a nonlinear activation map.

Usually, the CNN based SR methods only take valid regions into consideration while feeding forward the inputs. For example, in SRCNN [34], the network has three layers with filter sizes of 9 × 9, 1 × 1 and 5 × 5, from which we can compute the cropped out information width, which is (9 + 1 + 5 − 3) = 12 pixels. During the training process, SRCNN takes in sub-images of size 33 × 33, but only produces outputs of size 21 × 21. This procedure is

Figure 4: Wavelet prediction for SR network structure: the input layer takes four channels and the output layer produces four channels. The network body has N repeated same-sized layers with ReLU activation functions. One example of the input LRSB and network output ∆SB are plotted. The histogram of all coefficients in ∆SB is drawn to illustrate the sparsity of the outputs. [Diagram: LR → 2dDWT → LRSB {LA, LV, LH, LD} (4 input channels) → Input, Conv. 1, Conv. 2, ..., Conv. N, Output → ∆SB {∆A, ∆V, ∆H, ∆D} (4 output channels); LRSB + ∆SB = SRSB {SA, SV, SH, SD} → 2dIDWT → SR. Inset: histogram of ∆SB.]

unfavorable in our deep model since the final output could be too small to contain any useful information.

To solve this problem, we use zero padding at each layer to keep the outputs the same size as the inputs. In this manner, we can produce final outputs of the same size as the inputs. Later, the experiments show that with the special wavelet sparsity, the padding will not affect the quality of the SR results.

3.2. Training Procedure

To train the network, the low-resolution training images are enlarged by bicubic interpolation with the original downscale factor. Then the enlarged LR images are passed through the 2dDWT with Haar wavelet to produce four LR wavelet Sub-Bands (LRSB), denoted as:

\[
\mathrm{LRSB} = \{LA, LV, LH, LD\} := \mathrm{2dDWT}\{LR\} \tag{3}
\]

where LA, LV, LH and LD are sub-bands containing wavelet coefficients for average, vertical, horizontal and diagonal details of the LR image, respectively. 2dDWT{LR} denotes the 2dDWT of the LR image.

The transformation is also applied on the corresponding HR training images to produce four HR wavelet Sub-Bands (HRSB):

\[
\mathrm{HRSB} = \{HA, HV, HH, HD\} := \mathrm{2dDWT}\{HR\} \tag{4}
\]

where HA, HV, HH and HD denote the sub-bands containing wavelet coefficients for average, vertical, horizontal and diagonal details of the HR image, respectively.

Then the difference ∆SB (residual) between corresponding LRSB and HRSB is computed as:

\[
\begin{aligned}
\Delta\mathrm{SB} &= \mathrm{HRSB} - \mathrm{LRSB} \\
&= \{HA - LA,\; HV - LV,\; HH - LH,\; HD - LD\} \\
&= \{\Delta A, \Delta V, \Delta H, \Delta D\}
\end{aligned}
\tag{5}
\]

∆SB is the target that we desire the network to produce with input LRSB. The feeding forward procedure is denoted as f(LRSB).

The cost of the network outputs is defined as:

\[
\mathrm{cost} = \frac{1}{2} \left\| \Delta\mathrm{SB} - f(\mathrm{LRSB}) \right\|_2^2 \tag{6}
\]

The weights and biases can be denoted as (Θ, b). Then the optimization problem is defined as:

\[
(\Theta, b) = \arg\min_{\Theta, b} \; \frac{1}{2} \left\| \Delta\mathrm{SB} - f(\mathrm{LRSB}) \right\|_2^2 + \lambda \left\| \Theta \right\|_2^2 \tag{7}
\]

where ‖Θ‖₂² is the standard weight decay regularization with parameter λ.

Essentially, we want our network to learn the differences between wavelet sub-bands of LR and HR images. By adding these differences (residual) to the input wavelet sub-bands, we will get the final super resolution wavelet sub-bands.
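
The training targets of Equations (3) to (5) and the cost of Equation (6) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the released implementation: the function names are hypothetical, the forward transform is scaled by 1/4 (an assumption, chosen so that the unscaled inverse of Equation (2) reconstructs the image exactly), the sub-band order {average, vertical, horizontal, diagonal} follows the a, b, c, d layout of Figure 3, and images are assumed to be single-channel arrays with even height and width.

```python
import numpy as np


def haar_dwt2(img):
    """1-level 2D Haar analysis on non-overlapping 2x2 blocks.

    Returns an array of shape (4, H/2, W/2) ordered as
    [average, vertical, horizontal, diagonal].
    The 1/4 scaling is an assumption that makes Eq. (2) the exact inverse.
    """
    img = np.asarray(img, dtype=float)
    if img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("height and width must be even")
    A = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    B = img[0::2, 1::2]  # top-right
    C = img[1::2, 0::2]  # bottom-left
    D = img[1::2, 1::2]  # bottom-right
    a = (A + B + C + D) / 4.0
    b = (A - B + C - D) / 4.0
    c = (A + B - C - D) / 4.0
    d = (A - B - C + D) / 4.0
    return np.stack([a, b, c, d])


def haar_idwt2(sub_bands):
    """Inverse transform following Eq. (2) term by term."""
    a, b, c, d = sub_bands
    out = np.empty((2 * a.shape[0], 2 * a.shape[1]), dtype=float)
    out[0::2, 0::2] = a + b + c + d  # A
    out[0::2, 1::2] = a - b + c - d  # B
    out[1::2, 0::2] = a + b - c - d  # C
    out[1::2, 1::2] = a - b - c + d  # D
    return out


def residual_target(lr_enlarged, hr):
    """Return (LRSB, delta_SB) as in Eqs. (3)-(5).

    lr_enlarged is the bicubic-enlarged LR image, the same size as hr.
    """
    lrsb = haar_dwt2(lr_enlarged)
    hrsb = haar_dwt2(hr)
    return lrsb, hrsb - lrsb


def cost(delta_sb, prediction):
    """Data term of Eq. (6): half the squared l2 distance."""
    return 0.5 * np.sum((delta_sb - prediction) ** 2)
```

With a perfect prediction f(LRSB) = ∆SB the cost is zero, and `haar_idwt2(lrsb + delta_sb)` returns the HR image, which is the property the residual formulation relies on.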

3.3. Generating SR Results

To produce SR results, the bicubic enlarged LR input images are transformed by 2dDWT to produce LRSB as in Equation (3). Then LRSB is fed forward through the trained network to produce ∆SB. Adding LRSB and ∆SB together generates four SR wavelet Sub-Bands (SRSB), denoted as:

\[
\begin{aligned}
\mathrm{SRSB} &= \{SA, SV, SH, SD\} \\
&= \mathrm{LRSB} + \Delta\mathrm{SB} \\
&= \{LA + \Delta A,\; LV + \Delta V,\; LH + \Delta H,\; LD + \Delta D\}
\end{aligned}
\tag{8}
\]

Finally, 2dIDWT generates the SR image results:

\[
SR = \mathrm{2dIDWT}\{\mathrm{SRSB}\} \tag{9}
\]

3.4. Understanding Wavelet Prediction

Training in the wavelet domain can boost the training and testing procedures. Using wavelet coefficients encourages activation sparsity in hidden layers as well as the output layer. Moreover, by using residuals, wavelet coefficients themselves become sparser, and it is easier for the network to learn sparse maps rather than dense ones. The histogram in Figure 4 illustrates the sparse distribution of all the ∆SB coefficients. This high level of sparsity further reduces the training time required for the network, resulting in more accurate super resolution results.

In addition, training a deep network amounts to minimizing a cost function, which is usually defined by the l2 norm. This particular norm is used because it homogeneously describes the quality of the output image compared to the ground truth. The image quality is then quantified by the assessment metric PSNR. However, SSIM [40] has been proven to be a conceptually better way to describe the quality of an image (compared to the target), which unfortunately can not be easily optimized. Nearly all the SR methods use SSIM as a final testing metric, but it is not emphasized in the training procedure.

However, DWSR encourages the network to produce more structural details. As shown in Figure 4, the SRSB has more defined structural details than LRSB after adding the predicted ∆SB. With Haar wavelet, every fine detail has a different intensity of coefficients spreading in all four sub-bands. Overlaying four sub-bands together can enhance the structural details the network takes in by providing additional relationships between structural details. At a given spatial location, the first sub-band gives the general information of the image, and the following three detailed sub-bands provide horizontal/vertical/diagonal structural information to the network at this location. The structural correlation information between the sub-bands helps the network weights form in a way that emphasizes the fine details.

By taking more structural similarity into account while training, the proposed network increases both the PSNR and SSIM assessments to deliver a visually improved SR result. Moreover, benefiting from wavelet domain information, DWSR produces SR results with fewer artifacts, while other methods suffer from misleading artificial blocks introduced by bicubic interpolation (see Section 4.5).

4. Experimental Evaluation

4.1. Data Preparation

During the training phase, the NTIRE [41] 800 training images are used without augmentation. The NTIRE HR images {Y_i}_{i=1}^{800} are down-sampled by the factor of c. Then the down-sampled images are enlarged using bicubic interpolation by the same factor c to form the LR training images {X_i}_{i=1}^{800}. Note that the image Y_i is cropped so that its width and height are multiples of c. Therefore X_i and Y_i have the same size, where Y_i represents the HR training image and X_i represents the corresponding LR training image. X_i and Y_i are then cropped to 41 × 41 pixel sub-images with 10 pixels overlapping for training.

For each sub-image from X_i, the LRSB is computed as in Equation (3). For each corresponding sub-image from Y_i, the HRSB is computed as in Equation (4). Then the residual ∆SB is computed as in Equation (5).

During the testing phase, several standard testing data sets are used. Specifically, Set5 [13], Set14 [42], BSD100 [43] and Urban100 [36] are used to evaluate our proposed method DWSR.

Both training and testing phases of DWSR only utilize the luminance channel information. For color images, the Cr and Cb channels are directly enlarged by bicubic interpolation from LR images. These enlarged chrominance channels are combined with the SR luminance channel to produce color SR results.

4.2. Training Settings

During the training process, several training techniques are used. The gradients are clipped to 0.01 by the norm clipping option in the training package. We use the Adam optimizer as described in [44] to update Θ and b. The initial learning rate is 0.01 and decreases by 25% every 20 epochs. The weight regulator is set to 1 × 10^−3 to prevent over-fitting. Other than input and output layers, the DWSR has N = 10 same-sized convolutional hidden layers with filter size of 64 × 3 × 3 × 64. This configuration results in a network with only half of the parameters in VDSR [38].

The training scheme is implemented with the TensorFlow [45] package with the Python 2.7 interface. We use one GTX TITAN X GPU (12 GB) for both the training and testing.
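
As a rough check on the training settings, the sketch below counts convolution weights (biases ignored) for the DWSR configuration described in Sections 3.1 and 4.2 and for a VDSR-like network, and evaluates the learning-rate schedule. Several points are assumptions rather than statements from this paper: the VDSR layout used here (20 layers of 3 × 3 filters, 64 feature maps, one input and one output channel) follows the usual description of [38], "decreases by 25% every 20 epochs" is read as a multiplicative step decay, and all function names are hypothetical.

```python
def conv_weights(in_channels, out_channels, kernel=3):
    """Number of weights in one convolution layer (biases ignored)."""
    return in_channels * kernel * kernel * out_channels


def dwsr_weights(n_hidden=10, width=64, channels=4):
    """Input layer (4 -> 64), N hidden layers (64 -> 64), output layer (64 -> 4)."""
    return (conv_weights(channels, width)
            + n_hidden * conv_weights(width, width)
            + conv_weights(width, channels))


def vdsr_weights(depth=20, width=64):
    """Assumed VDSR layout: 1 -> 64, then (depth - 2) layers 64 -> 64, then 64 -> 1."""
    return (conv_weights(1, width)
            + (depth - 2) * conv_weights(width, width)
            + conv_weights(width, 1))


def learning_rate(epoch, base=0.01, drop=0.25, every=20):
    """Step decay: the rate is multiplied by (1 - drop) after each `every` epochs."""
    return base * (1.0 - drop) ** (epoch // every)


if __name__ == "__main__":
    print("DWSR weights:", dwsr_weights())
    print("VDSR weights (assumed layout):", vdsr_weights())
    print("ratio:", dwsr_weights() / vdsr_weights())
    print("learning rate at epochs 0, 20, 99:",
          learning_rate(0), learning_rate(20), learning_rate(99))
```

Under these assumptions the DWSR weight count comes out at a little over half of the VDSR count, and the learning rate falls to roughly a third of its initial value by the end of 100 epochs.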

Figure 5: Test image No.19 in Urban100 data set. From top left to bottom right are results of: ground truth, bicubic, ScSR, A+, SelfEx, SRCNN, FSRCNN, SCN, VDSR, DWSR. The numeral assessments are labeled as (PSNR, SSIM). DWSR (bottom right) produces more defined structures with better SSIM and PSNR than state-of-the-art methods.
[Panels, top row: Original; Bicubic (19.0292, 0.6432); ScSR (19.0588, 0.65216); A+ (20.1155, 0.72667); SelfEx (20.6903, 0.7603). Bottom row: SRCNN (20.0734, 0.7192); FSRCNN (20.0687, 0.7172); SCN (20.0824, 0.7220); VDSR (20.8673, 0.7690); DWSR (21.1639, 0.7776).]
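
The PSNR values quoted in the figure panels follow the standard definition, PSNR = 10 log10(peak² / MSE). The sketch below is a minimal version under assumptions that are not stated in the paper: the peak value is 255 (8-bit luminance), no border pixels are cropped before evaluation, and the function name is hypothetical. SSIM [40] is more involved and is not sketched here.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because the evaluation uses the luminance channel only, both arguments would be the Y channels of the ground truth and of the SR result.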

4.3. Convergence Speed

Since the gradients are clipped to a numerically large norm, with the high initial learning rate, DWSR reaches convergence very quickly and produces practical results (see the evaluations reported below). Figure 6 shows the convergence process during the training by plotting the evaluation of cost over training epochs. After 100 epochs, the network is fully converged and (Θ, b) is used for testing. The training procedure for 100 epochs takes about 4 hours to finish with one GPU.

Figure 6: The evaluations of cost function (6) over training epochs for training scale factor 4. At 100 epochs, the network training converges. [Plot: cost evaluation versus epoch, with a logarithmic epoch axis.]

4.4. Comparison with State-of-the-Art

We compare DWSR with several state-of-the-art methods and use Bicubic as the baseline reference^1. ScSR [4] and A+ [15] are selected to represent the sparse coding based and dictionary learning based methods. For deep learning based methods, DWSR is compared with SCN [46], SelfEx [36], FSRCNN [47], SRCNN [34] and VDSR [38]. We use publicly published testing codes from different authors; the tests are carried out on GPU as mentioned above for deep learning based methods. For FSRCNN, SRCNN and sparse based methods we use their public CPU testing codes.

^1 Please refer to http://signal.ee.psu.edu/DWSR.html for high quality color images and to download our code.

Table 1 shows the summarized results of PSNR and SSIM evaluations. The best results are shown in red and second best are shown in blue. DWSR has a clear

Figure 7: Test image No.92 in Urban100 data set. From top left to bottom right are results of: ground truth, bicubic, ScSR, A+, SelfEx, SRCNN, FSRCNN, SCN, VDSR, DWSR. The numeral assessments are labeled as (PSNR, SSIM). DWSR (bottom right) produces more fine structures with better SSIM and PSNR than state-of-the-art methods. Also note that DWSR does not produce artifact diagonal edges in the red circled region.
[Panels, top row: Original; Bicubic (16.5566, 0.4357); ScSR (16.5806, 0.43733); A+ (17.3281, 0.52305); SelfEx (17.8706, 0.5642). Bottom row: SRCNN (17.3284, 0.5176); FSRCNN (17.1200, 0.5003); SCN (17.4754, 0.5424); VDSR (18.1470, 0.6016); DWSR (18.3464, 0.6141).]

advantage on the large scaling factors owing to its reliance on incorporating the structural information and correlation from wavelet transform sub-bands. For large scale factors, DWSR delivers better results than the best known method (VDSR) with only half the parameters, benefiting from training in the wavelet feature domain.

Table 2 shows the execution time of different methods. Since DWSR has only half of the parameters of the most parameterized method (VDSR) and benefits from really sparse network activations, DWSR takes much less time to apply super-resolution. For 2K images in the NTIRE testing set, DWSR takes less than 0.1s to produce the outputs of the network, including the loading time from GPU.

Figure 5 shows SR results of a testing image from the Urban100 dataset with scale factor 4. Overall, deep learning based methods produce better results than sparse coding based and dictionary learning based methods. Compared to SRCNN, DWSR produces more defined structures, benefiting from training in the wavelet domain. Compared to VDSR, DWSR results give higher PSNR and SSIM values using less than half the parameters of VDSR and at a faster speed. Visually, the edges are more enhanced in DWSR than in other state-of-the-art methods, as is clearly illustrated in the enlarged areas. The image generated by DWSR has fewer of the artifacts that are caused by the initial bicubic interpolation of the LR image, and it has sharper edges which are consistent with the ground truth image. Also quite clearly, DWSR has an advantage in reconstructing edges, especially diagonal ones, because this structural information is prominently emphasized by the sub-bands of Haar wavelet coefficients.

4.5. Large Scaling Factor SR Artifacts

Figure 7 illustrates SR results from different methods with scale factor 4. DWSR produces more enhanced details than state-of-the-art methods. Moreover, since the scale factor is too large for bicubic interpolation to keep the structural information, some artificial blocks are introduced during the bicubic enlargement. Since nearly all the deep learning based methods use bicubic interpolation as the starting point, these artificial blocks get more pronounced during the SR enhancements. Eventually,

Table 1: PSNR and SSIM result comparisons with other approaches for 4 different datasets. Each cell reads PSNR (dB)/SSIM.

Dataset   Scale  Bicubic       ScSR          A+            SelfEx        FSRCNN        SRCNN         VDSR          DWSR
                 [Baseline]    [TIP 10]      [ACCV 14]     [CVPR 15]     [ECCV 16]     [PAMI 16]     [CVPR 16]     [ours]
Set5      x2     33.64/0.9292  35.78/0.9485  36.55/0.9544  36.47/0.9538  36.94/0.9558  36.66/0.9542  37.52/0.9586  37.43/0.9568
Set5      x3     30.39/0.8678  31.34/0.8869  32.58/0.9088  32.57/0.9092  33.06/0.9140  32.75/0.9090  33.66/0.9212  33.82/0.9215
Set5      x4     28.42/0.8101  29.07/0.8263  30.27/0.8605  30.32/0.8640  30.55/0.8657  30.48/0.8628  31.35/0.8820  31.39/0.8833
Set14     x2     30.22/0.8683  31.64/0.8940  32.29/0.9055  32.24/0.9032  32.54/0.9088  32.42/0.9063  33.02/0.9102  33.07/0.9106
Set14     x3     27.53/0.7737  28.19/0.7977  29.13/0.8188  29.16/0.8196  29.37/0.8242  29.28/0.8209  29.77/0.8308  29.83/0.8308
Set14     x4     25.99/0.7023  26.40/0.7218  27.33/0.7489  27.40/0.7518  27.50/0.7535  27.40/0.7503  28.01/0.7664  28.04/0.7669
B100      x2     29.55/0.8425  30.77/0.8744  31.21/0.8864  31.18/0.8855  31.66/0.8920  31.36/0.8879  31.85/0.8960  31.80/0.8940
B100      x4     25.96/0.6672  26.61/0.6983  26.82/0.7087  26.84/0.7106  26.92/0.7201  26.84/0.7101  27.23/0.7238  27.25/0.7240
Urban100  x2     26.66/0.8408  28.26/0.8828  29.20/0.8938  29.54/0.8967  29.87/0.9010  29.50/0.8946  30.76/0.9140  30.46/0.9162
Urban100  x4     23.14/0.6573  24.02/0.7024  24.32/0.7186  24.78/0.7374  24.61/0.7270  24.52/0.7221  25.18/0.7524  25.26/0.7548

Table 2: Results of the execution time comparison to other approaches (running time in seconds).

Dataset   Scale  ScSR      A+         SelfEx     FSRCNN     SRCNN      VDSR       DWSR
                 [TIP 10]  [ACCV 14]  [CVPR 15]  [ECCV 16]  [PAMI 16]  [CVPR 16]  [ours]
Set5      x2     80.22     0.58       45.76      0.30       2.56       0.13       0.06
Set5      x3     82.67     0.32       32.28      0.23       2.63       0.13       0.05
Set5      x4     84.88     0.24       29.32      0.26       2.16       0.12       0.06
Set14     x2     86.12     0.85       112.3      0.32       4.52       0.25       0.07
Set14     x3     91.52     0.59       76.02      0.42       4.25       0.26       0.08
Set14     x4     89.25     0.32       66.06      0.39       4.68       0.25       0.07
B100      x2     98.03     0.60       62.02      0.32       2.65       0.16       0.09
B100      x4     100.43    0.26       36.67      0.39       2.98       0.26       0.12
Urban100  x2     1021.06   2.96       663.66     2.23       23.2       0.98       0.33
Urban100  x4     1282.33   1.21       662.68     2.35       25.6       1.07       0.38

the enhancements on the artificial blocks produce artificial edges in the SR results. For instance, in Figure 7, these blocks and artificial edges are labeled within red circles for bicubic and VDSR. The diagonal edges are introduced by SR enhancement on the artificial blocks from bicubic enlargement, and are not present in the ground truth image.

However, DWSR utilizes wavelet coefficients to take more structural correlation information into account, which does not enhance the artificial blocks and produces edges more similar to the ground truth.

5. Conclusion

Our work presents a deep wavelet super resolution (DWSR) technique that recovers the “missing details” by using (low-resolution) wavelet sub-bands as inputs. DWSR is significantly economical in the number of parameters compared to most state-of-the-art methods and yet achieves competitive or better results. We contend that this is because wavelets provide an image representation that naturally simplifies the mapping to be learned. While we used the Haar wavelet, the effects of different wavelet bases can be examined in future work. Of particular interest could be to learn the “optimal” wavelet basis for the SR task.

6. Acknowledgment

This work is supported by NSF Career Award to V. Monga.

References

[1] S. C. Park, M. K. Park, and M. G. Kang, “Super-resolution image reconstruction: a technical overview,” Signal Processing Magazine, IEEE, vol. 20, no. 3, pp. 21–36, 2003.
[2] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar, “Fast and robust multiframe super resolution,” Image processing, IEEE Transactions on, vol. 13, no. 10, pp. 1327–1344, 2004.
[3] J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution as sparse representation of raw image patches,” in Computer Vision and Pattern Recognition, IEEE Conference on, pp. 1–8, 2008.
[4] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” Image Processing, IEEE Transactions on, vol. 19, no. 11, pp. 2861–2873, 2010.
[5] D. Glasner, S. Bagon, and M. Irani, “Super-resolution from a single image,” in Computer Vision, IEEE International Conference on, pp. 349–356, 2009.
[6] G. Freedman and R. Fattal, “Image and video upscaling from local self-examples,” ACM Trans. Graph., vol. 28, no. 3, pp. 1–10, 2010.
[7] J. Yang, Z. Lin, and S. Cohen, “Fast image super-resolution based on in-place example regression,” in Computer Vision and Pattern Recognition, IEEE Conference on, pp. 1059–1066, 2013.
[8] S. Minaee, A. Abdolrashidi, and Y. Wang, “Screen content image segmentation using sparse-smooth decomposition,” arXiv preprint arXiv:1511.06911, 2015.

[9] Z. Cui, H. Chang, S. Shan, B. Zhong, and X. Chen, “Deep network cascade for image super-resolution,” in Computer Vision, ECCV, pp. 49–64, Springer, 2014.
[10] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael, “Learning low-level vision,” International journal of computer vision, vol. 40, no. 1, pp. 25–47, 2000.
[11] H. Chang, D.-Y. Yeung, and Y. Xiong, “Super-resolution through neighbor embedding,” in Computer Vision and Pattern Recognition, IEEE Conference on, vol. 1, pp. I–I, 2004.
[12] K. I. Kim and Y. Kwon, “Single-image super-resolution using sparse regression and natural image prior,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 32, no. 6, pp. 1127–1133, 2010.
[13] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” 2012.
[14] R. Timofte, V. De Smet, and L. Van Gool, “Anchored neighborhood regression for fast example-based super-resolution,” in Computer Vision, IEEE International Conference on, pp. 1920–1927, 2013.
[15] R. Timofte, V. De Smet, and L. Van Gool, “A+: Adjusted anchored neighborhood regression for fast super-resolution,” in Computer Vision, ACCV, pp. 111–126, Springer, 2014.
[16] K. Jia, X. Wang, and X. Tang, “Image transformation based on learning dictionaries across image spaces,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35, no. 2, pp. 367–380, 2013.
[17] Z. Wang, Y. Yang, Z. Wang, S. Chang, W. Han, J. Yang, and T. S. Huang, “Self-tuned deep super resolution,” arXiv preprint arXiv:1504.05632, 2015.
[18] M. E.-S. Wahed, “Image enhancement using second generation wavelet super resolution,” International Journal of Physical Sciences, vol. 2, no. 6, pp. 149–158, 2007.
[19] H. Ji and C. Fermüller, “Robust wavelet-based super-resolution reconstruction: theory and algorithm,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 4, pp. 649–660, 2009.
[20] H. Demirel, S. Izadpanahi, and G. Anbarjafari, “Improved motion-based localized super resolution technique using discrete wavelet transform for low resolution video enhancement,” in Signal Processing, IEEE European Conference on, pp. 1097–1101, 2009.
[21] M. D. Robinson, C. A. Toth, J. Y. Lo, and S. Farsiu, “Efficient fourier-wavelet super-resolution,” Image Processing, IEEE Transactions on, vol. 19, no. 10, pp. 2669–2681, 2010.
[22] G. Anbarjafari and H. Demirel, “Image super resolution based on interpolation of wavelet domain high frequency subbands and the spatial domain input image,” ETRI journal, vol. 32, no. 3, pp. 390–394, 2010.
[23] N. Nguyen and P. Milanfar, “An efficient wavelet-based algorithm for image superresolution,” in Image Processing, IEEE International Conference on, vol. 2, pp. 351–354, 2000.
[24] C. Jiji, M. V. Joshi, and S. Chaudhuri, “Single-frame image super-resolution using learned wavelet coefficients,” International journal of Imaging systems and Technology, vol. 14, no. 3, pp. 105–112, 2004.
[25] S. Mallat and G. Yu, “Super-resolution with sparse mixing estimators,” Image Processing, IEEE Transactions on, vol. 19, no. 11, pp. 2889–2900, 2010.
[26] M. F. Tappen, B. C. Russell, and W. T. Freeman, “Exploiting the sparse derivative prior for super-resolution and image demosaicing,” in Statistical and Computational Theories of Vision, IEEE Workshop on, Citeseer, 2003.
[27] W. Dong, L. Zhang, G. Shi, and X. Wu, “Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization,” Image Processing, IEEE Transactions on, vol. 20, no. 7, pp. 1838–1857, 2011.
[28] K. Kinebuchi, D. D. Muresan, and T. W. Parks, “Image interpolation using wavelet based hidden markov trees,” in Acoustics, Speech, and Signal Processing, IEEE International Conference on, vol. 3, pp. 1957–1960, 2001.
[29] S. Zhao, H. Han, and S. Peng, “Wavelet-domain hmt-based image super-resolution,” in Image Processing, IEEE International Conference on, vol. 2, pp. II–953, 2003.
[30] H. Chavez-Roman and V. Ponomaryov, “Super resolution image generation using wavelet domain interpolation with edge extraction via a sparse representation,” IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1777–1781, 2014.
[31] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[32] Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, et al., “Greedy layer-wise training of deep networks,” Advances in neural information processing systems, vol. 19, p. 153, 2007.
[33] C. Poultney, S. Chopra, Y. L. Cun, et al., “Efficient learning of sparse representations with an energy-based model,” in Advances in neural information processing systems, pp. 1137–1144, 2006.
[34] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in Computer Vision, ECCV, pp. 184–199, Springer, 2014.
[35] T. Guo, H. S. Mousavi, and V. Monga, “Deep learning based image super-resolution with coupled backpropagation,” in Signal and Information Processing, IEEE Global Conference on, pp. 237–241, 2016.
[36] J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in Computer Vision and Pattern Recognition, IEEE Conference on, pp. 5197–5206, 2015.
[37] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Computer Vision and Pattern Recognition, IEEE Conference on, pp. 770–778, 2016.
[38] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Computer Vision and Pattern Recognition, IEEE Conference on, June 2016.

[39] S. Mallat, A wavelet tour of signal processing: the sparse
way. Academic press, 2008.
[40] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli,
“Image quality assessment: from error visibility to structural
similarity,” Image Processing, IEEE Transactions on,
vol. 13, no. 4, pp. 600–612, 2004.
[41] R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang,
L. Zhang, et al., “Ntire 2017 challenge on single image
super-resolution: Methods and results,” in Computer Vision
and Pattern Recognition Workshops, IEEE Conference on,
July 2017.
[42] R. Zeyde, M. Elad, and M. Protter, “On single image scale-
up using sparse-representations,” in International conference
on curves and surfaces, pp. 711–730, Springer, 2010.
[43] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database
of human segmented natural images and its application
to evaluating segmentation algorithms and measuring
ecological statistics,” in Proc. 8th Int’l Conf. Computer
Vision, vol. 2, pp. 416–423, July 2001.
[44] D. Kingma and J. Ba, “Adam: A method for stochastic
optimization,” arXiv preprint arXiv:1412.6980, 2014.
[45] M. Abadi, A. Agarwal, P. B., et al., “TensorFlow: Large-
scale machine learning on heterogeneous systems,” 2015.
Software available from tensorflow.org.
[46] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep
networks for image super-resolution with sparse prior,”
in Computer Vision, IEEE International Conference on,
pp. 370–378, 2015.
[47] C. Dong, C. C. Loy, and X. Tang, “Accelerating the
super-resolution convolutional neural network,” in Computer
Vision, ECCV, pp. 391–407, Springer, 2016.
