Image Super-Resolution As Sparse Representation of Raw Image Patches
Figure 1. Reconstruction of a raccoon face with magnification factor 2. Left: result by our method. Right: the original image. There is little noticeable difference.

Our approach is motivated by recent results in sparse signal representation, which ensure that linear relationships among high-resolution signals can be precisely recovered from their low-dimensional projections [3, 9].

To be more precise, let D ∈ R^{n×K} be an overcomplete dictionary of K prototype signal-atoms, and suppose a signal x ∈ R^n can be represented as a sparse linear combination of these atoms. That is, the signal vector x can be written as x = D α_0, where α_0 ∈ R^K is a vector with very few (≪ K) nonzero entries. In practice, we might observe only a small set of measurements y of x:

    y = Lx = LD α_0,    (1)

where L ∈ R^{k×n} with k < n. In the super-resolution context, x is a high-resolution image (patch), while y is its low-resolution version (or features extracted from it). If the dictionary D is overcomplete, the equation x = Dα is underdetermined for the unknown coefficients α. The equation y = LDα is even more dramatically underdetermined. Nevertheless, under mild conditions, the sparsest solution α_0 to this equation is unique. Furthermore, if D satisfies an appropriate near-isometry condition, then for a wide variety of matrices L, any sufficiently sparse linear representation of a high-resolution image x in terms of D can be recovered (almost) perfectly from the low-resolution image [9, 21]. Figure 1 shows an example that demonstrates the capabilities of our method derived from this principle. Even for this complicated texture, sparse representation recovers a visually appealing reconstruction of the original signal.
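This recovery claim can be checked numerically. The following sketch (our illustration, not the authors' code) poses the ℓ1 minimization as the linear program alluded to above, using SciPy's linprog on randomly generated D, L, and α_0; all dimensions are arbitrary choices.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, K, k, s = 64, 256, 32, 5          # signal dim, dictionary size, measurements, sparsity

D = rng.standard_normal((n, K))      # overcomplete dictionary
L = rng.standard_normal((k, n))      # low-dimensional projection of the signal
alpha0 = np.zeros(K)
alpha0[rng.choice(K, s, replace=False)] = rng.standard_normal(s)
y = L @ D @ alpha0                   # observed measurements, Eq. (1)

# Basis pursuit: min ||alpha||_1 s.t. LD alpha = y, written as a linear program
# by splitting alpha = a_plus - a_minus with a_plus, a_minus >= 0.
A = L @ D
res = linprog(c=np.ones(2 * K),
              A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=(0, None), method="highs")
alpha_hat = res.x[:K] - res.x[K:]
print("recovery error:", np.linalg.norm(alpha_hat - alpha0))

With these dimensions the recovered coefficients typically match α_0 to numerical precision, which is the behavior the super-resolution prior exploits.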
Recently sparse representation has been applied to many other related inverse problems in image processing, such as compression, denoising [10], and restoration [17], often improving on the state-of-the-art. For example in [10], the authors use the K-SVD algorithm [1] to learn an overcomplete dictionary from natural image patches and successfully apply it to the image denoising problem. In our setting, we do not directly compute the sparse representation of the high-resolution patch. Instead, we will work with two coupled dictionaries, D_h for high-resolution patches, and D_ℓ = L D_h for low-resolution patches. The sparse representation of a low-resolution patch in terms of D_ℓ will be directly used to recover the corresponding high-resolution patch from D_h. We obtain a locally consistent solution by allowing patches to overlap and demanding that the reconstructed high-resolution patches agree on the overlapped areas. Finally, we apply global optimization to eliminate the reconstruction errors in the recovered high-resolution image from local sparse representation, suppressing noise and ensuring consistency with the low-resolution input.

Compared to the aforementioned learning-based methods, our algorithm requires a much smaller database. The online recovery of the sparse representation uses the low-resolution dictionary only – the high-resolution dictionary is used only to calculate the final high-resolution image. The computation, mainly based on linear programming, is reasonably efficient and scalable. In addition, the computed sparse representation adaptively selects the most relevant patches in the dictionary to best represent each patch of the given low-resolution image. This leads to superior performance, both qualitatively and quantitatively, compared to methods [5] that use a fixed number of nearest neighbors, generating sharper edges and clearer textures.

The remainder of this paper is organized as follows. Section 2 details our formulation and solution to the image super-resolution problem based on sparse representation. In Section 3, we discuss how to prepare a dictionary from sample images and what features to use. Various experimental results in Section 4 demonstrate the efficacy of sparsity as a prior for image super-resolution.

2. Super-resolution from Sparsity

The single-image super-resolution problem asks: given a low-resolution image Y, recover a higher-resolution image X of the same scene. The fundamental constraint is that the recovered X should be consistent with the input Y:

Reconstruction constraint. The observed low-resolution image Y is a blurred and downsampled version of the solution X:

    Y = DHX    (2)

Here, H represents a blurring filter, and D the downsampling operator.

Super-resolution remains extremely ill-posed, since for a given low-resolution input Y, infinitely many high-resolution images X satisfy the above reconstruction constraint. We regularize the problem via the following prior on small patches x of X:

Sparse representation prior. The patches x of the high-resolution image X can be represented as a sparse linear combination in a dictionary D_h of high-resolution patches sampled from training images:¹

    x ≈ D_h α  for some α ∈ R^K with ‖α‖_0 ≪ K.    (3)

¹ Similar mechanisms – sparse coding with an overcomplete dictionary – are also believed to be employed by the human visual system [19].
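For concreteness, the degradation in (2) can be simulated as below. This is only a sketch under assumptions the text does not fix: a Gaussian kernel for the blur H and decimation by an integer factor for the downsampling operator D.

import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(X, scale=3, blur_sigma=1.0):
    """Apply Y = DHX: blur the high-resolution image, then downsample.

    The Gaussian blur and simple decimation are illustrative choices;
    Eq. (2) does not specify H and D further."""
    blurred = gaussian_filter(X, sigma=blur_sigma)   # blurring filter H
    return blurred[::scale, ::scale]                 # downsampling operator D

# Example: a random "high-resolution" image and its low-resolution observation.
X = np.random.rand(120, 120)
Y = degrade(X, scale=3)
print(X.shape, "->", Y.shape)                        # (120, 120) -> (40, 40)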
To address the super-resolution problem using the sparse representation prior, we divide the problem into two steps. First, using the sparse prior (3), we find the sparse representation for each local patch, respecting spatial compatibility between neighbors. Next, using the result from this local sparse representation, we further regularize and refine the entire image using the reconstruction constraint (2). In this strategy, a local model from the sparse prior is used to recover lost high-frequency content for local details. The global model from the reconstruction constraint is then applied to remove possible artifacts from the first step and make the image more consistent and natural.

2.1. Local Model from Sparse Representation

As in the patch-based methods mentioned previously, we try to infer the high-resolution patch for each low-resolution patch from the input. For this local model, we have two dictionaries D_ℓ and D_h: D_h is composed of high-resolution patches and D_ℓ is composed of corresponding low-resolution patches. We subtract the mean pixel value for each patch, so that the dictionary represents image textures rather than absolute intensities.

For each input low-resolution patch y, we find a sparse representation with respect to D_ℓ. The corresponding high-resolution patches in D_h will be combined according to these coefficients to generate the output high-resolution patch x. The problem of finding the sparsest representation of y can be formulated as:

    min ‖α‖_0  s.t.  ‖F D_ℓ α − F y‖_2^2 ≤ ε,    (4)

where F is a (linear) feature extraction operator. The main role of F in (4) is to provide a perceptually meaningful constraint² on how closely the coefficients α must approximate y. We will discuss the choice of F in Section 3.

² Traditionally, one would seek the sparsest α s.t. ‖D_ℓ α − y‖_2 ≤ ε. For super-resolution, it is more appropriate to replace this 2-norm with a quadratic norm ‖·‖_{FᵀF} that penalizes visually salient high-frequency errors.

Although the optimization problem (4) is NP-hard in general, recent results [7, 8] indicate that as long as the desired coefficients α are sufficiently sparse, they can be efficiently recovered by instead minimizing the ℓ1-norm, as follows:

    min ‖α‖_1  s.t.  ‖F D_ℓ α − F y‖_2^2 ≤ ε.    (5)

Lagrange multipliers offer an equivalent formulation,

    min  λ‖α‖_1 + (1/2)‖F D_ℓ α − F y‖_2^2,    (6)

where the parameter λ balances sparsity of the solution and fidelity of the approximation to y. Notice that this is essentially a linear regression regularized with an ℓ1-norm on the coefficients, known in the statistical literature as the Lasso [24].
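The paper relies on standard ℓ1 solvers (linear programming / Lasso) for (6). Purely as an illustration, the unconstrained form can also be minimized by iterative soft-thresholding (ISTA); in the sketch below, A stands for F D_ℓ and b for F y, and the step size and iteration count are our own choices.

import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(A, b, lam=0.1, n_iter=500):
    """Minimize lam*||alpha||_1 + 0.5*||A alpha - b||_2^2 (Eq. 6) by ISTA.

    A plays the role of F*D_l and b the role of F*y; this is a sketch,
    not the solver used in the paper."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of A^T A
    alpha = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ alpha - b)           # gradient of the quadratic term
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha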
Solving (6) individually for each patch does not guarantee compatibility between adjacent patches. We enforce compatibility between adjacent patches using a one-pass algorithm similar to that of [13].³ The patches are processed in raster-scan order in the image, from left to right and top to bottom. We modify (5) so that the super-resolution reconstruction D_h α of patch y is constrained to closely agree with the previously computed adjacent high-resolution patches. The resulting optimization problem is

    min ‖α‖_1  s.t.  ‖F D_ℓ α − F y‖_2^2 ≤ ε_1  and  ‖P D_h α − w‖_2^2 ≤ ε_2,    (7)

where the matrix P extracts the region of overlap between the current target patch and the previously reconstructed high-resolution image, and w contains the values of the previously reconstructed high-resolution image on the overlap. The constrained optimization (7) can be similarly reformulated as:

    min  λ‖α‖_1 + (1/2)‖D̃α − ỹ‖_2^2,    (8)

where D̃ = [F D_ℓ ; βP D_h] and ỹ = [F y ; βw] are formed by vertical stacking. The parameter β controls the tradeoff between matching the low-resolution input and finding a high-resolution patch that is compatible with its neighbors. In all our experiments, we simply set β = 1. Given the optimal solution α* to (8), the high-resolution patch can be reconstructed as x = D_h α*.

³ There are different ways to enforce compatibility. In [5], the values in the overlapped regions are simply averaged, which will result in blurring effects. The one-pass algorithm [13] is shown to work almost as well as the use of a full MRF model [12].
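Since (8) only stacks the two residuals of (7) into a single least-squares term, the same solver can be reused. A minimal sketch, assuming the sparse_code routine above and taking F as the identity for brevity:

import numpy as np

def code_patch_with_overlap(D_l, D_h, y, P, w, lam=0.1, beta=1.0):
    """Solve Eq. (8): stack [F D_l; beta*P*D_h] against [F y; beta*w].

    F is omitted (identity) for brevity; P selects the overlap region of the
    candidate high-resolution patch and w holds the already-reconstructed
    pixel values there."""
    D_tilde = np.vstack([D_l, beta * (P @ D_h)])
    y_tilde = np.concatenate([y, beta * w])
    alpha = sparse_code(D_tilde, y_tilde, lam=lam)   # ISTA sketch from above
    return D_h @ alpha                               # high-resolution patch x = D_h alpha*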
2.2. Enforcing Global Reconstruction Constraint

Notice that (5) and (7) do not demand exact equality between the low-resolution patch y and its reconstruction D_ℓ α. Because of this, and also because of noise, the high-resolution image X_0 produced by the sparse representation approach of the previous section may not satisfy the reconstruction constraint (2) exactly. We eliminate this discrepancy by projecting X_0 onto the solution space of DHX = Y, computing

    X* = argmin_X ‖X − X_0‖  s.t.  DHX = Y.    (9)

The solution to this optimization problem can be efficiently computed using the back-projection method, originally developed in computed tomography and applied to super-resolution in [15, 4]. The update equation for this iterative method is

    X^{t+1} = X^t + ((Y − DHX^t) ↑ s) ∗ p,    (10)

where X^t is the estimate of the high-resolution image after the t-th iteration, p is a "backprojection" filter, and ↑ s denotes upsampling by a factor of s.
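A minimal sketch of the update (10), reusing the degrade routine from the earlier sketch for DH. The bilinear upsampling and the Gaussian choice for the back-projection filter p are our own assumptions; the text does not specify p.

import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def back_project(X0, Y, scale=3, n_iter=20, p_sigma=1.0):
    """Enforce the reconstruction constraint (9) via the update (10):
    X <- X + ((Y - DHX) upsampled by s) convolved with p."""
    X = X0.copy()
    for _ in range(n_iter):
        residual = Y - degrade(X, scale=scale)            # Y - DHX^t
        up = zoom(residual, scale, order=1)               # upsample by factor s
        up = up[:X.shape[0], :X.shape[1]]                 # guard against rounding in zoom
        X = X + gaussian_filter(up, sigma=p_sigma)        # convolve with filter p
    return X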
Algorithm 1 (Super-resolution via Sparse Representation).
1: Input: training dictionaries D_h and D_ℓ, a low-resolution image Y.
2: for each 3 × 3 patch y of Y, taken starting from the upper-left corner with 1 pixel overlap in each direction,
   • Solve the optimization problem with D̃ and ỹ defined in (8): min λ‖α‖_1 + (1/2)‖D̃α − ỹ‖_2^2.
   • Generate the high-resolution patch x = D_h α*. Put the patch x into a high-resolution image X_0.
3: end
4: Using back-projection, find the closest image to X_0 which satisfies the reconstruction constraint:
       X* = argmin_X ‖X − X_0‖  s.t.  DHX = Y.
5: Output: super-resolution image X*.

We take the result X* from back-projection as our final estimate of the high-resolution image. This image is as close as possible to the initial super-resolution X_0 given by sparsity, while satisfying the reconstruction constraint. The entire super-resolution process is summarized as Algorithm 1.
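For illustration, Algorithm 1 might be assembled from the earlier sketches as follows. This is not the authors' implementation: it assumes the sparse_code and back_project routines sketched above, takes F as the identity, ignores the feature and color handling of Section 3, and handles image borders only crudely.

import numpy as np

def super_resolve(Y, D_l, D_h, scale=3, lam=0.1, beta=1.0):
    """Sketch of Algorithm 1: raster-scan 3x3 patches of Y with 1-pixel
    overlap, sparse-code each against D_l with the overlap constraint of
    Eq. (8), paste D_h*alpha into X0, then back-project."""
    p, ov = 3, 1                       # low-resolution patch size and overlap
    hp = p * scale                     # high-resolution patch size
    H, W = Y.shape
    X0 = np.zeros((H * scale, W * scale))
    filled = np.zeros_like(X0, dtype=bool)

    for i in range(0, H - p + 1, p - ov):          # border handling omitted
        for j in range(0, W - p + 1, p - ov):
            y = Y[i:i+p, j:j+p].ravel()
            m = y.mean()                            # mean removed as in Sec. 2.1
            hi, hj = i * scale, j * scale
            region = (slice(hi, hi + hp), slice(hj, hj + hp))
            mask = filled[region].ravel()           # overlap with previous patches
            P = np.flatnonzero(mask)                # rows lying in the overlap
            w = X0[region].ravel()[P] - m
            D_tilde = np.vstack([D_l, beta * D_h[P, :]])
            y_tilde = np.concatenate([y - m, beta * w])
            alpha = sparse_code(D_tilde, y_tilde, lam=lam)
            # add the low-resolution mean back as a simple approximation
            X0[region] = (D_h @ alpha + m).reshape(hp, hp)
            filled[region] = True

    return back_project(X0, Y, scale=scale)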
2.3. Global Optimization Interpretation

The simple SR algorithm outlined above can be viewed as a special case of a general sparse representation framework for inverse problems in image processing. Related ideas have been profitably applied in image compression, denoising [10], and restoration [17]. These connections provide context for understanding our work, and also suggest means of further improving the performance, at the cost of increased computational complexity.

Given sufficient computational resources, one could in principle solve for the coefficients associated with all patches simultaneously. Moreover, the entire high-resolution image X itself can be treated as a variable. Rather than demanding that X be perfectly reproduced by the sparse coefficients α, we can penalize the difference between X and the high-resolution image given by these coefficients, allowing solutions that are not perfectly sparse, but better satisfy the reconstruction constraints. This leads to a large optimization problem:

    X* = argmin_{X,{α_ij}}  ‖DHX − Y‖_2^2 + η Σ_{i,j} ‖α_ij‖_0
         + γ Σ_{i,j} ‖D_h α_ij − P_ij X‖_2^2 + τ ρ(X).    (11)

Here, α_ij denotes the representation coefficients for the (i, j)-th patch of X, and P_ij is a projection matrix that selects the (i, j)-th patch from X. ρ(X) is a penalty function that encodes prior knowledge about the high-resolution image. This function may depend on the image category, or may take the form of a generic regularization term (e.g., Huber MRF, Total Variation, Bilateral Total Variation).

Algorithm 1 can be interpreted as a computationally efficient approximation to (11). The sparse representation step recovers the coefficients α by approximately minimizing the sum of the second and third terms of (11). The sparsity term ‖α_ij‖_0 is relaxed to ‖α_ij‖_1, while the high-resolution fidelity term ‖D_h α_ij − P_ij X‖_2 is approximated by its low-resolution version ‖F D_ℓ α_ij − F y_ij‖_2.

Notice that if the sparse coefficients α are fixed, the third term of (11) essentially penalizes the difference between the super-resolution image X and the reconstruction given by the coefficients: Σ_{i,j} ‖D_h α_ij − P_ij X‖_2^2 ≈ ‖X_0 − X‖_2^2. Hence, for small γ, the back-projection step of Algorithm 1 approximately minimizes the sum of the first and third terms of (11).

Algorithm 1 does not, however, incorporate any prior besides sparsity of the representation coefficients – the term ρ(X) is absent in our approximation. In Section 4 we will see that sparsity in a relevant dictionary is a strong enough prior that we can already achieve good super-resolution performance. Nevertheless, in settings where further assumptions on the high-resolution signal are available, these priors can be incorporated into the global reconstruction step of our algorithm.

3. Dictionary Preparation

3.1. Random Raw Patches from Training Images

Learning an over-complete dictionary capable of optimally representing broad classes of image patches is a difficult problem. Rather than trying to learn such a dictionary [19, 1] or using a generic set of basis vectors [21] (e.g., Fourier, Haar, curvelets, etc.), we generate dictionaries by simply randomly sampling raw patches from training images of similar statistical nature. We will demonstrate that such simply prepared dictionaries are already capable of generating high-quality reconstructions,⁴ when used together with the sparse representation prior.

Figure 2 shows several training images and the patches sampled from them. For our experiments, we prepared two dictionaries: one sampled from flowers (Figure 2, top), which will be applied to generic images with relatively simple textures, and one sampled from animal images (Figure 2, bottom), with fine furry or fractal textures. For each high-resolution training image X, we generate the corresponding low-resolution image Y by blurring and downsampling. For each category of images, we sample only about 100,000 patches from about 30 training images to form each dictionary, which is considerably smaller than that needed by other learning-based methods.

⁴ The competitiveness of such random patches has also been noticed empirically in the context of content-based image classification [18].
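A sketch of this dictionary preparation step: sample raw low-resolution patches and their corresponding high-resolution patches from blurred-and-downsampled training pairs, removing patch means as in Section 2.1. The patch geometry and the degrade routine are carried over from the earlier sketches and are assumptions, not the paper's exact settings.

import numpy as np

def sample_coupled_dictionaries(hr_images, n_patches=100_000, p=3, scale=3, rng=None):
    """Build D_l (raw p x p low-resolution patches) and D_h (corresponding
    p*scale x p*scale high-resolution patches) by random sampling.
    Means are removed so the atoms encode texture, not absolute intensity.
    Training images are assumed to have dimensions divisible by scale."""
    rng = rng or np.random.default_rng(0)
    cols_l, cols_h = [], []
    for _ in range(n_patches):
        X = hr_images[rng.integers(len(hr_images))]
        Y = degrade(X, scale=scale)                    # low-resolution counterpart, Eq. (2)
        i = rng.integers(Y.shape[0] - p + 1)
        j = rng.integers(Y.shape[1] - p + 1)
        lo = Y[i:i+p, j:j+p].ravel()
        hi = X[i*scale:(i+p)*scale, j*scale:(j+p)*scale].ravel()
        cols_l.append(lo - lo.mean())
        cols_h.append(hi - hi.mean())
    return np.array(cols_l).T, np.array(cols_h).T      # shapes (p*p, K), ((p*scale)^2, K)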
Figure 2. Left: three out of the 30 training images we use in our experiments. Right: the training patches extracted from them.

Figure 3. Number of nonzero coefficients in the sparse representation computed for 300 typical patches in a test image.
the high-resolution patches. The features are not extracted directly from the 3 × 3 low-resolution patch, but rather from an upsampled version produced by bicubic interpolation. For color images, we apply our algorithm to the illuminance channel.

slightly from the original. The feature for the low-resolution patch is not extracted from the original 3 × 3 patch, which will give smoother results, but on the upsampled low-resolution patch. We find that setting K = 15 gives the best performance. This is approximately the average number of coefficients recovered by sparse representation (see Figure 3).
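The exact feature operator F is deferred to the discussion of Section 3 and is not fixed by the text above, which only states that features come from a bicubic-upsampled version of the low-resolution patch. As a purely illustrative stand-in, the sketch below upsamples the patch and uses first-order differences as features.

import numpy as np
from scipy.ndimage import zoom

def extract_features(lr_patch, scale=3):
    """Illustrative feature operator F: upsample the low-resolution patch
    with a cubic spline (a stand-in for bicubic interpolation), then take
    horizontal and vertical first-order differences. The paper's actual
    choice of F may differ."""
    up = zoom(lr_patch, scale, order=3)        # order-3 spline upsampling
    gx = np.diff(up, axis=1).ravel()           # horizontal gradient
    gy = np.diff(up, axis=0).ravel()           # vertical gradient
    return np.concatenate([gx, gy])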
Figure 4. The flower and girl image magnified by a factor of 3. Left to right: input, bicubic interpolation, neighbor embedding [5], our method, and the original. (Also see Figure 8 for the same girl image magnified by a factor of 4).

Figure 5. Results on an image of the Parthenon with magnification factor 3. Top row: low-resolution input, bicubic interpolation, back projection. Bottom row: neighbor embedding [5], soft edge prior [6], and our method.
Figure 5 compares our method on the Parthenon image with bicubic interpolation, back projection, neighbor embedding [5], and a recently proposed method based on a learned soft edge prior [6]. The result from back projection has many jagged effects along the edges. Neighbor embedding generates sharp edges in places, but blurs the texture on the temple's facade. The soft edge prior method gives a decent reconstruction, but introduces undesired smoothing that is not present in our result. Additional results on generic images using this dictionary are shown in Figure 7, left and center. Notice that in both cases, the algorithm significantly improves the image resolution by sharpening edges and textures.

We now conduct more challenging experiments on more intricate textures found in animal images, using the animal dictionary with merely 100,000 training patches (second row of Figure 2). As already shown in Figure 1, our method performs quite well in magnifying the image of a raccoon face by a factor of 2. When complex textures such as this one are down-sampled further, the SR task will become more difficult than on images with simpler textures, such as flowers or faces. In Figure 6, we apply our method to the same raccoon face image with magnification factor 3. Since there are no explicit edges in most parts of the image, methods proposed in [12], [23], and [6] would have tremendous difficulty here. Compared to neighbor embedding [5], our method gives clearer fur and sharper whiskers. Figure 7 shows an additional image of a cat face reconstructed using this dictionary.

We compare several SR methods quantitatively in terms of their RMS errors for some of the images shown above. The results are shown in Table 1.

Finally, we test our algorithm on the girl image again, but with a more challenging magnification factor 4. The results are shown in Figure 8. Here, back-projection again yields jagged edges. Freeman et al.'s method [12] introduces many artifacts and fails to capture the facial texture, despite relying on a much larger database. Compared to the soft edge prior method [6], our method generates sharper edges and is more faithful to the original facial texture.
Figure 6. A raccoon face magnified by a factor of 3. The input image, bicubic interpolation, neighbor embedding, and our method.

Figure 7. More results on a few more generic (left and center) and animal (right) images. Top: input images. Bottom: super-resolution images by our method, with magnification factor 3.

Figure 8. The girl image magnified by a factor of 4. From left to right: low-resolution input, back projection, learning-based method in [12], soft edge prior [6], and our method.
Images       Bicubic    NE [5]     Our method
Flower        3.5052     4.1972     3.2276
Girl          5.9033     6.6588     5.6175
Parthenon    12.7431    13.5562    12.2491
Raccoon       9.7399     9.8490     9.1874

Table 1. The RMS errors of different methods for super-resolution with magnification factor 3, with respect to the original images.
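The RMS errors reported in Table 1 are straightforward to compute for any image pair; a one-line sketch, assuming both images are given as arrays of 0-255 intensities:

import numpy as np

def rms_error(reference, estimate):
    """Root-mean-square error between the original image and a
    super-resolved estimate, both as float arrays of 0-255 intensities."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    return np.sqrt(np.mean(diff ** 2))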
5. Discussion

The experimental results of the previous section demonstrate the effectiveness of sparsity as a prior for learning-based super-resolution. However, one of the most important questions for future investigation is to determine, in terms of the within-category variation, the number of raw sample patches required to generate a dictionary satisfying the sparse representation prior. Tighter connections to the theory of compressed sensing may also yield conditions on the appropriate patch size or feature dimension.

From a more practical standpoint, it would be desirable to have a way of effectively combining dictionaries to work with images containing multiple types of textures or multiple object categories. One approach to this would integrate supervised image segmentation and super-resolution, applying the appropriate dictionary within each segment.
References

[1] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, Vol. 54, No. 11, November 2006.
[2] S. Baker and T. Kanade. Limits on super-resolution and how to break them. IEEE TPAMI, 24(9):1167-1183, 2002.
[3] E. Candes. Compressive sensing. Proc. International Congress of Mathematicians, 2006.
[4] D. Capel. Image mosaicing and super-resolution. Ph.D. Thesis, Department of Eng. Science, University of Oxford, 2001.
[5] H. Chang, D.-Y. Yeung, and Y. Xiong. Super-resolution through neighbor embedding. Proc. CVPR, 2004.
[6] S. Dai, M. Han, W. Xu, Y. Wu, and Y. Gong. Soft edge smoothness prior for alpha channel super resolution. Proc. ICCV, 2007.
[7] D. L. Donoho. For most large underdetermined systems of linear equations, the minimal ℓ1-norm solution is also the sparsest solution. Comm. on Pure and Applied Math, Vol. 59, No. 6, 2006.
[8] D. L. Donoho. For most large underdetermined systems of linear equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Preprint, accessed at http://www-stat.stanford.edu/~donoho/, 2004.
[9] D. L. Donoho. Compressed sensing. Preprint, accessed at http://www-stat.stanford.edu/~donoho/, 2005.
[10] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE TIP, Vol. 15, No. 12, 2006.
[11] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar. Fast and robust multiframe super-resolution. IEEE TIP, 2004.
[12] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael. Learning low-level vision. IJCV, 2000.
[13] W. T. Freeman, T. R. Jones, and E. C. Pasztor. Example-based super-resolution. IEEE Computer Graphics and Applications, Vol. 22, Issue 2, 2002.
[14] R. C. Hardie, K. J. Barnard, and E. A. Armstrong. Joint MAP registration and high-resolution image estimation using a sequence of undersampled images. IEEE TIP, 1997.
[15] M. Irani and S. Peleg. Motion analysis for image enhancement: resolution, occlusion and transparency. JVCI, 1993.
[16] C. Liu, H. Y. Shum, and W. T. Freeman. Face hallucination: theory and practice. IJCV, Vol. 75, No. 1, pp. 115-134, October 2007.
[17] J. Mairal, G. Sapiro, and M. Elad. Learning multiscale sparse representations for image and video restoration. SIAM Multiscale Modeling and Simulation, 2008.
[18] E. Nowak, F. Jurie, and B. Triggs. Sampling strategies for bag-of-features image classification. Proc. ECCV, 2006.
[19] B. Olshausen and D. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311-3325, 1997.
[20] L. C. Pickup, S. J. Roberts, and A. Zisserman. A sampled texture prior for image super-resolution. Proc. NIPS, 2003.
[21] H. Rauhut, K. Schnass, and P. Vandergheynst. Compressed sensing and redundant dictionaries. Preprint, accessed at http://homepage.univie.ac.at/holger.rauhut/, 2007.
[22] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323-2326, 2000.
[23] J. Sun, N.-N. Zheng, H. Tao, and H. Shum. Image hallucination with primal sketch priors. Proc. CVPR, 2003.
[24] R. Tibshirani. Regression shrinkage and selection via the Lasso. J. Royal Statist. Soc. B, Vol. 58, No. 1, pages 267-288, 1996.
[25] M. E. Tipping and C. M. Bishop. Bayesian image super-resolution. Proc. NIPS, 2003.
[26] Q. Wang, X. Tang, and H. Shum. Patch based blind image super resolution. Proc. ICCV, 2005.