SUPER-RESOLUTION IMAGING
Edited by
Peyman Milanfar
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To my wife Sheila, and
our children Leila and Sara
Contents
Preface
Index
Preface
“Enhance 34 to 36. Pan right and pull back. Stop. Enhance 34 to 46. Pull
back. Wait a minute, go right, stop. Enhance 57 to 19. Track 45 left. Stop.
Enhance 15 to 23. Give me a hard copy right there.”
If you have never seen the movie Blade Runner, you should. Aside from be-
ing one of the greatest science fiction films of all time, it is uniquely relevant
to the subject of this book: almost 30 years ago, the opening scene of this
movie foresaw the idea of super-resolution. In the intervening years, a great
deal has transpired: computing power has increased by orders of magnitude,
digital cameras are everywhere, and of course digital displays have become
magnificently detailed. Along with these advances, the public’s expectations
for high-quality imagery have naturally intensified, often out of proportion with
the state-of-the-art technology. In fact, in the last few years, the visual quality
of captured images and video has not kept pace with these lofty expectations.
As increasingly large numbers of pixels are packed into ever smaller spaces,
often behind less sophisticated optical elements, public, commercial, and official users
alike have seen an overall decline in the visual quality of their recorded con-
tent. So despite what might at first seem like a losing battle against better
and cheaper sensors, super-resolution technology (and image enhancement
more generally) has really become more relevant than ever. Given that almost
all recorded visual content is now enhanced in one form or another by just
about every digital camera sold today, it is not entirely outrageous then to
believe that before long, super-resolution will become the “killer application”
for imaging.
Ironically, only two years after the release of Blade Runner, the semi-
nal paper by Tsai and Huang kick-started the modern idea of computational
super-resolution. While results in sampling theory dating as far back as the
’50s (Yen) and the ’70s (Papoulis) had hinted at the idea, it was Tsai and
Huang who explicitly showed that, at least in theory, it was possible to im-
prove resolution by registering and fusing multiple images. The rest, as they
say, is history. We are fortunate to be able to write a little bit of that history
in this book. In the last five years or so, the field of super-resolution imaging
has truly flourished both academically and commercially. The growing impor-
tance of super-resolution imaging has manifested itself in an explosive growth
in the number of papers in this area and citations to these papers (a few dozen
in 1994, to more than 500 in 2004, and to more than 2000 in 2008). What has
been missing, however, is a book to not only gather key recent contributions
in one place, but also to serve as a starting point for those interested in this
field to begin learning about and exploring the state of the art. This is what
this book hopes to accomplish.
As is probably well-known by now, every super-resolution algorithm ever
developed is sabotaged by at least one spoke of our triumvirate “axis of evil”:
the need for (1) (subpixel) accurate motion estimation, (2) (spatially varying)
deblurring, and (3) robustness to modeling and stochastic errors. To be sure,
these are not independent problems and should ideally be treated in unison
(ambitious graduate students take note!). But realistically, each is sufficiently
complex as to merit its own section in the library, or at least a couple of nice
chapters in this book. This book gathers contributions that will present the
reader with a snapshot of where the field stands, a reasonable idea of where the
field is heading—and perhaps where it should be heading. Chapter 1 provides
an introduction to the history of the subject that should be of broad interest.
Indeed, the collection of citations summarized in this chapter is an excellent
wellspring for continued research on super-resolution.
One of the most active areas of work in image and video enhancement
in recent years has been the subject of locally adaptive processing methods,
which are discussed in Chapters 2, 3, 4, and 5. In contrast to globally optimal
methods (treated later in the book), these methods are built on the notion
that processing should be strongly tailored to the local behavior of the given
data. An explicit goal in some cases, and a happy consequence in others,
local processing enables us to largely avoid direct and detailed estimation of
motion. Readers interested in methods for explicit motion estimation will find
an excellent overview of modern techniques in Chapter 6.
While motion estimation is typically the first step in many super-resolution
algorithms, deblurring is typically the last step. Unfortunately, having been
relegated to the last position has meant that this important aspect of enhance-
ment has not received the respect and attention it deserves. Despite heavy
recent activity in both the image processing and machine vision community,
and some notable successes, deblurring even in its simplest (space-invariant,
known point-spread-function) form is still largely an unsolved problem. Much
as we would like to hope otherwise, blur almost never manifests itself in a spatially
uniform fashion. In Chapter 7, the reader will find a well-motivated and di-
rect attack at this challenging problem. Despite our best efforts, a sequential
approach to super-resolution consisting of motion estimation, fusion, and de-
blurring will always be subject to the vagaries of the data, the models, and
noise. As such, building robustness into the reconstruction process, as treated
in Chapter 8, is vital if the algorithm is to be practically useful.
As with most inverse problems, super-resolution is highly ill-posed. In the
most general case, the motion between the frames, the blur kernel(s), and the
high-resolution image of interest are three interwoven unknowns that should
ideally be estimated together (rather than sequentially), and whose effect is
directly felt in the three points of weakness to which I referred earlier. Prin-
cipled Bayesian statistical approaches addressing these issues are presented
in Chapters 9 and 10, where the ever important prior information is brought
to bear on the super-resolution problem. Of course, prior information can be
brought to the table either in “bulk” form as a statistical distribution, or in
more specific “piecemeal” form as examples. Naively speaking, this distinction
is indeed what leads us to the learning-based methods described in Chapter 11.
In the final three chapters of the book, we concentrate on applications.
Among the many areas of science to which super-resolution has been success-
fully applied in recent years, medicine and remote sensing have perhaps seen
the most direct impact. In Chapter 12, a novel application of super-resolution
to massive multispectral remote-sensing data sets is detailed. Medical imaging
applications of super-resolution in Chapter 13 discuss two important problems.
In what should be good news to everyone, high resolution X-ray imaging is
made possible at lower radiation dosages thanks to super-resolution. In an-
other application, detailed imaging of the retina is made possible for diagnostic
purposes. Finally, in Chapter 14, a successful commercial application of super-
resolution (in which I was fortunate to have a hand) is discussed. This chapter
is quite informative not only because of the interesting perspective it provides,
but also because of the valuable practical nuggets it imparts to the reader.
It is an interesting glimpse into what it really takes to make super-resolution
tick.
Perhaps it is worth saying a few words about how this book can be used.
As with any edited volume, it is intended to provide a snapshot of the field,
which is sure to evolve over time. Yet, I have endeavored to organize the
chapters to be used as a teaching tool as well. Indeed, the first four chapters
can easily be incorporated into the latter part of a graduate-level course on
image processing. Other selected chapters of the book can be used to offer
short courses on the subject to a wide audience of engineers and scientists.
The book as a whole can also be used as a text for a semester-long focused
seminar course on the topic, with one or two lectures dedicated to each chapter.
It is hoped that over time the authors may provide supplementary material
for each chapter, including slides, code, or data, which will be archived at the
book Website – so the interested reader is encouraged to revisit the site.
This book is the collective effort of a kind group of friends and colleagues.
I am grateful to each of the authors for giving generously of their time and
contributing to this book. I am also thankful to my students past and present
for their contributions to this topic, and to this book in particular. Specifically,
I acknowledge (soon to be Dr.) Hiroyuki Takeda for his assistance with myriad
LaTeX issues.
It is my hope that this book will help to promote this field of endeavor for
many years to come.
Peyman Milanfar
Menlo Park, March 2010
1
Image Super-Resolution: Historical Overview
and Future Challenges
Jianchao Yang
University of Illinois at Urbana-Champaign
Thomas Huang
University of Illinois at Urbana-Champaign
CONTENTS
1.1 Introduction to Super-Resolution
1.2 Notations
1.3 Techniques for Super-Resolution
    1.3.1 Image Observation Model
    1.3.2 Super-Resolution in the Frequency Domain
    1.3.3 Interpolation-Restoration: Non-Iterative Approaches
    1.3.4 Statistical Approaches
        1.3.4.1 Maximum Likelihood
        1.3.4.2 Maximum a Posteriori
        1.3.4.3 Joint MAP Restoration
        1.3.4.4 Bayesian Treatments
    1.3.5 Example-Based Approaches
    1.3.6 Set Theoretic Restoration
1.4 Challenge Issues for Super-Resolution
    1.4.1 Image Registration
    1.4.2 Computation Efficiency
    1.4.3 Robustness Aspects
    1.4.4 Performance Limits
Bibliography
FIGURE 1.1: The 1951 USAF resolution test target, a classic test target used
to determine spatial resolution of imaging sensors and imaging systems.
sensor also decreases, causing the so-called shot noise. Also, the hardware cost
of a sensor increases with sensor density and the corresponding image pixel
density. Therefore, hardware limitations on sensor size restrict the spatial
resolution of the images that can be captured.
While the image sensors limit the spatial resolution of the image, the
image details (high-frequency bands) are also limited by the optics, due to lens
blurs (associated with the sensor point spread function (PSF)), lens aberration
effects, aperture diffractions, and optical blurring due to motion. Constructing
imaging chips and optical components to capture very high-resolution images
is prohibitively expensive and not practical in most real applications, e.g.,
widely used surveillance cameras and cell phone built-in cameras. Besides
cost, the resolution of a surveillance camera is also limited by the camera speed
and hardware storage. In other scenarios, such as satellite imagery, it is
difficult to use high-resolution sensors due to physical constraints. An alternative
is to accept the image degradations and use signal processing to post-process
the captured images, trading computational cost for hardware cost. These
techniques are specifically referred to as
Super-Resolution (SR) reconstruction.
Super-Resolution (SR) refers to techniques that construct high-resolution (HR)
images from several observed low-resolution (LR) images, thereby increasing
the high-frequency components and removing the degradations caused by the
imaging process of the low-resolution camera. The basic idea behind SR is to
combine the non-redundant information contained in multiple low-resolution
frames to generate a high-resolution image. A technique closely related to
SR is single-image interpolation, which can also be used to increase the image
size. However, since no additional information is provided, the quality of
single-image interpolation is very much limited by the ill-posed nature of the
problem, and the lost frequency components cannot be recovered. In the SR
setting, however, multiple low-resolution observations are available for
reconstruction, making the problem better constrained. The nonredundant
information contained in these LR images is typically introduced by subpixel
shifts between them. These subpixel shifts may occur due to uncontrolled
motions between the imaging system and the scene, e.g., movements of objects,
or due to controlled motions, e.g., a satellite imaging system orbiting the
earth with a predefined speed and path.
Each low-resolution frame is a decimated, aliased observation of the true
scene. SR is possible only if there exist subpixel motions between these low-
resolution frames,1 so that the ill-posed upsampling problem can be better
conditioned. Figure 1.2 shows a simplified diagram describing the basic idea
of SR reconstruction. In the imaging process, the camera captures several
LR frames, which are downsampled from the HR scene with subpixel shifts
between each other. SR reconstruction reverses this process by aligning the LR
1 The mainstream SR techniques rely on motions, although there are some works using
defocus as a cue.
FIGURE 1.2: The basic idea for super-resolution reconstruction from multiple
low-resolution frames. Subpixel motion provides the complementary informa-
tion among the low-resolution frames that makes SR reconstruction possible.
1. Surveillance video [20, 55]: freezing a frame and zooming in on a region of
interest (ROI) in video for human perception (e.g., reading a license plate
in the video), and resolution enhancement for automatic target recognition
(e.g., recognizing a criminal’s face).
2. Remote sensing [29]: several images of the same area are provided, and
an improved resolution image can be sought.
3. Medical imaging (CT, MRI, Ultrasound, etc.) [59, 70, 47, 60]: several
images limited in resolution quality can be acquired, and SR techniques
can be applied to enhance the resolution.
4. Video standard conversion, e.g., from NTSC video signal to HDTV sig-
nal.
1.2 Notations
Before discussing SR techniques, we introduce the notation used in this
chapter. Uppercase bold letters X and Y denote the vector form, in
lexicographical order, of HR and LR images respectively. Lowercase bold
letters x and y denote the vector form, in lexicographical order, of HR and
LR image patches respectively. Underlined uppercase bold letters denote a
vector concatenation of multiple vectors, e.g., Y is the concatenation of the
Yk (k = 1, 2, ..., K). We use plain uppercase symbols to denote matrices, and
plain lowercase symbols to denote scalars.
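To make the notation concrete, here is a minimal sketch (in Python/NumPy, our own illustration rather than anything from the chapter) of stacking an image into a lexicographically ordered vector and concatenating several LR frame vectors; the array sizes are arbitrary.

import numpy as np

# Hypothetical sizes: a 4x4 HR image and K = 3 LR frames of size 2x2.
K, hr_shape, lr_shape = 3, (4, 4), (2, 2)

hr_image = np.arange(16, dtype=float).reshape(hr_shape)
X = hr_image.reshape(-1, 1)                           # X: HR image in lexicographic (row-major) order

lr_frames = [np.random.rand(*lr_shape) for _ in range(K)]
Y_k = [frame.reshape(-1, 1) for frame in lr_frames]   # each Y_k: one LR frame as a column vector
Y_all = np.vstack(Y_k)                                # concatenation of Y_1, ..., Y_K

print(X.shape, Y_k[0].shape, Y_all.shape)             # (16, 1) (4, 1) (12, 1)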
FIGURE 1.3: The observation model of a real imaging system relating a high
resolution image to the low-resolution observation frames with motion between
the scene and the camera.
fore reaching the imaging system. Sampling the continuous signal beyond the
Nyquist rate generates the high-resolution digital image (a) we desire. In
our SR setting, there usually exists some kind of motion between the camera
and the scene to be captured. The inputs to the camera are multiple frames of
the scene, connected by possibly local or global shifts, leading to image (b).
Going through the camera, these motion-related high-resolution frames will
incur different kinds of blurring effects, such as optical blur and motion blur.
These blurred images (c) are then downsampled at the image sensors (e.g.,
CCD detectors) into pixels, by integrating the image falling onto each sensor
area. The downsampled images are further affected by sensor noise and color
filtering noise. Finally, the frames captured by the low-resolution imaging
system are blurred, decimated, and noisy versions of the underlying true
scene.
Let X denote the desired HR image, i.e., the digital image sampled above
the Nyquist rate from the band-limited continuous scene, and let Yk be
the k-th LR observation from the camera. X and the Yk's are represented in
lexicographical order. Assume the camera captures K LR frames of X, where
the LR observations are related to the HR scene X by
\[
Y_k = D_k H_k F_k X + V_k, \qquad k = 1, 2, \ldots, K, \tag{1.1}
\]
where Fk encodes the motion information for the k-th frame, Hk models the
blurring effects, Dk is the down-sampling operator, and Vk is the noise term.
These linear equations can be rearranged into a large linear system
\[
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_K \end{bmatrix}
=
\begin{bmatrix} D_1 H_1 F_1 \\ D_2 H_2 F_2 \\ \vdots \\ D_K H_K F_K \end{bmatrix} X + V, \tag{1.2}
\]
or equivalently
\[
Y = M X + V. \tag{1.3}
\]
The involved matrices Dk , Hk , Fk , or M are very sparse, and this linear system
is typically ill-posed. Furthermore, in real imaging systems, these matrices are
unknown and need to be estimated from the available LR observations, leaving
the problem even more ill-conditioned. Thus, proper prior regularization for
the high resolution image is always desirable and often even crucial. In the
following, we will introduce some basic super-resolution techniques proposed
in the literature and give an overview of the recent developments.
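As a concrete illustration of the observation model in Eqs. (1.1)-(1.3), the following sketch simulates K LR frames from a toy HR scene. It is our own minimal construction, with the warp, blur, decimation, and noise parameters chosen arbitrarily, and with the operators D_k, H_k, F_k applied as image-processing functions rather than explicit sparse matrices.

import numpy as np
from scipy.ndimage import gaussian_filter, shift

def observe(hr, dx, dy, psf_sigma=1.0, factor=4, noise_std=2.0, rng=None):
    # One LR frame Y_k = D_k H_k F_k X + V_k:
    # F_k: subpixel translation (spline interpolation), H_k: Gaussian PSF blur,
    # D_k: decimation by `factor`, V_k: additive Gaussian noise.
    rng = np.random.default_rng() if rng is None else rng
    warped = shift(hr, (dy, dx), order=3, mode='reflect')     # F_k X
    blurred = gaussian_filter(warped, psf_sigma)              # H_k F_k X
    lr = blurred[::factor, ::factor]                          # D_k H_k F_k X
    return lr + rng.normal(0.0, noise_std, lr.shape)          # ... + V_k

# Toy HR scene and K = 6 frames with random subpixel shifts (in HR pixel units).
rng = np.random.default_rng(0)
X = np.kron(rng.random((16, 16)) * 255, np.ones((8, 8)))      # 128x128 piecewise-constant scene
shifts = rng.uniform(-2, 2, size=(6, 2))
Y = [observe(X, dx, dy, rng=rng) for dx, dy in shifts]
print(Y[0].shape)                                             # (32, 32): decimated, blurred, noisy frames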
The shifted images are impulse sampled with sampling periods $T_1$ and $T_2$ to
yield the observed low-resolution images $y_k[n_1, n_2] = x_k(n_1 T_1 + \delta_{k1},\, n_2 T_2 + \delta_{k2})$,
with $n_1 = 0, 1, 2, \ldots, N_1 - 1$ and $n_2 = 0, 1, 2, \ldots, N_2 - 1$. Denote the discrete
Fourier transforms (DFTs) of these low-resolution images by $Y_k[r_1, r_2]$. The
CFTs of the shifted images are related to their DFTs by the aliasing property:
\[
Y_k[r_1, r_2] = \frac{1}{T_1 T_2} \sum_{m_1=-\infty}^{\infty} \sum_{m_2=-\infty}^{\infty}
X_k\!\left( \frac{2\pi}{T_1}\!\left(\frac{r_1}{N_1} - m_1\right),\;
\frac{2\pi}{T_2}\!\left(\frac{r_2}{N_2} - m_2\right) \right). \tag{1.5}
\]
Y and X . Eqn. 1.6 defines a set of linear equations from which we intend to
solve for X and then use the inverse DFT to obtain the reconstructed image.
The above formulation for SR reconstruction assumes a noise-free and
global translation model with known parameters. The downsampling process
is assumed to be impulse sampling, with no sensor blurring effects modeled.
Along this line of work, many extensions have been proposed to handle more
complicated observation models. Kim et al. [49] extended [99] by taking into
account the observation noise as well as spatial blurring. Their later work in
[5] extends this further by introducing Tikhonov regularization [95]. In
[89], a local motion model is considered by dividing the images into overlap-
ping blocks and estimating motions for each local block individually. In [98],
the restoration and motion estimation are done simultaneously using an EM
algorithm. However, the frequency-domain SR theory of these works did not
go much beyond what was initially proposed. These approaches are computation-
ally efficient, but limited in their abilities to handle more complicated image
degradation models and include various image priors as proper regularization.
Later works on super-resolution reconstruction have been almost exclusively
in the spatial domain.
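To see the aliasing relation of Eq. (1.5) in action, here is a 1-D toy reconstruction in the spirit of Tsai and Huang. It is our own simplified construction (a discrete signal, circular integer shifts on the HR grid, exactly K equal to the decimation factor frames, no noise or blur), not code from any of the cited works; each per-frequency linear system is solved to unfold the aliased HR spectrum.

import numpy as np

rng = np.random.default_rng(1)
N, D = 64, 4                     # HR length and decimation factor
M, K = N // D, 4                 # LR length; K = D frames make each per-bin system square

# Smooth (roughly band-limited) HR signal.
x = np.convolve(rng.standard_normal(N), np.ones(8) / 8, mode='same')

# K shifted-and-decimated observations: y_k[n] = x[(n*D + d_k) mod N].
d = np.array([0, 1, 2, 3])       # shifts in HR samples, i.e., subpixel shifts in LR units
Y = [np.fft.fft(np.roll(x, -dk)[::D]) for dk in d]

# Discrete analogue of Eq. (1.5):
#   Y_k[r] = (1/D) * sum_m X[r + m*M] * exp(j*2*pi*(r + m*M)*d_k / N)
X_hat = np.zeros(N, dtype=complex)
for r in range(M):
    folded = r + M * np.arange(D)                         # HR bins that alias onto LR bin r
    A = np.exp(2j * np.pi * np.outer(d, folded) / N) / D
    b = np.array([Y[k][r] for k in range(K)])
    X_hat[folded] = np.linalg.solve(A, b)                 # unfold the aliased spectrum

x_hat = np.fft.ifft(X_hat).real
print(np.max(np.abs(x_hat - x)))                          # ~1e-12: exact recovery in this noise-free toy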
In the Bayesian view, the HR image is estimated by marginalizing over the
unknown motion and blur parameters:
\[
\hat{X} = \arg\max_{X} \int_{\nu, h} \Pr(Y \mid X, M(\nu, h))\, \Pr(X)\, \Pr(M(\nu, h))\, d\nu\, dh. \tag{1.8}
\]
Assuming additive white Gaussian noise with variance $\sigma^2$, the likelihood term takes the form
\[
\Pr(Y \mid X, M(\nu, h)) \propto \exp\!\left( -\frac{1}{2\sigma^2} \big\| Y - M(\nu, h) X \big\|^2 \right). \tag{1.9}
\]
where λ absorbs the noise variance and the α in Eqn. 1.10, balancing data
consistency against the strength of the HR image prior. Eqn. 1.11 is the popular
Maximum a Posteriori (MAP) formulation for SR, where M is assumed to be
known. The statistical approaches discussed below differ in how they treat the
degradation matrix M(ν, h) and the prior term Pr(X), and in the statistical
inference methods applied to Equation 1.8.
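The MAP objective referred to above (Eq. 1.11 is not reproduced in this excerpt) is, in its simplest known-M form, a regularized least-squares problem. The sketch below is a minimal gradient-descent solver of sum_k ||Y_k - M_k X||^2 + lam * ||grad X||^2 with a Tikhonov (gradient) prior; the forward/backward operators, step size, and regularization weight are our own illustrative choices, and the adjoint is only approximate.

import numpy as np
from scipy.ndimage import gaussian_filter, shift, laplace

def forward(x, dx, dy, sigma=1.0, f=2):
    # Stand-in for M_k = D_k H_k F_k: warp, blur, then decimate by f.
    return gaussian_filter(shift(x, (dy, dx), order=1, mode='reflect'), sigma)[::f, ::f]

def backward(r, dx, dy, sigma=1.0, f=2):
    # Rough adjoint of `forward`: zero-fill upsample, blur, inverse warp.
    up = np.zeros((r.shape[0] * f, r.shape[1] * f))
    up[::f, ::f] = r
    return shift(gaussian_filter(up, sigma), (-dy, -dx), order=1, mode='reflect')

def map_sr(Y, shifts, lam=0.05, step=0.5, iters=200, f=2):
    # Gradient descent on  sum_k ||Y_k - M_k X||^2 + lam * ||grad X||^2.
    X = np.kron(Y[0], np.ones((f, f)))                 # crude pixel-replication initialization
    for _ in range(iters):
        g = np.zeros_like(X)
        for y_k, (dx, dy) in zip(Y, shifts):
            g += backward(forward(X, dx, dy, f=f) - y_k, dx, dy, f=f)
        g -= lam * laplace(X)                          # gradient of the Tikhonov (smoothness) prior
        X -= step * g / len(Y)
    return X

Given the K frames and shifts simulated in the earlier observation-model sketch, map_sr(Y, shifts, f=4) returns a regularized HR estimate; swapping the Tikhonov term for the Huber MRF or total-variation priors discussed below changes only the prior-gradient line. In practice the motion and PSF would themselves have to be estimated rather than assumed known.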
estimation. In [41], the authors applied this idea to real applications by in-
corporating a multiple motion tracking algorithm to deal with partially oc-
cluded objects, transparent objects and some objects of interest. The back-
projection algorithm is simple and flexible in handling many observations with
different degradation procedures. However, the solution of back-projection is
not unique, depending on the initialization and the choice of back-projection
kernel. As shown in [26] and [10], the back-projection algorithm is none other
than an ML estimator. The choice of the back-projection filter (BPF) implies some underlying assumption
about the noise covariance of the observed low-resolution pixels [10]. Treating
the motion estimates M (ν) as unknown, Tom et al. [98] proposed an ML SR
image estimation algorithm to estimate the subpixel shifts, noise of the image,
and the HR image simultaneously. The proposed ML estimation is treated by
the Expectation-Maximization (EM) algorithm.
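For concreteness, here is a minimal sketch of the iterative back-projection idea discussed above, reusing the hypothetical forward/backward operators from the previous sketch. The choice of back-projection filter (here simply the approximate adjoint) and the step size are arbitrary and, as the text notes, implicitly encode assumptions about the noise.

import numpy as np   # `forward` and `backward` as defined in the previous sketch

def ibp(Y, shifts, iters=30, f=2, step=1.0):
    # Iterative back-projection: simulate the LR frames from the current estimate
    # and back-project the residuals through the (approximate) adjoint/BPF.
    X = np.kron(Y[0], np.ones((f, f)))                  # initial HR guess
    for _ in range(iters):
        update = np.zeros_like(X)
        for y_k, (dx, dy) in zip(Y, shifts):
            residual = y_k - forward(X, dx, dy, f=f)    # observed minus simulated frame
            update += backward(residual, dx, dy, f=f)   # back-project the error
        X += step * update / len(Y)
    return X

With this particular BPF the update is just a gradient step on the quadratic data term, which matches the remark above that back-projection acts as an ML estimator.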
As in the image denoising and single-image expansion cases, direct ML
estimation without regularization can be severely ill-posed in SR, where the
number of observations is limited, especially when the zoom factor is large
(e.g., greater than 2). The ML estimator is usually very sensitive to noise,
registration estimation errors, and PSF estimation errors [10], and therefore
proper regularization on the feasible solution space is always desirable. This
leads to the mainstream SR reconstruction approaches based on MAP.
1. Gaussian MRF. The Gaussian Markov Random Field (GMRF) [37, 33]
takes the form
\[
A(X) = X^T Q X, \tag{1.15}
\]
where Q is a symmetric positive matrix, capturing spatial relations be-
tween adjacent pixels in the image by its off-diagonal elements. Q is
often defined as ΓT Γ, where Γ acts as some first or second derivative
operator on the image X. In such a case, the log likelihood of the prior
is
\[
\log p(X) \propto \|\Gamma X\|^2, \tag{1.16}
\]
which is well-known as the Tikhonov regularization [95, 26, 63], the most
commonly used method for regularization of ill-posed problems. Γ is usually
referred to as the Tikhonov matrix. Hardie et al. [35] proposed a joint MAP
framework for simultaneous estimation of the high-resolution image and
motion parameters with a Gaussian MRF prior for the HR image. Tipping
and Bishop [96] proposed a simple Gaussian process prior where the
\[
\rho(a) = \begin{cases} a^2, & |a| \le \alpha \\ 2\alpha |a| - \alpha^2, & \text{otherwise,} \end{cases} \tag{1.17}
\]
where a is the first derivative of the image. Such a prior encourages piece-
wise smoothness, and can preserve edges well. Schultz and Stevenson
[83] applied this Huber MRF to the single image expansion problem,
and later to the SR reconstruction problem in [84]. Many later works on
super-resolution employed the Huber MRF as the regularization prior,
such as [11, 12, 15, 13, 73, 74] and [3].
3. Total Variation. The Total Variation (TV) norm as a gradient penalty
function is very popular in the image denoising and deblurring litera-
ture [81, 54, 16]. The TV criterion penalizes the total amount of change
in the image as measured by the ℓ1 norm of the magnitude of the gradient:
\[
A(X) = \|\nabla X\|_1, \tag{1.18}
\]
where ∇ is a gradient operator that can be approximated by Laplacian
operators [81]. The ℓ1 norm in the TV criterion favors sparse gradients,
preserving steep local gradients while encouraging local smoothness [13].
Farsiu et al. [30] generalized the notion of TV and proposed the so-
called bilateral TV (BTV) for robust regularization.
For more comparisons of how these generic image priors affect the super-resolution
solution, the reader can further refer to [10] and [25].
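To make the three priors concrete, the following sketch evaluates each penalty on the gradient field of two toy images with the same total intensity change, a sharp edge and a smooth ramp; the forward-difference discretization and the Huber threshold alpha are our own illustrative choices.

import numpy as np

def gradients(X):
    # Forward-difference gradients of a 2-D image.
    gx = np.diff(X, axis=1, append=X[:, -1:])
    gy = np.diff(X, axis=0, append=X[-1:, :])
    return gx, gy

def tikhonov(X):                      # ||Gamma X||^2 with Gamma = first derivative, cf. Eq. (1.16)
    gx, gy = gradients(X)
    return np.sum(gx**2 + gy**2)

def huber(X, alpha=0.1):              # sum of rho(a) over gradient entries, cf. Eq. (1.17)
    a = np.concatenate([g.ravel() for g in gradients(X)])
    return np.sum(np.where(np.abs(a) <= alpha, a**2, 2*alpha*np.abs(a) - alpha**2))

def total_variation(X):               # ||grad X||_1 (anisotropic form), cf. Eq. (1.18)
    gx, gy = gradients(X)
    return np.sum(np.abs(gx)) + np.sum(np.abs(gy))

step_edge = np.kron(np.array([[0., 1.]]), np.ones((32, 32)))   # sharp edge
ramp      = np.tile(np.linspace(0., 1., 64), (32, 1))          # smooth ramp, same total change
for name, X in [("edge", step_edge), ("ramp", ramp)]:
    print(name, tikhonov(X), huber(X), total_variation(X))
# TV assigns the same cost to the edge and the ramp, while the quadratic
# (Tikhonov) penalty strongly prefers the ramp, i.e., it smooths edges.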
between them are allowed. In joint MAP restoration, Eqn. 1.11 is extended to
include the motion and PSF estimates as unknowns for inference:
\[
\begin{aligned}
\{X, \nu, h\} &= \arg\max_{X, \nu, h}\; \Pr(Y \mid X, M(\nu, h))\, \Pr(X)\, \Pr(M(\nu, h)) \\
&= \arg\min_{X, \nu, h}\; -\log\!\left[\Pr(Y \mid X, M(\nu, h))\right] - \log\!\left[\Pr(X)\right] - \log\!\left[\Pr(M(\nu, h))\right] \tag{1.19}
\end{aligned}
\]
Tom et al. [98] divided the SR problem into three subproblems, namely
registration, restoration, and interpolation. Instead of solving them independently,
they simultaneously estimated registration and restoration by maxi-
mizing likelihood using Expectation-Maximization (EM). Later they included
interpolation into the framework and estimated all of the unknowns using EM
in [97]. [35] applied the MAP framework for simultaneous estimation of the
high-resolution image and translation motion parameters (PSF is taken as a
known prior). The high-resolution image and motion parameters are estimated
using a cyclic coordinate-descent optimization procedure. The algorithm con-
verges slowly but improves the estimates considerably. Segall et al. [86, 85] presented
an approach of joint estimation of dense motion vectors and HR images applied
to compressed video. Woods et al.[105] treated the noise variance, regulariza-
tion and registration parameters all as unknowns and estimated them jointly
in a Bayesian framework based on the available observations. Chung et al. [19]
proposed a joint optimization framework and showed superior performance to
the coordinate descent approach [46]. The motion model they handle is limited
to affine transformations. To handle more complex problems with multiple
moving objects in the SR setting, Shen et al. [87] addressed the problem with a
MAP formulation combining motion estimation, segmentation, and SR reconstruction.
The optimization is done in a cyclic coordinate descent process similar to [46].
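A cyclic coordinate-descent scheme of the kind referenced above can be sketched as follows, reusing the hypothetical `forward` and `map_sr` helpers from the earlier sketches: the motion parameters are re-estimated with the HR image fixed (here by a crude grid search over global translations), and the HR image is then re-estimated with the motion fixed. Real systems such as [35], [19], and [87] use far more sophisticated motion models and optimizers; this is only an illustration of the alternation.

import numpy as np
from itertools import product   # `forward` and `map_sr` as defined in the earlier sketches

def estimate_shift(y_k, X, f=2, search=np.arange(-2.0, 2.25, 0.25)):
    # Grid-search the global (dx, dy) that best explains frame y_k given the current X.
    errs = {(dx, dy): np.sum((forward(X, dx, dy, f=f) - y_k) ** 2)
            for dx, dy in product(search, search)}
    return min(errs, key=errs.get)

def joint_map_sr(Y, outer_iters=5, f=2):
    # Cyclic coordinate descent: alternate motion estimation and HR image estimation.
    shifts = [(0.0, 0.0)] * len(Y)                            # crude initial motion estimates
    X = np.kron(Y[0], np.ones((f, f)))
    for _ in range(outer_iters):
        shifts = [estimate_shift(y_k, X, f=f) for y_k in Y]   # update motion, X fixed
        X = map_sr(Y, shifts, iters=50, f=f)                  # update X, motion fixed
    return X, shifts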
Patch size should also be chosen properly. If the patch size is very small, the
co-occurrence prior is too weak to make the prediction meaningful. On the
other hand, if the patch size is too large, one may need a huge training set to
find sufficiently similar patches for the current observations.
A naive way to do super-resolution with such coupled training sets is, for
each low-resolution patch in the low-resolution image, to find its nearest
neighbor $\tilde{y}$ in $\{y_i\}_{i=1}^n$ and then place the corresponding $\tilde{x}$ from $\{x_i\}_{i=1}^n$ onto the
high-resolution image grid. Unfortunately, this simple approach will produce dis-
turbing artifacts due to noise and the ill-posed nature of super-resolution [25].
Relaxing the nearest neighbor search to k-nearest neighbors can ensure that
a sufficiently close patch will be included. Freeman et al. [31] proposed
a belief propagation [108] algorithm based on the above MRF model to se-
lect, among the k-nearest neighbors, the high-resolution patch that has the best
compatibility with adjacent patches. Sun et al. [90] extended this idea using
the sketch prior to enhance only the edges in the image, aiming to speed up
the algorithm. The iterative back-projection (IBP) algorithm [39] is then applied as a post-processing
step to ensure data consistency over the whole image. Wang et al. [103] further
followed this line of work and proposed a statistical model that can handle
unknown PSF.
The above methods are based on image patches directly, requiring large
training sets to include any patterns possibly encountered in testing. Chang
et al. [17] proposed another simple but effective method based on neighbor
embedding [93], with the assumption of correspondence between the two man-
ifolds formed by the low- and high-resolution image patches. For each low-
resolution image patch $y_k^t$ from the test image (the superscript “t” distinguishes
the test patch from the training patches), the algorithm finds its k-nearest
neighbors $N_t$ from $\{y_i\}_{i=1}^n$, and computes the reconstruction weights by neigh-
bor embedding:
\[
\hat{w}_s = \arg\min_{w_s} \Big\| y_k^t - \sum_{y_s \in N_t} w_s\, y_s \Big\|^2,
\qquad \text{s.t.}\; \sum_{y_s \in N_t} w_s = 1. \tag{1.20}
\]
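A compact sketch of this neighbor-embedding step is given below: the constrained least-squares weights of Eq. (1.20) have the standard closed-form LLE solution via the local Gram matrix, and the same weights are then applied to the corresponding HR training patches to synthesize the HR patch. The coupled dictionary, patch sizes, and k here are placeholders, and aggregation of overlapping patches onto the HR grid is omitted.

import numpy as np

def embedding_weights(y_t, neighbors, reg=1e-6):
    # Solve Eq. (1.20): weights summing to one that best reconstruct y_t
    # from its neighbors (the standard LLE constrained least-squares solution).
    Z = neighbors - y_t                      # shift neighbors to the query patch
    G = Z @ Z.T                              # local Gram matrix
    G += reg * np.trace(G) * np.eye(len(G))  # regularize in case G is singular
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()                       # enforce the sum-to-one constraint

def neighbor_embedding_sr(y_t, Y_train, X_train, k=5):
    # Predict an HR patch for LR patch y_t from coupled training patches.
    # Y_train: (n, d_lr) LR training patches; X_train: (n, d_hr) HR counterparts.
    idx = np.argsort(np.sum((Y_train - y_t) ** 2, axis=1))[:k]   # k-nearest neighbors N_t
    w = embedding_weights(y_t, Y_train[idx])
    return w @ X_train[idx]                  # apply the same weights to the HR patches

# Toy usage with a random coupled dictionary (placeholder data).
rng = np.random.default_rng(0)
Y_train, X_train = rng.random((500, 9)), rng.random((500, 36))
x_hat = neighbor_embedding_sr(rng.random(9), Y_train, X_train)
print(x_hat.shape)                           # (36,): one reconstructed 6x6 HR patch, flattened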
approach is by Protter et al. [78], generalized from the nonlocal means denois-
ing algorithm [8]. Instead of sampling examples from other training images,
the algorithm explores self-similarities within the image (or sequence) and ex-
tracts the example patches from the target image (or sequence) itself. A recent
work by Glasner et al. further explored self-similarities in images for SR by
combining the classical algorithm based on subpixel displacements and the
example-based method based on patch pairs extracted from the target image.
The use of examples can be much more effective when dealing with nar-
row families of images, such as text and face images. A group of algorithms
have emerged targeting face super-resolution in recent years due to its im-
portance in surveillance scenarios. Face super-resolution is usually referred
to as face hallucination, following the early work by Baker and Kanade [2].
Capel and Zisserman [14] proposed an algorithm where PCA [45] subspace
models are used to learn parts of the faces. Liu et al. [58], [57] proposed a
two-step approach toward super-resolution of faces, where the first step uses
the eigenface [100] to generate a medium resolution face, followed by the non-
parametric patch-based approach [31] in the second step. Such an Eigenface-
based approach has been explored in several later works [32],[104] too. Yang
et al. [106] proposed a similar two-step approach. Instead of using the holistic
PCA subspace, [106] uses local Nonnegative Matrix Factorization (NMF)[51]
to model faces and the patch-based model in the second step is adopted from
[107]. Jia and Gong [43], [44] proposed the tensor approach to deal with more
face variations, such as illuminations and expressions. Although these face hal-
lucination algorithms work surprisingly well, they only apply to frontal faces,
and only a few works have been devoted to evaluating face hallucination for
recognition [32], [36].
Example-based regularization is effective in our SR problem when insuffi-
cient observations are available. There are still a number of questions we need
to answer regarding this kind of approach. First, how to choose the opti-
mal patch size given the target image. Perhaps a multiresolution treatment is
needed. Second, how to choose the database. Different images have different
statistics, and therefore need different databases. An efficient method for dic-
tionary adaptation to the current target image may suggest a way out. Third,
how to use the example-based prior more efficiently. The computation issue
could be a difficulty for practical applications. Readers are suggested to refer
to [25] for more detailed analysis on example-based regularization for inverse
problems.
system blurs. They cannot estimate the registration parameters and the
high-resolution image simultaneously, as the stochastic approaches do. A hy-
brid approach combining a stochastic view with the POCS philosophy suggests
a promising way forward.
the joint estimation for registration parameters and HR image may result in
overfitting. To overcome this overfitting problem, Tipping and Bishop [96]
employed a Bayesian approach for estimating both registration and blur pa-
rameters by marginalizing over the unknown high-resolution image. The algorithm
shows noteworthy estimation accuracy for both registration and blur param-
eters; however, the computational cost is very high. Pickup et al. [73, 74, 72]
instead cast the Bayesian approach in another way by marginalizing the un-
known registration parameters, to address the uncertainty inherent in
image registration [79].
The stochastic approaches that couple HR image estimation with image
registration do demonstrate promising results; however, such parametric
methods are limited in the motion models they can effectively handle. Usually,
some simple global motion models are assumed. Real videos are complicated,
comprising arbitrary local motions for which parametrization of the motion
models may be intractable. Optical flow motion estimation can be applied to
such scenarios. However, the insufficient measurements for local motion
estimation make these algorithms vulnerable to errors, which can be disastrous
for SR reconstruction [112]. Another promising approach toward SR reconstruction
is the class of nonparametric methods, which try to bypass explicit motion esti-
mation. Protter et al. [78] extended the non-local means denoising algorithm
to SR reconstruction, where fuzzy motion estimation based on block matching
is used. Later they proposed a probabilistic motion model in [77], which is a
nonparametric analogue of [72]. Both [78] and [77] can handle complex
motion patterns in real videos. Compared to the classical SR methods based
on optical flow motion estimation, Protter’s methods reduce the errors caused
by misalignment by a weighting strategy over multiple possible candidates.
Takeda et al. [92], on the other hand, applied the 3-D steering kernel proposed
in their early work [91] to video, which also avoids explicit motion estimation,
for denoising and SR reconstruction. The 3-D steering kernel captures both
spatial and temporal structure, encoding implicit motion information, and
thus can be applied to both spatial and temporal SR for video with complex
motion activities. While methods without explicit motion estimation indeed
produce promising results toward the practical applicability of SR techniques,
further improvements may include better computational efficiency, combining adaptive
interpolation or regression together with deblurring, and generalizing obser-
vation models to 3-D motions in video, e.g., out-of-plane rotation.
evaluations are needed to see how much the robustness efforts can benefit real
SR performance.
Bibliography
[1] M. S. Alam, J. G. Bognar, R. C. Hardie, and B. J. Yasuda. Infrared im-
age registration and high-resolution reconstruction using multiple trans-
lationally shifted aliased video frames. IEEE Transactions on Instru-
mentation and Measurement, 49(5):915–923, 2000.
[2] S. Baker and T. Kanade. Limits on super-resolution and how to break
them. IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, 24(9):1167–1183, 2002.
[3] S. Borman and R. L. Stevenson. Simultaneous multi-frame MAP super-
resolution video enhancement using spatio-temporal priors. In Proceed-
ings of IEEE International Conference on Image Processing, volume 3,
pages 469–473, 1999.
[4] Sean Borman and Robert L. Stevenson. Super-resolution from image
sequences - A review. In Proceedings of the 1998 Midwest Symposium
on Circuits and Systems, pages 374–378, 1998.
[5] N. K. Bose, H. C. Kim, and H. M. Valenzuela. Recursive implementation
of total least squares algorithm for image reconstruction from noisy,
undersampled multiframes. In Proceedings of the IEEE Conference on
Acoustics, Speech and Signal Processing, volume 5, pages 269–272, 1993.
[6] O. Bowen and C. S. Bouganis. Real-time image super resolution using
an FPGA. In International Conference on Field Programmable Logic
and Applications, pages 89–94, 2008.
[7] L. Brown. A survey of image registration techniques. ACM Computing
Surveys, 24(4):325–376, 1992.
[8] A. Buades, B. Coll, and J. M. Morel. A non-local algorithm for image
denoising. In Proceedings of IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, pages 60–65, 2005.
[9] E. Candes. Compressive sensing. In Proceedings of International
Congress of Mathematicians, volume 3, pages 1433–1452, 2006.
[10] D. Capel. Image Mosaicing and Super-resolution. Springer, 2004.
[11] D. Capel and A. Zisserman. Automated mosaicing with super-resolution
zoom. In Proceedings of IEEE Computer Society Conference on Com-
puter Vision and Pattern Recognition, pages 885–891, 1998.
[12] D. Capel and A. Zisserman. Super-resolution enhancement of text image
sequences. In Proceedings of the International Conference on Pattern
Recognition, volume 1, pages 1600–1605, 2000.
[29] F. Li, X. Jia, and D. Fraser. Universal HMT based super resolution for
remote sensing images. In IEEE International Conference on Image
Processing, pages 333–336, 2008.
[55] Frank Lin, Clinton B. Fookes, Vinod Chandran, and Sridha Sridharan.
Investigation into optical flow super-resolution for surveillance applica-
tions. In The Australian Pattern Recognition Society Workshop on Digital
Image Computing, 2005.
[65] Sung C. Park, Min K. Park, and Moon G. Kang. Super-resolution image
reconstruction: a technical overview. IEEE Signal Processing Magazine,
20(3):21–36, 2003.
[67] A. J. Patti, M. Sezan, and A. M. Tekalp. Robust methods for high qual-
ity stills from interlaced video in the presence of dominant motion. IEEE
Transactions on Circuits and Systems for Video Technology, 7(2):328–
342, 1997.
[81] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise
removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268,
1992.
[82] S. Farsiu, D. Robinson, M. Elad, and P. Milanfar. Advances and chal-
lenges in super-resolution. International Journal of Imaging Systems and
Technology, 14(2):47–57, 2004.
[87] H. Shen, L. Zhang, B. Huang, and P. Li. A MAP approach for joint mo-
tion estimation, segmentation and super-resolution. IEEE Transactions
on Image Processing, 16(2):479–490, 2007.
[94] R. Tibshirani. Regression shrinkage and selection via the Lasso. Jour-
nal of the Royal Statistical Society: Series B (Statistical Methodology),
58(1):267–288, 1996.