Super-resolution Enhancement of Text Image Sequences

David Capel and Andrew Zisserman


Robotics Research Group
Department of Engineering Science
University of Oxford
Oxford OX1 3PJ, UK.

Abstract

The objective of this work is the super-resolution enhancement of image sequences. We consider in particular images of scenes for which the point-to-point image transformation is a plane projective transformation.

We first describe the imaging model, and a maximum likelihood (ML) estimator of the super-resolution image. We demonstrate the extreme noise sensitivity of the unconstrained ML estimator. We show that the Irani and Peleg [9, 10] super-resolution algorithm does not suffer from this sensitivity, and explain that this stability is due to the error back-projection method which effectively constrains the solution. We then propose two estimators suitable for the enhancement of text images: a maximum a posteriori (MAP) estimator based on a Huber prior, and an estimator regularized using the Total Variation norm. We demonstrate the improved noise robustness of these approaches over the Irani and Peleg estimator. We also show the effects of a poorly estimated point spread function (PSF) on the super-resolution result and explain conditions necessary for this parameter to be included in the optimization.

Results are evaluated on both real and synthetic sequences of text images. In the case of the real images, the projective transformations relating the images are estimated automatically from the image data, so that the entire algorithm is automatic.

1. Introduction

Super-resolution enhancement involves generating a “still” image from a sequence at a higher resolution than is present in any of the individual frames. The observed images are regarded as degraded observations of a real, high-resolution texture. These degradations typically include geometric warping, optical blur, spatial sampling and noise. Given several such observations a maximum likelihood (ML) estimate of the high-resolution texture is obtained such that when reprojected back into the images it minimizes the difference between the actual and “predicted” observations.

If a model of the high-resolution texture is available in the form of a Bayesian prior then this may be utilized in computing a maximum a posteriori (MAP) estimate. The traditional approach is to model the texture as a first-order, stationary Markov Random Field (MRF). We propose a MAP estimator which uses the Huber edge-penalty function, and compare this with regularization based on the total variation norm.

In this work our target images are of text, and the image to image transformation is a planar projective transformation. This is the most general transformation required to relate perspective images of planar scenes.

1.1. Previous Work

Early super-resolution work by Irani and Peleg considered images undergoing similarity [9] and affine [10] transformations. Mann and Picard [12] extended this work to include projective transformations. Other authors have considered non-parametric motion models [16] and region/contour tracking [1].

Various imaging degradation models have been used. Irani and Peleg modelled image degradations including both optical blur and spatial quantization. The techniques were extended by Bascle et al. [1] to include motion blur. Cheeseman et al. [6] obtain their imaging model from the bench calibration of their Vidicon camera.

Approaches also vary in their use of statistical priors or regularizing terms. Cheeseman et al. [6] develop a MAP estimator based on a Gaussian smoothness prior for the purpose of enhancing Viking Orbiter images. Schultz and Stevenson [16] furthered the Bayesian approach by comparing restoration methods on both single and multiple images using an MRF prior with a Huber penalty function on edge response. Capel and Zisserman [3] also investigated both ML and MAP estimators for the super-resolution enhancement of video mosaics. Zomet and Peleg [18] applied the Irani and Peleg error-backprojection algorithm to mosaics obtained using their pipe-projection method. Rudin et al. [15] propose a method in which registered frames are resampled and then deblurring is applied as a final step. They employ the total variation norm as a regularizer in their deblurring algorithm.

2. The imaging model

The imaging model specifies how the high-resolution texture is transformed to synthesize a low-resolution image. This typically involves a geometric transformation, an illumination model, blurring (optical and/or motion), spatial sampling (due to the CCD array), and a noise term. The synthesized image is given by

$$\hat{g}(x,y) = \downarrow_S\!\big( h(u,v) * \ell( f( T(x,y) ) ) \big) \tag{1}$$

where $f$ is the super-resolution image, $\ell$ is a function specifying the illumination model, $T$ is the geometric transformation into the image, $h$ is the PSF, and $\downarrow_S$ is the down-sampling operator by a factor $S$.

The model used here allows for a projective transformation, an affine illumination model (spatially invariant shift/scaling of intensities), and blurring by a linear, spatially invariant, symmetric PSF. Hence, for the $n$-th image, the model becomes

$$\hat{g}_n(x,y) = \sum_{u}\sum_{v} h(u,v)\,\big[\alpha_n\, f\big(T_n(x-u,\,y-v)\big) + \beta_n\big] \tag{2}$$

where $\hat{g}_n(x,y)$ is defined on the sampling lattice of the image, $(\alpha_n, \beta_n)$ are the affine illumination parameters, $h(u,v)$ is the PSF, and $T_n(x,y)$ is the homography transforming image coordinates to coordinates in the super-resolution image. The model parameters $(T_n, \alpha_n, \beta_n)$ vary from image to image.

Knowledge of the PSF for any given image sequence is usually unavailable, so here it is modelled as a Gaussian. Comparisons with the measured PSF of several CCD based imaging systems show this approximation to be quite reasonable and this is further verified by good super-resolution results obtained on real images. A procedure for estimating the PSF is described in [14].

3. The ML estimator

Assuming the image noise to be Gaussian with mean zero and variance $\sigma^2$, the total probability of the observed image $g_n$ given an estimate of the super-resolution image $\hat{f}$ is

$$\Pr(g_n \mid \hat{f}) = \prod_{\forall x,y} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{\big(\hat{g}_n(x,y) - g_n(x,y)\big)^2}{2\sigma^2} \right) \tag{3}$$

and hence the associated log-likelihood function is, up to additive and multiplicative constants,

$$\mathcal{L}(g_n) = -\sum_{\forall x,y} \big(\hat{g}_n(x,y) - g_n(x,y)\big)^2 \tag{4}$$

The maximum likelihood estimate $f_{ML}$ is obtained by maximizing this function over all observed images.

$$f_{ML} = \arg\max_{f} \sum_{n} \mathcal{L}(g_n) \tag{5}$$

3.1. Tests on synthetic images

In order to test various estimators under controlled conditions we use synthetic images. The ground-truth image (figure 1) is projectively warped (using bicubic interpolation), smoothed with an isotropic Gaussian, and down-sampled (figure 2). Various levels of additive Gaussian noise are then applied.

Figure 1. The ground-truth images used to create the synthetic sequences (100 × 100 pixels).

Figure 2. Examples of the synthetic projective images created with Gaussian smoothing $\sigma$ and down-sampling ratio $S$ (50 × 50 pixels).

The starting point for all minimizations is the average of the registered (warped) input frames. Such an initial estimate has the desirable properties of being smooth and being close to the optimal solution.

The ML formulation given above is simply a very large, sparse system of linear equations. Analysis of the eigenvalues of this system shows that in general it is extremely poorly conditioned. Figure 3 shows the result of the ML estimator applied to 10 low-resolution, synthetic images, with 3 different levels of additive noise. Note the high frequency error which appears as noise is increased in the input images. This is to be expected given the poor conditioning of the system. The reprojection error is extremely insensitive to these high-frequency components, which are almost completely attenuated in the simulation process by the (low-pass) PSF (i.e. they are near-null vectors of the linear system). Clearly this estimator is extremely sensitive to even small amounts of noise in the input images. It does not perform well unless many more images are available (100 or more).

Figure 3. Results of applying the ML estimator to 10 synthetic input images with 3 levels of additive, Gaussian noise.

Figure 4. Results of applying the Irani-Peleg estimator to the same images used in figure 3.

Implementation details. All large scale optimizations in this paper (except in the case of the Irani and Peleg algorithm) are carried out using Liu and Nocedal’s Netlib implementation of the limited memory BFGS method [11, 13].
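The poor conditioning described in section 3.1 can be illustrated with a small one-dimensional sketch. It is not the authors' implementation: it assumes pure integer translations instead of homographies, a circular boundary, no illumination change, and a discretized Gaussian PSF, and all function names and parameter values (`gaussian_kernel`, `simulate`, `reprojection_cost`, `sigma=1.5`, and so on) are hypothetical choices made for illustration. The sketch perturbs a known high-resolution signal in two ways of equal energy and compares the resulting sum of squared residuals.

```python
import math

def gaussian_kernel(sigma, radius):
    """Discrete Gaussian PSF sampled at integer offsets, normalised to sum to 1."""
    weights = [math.exp(-0.5 * (u / sigma) ** 2) for u in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def simulate(f, kernel, step, offset):
    """Blur f (circular boundary), then keep every step-th sample from offset."""
    n = len(f)
    radius = len(kernel) // 2
    blurred = [
        sum(kernel[u + radius] * f[(x + u) % n] for u in range(-radius, radius + 1))
        for x in range(n)
    ]
    return blurred[offset::step]

def reprojection_cost(f, observations, kernel, step):
    """Sum of squared residuals between simulated and observed signals."""
    cost = 0.0
    for offset, g in observations:
        g_hat = simulate(f, kernel, step, offset)
        cost += sum((a - b) ** 2 for a, b in zip(g_hat, g))
    return cost

n, step = 64, 2                      # assumed signal length and zoom factor
kernel = gaussian_kernel(sigma=1.5, radius=6)
truth = [1.0 if 20 <= x < 40 else 0.0 for x in range(n)]
# Two noise-free observations that differ only by a one-sample shift.
observations = [(offset, simulate(truth, kernel, step, offset)) for offset in range(step)]

# Two perturbations of equal energy: one at the highest representable
# frequency, one at a low frequency.
high = [0.1 * (-1) ** x for x in range(n)]
low = [0.1 * math.sqrt(2.0) * math.cos(2.0 * math.pi * x / n) for x in range(n)]

cost_high = reprojection_cost([t + e for t, e in zip(truth, high)], observations, kernel, step)
cost_low = reprojection_cost([t + e for t, e in zip(truth, low)], observations, kernel, step)
print(cost_high, cost_low)
```

Under these assumptions `cost_high` comes out many orders of magnitude smaller than `cost_low`, although both perturbations change the high-resolution signal by the same amount. The data term therefore barely constrains the high-frequency component, which is the sense in which such components are near-null vectors of the linear system.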
4. The Irani and Peleg algorithm

Irani and Peleg proposed an algorithm [9, 10] which minimizes the same cost function as the ML estimator above (although the illumination parameters are omitted), but the iterative update of the super-resolution estimate proceeds by an error back-projection scheme inspired by computer-aided tomography. When all the low-resolution images have been simulated, the residual images (the differences between the observed and simulated images) are convolved with a back-projection function (BPF) and warped back into the super-resolution frame. The back-projected errors from all the observed images are averaged and used to directly update the estimate as follows

$$f^{(t+1)} = f^{(t)} + c \sum_{n} T_n^{-1}\big( h_{BP} * (g_n - \hat{g}_n^{(t)}) \big) \tag{6}$$

where $c$ is a constant and $h_{BP}$ is the back-projection kernel. Irani and Peleg suggest that $h_{BP} = h_{PSF}^{\,k}$, where $k \geq 1$, is a good choice of BPF, ensuring convergence whilst suppressing spurious noise components in the solution. Throughout this paper we have used $h_{BP} = h_{PSF}^{\,2}$. Figure 4 shows results of this algorithm applied to the same images as the ML estimator in figure 3. The increased noise-robustness is clear.

The reason for this robustness lies in the choice of BPF. Since each update to the super-resolution estimate is simply a linear combination of BPF kernels, if the BPF is smooth then the resulting estimate tends to be smooth also. The algorithm is unable to introduce the high-frequency noise components that tend to dominate the unconstrained ML estimates. It is similar to a constrained minimization in which the components along the eigenvectors with the smallest (and most troublesome) eigenvalues of the linear system mentioned above are constrained to zero. The effect of the BPF on the estimate is illustrated in figure 5, in which estimates are obtained from 5 images with additive Gaussian noise using two different BPFs. The narrower BPF (left) produces a noisy result. The wider BPF (right) reduces noise but increases smoothing.

Figure 5. Results of applying the Irani-Peleg estimator to 5 images with additive Gaussian noise, using 2 different forms of back-projection kernel.

5. A MAP estimator

If a prior probability distribution on the super-resolution image is available then this information may be used to “regularize” the estimation. The maximum a posteriori (MAP) estimator has the form:

$$f_{MAP} = \arg\max_{f} \sum_{n} \mathcal{L}(g_n) + \lambda^2 \mathcal{L}(f) \tag{7}$$

where $\mathcal{L}(f)$ provides a measure of the likelihood of a particular estimate $f$.

The prior used here applies a penalty to the image gradient $\nabla f$. The MAP estimator is then

$$f_{MAP} = \arg\max_{f} \sum_{n} \mathcal{L}(g_n) - \lambda^2 \sum_{\forall x,y} \rho\big(\nabla f(x,y)\big) \tag{8}$$

where the penalty function $\rho(x)$ is defined by the Huber function,

$$\rho(x) = \begin{cases} x^2, & |x| \leq \alpha \\ 2\alpha|x| - \alpha^2, & \text{otherwise} \end{cases}$$

This penalty function encourages local smoothness, whilst being more lenient toward step edges, thus encouraging a piecewise constant solution.
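The leniency of the Huber function toward step edges can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a one-dimensional signal, forward differences as the gradient, and an arbitrary threshold `alpha = 0.05`; the helper names `huber` and `gradient_penalty` are hypothetical. It compares the penalty assigned to a single step of height 1 with the penalty assigned to a ramp that rises by the same amount over nine samples.

```python
def huber(x, alpha):
    """Huber penalty: quadratic for |x| <= alpha, linear beyond it."""
    if abs(x) <= alpha:
        return x * x
    return 2.0 * alpha * abs(x) - alpha * alpha

def gradient_penalty(signal, penalty):
    """Sum a penalty over the forward differences of a 1-D signal."""
    return sum(penalty(b - a) for a, b in zip(signal, signal[1:]))

alpha = 0.05                                  # assumed threshold, for illustration only
step_edge = [0.0] * 5 + [1.0] * 5             # one jump of height 1
ramp = [i / 9 for i in range(10)]             # same rise spread over 9 small steps

def quadratic(x):
    """Purely quadratic (Gaussian-prior style) penalty, for comparison."""
    return x * x

huber_step = gradient_penalty(step_edge, lambda x: huber(x, alpha))
huber_ramp = gradient_penalty(ramp, lambda x: huber(x, alpha))
quad_step = gradient_penalty(step_edge, quadratic)
quad_ramp = gradient_penalty(ramp, quadratic)
print(quad_step / quad_ramp, huber_step / huber_ramp)
```

With these values the quadratic penalty charges the step 9 times as much as the ramp, whereas the Huber penalty charges it only about 1.26 times as much. A smaller `alpha` brings the two Huber costs closer together, at the price of less smoothing of small gradients.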
6. The Total Variation estimator

The total variation norm is commonly used as a regularizer in the literature on denoising/deblurring of single images (see [17, 4, 15]). It applies the same penalty to a step edge as it does to a smooth transition of the same height. It is defined as

$$TV(f) = \int_{\Omega} |\nabla f| \, d\mathbf{x} \tag{9}$$

and is employed here in a Tikhonov style regularization scheme:

$$f_{TV} = \arg\max_{f} \sum_{n} \mathcal{L}(g_n) - \lambda^2 \sum_{\forall x,y} |\nabla f(x,y)| \tag{10}$$

The gradient of the TV term is

$$\frac{\partial\, TV}{\partial f} = -\nabla \cdot \left( \frac{\nabla f}{|\nabla f|} \right) \tag{11}$$

and hence there is a singularity at $\nabla f = 0$. This is problematic for gradient descent minimization, so the term $|\nabla f|$ is often replaced by $\sqrt{f_x^2 + f_y^2 + \beta}$, where $\beta$ is a tiny perturbation. An alternative scheme with better global convergence properties is proposed by Chan et al. [5]. Figure 6 compares results from the Irani-Peleg algorithm to those obtained using the MAP and TV estimators given increasingly noisy input images. The Irani-Peleg estimate becomes rather blotchy at high noise levels. The MAP and TV estimators maintain a more piece-wise constant solution.

Figure 6. Estimates obtained at 2× zoom using 10 images, with increasing levels of additive Gaussian noise applied to the input images. Rows: Irani and Peleg, MAP estimator, total variation estimator.

7. The point-spread function

The point-spread function used in the imaging model can have a pronounced effect on the super-resolution estimate. This is demonstrated in figure 7. The reason for this behaviour is easiest to imagine in the case where the input images are related by only Euclidean transformations (translation/rotation) and the PSF is isotropic. In this case the operations of warping the super-resolution estimate and convolving with the PSF commute. This means that there is a family of PSF/super-resolution pairs that can give rise to the same set of observed images. If the PSF is too “low-pass” then the super-resolution image develops high-frequency, “ringing” artifacts to compensate. Similarly, if the PSF is too “high-pass” the estimate becomes smoother. So under Euclidean transformations the reprojection error is insensitive to the size of the PSF; only the cost of the prior or regularizing term varies. This can confound methods which attempt to optimize the PSF along with the super-resolution estimate.

The same effect is observed when dealing with affine or projective transformations, although in these cases the reprojection error is sensitive to PSF variations, as illustrated in figure 7. The graph in figure 8 shows the corresponding variation of reprojection error as the PSF $\sigma$ varies. Note that the minimum lies at the correct value of $\sigma = 0.75$. This was the value used to make the original simulated images. Hence, in the case of affine or projective sequences the value of the PSF $\sigma$ may be optimized by gradient descent.

7.1. A note on PSF implementation

When implementing the image synthesis process the super-resolution estimate must first be geometrically warped, then blurred with the PSF and finally down-sampled. This gives rise to two alternative schemes which were not made explicit in Irani and Peleg’s original paper:

• Either perform the warp onto a regular lattice using an interpolation operator (e.g. bicubic), followed by convolution with a discretized form of the PSF and down-sampling.

• Or warp the super-resolution image as point samples, and convolve with a continuous form of the PSF at the required sampling positions.

The former is generally much easier to implement. The warping and convolution are easily optimized for speed with
little need for caching of intermediate steps or look-up tables. However, linear interpolation schemes such as bilinear or bicubic have an unavoidable low-pass effect. If the projective transformation of the images is severe (e.g. pronounced foreshortening) then these interpolants often perform poorly and this in turn adversely affects the super-resolution result. Also, when using a gradient descent minimizer, the Jacobian must include both the interpolant and the PSF. This leads to a rather inelegant formulation which is further complicated by any boundary conditions. For this reason we have chosen the latter implementation path. The PSF is a continuous, isotropic Gaussian, truncated at a fixed number of standard deviations. This continuous form allows fairly simple evaluation of simulated pixels and Jacobians. It also allows for straightforward propagation of registration parameter covariance to provide confidence weightings on the simulated pixels. Such covariance information is a by-product of unconstrained, ML registration algorithms such as feature-based bundle-adjustment [8].

Figure 7. The MAP estimator is applied to 10 synthetic images (figure 2 top). Panels show the ground truth and the estimates obtained with PSF sigma = 0.55, 0.65, 0.75 (correct), 0.85 and 0.95. When the PSF is too narrow (high-pass) the super-resolution estimate is too smooth. When the PSF is too wide (low-pass) the estimate develops “ringing” artifacts to compensate.

Figure 8. The variation of reprojection error with $\sigma$ of the Gaussian point-spread function (horizontal axis: Gaussian PSF sigma; vertical axis: rms reprojection error). Each data point is the end-point of an estimation using the MAP estimator applied to 10 synthetic, projectively warped images. The minimum corresponds to the correct PSF.

8. Results using real data

In the synthetic examples the registration was known exactly. In these real examples initial registration is obtained using the feature based ML algorithm described in [3]. The illumination parameters $\alpha_n$ and $\beta_n$ are estimated using a robust line-fit to the intensities of corresponding pixels in the simulated and observed images. This estimation is carried out at the start, using the initial super-resolution estimate, and then again at intervals throughout the minimization. Figure 9 shows results of the Irani-Peleg, MAP and TV estimators when applied to 20 CCD camera images of a sample of text undergoing planar-projective motion. The super-resolution zoom ratio is 2.0. The MAP and TV estimates are both slightly sharper than the Irani-Peleg version because they encourage the solution to be more piece-wise constant. There is little difference between the MAP and TV estimates. However, the TV estimator only requires one global parameter to be set ($\lambda$), as opposed to the two ($\lambda$ and $\alpha$) required by the Huber function, hence the TV scheme is rather easier to use in practice.

9. Summary and future development

We have demonstrated the superiority of the Irani and Peleg algorithm to the ML estimator and explained the reasons for its robustness.

Furthermore, it has been shown that results comparable to or better than those obtainable with Irani and Peleg’s algorithm can be achieved using a simple MAP estimator or a traditional total variation regularizer. We have shown these estimators to have improved noise robustness.

We have also demonstrated the effect of a poorly estimated point-spread function on the super-resolution result, and explained the conditions under which this parameter may be successfully optimized.

The estimators proposed here are particularly applicable to enhancement of text since they encourage a piece-wise constant solution. For other types of image, such simplistic priors are inappropriate. We are therefore investigating methods of learning statistical image models directly from images such as proposed by Freeman and Pasztor [7], and also Borman et al. [2].

The basic ML super-resolution estimator is too poorly conditioned to be useful. However, the Irani and Peleg algorithm demonstrates that with a restricted solution basis
useful results can still be obtained. With this in mind we are also investigating restricted image bases in which the super-resolution problem is better conditioned, thereby allowing an ML estimator to be used.

Figure 9. Super-resolution at 2× zoom from 20 images captured using a Pulnix CCD camera. (a) one of the original low-res images, (b) the initial estimate (average of registered frames), (c) the result of the Irani-Peleg algorithm, (d) the result of the MAP algorithm, (e) the TV result, (f) the TV estimate with a different value of $\lambda$. The MAP and TV estimates are both slightly sharper than the Irani-Peleg estimate, although there is little difference in quality between MAP and TV. Both are clearly far superior to the original image resolution.

Acknowledgements. Funding for this work was provided by the EPSRC and the EU project IMPROOFS. Many thanks to Dr Andrew Fitzgibbon for valuable discussions about optimization algorithms and bundle-adjustment, and for providing lots of useful software.

References
[1] B. Bascle, A. Blake, and A. Zisserman. Motion deblurring and super-resolution from an image sequence. In Proc. ECCV, pages 312–320. Springer-Verlag, 1996.
[2] S. Borman, K. Sauer, and C. Bouman. Nonlinear prediction methods for estimation of clique weighting parameters in nongaussian image models. In Optical Science, Engineering and Instrumentation, volume 3459 of Proceedings of the SPIE, San Diego, CA, Jul 1998.
[3] D. Capel and A. Zisserman. Automated mosaicing with super-resolution zoom. In Proc. CVPR, pages 885–891, Jun 1998.
[4] T. Chan, P. Blomgren, P. Mulet, and C. Wong. Total variation image restoration: Numerical methods and extensions. In ICIP, pages III:384–xx, 1997.
[5] T. Chan, G. Golub, and P. Mulet. A nonlinear primal-dual method for total variation-based image restoration. SIAM Journal on Scientific Computing, 20(6):1964–1977, 1999.
[6] P. Cheeseman, B. Kanefsky, R. Kraft, and J. Stutz. Super-resolved surface reconstruction from multiple images. Technical report, NASA, 1994.
[7] W. Freeman and E. Pasztor. Learning low-level vision. In ICCV, pages 1182–1189, 1999.
[8] R. I. Hartley. Self-calibration from multiple views with a rotating camera. In Proc. ECCV, LNCS 800/801, pages 471–478. Springer-Verlag, 1994.
[9] M. Irani and S. Peleg. Improving resolution by image registration. GMIP, 53:231–239, 1991.
[10] M. Irani and S. Peleg. Motion analysis for image enhancement: resolution, occlusion, and transparency. Journal of Visual Communication and Image Representation, 4:324–335, 1993.
[11] D. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, B(45):503–528, 1989.
[12] S. Mann and R. W. Picard. Virtual bellows: Constructing high quality stills from video. In International Conference on Image Processing, 1994.
[13] W. Press, B. Flannery, S. Teukolsky, and W. Vetterling. Numerical Recipes in C. Cambridge University Press, 1988.
[14] S. Reichenbach, S. Park, and R. Narayanswamy. Characterizing digital image acquisition devices. Optical Engineering, 30(2):170–177, 1991.
[15] L. Rudin, F. Guichard, and P. Yu. Video super-resolution via contrast-invariant motion segmentation and frame fusion (with applications to forensic video evidence). In ICIP, page 27PS1, 1999.
[16] R. R. Schultz and R. L. Stevenson. Extraction of high-resolution frames from video sequences. IEEE Transactions on Image Processing, 5(6):996–1011, Jun 1996.
[17] C. Vogel and M. Oman. Fast, robust total variation based reconstruction of noisy, blurred images. IP, 7(6):813–824, June 1998.
[18] A. Zomet and S. Peleg. Applying super-resolution to panoramic mosaics. In WACV, 1998.
