


VIDEO STABILIZATION FOR A HAND-HELD CAMERA BASED ON 3D MOTION MODEL

J. M. Wang (1), H. P. Chou (3), S. W. Chen (2), and C. S. Fuh (1)

(1) Computer Science and Information Engineering, National Taiwan University, Taiwan
(2) Computer Science and Information Engineering, National Taiwan Normal University, Taiwan
(3) Information Management, Chung-Hua University, Taiwan

ABSTRACT

In this paper, a video stabilization technique is presented. There are four steps in the proposed approach. We begin by extracting feature points from the input image using Lowe's SIFT (Scale Invariant Feature Transform) point detection technique. This set of feature points is then matched against the set of feature points detected in the previous image using the RKHS (Reproducing Kernel Hilbert Space) graph matching technique of van Wyk et al. We can then calculate the camera motion between the two images with the aid of a 3D motion model. Expected and unexpected components are separated using a motion taxonomy method. Finally, a full-frame technique that fills up blank image areas is applied to the transformed image.

Index Terms—SIFT detection, RKHS graph matching, 3D motion, Motion taxonomy, Full-frame process.

1. INTRODUCTION

Vision systems play important roles in many intelligent applications, such as transportation, security, and monitoring systems. Cameras may be installed on buildings or held by a person. Hand-held cameras often suffer from image instability [1]. In this paper, a video stabilization technique for a video camera held by a person is presented. This technique can be considered a solution to the general video instability problem.

There are typically three major stages constituting a video stabilization process: motion estimation, motion taxonomy, and image compensation. Motion estimation is an ill-conditioned problem because camcorder motion, which is three-dimensional (3-D), is estimated on the basis of input images that are two-dimensional (2-D). This stage dominates the entire stabilization process in both accuracy and time complexity. Many techniques, both hardware and software, have been proposed for improving the performance of motion estimation.

Since 3-D motion estimation from 2-D images is ill-conditioned, additional information needs to be included in order to make the problem well-conditioned. Several motion models, which serve as an important source of a priori information, have been introduced, including a 2-D rigid model with four parameters [2], a 2-D affine model with six parameters [3], a 2.5-D model with seven parameters [4], and a 3-D model with nine parameters. Intuitively, the higher the dimension of the model, the better the accuracy achieved. However, this may not always be the case, because high-dimensional models involve more complicated computations, which themselves may incur numerical instability.

There are two important tasks when using motion models: interframe motion calculation and model fitting. The interframe motions between successive images are calculated and then fitted to the preselected motion model, from which an overdetermined system of equations in the motion parameters is obtained. By finding an optimal solution to this system, the motion parameters are determined.

The techniques for calculating interframe motions can be classified into two categories: differential and matching (or correlation) approaches. The differential approaches, primarily used for computing optical flow [5] and based on the assumption of image brightness constancy, are known to be sensitive to both high-order derivative errors and the aperture problem. Matching techniques can be divided into two classes: block matching [6] and feature-based matching [7]. Block matching is intrinsically weak at handling motions involving rotation and scale change. On the other hand, feature-based matching can cope with motions involving translation, rotation, and scaling. However, the effectiveness of the feature-based method depends heavily on the features employed.

The rest of this paper is organized as follows. In Section 2, we address the problem under consideration and give the solution process. The critical techniques for implementing the process are given in Sections 3-5. These include global feature extraction and matching in Section 3, camcorder motion estimation in Section 4, and image compensation in Section 5. Experimental results are presented in Section 6. Finally, Section 7 presents our conclusions and gives suggestions for future work.

2. SYSTEM WORK FLOW

Our stabilization method consists of four steps (Fig. 1): feature point matching, camera motion estimation, motion taxonomy, and image compensation. To extract feature points, we apply SIFT (Scale Invariant Feature Transform), proposed by Lowe [8], to detect the feature points in an image. These feature points are invariant to changes in scale, rotation, and illumination, and so are suitable for our application. A graph matching method, modified from the method proposed in [9] and discussed in Section 3, is applied here for matching feature points between two successive images.

Fig. 1. Image stabilization flowchart.
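As a rough illustration of the front end of this work flow, the following Python sketch detects SIFT points in two successive frames and pairs them up. It is a stand-in only: OpenCV's brute-force descriptor matcher with a ratio test replaces the RKHS graph matching of Section 3, and the function name and threshold are illustrative.

    # Minimal sketch of the feature front end, assuming OpenCV is
    # available. The descriptor matcher below stands in for the RKHS
    # graph matching used in the paper (Section 3).
    import cv2

    def detect_and_match(prev_gray, curr_gray):
        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(prev_gray, None)
        kp2, des2 = sift.detectAndCompute(curr_gray, None)
        matcher = cv2.BFMatcher()
        candidates = matcher.knnMatch(des1, des2, k=2)
        # Keep a match only when it is clearly better than the runner-up.
        good = [m for m, n in candidates if m.distance < 0.75 * n.distance]
        pts_prev = [kp1[m.queryIdx].pt for m in good]
        pts_curr = [kp2[m.trainIdx].pt for m in good]
        return pts_prev, pts_curr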



Given a motion model, the 3-D motion of the camera can be calculated from the 2-D images. However, among these motions, some are unexpected, caused by camera motion, while others are expected, caused by object motion. In motion taxonomy, we ignore the expected motions when calculating the camera motion. The camera motions are then smoothed in time and applied to transform each image, producing a stabilized video sequence. Missing parts of each image are compensated at the same time to yield a full-frame sequence.

3. FEATURE-POINT EXTRACTION

Feature points in one image are first detected by the SIFT method [8]. They are represented as a fully connected graph G = (V, E), where V is the set of nodes denoting the feature points, and E is the set of edges representing the relations between points. Using the graphs of two successive images, the corresponding points in each image can be found with a graph matching method. From the corresponding feature points in successive images, the geometric relationship between the images can be estimated.

To represent the feature points in an image I, we construct matrices A_k, k = 1, 2, ..., m, where m is the number of kinds of features. The elements of A_k are the feature values calculated according to the k-th kind of feature for all feature points. Unary feature values are represented by a diagonal matrix, and binary feature values by a symmetric matrix whose (i, j)-th element is the value of the relationship between the i-th and j-th feature points.

Feature points in the following image I' in the sequence can likewise be represented by matrices A'_k. The correspondence between the feature points in successive images can be obtained by solving the following equation for the permutation matrix P:

    P = argmin_P Σ_{k=1}^{m} ||A_k − P A'_k P^T||,
where P can be found by the RKHS-based method proposed in [9], and ||·|| is some norm, in our application the Frobenius norm.

RKHS graph matching, however, can only be applied to matrices A_k whose elements have been normalized. Here we modify the method by applying a new measurement function that adapts to the various types of feature values. The matrix P is found in two steps: weight matrix construction and optimal assignment. Each element W(i, j) of the weight matrix is computed by

    W(i, j) = Σ_{k=1}^{m} ( max_s[min_t |a_k(i, s) − a'_k(j, t)|] + max_s[min_t |a_k(s, i) − a'_k(t, j)|] ) / max[a_k(i, ·), a'_k(j, ·), a_k(·, i), a'_k(·, j)],

where a_k(·) and a'_k(·) denote the elements of A_k and A'_k, respectively. The calculated value gives the degree of correspondence between node i and node j.

The graph with fewer nodes is padded with null nodes so that both graphs have an equal number of nodes. In the matrices, the feature values of the null nodes and their corresponding edges are set to a null value and are ignored in constructing the weight matrix. After constructing the weight matrix, we assign the optimal value to each element of P. The Hopfield neural network is a well-known assignment method; in this application we use the Hungarian algorithm [11] because of its polynomial processing time.
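The assignment step itself is standard. As a minimal sketch, assuming the weight matrix W has already been built and both graphs padded to the same size, SciPy's implementation of the Hungarian algorithm recovers the permutation matrix:

    # Sketch of the optimal-assignment step. W[i, j] is the
    # correspondence score between node i of G and node j of G';
    # the Hungarian algorithm picks the permutation maximizing the
    # total score. The toy W below is a made-up example.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def assign(W):
        rows, cols = linear_sum_assignment(W, maximize=True)
        P = np.zeros_like(W)          # permutation matrix P
        P[rows, cols] = 1.0
        return P

    W = np.array([[0.1, 0.9, 0.0],    # node 0 <-> node 1
                  [0.8, 0.2, 0.1],    # node 1 <-> node 0
                  [0.0, 0.3, 0.7]])   # node 2 <-> node 2
    print(assign(W))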

In our work, the unary feature values are the location (x, y), the color values, and the gradient magnitude m and orientation θ:

    m(x, y) = sqrt[ (L(x+1, y) − L(x−1, y))^2 + (L(x, y+1) − L(x, y−1))^2 ],
    θ(x, y) = tan^{-1}[ (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) ],

where L is the corresponding scale image after convolution with the Gaussian filter. The binary feature value is the Euclidean distance between the feature points in the same image. After these computations, each node will match a node in the other graph; redundant nodes will match either a null node or another redundant node in the other graph.
4. GEOMETRIC CAMERA CALIBRATION

In the camera model shown in Fig. 2, O is the optical center of the lens; i, j, and k are three orthogonal unit vectors, with k pointing in the viewing direction. The image plane is located at z = f, where f is the focal length. In the image plane, the perspective projection point p of a scene point P = (x, y, z)^T (x = OP·i, y = OP·j, z = OP·k) is defined as the intersection of OP and the image plane. If (u, v, f)^T is the coordinate vector of p, then u and v can be calculated as u = (f/z)x and v = (f/z)y. We may say that (u, v) are the image coordinates of p. This process is known as geometric camera calibration.

Fig. 2. Camera model.
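As a small numeric illustration of this projection (the point and focal length are made-up values):

    # Pinhole projection u = (f/z) x, v = (f/z) y.
    def project(P, f):
        x, y, z = P
        return f * x / z, f * y / z

    print(project((0.5, -0.2, 4.0), f=0.008))   # -> (0.001, -0.0004)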
Consider another camera model in the same space, with its optical center located at O_t. We can define another coordinate frame F' by the origin O_t and another three orthogonal unit vectors i', j', and k', where k' points in the viewing direction of this camera. The point P can be represented by a new coordinate vector P' in F' through a rotation and a translation:

    P' = RP + O',    (1)

where R is a rotation matrix and O' is the coordinate vector of the origin O in F'.

There is a projection point p' of P in the image plane, denoted by (u', v'); it can be calculated by the same projection function described above. When we obtain an image sequence from a moving camera, a scene point may change its position from image to image. We want to estimate R and O' for two successive images given corresponding points p_i and p_i', i = 1, ..., n, where n is the number of point pairs.

The matrix R is a combination of the rotation matrices R_α, R_β, and R_γ about the i, j, and k axes of the coordinate frame F, respectively:

    R = R_α R_β R_γ,

where α, β, and γ are the rotation angles and

    R_α = [1, 0, 0; 0, cos α, −sin α; 0, sin α, cos α],
    R_β = [cos β, 0, sin β; 0, 1, 0; −sin β, 0, cos β],
    R_γ = [cos γ, −sin γ, 0; sin γ, cos γ, 0; 0, 0, 1].

Since the translation and rotation of the camera should be very small within a small time interval, we can assume that the rotation angles are very small. Under this assumption, R_α, R_β, and R_γ can be simplified to

    R*_α = [1, 0, 0; 0, 1, −α; 0, α, 1],
    R*_β = [1, 0, β; 0, 1, 0; −β, 0, 1],
    R*_γ = [1, −γ, 0; γ, 1, 0; 0, 0, 1],

and Eq. (1) can be written as

    P' = R*_α R*_β R*_γ P + O'.    (2)
Expanding Eq. (2) with O' = (Δx, Δy, Δz)^T gives

    x' = x − γy + βz + Δx,
    y' = (γ + αβ)x + (1 − αβγ)y − αz + Δy,    (3)
    z' = (αγ − β)x + (α + βγ)y + z + Δz,

where (x', y', z')^T is the coordinate vector of P'. We can then calculate the image coordinates (u', v') of p' as u' = (f'/z')x' and v' = (f'/z')y'. If the terms αβx and αβγy in Eq. (3) are very small relative to y, we have

    u' = (f'/z')(x − γy + βz + Δx) = (f'z/(z'f)) (u − γv + βf + (f/z)Δx),
    v' = (f'/z')(y + γx − αz + Δy) = (f'z/(z'f)) (v + γu − αf + (f/z)Δy).

Replacing βf and −αf with two constants m and n, replacing f'z/(z'f) with 1/S, and eliminating the small values (f/z)Δx and (f/z)Δy, these equations can be rearranged as

    u' = (1/S)(u − γv + m)   =>   Su' + γv − m = u,
    v' = (1/S)(v + γu + n)   =>   Sv' − γu − n = v.    (4)

Given some corresponding point pairs (u_i, v_i) and (u_i', v_i'), i = 1, ..., n, we can solve for S, γ, m, and n by the least-squares method. Let the solution be x = (S, γ, m, n)^T. This solution is not the actual value because of the expected motions, which we discuss next.
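A minimal sketch of this least-squares fit, stacking the two linear constraints of Eq. (4) for each point pair (the function name is illustrative):

    # Fit x = (S, gamma, m, n)^T from Eq. (4):
    #   S u' + gamma v - m = u   and   S v' - gamma u - n = v.
    import numpy as np

    def fit_motion(uv, uv_prime):
        """uv, uv_prime: (n, 2) arrays of matched image coordinates."""
        rows, rhs = [], []
        for (u, v), (up, vp) in zip(uv, uv_prime):
            rows.append([up,  v, -1.0,  0.0]); rhs.append(u)
            rows.append([vp, -u,  0.0, -1.0]); rhs.append(v)
        x, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
        return x    # (S, gamma, m, n)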

5. IMAGE COMPENSATION

If we calculate the transformed results (u_i*, v_i*) from (u_i, v_i) according to x, there will be some differences (Δx_i, Δy_i), where Δx_i = u_i* − u_i' and Δy_i = v_i* − v_i'. These differences should be zero if all of the known points were shifted by the camera motion alone. We call such a shift an "unexpected motion". However, there will also be some "expected motions"; the latter are mixed with object movement and are more complex than the former.

Expected-motion points cause the initial x to be imprecise. If we assume that there are many more unexpected-motion points than expected-motion points, the initial x can be assumed to be close to the unexpected motion, and the unexpected-motion points will have the smaller (Δx_i, Δy_i). To improve the initial x, we can eliminate the points with larger differences, which are assumed to be expected-motion points, and recalculate x. In our experience, it works well to eliminate the point with the largest difference and repeat the process until the value of x approaches that of the previous iteration.
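A sketch of this refinement loop follows; fit_motion is the least-squares sketch from Section 4, and the stopping tolerance is an assumed value:

    # Drop the correspondence with the largest residual under the
    # current x and refit, until x settles.
    import numpy as np

    def refine_motion(uv, uv_prime, tol=1e-4):
        """uv, uv_prime: (n, 2) numpy arrays of matched coordinates."""
        x = fit_motion(uv, uv_prime)
        while len(uv) > 4:                    # keep a safe margin of points
            S, g, m, n = x
            pred_u = (uv[:, 0] - g * uv[:, 1] + m) / S   # Eq. (4) forward
            pred_v = (uv[:, 1] + g * uv[:, 0] + n) / S
            resid = np.hypot(pred_u - uv_prime[:, 0], pred_v - uv_prime[:, 1])
            worst = int(np.argmax(resid))
            uv = np.delete(uv, worst, axis=0)
            uv_prime = np.delete(uv_prime, worst, axis=0)
            x_new = fit_motion(uv, uv_prime)
            if np.linalg.norm(x_new - x) < tol:
                return x_new
            x = x_new
        return x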
Suppose that x(t) is the vector of the camera motion from time t−1 to time t, and that s(t) is the summation of x(0) through x(t). We may say that a video sequence has been stabilized if the stabilized vector x'(t) does not change significantly from the previous one, x'(t−1); in other words, the summation of the stabilized motion, s'(t), should vary smoothly. To obtain s'(t), we convolve s(t) with a Gaussian function. The image at time t, I_t, is then transformed using the compensation values Δx_{t,t}, defined as Δx_{t,t} = s'(t) − s(t). We denote the transformed image by I'_t; the transformed images together constitute a stable image sequence.
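A sketch of this smoothing step; the Gaussian width sigma is an assumed value, and all four motion parameters are accumulated additively, as in the text:

    # s(t) accumulates the per-frame motion vectors, a 1-D Gaussian
    # gives s'(t), and the compensation is s'(t) - s(t).
    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def compensation(motions, sigma=5.0):
        """motions: (T, 4) array of per-frame vectors x(t) = (S, gamma, m, n)."""
        s = np.cumsum(motions, axis=0)                  # s(t)
        s_smooth = gaussian_filter1d(s, sigma, axis=0)  # s'(t)
        return s_smooth - s                             # Delta x_{t,t}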
The boundary of the transformed image will be blank, as shown in Fig. 3(b). To fill up the image, we can extract the lost information from the prior and following images I_k, k = (t−n), ..., t, ..., (t+n). First, the image I_k is transformed using Δx_{k,t} = s'(t) − s(k) to match the stabilized camera model at time t; we denote the new image by I_k^t. When the value of the image point at (x, y) in I'_t is missing, we replace it with the value at the same point in I_k^t, k = (t−n), ..., t, ..., (t+n). Fig. 3(c) shows the result.

Fig. 3. (a) Original image. (b) After transformation for stabilization. (c) Filling up the boundary with the successive frame.

We convolve the Gaussian function with s(t) over the window from time (t−n) to (t+n), which means that s'(t), and hence I'_t, becomes available at time t+n. In filling images, I_k^t with small |t − k| is checked first, because images closer to time t will be more similar to I'_t. Finally, any points that still have no values can be filled up using the interpolation and extrapolation methods described in [10].
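A sketch of the fill-up order, under two assumptions not fixed by the text: warped maps each frame offset k − t to the re-warped image I_k^t, and missing pixels are marked with the value −1:

    # Copy missing pixels from the nearest re-warped frames first.
    import numpy as np

    def fill_blanks(stab, warped):
        """stab: stabilized frame I'_t with -1 holes; warped: {k - t: I_k^t}."""
        out = stab.copy()
        missing = out < 0
        for k in sorted(warped, key=abs):     # smallest |t - k| first
            donor = warped[k]
            usable = missing & (donor >= 0)   # donor has a valid value here
            out[usable] = donor[usable]
            missing &= ~usable
        return out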

6. EXPERIMENTAL RESULTS

We test our algorithm on three kinds of image sequences. First, we capture images with the camera in a static position; motions in the video are caused by movements of the object and of the camera. Second, we capture images with a moving camera; in this sequence the camera has a significant change along the z-axis. The final sequence is captured with the camera moving and panning at the same time.

Each image frame is extracted and processed with our algorithm to obtain a stabilized image. The processing time for one image (435x240 pixels) is about two seconds on a 3.0 GHz Pentium IV. The images before processing are shown in the top rows of Fig. 4 and Fig. 5, and the images after processing are shown in the bottom rows. An X in the middle is drawn to help show the shift distance of the objects. More experimental results are shown on our website: http://www.csie.ntnu.edu.tw/~ipcv/Research/ulin/
In the first case (Fig. 4), there is a large area of water, where feature points are few and difficult to match. Our processing result shows that the moving object is more stable than in the original images (top rows). In addition, some lost information in the stabilized images is filled up, for example, the plane at the top of frame 1.

Fig. 4. Top row: original image sequence. Bottom row: stabilized image sequence.

In the second case (Fig. 5), an image sequence is captured with a moving camera. Objects in the scene show a significant change in size. Our processing result shows that the objects in the scene are stable, but there are some errors at the boundary after filling up. Computing the depth (z value) to correct this is left to future work.

Fig. 5. Image sequence captured by a moving camera.

In the final case (Fig. 6), the camera is held by a person on a boat. The camera has a significant change along the x-axis. Our algorithm makes no motion assumption, so it can be applied to many kinds of video sequences.

Fig. 6. Image sequence captured with significant pan motion.

Fig. 7 shows the motion values before and after stabilization. In this figure, lighter lines show the values before processing, and darker lines show the stabilized results. In Fig. 7(a), the γ values (rotation angle) of the first sequence are plotted over time; they should remain close to zero during video acquisition, and the plot shows that the camera motion is more stable after processing. Fig. 7(b) shows the S value (scale) for the second sequence. Since the camera is moving forward, we have S < 1; our processing stabilizes this value and produces a stable image sequence.

Fig. 7. (a) γ value (vertical axis) of the first sequence over time (horizontal axis), where lighter and darker lines denote the values before and after stabilization, respectively. (b) S value of the second sequence.

7. CONCLUSIONS

In this paper, we propose an algorithm for stabilizing an image sequence captured by a hand-held video camera. The algorithm makes the following contributions. First, a 3D camera motion model can be obtained and used to stabilize the image sequence; this is more precise than traditional methods using 2D models. Second, our method can be applied to the general case because it need not detect any objects in the scene. Third, by separating the expected and unexpected motions, the camera motion model can be calculated more precisely, and the foreground objects can be located at the same time.

Image stabilization helps to extract information from video more conveniently. However, our method cannot yet be applied to real-time systems because of the slowness of the SIFT computation. Our next phase of research is to improve the speed so that the method can run in real time.

REFERENCES

[1] Y. M. Liang, H. R. Tyan, S. L. Chang, H. Y. Liao, and S. W. Wang, "Video Stabilization for a Camcorder Mounted on a Moving Vehicle," IEEE Trans. on Vehicular Technology, vol. 53, no. 6, pp. 1636-1648, 2004.
[2] C. Morimoto and R. Chellappa, "Fast Electronic Digital Image Stabilization for Off-Road Navigation," Real-Time Imaging, vol. 2, no. 5, pp. 285-296, 1996.
[3] M. Betke, "Recognition, Resolution, and Complexity of Objects Subject to Affine Transformations," Int. Journal of Computer Vision, vol. 44, no. 1, pp. 5-40, 2001.
[4] J. S. Jin, Z. Zhu, and G. Xu, "A Stable Vision System for Moving Vehicles," IEEE Trans. Intell. Transport. Syst., vol. 1, pp. 32-39, 2000.
[5] J. Y. Chang, W. F. Hu, M. H. Cheng, and B. S. Chang, "Digital Image Translational and Rotational Motion Stabilization Using Optical Flow Technique," IEEE Trans. on Consumer Electronics, vol. 48, no. 1, pp. 108-115, 2002.
[6] L. Xu and X. Lin, "Digital Image Stabilization Based on Circular Block Matching," IEEE Trans. on Consumer Electronics, vol. 52, no. 2, pp. 566-574, 2006.
[7] Z. Duric and A. Rosenfeld, "Image Sequence Stabilization in Real Time," Real-Time Imaging, vol. 2, no. 5, pp. 271-284, 1996.
[8] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int. Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[9] M. A. van Wyk, T. S. Durrani, and B. J. van Wyk, "A RKHS Interpolator-Based Graph Matching Algorithm," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 988-995, 2002.
[10] Y. Matsushita, E. Ofek, W. Ge, X. Tang, and H. Y. Shum, "Full-Frame Video Stabilization with Motion Inpainting," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1150-1163, 2006.
[11] H. W. Kuhn, "The Hungarian Method for the Assignment Problem," Naval Research Logistics Quarterly, vol. 2, pp. 83-97, 1955.
