Video Stabilization for a Hand-Held Camera Based on 3D Motion Model
Conference paper in Proceedings of the International Conference on Image Processing (ICIP), November 2009.
DOI: 10.1109/ICIP.2009.5413831 · Source: DBLP
W(i, j) = 1 − max[a_k(i, ·), a_kc(j, ·), a_k(·, i), a_kc(·, j)],

where a_k(·) and a_kc(·) are the elements in A_k and A_kc, respectively. The calculated value gives the degree of correspondence between node i and node j. The graph with fewer nodes is padded with null nodes to give an equal number of nodes in each graph. In the matrices, the feature values of the null nodes and their corresponding edges are set to a null value and are ignored in constructing

... of origin O in F′. There is a projection point p′ of P in the image plane, denoted by (u′, v′); it can be calculated by the same function G mentioned above. When we obtain an image sequence from a moving camera, a scene point may change position from image to image. We want to estimate R and O′ in two successive images, given corresponding points p_i and p_i′, i = 1, …, n, where n is the number of point pairs.
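The projection function G referred to above is a standard pinhole model. As a minimal sketch (the excerpt does not give G explicitly, so the form below, u = f·x/z, v = f·y/z, is an assumption consistent with the image-coordinate formulas used later):

```python
def project(point, f):
    """Pinhole projection G: map a camera-frame point (x, y, z) to image
    coordinates (u, v) = (f*x/z, f*y/z), where f is the focal length."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (f * x / z, f * y / z)
```

Two successive frames then give the point pairs p_i = G(P_i) and p_i′ = G(P_i′) from which R and O′ are estimated.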
The matrix R is a combination of the rotation matrices Rα, Rβ, and Rγ about the i, j, and k axes of the coordinate frame F, respectively:

    R = Rα Rβ Rγ,

where α, β, and γ are the rotating angles and

    Rα = [ 1    0       0    ]   Rβ = [  cos β  0  sin β ]   Rγ = [ cos γ  −sin γ  0 ]
         [ 0  cos α  −sin α  ]        [   0     1    0   ]        [ sin γ   cos γ  0 ]
         [ 0  sin α   cos α  ]        [ −sin β  0  cos β ]        [   0      0     1 ].

Since the translation and rotation of the camera should be very small in a small time interval, we can assume that the rotating angles are very small. Under this assumption, Rα, Rβ, and Rγ can be simplified as

    Rα* = [ 1  0   0 ]   Rβ* = [  1  0  β ]   Rγ* = [ 1  −γ  0 ]
          [ 0  1  −α ]         [  0  1  0 ]         [ γ   1  0 ]
          [ 0  α   1 ]         [ −β  0  1 ]         [ 0   0  1 ]

and Eq. (1) can be written as

    P′ = Rα* Rβ* Rγ* P + O′.                                  (2)

Expanding Eq. (2) with O′ = (Δx, Δy, Δz)^T gives

    x′ = x − γy + βz + Δx
    y′ = γx + y + αβx − αβγy − αz + Δy                         (3)
    z′ = αγx + αy − βx + βγy + z + Δz

where (x′, y′, z′)^T is the coordinate vector of P′. Then we can calculate the image coordinate (u′, v′) of p′ by u′ = (f′/z′) x′ and v′ = (f′/z′) y′. If αβx and αβγy in Eq. (3) are very small relative to y, we will have

    u′ = (f′/z′)(x − γy + βz + Δx) = (f′z / z′f)((f/z)x − γ(f/z)y + βf + (f/z)Δx)
    v′ = (f′/z′)(y + γx − αz + Δy) = (f′z / z′f)((f/z)y + γ(f/z)x − αf + (f/z)Δy).

Replacing βf, −αf, and f′z/z′f with three constants m, n, and 1/S, and eliminating the small values (f/z)Δx and (f/z)Δy, these functions can be rearranged as

    u′ = (1/S)(u − γv + m)        S u′ + γv − m = u
                             ⇔                                 (4)
    v′ = (1/S)(v + γu + n)        S v′ − γu − n = v.

Given some corresponding point pairs (u_i, v_i) and (u_i′, v_i′), i = 1, …, n, we can solve for S, γ, m, and n by the least squares method. Let the solution be x = (S, γ, m, n)^T. This solution is not the actual value because of the expected motions, which we will discuss later.

The latter are mixed with the object motion and are more complex than the former. Expected motion points make the initial x imprecise. If we assume that there are many more unexpected motion points than expected motion points, the initial x can be assumed to be close to the unexpected motion points, and those unexpected motion points will have smaller (Δx_i, Δy_i). To improve on the initial x, we can eliminate the points with larger differences, which are assumed to be expected points, and recalculate x. In our experience, the point with the largest difference is eliminated, and we repeat the above process until the value of x approaches that of the previous iteration.

Suppose that x(t) is the vector of the camera motion from time t−1 to time t, and s(t) is the summation from x(0) to x(t). We may say that a video sequence has been stabilized if the stabilized vector x′(t) does not change significantly from the previous one, x′(t−1). In other words, the summation of the stabilized motion, s′(t), should vary smoothly. To obtain s′(t), we convolve s(t) with a Gaussian function. The image at time t, I_t, is then transformed using the compensation value Δx_{t,t}, defined as Δx_{t,t} = s′(t) − s(t). We denote the transformed image as I′_t, and all of the transformed images constitute a stable image sequence.

The boundary of the transformed image will be blank, as shown in Fig. 3(b). To fill it up, we can extract the lost information from the prior and following images I_k, k = (t−n), …, t, …, (t+n). First, the image I_k is transformed using Δx_{k,t} = s′(t) − s(k) to match the stabilized camera model at time t. We denote the new image as I_k^t. When the value of the image point at (x, y) in I′_t is missing, we replace this point using the same point in I_k^t, k = (t−n), …, t, …, (t+n). Fig. 3(c) shows the result.

Fig. 3. (a) Original image. (b) After transformation for stabilization. (c) Filling up the boundary with the successive frame.

We convolve the Gaussian function with s(t) from time (t−n) to (t+n), which means s′(t) and I′_t are obtained at time t+n. In filling images, I_k^t with small |t − k| is checked first, because we believe that images closer in time to t will be more similar to I′_t. Finally, the other points which still have no values can be filled up using the interpolation and extrapolation methods described in [10].
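The smoothing step above, convolving the accumulated motion s(t) with a Gaussian and compensating by Δx_{t,t} = s′(t) − s(t), can be sketched per motion component as follows (the kernel width σ and the edge padding are assumptions; the paper does not state them):

```python
import numpy as np

def smooth_accumulated_motion(s, sigma=3.0):
    """Convolve the accumulated motion s(t) with a Gaussian to obtain s'(t),
    and return the per-frame compensation dx[t] = s'[t] - s[t]."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()  # normalize so a constant signal is unchanged
    s = np.asarray(s, dtype=float)
    # Edge padding keeps the output the same length as the input.
    padded = np.pad(s, radius, mode='edge')
    s_prime = np.convolve(padded, kernel, mode='valid')
    return s_prime, s_prime - s
```

Because the kernel extends forward in time, s′(t) is only available once s(t+n) is known, which matches the n-frame delay noted in the text.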
Each image frame is extracted and processed with our algorithm to obtain a stabilized image. The processing time for one image (435×240 pixels) is about two seconds on a 3.0 GHz Pentium IV. The images before processing are shown in the top rows of Fig. 4 and Fig. 5, and the images after processing are shown in the bottom rows. An X is marked in the middle of each image; the test sequences are available at http://www.csie.ntnu.edu.tw/~ipcv/Research/ulin/.

In the first case (Fig. 4), there is a large area of water, where feature points are few and difficult to match. Our processing result shows that the moving object is more stable than in the original images (top rows). In addition, some lost information in the stabilized images is filled up, for example, the plane at the top of frame 1.

Frame 1 Frame 2 Frame 3
Fig. 4. Top row: original image sequence. Bottom row: stabilized image sequence.

In the second case (Fig. 5), an image sequence is captured with a moving camera. Objects in the scene have a significant change in size. Our processing result shows that the objects in the scene are stable, but there are some errors at the boundary after filling up. We can compute the depth (z value) to correct this, which is left as future work.

Frame 1 Frame 2 Frame 3 Frame 4
Fig. 5. Image sequence captured by a moving camera.

In the final case (Fig. 6), the camera is held by a person on a boat. The camera has a significant change along the x-axis. Our algorithm needs no motion assumption, so it can be applied to many kinds of video sequences.

Frame 1 Frame 2 Frame 3 Frame 4
Fig. 6. Image sequence captured with significant pan motion.

Fig. 7 shows the motion values before and after stabilization. In this figure, lighter lines show the values before processing, and darker lines show the stabilized results. In Fig. 7(a), the γ values (rotation angle) of the first sequence are plotted over time; they should be close to zero during the video acquisition. The plot shows that the stabilized camera motion is more stable. Fig. 7(b) shows the S value (scale) in the second case. Since the camera is moving forward, we have S < 1. Our processing stabilizes the value and produces a stable image sequence.

Fig. 7. (a) γ value (vertical axis) of the first sequence along time (horizontal axis), where lighter lines and darker lines denote the value before and after stabilization, respectively. (b) S value of the second sequence.

7. CONCLUSIONS

In this paper, we propose an algorithm for stabilizing an image sequence captured by a hand-held video camera. This algorithm makes the following contributions. First, a 3D camera motion model can be obtained and used to stabilize the image sequence; it is more precise than traditional methods using 2D models. Second, our method can be applied to the general case because it need not detect any particular objects in the scene. Third, by detecting the expected and unexpected motions, the camera motion model can be calculated more precisely, and the foreground objects can be located at the same time.

Image stabilization helps to extract information from the video more conveniently. However, our method cannot be applied to real-time systems because of the slowness of the SIFT computation. Our next phase of research is to improve the speed so that it can be performed in real time.

REFERENCES

[1] Y. M. Liang, H. R. Tyan, S. L. Chang, H. Y. Liao, and S. W. Chen, "Video Stabilization for a Camcorder Mounted on a Moving Vehicle," IEEE Trans. on Vehicular Technology, vol. 53, no. 6, pp. 1636-1648, 2004.
[2] C. Morimoto and R. Chellappa, "Fast Electronic Digital Image Stabilization for Off-Road Navigation," Real-Time Imaging, vol. 2, no. 5, pp. 285-296, 1996.
[3] M. Betke, "Recognition, Resolution, and Complexity of Objects Subject to Affine Transformations," Int. Journal of Computer Vision, vol. 44, no. 1, pp. 5-40, 2001.
[4] J. S. Jin, Z. Zhu, and G. Xu, "A Stable Vision System for Moving Vehicles," IEEE Trans. Intell. Transport. Syst., vol. 1, pp. 32-39, 2000.
[5] J. Y. Chang, W. F. Hu, M. H. Cheng, and B. S. Chang, "Digital Image Translational and Rotational Motion Stabilization Using Optical Flow Technique," IEEE Trans. on Consumer Electronics, vol. 48, no. 1, pp. 108-115, 2002.
[6] L. Xu and X. Lin, "Digital Image Stabilization Based on Circular Block Matching," IEEE Trans. on Consumer Electronics, vol. 52, no. 2, pp. 566-574, 2006.
[7] Z. Duric and A. Rosenfeld, "Image Sequence Stabilization in Real Time," Real-Time Imaging, vol. 2, no. 5, pp. 271-284, 1996.
[8] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int. Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[9] M. A. van Wyk, T. S. Durrani, and B. J. van Wyk, "A RKHS Interpolator-Based Graph Matching Algorithm," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 988-995, 2002.
[10] Y. Matsushita, E. Ofek, W. Ge, X. Tang, and H. Y. Shum, "Full-Frame Video Stabilization with Motion Inpainting," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1150-1163, 2006.
[11] H. W. Kuhn, "The Hungarian Method for the Assignment Problem," Naval Research Logistics Quarterly, vol. 2, pp. 83-97, 1955.