Active Camera Tracking Using Affine Motion Compensation
Conference Paper in Proceedings of SPIE - The International Society for Optical Engineering · June 2003
DOI: 10.1117/12.502770 · Source: DBLP
This paper describes a feature-based tracking system that can track moving objects with an active camera. A
robust camera motion estimation algorithm is proposed to obtain a stable global motion for feature tracking.
After we identify background and foreground motions based on dominant motion estimates, we estimate camera
motion on the background by applying a parametric affine motion model. After compensating for camera motion,
we trace multiple corner features in the scene and segment foreground objects by clustering motion trajectories
of the corner features. We can also command the pan-tilt controller to position the moving object at the center
of the camera.
Keywords: Active Camera, Feature Detection, Feature Tracking, Affine Motion Estimation
In this paper, we develop an automated camera system that watches moving objects in a restricted area
using a camera with active pan and tilt control. If objects move outside the field of view, the camera should
pan or tilt so that the objects always stay within the field of view. In such applications, motion estimation
and motion tracking are key components.
Various research works have addressed these areas [1], [2], [3], [4], but it is difficult to design general
and robust solutions to the problems involved. This difficulty stems from the complicated relationship between
the motion of objects in the 3-D scene and the apparent motion of brightness patterns in the sequence of 2-D
projections of the scene. Information about the relative depth of objects is lost in the projection, and the
observed motion in the image plane can result from phenomena other than object motion in the scene, such
as changes in the lighting conditions. Moreover, estimating motion from the 2-D image sequence is in itself
a non-trivial task because of observation noise, occlusions, and temporal aliasing. With an active camera in
particular, the moving camera creates image changes due to its own motion, so object tracking
with a mobile camera is a very challenging task.
The work described in this paper attempts to address these problems. First, we utilize an affine model to
describe camera motion variation within a sequence. Affine models provide greater flexibility in modelling camera
motion, being able to represent rotation, dilation and shear as well as translation. Second, after discriminating
between background and foreground motions, camera motion is robustly estimated on the background. Therefore,
the camera motion estimate is not disturbed by the presence of outliers due to foreground objects whose motion
is not representative of the camera motion.
In order to extract the background motion, we compute a dominant motion by averaging the block motion
vectors derived in the previous step. Then, we filter out the noise or foreground motion vectors that deviate
strongly from the average motion vector.
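The dominant-motion step can be illustrated with a minimal sketch. It assumes the block motion vectors are available as an array of (vx, vy) pairs; the function name `dominant_motion` and the deviation threshold `max_dev` are hypothetical and are not taken from the paper.

```python
import numpy as np

def dominant_motion(vectors, max_dev=2.0):
    """Average the block motion vectors and flag those close to the average.

    vectors : (n, 2) array-like of block motion vectors (vx, vy).
    max_dev : assumed deviation threshold in pixels (illustrative value).
    Returns the average motion vector and a boolean mask that is True for
    vectors kept as background and False for noise or foreground vectors.
    """
    vectors = np.asarray(vectors, dtype=float)
    mean = vectors.mean(axis=0)                    # dominant motion estimate
    dev = np.linalg.norm(vectors - mean, axis=1)   # deviation of each block
    background = dev <= max_dev                    # reject large deviations
    return mean, background

# Usage: eight blocks follow the camera pan, two blocks follow a person.
blocks = [(3.0, 0.0)] * 8 + [(-5.0, 1.0)] * 2
mean, background = dominant_motion(blocks)
```

Only the vectors flagged as background would then be passed on to the camera motion estimation.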
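Since we use the affine motion model of six parameters, the motion vector $(\hat{v}_X, \hat{v}_Y)$ at position $(x, y)$ can be expressed as follows:
$$
\begin{bmatrix} \hat{v}_X(x, y) \\ \hat{v}_Y(x, y) \end{bmatrix}
=
\begin{bmatrix} a_1 & a_2 \\ a_4 & a_5 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
+
\begin{bmatrix} a_3 \\ a_6 \end{bmatrix}
\tag{2}
$$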
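In order to estimate the six affine motion parameters, we define the error function to be minimized as
$$
E(\mathbf{a}) = \sum_{i} \left\{ \left[ v_X(x_i, y_i) - \hat{v}_X(x_i, y_i) \right]^2 + \left[ v_Y(x_i, y_i) - \hat{v}_Y(x_i, y_i) \right]^2 \right\}
\tag{3}
$$
where $v$ denotes a measured motion vector and $\hat{v}$ the corresponding vector given by the affine model.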
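The optimal values of the six parameters are estimated by the least squares method. The resulting equation
is represented by
$$
\left(
\sum_{i}
\begin{bmatrix}
x_i^2 & x_i y_i & x_i & 0 & 0 & 0 \\
x_i y_i & y_i^2 & y_i & 0 & 0 & 0 \\
x_i & y_i & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & x_i^2 & x_i y_i & x_i \\
0 & 0 & 0 & x_i y_i & y_i^2 & y_i \\
0 & 0 & 0 & x_i & y_i & 1
\end{bmatrix}
\right)
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \end{bmatrix}
=
\sum_{i}
\begin{bmatrix}
v_X(x_i, y_i)\, x_i \\
v_X(x_i, y_i)\, y_i \\
v_X(x_i, y_i) \\
v_Y(x_i, y_i)\, x_i \\
v_Y(x_i, y_i)\, y_i \\
v_Y(x_i, y_i)
\end{bmatrix}
\tag{5}
$$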
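The least squares estimate can be sketched as follows. The normal equations are block diagonal, so the parameters (a1, a2, a3) and (a4, a5, a6) decouple into two three-parameter problems that share the design matrix with rows [x_i, y_i, 1]. The sketch solves both with NumPy's `lstsq`, which minimizes the same squared error rather than forming the 6x6 system explicitly. The function names `estimate_affine` and `predict_motion` are illustrative, not from the paper.

```python
import numpy as np

def estimate_affine(points, vectors):
    """Least squares fit of v_X = a1*x + a2*y + a3 and v_Y = a4*x + a5*y + a6.

    points  : (n, 2) array-like of block positions (x_i, y_i).
    vectors : (n, 2) array-like of measured motion vectors (v_X, v_Y).
    Returns the parameters as an array [a1, a2, a3, a4, a5, a6].
    """
    points = np.asarray(points, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    design = np.column_stack([points, np.ones(len(points))])  # rows [x, y, 1]
    # Column 0 of the solution holds (a1, a2, a3), column 1 holds (a4, a5, a6).
    coeffs, *_ = np.linalg.lstsq(design, vectors, rcond=None)
    return np.concatenate([coeffs[:, 0], coeffs[:, 1]])

def predict_motion(params, points):
    """Evaluate the affine motion model at the given positions."""
    a1, a2, a3, a4, a5, a6 = params
    points = np.asarray(points, dtype=float)
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([a1 * x + a2 * y + a3, a4 * x + a5 * y + a6])

# Usage: recover known parameters from noise-free synthetic motion vectors.
true_params = np.array([0.01, -0.02, 3.0, 0.02, 0.01, -1.0])
grid = np.array([(x, y) for x in range(0, 64, 16) for y in range(0, 64, 16)])
estimated = estimate_affine(grid, predict_motion(true_params, grid))
```

At least three background positions that do not lie on one line are needed for the fit to be well determined.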
If the matrix Z has two large eigenvalues, the original window contains a corner feature of high spatial
frequency. Therefore, we declare a corner point if min(λ1, λ2) > λc, where λ1 and λ2 are the two eigenvalues
of the matrix Z and λc is a predefined threshold value.
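A minimal sketch of the eigenvalue test is given here. It assumes that Z is the 2x2 matrix of summed image-gradient products over the window, as in standard minimum-eigenvalue corner detectors. That form of Z, the function name `is_corner`, and the threshold used in the example are assumptions made for illustration.

```python
import numpy as np

def is_corner(window, lambda_c):
    """Apply the test min(lambda1, lambda2) > lambda_c to one image window.

    window   : 2-D array-like of gray levels.
    lambda_c : threshold on the smaller eigenvalue of Z.
    Z is assumed to be the gradient matrix
    [[sum gx*gx, sum gx*gy], [sum gx*gy, sum gy*gy]].
    Returns the decision and the two eigenvalues in ascending order.
    """
    window = np.asarray(window, dtype=float)
    gy, gx = np.gradient(window)           # derivatives along rows and columns
    Z = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    eigenvalues = np.linalg.eigvalsh(Z)    # symmetric matrix, ascending order
    return bool(eigenvalues[0] > lambda_c), eigenvalues

# Usage: a bright quadrant forms a corner, a vertical step forms only an edge.
corner = np.zeros((8, 8))
corner[4:, 4:] = 1.0
edge = np.zeros((8, 8))
edge[:, 4:] = 1.0
corner_found, _ = is_corner(corner, lambda_c=0.5)
edge_found, _ = is_corner(edge, lambda_c=0.5)
```

An edge yields one large and one near-zero eigenvalue, so taking the minimum rejects it while accepting the corner.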
Once a corner point is detected, we compensate the corner position for camera motion using the inverse affine
transformation and trace the feature efficiently by predicting the next coordinate from the observed coordinate
of the feature point. We design a 2D token-based tracking scheme using Kalman filtering [7]. The center position
of the feature is used as the token t(k). After we define the system model and the measurement model, we
apply the recursive Kalman filtering algorithm to obtain linear minimum variance (LMV) estimates of motion.
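The prediction and update cycle of such a token tracker can be sketched as follows. The system and measurement models are not given in this section, so the sketch assumes a constant-velocity state (x, y, vx, vy) with the compensated feature position as the measurement. The class name `TokenKalman` and the noise levels `q` and `r` are hypothetical.

```python
import numpy as np

class TokenKalman:
    """Kalman filter for one feature token under an assumed constant-velocity model."""

    def __init__(self, x0, y0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])        # state: position and velocity
        self.P = np.eye(4) * 10.0                    # initial state covariance
        self.F = np.array([[1.0, 0.0, 1.0, 0.0],     # position += velocity per frame
                           [0.0, 1.0, 0.0, 1.0],
                           [0.0, 0.0, 1.0, 0.0],
                           [0.0, 0.0, 0.0, 1.0]])
        self.H = np.array([[1.0, 0.0, 0.0, 0.0],     # only the position is observed
                           [0.0, 1.0, 0.0, 0.0]])
        self.Q = np.eye(4) * q                       # process noise (assumed)
        self.R = np.eye(2) * r                       # measurement noise (assumed)

    def predict(self):
        """Propagate the state one frame ahead and return the predicted position."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, z):
        """Correct the state with the observed token position z = (x, y)."""
        z = np.asarray(z, dtype=float)
        S = self.H @ self.P @ self.H.T + self.R      # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()

# Usage: a token moving by (2, 1) pixels per frame.
kf = TokenKalman(0.0, 0.0)
for k in range(1, 21):
    kf.predict()
    kf.update((2.0 * k, 1.0 * k))
next_position = kf.predict()
```

The predicted position can be used to restrict the search for the feature in the next frame.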
$$
M_a = \frac{1}{N} \sum_{i=1}^{N} M_i
\tag{7}
$$
where $i$ is the time segment, $M_i$ is the moving distance of the corner point at time $i$, $C_x$ and $C_y$ are the horizontal
and vertical positions of the corner point at the current image, and $N$ is the trajectory length.
$$
A_a = \frac{1}{N} \sum_{i=1}^{N} A_i, \qquad
A_i = \arctan \frac{C_y(i) - C_y(i-1)}{C_x(i) - C_x(i-1)}
$$
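The two trajectory descriptors, the average moving distance and the average moving angle, can be computed as in the sketch below. Two choices here are assumptions: the moving distance of one step is taken as the Euclidean distance between consecutive positions, and `arctan2` replaces the plain arctangent of the quotient so that a zero horizontal displacement does not cause a division by zero. The function name `trajectory_features` is hypothetical.

```python
import numpy as np

def trajectory_features(cx, cy):
    """Average moving distance and average moving angle of one corner trajectory.

    cx, cy : sequences of horizontal and vertical corner positions over time.
    Returns (mean distance, mean angle in radians) over the N frame-to-frame steps.
    """
    cx = np.asarray(cx, dtype=float)
    cy = np.asarray(cy, dtype=float)
    dx = np.diff(cx)                    # C_x(i) - C_x(i - 1)
    dy = np.diff(cy)                    # C_y(i) - C_y(i - 1)
    distances = np.hypot(dx, dy)        # assumed Euclidean step length
    angles = np.arctan2(dy, dx)         # quadrant-aware angle (substitution)
    return distances.mean(), angles.mean()

# Usage: a corner moving by (3, 4) pixels in every frame.
mean_distance, mean_angle = trajectory_features([0, 3, 6, 9], [0, 4, 8, 12])
```

Averaging angles directly, as done here, ignores wrap-around at plus or minus pi, which matters only for trajectories that move almost exactly to the left.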
$$
U = \{q_0, q_1, q_2, \cdots, q_n\}
\tag{9}
$$
We compute the first-order moment from the elements of $U$ and denote it as the initial center $\vec{m}_0$. If the
standard deviation obtained from $U$ and $\vec{m}_0$ is greater than the predetermined threshold, a new center vector
of a cluster, $\vec{m}_1$, is determined by
$$
\vec{m}_1 = \vec{m}_0 + \alpha \sigma_0, \quad \alpha : \text{constant}
\tag{10}
$$
The cluster points are reassigned based on the Euclidean distances, $d(\vec{m}_0, q_k)$ and $d(\vec{m}_1, q_k)$, from $\vec{m}_0$ and
$\vec{m}_1$. The criterion for reassignment of the cluster points is described by
$$
\begin{aligned}
C_0 &= \{q_k : d(\vec{m}_0, q_k) \ge d(\vec{m}_1, q_k)\} \\
C_1 &= \{q_k : d(\vec{m}_0, q_k) < d(\vec{m}_1, q_k)\}
\end{aligned}
\qquad k = 1, 2, 3, \cdots, n
\tag{11}
$$
where x0 and x1 are numbers of elements in the cluster sets C0 and C1 , respectively.
After finding the new first moments $\vec{m}_0$ and $\vec{m}_1$ with elements of the sets $C_0$ and $C_1$, we perform the
reassignment process for the elements classified before by computing $d(\vec{m}_0, q_k)$ and $d(\vec{m}_1, q_k)$ for all elements of
the set $U$.
We repeat the process recursively until each standard deviation $\sigma_k$ is smaller than the specified threshold value.
Eventually, the cluster set $C_k$ comprises all the corner points.
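The recursive splitting procedure can be sketched as follows. The sketch makes several assumptions that are labeled in the comments: the standard deviation is computed per component and its largest component is compared with the threshold, each point is assigned to the nearer of the two centers, and a set that cannot be split further is returned as a single cluster. The function name `split_clusters` and its parameters are hypothetical.

```python
import numpy as np

def split_clusters(points, sigma_max, alpha=1.0, max_iter=50):
    """Recursively split a set of feature vectors until each cluster is compact.

    points    : (n, d) array-like of feature vectors q_k.
    sigma_max : threshold on the standard deviation (largest component, assumed).
    alpha     : constant used to place the second center, m1 = m0 + alpha * sigma0.
    Returns a list of arrays, one per cluster.
    """
    points = np.asarray(points, dtype=float)
    m0 = points.mean(axis=0)                     # first-order moment
    sigma0 = points.std(axis=0)                  # per-component deviation
    if len(points) < 2 or sigma0.max() <= sigma_max:
        return [points]
    m1 = m0 + alpha * sigma0                     # new center vector
    labels = None
    for _ in range(max_iter):
        d0 = np.linalg.norm(points - m0, axis=1)
        d1 = np.linalg.norm(points - m1, axis=1)
        new_labels = d1 < d0                     # True: nearer to m1 (assumed rule)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                # assignments are stable
        labels = new_labels
        if labels.all() or not labels.any():
            return [points]                      # no split possible
        m0 = points[~labels].mean(axis=0)        # updated first moments
        m1 = points[labels].mean(axis=0)
    return (split_clusters(points[~labels], sigma_max, alpha, max_iter)
            + split_clusters(points[labels], sigma_max, alpha, max_iter))

# Usage: two compact groups of trajectory features that are far apart.
group = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
clusters = split_clusters(np.vstack([group, group + 10.0]), sigma_max=1.0)
```

Because every split produces two non-empty subsets, the recursion always terminates, and each input point ends up in exactly one returned cluster.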
The proposed tracking system has been tested on several video sequences in indoor environments. Fig. 4 shows
the block feature selection results for three activity thresholds. A higher activity threshold reduces the number
of block features. We use 35.0 as the threshold value for the tracking system.
Four types of video sequences are captured, as shown in Fig. 5: right-panning and left-moving person, right-panning
and right-moving person, left-panning and right-moving person, and left-panning and left-moving person. As
shown in Fig. 5(a), the right panning of the camera produces one motion, and the moving person produces the other.
The background motion is separated by extracting dominant motion vectors. The center image of Fig. 5(a)
displays the results before the camera motion compensation. The results after the global motion compensation
are shown in the rightmost image of Fig. 5(a).
Fig. 6 and Fig. 7 show the tracking results for the scenes in which the person moved to the left and to the right,
respectively. As shown in Fig. 6, a number of corners are selected as the active corners. In Fig. 6, we
note that there are several feature paths corresponding to the person in the scene. Since the global motion caused by
camera movement is eliminated, the result shows only the local motions of the person. The pan-tilt unit is
commanded to move the camera to the centroid of the local motion.
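How a centroid can be turned into a pan-tilt command is sketched below under a simple pinhole camera assumption. The field-of-view values, the sign convention (positive pan to the right, positive tilt downward in image coordinates), and the function name `pan_tilt_offset` are assumptions for illustration. The actual controller interface is not described in this paper.

```python
import math

def pan_tilt_offset(points, width, height, hfov_deg, vfov_deg):
    """Angular offset (in degrees) that would bring the feature centroid to the image center.

    points             : iterable of (x, y) positions of the foreground features.
    width, height      : image size in pixels.
    hfov_deg, vfov_deg : assumed horizontal and vertical fields of view.
    """
    points = list(points)
    cx = sum(p[0] for p in points) / len(points)   # centroid of the local motion
    cy = sum(p[1] for p in points) / len(points)
    # Focal lengths in pixels, derived from the assumed fields of view.
    fx = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    fy = (height / 2.0) / math.tan(math.radians(vfov_deg) / 2.0)
    pan = math.degrees(math.atan((cx - width / 2.0) / fx))
    tilt = math.degrees(math.atan((cy - height / 2.0) / fy))
    return pan, tilt

# Usage: features at the right image border with a 60 degree horizontal field of view.
pan, tilt = pan_tilt_offset([(320, 110), (320, 130)], 320, 240, 60.0, 45.0)
```

In practice the sign and scale of the command depend on the pan-tilt hardware, so the returned angles would need to be mapped to the controller's own units.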
Figure 4. Block Feature Selection for Activity Threshold Values: (a)TH(25.0) (b)TH(35.0) (c)TH(45.0)
In this paper, we proposed an active camera tracking system based on feature-based object tracking. In the
proposed system, we estimate the camera motion using the affine motion model, compensate for the camera
motion using the inverse affine transformation, and trace each feature efficiently by predicting the next coordinate
from the observed coordinate of the feature point. Finally, the local motion trajectories of the corner features
that represent the motion coherence property are clustered. In the case of a single moving object, the proposed
algorithm demonstrates robust tracking results. In the future, we plan to improve our algorithm by adding
active zooming and multiple-object tracking.
This work was supported in part by grant NO. R05-2002-000-00868-0 from the Basic Research Program of the
Korea Science & Engineering Foundation. This work was supported in part by K-JIST, in part by KOSEF
through UFON, and in part by MOE through BK 21.
Figure 5. Background Motion Separation: (a) right-panning and left-moving person, (b) right-panning and right-moving person,
(c) left-panning and right-moving person, (d) left-panning and left-moving person
Figure 7. Tracking Results for the Scene of Left Moving Person. Frame numbers (top to bottom, left to right) are 244,
247, 250, 253, 256, 259