
IEEE TRANSACTIONS ON ROBOTICS, VOL. 26, NO. 5, OCTOBER 2010 939

A Kalman-Filter-Based Method for Pose Estimation in Visual Servoing

Farrokh Janabi-Sharifi and Mohammed Marey

Abstract—The problem of estimating the position and orientation (pose) of an object in real time constitutes an important issue for vision-based control of robots. Many vision-based pose-estimation schemes in robot control rely on an extended Kalman filter (EKF) that requires tuning of the filter parameters. To obtain satisfactory results, EKF-based techniques rely on "known" noise statistics, initial object pose, and sufficiently high sampling rates for good approximation of the measurement-function linearization. Deviations from such assumptions usually lead to degraded pose estimation during visual servoing. In this paper, a new algorithm, namely, the iterative adaptive EKF (IAEKF), is proposed by integrating mechanisms for noise adaptation and iterative measurement linearization. Experimental results are provided to demonstrate the superiority of IAEKF in dealing with erroneous a priori statistics, poor pose initialization, variations in the sampling rate, and trajectory dynamics.

Index Terms—Adaptation, Kalman filter (KF), control, pose estimation, robotic manipulator, visual servoing.

I. INTRODUCTION

In computer vision, the problem of pose estimation is to determine the position and orientation (pose) of a camera with respect to an object's coordinate frame using image information. The problem is also known as the extrinsic camera-calibration problem, and its solution plays a crucial role in the success of many computer-vision applications, such as object recognition [1], intelligent surveillance [2], and robotic visual servoing (RVS) [3]. Estimation of the camera displacement (CD) between the current and desired poses for RVS [4], [5] is also relevant to this problem. However, the focus of this study is on pose estimation for RVS, where the relative pose between a camera and an object is used for real-time control of robot motion [3].

In RVS, the control error can be calculated in the image space, Cartesian space, or both (hybrid) spaces [3], [6], [7]. While partial estimation of the pose vector (e.g., depth) is required for image-based and hybrid visual-servoing schemes [8], [9], an important class of visual-servoing methods, namely, the position-based visual-servoing (PBVS) scheme, requires full pose estimation to calculate the Cartesian error of the relative pose between the endpoint and the object [10]. Two major difficulties with pose estimation for RVS are the requirements for efficiency and robustness of the estimation [11].

Solutions to the pose-estimation problem usually focus on using sets of 2-D–3-D correspondences between geometric features and their projections on the image plane. Although high-level geometric features, such as lines and conics, have been proposed, point features are typically used for pose estimation due to their ease of availability

Manuscript received September 1, 2009; revised April 17, 2010; accepted July 20, 2010. Date of publication September 2, 2010; date of current version September 27, 2010. This work was supported by the Natural Sciences and Engineering Research Council of Canada under Grant 903060-07. The work of M. Marey was supported by a grant from the Egyptian Ministry of Higher Education and Scientific Research.
F. Janabi-Sharifi is with the Department of Mechanical and Industrial Engineering, Ryerson University, Toronto, ON M5B 2K3, Canada (e-mail: [email protected]).
M. Marey is with IRISA/INRIA Rennes Bretagne Atlantique, Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TRO.2010.2061290

1552-3098/$26.00 © 2010 IEEE



in many objects [12]. Solutions for three points [13] and for more than three points [14] have already been presented. However, exact and closed-form solutions are only available for three or four noncollinear points [15]. Such methods, although simple to implement, are often exposed to difficulty in point matching in crowded environments. Besides, point-based solutions are not robust and demonstrate high susceptibility to noise in the image coordinates [16]. For three-point solutions, it has been shown that the point configuration and the noise in the point coordinates can drastically affect the output errors [13]. It has also been demonstrated that when the noise level exceeds a knee level, or the number of points is below a knee level, the least-squares-based methods commonly used for point solutions become unstable, leading to large errors [17]. Adding more points would enhance pose-estimation robustness at the cost of increased computational expense. Nonlinear, iterative, and/or recursive methods are then recommended for more than four points as well as for high-level features.

The iterative approaches formulate the problem as a nonlinear least-squares problem. Such solutions offer more accuracy and robustness, yet they are computationally more intensive than closed-form approaches, and their accuracy depends on the quality of the initial pose estimates [18], [19]. The iterative methods usually rely on nonlinear optimization techniques, such as the Gauss–Newton method [1]. To reduce the problem complexity, approximate methods have also been proposed by simplifying the perspective camera model, e.g., by relaxing the orthogonality constraint on the rotation matrix [19], [20]. Surveys of both exact and approximate pose-estimation methods can be found in the literature [15], [21]. In short, this class of methods exhibits convergence problems and does not effectively account for the orthonormal structure of rotation matrices [22]. Furthermore, with this class of techniques, noisy visual-servo images usually lead to poor individual pose estimates [23], thus requiring temporal filtering.

A class of recursive methods relies on temporal-filtering methods, in particular, Kalman-filtering techniques, to address the robustness and efficiency issues. Since a 3-D pose and its time rate constitute a 12-D state vector to be estimated in real time, many of these filtering methods, such as particle filters [24], can hardly model the true distribution in real time. A true 3-D pose estimation using the Kalman filter (KF) for RVS has been realized in [10]. With KFs, photogrammetric equations are formed by first mapping the object features into the camera frame and then projecting them onto the image plane. A KF is then applied to provide an implicit and recursive solution for the pose parameters. Since the filter output model for RVS is nonlinear in the system states, an extended KF (EKF) is usually applied, in which the output equations are linearized about the current state estimates. The use of a KF in RVS is motivated by its several advantages, including its recursive implementation, its capability to statistically combine redundant information (such as features) or sensors, temporal filtering, the possibility of using a lower number of features, and the possibility of changing the measurement set without disrupting the operation [3], [10]. For instance, an EKF-based platform has been proposed in [11] to integrate a range sensor with a vision sensor for robust pose estimation in RVS. Additionally, an EKF implementation facilitates dynamic windowing of the features of interest by providing an estimate of the next time-step feature locations. This allows only small window areas to be processed for image-parameter measurements and leads to a significant reduction in image-processing time. It has been shown that, in practice, an EKF provides near-optimal estimation [10].

Despite its advantages, there are a few issues with the application of the EKF to pose estimation in RVS. First, a known object model is usually assumed to be available. Model-free approaches based on Euclidean reconstruction have been proposed for CD estimation [4], [5]. These approaches typically rely on fundamental, essential, and/or homography matrix estimation, e.g., in [5] and [25], and, hence, face the issue of degeneration of the epipolar geometry in some cases, thus leading to unstable estimation [4]. Despite some treatments [4], they remain susceptible to outliers. In addition, the majority of them require several images for reconstruction and, hence, are more appealing for postproduction applications [26]. The assumption of a known object model is not a major issue in many industrial setups, since computer-aided-design (CAD) models of the objects are usually available. For uncertain environments with a poor (or unknown) model of the object, an EKF-based approach for real-time estimation of a combined target model and pose has been proposed in [27] and [28]. Therefore, this issue will not be the subject of our focus. Second, while a KF provides the optimal solution under the assumption of zero-mean Gaussian noise for a linear problem, the EKF formulation may not provide optimal results. In fact, linearization can generate unstable filters when the assumption of local linearity is not met [29]. In previous work, it has been recommended to take a sufficiently high sampling rate to enforce accuracy of the linearization over the sampling period [10]. However, in practice, the RVS-system bandwidth limits the sampling rate available to the filter. As has been shown in [30], an EKF-based system might easily diverge under fast and nonlinear trajectory dynamics, even with a relatively high sampling rate. Third, the statistics of the measurement and dynamic noise are assumed to be known in advance and to remain constant. Poor measurement and dynamic models or poor noise estimates would degrade the system performance and might even lead to filter divergence. In particular, while the measurement-noise-covariance matrix can be tuned through experiments, the dynamic-covariance matrix is difficult to tune [23]. This is because the dynamics of the object motion with respect to the camera cannot be accurately predicted in a dynamic environment. Fourth, the convergence of the EKF depends on the choice of the initial state estimate and the tuning of the filter parameters. In many RVS applications, such as the assembly industry, the initial pose of the object with respect to the camera can be readily approximated. Yet, sufficiently good pose estimates may not be initially available in unstructured and uncertain environments. This paper contributes by formulating an EKF method that addresses the last two aforementioned issues.

Several methods have been proposed in the literature to deal with varying statistics and poor filter initialization of the EKF for RVS systems. An adaptive EKF (AEKF) with a fixed set of image features was formulated for the first time in [30] to update the dynamic-noise-covariance matrix in order to address the issue of varying and/or uncertain dynamic noise. The AEKF-based approach was later extended in [31] to allow a variable set of image features during servoing for improved servoing robustness. Despite the adaptation capability of AEKF to unknown noise statistics, the presented AEKF methods do not provide robust and accurate pose estimation in the presence of poor filter initialization and camera calibration, particularly when tracking of a fast and nonlinear trajectory is desired. This aspect will be investigated experimentally in this paper. While the tuning of the EKF noise-covariance matrices was addressed in the aforementioned AEKF-based approaches [30], [31], the tuning and initialization of other EKF parameters and mechanisms to enhance output linearization for RVS did not receive much attention. To address the tuning of other filter parameters and to facilitate initialization, an initial proposal for iterated EKF (IEKF) use in RVS was provided in [32]. As a matter of fact, Lefebvre et al. [33] have studied several modifications of KFs for general nonlinear systems. They have categorized the different versions of KFs, such as the central difference filter (CDF), the unscented KF (UKF), and the divided difference filter (DD1), as linear regression KFs (LRKFs) and have compared them with EKF and IEKF [34]. They have concluded that EKF and IEKF generally outperform LRKFs, yet they require careful tuning. An interesting result of their study is that IEKF outperforms EKF, because it uses the

measurements to linearize the measurement function, whereas in EKF and LRKFs the measurement is not used for this purpose. Despite its advantages, the lack of an adaptive noise-estimation mechanism degrades the performance of IEKF. In this paper, for the first time, an iterative AEKF (IAEKF) for RVS is proposed to overcome the limitations of IEKF and AEKF. The work presented in this paper is a continuation of our previous work on EKF for RVS [3], [11], [27], [28], [30], [32]. This study contributes a detailed formulation of IAEKF and an experimental comparison of EKF, AEKF, IEKF, and IAEKF for RVS.

II. FEATURE-POINT TRANSFORMATION

Fig. 1. Projection of an object feature onto the image plane.

The commonly used perspective projection model of the camera is shown in Fig. 1. The image frame is located at F (i.e., the effective focal length) along the Z^C-axis, with its X^i- and Y^i-axes parallel to the X^C- and Y^C-axes of the camera frame, respectively. In this study, similar to many iterative methods, point features will be used for pose estimation. Let the relative pose of the object with respect to the camera (or end-effector) frame be W = (T, Θ)^T, where T = [X, Y, Z]^T denotes the relative position vector of the object frame with respect to the camera frame, and Θ = [φ, α, ψ]^T is the relative orientation vector with roll, pitch, and yaw parameters, respectively. Let P_j^C = (X_j^C, Y_j^C, Z_j^C)^T and P_j^o = (X_j^o, Y_j^o, Z_j^o)^T represent the coordinate vectors of the jth object feature point in the camera and object frames, respectively (see Fig. 1). The vector P_j^o is available from the CAD model of the object or from measurements and can be described in the camera frame using the following transformation:

    P_j^C = T + R(φ, α, ψ) P_j^o                                   (1)

where the rotation matrix R is given in [3] and [10]. For control-error calculations, the Euler angles can be approximately related to the total angles in a PBVS structure using a transition matrix [10]. The coordinates of the projection of a feature point on the image plane using a pin-hole camera model will be x_j^i and y_j^i, given by (see Fig. 1)

    [x_j^i, y_j^i]^T = (F / Z_j^C) [X_j^C / P_X, Y_j^C / P_Y]^T    (2)

where P_X and P_Y are the interpixel spacings along the X^i- and Y^i-axes of the image plane, respectively. This model assumes that the origin of the image coordinates is located at the principal point and that |Z_j^C| ≫ F. For short focal lengths, lens distortion can have a drastic effect on the feature-point locations; for details of the distortion model and its relation to the projection model, see [30] and [35]. The perspective projection model requires both intrinsic and extrinsic camera parameters. The camera intrinsic parameters (P_X, P_Y, F), the coordinates of the optical axis on the image plane (the principal point O^i), the radial and tangential distortion parameters, and the aspect ratio are all determined from camera-calibration tests [30]. The camera extrinsic parameters include the pose of the camera with respect to the end-effector or the robot base frame, which is calculated by inspection of the camera housing and by kinematic calibration [36]. Excellent solutions to the camera-calibration problem exist in the literature [37].

Substituting (1) into (2) results in two nonlinear equations with the six unknown pose parameters of W. Therefore, at least three noncollinear features are required for pose estimation (i.e., p = 3) [38]. However, to obtain a unique solution, at least four features will be needed. It has been shown that the inclusion of more than six features does not improve the performance of the EKF estimation significantly [23]. In addition, the features need to be noncollinear and noncoplanar to provide good results. Therefore, in many RVS applications, 4 ≤ p ≤ 6.

III. EXTENDED KALMAN FILTER

For pose estimation, the state vector of the dynamic model is defined to include the pose and velocity parameters, i.e.,

    x = [X, Ẋ, Y, Ẏ, Z, Ż, φ, φ̇, α, α̇, ψ, ψ̇]^T.                 (3)

The relative target velocity is usually assumed to be constant during each sample period, which is a reasonably valid assumption for sufficiently small sample periods in RVS systems. A discrete dynamic model will then be

    x_k = A x_{k−1} + γ_k                                          (4)

with A being a block-diagonal matrix with 2 × 2 blocks of the form [1 T; 0 1], T being the sample period, k being the sample step, and γ_k being the disturbance-noise vector described by a zero-mean Gaussian distribution with covariance Q_k, i.e.,

    E[γ_i] = q_i,   E[(γ_i − q_i)(γ_j − q_j)^T] = Q_i δ_ij         (5)

where q_i and Q_i are the true mean and the true moments about the mean of the state-noise sequences, respectively, and δ_ij is the Kronecker delta. The output model is based on the projection model given by (1) and (2) and defines the image-feature locations in terms of the state vector x_k as follows:

    z_k = G(x_k) + ν_k                                             (6)

with measurements for p feature points

    z_k = [x_1^i, y_1^i, x_2^i, y_2^i, …, x_p^i, y_p^i]_k^T        (7)

and

    G(x_k) = F [X_1^C/(P_X Z_1^C), Y_1^C/(P_Y Z_1^C), …, X_p^C/(P_X Z_p^C), Y_p^C/(P_Y Z_p^C)]^T.   (8)

Here, X_j^C, Y_j^C, and Z_j^C are given by (1), and ν_k denotes the image-parameter measurement noise, which is assumed to be described by a zero-mean Gaussian distribution with covariance R_k, i.e.,

    E[ν_i] = r_i,   E[(ν_i − r_i)(ν_j − r_j)^T] = R_i δ_ij         (9)

where r_i and R_i are the true mean and the true moments about the mean of the measurement-noise sequences, respectively. Since (6) is nonlinear, an optimal solution cannot be obtained through a KF implementation. Instead, an extension of the KF (i.e., the EKF) can be formulated by linearizing the output equation about the current state.
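As a rough illustration of the models above (not the authors' code), the constant-velocity transition matrix A of (4) and the measurement function G of (6)–(8) can be sketched in Python. The focal length, pixel spacings, and the particular roll-pitch-yaw convention for R are placeholder assumptions, since the paper takes R from [3] and [10]:

```python
import numpy as np

def transition_matrix(T):
    """Block-diagonal A of (4): six 2x2 constant-velocity blocks [1 T; 0 1]."""
    block = np.array([[1.0, T], [0.0, 1.0]])
    return np.kron(np.eye(6), block)

def rotation_rpy(phi, alpha, psi):
    """One common roll-pitch-yaw rotation matrix; the exact convention of (1)
    is given in [3] and [10], so this choice is an assumption."""
    cf, sf = np.cos(phi), np.sin(phi)
    ca, sa = np.cos(alpha), np.sin(alpha)
    cp, sp = np.cos(psi), np.sin(psi)
    Rz = np.array([[cf, -sf, 0], [sf, cf, 0], [0, 0, 1]])
    Ry = np.array([[ca, 0, sa], [0, 1, 0], [-sa, 0, ca]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    return Rz @ Ry @ Rx

def G(x, P_obj, F=0.0125, PX=1e-5, PY=1e-5):
    """Measurement function of (6)-(8): project p object points into the image.

    x     -- 12-dim state vector of (3): [X, Xdot, Y, Ydot, Z, Zdot, phi, ...]
    P_obj -- (p, 3) feature coordinates in the object frame (e.g., from a CAD model)
    """
    T_vec = x[[0, 2, 4]]                      # translation [X, Y, Z]
    phi, alpha, psi = x[6], x[8], x[10]       # roll, pitch, yaw
    P_cam = (rotation_rpy(phi, alpha, psi) @ P_obj.T).T + T_vec   # eq. (1)
    u = F * P_cam[:, 0] / (PX * P_cam[:, 2])  # eq. (2), x-image coordinate
    v = F * P_cam[:, 1] / (PY * P_cam[:, 2])  # eq. (2), y-image coordinate
    return np.ravel(np.column_stack((u, v)))  # [x1, y1, ..., xp, yp] as in (7)
```

For p features, G returns a 2p-vector ordered as in (7); a feature on the optical axis projects to the principal point, i.e., to (0, 0).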

Let x_k be the state at step k, let x̂_{k,k−1} denote the a priori state estimate at step k given knowledge of the process or measurement at the end of step k − 1, and let x̂_{k,k} be the a posteriori state estimate at step k given the measurement z_k. Then, the a priori and a posteriori estimate errors and their corresponding covariances are defined as e_{k,k−1} = x_k − x̂_{k,k−1}, P_{k,k−1} = E[e_{k,k−1} e_{k,k−1}^T], e_{k,k} = x_k − x̂_{k,k}, and P_{k,k} = E[e_{k,k} e_{k,k}^T], respectively. It is well known that the recursive EKF algorithm consists of two major parts, prediction and estimation, as follows.

Prediction:

    x̂_{k,k−1} = A x̂_{k−1,k−1}                                    (10)
    P_{k,k−1} = A P_{k−1,k−1} A^T + Q_{k−1}.                       (11)

Linearization:

    H_k = ∂G(x)/∂x |_{x = x̂_{k,k−1}}.                             (12)

Kalman-gain update:

    K_k = P_{k,k−1} H_k^T (R_k + H_k P_{k,k−1} H_k^T)^{−1}.        (13)

Estimation updates:

    x̂_{k,k} = x̂_{k,k−1} + K_k (z_k − G(x̂_{k,k−1}))              (14)
    P_{k,k} = P_{k,k−1} − K_k H_k P_{k,k−1}.                       (15)

Here, K_k is the Kalman-gain matrix at step k. The measurement- and process-noise covariances R_k and Q_k are usually assumed to be constant during servoing and are obtained through tuning [10]. While R_k can be determined through experiments [23], the matrix Q_k is difficult to determine a priori due to the unknown object and/or camera motions. In general, the aim of adaptive filtering in RVS is to estimate not only the state but also the time-varying statistical parameters Υ_i = {r_i, R_i, q_i, Q_i}. An AEKF has been introduced in [30] and [31] to estimate R_k and Q_k in real time. The adaptation capability of AEKF under poor initialization of the noise-covariance matrices has been demonstrated in our previous work [30]. However, the results also showed that for quicker changes of the pose, the error of AEKF increases, mainly because of the time required by AEKF to react to such a sudden change. The linearization approximation in (12), which AEKF cannot treat properly, is another source of error, especially in tracking trajectories with faster and higher dynamics. Moreover, the linearization-approximation errors lead to high sensitivity to poor initialization and camera-calibration errors. An IEKF has been proposed in our previous work [32] to alleviate this issue.

In the next section, adaptive and iterative mechanisms are combined to address the aforementioned issues simultaneously and to establish a robust framework for pose estimation in RVS.

IV. ITERATIVE ADAPTIVE EXTENDED KALMAN FILTER

The proposed approach combines the advantages of AEKF and IEKF. After the initialization and prediction stages, the iteration is started for m cycles by first setting x̂_k^0 = x̂_{k,k−1} (i.e., i = 0) and then computing

    H_k^i = ∂G(x)/∂x |_{x = x̂_k^i}                                (16)

    r̂_k^i ≡ z_k − G(x̂_k^i)                                       (17)

    Γ_k^i ≡ H_k^i P_{k,k−1} (H_k^i)^T                              (18)

    r̄_k^i = r̄_{k−1} + (1/N)(r̂_k^i − r̂_{k−N})                   (19)

    R_k^i = R_{k−1} + (1/(N−1)) [ (r̂_k^i − r̄_k^i)(r̂_k^i − r̄_k^i)^T
            − (r̂_{k−N} − r̄_k^i)(r̂_{k−N} − r̄_k^i)^T
            + (1/N)(r̂_k^i − r̂_{k−N})(r̂_k^i − r̂_{k−N})^T
            + ((N−1)/N)(Γ_{k−N} − Γ_k^i) ]                         (20)

    K_k^i = P_{k,k−1} (H_k^i)^T [R_k^i + H_k^i P_{k,k−1} (H_k^i)^T]^{−1}   (21)

    x̂_k^{i+1} = x̂_k^i + K_k^i (z_k − G(x̂_k^i)).                 (22)

At the end of the iterations, the iteration output is propagated as follows:

    x̂_{k,k} = x̂_k^m,  r̄_k = r̄_k^m,  R_k = R_k^m,  Γ_k = Γ_k^m,  K_k = K_k^m   (23)

and the a posteriori error-covariance estimate is updated according to (15). Here, a window of past measurements of size N is selected for the adaptation of R_k. The observation-noise sample r̂_j is assumed to be representative of ν_j and, for j = k − N → k, to be independent and identically distributed.

Finally, the state-noise statistics are estimated adaptively as follows:

    q̂_j = x̂_{j,j−1} − A x̂_{j−1,j−1}                              (24)

which, for j = k − N → k, is assumed to be independent and identically distributed. In addition, let

    ∆_k ≡ A P_{k−1,k−1} A^T − P_{k,k}.                             (25)

Then, the process-noise-covariance matrix is updated according to

    q̄_k = q̄_{k−1} + (1/N)(q̂_k − q̂_{k−N})                        (26)

    Q_k = Q_{k−1} + (1/(N−1)) [ (q̂_k − q̄_k)(q̂_k − q̄_k)^T
          − (q̂_{k−N} − q̄_k)(q̂_{k−N} − q̄_k)^T
          + (1/N)(q̂_k − q̂_{k−N})(q̂_k − q̂_{k−N})^T
          + ((N−1)/N)(∆_{k−N} − ∆_k) ]                             (27)

followed by the prediction stage, which is represented by (10) and (11).
However, it must be noted that the above algorithm is computationally intensive compared with EKF, IEKF, and AEKF. To improve the computing time, the adaptation steps are performed outside the iterations. After the initialization and prediction steps, the limited-memory filter algorithm [30] for estimating the measurement-noise statistics is applied first to find R_k before the iterations (using (16)–(20) without the index i). Next, the iteration is run for m cycles to obtain the Kalman-gain and estimation updates according to (16), (21), and (22). The state-noise statistics are estimated outside the iterations according to (24)–(27). To ensure positive definiteness of R_k and Q_k, the diagonal elements of the covariance estimators are reset to their absolute values. In addition, a fading-memory approach is applied to give low weight to the initial (i.e., less reliable) noise samples and growing weight to successive samples, using the weighting factor [30]

    λ_k = (k − 1)(k − 2) ··· (k − η)/k^η,   if k ≥ η               (28)

with the property lim_{k→∞} λ_k = 1.

TABLE I
COMPUTATIONAL COST FOR POSE ESTIMATION WITH p = 5, N = 10, m = 10 (20, 30)

Fig. 2. (a) Experimental system for set 3, consisting of the AFMA-6 manipulator and the target object. (b) Image of the target object with its coordinate frame and the features used.

In our experiments with IEKF [32], the optimal number of iterations was found to be m = 30, while m = 20 also provided reasonably good results. It should also be noted that, for computational efficiency, a fixed number of iterations is not necessary for pose-estimation tasks. The iteration can be stopped when the iterated state estimate is close to the previous value, i.e., when (K_k^i (z_k − G(x̂_k^i)))^T (K_k^i (z_k − G(x̂_k^i))) < τ, where τ is a threshold that can be found from experiments.

Table I shows the computational costs for the different EKF-based algorithms used for pose estimation. For the flops calculation, the flop option of MATLAB 5.0 was used. The CPU times were obtained using a P4 1.7-GHz PC with 256-MB RAM. Given the current technology of PCs, the increased computational costs of IEKF and IAEKF do not necessarily imply a major disadvantage or a bottleneck, as the total time of the filter computations with a moderate number of iterations is much less than the time required for feature selection and image processing in RVS.
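As described in Section IV, the noise statistics are adapted outside the iterations over a sliding window of N samples. A minimal sketch of the recursive updates (26) and (27) is given below, with the ∆ terms of (25) supplied precomputed and the absolute-value diagonal reset from Section IV included; function and variable names are ours, not from the paper:

```python
import numpy as np

def adapt_Q(Q_prev, qbar_prev, q_new, q_old, D_new, D_old, N):
    """Recursive window update of the process-noise statistics, eqs. (26)-(27).

    q_new, q_old -- newest and oldest state-noise samples q_hat in the window (24)
    D_new, D_old -- Delta_k and Delta_{k-N} from eq. (25)
    """
    qbar = qbar_prev + (q_new - q_old) / N                     # eq. (26)
    d_new = (q_new - qbar)[:, None]
    d_old = (q_old - qbar)[:, None]
    diff = (q_new - q_old)[:, None]
    Q = Q_prev + (d_new @ d_new.T
                  - d_old @ d_old.T
                  + diff @ diff.T / N
                  + (N - 1) / N * (D_old - D_new)) / (N - 1)   # eq. (27)
    # reset diagonal to absolute values to keep Q positive definite (Sec. IV)
    Q[np.diag_indices_from(Q)] = np.abs(np.diag(Q))
    return qbar, Q
```

The update of the measurement-noise covariance in (20) has the same windowed structure, with r̂, r̄, and Γ in place of q̂, q̄, and ∆.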

V. EXPERIMENTAL RESULTS
Extensive simulations and experiments were conducted to investigate and compare the performance of various Kalman-filtering approaches for pose estimation.

The default filter parameters were as follows: R_0 is a diagonal matrix with diagonal elements of 0.01 (in pixels squared) measured through the experiments; P_{0,0} is a block-diagonal matrix with 2 × 2 blocks of the form diag[0.02, 0.01] (in meters squared and (meters per second) squared for the translational states, and in degrees squared and (degrees per second) squared for the rotational states); N = 20, m = 30, η = 5, and p = 5.

To evaluate the accuracy of the estimation, the estimation results were compared with the relative pose calculated through the robot forward kinematics using the joint encoders. Another measure of accuracy was the inspection of the Kalman-estimate output errors, i.e., the errors between the true image-feature locations and those obtained from the KF estimates, that is, the filter residues z_k − G(x̂_{k,k−1}).

A 6-degree-of-freedom (DOF) Cartesian manipulator, the AFMA-6, with an endpoint-mounted AVT-MARLIN F-033C CCD camera (at IRISA-INRIA, Rennes) and the target object shown in Fig. 2 were used. The robot was calibrated and operated under Linux with the visual-servoing software ViSP [39]. The camera images were sent at 50 fps (frames/s) to the host PC, an Intel Core 2 at 2.93 GHz running under Linux, on which the frame grabbers had been installed. The images had a size of 128 × 182 pixels, and the effective focal length was F = 12.5 mm. The image-processing and control computations were carried out on the host, and the control output was then transmitted to the robot controller via a PCI-VME bus-adapter board. About 10 ms was required for the control action. The camera parameters, namely, the image center and the interpixel spacings, were obtained from the calibration program. The initial estimate of the pose was obtained using DeMenthon's method [19]. The sampling period was T = 0.06325 s.

Fig. 3. Dynamic performance of pose estimation by the EKF and the forward-kinematics estimator (experiment 1).

The robot was commanded to travel through a predefined trajectory over a stationary object. The maximum velocity of the AFMA-6 endpoint was set through ViSP. Therefore, for a given set of nodal points, different trajectories with various dynamics were designed. The good estimation power of a tuned EKF in relatively slow motion has already been shown [3], [10]. Therefore, the EKF formed the comparison base. The purpose of experiment 1 was to compare the performance of the various KF-based methods under an accurately calibrated robot framework. The maximum-velocity components of the endpoint trajectory were set to 50 mm/s and 5°/s for the translational and rotational coordinates, respectively, to generate moderate motion dynamics. A null state-noise-covariance matrix was initially introduced to simulate the case of poorly tuned KF-based estimators for a variety of trajectories. The endpoint relative trajectory was designed to incorporate sudden velocity changes and significant nonlinearities. The purpose was to investigate the adaptation capability of AEKF and IAEKF to deviations from the constant-velocity assumption of the KF process model, and to evaluate the iterative performance of IEKF and IAEKF in approximating the output-model linearization. Inspection of the results (see Figs. 3–6 and Tables II and III) indicates that the estimation accuracy of all the algorithms is better in X, Y, and roll than in the depth parameter Z, pitch, and yaw. The results also show that sudden changes in the velocity lead to divergence of the EKF (see Fig. 3), owing to the assumption of constant velocity in the state model. However, both AEKF and IAEKF were able to adapt to the velocity changes (see Figs. 4 and 5). Fig. 5 shows that, although the IEKF performance is superior to that of the EKF, the lack of a noise-adaptation mechanism in IEKF leads to significant errors and divergence toward the end of the relative pose trajectory. Table II shows the pose-estimate-error statistics
944 IEEE TRANSACTIONS ON ROBOTICS, VOL. 26, NO. 5, OCTOBER 2010

TABLE II
POSE ERROR STATISTICS FOR DIFFERENT KF-BASED ESTIMATORS WHEN
COMPARED WITH KINEMATIC ESTIMATOR (EXPERIMENT 1)

TABLE III
IMAGE-PLANE-ERROR VARIANCE FOR DIFFERENT KF-BASED ESTIMATORS IN
Fig. 4. Dynamic performance of pose estimation by AEKF and forward kine- PIXELS SQUARE (EXPERIMENT 1)
matics estimators (experiment 1).

when different KF-based estimates are compared with kinematic esti-


mates. Table III lists Kalman-estimate-output-error variances for five
image-feature locations used in different KF-based methods. High lev-
els of error can be observed for EKF estimates; however, both AEKF
and IAEKF show good and comparable levels of accuracy, with IAEKF
indicating slightly advantageous performance. Tracking accuracies of
IAEKF for X and Y were approximately within ±1.3 and ±1 mm,
respectively, and those for Z, roll, yaw, and pitch were within ±4 mm,
Fig. 5. Dynamic performance of pose estimation by IEKF and forward kine- ±0.3◦ , ±0.5◦ , and ±0.3◦ , respectively.
matics estimators (experiment 1).
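The constant-velocity prediction and the filter residue z_k − G(x̂_{k,k−1}) used above can be written as a single EKF predict/update step. The following NumPy sketch is illustrative only; the process model `f`, output model `G`, and their Jacobians are hypothetical stand-ins for the paper's models, not the authors' implementation:

```python
import numpy as np

def ekf_step(x, P, z, f, F_jac, G, G_jac, Q, R):
    """One EKF cycle: predict with the process model, then update with
    the image-feature measurement z; also returns the filter residue."""
    # Prediction (e.g., a constant-velocity model for the relative pose)
    x_pred = f(x)
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q

    # Filter residue: measured features minus predicted features
    residue = z - G(x_pred)

    # Measurement update, linearizing G about the prediction
    H = G_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ residue
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new, residue
```

A near-zero residue sequence indicates that the filter output tracks the measured feature locations, which is the second accuracy measure used in these experiments.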
In experiment 2, the same conditions as in experiment 1 were used, except that the magnitude of the maximum velocity of the robot endpoint was increased, first to ten times and then to 27 times that used in the previous experiment, thereby resulting in experiments 2a and 2b, respectively (see Figs. 7 and 8). The resulting kinematic trajectories were almost the same as the trajectories in experiment 1, except that they were completed in shorter times. Consequently, faster dynamics and increased nonlinearities per sampling period would be expected. In both scenarios, EKF remained divergent with further degraded performance. The performance of AEKF also degraded with the increased velocity. For instance, the mean errors of the AEKF estimator in experiment 2b along the X- and Y-directions were approximately three and ten times larger than those in experiment 1. Similarly, the standard deviations in the same directions increased 14 and 24 times in experiment 2b compared with experiment 1. This can be explained by the AEKF lag and its inability to maintain a good approximation for the output linearization under the faster and more nonlinear dynamics per sampling period. However, the performances of IEKF and IAEKF remained comparable with their performance in experiment 1.
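The iterated relinearization that lets IEKF and IAEKF cope with strong per-sample nonlinearity can be sketched as follows. This is an illustrative NumPy fragment of the standard iterated-EKF measurement recursion, with a hypothetical output model `G` and Jacobian `G_jac`, not the authors' exact implementation:

```python
import numpy as np

def iterated_update(x_pred, P_pred, z, G, G_jac, R, n_iter=5):
    """IEKF-style measurement update: relinearize the output model about
    the latest iterate rather than only about the predicted state."""
    x_i = x_pred.copy()
    for _ in range(n_iter):
        H = G_jac(x_i)                      # Jacobian at the current iterate
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        # Standard iterated-EKF (Gauss-Newton-type) state recursion
        x_i = x_pred + K @ (z - G(x_i) - H @ (x_pred - x_i))
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_i, P_new
```

For a linear output model, the iteration reproduces the ordinary EKF update in one step; the benefit appears when the output model is strongly nonlinear over a sampling period, as in experiments 2a and 2b.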
Fig. 6. Dynamic performance of pose estimation by IAEKF and forward kinematics estimators (experiment 1).

In experiment 3, the same conditions as in experiment 1 were applied, but instead of null covariance matrices, tuned covariance matrices were used. The noise-covariance matrices were approximated using offline
tuning, with q = 10^−5 (in (m/s)^2 and (deg/s)^2 for the translational and rotational velocity states, respectively) for

Q = diag[0, q, 0, q, 0, q, 0, q, 0, q, 0, q]    (29)

and r = 0.01 pixel^2 for

R = diag[r, r, r, r, r, r, r, r, r, r].    (30)

The results are summarized in Tables IV and V. As would be expected, carefully tuned covariance matrices under relatively moderate dynamic conditions enabled EKF to remain convergent. Compared with the oscillating and divergent behavior of EKF in experiment 1, a significant improvement was gained with the tuned covariance matrices. AEKF also provides good results, which are superior to the EKF results in terms of mean error, standard deviation, and maximum error. The image-plane-error variance for EKF is not significantly different from that for AEKF (see Table V). Again, IAEKF provides the best results in all comparison categories. It is also noted that, with fine tuning of the noise-covariance matrices, the IEKF performance approaches that of IAEKF, as IEKF gives very similar results to those of IAEKF (see Tables IV and V). Interestingly, while the performance of EKF, AEKF, and IEKF improves with tuning of the noise-covariance matrices, the results of IAEKF remain approximately the same as those in experiment 1. This result again highlights the robustness of IAEKF to tuning errors of the measurement-noise-covariance matrices. Results were also obtained for various covariance matrices by varying the q and r values according to q ∈ {10^3, 10, 10^−1, 10^−3, 10^−5, 10^−20} and r ∈ {0.05, 0.1, 1, 10, 100, 1000}. The results again confirmed the robustness of IAEKF to changes in the Q and R matrices. The results of IEKF were also acceptable. However, AEKF, and particularly EKF, demonstrated high levels of sensitivity, as observed in [32]. For instance, the mean-error values for IAEKF remained within ±10% of the values obtained with a null process-noise-covariance matrix (see Table I).

TABLE IV
POSE-ERROR STATISTICS FOR DIFFERENT KF-BASED ESTIMATORS WHEN COMPARED WITH KINEMATIC ESTIMATOR (EXPERIMENT 3)

TABLE V
IMAGE-PLANE-ERROR VARIANCE FOR DIFFERENT KF-BASED ESTIMATORS IN PIXELS SQUARE (EXPERIMENT 3)

Fig. 7. Dynamic performance of pose estimation by AEKF and forward kinematics estimators (experiment 2a).

Fig. 8. Dynamic performance of pose estimation by AEKF and forward kinematics estimators (experiment 2b).

In experiment 4, the sensitivity of the KF-based estimators to the sampling rate was compared under dynamic conditions. The same conditions as in the previous experiment, with the tuned measurement-noise-covariance matrices (with q = 10^−5 in (29) and r = 10^−2 in (30)), were applied, but the sampling time was changed to T = 0.020 s from the default value of 0.06325 s. Results consistent with the previous experiments [32] were obtained. The results for all estimators improved with the higher sampling rate. However, the IAEKF results were not significantly different from the results reported in Tables II and IV. For instance, the mean-error values for IAEKF remained within ±15% of those reported in Table II. The results for IEKF were also relatively consistent, e.g., the mean-error values remained within ±20% of the values reported in Table II. However, the changes in the EKF and AEKF results were more significant and often an order of magnitude different from those in Table II.

In experiment 5, the sensitivity of the estimators to errors in the initial pose was investigated by changing the initial positions by 100, 200, 300, and
400 mm in all position coordinates. While IEKF and IAEKF provided acceptable results up to 200 mm of deviation from the initial position, the other methods failed even at 100 mm of deviation. The mean-error values for IAEKF and IEKF remained within ±10% of those reported in Table II (i.e., with almost-perfect pose initialization).

VI. CONCLUSION

Different KF-based methods of pose estimation have been discussed. A new pose-estimation method, namely, the IAEKF algorithm, has also been introduced. All methods have been compared for their performance under different experimental conditions. It has been shown that the mechanisms of noise adaptation and iterative measurement linearization can be integrated within a novel IAEKF algorithm to obtain superior performance in comparison with the other KF-based methods. In particular, the robustness of IAEKF has been established through experiments, and it has been demonstrated that IAEKF can improve pose-estimation performance in the presence of erroneous a priori statistics, nonlinear and fast-tracking trajectories and measurement functions, slow sampling rates, and erroneous pose initialization. The improvements have been obtained at an additional computational cost, which is, in general, modest given current PC technology and when compared with the feature-selection and image-processing time in RVS.

ACKNOWLEDGMENT

The authors would like to thank the Lagadic staff, particularly F. Chaumette and F. Spindler, for useful discussions and their assistance with the experiments during his visit to INRIA-IRISA. The authors also acknowledge the assistance of A. Vakanski in the simulations and efficiency calculations.

REFERENCES

[1] D. G. Lowe, "Three-dimensional object recognition from single two-dimensional images," Artif. Intell., vol. 31, pp. 355–395, 1987.
[2] A. Mittal, L. Zhao, and L. S. Davis, "Human body pose estimation using silhouette shape analysis," in Proc. IEEE Conf. Adv. Video Signal Based Surveill., Jul. 2003, pp. 263–270.
[3] F. Janabi-Sharifi, "Visual servoing: Theory and applications," in Opto-Mechatronic Systems Handbook, H. Cho, Ed. Boca Raton, FL: CRC, 2002, pp. 15-1–15-24.
[4] E. Malis and F. Chaumette, "2 1/2 D visual servoing with respect to unknown objects through a new estimation scheme of camera displacement," Int. J. Comput. Vis., vol. 37, no. 1, pp. 79–97, 2000.
[5] G. Chesi and K. Hashimoto, "A simple technique for improving camera displacement estimation in eye-in-hand visual servoing," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pp. 1239–1242, Sep. 2004.
[6] F. Chaumette and S. Hutchinson, "Visual servo control, Part I: Basic approaches," IEEE Robot. Autom. Mag., vol. 13, no. 4, pp. 82–90, Dec. 2006.
[7] F. Chaumette and S. Hutchinson, "Visual servo control, Part II: Advanced approaches," IEEE Robot. Autom. Mag., vol. 14, no. 1, pp. 109–118, Mar. 2007.
[8] J. Feddema and O. R. Mitchell, "Vision-guided servoing with feature-based trajectory generation," IEEE Trans. Robot. Autom., vol. 5, no. 5, pp. 691–700, Oct. 1989.
[9] E. Malis, F. Chaumette, and S. Boudet, "2-1/2 D visual servoing," IEEE Trans. Robot. Autom., vol. 15, no. 2, pp. 234–246, Apr. 1999.
[10] W. J. Wilson, C. W. Hulls, and G. S. Bell, "Relative end-effector control using Cartesian position based visual servoing," IEEE Trans. Robot. Autom., vol. 12, no. 5, pp. 684–696, Oct. 1996.
[11] W. J. Wilson, C. W. Hulls, and F. Janabi-Sharifi, "Robust image processing and position-based visual servoing," in Robust Vision for Vision-Based Control of Motion, M. Vincze and G. D. Hager, Eds. New York: IEEE Press, 2000, pp. 163–201.
[12] F. Janabi-Sharifi and W. J. Wilson, "Automatic selection of image features for visual servoing," IEEE Trans. Robot. Autom., vol. 13, no. 6, pp. 890–903, Dec. 1997.
[13] R. M. Haralick, C. Lee, K. Ottenberg, and M. Nolle, "Review and analysis of solutions of the three point perspective pose estimation," Int. J. Comput. Vis., vol. 12, no. 3, pp. 331–356, 1994.
[14] O. Faugeras, Three-Dimensional Computer Vision. Cambridge, MA: MIT Press, 1993.
[15] D. DeMenthon and L. S. Davis, "Exact and approximate solutions of the perspective-three point problem," IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 11, pp. 1100–1105, Nov. 1992.
[16] X. Wang and G. Xu, "Camera parameters estimation and evaluation in active vision systems," Pattern Recognit., vol. 29, no. 3, pp. 439–447, 1996.
[17] R. M. Haralick, H. Joo, C. Lee, X. Zhang, V. Vaidya, and M. Kim, "Pose estimation from corresponding point data," IEEE Trans. Syst., Man, Cybern., vol. 19, no. 6, pp. 1426–1446, Nov./Dec. 1989.
[18] Q. Ji, M. S. Costa, R. M. Haralick, and L. G. Shapiro, "An integrated linear technique for pose estimation from different geometric features," Int. J. Pattern Recognit. Artif. Intell., vol. 13, no. 5, pp. 705–733, 1999.
[19] D. DeMenthon and L. S. Davis, "Model-based object pose in 25 lines of code," in Proc. Eur. Conf. Comput. Vis., Santa Margherita Ligure, Italy, 1992, pp. 335–343.
[20] R. K. Lenz and R. Y. Tsai, "Techniques for calibration of the scale factor and image center for high accuracy 3D machine vision metrology," IEEE Trans. Pattern Anal. Mach. Intell., vol. 10, no. 5, pp. 713–720, Sep. 1988.
[21] R. Kumar and A. R. Hanson, "Robust methods for estimating pose and a sensitivity analysis," CVGIP: Image Understanding, vol. 60, pp. 313–342, 1994.
[22] C.-P. Lu, G. D. Hager, and E. Mjolsness, "Fast and globally convergent pose estimation from video images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 6, pp. 610–622, Jun. 2000.
[23] J. Wang and W. J. Wilson, "3D relative position and orientation estimation using Kalman filtering for robot control," in Proc. IEEE Int. Conf. Robot. Autom., Nice, France, 1992, pp. 2638–2645.
[24] J. Carpenter, P. Clifford, and P. Fearnhead, "Improved particle filter for nonlinear problems," Inst. Electr. Eng. Proc. Radar Sonar Navig., vol. 146, no. 1, pp. 2–7, 1999.
[25] Q.-T. Luong and O. Faugeras, "The fundamental matrix: Theory, algorithms, and stability analysis," Int. J. Comput. Vis., vol. 17, no. 1, pp. 43–75, 1996.
[26] É. Marchand and F. Chaumette, "Virtual visual servoing: A framework for real-time augmented reality," EUROGRAPHICS, vol. 21, no. 3, pp. 289–298, 2002.
[27] L. Deng, W. J. Wilson, and F. Janabi-Sharifi, "Combined target model estimation and position-based visual servoing," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., Sendai, Japan, Oct. 2004, pp. 1395–1400.
[28] L. Deng, W. J. Wilson, and F. Janabi-Sharifi, "Decoupled EKF for simultaneous target model and relative pose estimation using feature points," in Proc. IEEE Int. Conf. Control Appl., Toronto, ON, Canada, Aug. 2005, pp. 749–754.
[29] S. J. Julier and J. K. Uhlmann, "Unscented filtering and nonlinear estimation," Proc. IEEE, vol. 92, no. 3, pp. 401–422, Mar. 2004.
[30] M. Ficocelli and F. Janabi-Sharifi, "Adaptive filtering for pose estimation in visual servoing," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., Maui, HI, 2001, pp. 19–24.
[31] V. Lippiello, B. Siciliano, and L. Villani, "Adaptive extended Kalman filtering for visual motion estimation of 3D objects," Control Eng. Pract., vol. 15, pp. 123–134, 2007.
[32] A. Shademan and F. Janabi-Sharifi, "Sensitivity analysis of EKF and iterated EKF for position-based visual servoing," in Proc. IEEE Int. Conf. Control Appl., Toronto, ON, Canada, Aug. 2005, pp. 755–760.
[33] T. Lefebvre, H. Bruyninckx, and J. De Schutter, "Kalman filters for nonlinear systems: A comparison of performance," Int. J. Control, vol. 77, no. 7, pp. 639–653, 2004.
[34] Y. Bar-Shalom and X. R. Li, Estimation and Tracking: Principles, Techniques, and Software. Boston, MA: Artech House, 1993.
[35] M. Ficocelli, "Camera calibration: Intrinsic parameters," Robot. Manuf. Autom. Lab., Ryerson Univ., Toronto, ON, Canada, Tech. Rep. TR-1999-12-17-01, 1999.
[36] P. I. Corke, Visual Control of Robots: High Performance Visual Servoing. Somerset, U.K.: Res. Studies, 1999.
[37] R. Tsai and R. Lenz, "A new technique for fully autonomous and efficient 3D robotic hand/eye calibration," IEEE Trans. Robot. Autom., vol. 5, no. 3, pp. 345–358, Jun. 1989.
[38] J. S.-C. Yuan, "A general photogrammetric method for determining object position and orientation," IEEE Trans. Robot. Autom., vol. 5, no. 2, pp. 129–142, Apr. 1989.
[39] É. Marchand, F. Spindler, and F. Chaumette, "VISP for visual servoing: A generic software platform with a wide class of robot control skills," IEEE Robot. Autom. Mag., vol. 12, no. 4, pp. 40–52, Dec. 2005.
Autonomous Behavior-Based Switched Top-Down and Bottom-Up Visual Attention for Mobile Robots

Tingting Xu, Student Member, IEEE, Kolja Kühnlenz, Member, IEEE, and Martin Buss, Member, IEEE

Fig. 1. ACE robot.

Abstract—In this paper, autonomous switching between two basic attention-selection mechanisms, i.e., top-down and bottom-up, is proposed. This approach fills a gap in object search using conventional top-down biased bottom-up attention selection, which fails if a group of objects is searched whose appearances cannot be uniquely described by the low-level features used in bottom-up computational models. Three internal robot states, i.e., observing, operating, and exploring, are included to determine the visual-selection behavior. A vision-guided mobile robot equipped with an active stereo camera is used to demonstrate our strategy and evaluate the performance experimentally. This approach facilitates adaptations of the visual behavior to different internal robot states and benefits further development toward cognitive visual perception in the robotics domain.

Index Terms—Vision-guided robotics, visual attention control.

I. INTRODUCTION

To achieve efficient processing of visual information about the environment, humans select their focus of attention (FOA), such that the most interesting regions are processed first in detail. Studies of human visual perception show that visual-attention selection is affected by two distinct mechanisms: top-down and bottom-up. Top-down signals are derived from the task specification or previous knowledge and highlight the task-relevant information. Top-down attention is goal-directed and essential for task accomplishment. In contrast, bottom-up attention is driven by distinct stimuli based on primary visual features. The interaction and coordination of both enable gaze-fixation-point selection and guide the visual behavior. To deal with the limited processing capability of most technical systems, especially autonomous mobile robots, a biologically plausible and technically applicable visual-attention system is to be developed in order to fill the gap between the fundamental studies and robotics research.

Normally, when operating in the real world, a robot has a task, such as detecting and manipulating a target object. For a mobile robot, a typical task is to find a target and move toward it. In a simple scenario with unique target objects, a conventional top-down biased bottom-up strategy can help considerably in terms of efficiency [1]. However, it fails if a group of objects is searched whose appearances cannot be uniquely described by the low-level features used in a primary bottom-up computational model. For example, different traffic signs are all salient in color but differ in geometry and carry different patterns. They are, therefore, not distinguishable from each other using only the low-level features of bottom-up attention selection, and an exhaustive search is still needed. To lower the computational cost, a search window is usually defined as the robot FOA, within which the exhaustive search is conducted.

A search window based on bottom-up attention can predict image regions with a higher probability of containing a target object, while a search window based on top-down attention is efficient for task accomplishment. Both bottom-up and top-down attention are essential for robot-attention control. On the one hand, if a task-relevant object is not located in the robot field of view (FOV), pure top-down attention selection can still use position data in the 3-D task space to direct the robot attention toward the target, while bottom-up or top-down biased bottom-up attention selection relies only on the 2-D image data. On the other hand, if there is no task-relevant information in the FOV at all, pure bottom-up attention can guide the robot attention to explore the environment in a flexible way. In this paper, autonomous switching between the top-down and bottom-up attention mechanisms is proposed, which enables autonomy of robots in terms of adapting the visual behavior to different internal robot states and which fills the gap for object searches that are not solvable with the conventional combination of the two. A vision-guided mobile robot, the Autonomous City Explorer (ACE) [2] developed at our institute (see Fig. 1), is used to demonstrate our strategy and evaluate the performance experimentally. It is equipped with an active vision system, which consists of a Bumblebee XB3 stereo camera from Point Grey Research, Inc., and a high-performance pan-tilt platform [3].

This paper is organized as follows: In Section II, related works on the combination of top-down and bottom-up attention selection are introduced. In Section III, the proposed autonomous switching

Manuscript received January 25, 2010; revised May 31, 2010; accepted July 25, 2010. Date of publication August 26, 2010; date of current version September 27, 2010. This paper was recommended for publication by Associate Editor T. Kanda and Editor G. Oriolo upon evaluation of the reviewers' comments. This work was supported in part by the German Research Foundation (DFG) Excellence Initiative Research Cluster Cognition for Technical Systems (CoTeSys) (www.cotesys.org) and in part by the Institute for Advanced Study, Technische Universität München (www.tum-ias.de).

T. Xu and M. Buss are with the Institute of Automatic Control Engineering, Technische Universität München, Munich 80290, Germany (e-mail: [email protected]; [email protected]).

K. Kühnlenz is with the Institute of Automatic Control Engineering, Technische Universität München, Munich 80290, Germany, and also with the Institute for Advanced Study, Technische Universität München, Munich 80333, Germany (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TRO.2010.2062571
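The situational logic described in this introduction (top-down when task information can direct the gaze, bottom-up otherwise) can be illustrated with a minimal selector. The function and its arguments below are a hypothetical sketch, not the paper's actual switching mechanism, which is driven by the three internal robot states:

```python
def select_attention(task_given: bool, target_in_fov: bool,
                     target_pose_known: bool) -> str:
    """Pick an attention mechanism from the situation described above."""
    if task_given and (target_in_fov or target_pose_known):
        # Goal-directed: 3-D task-space position data can direct the gaze
        # even when the target lies outside the field of view.
        return "top-down"
    # No task-relevant information available: explore via saliency.
    return "bottom-up"
```

In this sketch, a known 3-D target position suffices for top-down selection even when the target is outside the FOV, mirroring the argument made above.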

1552-3098/$26.00 © 2010 IEEE
