Visual Localisation of a Robot with an external RGBD Sensor

Winston Yii, Nalika Damayanthi, Tom Drummond, Wai Ho Li


Monash University, Australia
{winston.yii, nalika.dona, tom.drummond, wai.ho.li}@monash.edu
Abstract
This paper presents a novel approach to visual
localisation that uses a camera on the robot cou-
pled wirelessly to an external RGB-D sensor. Un-
like systems where an external sensor observes
the robot, our approach merely assumes the robot's
camera and external sensor share a portion of their
field of view. Experiments were performed us-
ing a Microsoft Kinect as the external sensor and
a small mobile robot. The robot carries a smart-
phone, which acts as its camera, sensor processor,
control platform and wireless link. Computational
effort is distributed between the smartphone and a
host PC connected to the Kinect. Experimental re-
sults show that the approach is accurate and robust
in dynamic environments with substantial object
movement and occlusions.
1 Introduction
Localisation is an important problem in robotics, especially
for mobile robots working within domestic and office envi-
ronments [Cox et al., 1990]. This paper presents a novel ap-
proach to robot localisation that uses a conventional camera
on a robot, coupled wirelessly to an external RGB-D sen-
sor mounted rigidly in the environment. Our approach (il-
lustrated in Figure 1) sits between systems which use an ex-
ternal sensor mounted in the environment to sense the robot
and those which use an on-board system which senses the en-
vironment.
A static Microsoft Kinect acts as the external sensor ob-
serving a scene, which provides both colour and 2.5D depth
information of an indoor environment. Parts of the same
scene are also visible to a mobile robot. The robot carries a
smartphone, which acts as its camera, sensor processor, con-
trol platform and wireless link. By matching features between
the external sensor and the smartphone camera, the robot's
location is estimated in real time in 6 degrees of freedom.
Further details of the system are given in Section 3.
The proposed robot localisation approach offers a number
of advantages:
- By contrast to external localisation systems [Rekleitis et
  al., 2006; Rawlinson et al., 2004], we do not require
  the robot to be in the field of view of the external sensor.
  Our system merely requires that the external sensor and
  the robot's camera share a portion of their fields of view.
  This greatly extends the workspace of mobile robots being
  localised by external sensors by removing out-of-sensor-range
  dead spots.
- By contrast to purely on-board vision-based systems [Se
  et al., 2001; Dellaert et al., 1999], our approach does
  not require the environment to be static. Our approach is
  robust against moving objects in the scene because these
  are dynamically observed by the external sensor and are
  used to generate useful information for localisation even
  while they are moving. We only require sufficient visual
  feature matches between the external sensor view and
  the robot-eye-view.
- In addition, our approach operates without the need for
  artificial landmarks such as those employed in [Betke
  and Gurvits, 1997]. This is possible because we make
  use of observed natural features that are matched in
  real time between the external sensor and robot camera.
  These features are robust to scale and orientation
  changes.
These are important improvements as they facilitate robot
localisation in real world environments, including those with
dynamic human users. However, in order to achieve real time
localisation, we had to overcome two challenges:
- The limited computing available on the robot restricts
  the type of visual processing and robotic control that
  can be performed in real time. In particular, the on-board
  computer is not powerful enough to run the kind
  of model-based visual localisation we use in this work.
- The wireless link between robot and external sensor
  means that the transmission of large quantities of data,
  such as raw video sensor data, is prohibitively slow (and
  also consumes unnecessary power).
Figure 1: Overview of System: Visual Localisation between a Mobile Phone Camera and an external RGB-D sensor

Both challenges are solved simultaneously by distributing
computational processing between the relatively powerful
host PC attached to the external sensor and the smartphone
CPU on the robot. This asymmetric computing arrangement
lies conceptually in between a remote brain configuration [Inaba
et al., 2000] and having all processing occur locally on
the robot. Our robot processes camera video to extract image
features, which are then sent wirelessly to the host PC. This
helps minimise the quantity of data being transmitted over the
wireless link. The robot's pose is then estimated on the host
PC by comparing features extracted from the external sensor
with those from the robot's camera. The resulting pose estimate
can be used by the host PC for robot path planning or
sent back to the robot for local planning.
The rest of the paper is structured as follows:
- Section 2 outlines the hardware and robotic platform
  used for the experiments.
- Section 3 details the localisation system in a number of
  subsections: 3.2 feature extraction, 3.3 feature descriptor
  building, 3.4 feature matching and 3.5 pose estimation.
- Section 4 outlines the experiments conducted to measure
  the accuracy of the system and demonstrate the system's
  capabilities for dynamic environments and when
  the robot lies outside the external sensor's view.
2 Hardware
The setup for this project is a mobile robot operating in an
indoor environment surveilled by a stationary Kinect.
The Microsoft Kinect provided a cheap platform for RGB-
D information. The recent release of the open source OpenNI
drivers [OpenNI, 2010] made it possible to extract factory cal-
ibrated RGB-D data.
The robotic platform we used for the experiment test bed
was an eBug smart mobile robot, developed by Monash Uni-
versity (Figure 2). The hardware details for the eBug are the
subject of another paper submitted to ACRA 2011. For these
experiments the eBug was connected to an Android-based
mobile phone. All the sensor processing (vision) and WiFi
communication are handled by the mobile phone. Communication
between the mobile phone and the eBug is done through
an IOIO board [1]. The IOIO device acts as a communication
bridge to transfer motion commands to the robot, with one
side connected to the Android phone as a USB host (through
Android accessory / ADB protocols) and the other side connected
to the eBug via a serial link. The mobile phone used was an
HTC Desire, an Android 2.2 (Froyo) based smart mobile phone
with a 1 GHz CPU and 512 MB of internal memory. The images
obtained from the phone have a resolution of 800x480.
Figure 2: Hardware platform - HTC Desire Android Phone
on an eBug
[1] http://www.sparkfun.com/products/10585
3 Robot Localisation
3.1 System Overview
An overview of the robot localisation process is illustrated in
Figure 1. The core concept of the localisation presented
is to estimate a pose from a set of appearance-based feature
correspondences between the mobile phone camera and the
Kinect. The process begins by detecting salient features and
building feature descriptors in greyscale images from both the
Kinect and smartphone cameras. The feature locations and
descriptors built on the mobile phone are sent to a Kinect-
CPU server via a Wi-Fi link. The server performs robust fea-
ture matching to provide a set of correspondences between
2D points from the robot camera and 3D points from the
Kinect. The Lie algebra of rigid body motions is used to lin-
earise the displacement of these salient features and to calcu-
late a pose estimate from the correspondences. This localises
the robot in 3D space from the Kinect's perspective, and the
location can be projected onto the ground plane to obtain the
robot's 2D position.
3.2 Corner Detection
FAST features [Rosten and Drummond, 2006] are computed
from the greyscale images in both the Kinect-CPU server
and the robot. FAST was chosen for its computational
efficiency to accommodate the limited computational power of
the robot's (smartphone) processor. To allow for differences
of scale between the two frames, FAST features are detected
across a pyramid, 5-layers in the server (downsampled at fac-
tors of 1, 1.5, 2, 3 and 4) and 2-layers on the robot (down-
sampled at factors of 1 and 2).
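The sketch below illustrates this per-frame detection step: FAST corners found on each layer of a downsampled pyramid and mapped back to full-resolution coordinates. It is a minimal illustration using OpenCV's FAST detector rather than the implementation used in the paper, and the detection threshold is an arbitrary placeholder.

```python
# Sketch: FAST corner detection across a downsampled pyramid (OpenCV).
# The scale factors mirror the server-side pyramid above; the threshold is illustrative.
import cv2

def detect_pyramid_corners(gray, scales=(1.0, 1.5, 2.0, 3.0, 4.0), fast_threshold=20):
    """Return (x, y, scale) corner locations in full-resolution coordinates."""
    fast = cv2.FastFeatureDetector_create(threshold=fast_threshold,
                                          nonmaxSuppression=True)
    corners = []
    for s in scales:
        layer = gray if s == 1.0 else cv2.resize(
            gray, None, fx=1.0 / s, fy=1.0 / s, interpolation=cv2.INTER_AREA)
        for kp in fast.detect(layer, None):
            # Map each detection back to full-resolution pixel coordinates.
            corners.append((kp.pt[0] * s, kp.pt[1] * s, s))
    return corners

# Robot-side detection would use the two-layer pyramid: scales=(1.0, 2.0).
```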
3.3 Feature Descriptor Building
For each FAST corner, a Histogrammed Intensity PatcheS
(HIPS) feature descriptor is calculated as described in [Taylor
and Drummond, 2009]. This descriptor uses 64 evenly spaced
pixels from a 15x15 image patch centred on the FAST corner.
These pixels are normalised for mean and variance and then
quantised into 4 intensity bins. This produces a descriptor
that only requires 4 arrays of 64 bits, where each array corre-
sponds to a quantised intensity bin and each bit position is a
1 if the pixel is in that intensity range. Each corner k(x, y) in
the Kinect image and c(x, y) in the mobile camera image has
a descriptor:
d_{b,i}(x, y) = \begin{cases} 1 & Q_b < Im((x, y) + \mathrm{offset}_i) < Q_{b+1} \\ 0 & \text{otherwise} \end{cases} \qquad (1)

where Q = (-\infty, -0.675, 0, +0.675, \infty)^T (in units of standard deviations
about the patch mean), b is the index of the intensity bin and i is the index
of the sampled pixel.
The quantisation bins are selected to distribute the samples
approximately evenly into each range, assuming a normal
distribution of intensities among the 64 sampled pixels in the
patch, over which the mean intensity and standard deviation
are computed.
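A minimal NumPy sketch of this descriptor is given below. It follows equation (1): 64 samples from the 15x15 patch, normalised by the patch mean and standard deviation and quantised at the bin edges (-inf, -0.675, 0, +0.675, +inf). The 8x8 sampling grid (every second pixel of the patch) is an assumption, since the paper only states that the 64 pixels are evenly spaced.

```python
# Sketch of the HIPS descriptor of equation (1): 64 evenly spaced samples from a
# 15x15 patch, normalised for mean/variance and quantised into 4 intensity bins.
# Assumes the corner lies at least 7 pixels from the image border.
import numpy as np

QUANTILES = np.array([-np.inf, -0.675, 0.0, 0.675, np.inf])  # bin edges, in std devs

def hips_descriptor(gray, x, y):
    """Return a (4, 64) boolean array: entry [b, i] is set if sample i falls in bin b."""
    patch = gray[y - 7:y + 8, x - 7:x + 8].astype(np.float32)  # 15x15 patch
    samples = patch[::2, ::2].reshape(-1)                      # assumed 8x8 sampling grid
    samples = (samples - samples.mean()) / (samples.std() + 1e-6)
    descriptor = np.zeros((4, 64), dtype=bool)
    for b in range(4):
        descriptor[b] = (samples >= QUANTILES[b]) & (samples < QUANTILES[b + 1])
    return descriptor
```

On the phone, the equivalent of this quantisation loop is vectorised with ARM NEON instructions, as described next.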
To optimise the build-time for the descriptors on the mo-
bile phone, we applied ARM Neon SIMD instructions to the
feature patch selection and quantisation process. The 64-bit
quantised bins were computed by data parallelism, treating
each row of the feature patch intensities as a vector of eight
8-bit elements.
The descriptors for each corner in the Kinect frame are
computed for a 3x3 patch of centre locations and aggregated
by being ORed together to create a descriptor that will toler-
ate small deviations in translation, rotation, scale or shear.
K(x, y) = \bigvee_{i, j \in \{-1, 0, 1\}} k(x + i, y + j) \qquad (2)
This ORing process creates a descriptor which is the union
of all nine appearances of the simple descriptor.
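A short sketch of this aggregation step, reusing the hips_descriptor helper from the previous sketch, is shown below; the bitwise OR over the 3x3 neighbourhood implements equation (2).

```python
# Sketch of equation (2): OR together the simple descriptors computed at the 3x3
# grid of centre locations around a Kinect corner (reuses hips_descriptor above).
import numpy as np

def aggregated_descriptor(gray, x, y):
    K = np.zeros((4, 64), dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            K |= hips_descriptor(gray, x + dx, y + dy)
    return K
```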
The mobile camera's feature locations and descriptors are
sent to the Kinect-CPU server via wireless LAN, where they
are matched to the Kinect's current database of features.
Sparse features are sent over the wireless link rather than full
mobile phone camera images to reduce the amount of trans-
mitted data. For each feature, a 32 byte descriptor and 8 bytes
of coordinates are sent. There are typically 800 features in
an image from our smartphone camera, which gives approx-
imately 32KB of data to be sent to the Kinect-CPU server.
Sending data over the Wi-Fi link causes the largest latency in
the system.
3.4 Feature Matching
A computationally efficient comparison between a test de-
scriptor and a descriptor in the database can be performed by
counting bit differences. The error function, E, between a test
descriptor from the mobile camera, c, and database descriptor
from the Kinect image, k, is expressed as:
E = \mathrm{Error}(K, c) = \mathrm{bitcount}(c \wedge \neg K) \qquad (3)
This is the number of bits set in c and not set in K and thus
counts how many pixels in c have an intensity that was never
observed in any of the nine k descriptors that were ORed to-
gether to form K. Whilst the ORing operation reduces the
error function between correct matches where the viewpoint
differs slightly, some descriptors are rather unstable in ap-
pearance and consequently acquire a large number of set bits.
Such descriptors attract too many correspondences at
matching time. In order to prevent this, a
threshold on the range of appearance is used to exclude un-
stable descriptors from the matching stage.
accept K if (bitcount(K) < threshold) (4)
In order to perform rapid matching, a fast tree-based index
of the Kinect HIPS descriptors is formed using the or-tree
method described in [Taylor and Drummond, 2009]. The tree
is built once per frame, allowing the system to deal with dy-
namic scenes. Building the tree takes approximately 20 ms
on our system.
Once the tree index has been built, it can be used to rapidly
find matches for the mobile camera corners. On our system,
this typically takes 10 ms to find matches for 2000 corners.
In order to tune the system for speed, only the match that min-
imises the error function for each mobile camera corner (pro-
vided it is below an error threshold) is retained, rather than
admitting all matches below the threshold. This results in a
set of correspondences between corners in the mobile camera
image and the Kinect image, as shown in Figure 3 below.
Figure 3: All matches between the Kinect RGB image (left)
and the mobile camera RGB image (right) produced by HIPS
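The brute-force sketch below illustrates the matching rules of equations (3) and (4) on descriptors represented as (4, 64) boolean arrays. The real system replaces the exhaustive loop with the or-tree index of [Taylor and Drummond, 2009], and both thresholds here are illustrative placeholders rather than the values used in the paper.

```python
# Brute-force sketch of the matching rules of equations (3) and (4).
import numpy as np

STABILITY_THRESHOLD = 160   # eq. (4): reject Kinect descriptors with too many set bits
MATCH_THRESHOLD = 10        # maximum acceptable error for a correspondence

def match_error(c, K):
    """Equation (3): bits set in the camera descriptor c but never observed in K."""
    return int(np.count_nonzero(c & ~K))

def best_matches(camera_descs, kinect_descs):
    """Return (camera_index, kinect_index) pairs, keeping only the best match per corner."""
    stable = [j for j, K in enumerate(kinect_descs)
              if np.count_nonzero(K) < STABILITY_THRESHOLD]
    matches = []
    for i, c in enumerate(camera_descs):
        scored = [(match_error(c, kinect_descs[j]), j) for j in stable]
        if not scored:
            continue
        err, j = min(scored)
        if err < MATCH_THRESHOLD:
            matches.append((i, j))
    return matches
```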
3.5 Pose Estimation with Lie Algebra
The pose of the mobile camera relative to the coordinate
frame of the Kinect can be expressed as a rigid body trans-
formation in SE(3):
T_{ck} = \begin{pmatrix} R & t \\ 0\;0\;0 & 1 \end{pmatrix}, \quad \text{where } R \in SO(3) \text{ and } t \in \mathbb{R}^3 \qquad (5)
The subscript ck indicates the camera coordinate frame rel-
ative to the Kinect frame.
A point at a known position in the Kinect frame, p_k = (x, y, z, 1)^T,
can be expressed in the camera frame by applying
the rigid transformation to obtain p_c = T_{ck} p_k.
p_k is projectively equivalent to (x/z, y/z, 1, 1/z)^T = (u, v, 1, q)^T,
and it is more convenient to solve for pose using this formulation
as (u, v) are a linear function of pixel position. Our
cameras exhibit almost no radial distortion and we assume a
linear camera model for both cameras, and the value reported
in the depth image from the Kinect is linearly related to q_k.
This gives:
\begin{pmatrix} u_c \\ v_c \\ 1 \\ q_c \end{pmatrix} \propto T_{ck} \begin{pmatrix} u_k \\ v_k \\ 1 \\ q_k \end{pmatrix} = \begin{pmatrix} R \begin{pmatrix} u_k \\ v_k \\ 1 \end{pmatrix} + q_k t \\ q_k \end{pmatrix} \qquad (6)
q_c is unobserved, so only two constraints are provided per
correspondence and thus three correspondences are needed to
compute T_{ck}. When this has been computed, the same equation
can be used to calculate q_c, thus making it possible to
transform the Kinect depth map into the mobile camera frame
should that be required (e.g. to build an obstacle map from the
robot's perspective).
Pose is solved iteratively from an initial assumption T^0_{ck} = I,
i.e. that the mobile and Kinect coordinate frames are coincident.
At each iteration, the error in T_{ck} is parameterised using
the exponential map and T_{ck} is updated by:

T^{j+1}_{ck} = \exp\left( \sum_i \alpha_i G_i \right) T^j_{ck} \qquad (7)

where G_i are the generators of SE(3). The six coefficients \alpha_i
are then estimated by considering the projection of the Kinect
coordinates into the mobile camera frame.
\begin{pmatrix} u_c \\ v_c \\ 1 \\ q_c \end{pmatrix} \propto \exp\left( \sum_i \alpha_i G_i \right) T^j_{ck} \begin{pmatrix} u_k \\ v_k \\ 1 \\ q_k \end{pmatrix}, \qquad (8)
Writing

\begin{pmatrix} u'_k \\ v'_k \\ 1 \\ q'_k \end{pmatrix} \propto T^j_{ck} \begin{pmatrix} u_k \\ v_k \\ 1 \\ q_k \end{pmatrix}, \qquad (9)
then gives the Jacobian J of partial derivatives of (u_c, v_c)^T
with respect to the six \alpha_i as:
J = \begin{pmatrix} q'_k & 0 & -u'_k q'_k & -u'_k v'_k & 1 + u'^2_k & -v'_k \\ 0 & q'_k & -v'_k q'_k & -(1 + v'^2_k) & u'_k v'_k & u'_k \end{pmatrix}, \qquad (10)
where the first three columns of J correspond to derivatives
with respect to translations in x, y and z (and thus depend on
q) and the last three columns correspond to rotations about
the x, y and z axes respectively. This simple form of J is
a convenient consequence of working in (u, v, q) coordinates
rather than (x, y, z).
From three correspondences between (u_k, v_k, q_k) and
(u_c, v_c) it is possible to stack the three Jacobians to obtain
a 6x6 matrix which can be inverted to find the linearised solution
for the \alpha_i from the error between the observed coordinates
in the mobile camera frame and the transformed Kinect
coordinates: (u_c, v_c) - (u'_k, v'_k). With more than three correspondences,
it is possible to compute a least squares solution
to obtain the \alpha_i.
When the \alpha_i have been computed, iteration proceeds by setting
T^{j+1}_{ck} = \exp(\sum_i \alpha_i G_i) T^j_{ck} and recomputing the \alpha_i. In practice,
we found that 10 iterations is enough for this to converge to
acceptable accuracy.
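A compact sketch of this Gauss-Newton loop is shown below. It follows equations (7)-(10): the Kinect points (u_k, v_k, q_k) are reprojected through the current estimate of T_ck, the stacked 2x6 Jacobians are solved in a least-squares sense for the six coefficients, and the pose is updated through the SE(3) exponential map (here via SciPy's matrix exponential). In the full system this refinement runs inside RANSAC on sampled triplets and then over all inliers; the sketch simply refines over whatever correspondences it is given.

```python
# Sketch of the iterative pose refinement of equations (7)-(10): Gauss-Newton in
# (u, v, q) coordinates with an SE(3) exponential-map update.
import numpy as np
from scipy.linalg import expm

# Generators of SE(3): translations along x, y, z, then rotations about x, y, z.
G = np.zeros((6, 4, 4))
G[0, 0, 3] = G[1, 1, 3] = G[2, 2, 3] = 1.0      # translations
G[3, 1, 2], G[3, 2, 1] = -1.0, 1.0              # rotation about x
G[4, 0, 2], G[4, 2, 0] = 1.0, -1.0              # rotation about y
G[5, 0, 1], G[5, 1, 0] = -1.0, 1.0              # rotation about z

def refine_pose(uvq_kinect, uv_camera, iterations=10):
    """uvq_kinect: (N, 3) rows (u_k, v_k, q_k); uv_camera: (N, 2) rows (u_c, v_c)."""
    n = len(uvq_kinect)
    T = np.eye(4)                                # T^0_ck = I: frames assumed coincident
    h = np.column_stack([uvq_kinect[:, :2], np.ones(n), uvq_kinect[:, 2]])
    for _ in range(iterations):
        w = h @ T.T                              # apply the current T_ck (eq. 9)
        u, v, q = w[:, 0] / w[:, 2], w[:, 1] / w[:, 2], w[:, 3] / w[:, 2]
        # Stack the 2x6 Jacobians of eq. (10), evaluated at (u'_k, v'_k, q'_k).
        J = np.empty((2 * n, 6))
        J[0::2] = np.column_stack([q, np.zeros(n), -u * q, -u * v, 1 + u**2, -v])
        J[1::2] = np.column_stack([np.zeros(n), q, -v * q, -(1 + v**2), u * v, u])
        r = (uv_camera - np.column_stack([u, v])).reshape(-1)   # reprojection error
        alpha, *_ = np.linalg.lstsq(J, r, rcond=None)           # six coefficients
        T = expm(np.tensordot(alpha, G, axes=1)) @ T            # eq. (7) update
    return T
```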
Because the correspondences obtained using HIPS contain
many outliers, RANSAC was used (sampling on triplets of
correspondences) to obtain the set of inliers. The final pose
was then calculated through an optimisation over all inliers.
This process typically takes only about 0.5 ms. Figure 4
shows all the inlier correspondences between the Kinect and
mobile camera images computed in this way.
Figure 4: The inlier matches between the Kinect RGB im-
age and the mobile camera RGB image as determined by
RANSAC
3.6 Robot localisation on a 2D planar map
Since the robot is restricted to movement on the ground sur-
face, we define the ground plane using the Kinect's depth data
to give the robot a 2D map of its operating environment. This
is done through a calibration process where a user manually
selects three points on the ground plane in the Kinect image.
The depth values at these three points are then used to gen-
erate a ground plane of the form ax + by + cz = d, where the
constants a, b and c are normalised so that the plane normal
is a unit vector.
Points in the Kinect depth image are classified into three
categories:
- Points within a noise threshold of the ground plane are
  considered to be part of the ground plane.
- Points above the ground but below the height of the robot
  are marked as obstacles.
- Other points are disregarded.
Projection of a point [x, y, z]^T to the ground plane with
plane normal n = [a, b, c]^T is computed by:

\text{Projected Point} = \left( I_{3 \times 3} - n n^T \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} + d\,n, \qquad (11)
The projected ground plane points (x', y', z')^T are transformed
onto an xy plane to give an overhead 2D map. The
transformation required is a rotation to align the plane normal,
n, and two other perpendicular vectors lying on the
plane, x_plane and y_plane, with the Kinect's camera coordinate
frame. This is followed by a translation in the plane normal
direction by an arbitrary amount \lambda. The transformation can
be expressed as:

\begin{pmatrix} x_{map} \\ y_{map} \\ z_{map} \end{pmatrix} = \begin{pmatrix} x_{plane}^T \\ y_{plane}^T \\ n^T \end{pmatrix} \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} - \lambda n \qquad (12)

The z_{map} coordinate is constant over all points in the xy
plane and is discarded.
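The following sketch illustrates equations (11) and (12): orthogonal projection of a 3D point onto the calibrated ground plane ax + by + cz = d and conversion into 2D map coordinates. The construction of the in-plane axes x_plane and y_plane is an assumption, since the paper does not state how they are chosen, and the arbitrary translation along the normal is omitted because it only shifts the discarded z_map coordinate.

```python
# Sketch of the ground-plane handling of equations (11)-(12).
import numpy as np

def project_to_plane(p, n, d):
    """Orthogonal projection of a 3D point p onto the plane n.p = d (eq. 11)."""
    n = n / np.linalg.norm(n)
    return p - (np.dot(n, p) - d) * n

def plane_axes(n):
    """Two perpendicular unit vectors lying on the plane with unit normal n (assumed construction)."""
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x_plane = np.cross(n, a)
    x_plane /= np.linalg.norm(x_plane)
    y_plane = np.cross(n, x_plane)
    return x_plane, y_plane

def to_map_coordinates(p_projected, n, x_plane, y_plane):
    """Rotate a projected point into the overhead map frame (eq. 12); z_map is discarded."""
    R = np.vstack([x_plane, y_plane, n])
    x_map, y_map, _ = R @ p_projected
    return np.array([x_map, y_map])
```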
To obtain the final map localisation, the pose estimate
is projected from the Kinect's frame onto the 2D map. The
translation component of the pose estimate is projected in
a similar manner to a 3D point as described above. The orientation
requires multiplying the rotation component with a
unit vector along the Kinect's optical axis before projection.
The result is the localisation of the robot's position as well as its
orientation. Figure 5 displays the position and orientation of the
robot in a map filled with obstacles.
Figure 5: Dynamic map generated using the Kinect. Left:
the Kinect's point-of-view showing obstacles in red. Right:
Kinect data projected as a 2D planar map.
4 Experiments
4.1 Localisation Accuracy for a Static Scene
Figure 6: Experimental setup to measure the accuracy of the
localisation system. The masking tape on the floor is used for
ground truth measurements.
To measure the precision and repeatability of the localisa-
tion system, the robot was localised at a number of marked
positions and orientations as shown in Figure 6. This is done
by projecting the 6 DoF robot pose onto the 2D planar map.
The distance between the marked positions was measured
to 2mm accuracy with a measuring tape to provide ground
truth data. 10 data points were measured for each location to
provide a mean and standard deviation for each position. The
obtained results are shown in Figure 7. Each cluster of points
represents a set of measurements for a single location.
Figure 7: Measured Position Localisation
This data was analysed in a number of ways:
- Because the full 3D 6 DoF pose is calculated, the constraint
  that the robot sits on the ground plane is not respected.
  In order to provide a measure of the global
  accuracy of the system, a plane was fitted to all the
  computed mobile camera positions. The residual sum-squared
  error between all computed positions and this
  plane was then calculated and divided by (n - 3), where
  n is the number of measured positions, to obtain an unbiased
  estimate of the variance of position error in the
  direction of the plane normal. This analysis gave a standard
  deviation in this direction of 0.012 m (1.2 cm).
- A second analysis was performed on each line of data
  using a similar method. A line was fitted to each of the
  three subsets of the data, and the residual squared distance
  to the line was computed and divided by (n - 2) to give
  unbiased estimates of the standard deviation of 0.010 m
  for the right line, 0.016 m for the centre line and 0.021 m
  for the left line. The reason for this variation in accuracy
  from left to right is unknown, but is likely to be due
  to the specific geometry of landmark data used in the
  experiments. The standard deviation of the orientation
  was also computed to be 0.03 radians (all points on the
  same line were assumed to have the same ground truth
  orientation).
The accuracy of the localisation is largely dependent on
the number of correct feature matches between the Kinect and
the robot's camera. With a cluttered scene where there are numerous
texture features, the system demonstrates results consistent
to within 16 mm of the ground truth. As the proportion, p,
of correct matches drops, the probability of selecting a correct
hypothesis for a given number of RANSAC iterations, n,
decreases. For the three point pose estimation, we require
three correct correspondences. The condition for at most a 0.01
probability of testing n randomly sampled hypotheses without
selecting a correct inlier set is:

(1 - p^3)^n < 0.01 \qquad (13)
Our system samples n = 200 hypotheses, requiring an inlier
rate of 0.283 for a 0.99 chance of selecting the correct
pose. This places a limit on the field of view overlap required
between the Kinect and the robot's camera for correct localisation.
A smaller view overlap, and the likely increase in false
positive correspondences, may diminish the chance of obtaining
the correct pose.
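A quick check of equation (13) reproduces the figures quoted above.

```python
# Check of equation (13): with n = 200 sampled triplets, the inlier rate p needed
# for a 0.99 chance of drawing at least one all-inlier triplet.
n = 200
p_min = (1.0 - 0.01 ** (1.0 / n)) ** (1.0 / 3.0)
print(round(p_min, 3))        # ~0.283, as quoted above

# Conversely, the failure probability at an inlier rate of p = 0.3:
print((1.0 - 0.3 ** 3) ** n)  # ~0.004, below the 0.01 target
```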
Another source of mismatches arises from large viewpoint
differences between the Kinect and the robot. Despite our HIPS
features having slight affine invariance, sufficient viewpoint
deviations will cause structural corners to change appearance
significantly due to their changing background.
4.2 Localisation in Dynamic Scenes
It is difficult to perform visual localisation in dynamic scenes
as visual features may move unexpectedly or become oc-
cluded on a regular basis. Our system operates robustly in
dynamic scenes as moving objects still contribute visual fea-
tures to pose estimation. Occlusions have little effect because
they only matter when features visible in one view are oc-
cluded in another; typically a small portion of features.
The video submission accompanying this paper [2] shows the
robot localising repeatedly in a dynamic environment. Still
frames of the robotic experiment taken from a handycam and
screen captures of the 2D planar map showing the robot's
pose are available at the end of this paper.

[2] http://www.youtube.com/watch?v=N5PXHWk8nCE
5 Conclusion and Future Work
We have presented a robot and external RGB-D sensor sys-
tem that performs a 6 DoF pose estimation in real time. The
system achieves up to 10 fps despite the processing limitation
of the smartphone on the robot and the bandwidth limitation
of the Wi-Fi based communication link. Localisation is also
robust against dynamic environments. Future work includes
extending the system to include input from multiple Kinects
so that universal coverage can be provided, and the use of
multiple robots, including Kinect-carrying robots that can act
as the external sensor. We also plan to explore the compu-
tational asymmetry further by investigation of different divi-
sions of labour between robots and external sensors.
References
[Betke and Gurvits, 1997] M. Betke and L. Gurvits. Mobile robot localization using landmarks. IEEE Transactions on Robotics and Automation, 13(2):251-263, 1997.
[Cox et al., 1990] I.J. Cox, G.T. Wilfong, and T.L. Pérez. Autonomous Robot Vehicles, volume 447. Springer-Verlag, 1990.
[Dellaert et al., 1999] F. Dellaert, W. Burgard, D. Fox, and S. Thrun. Using the condensation algorithm for robust, vision-based mobile robot localization. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, 1999.
[Inaba et al., 2000] Masayuki Inaba, Satoshi Kagami, Fumio Kanehiro, Yukiko Hoshino, and Hirochika Inoue. A platform for robotics research based on the remote-brained robot approach. The International Journal of Robotics Research, 19(10):933-954, 2000.
[OpenNI, 2010] OpenNI. Kinect drivers. http://www.openni.org/, 2010.
[Rawlinson et al., 2004] D. Rawlinson, P. Chakravarty, and R. Jarvis. Distributed visual servoing of a mobile robot for surveillance applications. In Australasian Conference on Robotics and Automation (ACRA), 2004.
[Rekleitis et al., 2006] Ioannis Rekleitis, David Meger, and Gregory Dudek. Simultaneous planning, localization, and mapping in a camera sensor network. Robotics and Autonomous Systems, 54(11):921-932, 2006. Planning Under Uncertainty in Robotics.
[Rosten and Drummond, 2006] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In Computer Vision - ECCV 2006, pages 430-443, 2006.
[Se et al., 2001] S. Se, D. Lowe, and J. Little. Vision-based mobile robot localization and mapping using scale-invariant features. In Proceedings of the 2001 IEEE International Conference on Robotics and Automation (ICRA), volume 2, pages 2051-2058. IEEE, 2001.
[Taylor and Drummond, 2009] S. Taylor and T. Drummond. Multiple target localisation at over 100 fps. In British Machine Vision Conference, volume 4, 2009.
Figure 8: Initial movement of robot away from Kinect
Figure 9: No visual localisation as the robot and external sensor do not share a field of view
Figure 10: Example of localisation in dynamic scene with moving objects
Figure 11: Example of localisation in dynamic scene with occlusions
Figure 12: Example of localisation under significant scale and viewpoint change