Politecnico Di Torino Repository ISTITUZIONALE: Loop Detection in Robotic Navigation Using MPEG CDVS


POLITECNICO DI TORINO

Repository ISTITUZIONALE

Loop Detection in Robotic Navigation Using MPEG CDVS

Original
Loop Detection in Robotic Navigation Using MPEG CDVS / de Gusmão, P. P. B.; Rosa, S.; Magli, E.; Lepsøy, S.;
Francini, G. - Electronic. - (2015), pp. 1-6. (Paper presented at the 2015 IEEE International Workshop
on Multimedia Signal Processing, held in 2015.)

Availability:
This version is available at: 11583/2638899 since: 2016-04-01T17:24:11Z

Publisher:
IEEE

Published
DOI:10.1109/MMSP.2015.7340871

Terms of use:
openAccess
This article is made available under terms and conditions as specified in the corresponding bibliographic description in
the repository

Publisher copyright


19 September 2019
Robotics Navigation Using MPEG CDVS
Pedro P. B. de Gusmão #1, Stefano Rosa #2, Enrico Magli #3, Skjalg Lepsøy ∗4, Gianluca Francini ∗5
# DET and DAUIN, Politecnico di Torino
∗ Video and Image Analysis Lab - Telecom Italia

MMSP’15, Oct. 19 - Oct. 21, 2015, Xiamen, China.

Abstract: The choice of image descriptor in a visual navigation system is not straightforward. Descriptors must be distinctive enough to allow for correct localization while still offering low matching complexity and short descriptor size for real-time applications. MPEG Compact Descriptor for Visual Search is a low-complexity image descriptor that offers several levels of compromise between descriptor distinctiveness and size. In this work we describe how these trade-offs can be used for efficient loop detection in a typical indoor environment.

I. INTRODUCTION

Robots that navigate through unknown environments, such as autonomous vacuum cleaners, all face a common challenge: to create a representation of their environment while simultaneously trying to locate themselves. This problem is known in the literature as Simultaneous Localization and Mapping (SLAM), and its formulation has been thoroughly reviewed in [1] and [2]. Approaches to SLAM usually involve two alternating phases. During Motion Prediction the robot uses internal parameters to estimate local displacements, while during Measurement Update the robot interacts with the environment to improve both its pose estimate and its estimate of the environment's map.

In visual-based SLAM, motion prediction can be obtained by extracting and matching visual features from a sequence of images in a process called feature-based visual odometry [3]. These same features can also be used as landmarks for loop detection during the measurement update phase in a complete Visual SLAM (VSLAM) system [4].

The purpose of loop detection is to identify landmarks on the map that have already been seen by the robot during earlier stages of navigation. During this process, if a particular landmark is allegedly found in two different places, it means that the estimated trajectory described by the robot has probably drifted at some point. The relative displacement between these two appearances can then be used by a global optimizer to improve the estimates of both the landmarks’ positions and the robot’s pose.

Early approaches to loop detection using visual features include the work in [5], where the authors used SIFT [6] for its distinctive power and thus its capability of correctly finding a loop. SIFT’s distinctiveness, however, comes at a high price in terms of computational complexity, leading to substantial battery consumption. Moreover, the number of SIFT features generated by a single image also makes it prohibitively expensive in terms of bandwidth requirements where remote processing is needed, such as in collaborative mapping.

In [7] the authors used an intermediate level of representation known as bags of visual words [8] to speed up loop detection. This technique was originally developed to compare similarity between documents and is still considered the state of the art today.

Finally, as the robot navigates through its environment, the number of observed landmarks increases and so does the number of descriptors it stores for loop detection. This means that loop-detection algorithms are bound to become expensive in terms of both memory and computational complexity [2] as the map grows. This forces system designers either to choose less complex descriptors, risking wrong data association, or to overestimate memory demands during hardware design.

The problem of finding a perfect balance between descriptor distinctiveness and descriptor size is not exclusive to the VSLAM domain. When dealing with large databases, Content-Based Image Retrieval (CBIR) systems face this very same issue. Very recently, the Moving Picture Experts Group (MPEG) has defined a new industry standard for CBIR known as Compact Descriptors for Visual Search (MPEG CDVS). The standard specifies various modes of compression that offer trade-offs between distinctiveness and size, and also provides suggested metrics for image comparison that quantify how similar two images are.

In this work we claim that the characteristics that make MPEG CDVS a good descriptor for CBIR also make it ideal for robotic navigation. More specifically, we state that MPEG CDVS can be used as a fast, reliable and storage-efficient loop detector in a typical indoor VSLAM application.

Our first contribution comes in Section III, where we describe a probabilistic approach to loop detection using the standard’s suggested similarity metric. In Section IV we then compare the performance of the CDVS compression modes, in terms of matching speed, feature extraction time and storage requirements, with the well-known SIFT descriptor for five different types of indoor floors, and show that CDVS has superior performance in all cases. Finally, in Section V we apply our proposed method to a real robotic application and show that our VSLAM approach gives better results than state-of-the-art laser-based SLAM.

II. THE MPEG COMPACT DESCRIPTOR FOR VISUAL SEARCH

The Compact Descriptor for Visual Search (CDVS) is the new standard for Content Based Image Retrieval developed by
the Moving Picture Experts Group [10]. It defines an image description tool designed for efficient and interoperable visual search applications.

A. Descriptor Generation

A CDVS descriptor is made of two parts: one global descriptor associated with the entire image and a set of local descriptors associated with specific points in the image known as interest points. The entire descriptor extraction process can be summarized as follows:

1) Interest Point Detection;
2) Feature Selection and Descriptor Extraction, where, based on the level of compression used, only a limited number of interest points will account for the final descriptor;
3) Local Descriptor Extraction;
4) Local Descriptor Compression;
5) Coordinate Coding;
6) Global Descriptor Generation: an aggregation of local descriptors that generates a fixed, small-size description of the entire image.

The final result of CDVS extraction is a compressed file whose size is upper-bounded by 512 B, 1 kB, 2 kB, 4 kB, 8 kB, and 16 kB for extraction modes 1 to 6, respectively.

B. Descriptor Matching and Score

When comparing two images, MPEG CDVS suggests the use of two different types of matching scores: a global score and a local score.

The global score is given as a weighted correlation between the two images' global descriptors.

The local score between two images results from the sum of the local scores of each descriptor in those images, i.e. a one-to-one comparison is made between local descriptors from the two images. Finally, the standard also suggests the use of a geometric consistency analysis, known as Distrat [11], to eliminate false matches between descriptors based on their geometric disposition.

In order to detect a loop as defined in Section III-A, we consider only those features that have also passed the geometric consistency test. Moreover, we use the values given by the local score as our means to indirectly measure the probability of loop detection, since it gives more reliable results.

III. PROPOSED MOTION MODEL AND LOOP DETECTION

A robot carrying a calibrated camera navigates through an indoor environment while taking a picture $I_k$ of the floor below at each time step $k$. The robot’s starting position and heading define both the origin and the x-axis of a global coordinate frame. This coordinate system becomes uniquely defined once we choose the z-axis to point upwards.

We assume the environment’s floor to be a planar surface so that, for each time step $k > 0$, the robot’s pose is given by $\mathbf{x}_k = [x_k, y_k, \theta_k]^T$, where $x_k$ and $y_k$ indicate the robot’s coordinates and $\theta_k$ is the robot’s heading.

The motion between time steps $k-1$ and $k$ can be modeled as a rotation followed by a translation, so that at $t = k$ the pose can be recursively obtained as

$$[x_k, y_k]^T = R(\Delta\theta_{k-1,k})\,[x_{k-1}, y_{k-1}]^T + T_{k-1,k} \tag{1}$$

$$\theta_k = \theta_{k-1} + \Delta\theta_{k-1,k} \tag{2}$$

where $\Delta\theta_{k-1,k}$ is the rotation angle estimated between time steps $k-1$ and $k$, $R(\Delta\theta_{k-1,k})$ is the rotation matrix for that same angle, and $T_{k-1,k}$ is the translation vector.

A. Loop Definition

The use of a downward-facing camera allows for a natural definition of loop based on the intersection of imaged regions. For images $I_a$ and $I_b$ taken along the robot’s path, we define a loop as a function of the overlap ratio between the floor areas observed by these two images. So, given the area of intersection $\mathrm{area}(I_a \cap I_b)$ and the respective area of union $\mathrm{area}(I_a \cup I_b)$, a loop can be defined as

$$J = \frac{\mathrm{area}(I_a \cap I_b)}{\mathrm{area}(I_a \cup I_b)} \tag{3}$$

$$\mathrm{loop}(I_a, I_b, r) = \begin{cases} 1 & \text{if } J \ge r \\ 0 & \text{if } J < r \end{cases} \tag{4}$$

where $r$ is the threshold that defines the minimum overlap ratio for which two intersecting images can be considered a loop. In this work we set this threshold to $r = 0.33$, which roughly corresponds to an intersection of 50% of each image's area when $I_a$ and $I_b$ have the same area: if both images cover an area $A$ and share $0.5A$, the union is $1.5A$ and $J = 1/3$.

B. Loop Probability

Loop detection as defined in (4) requires knowledge of how much intersection there is between the two images. In order to indirectly measure the probability of having a particular area ratio, we use the local score between two images, so that

$$P(\mathrm{loop} = 1 \mid \mathrm{score} = s) = P(J \ge r \mid \mathrm{score} = s) \tag{5}$$

$$P(J \ge r \mid \mathrm{score} = s) = \frac{P(J \ge r,\ \mathrm{score} = s)}{P(\mathrm{score} = s)} \tag{6}$$

The conditional probability in (5) can be experimentally estimated through (6) by combining knowledge of the camera’s parameters with a source of relative motion estimation. This process is described in depth in the next section.

IV. TRAINING OF PROPOSED MODEL

Besides being distinctive, a descriptor also needs to be economical in terms of storage, extraction time and matching time in order to be considered a feasible option for loop detection.

In this section we analyze the distinctiveness of all CDVS compression modes for the five different types of flooring seen in Fig. 1. We also compare their memory and processing time requirements with a popular implementation of the SIFT descriptor found in [12].
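As an illustration of the motion model and loop definition of Section III, the following minimal Python sketch applies the pose update of (1)-(2) and evaluates the overlap ratio of (3) and the loop indicator of (4). It is only a sketch under a strong simplifying assumption: the imaged floor regions are modeled as axis-aligned rectangles `(xmin, ymin, xmax, ymax)`, whereas the footprints of a rotating robot are in general rotated with respect to each other. The function names `update_pose`, `overlap_ratio` and `is_loop` are hypothetical and do not come from the paper or from any CDVS software.

```python
import math


def update_pose(pose, dtheta, translation):
    """Pose update of Eqs. (1)-(2): rotate the previous position, then translate."""
    x, y, theta = pose
    tx, ty = translation
    c, s = math.cos(dtheta), math.sin(dtheta)
    x_new = c * x - s * y + tx
    y_new = s * x + c * y + ty
    return (x_new, y_new, theta + dtheta)


def overlap_ratio(rect_a, rect_b):
    """Overlap ratio J of Eq. (3) for two axis-aligned rectangles.

    Assumption: each footprint is given as (xmin, ymin, xmax, ymax).
    """
    ax0, ay0, ax1, ay1 = rect_a
    bx0, by0, bx1, by1 = rect_b
    width = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    height = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = width * height
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def is_loop(rect_a, rect_b, r=0.33):
    """Loop indicator of Eq. (4): 1 if J >= r, 0 otherwise."""
    return 1 if overlap_ratio(rect_a, rect_b) >= r else 0


if __name__ == "__main__":
    # Two unit squares sharing half of their area: J = 0.5 / 1.5 = 1/3.
    a = (0.0, 0.0, 1.0, 1.0)
    b = (0.5, 0.0, 1.5, 1.0)
    print(overlap_ratio(a, b), is_loop(a, b))
```

The example at the bottom reproduces the arithmetic behind the choice $r = 0.33$: two footprints of equal area that share 50% of their area give $J = 1/3$, which is just above the threshold.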
Fig. 1: Different types of floorings commonly found in indoor environments: (a) Mosaic, (b) Marble, (c) Red Tiles, (d) Printed Wood, (e) Dotted Tiles.

A. Distinctiveness of CDVS local score

Our analysis starts by driving the robot forward for 10 meters, with a PointGrey Grasshopper 3 camera rigidly mounted on a Turtlebot 2, in the setup defined in Section III.

For each floor type we extract CDVS descriptors from the sequence of images and match each image with all the previous ones, using the CDVS local score to measure similarity. We repeat this process for all modes of compression to evaluate its effect on feature distinctiveness.

Distinctiveness in this context means having a high local score for images with overlapping regions and very low scores otherwise. Since images were taken in sequence during robotic motion, images that are close in the sequence are also spatially next to each other, and thus should have a high local score.

A visual representation of these matches using compression mode 6 is given in Fig. 2, where the pixel intensity in position $(i, j)$ represents the local score between the current image $i$ and a previously visited image $j$. Since we only match current images with previous ones, each matrix representing the matches is triangular.

To allow for a fair visual comparison, the matrices' values have been normalized. Yellow pixels mean a high local score while dark blue pixels indicate a low score. The presence of small, bright triangles at the lower end of each matrix indicates when the robot stopped.

Fig. 2: Visual representation of Local Score for different floor types: (a) Mosaic, (b) Marble, (c) Red Tiles, (d) Printed Wood, (e) Dotted Tiles.

Ideally, these matching matrices should display increasing intensity of pixel values (yellow) in regions near each diagonal and very low values (dark blue) everywhere else. The natural randomness intrinsically associated with the production of most of the floor types enables them to have relatively thick principal diagonals and to display very low matching scores where no overlap occurs.

The one noticeable exception occurs for the printed wood floor. This particular artificial type of flooring is made of a printed repetitive pattern. The effect of such patterns appears as bright spots on its matching matrix and can be particularly harmful for loop detection, since it leads to erroneously detected loops. We can observe the evolution of these spots and of the diagonal thickness in Fig. 3 as we vary the compression mode.

Fig. 3: Visual representation of Local Score for the Printed Wood floor using different compression modes: (a) Mode=1, (b) Mode=2, (c) Mode=3, (d) Mode=4, (e) Mode=5, (f) Mode=6.

It is clear that the diagonal thickness decreases as the compression level increases, i.e. for lower modes of compression. This phenomenon happens for all flooring types and is due to the fact that CDVS uses fewer keypoints with shorter local descriptors to represent each image. This makes it difficult to correctly match images that are even just slightly displaced with respect to one another. Therefore, as expected, lower modes of compression can be considered to offer less distinctive local descriptors.

On the other hand, and for the same reason, bright spots on the wooden pattern become even more visible as the level of compression increases, which makes this particular kind of flooring the worst-case scenario and also our case study to test CDVS for loop detection.

B. Effects of Feature Selection

Besides being able to correctly distinguish between different floor patches, CDVS must also be economical in terms of storage, extraction time and matching time if it is to be considered as an alternative to well-established descriptors such as SIFT [6]. Here we examine these characteristics by analyzing the same five types of flooring.

As seen in Fig. 4, feature selection has the effect of reducing the number of local features generated for each image. Since the final binary size of a CDVS descriptor is
limited by its compression mode, the maximum number of local descriptors produced by each mode is upper-bounded and does not significantly depend on the particular type of flooring.

Fig. 4: Average number of extracted local descriptors per image for each type of flooring.

In terms of memory efficiency, feature selection has a clear effect on reducing storage requirements. For example, an image taken from a Mosaic floor would normally require over 300 kB of memory if the SIFT descriptor were used, considering implementations such as [12], while CDVS would require at most 16 kB in its least compressed mode.

Another positive effect of feature selection is the reduction of extraction time, as reported in Table I. Since feature selection is based on the keypoints’ characteristics, only features from selected keypoints are processed. Moreover, having a limited number of descriptors per image also limits the time spent comparing two images, as reported in Table II. Finally, we observe that both extraction and matching times are at least an order of magnitude lower than for SIFT and that these values show little variation within a given compression mode.

Having upper-bounded memory requirements, together with extraction and matching times that are relatively invariant to the type of flooring, is an essential quality for systems that may work in different environments. For example, the system requirements of an automatic vacuum cleaner should not depend on the consumer’s specific type of floor.

Floor type     mode 1  mode 2  mode 3  mode 4  mode 5  mode 6  SIFT
Dotted Tiles   16.2    15.4    15.5    16.2    18.9    21.0    217
Marble         15.6    15.3    15.3    16.3    18.9    21.4    295
Mosaic         15.9    15.8    16.0    18.9    22.4    22.3    388
Red Tiles      14.6    14.8    14.7    15.5    18.1    21.0    209
Printed Wood   15.2    15.2    15.3    16.0    18.8    21.0    270

TABLE I: Average extraction times per image in milliseconds for each CDVS mode of compression and SIFT.

Floor type     mode 1  mode 2  mode 3  mode 4  mode 5  mode 6  SIFT
Dotted Tiles   0.26    0.93    1.15    1.97    4.51    7.87    91
Marble         0.18    0.55    0.85    1.29    3.31    6.62    242
Mosaic         0.18    0.54    0.84    1.26    3.31    6.65    490
Red Tiles      0.21    0.53    1.01    1.62    3.84    7.39    84
Printed Wood   0.19    0.67    0.91    1.35    .51     7.05    182

TABLE II: Average matching times per image in milliseconds for each CDVS mode of compression and SIFT.

C. Estimating Loop Probability

A camera’s intrinsic and extrinsic parameters define the camera’s pose with respect to the world and also allow us to make real-world measurements directly from images. These parameters can also be used to identify the observed regions by projecting the camera’s field of view onto the imaged floor. Once the projected areas of images $I_a$ and $I_b$ are known, it is sufficient to know their relative position to estimate the area of intersection and thus to evaluate the overlap ratio $J$.

Relative motion during training was obtained using the robot’s odometry. Although odometry suffers from error accumulation over long trajectories, it does provide dependable relative motion estimates over short distances. Moreover, images that are relatively distant from each other will have zero overlapping region, and therefore error accumulation will not constitute a problem. During the training phase, relative motion was obtained by using a Kalman filter that combined information from both wheel odometry and the robot’s internal gyroscope during the experiment described at the beginning of this section.

By combining these pieces of information with the local scores of each analyzed matching pair, we can generate for each compression mode a loop detection probability curve as defined in (6). The resulting curves, seen in Fig. 5, show for each mode the probability of two images having more than 50% of intersection given a local score.

Lower compression modes achieve certainty at lower values of local score. This is due to the fact that low compression modes also have fewer descriptors to be used during matching.

Fig. 5: Conditional loop probability for printed wood floor.

From these curves we select the minimum values of local score $s$ that guarantee loop detection for each compression mode. These hypothesized values are reported in Table III and are used to define the loops during the final experiments discussed in Section V.
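The estimation of the loop probability curve and the selection of the threshold described above can be sketched in Python as follows. This is a hedged illustration, not the implementation used in the experiments: it assumes that the training data are available as two parallel lists, one of local scores (taken here to be discretized, e.g. rounded to integers) and one of overlap ratios $J$ obtained from odometry, and it estimates (6) as a ratio of counts. The function names `loop_probability_curve` and `min_guaranteed_score` are hypothetical.

```python
from collections import defaultdict


def loop_probability_curve(scores, ratios, r=0.33):
    """Empirical estimate of P(J >= r | score = s), following Eq. (6).

    For each observed score s, the joint count of (J >= r, score = s)
    is divided by the count of (score = s).
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for s, j in zip(scores, ratios):
        total[s] += 1
        if j >= r:
            hits[s] += 1
    return {s: hits[s] / total[s] for s in sorted(total)}


def min_guaranteed_score(curve):
    """Smallest score s such that the estimated probability equals 1
    for every observed score greater than or equal to s.

    Returns None when even the highest observed score is not certain.
    """
    threshold = None
    for s in sorted(curve, reverse=True):
        if curve[s] < 1.0:
            break
        threshold = s
    return threshold


if __name__ == "__main__":
    # Toy training pairs (made-up numbers, for illustration only).
    scores = [5, 5, 10, 10, 12, 15]
    ratios = [0.1, 0.5, 0.4, 0.2, 0.5, 0.9]
    curve = loop_probability_curve(scores, ratios)
    print(curve)
    print(min_guaranteed_score(curve))
```

Scanning the scores from the highest downwards makes the selected value robust to an isolated low score that happens to have probability 1; whether this matches the exact selection rule used for Table III is an assumption.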
D. Visual Odometry for Testing

In order to demonstrate that our approach can be applied to a vision-only navigation system having no other sensors, such as a gyroscope or wheel encoders, we have decided to implement VSLAM using visual odometry as well. Our robot setup follows the one in [9]. However, although we do use a similar approach to obtain odometry, our main concern in this work is the correct detection of loops for VSLAM.

Depending on system requirements, less complex feature descriptors such as [13] and [14] could be used to generate odometry, while CDVS would be used just for loop detection. However, since local features from each image will already be available, we choose to use the CDVS local descriptors to generate visual odometry as well.

For each pair of consecutive images $I_{k-1}$ and $I_k$ we perform feature extraction and matching of MPEG CDVS descriptors, which results in two sets of $N > 2$ matching coordinate pairs. We combine these pixel coordinates with the camera’s calibration information and produce the sets $P_{k-1}$ and $P_k$, each containing the 3D coordinates of the $N$ matching pairs. By defining $\hat{P}_{k-1}$ and $\hat{P}_k$ to be the centroids of $P_{k-1}$ and $P_k$ respectively, we retrieve rotation and translation using Singular Value Decomposition.

A visual representation of this process is shown in Fig. 6.

Fig. 6: Matching between images at time k and k + 1: (a) t = k, (b) t = k + 1. Keypoints are indicated as dots. Yellow dots represent non-matching features. Green lines represent correspondences between matches. Red lines are false matches found by Distrat.

Although CDVS already performs geometric consistency validation, we make use of a few RANSAC [15] cycles to remove possible remaining outliers and improve results.

V. EXPERIMENTAL RESULTS

Partial results from Section IV have led us to try our loop-detection technique on the most challenging flooring for loop closure, i.e. the flooring most susceptible to false loop detection. In this experiment, the robot navigates through an indoor office for about 110 meters while taking a total of 7154 images of its printed wood floor and performing loops before finally going back to its original position.

We first use the sequence of images to generate the path’s visual odometry as described in Section IV for all except the first compression mode, which was unable to generate enough matching points between consecutive images. For those modes capable of estimating translation and rotation from consecutive images, we report their respective paths in Fig. 7, where we use the room’s blueprint as reference map.

Fig. 7: Path comparison using visual odometry.

Local score    mode 1  mode 2  mode 3  mode 4  mode 5  mode 6
Hypothesis     10      14      15      18      23      25
Experimental   –       20      16      18      24      27

TABLE III: Hypothesized and Experimental threshold values for local score loop detection.

We then perform loop detection as described in Section IV: a loop is declared for each image pair whose local score is above the hypothesized threshold in Table III.

For each compression mode, we have represented the data from visual odometry and the loop constraints as a path graph so that the robot’s trajectory could be optimized using the LAGO graph optimization software [16], whose purpose is to find a coherent sequence of poses that better describes all loop and odometry constraints, and thus to perform VSLAM.

During these experiments, we observed that the local score thresholds for loop detection found earlier were slightly too permissive and still allowed a small number of false-positive loops to be detected. This led us to empirically increase these thresholds until reasonable results were obtained. We report these new values as the Experimental entries in Table III; they differ very little from the hypothesized ones, which indicates that the method is still valid. The resulting trajectories for each compression mode using the experimental thresholds can be seen in Fig. 8.

Fig. 8: Paths optimized using LAGO.

A visual comparison of the two figures reveals the improvements obtained for all compression modes when loops are correctly detected. Except for compression mode 2, all improved trajectories pass through the hallway, enter and exit
the northwest room and respect the physical constraints present in the map. However, in order to have a more quantitative measure of such improvements, we report in Table IV the pose difference between the starting and ending poses of the trajectory, which ideally should be zero.

          Visual Odometry                 Visual SLAM
          ∆x (m)   ∆y (m)   ∆θ (rad)     ∆x (m)    ∆y (m)    ∆θ (rad)
Mode 2    17.35    -6.58    -0.86        0.0725    -0.0088   0.0075
Mode 3    -4.36    1.27     0.03         0.0355    -0.0115   0.0001
Mode 4    0.22     0.19     -0.13        0.0359    -0.0149   0.0086
Mode 5    1.01     0.09     -0.17        0.0302    -0.0011   -0.0249
Mode 6    2.10     0.00     -0.23        0.0221    -0.0056   -0.0128

TABLE IV: Relative pose errors between starting and final position for both visual odometry and VSLAM.

To highlight the gains in terms of both storage savings and matching times with respect to SIFT, we have compared the amount of memory required to save the descriptors for all 7154 images using each compression mode, and also report the time necessary to compare the last image in the sequence with all previous ones. We report these values in Table V.

Property       mode 2  mode 3  mode 4  mode 5  mode 6  SIFT
Storage (MB)   7.67    14.63   28.59   56.55   112.43  1213.84
Time (s)       4.23    6.62    9.62    27.27   58.32   1264.20

TABLE V: Storage requirement for all 7154 images and total matching time between last sequence image and all previous ones.

Finally, in order to compare our proposed method with existing state-of-the-art frameworks for indoor SLAM, we also report on both figures the path generated using a Hokuyo laser scanner optimized with the widely used Gmapping algorithm [17].

Fig. 9: Map and path generated using a laser scanner with Gmapping algorithm.

At first sight, the results from the laser scanner can be considered incorrect and unreliable. This occurs because the laser scanner was unable to create a precise map of the environment and thus was unable to reproduce its path correctly on the real-world map. This becomes evident in Fig. 9, where the path generated by the laser appears coherent with its self-generated "bent" map. Our method clearly does not suffer from the same issue.

VI. CONCLUSION

In this work we have proposed the use of MPEG CDVS in a SLAM framework for loop detection in an indoor environment.

We have shown experimentally that CDVS feature selection serves not only to reduce the final descriptor size but also to significantly speed up feature extraction and matching. In our practical experiment, the least compressed CDVS mode was shown to be over 20 times faster than SIFT in matching and to require 10 times less storage space, while still providing correct loop detection.

Finally, when compared with a laser scanner, our approach generated far better results.

REFERENCES

[1] H. Durrant-Whyte and T. Bailey, “Simultaneous localization and mapping: Part I,” IEEE Robotics & Automation Magazine, vol. 13, no. 2, pp. 99–110, 2006.
[2] T. Bailey and H. Durrant-Whyte, “Simultaneous localization and mapping (SLAM): Part II,” IEEE Robotics & Automation Magazine, vol. 13, no. 3, pp. 108–117, 2006.
[3] D. Scaramuzza and F. Fraundorfer, “Visual odometry [Tutorial],” IEEE Robotics & Automation Magazine, vol. 18, no. 4, pp. 80–92, Dec. 2011.
[4] F. Fraundorfer and D. Scaramuzza, “Visual odometry: Part II: Matching, robustness, optimization, and applications,” IEEE Robotics & Automation Magazine, vol. 19, no. 2, pp. 78–90, 2012.
[5] P. Newman and K. Ho, “SLAM-loop closing with visually salient features,” in Proceedings of the 2005 IEEE International Conference on Robotics and Automation (ICRA 2005). IEEE, 2005, pp. 635–642.
[6] D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2. IEEE, 1999, pp. 1150–1157.
[7] J. Wang, H. Zha, and R. Cipolla, “Coarse-to-fine vision-based localization by indexing scale-invariant features,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no. 2, pp. 413–422, 2006.
[8] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, “Visual categorization with bags of keypoints,” in Workshop on Statistical Learning in Computer Vision, ECCV, vol. 1, no. 1-22. Prague, 2004, pp. 1–2.
[9] ISO/IEC JTC 1/SC 29/WG 11 (MPEG), Information technology – Multimedia content description interface – Part 13: Compact descriptors for visual search, ISO/IEC Std.
[10] S. Lepsoy, G. Francini, G. Cordara, and P. P. de Gusmao, “Statistical modelling of outliers for fast visual search,” in 2011 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2011, pp. 1–6.
[11] A. Vedaldi and B. Fulkerson, “VLFeat: An open and portable library of computer vision algorithms,” 2008, http://www.vlfeat.org/.
[12] H. W. H. Wang, K. Y. K. Yuan, W. Z. W. Zou, and Q. Z. Q. Zhou, “Visual odometry based on locally planar ground assumption,” 2005 IEEE International Conference on Information Acquisition, pp. 59–64, 2005.
[13] B. D. Lucas, T. Kanade et al., “An iterative image registration technique with an application to stereo vision,” in IJCAI, vol. 81, 1981, pp. 674–679.
[14] S. Leutenegger, M. Chli, and R. Y. Siegwart, “BRISK: Binary robust invariant scalable keypoints,” in 2011 IEEE International Conference on Computer Vision (ICCV). IEEE, 2011, pp. 2548–2555.
[15] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
[16] R. R. G. P. di Torino, “LAGO: Linear approximation for graph optimization,” https://github.com/rrg-polito/lago, 2000–2004.
[17] G. Grisetti, C. Stachniss, and W. Burgard, “Improved techniques for grid mapping with Rao-Blackwellized particle filters,” IEEE Transactions on Robotics, vol. 23, no. 1, pp. 34–46, 2007.
