2010 IEEE International Conference on Multisensor Fusion and
Integration for Intelligent Systems
University of Utah, Salt Lake City, UT, USA, Sept. 5-7, 2010
V-SLAM: Vision-Based Simultaneous Localization and Map
Building for an Autonomous Mobile Robot
Daniel T. Savaria, Ramprasad Balasubramanian, Member, IEEE
Abstract—Simultaneous localization and map building (SLAM) is a desired feature for autonomous mobile robots. This SLAM approach allows the robot to create a map on the fly and then backtrack to further explore the area without human interaction. Data from the robot's encoder and sonar sensors are used along with depth information from a stereo camera vision system to explore and map the surroundings. The Speeded Up Robust Features (SURF) algorithm is used to visually identify landmarks in the environment.

Manuscript received July 15, 2010.
Daniel T. Savaria was a Computer Science graduate student at the University of Massachusetts Dartmouth. He is now with the Naval Undersea Warfare Center in Newport, RI (e-mail: [email protected]).
Ramprasad Balasubramanian is an Associate Professor in the Department of Computer Science at UMass Dartmouth in North Dartmouth, MA (e-mail: [email protected]).

I. INTRODUCTION

Mobile robotics deals with technology concerning robots that are capable of moving around within their environment to perform a task. Autonomous mobile robots aim to achieve their goals on their own with limited or no input from human operators. A desired feature for these robots is the ability to simultaneously localize and map (SLAM) their surroundings. The SLAM problem has been highly researched. Solving it means that an autonomous mobile robot will be able to carry out its tasks in an unknown environment, of which it has no prior information.

In order to achieve this, two tasks must be carried out in parallel. The robot must be able to create and update a map of its surroundings, and while doing this it must also be able to place itself within the map. The robot needs to automatically locate itself within the map it is creating in order to operate autonomously.

Previous solutions often relied on laser range measuring equipment to create the map and discover landmarks for the robot to use for localization. More recently, computer vision systems have been used to tackle the problem. The solution provided in this approach concentrates its efforts on using a stereo vision camera to both visually identify objects to use as landmarks and determine the position of these objects. This process of identifying objects is achieved by comparing images taken by the robot's camera to stored images of possible landmarks using an image-matching algorithm called Speeded Up Robust Features (SURF) [1].

In addition to the vision system, sonar readings are used to gather additional data. This is important for the automatic discovery of branching paths.

II. PREVIOUS WORK

SLAM is a widely researched topic. Here previous work and the differences of this approach are described.

A. Simultaneous Localization and Mapping

Many approaches use a laser range finder to discover features in the environment that can be tracked as the robot moves. These units work by emitting laser beams and measuring how long it takes each signal to return to the unit. This provides accurate range measurements spanning a wide field of view in front of the robot. These SLAM approaches find features in the range measurements that can be used to create a map and to localize as the robot views the features from multiple positions while it travels [2], [3].

Unlike the approaches just described, the V-SLAM work presented in this paper is meant to provide a solution using stereo vision instead of a laser scanner.

B. Visual Odometry

Visual odometry is a technique for tracking a robot's movement using a camera. It does not necessarily solve SLAM, but it can be used as a basis for an approach. These systems provide accurate odometry readings [4], [5].

What these approaches lack is a way to properly localize the robot. They provide an accurate method to plot the robot's course but do not provide a way for the robot to update and confirm its position using previously recorded information, a key aspect of this approach.

C. Using Vision for SLAM

By using features obtained through forward-facing cameras instead of laser scanners, several vision-based SLAM approaches have been proposed [6]–[9]. One interesting approach, "CV-SLAM: A New Ceiling Vision-Based SLAM Technique" by WooYeon Jeong and Kyoung Mu Lee [10], uses a single camera positioned on the robot to face directly up at the ceiling.

The approach presented in this paper differs from those just described in several ways. First, the robot looks for real-world objects with the SURF algorithm. It does this by comparing images it takes of its environment to stored images of objects expected to be found in the environment. It can then use these object landmarks to identify and update its position within the map. These objects are used for both mapping and localization. Also, an exploration behavior is added to the robot so that it can autonomously discover and further explore the environment it is mapping.
III. APPROACH

The components required for this SLAM approach can be split into two categories. The first is the hardware and sensors that compose the robotic system. This includes the robot itself, the stereo vision camera system, and a laptop used to control everything. The second is the SLAM algorithm, with the responsibilities of taking in data from the hardware, issuing commands to the hardware, creating the map, and localizing the robot within this map.

The robot used is a Pioneer 3-DX (P3-DX) by MobileRobots. It carries with it a laptop and a Digiclops three-camera stereo vision system. More robot specification details can be found on the MobileRobots website [11].

The algorithm developed for this approach has several stages. It provides a method not only for mapping and localizing but also an intelligent, autonomous behavior for backtracking and systematically exploring unknown sections of the map. A flowchart of the algorithm's logic is shown in Fig. 1.

Fig. 1. This flowchart shows the control flow of the program used to drive the robot. The two major sections are the initial exploration and the backtracking for gateway exploration.

A. Algorithm Preparations

Before starting, images of objects that the robot is expected to come across in the environment are loaded onto the laptop. The robot is placed into an unknown, indoor environment and the controlling program is then started. At this point the loaded images are analyzed and SURF descriptors are extracted. These will then be compared to the images taken by the camera to determine if the robot comes across any of these objects, as detailed in Section V.
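As a concrete illustration of this preparation step, the following minimal sketch extracts and caches SURF descriptors for the stored object images. It assumes OpenCV with the contrib modules (cv2.xfeatures2d) rather than the library actually used by the authors, and the directory name is a placeholder.

```python
# Hypothetical sketch of the preparation step: load the stored object images
# and extract SURF keypoints/descriptors once, before exploration begins.
import glob
import cv2

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)

def load_landmark_descriptors(pattern="landmarks/*.png"):
    """Return {image_path: (keypoints, descriptors)} for each stored object image."""
    landmarks = {}
    for path in glob.glob(pattern):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue
        keypoints, descriptors = surf.detectAndCompute(img, None)
        landmarks[path] = (keypoints, descriptors)
    return landmarks

# These cached descriptors are later matched against every frame captured at a waypoint.
landmark_db = load_landmark_descriptors()
```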
B. Initial Exploration

The robot then makes its way through the environment in a relatively straight line, either autonomously or through teleoperation. During this exploration, there are several operations being performed concurrently. Perhaps most important, the robot creates a series of connected waypoints. These are sections of the map that are considered safe to travel.

At each of these waypoints, a frame is captured from the Digiclops camera. SURF is used to see if a known object is within sight. If one is spotted, its location is determined using the depth capabilities of the camera and stored within the robot's map. In order to aid with loop closing, if a landmark is spotted a second time during this stage, the robot will use this information to update its position within its map in order to overcome encoder inaccuracies.

While driving, the robot checks its side-facing sonar values several times a second to look for gateways or regions of bifurcation. These are places on the map where further exploration may be needed and usually correspond to adjacent hallways or rooms.

At some point, the main exploration stage will terminate. This may occur at loop closure, when the robot revisits a location it has already been to; when the robot reaches a dead end; or on human input during teleoperation mode.
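The paper does not give the exact rule used to flag a gateway from the side-facing sonar, so the following is only a plausible sketch: a gateway is declared when the lateral range jumps well past the nearby wall distance and the opening is wide enough for the robot to enter. The threshold values are illustrative assumptions.

```python
# Hypothetical gateway check on the lateral sonar trace gathered while driving.
import math

def find_gateways(side_ranges_mm, positions, wall_mm=1200, min_opening_mm=600):
    """side_ranges_mm[i] is the lateral sonar range recorded at positions[i] = (x, y)."""
    gateways = []
    opening_start = None
    for i, r in enumerate(side_ranges_mm):
        if r > wall_mm and opening_start is None:
            opening_start = i                              # wall disappeared: possible opening
        elif r <= wall_mm and opening_start is not None:
            sx, sy = positions[opening_start]
            ex, ey = positions[i]
            if math.hypot(ex - sx, ey - sy) >= min_opening_mm:  # wide enough to drive through
                gateways.append(((sx, sy), (ex, ey)))
            opening_start = None
    return gateways
```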
C. Backtracking and Gateway Exploration

This stage is carried out completely autonomously. Here, the robot demonstrates its exploration ability and its use of its own map to navigate the environment. It begins by reviewing the list of gateways it had made on the fly during its initial exploration.

It maps a path from its current location back to each of the gateways. As it travels it updates its perceived position within its map using several techniques. The most advanced of these is used when the robot visits a waypoint at which it previously spotted a landmark object. At this point it will rotate itself to the direction at which the object was spotted. It will capture an image to confirm that the object is there and the object's location. By comparing the originally marked location of the landmark to the new location, the robot is able to correct its position and heading within its map.

There is another method for making slight adjustments to its position in the map when none of the landmark objects are around. When the robot reaches a waypoint, it can point itself in the direction it was facing when it originally visited this point. By taking an image with the Digiclops, it can then compare the position information from this image to the one originally taken here, along with information from the sonar, to adjust its position in the map. It will also use lateral sonar readings to further correct its position estimate.

When the robot reaches a gateway it will turn to the direction of the opening and drive forward. While doing this, it will record the path it is traveling and the surrounding sonar values, similar to its operation in the original exploration stage. When the robot can travel no farther, it will turn around and return to the intersection. At this point, it will follow the path it has determined to the next gateway. It will repeat this process until the opening at each gateway has been explored.
IV. IMPLEMENTATION

The Hybrid Deliberative/Reactive paradigm is used to control the robot. This separates the SLAM processing from the details of controlling robot movement.
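The following skeleton illustrates one way this separation could be organized; the class and method names are hypothetical and are not taken from the authors' software. The reactive layer handles sensing and low-level motion, while the deliberative layer updates the map and chooses the next goal.

```python
# Illustrative hybrid deliberative/reactive skeleton, assuming the split described above.
import time

class ReactiveLayer:
    def sense(self):
        """Read encoders and sonar and grab a stereo frame; return a snapshot."""
        ...

    def drive_toward(self, goal):
        """Issue low-level velocity commands toward the goal, avoiding obstacles."""
        ...

class DeliberativeLayer:
    def update_map_and_pose(self, snapshot):
        """SLAM bookkeeping: add waypoints, match landmarks, correct the pose."""
        ...

    def next_goal(self):
        """Return the next waypoint or gateway to visit, or None when finished."""
        ...

def control_loop(reactive, deliberative, period_s=0.1):
    goal = deliberative.next_goal()
    while goal is not None:
        snapshot = reactive.sense()
        deliberative.update_map_and_pose(snapshot)
        reactive.drive_toward(goal)
        goal = deliberative.next_goal()
        time.sleep(period_s)
```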
A. Waypoint Marking

As the robot moves through the environment it will continually take in sonar readings to measure its surroundings. After it moves for several seconds it will take detailed readings. These include the robot's position and heading, the front and side sonar readings, and an image captured from the Digiclops.

The image from the Digiclops is used for depth measurements and is also run through the SURF detector algorithm. If the robot thinks that it is in a position to close the loop, it can use the information from SURF to compare the scene to previous images it has taken to confirm the location. The other purpose of SURF is to determine if one of the preloaded objects has been spotted.

If one of these objects is determined to be within the taken image, the object's three dimensional position relative to the camera is calculated. By taking the distances it is from the front and to the side of the camera, this is translated into a position in the map's coordinates.

While doing this the robot looks at the queue of sonar readings it has accumulated. It analyzes the queue for possible gateways that could lead to new areas.
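The exact layout of the record stored at each waypoint is not given in the paper; the structure below simply collects the quantities listed above (pose, sonar, stereo frame, and any matched landmarks) under hypothetical field names.

```python
# Hypothetical container for the detailed reading taken at each waypoint.
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class WaypointRecord:
    x_mm: float                    # robot position in map coordinates
    y_mm: float
    heading_deg: float             # robot heading when the reading was taken
    front_sonar_mm: List[float]    # forward-facing sonar ranges
    side_sonar_mm: List[float]     # lateral sonar ranges, later scanned for gateways
    stereo_frame: Any = None       # image (and depth data) captured from the Digiclops
    landmarks_seen: List[str] = field(default_factory=list)  # preloaded objects matched here
```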
B. Position Updating with an Object Landmark

When the robot spots an object a second time, it will not simply calculate its location and place it in its map. Instead it will compare its position to where the object was spotted previously. If the positions of the two are similar enough, the robot can consider them the same landmark, as opposed to multiple occurrences of the same object in the environment.

The robot will adjust the position of the second sighting to be the same as that of the first sighting. It then draws a virtual circle around this position with the radius set to the distance between the robot and the object. A line is then drawn from the robot's perceived position to the object's updated position. The intersection of this line and the circle that is closest to the robot's perceived position becomes the robot's new, updated position within its map. See Fig. 2 for a visual representation of this calculation.

Fig. 2. The robot can use previously seen objects to correct its position within its map.
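The geometry just described reduces to snapping the robot's believed position onto the circle of measured range around the landmark, along the line joining the two points. A short sketch of that calculation (not the authors' code) follows.

```python
# Landmark-based position correction following the circle-and-line construction of Fig. 2.
import math

def correct_position(perceived_xy, landmark_xy, measured_range_mm):
    """Return the perceived position snapped onto the circle of the measured range
    around the landmark, on the side nearer the robot's believed position."""
    px, py = perceived_xy
    lx, ly = landmark_xy
    dx, dy = px - lx, py - ly
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return perceived_xy          # degenerate case: robot believes it is on the landmark
    scale = measured_range_mm / dist
    return (lx + dx * scale, ly + dy * scale)

# Example with made-up numbers: believed pose (5200, 1100) mm, landmark first mapped
# at (7000, 1000) mm, and the stereo camera now measures the landmark 2100 mm away.
print(correct_position((5200, 1100), (7000, 1000), 2100))
```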
C. Position Updating without an Object Landmark

When the robot is backtracking, it has the capability of updating its position within its map without any nearby objects. If the robot is facing the same general direction it was when it first traveled, it will take an image with its camera and compare the three dimensional data from the image to the data from the first time it was at this waypoint. It knows where the location of this area should be on its map, so it calculates the difference between the actual location and the location indicated by its most recent readings. It then compares its current lateral sonar readings to those it recorded the previous time it was at this location. Similar to the process above, since it knows where the walls on its sides should be, it can use its current readings to adjust its perceived position.

The second way of performing a position update is by simply using the lateral sonar readings. Although this provides fewer data points for the calculations, its advantage is that it can work when the robot is facing 180 degrees opposite of the direction it was facing on its first visit. This means that if the robot is traveling back through the environment in the opposite direction it was originally going, it does not have to stop and turn in order to correct its position.
V. IMAGE PROCESSING

Computer vision plays a large role in this SLAM approach. It is used to look for objects within images taken by the camera and to calculate the location of these objects.

A. Image Matching

There are many approaches to image matching. The Speeded Up Robust Features (SURF) algorithm [1] is used here because it is an efficient algorithm that is able to deal with the difficulties of image matching. One of the deciding factors was SURF's speed. Compared to the similar Scale-Invariant Feature Transform (SIFT) algorithm [12], SURF produces comparable results but much more quickly. Another positive of SURF is that it works well with "busy" images that contain complex scenery.

First, SURF identifies unique features in each image. Then it compares features across the images to find matches. If there are enough matches, the RANSAC algorithm [13] is used to find the projective transform. This will attempt to draw a bounding box around the matches. If it is successful, the area within the bounding box represents the location of the found object.
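A minimal OpenCV sketch of this pipeline is shown below. It is an assumption about how the steps could be wired together, not the authors' implementation: SURF features from a stored object image are matched against the current frame, RANSAC estimates the projective transform, and the object image's corners are projected into the frame to give the bounding box. The ratio-test and match-count thresholds are illustrative.

```python
# Hypothetical SURF + RANSAC object localization, assuming OpenCV contrib modules.
import cv2
import numpy as np

def locate_object(object_img, frame, min_matches=10):
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp_obj, des_obj = surf.detectAndCompute(object_img, None)
    kp_frm, des_frm = surf.detectAndCompute(frame, None)
    if des_obj is None or des_frm is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(des_obj, des_frm, k=2)
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]
    if len(good) < min_matches:
        return None

    src = np.float32([kp_obj[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_frm[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    h, w = object_img.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(corners, H)   # bounding box of the object in the frame
```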
B. Stereo Vision Processing

The Digiclops and its API perform all of the necessary calculations to determine the three dimensional position relative to the camera for most pixels in each frame. When an object is found in an image, the position of that object relative to the camera can be determined by considering only the pixels that lie within the bounding box drawn by RANSAC. From here, a position of the object relative to the map's coordinate system can be calculated.
VI. EXPERIMENT

The robot was given a series of images of possible landmarks that it might find in an indoor, office setting. It was then placed in an unknown environment and allowed to create a map. After traveling through the hallway, it was then allowed to backtrack through the environment to explore each gateway and stop at each landmark it had spotted along the way. This allowed for testing of the robot's gateway detection and exploration capabilities and its ability to use its own map.

Fig. 3 shows one of the environments that the test was run in. The start and end points are marked on the map. The numbered circles represent four areas adjacent to the central hallway that the robot should find and identify as gateways. This environment presents an interesting challenge in that the explorable areas at each gateway are not simple boxes. They contain many angled doors and walls. Some of the objects that the robot used as landmarks can be seen in Fig. 4.

Fig. 3. A map of the first test environment.

Fig. 4. Object landmarks in the environment.

Fig. 5 shows one run of the robot in this environment followed by an overlay of the environment map and the robot's map. The area shaded with the diagonal lines is the path of the robot in its initial exploration stage. It made its way from one end of the hallway to the other. The section shaded with the dots is the open area of the hallway, with the outline representing the walls as detected by lateral sonar. The triangle on the map is the location of a landmark spotted in the environment, in this case a recycle bin.

Fig. 5. A map of the first environment created by the robot and an overlay of the result with the original map. Units are in millimeters.

The outlined areas represent the sonar values returned by the robot as it explored gateways. The algorithm used for this approach created the entire map except for the numbers, which were added to help identify each explored gateway. In this run, the robot identified all four gateways. Gateways 1 and 2 were fully explored, while gateways 3 and 4 were partially explored. What happened here was that the robot was able to identify these as gateways that may lead to new areas, but the angle at which it turned into them caused it to reach a wall before fully entering the hallway.

Fig. 6 shows a second run in the same environment and the overlay. The robot did not use any information gathered from the first run, so this was still an unknown environment to the robot. Two landmarks were spotted this time, a large bin and a fire extinguisher. In this case the areas at gateways 2 and 3 were fully explored; the area at gateway 1 was partially explored; and gateway 4 was not identified this time.

Fig. 6. A second map of the first environment created by the robot and an overlay of the result with the original map. Units are in millimeters.

As can be seen marked on the map, there was one false positive. The hallway contained a small nook that the robot mistakenly identified as a gateway. It had no negative effect on the results, though. The robot attempted to enter this area, detected the wall, and simply returned to the main hallway to continue exploring the rest of the gateways.
Fig. 7 shows the second area that the robot was tested in. The start, end, and expected gateways to be discovered are marked. This was an interesting environment to test in because the hallway is not simply a straight line, and the entranceways to adjacent hallways and rooms are not always at perfect right angles.

Fig. 7. A map of the second test environment.

The robot-created map of the first run of this area is shown in Fig. 8 along with the overlay. In this run, gateways 1, 3, and 4 were all discovered and fully explored. The robot, however, missed gateway 2. The most likely reason for this is that the robot was turning as it passed this area. One of the weaknesses of the sonar sensors on the robot is that they cannot make accurate measurements while the robot is making significant changes in its heading.

Fig. 8. A map of the second environment created by the robot and an overlay of the result with the original map. Units are in millimeters.

A second run of this area and the overlay are shown in Fig. 9. Once again the spotted landmarks are marked. In this case, a large bin, a fire extinguisher, and a recycle bin were discovered. Similar to the previous run, gateway 2 was not detected. Also not detected was gateway 1. What appears to have happened was an incorrect sonar reading. As can be seen in the map, it appears that the sonar detected the start and end of the branching hallway correctly, but in the center is an incorrect reading that returned a value much closer to the robot than what was actually there. This caused the hallway to appear too narrow for the robot to enter, and thus it was disregarded. The area next to gateway 3 was fully explored. Gateway 4 was detected, but the hallway was only partially explored.

Fig. 9. A second map of the second environment created by the robot and an overlay of the result with the original map. Units are in millimeters.

A third run in this area with the overlay can be seen in Fig. 10. This time, the large bin and fire extinguisher spotted in run 2 were also found here. In addition, a recycle bin in a different position was also located. Gateways 1 and 3 were found and successfully explored. This time gateway 2, which was undetected in the previous two runs, was discovered. This is most likely a result of the robot taking a slightly different path. This meant it drove by this gateway without turning and was able to detect it with the sonar. However, this time gateway 4 went undiscovered. It appears to have been a victim of a bad sonar reading, similar to what happened at gateway 1 in run 2.

Fig. 10. A third map of the second environment created by the robot and an overlay of the result with the original map. Units are in millimeters.

The results of these experiments were positive. Of the twenty gateways in all the runs, the robot automatically detected fifteen. Fifteen out of fifteen were successfully backtracked to. This indicates that the robot was able to maintain its position within its map accurately enough to reach these gateways on the return trip. Ten out of these fifteen were fully explored, with the remaining five being partially explored.
VII. CONCLUSIONS AND FUTURE WORK

The work here has shown that visually identifying objects in an unknown environment can be used as a basis for SLAM. The robot was able to use these objects as landmarks to correct its position as it explored. The use of sonar and the camera's depth measurements also proved useful for position correction when no landmarks were around. The use of lateral sonar for discovering branching paths and gateways that connect these paths was successful as well. The robot was able to identify these paths without human confirmation and autonomously return to them for further exploration.

There are several features that would make welcome additions to this work in the future. The first would be to add a three dimensional visual odometry system. It has been shown to be an accurate way of measuring travel distance [4], [5]. Having a more accurate way of measuring distance traveled would mean that the localization portion of this work would have less error to overcome. Besides being more accurate, it is also more robust than the robot's encoders. The encoders measure movement on a two dimensional plane, whereas visual odometry is capable of doing so in three dimensions. This would allow the robot to accurately explore areas that do not have level floors.

Another future update would be a more robust gateway detection algorithm. Careful analysis of the sonar readings being returned should be able to detect and overcome the sometimes faulty readings that were causing some gateways not to be detected.

Finally, a desirable feature would be to extend this landmark data in such a way that this exploration robot can report these locations to a second robot, which would then be able to use them to locate itself within the map created by the first robot. This would allow for multi-robot missions and for return visits to the same environment.

REFERENCES

[1] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool, "SURF: speeded up robust features," Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346–359, 2008.
[2] Zhan Wang, Shoudong Huang, and Gamini Dissanayake, "DSLAM: decoupled localization and mapping for autonomous robots," International Symposium of Robotics Research (ISRR 05), Oct. 2005.
[3] Giorgio Grisetti, Cyrill Stachniss, and Wolfram Burgard, "Improved techniques for grid mapping with Rao-Blackwellized particle filters," IEEE Transactions on Robotics, Vol. 23, pp. 34–46, Feb. 2007.
[4] Jason Campbell, Rahul Sukthankar, Illah Nourbakhsh, and Aroon Pahwa, "A robust visual odometry and precipice detection system using consumer-grade monocular vision," IEEE International Conference on Robotics and Automation, 2005, Vol. 3, pp. 3421–3427.
[5] Annalisa Milella and Roland Siegwart, "Stereo-based ego-motion estimation using pixel tracking and iterative closest point," IEEE International Conference on Computer Vision Systems, 2006.
[6] Miguel Angel Garcia and Agusti Solanas, "3D simultaneous localization and modeling from stereo vision," Proceedings of the 2004 IEEE International Conference on Robotics & Automation, New Orleans, LA, 2004, pp. 847–853.
[7] Stephen Se, David G. Lowe, and James J. Little, "Vision-based global localization and mapping for mobile robots," IEEE Transactions on Robotics, Vol. 21, No. 3, pp. 364–375, 2005.
[8] Juan Manuel Sáez and Francisco Escolano, "Entropy minimization SLAM using stereo vision," Proceedings of the International Conference on Robotics and Automation, Barcelona, Spain, April 2005.
[9] Pantelis Elinas, Robert Sim, and James J. Little, "σSLAM: stereo vision SLAM using the Rao-Blackwellised particle filter and a novel mixture proposal distribution," IEEE International Conference on Robotics and Automation, 2007.
[10] WooYeon Jeong and Kyoung Mu Lee, "CV-SLAM: a new ceiling vision-based SLAM technique," IEEE/RSJ International Conference on Intelligent Robots and Systems, August 2005.
[11] PIONEER P3-DX. 2008. Retrieved March 2010. Available: http://www.activrobots.com/ROBOTS/p2dx.html
[12] D. G. Lowe, "Object recognition from local scale-invariant features," Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, Vol. 2, pp. 1150–1157.
[13] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, Vol. 24, No. 6, June 1981, pp. 381–395.