POLITECNICO DI TORINO
Doctoral School
PhD Program in Mechatronics – XXVI cycle

Doctoral Thesis

Localization and Mapping for Service Robotics Applications

Stefano Rosa

April 2014
“I love deadlines. I like the
whooshing sound they make as they
fly by.”
Douglas Adams
Contents

1 Introduction
1.1 Motivation
1.2 Thesis outline

2 Localization
2.1 Introduction
2.1.1 Relative measurements
2.1.2 Navigational information
2.1.3 Landmark Based Methods
2.1.4 Map Based Methods
2.2 Monte Carlo Localization
2.3 Multi-robot Localization in Highly Symmetrical Environments
2.3.1 Related Work
2.3.2 The Three-State Multirobot Collaborative Localization (3SMCL)
2.3.3 Simulation tests
2.4 Localization with sensor fusion and markers detection
2.4.1 IMU-corrected odometry
2.4.2 Planar markers detection
2.4.3 ROS implementation
2.4.4 Results
2.5 Map Updating in Dynamic Environments
2.5.1 Introduction
2.5.2 Problem Formulation
2.5.3 The Approach
2.5.4 Simulation Tests
2.6 Extension to the multi-robot case
2.6.1 The Approach
2.6.2 ∆-awareness, local ∆-mapping and map merging
2.6.3 Exploration strategy
2.6.4 Distributed auction-based task allocation
2.7 Simulation Tests
2.7.1 Simulation test 1
2.7.2 Simulation test 2
2.7.3 Simulation test 3
2.7.4 Computational load
2.8 Experimental Tests

4 Human-robot interaction
4.1 Vision-based people tracking from a moving camera
4.1.1 Motivation
4.1.2 Related work
4.1.3 Monocular people detection and tracking
4.1.4 Stereo-based people detection and tracking
4.2 Adaptive people and object tracking
4.2.1 Histograms of oriented gradients
4.2.2 The proposed approach
4.2.3 Detector
4.2.4 Online training
4.2.5 Tracking
4.2.6 Experimental Tests

5 Conclusions
5.1 Publications
5.1.1 Journal papers
5.1.2 Conference and workshop papers
5.1.3 Preprints

Bibliography

List of Figures
2.23 Comparison between the trend of w_avg(t) over time when no modifications in the map occur (a), and when the robot passes near a variation (b).
2.24 Trend of w_avg(t), w_fast(t), w_slow(t), r_d(t) in the absence of modifications in the environment.
2.25 Trend of w_avg(t), w_fast(t), w_slow(t), r_d(t) in the presence of modifications in the environment.
2.26 Path followed by the simulated robot to evaluate N_eff in the third case. On the left, the simulated robot and environment; on the right, our graphical user interface.
2.27 Comparison of the N_eff trend.
2.28 The map used to validate the Map Merge block, also used in the simulation tests.
2.29 The two parts of the map in Figure 2.28 used to validate the Map Merge block.
2.30 Localization error of a robot in n_r runs.
2.31 The initial map (a), the map after a few variations (b), and the map at the end of the ∆-mapping process.
2.32 The localization error in the long operativity test.
2.33 The trend of the quality of the map over time.
2.34 Map merging in a typical case: 2.34(a) shows the poses of the robots; Robot 1 receives a map from Robot 2 and uses it to update its own map. 2.34(b) is the current map, 2.34(e) the current time-map, 2.34(c) the received map, 2.34(f) the received time-map; 2.34(d) and 2.34(g) are the resulting map and time-map after the merging process.
2.35 Time-map and skeleton of areas to explore (a) and final goal points for three robots (b).
2.36 The simulation environment.
2.37 Localization error and acceptance index for test 1.
2.38 Localization error and acceptance index for test 2.
2.39 Localization error for test 3.
2.40 CPU usage (upper plot) and memory usage (lower plot) in a simulated experiment.
2.41 Results for the experimental test.
3.1 The pose graph.
3.2 Estimated trajectory for each of the considered datasets: fr079 (a), csail (b), intel (c), M3500 (d), M10000 (e).
3.3 Architecture of the system.
3.4 Resulting map for the simulated environment.
3.5 Resulting map for the office-like environment.
3.6 Mapping results for the data-center room. (a) Resulting map using our approach; (b) robot trajectory and loop closings. Red and blue arrows represent the trajectory; yellow lines represent orientation-only constraints; green lines represent full-pose constraints.
3.7 Mapping results for the data-center corridors.
4.1 An overview of the feature extraction and object detection chain. The detector window is tiled with a grid of overlapping blocks in which Histogram of Oriented Gradients feature vectors are extracted. The combined vectors are fed to a linear SVM for object/non-object classification. The detection window is scanned across the image at all positions and scales, and conventional non-maximum suppression is run on the output pyramid to detect object instances.
4.6 Example of a robot following a user. (a) The algorithm in action. (b) Estimated trajectory of the user (blue dots) and trajectory followed by the robot (red line).
Chapter 1
Introduction
1.1 Motivation
Service robotics includes all robotic systems built to perform tasks in place of human users or in collaboration with them; it is commonly defined as the set of robotic systems that assist people in their daily lives at work, in their homes, for leisure, and as part of assistance to the disabled and the elderly. In industrial robotics the goal is typically to automate production tasks in order to achieve a homogeneous quality or a high speed of execution. In contrast, service robotics tasks are performed in spaces occupied by humans and typically in direct collaboration with people. Service robotics is normally divided into professional and personal services. Professional service robotics includes agriculture, emergency response, pipelines and national infrastructure, forestry, transportation, professional cleaning, and various other disciplines; these systems typically augment people in the execution of tasks in the workplace. According to the IFR/VDMA World Robotics report, more than 110,000 professional robots are in use today and the market is growing rapidly every year [9].
Personal service robots, on the other hand, are deployed to assist people in their daily lives in their homes, or to compensate for mental and physical limitations. By far the largest group of personal service robots consists of domestic vacuum cleaners: over 6 million iRobot Roombas alone have been sold worldwide, and the market is growing by 60% each year. In addition, a large number of robots have been deployed for leisure applications such as artificial pets (AIBO), dolls, etc. With more than 4 million units sold over the last 5 years, the market for such leisure robots is experiencing exponential growth and is expected to remain one of the most promising in robotics [9]. Some examples of popular commercial service robots are shown in Figure 1.2.
Today, autonomous service robots are still far from being part of our everyday
life. This is hard to understand if one considers the remarkable success of robots in
automation industries. In controlled environments the robots’ speed, accuracy and
reliability by far exceed human capabilities.
There has been progress on challenging problems in highly unstructured environments, and several successful applications in more structured environments and tasks. These applications demonstrate that service robotics has value in solving real-world problems. Nonetheless, a number of important problems remain open. Finding solutions to these problems in the area of mobility will be necessary to achieve the level of autonomy and versatility required for the identified application areas.
Enabling safe autonomous navigation in unstructured environments with static
obstacles, human-driven vehicles, pedestrians, and animals will require significant
investments in component technologies.
Under the definition “autonomous navigation” we group all the techniques and
algorithms which are necessary for a mobile robotic platform in order to accomplish
these fundamental tasks:
• localization inside a known environment
• map building and simultaneous localization inside an unknown environment
(SLAM)
• obstacle avoidance and path planning
Robot localization denotes the problem of estimating the position of a robot inside
a known map of the environment. Map building is the process of estimating a
map of an unknown environment given the positions of the robot and a set of
measurements. Initially these two problems were studied independently; later they were studied as two related issues, and the resulting joint problem has been called simultaneous localization and mapping (SLAM). The third task is related to the
ability of a robot to autonomously move from one position to another, while avoiding
static and moving obstacles.
One of the key fields of development indicated for robotics in [9] is logistics, mainly intended as the autonomous transport of goods and people. However, there are other possible developments of robotics in logistics, and one of them can be found in the three-year (2008-2011) project MACP4Log - Mobile, autonomous and cooperating robotic platforms for supervision and monitoring of large logistic surfaces (co-funded by Regione Piemonte), which provided the foundation for the first part of this PhD work. MACP4Log is aimed at the study and development of a prototype of a mobile robotic platform, with on-board vision systems and sensors, integrating a flexible wireless communication solution, able to move autonomously in large logistic spaces and to communicate with a supervisor and other similar platforms in order to achieve coordinated action and carry out specific tasks. Logistic spaces are large areas where logistics companies or other transport enterprises receive, store and distribute large quantities of goods, mainly bulky ones such as containers, cars and other similar items. Other logistic spaces may include car-rental parking lots, intermodal rail nodes, etc.
The main tasks to be performed by the robot team address the principal issues of potential logistic users:

• building and updating maps of both indoor and outdoor logistic spaces;
• performing programmed and pro-active surveillance;
• locating specific items on the map, such as containers and cars, which may be marked by proper tags (e.g., license plates, container numbers, RFID) or unmarked, and should then be distinguished by color, shape and/or other physical characteristics;
• achieving full coordination of the team, always securing the wireless connectivity.
The mobile platform must be able to: autonomously move within substantially
flat and uniform indoor and outdoor spaces, partially structured, with or without
In particular, the PhD work reported in this thesis was done in collaboration with Thales Alenia Space on astronaut detection and tracking for extravehicular activities, where semi-autonomous robots need to follow and assist the astronauts. The work was partially done under the STEPS (Systems and Technologies for Space Exploration) project. This project, which lasted three years and was co-financed by Regione Piemonte, aimed to study new technologies for robotic and human space exploration, with particular attention to Moon and Mars exploration.
Chapter 2
Localization
2.1 Introduction
The problem of robot localization consists of answering the question "Where am I?" from a robot's point of view. This means the robot has to estimate its location relative to the environment. When we talk about location, pose, or position, we mean the x and y coordinates and the heading direction of the robot in a global coordinate system. Localization is a key component of many successful autonomous robot systems: if a robot does not know where it is relative to the environment, it is difficult for it to decide what to do, and it will most likely need at least some idea of where it is in order to operate and act successfully [26].
The general localization problem comprises two increasingly difficult subproblems. In the position tracking problem, the robot knows its initial location in the environment; the goal of localization is to keep track of the position while the robot is navigating through the environment. Methods that address this problem are called local techniques [40].

In the global localization problem, instead, the robot does not know its initial position and has to localize itself from scratch; it possibly needs to be able to deal with multiple hypotheses about its location. Methods that solve this problem are called global techniques [40]. An even harder problem is the kidnapped robot problem: the robot is already correctly localized, but it is transferred (or "kidnapped") to another location without being aware of it. This can happen, for example, in the case of failures or if the robot is physically moved by an operator. The problem for the robot is to detect that it has been kidnapped and to find out its new location [40]. A factor that complicates each of these problems is the dynamics of the environment in which the robot is moving. Most research on localization has focused on static environments. This means that
the robot is the only moving object in the environment. Obviously this is not the
case in the real world. Dynamic environments contain other moving objects and
in these environments localization is significantly more difficult, since these other
objects might confuse the robot about its location by corrupting the information
used for localization.
In determining its location, a robot has access to two kinds of information. First, it
has a-priori information gathered by the robot itself or supplied by an external source
in an initialization phase. Second, the robot gets information about the environment
through every observation and action it makes during navigation. In general, the a-
priori information supplied to the robot describes the environment where the robot
is driving around. It specifies certain features that are time-invariant and thus can
be used to determine a location. The a-priori information can come in different
flavors.
In most localization approaches the robot has access to a map representation of the environment. Such a map can be geometric or topological [81]. Geometric maps describe the environment in metric terms, much like normal road maps. Topological maps describe the environment in terms of characteristic features at specific locations and ways to get from one location to another. A map can be given by an external source in an initialization phase. Alternatively, the robot can learn a map of the environment while navigating through it, which is known as Simultaneous Localization and Mapping (Chapter 3).
The information which is usually available to the robot in order to perform
localization is divided into two categories.
and it senses the environment on the other. These two types of actions give rise to
two different kinds of position information.
To be able to move around in an environment, a robotic vehicle has a guidance or driving system [27]. A guidance system can consist of wheels, tracks or legs: in principle, anything that makes the vehicle move around. These components are called actuators. Obviously, the guidance system plays an important role in the physical position of a robot, as it directly changes the location of the vehicle. Without a guidance system the robot does not drive around, which makes localizing it a lot easier.

Assuming that a robot does have a guidance system, the way the guidance system changes the location contains valuable information for estimating the location. Knowing the effects of the actions executed by the driving system gives a direct indication of the location of the vehicle after the execution of these actions. By monitoring what the driving system actually does using sensors, the displacement of the robot can be estimated. This results in relative position measurements, sometimes also referred to as proprioceptive measurements. Relative position measurements are made by looking at the robot itself only; no external information is used, and these measurements can therefore only supply information relative to the point where the measurements were started.
The robot senses the environment by means of its sensors. These sensors give momentary situation information, called observations or measurements, which describes the environment of the robot at a certain moment. Observations of the environment provide information about the location of the robot that is independent of any previous location estimates. They provide absolute position measurements, sometimes also called exteroceptive measurements to emphasize that the information comes from looking at the environment instead of at the robot itself.
Triangulation techniques use the angles under which signals from active landmarks are received; trilateration techniques only use distances. The angles and/or distances are then used to calculate the position and orientation of the robot. GPS, the Global Positioning System, uses trilateration techniques to determine latitude, longitude and elevation. It uses time-of-flight information from uniquely coded radio signals sent from satellites. Twenty-four satellites orbit the earth in 12 hours in six different orbital planes. Ground stations track the satellites and send them information about their position; in turn, the satellites broadcast information back to the earth. The result is a position accuracy between 100 and 150 meters.
To be able to use the aforementioned methods, the robot needs to know the locations of the landmarks in advance; besides this, no a-priori information is required. There are some problems with these techniques, though. The transmission of the active signals can be disturbed by atmospheric and geographic influences on the way from sender to receiver [81]. These disturbances, such as refractions and reflections, result in incorrect measurements. Another problem is that active landmarks in practice often cannot send out their signals in all directions, and thus cannot be seen from all places. Furthermore, active landmarks may be expensive to construct and maintain.
If the landmarks do not actively transmit signals, they are called passive landmarks. The robot has to actively look for these landmarks to acquire position measurements. Techniques using passive landmarks to determine the position of the robot rely on the detection of those landmarks from sensor readings, which depends on the type of sensor used: for example, landmarks in images from a vision system are detected using image processing techniques. When three or more landmarks are detected, the robot can use triangulation or trilateration to compute its location. Passive landmarks can be either artificial or natural, and the choice of which kind of landmarks to use can play a significant role in the performance of the localization system.

Artificial landmarks are landmarks designed to be recognized by robots. They are placed at locations in the environment that are known in advance and that are well visible to the robot's sensors.

Natural landmarks are landmarks that are not specifically engineered to be used as localization means for robots; they are already part of the environment of the robot. In indoor environments, examples of passive natural landmarks are doors, windows, and ceiling lights, whereas in outdoor environments roads, trees, and traffic signs are candidates.
describe walls in hallways or offices. Sensor output, for example from sonars, is then matched with these features. Model matching can be used to update a global map in a dynamic environment, or to create a global map from different local maps [81].

The representation of maps can differ: it can be either geometric or topological. Geometric maps represent the environment in a global coordinate system. Topological maps represent the environment as a network where nodes correspond to places in the environment and arcs between the nodes represent actions relating one location to another.
In this PhD thesis, grid representations of the environment, called occupancy grid maps, are employed. Occupancy grid maps, introduced in the 1980s by Moravec and Elfes [65], are a popular probabilistic approach to represent the environment. They are an approximate technique in which, for each cell of a discrete grid, the posterior probability that the corresponding area in the environment is occupied by an obstacle is calculated. The advantage of occupancy grid maps lies in the fact that they do not rely on any predefined features. Additionally, they offer constant-time access to grid cells and provide the ability to represent unknown (unobserved) areas, which is important in many applications (e.g., exploration tasks). Their disadvantages lie in potential discretization errors and high memory requirements.
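As a concrete illustration of this representation, the following minimal Python sketch implements the standard log-odds update for an occupancy grid; the class name, the resolution handling, and the increment values are illustrative assumptions, not details taken from this thesis.

```python
import numpy as np

# Log-odds increments for an observed hit/miss; illustrative values only.
L_OCC, L_FREE = 0.85, -0.4

class OccupancyGrid:
    """Minimal occupancy grid: each cell stores the log-odds that the
    corresponding area is occupied, so a Bayesian update is an addition
    and log-odds 0 encodes 'unknown'."""

    def __init__(self, width, height, resolution):
        self.res = resolution                      # meters per cell
        self.logodds = np.zeros((height, width))   # 0 = unobserved

    def update(self, x, y, occupied):
        """Integrate one observation of world point (x, y), in meters."""
        j, i = int(x / self.res), int(y / self.res)
        self.logodds[i, j] += L_OCC if occupied else L_FREE

    def probability(self, i, j):
        """Posterior occupancy probability of cell (i, j)."""
        return 1.0 - 1.0 / (1.0 + np.exp(self.logodds[i, j]))
```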
2.2 Monte Carlo Localization
Require: χ_{t−1} = {⟨p_{t−1}^i, w_{t−1}^i⟩ | i = 1, …, n}, representing the belief Bel(x_{t−1}); N_min; N_max; control measurement u_{t−1}; observation z_t
Ensure: χ_t
1:  Initialization
2:  χ_t = ∅
3:  η_χ = 0
4:  while η_χ ≤ i do
5:    Resampling
6:    Sample an index j from the discrete distribution given by the weights in S_{t−1}
7:    Next-state prediction
8:    Sample p_t^i from p(x_t | x_{t−1}, u_{t−1}) conditioned on p_{t−1}^j and u_{t−1}
9:    Update
10:   w_t^i = p(z_t | p_t^i)            // compute importance weight
11:   α = α + w_t^i                     // update normalization factor
12:   S_t = S_t ∪ {⟨p_t^i, w_t^i⟩}      // insert sample into sample set
13:   i = i + 1
14:   KLD criterion
15:   η_χ = KLD(χ_t, N_min, N_max)
16:   if η_χ > i then
17:     break
18: [p̂_t] = DT_clustering(χ_t)
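For readability, below is a schematic Python sketch of one sampling-importance-resampling iteration corresponding to the pseudocode above; motion_model and sensor_model are placeholders for p(x_t | x_{t−1}, u_{t−1}) and p(z_t | x_t), and the KLD-based adaptation of the number of particles is omitted.

```python
import numpy as np

def mcl_step(particles, weights, u, z, motion_model, sensor_model, rng):
    """One prediction/update/resampling cycle of Monte Carlo Localization.

    particles: (n, 3) array of poses (x, y, theta); weights: (n,) array.
    motion_model(pose, u, rng) samples from p(x_t | x_{t-1}, u).
    sensor_model(z, pose) evaluates p(z_t | x_t).
    """
    n = len(particles)
    # Resampling: draw indices according to the previous importance weights.
    idx = rng.choice(n, size=n, p=weights / weights.sum())
    # Next-state prediction: propagate each drawn particle through the motion model.
    predicted = np.array([motion_model(particles[j], u, rng) for j in idx])
    # Update: importance weights from the observation likelihood.
    new_weights = np.array([sensor_model(z, p) for p in predicted])
    return predicted, new_weights / new_weights.sum()
```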
2.3.2 The Three-State Multirobot Collaborative Localization (3SMCL)
The position measurements received from the other rovers at time k are collected in the matrix

$$h_i(k) = \begin{bmatrix} \hat{x}_i^{1}(k) & \hat{y}_i^{1}(k) \\ \vdots & \vdots \\ \hat{x}_i^{|R_i(k)|}(k) & \hat{y}_i^{|R_i(k)|}(k) \end{bmatrix} \qquad (2.1)$$
Each row of (2.1) contains a hypothesis on the position of the i-th rover, expressed in Cartesian global coordinates. The set of all the measurements received by the i-th rover up to time k is then defined as H_i^k = {h_i(1), …, h_i(k)}.

The algorithm has been conceived as a finite state machine with three states: 1) GL = global localization, 2) UN = undecided, and 3) PT = position tracking. This three-state algorithm, called 3SMCL, is an upgraded version of the multi-robot localization algorithm originally proposed in [11], which was based on two states only, but applied to a completely symmetric environment where absolute heading measurements are only occasionally available.

At the beginning of the execution of the algorithm, each rover is in the first state, as no information is available about its initial position, and hence it has to be globally localized with respect to the map. When the number of feasible hypotheses about its actual position becomes small (as detailed in the next subsection), the rover enters the UN state. Finally, it switches to the PT state when it is supposed to be correctly localized with a high degree of confidence. If a sudden increase in the localization error is detected, due for instance to kidnapping, failures in proprioceptive sensors (e.g., a broken wheel encoder), or changes in the map, the algorithm may switch back to the UN state.
The algorithm

We now describe the core of the 3SMCL algorithm, which runs onboard each rover. The algorithm outlined in Figure 2 is organized as a finite state machine (FSM) with three states. The first state is the default initial state, active when the rover is in the GL state (lines 17-60). When the accordance function defined in (2.5) falls below a certain threshold, the algorithm enters the UN state; this state indicates that there are at most two relevant position hypotheses, but the algorithm is not yet able to decide which hypothesis has the highest likelihood. When one hypothesis markedly prevails, i.e., when the localization performance is sufficiently accurate, the algorithm changes its state to PT.

Ideally, once the PT state is reached, the rovers should never switch back to the UN state. However, the algorithm keeps monitoring the localization performance, and in case of degradation it switches back to the UN state.
The algorithm is based on particle filters [86]. Observing the pseudo-code in Algorithm 2 relative to the GL and UN states, one can notice the typical prediction phase at line 19 and the update phase at lines 21-22. The prediction phase computes the vector p_i(t) containing the predicted pose (in global coordinates (x, y, θ)) for each particle, while the purpose of the update phase is twofold: it produces the vector w_i(t) containing the importance factors of the particles, and it verifies whether the matrix h_i(k) contains position estimates outside the map; if this is the case, such estimates are weighted using a bivariate normal distribution. Then, at line 23, the algorithm verifies whether it has received a vector of measurements h_i(k) from other rovers of the set R_i(k) at time k. If R_i(k) is empty, a classic Kullback-Leibler Divergence (KLD) resampling occurs (see [86]); N_min and N_max are respectively the lower and upper bounds on the number of particles N_kld employed in the resampling algorithm. If instead R_i(k) is not empty, a further improved version of the KLD resampling is used (line 29). The idea is to exploit the relative Cartesian position measurements (contained in h_i(k)) that the i-th rover receives from the other rovers of R_i(k) to propagate information about the few asymmetries of the environment. Instead of employing all the available particles (N_max − N_kld) to approximate each belief contained in h_i(k), in the solution proposed here we adopt the Kullback-Leibler Divergence to compute the number of particles needed to approximate each position hypothesis contained in h_i(k), and we propose an original resampling algorithm, called Mutual KLD, outlined in Algorithm 3.
In line 1 the number n_hyp of hypotheses received is calculated as the cardinality of the set of elements contained in h_i(k). N_kld is the number of particles used to approximate the belief without taking into account the position hypotheses coming from the other rovers; this number is computed in lines 2-3. If it is greater than N_max − N_hyp, we reduce it to N_max − N_hyp, where N_hyp is the maximum number of particles that can in this case be used to approximate the probability distribution of the n_hyp position hypotheses.
Then, the number of particles N_mkld^i is computed for the i-th hypothesis according to

$$N_{mkld}^{i} = \frac{1}{2\epsilon}\,\chi^{2}_{k-1,\,1-\delta},$$

which is equation (13) in [38], where χ²_{k−1,1−δ} is the (1−δ)-quantile of a chi-square distribution with k − 1 degrees of freedom. This is the number of particles required to guarantee that, with probability 1 − δ, the Kullback-Leibler distance between the maximum likelihood estimate of the position hypothesis and the true distribution is less than ε.
Our claim is that we do not need to know the exact probability distribution of the incoming hypotheses, because they are only used to add approximate information on the position of the i-th rover; therefore we can accept an approximation of the probability distribution of these hypotheses. By increasing or decreasing the values of ε and δ, better or worse approximations of the distribution of each position hypothesis are obtained. This has an impact on the number of particles used to approximate
Algorithm 3 – Mutual KLD (final lines):

9:  S^t = ⋃_i S_i^t,  i = 1, …, n_hyp
10: N_kld = Σ_i N_curr^i,  i = 1, …, n_hyp
the final belief on the position of the rovers, as will be discussed later in Remark 1 with reference to the results of simulation test 4. Higher values of ε allow the use of potentially fewer than N_max − N_kld particles to approximate the belief of the rovers; hence the value of ε in the case of our Mutual KLD is higher than the value adopted in the standard KLD resampling. The value of ε can indeed be seen as a tunable parameter that changes the importance of the position hypotheses contained in h_i(k) on the final belief of the i-th rover. We do not focus here on general methodologies to calculate ε; we simply set this value to five times the value of ε used in the standard KLD sampling.
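A possible implementation of the per-hypothesis particle count is sketched below, assuming SciPy is available for the chi-square quantile; the function name and the optional cap are illustrative, not part of the thesis.

```python
from scipy.stats import chi2

def particles_for_hypothesis(k, epsilon, delta, n_limit_hyp=None):
    """Particles needed so that, with probability 1 - delta, the KL
    distance between the sample-based approximation of a position
    hypothesis and the true distribution stays below epsilon
    (the bound of equation (13) in [38])."""
    if k < 2:
        return 1
    n = int(chi2.ppf(1.0 - delta, df=k - 1) / (2.0 * epsilon))
    if n_limit_hyp is not None:
        n = min(n, n_limit_hyp)   # per-hypothesis cap (line 8 of Mutual KLD)
    return n
```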
In line 8, for each hypothesis we constrain the number of particles N_mkld^i to be at most equal to N_limitHyp, which is the maximum number of particles that can be reserved for each hypothesis. The new number of particles N_kld which approximates the belief of the rover is computed in line 10 of the algorithm in Figure 3 as the sum over i of all the N_curr^i.
The algorithm in Figure 3 can be seen as an extension of the adaptive resampling proposed in [38], since it adapts the sample set representing the belief of a rover using also information coming from the other rovers of the multirobot team. After this resampling phase, a classic density-tree clustering [39] (lines 4, 25, 30 of Figure 2) is always performed, providing a set of N_h hypotheses Φ_i(t) = {φ_i^j(t)}, j = 1, …, N_h, on the position of the i-th rover, among which the best hypothesis φ_i^best(t) is selected. Each hypothesis φ_i^j(t) consists of the predicted pose p_i^j(t), its covariance matrix Σ_i^j(t), and the associated weight W_i^j(t), representing its confidence level:

$$\phi_i^j(t) = \{\,p_i^j(t),\ \Sigma_i^j(t),\ W_i^j(t)\,\}. \qquad (2.2)$$

The best hypothesis at time t is defined as

$$\phi_i^{best}(t) = \arg\max_{W_i^j}\,(\Phi_i(t)) = \{\,p_i^{best}(t),\ \Sigma_i^{best}(t),\ W_i^{best}(t)\,\}. \qquad (2.3)$$

The average of the position hypotheses is

$$\bar{p}_i(t) = \sum_{j=1}^{N_h} \frac{p_i^j(t)}{N_h}. \qquad (2.4)$$

Switching rules among the three states are based on the following accordance function:

$$\mu_k = \sum_{q=k-n}^{k}\ \sum_{j=1}^{|R_i(q)|} \frac{\sqrt{\left(\hat{x}_i^j(q)-\hat{x}_i^{best}(t)\right)^2 + \left(\hat{y}_i^j(q)-\hat{y}_i^{best}(t)\right)^2}}{n\,|R_i(q)|}, \qquad (2.5)$$
where n is the length of the sliding window used to compute the average in (2.5).
If the rover is in the UN state and is verifying whether it can switch to PT, n is set equal to n_u2p. If the rover is in the UN state and is verifying whether it has to switch back to GL, n is set equal to n_u2g. The inner summation in (2.5) averages the distances between the elements of h_i(k) and the best position hypothesis of the i-th rover. The outer summation in (2.5) performs a moving average of length n on the results of the inner summation. Therefore μ_k measures the accordance between the actual belief on the position of the i-th rover and the average of the beliefs that the other rovers have on its position at time k.

When μ_k is lower than a certain threshold μ_u2p (empirically determined), the algorithm switches to PT. This phase is aimed at tracking the position of the rover over time, and it is implemented in a classic way (see [86]). μ_k is also computed during the position tracking phase: if μ_k becomes greater than a given threshold μ_p2u, the algorithm switches again to UN.

When the rover is in GL, it switches to UN if and only if the distance among the hypotheses defined in (2.4) is smaller than a certain threshold μ_g2u (see Figure 2, lines 50-52, and Figure 2.4).
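A schematic Python version of the accordance test and of the resulting state switching is sketched below; the threshold and window values are placeholders, since the thesis determines them empirically.

```python
import math
from enum import Enum

class State(Enum):
    GL = 1   # global localization
    UN = 2   # undecided
    PT = 3   # position tracking

def accordance(history, best_pose):
    """mu_k from Eq. (2.5): history holds the last n measurement sets
    h_i(q), each a list of (x, y) hypotheses received from other rovers."""
    xb, yb = best_pose
    n = len(history)
    mu = 0.0
    for h_q in history:                      # outer sum over the sliding window
        if h_q:
            inner = sum(math.hypot(x - xb, y - yb) for x, y in h_q)
            mu += inner / (n * len(h_q))     # inner average over |R_i(q)| elements
    return mu

def switch(state, mu_k, hyp_spread, mu_u2p=0.5, mu_p2u=1.5, mu_g2u=2.0):
    # Placeholder thresholds; empirically determined in the thesis.
    if state is State.GL and hyp_spread < mu_g2u:
        return State.UN   # few, close position hypotheses remain
    if state is State.UN and mu_k < mu_u2p:
        return State.PT   # one hypothesis clearly prevails
    if state is State.PT and mu_k > mu_p2u:
        return State.UN   # localization degraded (e.g., kidnapping)
    return state
```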
rovers simulation.
The simulator has also been improved by adding a simple simulated vision sensor
and the support for communication among rovers.
Two environments were considered. The first simulates a large logistic area (see Figure 2.3): the occupied black areas can be thought of as containers or similar bulky items stored by transport companies before distribution. The whole environment measures 80 × 65 m, the black areas are 20 × 10 m, and the corridors are 5 m wide. A small asymmetry is present in the upper left corner.

The second environment is the hallway considered in [39], reported here in Figure 2.5. We ran the 3SMCL algorithm in this environment in order to perform some comparisons with the approach proposed in [39].
Simulation test 1
In this test we analyze the robustness of the 3SMCL algorithm with respect to
random variations in the initial position of the rovers. The localization error of the
i-th rover is defined as
$$e_i^{\rho}(t) = \sqrt{\left(x_i(t)-\hat{x}_i^{best}(t)\right)^2 + \left(y_i(t)-\hat{y}_i^{best}(t)\right)^2} \qquad (2.6)$$
$$e_i^{\theta}(t) = \theta_i(t) - \hat{\theta}_i^{best}(t) \qquad (2.7)$$
The pose information of the best hypothesis is given by p_i^best(t) and can be extracted from φ_i^best(t), defined in (2.3).
We randomly initialize the poses of N_R = 6 rovers in free areas of the map, let them move according to a simple obstacle avoidance behavior, and monitor the localization error e_i^ρ(t) for i = 1, …, N_R up to t = 2500 s. We are interested in the average localization error over repetitions of the experiment, hence we repeat it several times, each time setting the initial positions of the rovers randomly, and we define ē_i^ρ(t) as the average of e_i^ρ(t) for the i-th rover over n_e realizations. The results are shown in Figure 2.6 for n_e = 100.

The localization error e_i^ρ(t), i = 1, …, N_R, decreases approximately linearly for all the rovers, and the mean error among all 6 rovers (dashed line in Figure 2.6) reaches a final value below 0.4 m. The 3SMCL algorithm is thus robust to variations in the initial positions of the rovers. This has an important impact on the application side, in particular for robotic applications in logistic spaces, since the algorithm does not require any particular initial formation of the rovers, avoiding human intervention to initially place the rovers in a specific area of interest.
We now define the first and the last switching times from one state to another during the experiments. The first switching time is the moment when a rover changes its state for the first time, while the last switching time is the moment when a rover changes its state and remains in the final state. Figure 2.7 shows the
comparison between the first and the last switching times between the UN and PT states and between the GL and UN states. The group of six bars on the left (group "1" on the x-axis) refers to the first switching time, while the group of six bars on the right (group "2" on the x-axis) refers to the last switching time. The first three bars of each group are relative to the switching from UN to PT, while the second three bars are relative to the switching from GL to UN. The blue bars indicate the minimum among the switching times, the red bars the maximum, and the green bars the average. Observing the second green bar of each group, which represents the average of respectively the first and the last switching time between the GL and UN states, we notice a big difference: the first switching time is around 70 s while the last switching time is around 500 s. This behavior is particularly positive, since rovers may switch to the UN state many times and at different moments during the localization process, thus avoiding false positives that may compromise the correct localization of the whole team. Observing the first green bar of each group, it can be noticed that the last switching time occurs only two minutes after the first switching time. This means that the algorithm does not bounce for a long time between the UN and the PT states.
Simulation test 2
This test has been designed to understand how the localization performance of the
3SMCL algorithm is affected by the number of rovers in the team, in terms of
Cartesian position error.
We define the average position error among the N_R rovers of the team over the r realizations of the experiment as

$$E_{N_R}^{r}(t) = \frac{1}{r}\sum_{j=1}^{r}\sum_{i=1}^{N_R}\frac{e_i^{\rho}(t)}{N_R}, \qquad (2.8)$$
where e_i^ρ(t) is the localization error defined in (2.6). The results of the simulations for N_R = 1, 3, 6, 9 are reported in Figure 2.8 and in Table 2.1.

It can be clearly seen that one rover is not sufficient to resolve the ambiguity in localization, and that three rovers are also not enough to assure reliable localization, since the rovers localize themselves correctly in only 98% of the trials. As soon as 6 rovers are employed, the localization error falls below 2.5 m after nearly
1000 s, and all the trials are successful; the results with nine rovers are comparable with those obtained with six rovers. Path planning becomes more effective when rovers can rely on their own position estimates, allowing them to accomplish the assigned tasks (e.g., handling hazardous events collaboratively) more reliably; the obtained results are therefore particularly relevant in practical applications. Of course, the exact number of rovers that ensures correct localization of all the members of the team depends on the size of the area where the rovers move. Future investigations will be devoted to studying the performance of the proposed algorithm with respect to variations of the ratio between the number of rovers and the area to be covered by the robot team.

Figure 2.9 finally shows the average first and last switching times from the UN to the PT state considering 3, 6 and 9 rovers. The switching times dramatically decrease passing from three rovers to six and remain nearly the same passing from six to nine. The performance of the 3SMCL algorithm in terms of switching time improves as the number of rovers in the team increases, since there is a clear increase in the speed with which the rovers reach the PT state.
Simulation test 3
This test is aimed at demonstrating that, once the rovers are all in the PT state,
the algorithm is robust even with respect to partial variations of the map. To show
this robustness, we have set up a case study where NR = 6 rovers are deployed in
the same logistic area of tests 1 and 2. After all the rovers have reached the PT state, a forklift is supposed to enter the logistic area in order to remove and add pallets. The forklift ideally moves at a constant speed of 1 m/s and randomly removes or adds a pallet in the map, taking 3 seconds to perform these operations. Tests have been performed with a decreasing occupancy percentage, starting from 90% occupancy in steady-state condition down to 50% occupancy, in steps of 10%. It is important to note that the information about the map variations is not communicated to the rovers; the challenge for the 3SMCL algorithm is therefore to maintain the PT condition and to keep the localization error low for all the rovers. Figure 2.10 shows the localization error e_i^ρ(t) for i = 1, …, 6, considering only one realization of the experiment. The first plot shows the reduction of the localization error as the rovers, in each test, reach the PT condition. On average, after approximately 700 s all the rovers in each test are in PT, and the algorithm that simulates the intervention of a forklift begins to modify the map.

Observing the second plot, which is simply a zoom of the first, we see that the error increases, but only from 0.4 m to 0.6 m. Therefore the PT state of the proposed algorithm can be considered stable with respect to random variations in the map.

Figure 2.10: Simulation test 3, case study with variations in the map.
Simulation test 4
We have applied the 3SMCL algorithm in localization experiments with the map shown in [39] and reported in Figure 2.5, in order to compare our results with those obtained in [39].
We performed three experiments: two of them with 3 and 6 rovers respectively, allowing the exchange of mutual information (i.e., running the 3SMCL algorithm), and a third one with 6 rovers and no information exchange; in this case the algorithm reduces to a classical MCL algorithm with KLD resampling.

The number of repetitions is 50, as in test 2. Figure 2.11 shows the average position error E_{N_R}^r(t) for N_R = 3, with mutual information exchange allowed, and for N_R = 6, with and without mutual information exchange. The position error obtained with three rovers is comparable with that obtained with six rovers and local communication allowed, and it is significantly reduced with respect to the experiment with six rovers and communication not allowed. These results are comparable with those obtained in [39], where at 600 s, in the absence of mutual exchange of information, a final error of more than 4 m was reported, while in our case it is less than 3 m. Considering instead the cases with mutual exchange of information, the final error is decreased to less than 1.2 m (while in [39] it was almost 2 m). It must be noted that in both cases ultrasound sensors were used, but the results reported in [39] are relative to eight rovers and averaged over 8 experiments only.

The proposed algorithm shows only a slight improvement in localization time with respect to [39]. From our point of view, the main improvement of our work is the possibility of checking whether the rovers are well localized, i.e., whether they have all reached the PT state. In particular, in all these experiments all rovers reach the PT state; the switching times are reported in Figure 2.12. We notice again
that the switching times on average decrease significantly passing from 3 to 6 rovers. Considering in particular the experiment with 6 rovers, the last switching time is approximately 500 s on average, and at that time the average position error is below 1.5 m. A remarkable result is that even as the number of rovers increases, they do not become overconfident about their beliefs: the PT state is always reached when the rovers are correctly localized.
Experimental tests
We also tested the proposed algorithm in the real world using real robotic plat-
forms. The environment is 20 × 12 m and possesses only little asymmetries. We
use a team of three Pioneer P3DX endowed with ultrasound sensors, SICK LMS200
laser rangefinders and low cost monocular vision sensors. Each rover of the team
29
2 – Localization
is able to identify the others when they are in a certain field of view, using bar-
codes as identificators and vision sensors to detect the barcodes. Figure 2.14 shows
the rovers team, while Figure 2.15 reports a grid representation of the site where
the experiments have been carried out, showing in particular the minor differences
(asymmetries) that distinguish the sides of the environment.
The first test (A) has been performed using two rovers, to show that a well-localized rover can speed up the localization of another that is still uncertain about its position. One rover is positioned near the main asymmetry of the environment (R1 in Figure 2.14), while the other is positioned in a place where there are no discriminating features that can ensure fast localization (see the same figure). The initial headings of the rovers are indicated by the arrows in Figure 2.14.

Figure 2.15: The map of the site where the localization experiments have been carried out.

The result of the test is reported in Figure 2.16, where the red plot is relative to rover R1 and the blue plot to rover R2. The sudden decrease in the localization error for rover R2 at 116 s is due to the fact that at a certain point during the experiment the rovers detect each other and their beliefs are mutually influenced. The overall result is that the well-localized rover (R1) speeds up the localization of the other rover (R2). This simple example demonstrates that the mutual exchange of position information among rovers using the proposed algorithm can significantly speed up the localization process in real experiments.
The second test (B) is a classical kidnapping test. The two rovers are randomly placed in the environment and, as shown in Figure 2.17, one rover (blue plot) is kidnapped at approximately 500 s; after 150 s it is localized again with the help of the other. Differently from test (A), which only demonstrates that localization can be sped up using the 3SMCL algorithm, here we notice that the belief of a not well localized
$$E_{N_R}^{j}(t) = \sum_{i=1}^{N_R}\frac{e_i^{\rho}(t)}{N_R}, \qquad j = 1, \ldots, r. \qquad (2.9)$$
In this experiment we observed that the first and last switching times are the same,
so the rovers do not switch back from the PT state to the UN state. On average
these switching times are around 450 s.
2.4 Localization with sensor fusion and markers detection

2.4.1 IMU-corrected odometry
(cite graph-SLAM) We use the displacement from wheel odometry, since it is usually accurate enough, and correct the heading using measurements from the IMU's gyroscope, since heading estimation from wheel odometry is usually poor due to wheel slip. The IMU used is a medium-cost 3-axis commercial sensor (XSens MTi) commonly used in mobile robotics applications. We first experimented with the heading estimation given by the internal Extended Kalman Filter running onboard the sensor, but we found that it degraded badly when used onboard the robot. This is due to the fact that the magnetometers are heavily affected by magnetic disturbances in the environment: since the IMU is placed on a metallic platform, the heading information is always biased. For this reason, we chose to use the gyroscopes only.

When the robot is not moving, angular velocity measurements are used to estimate the gyroscope bias with a moving average.
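The following Python sketch illustrates this fusion scheme: translation from wheel odometry, heading from the integrated bias-corrected gyro yaw rate; the class name and the bias-window length are illustrative assumptions, not values from the thesis.

```python
import math

class ImuCorrectedOdometry:
    """Sketch of IMU-corrected odometry: displacement from wheel odometry,
    heading from the gyroscope after subtracting a moving-average bias
    estimated while the robot stands still."""

    def __init__(self, bias_window=200):
        self.x = self.y = self.theta = 0.0
        self.bias = 0.0
        self.samples = []            # gyro readings collected while stationary
        self.bias_window = bias_window

    def update_bias(self, gyro_z):
        """Call while the robot is not moving: moving-average bias estimate."""
        self.samples.append(gyro_z)
        self.samples = self.samples[-self.bias_window:]
        self.bias = sum(self.samples) / len(self.samples)

    def step(self, d_forward, gyro_z, dt):
        """d_forward: displacement from wheel odometry over dt seconds."""
        self.theta += (gyro_z - self.bias) * dt   # IMU-corrected heading
        self.x += d_forward * math.cos(self.theta)
        self.y += d_forward * math.sin(self.theta)
        return self.x, self.y, self.theta
```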
phase of the algorithm, which integrates marker detection into the particle filter. We integrated the result of marker detection into our localization algorithm considering two levels of integration: a loose integration and a tight integration.

In the loose case, detected markers directly provide a robot pose hypothesis, with a fixed covariance error; in the tight case, the detected markers are used as additional inputs for the particle filter-based localization.

In tight coupling, the position estimates come both from the laser range finder and from the marker detection algorithm described before. At each time instant t, for each particle we have a measurement z^(i)(t) that combines the N_r rays of the laser range finder with the output of the marker detection algorithm. If a laser measurement occurs, the update phase is the usual one. If an output from the marker detection algorithm is provided, particles are weighted using the pose estimated from it, with a fixed covariance.
based on the use of a Microsoft Kinect camera. In particular, markers are first detected using the ARToolKit library, and their pose is then estimated by searching for square planar areas at the corresponding positions in the 3D point cloud. The node transforms the 3D poses of the recognized ARTags from the coordinate frame of ARToolKit to the coordinate frame of the map used for localization. A number of checks is performed in order to discard false positives, which are very common in difficult lighting conditions such as closed indoor environments. In particular, all detections beyond a range of 2.5 m are discarded, as well as spot detections and marker poses which are not parallel to the walls and floor (i.e., presenting large roll and pitch angles).
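The checks just described can be summarized by a small validation routine such as the following sketch; the tilt tolerance and the persistence test are assumptions, as the thesis does not report exact values for them.

```python
import math

MAX_RANGE = 2.5                 # meters, as stated above
MAX_TILT = math.radians(15)     # roll/pitch tolerance; assumed value

def is_valid_detection(distance, roll, pitch, seen_in_previous_frame):
    """Reject far detections, strongly tilted marker poses, and one-off
    'spot' detections that do not persist across frames."""
    if distance > MAX_RANGE:
        return False
    if abs(roll) > MAX_TILT or abs(pitch) > MAX_TILT:
        return False
    return seen_in_previous_frame
```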
In order for the node to work, two things are needed:
• The appearance and the pose of the markers in the map have to be written
into a configuration file
The localization algorithm that we use is a modified version of the amcl ROS package. In our version of the algorithm, global localization is also possible, as in the approach we developed and described in Section 2.3.2. We use the likelihood field laser sensor model for the particle update, since it is faster and represents sensor readings better than the classical beam model, is smooth with respect to small changes in robot position, and is better suited for small obstacles.
When the robot detects a marker, the correct pose of the robot in the map is estimated relative to the marker. In the case of loose integration, AMCL is re-initialized around this pose (with a suitable covariance, in order to deal with visual detection uncertainties). In the case of tight integration, the particles are weighted according to a Gaussian sensor model around the estimated pose, with a fixed covariance.
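A sketch of the tight-integration update is shown below: each particle is reweighted with a Gaussian likelihood centered on the pose recovered from the marker; the fixed covariance value is an assumption, not the one used in the thesis.

```python
import numpy as np

def reweight_with_marker(particles, weights, marker_pose, cov):
    """Weight each particle with a Gaussian likelihood centered on the
    robot pose estimated from the detected marker (tight integration).

    particles: (n, 3) array of (x, y, theta); marker_pose: length-3 pose;
    cov: fixed 3x3 covariance assumed for the detection."""
    inv_cov = np.linalg.inv(cov)
    new_w = np.empty(len(particles))
    for i, p in enumerate(particles):
        d = np.asarray(p, dtype=float) - np.asarray(marker_pose, dtype=float)
        d[2] = (d[2] + np.pi) % (2 * np.pi) - np.pi   # wrap heading difference
        new_w[i] = weights[i] * np.exp(-0.5 * d @ inv_cov @ d)
    return new_w / new_w.sum()
```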
It should be noted that the choice of the AMCL parameters is critical for achieving low localization error, in particular in challenging scenarios, and good localization is crucial for the subsequent path planning. The most important parameter values used in our modified ROS implementation of the AMCL algorithm are reported in Table 2.2. The error in laser readings caused by the metal grids has been modeled in a simple way by raising the laser_z_hit value of the likelihood field sensor model; the higher laser_z_rand value accounts for the presence of glass-covered racks. We experimentally found that a maximum of 10000 particles is enough for reliable global localization, and a minimum of 500 is enough for modeling the robot pose during position tracking.
Table 2.2. Parameters used for our implementation of the AMCL ROS node.

    Parameter           Value
    -------------------------
    max_particles       10000
    min_particles       500
    laser_z_hit         0.5
    laser_z_rand        0.5
    update_min_d        0.1
    update_min_a        0.25
    resample_interval   1
2.4.4 Results
The approach was first tested in a corridor-like environment at Telecom Italia Lab, using a Coroware Corobot Classic robotic platform with a Microsoft Kinect sensor and an XSens MTi IMU. During preliminary experiments two issues showed up:

• markers not being detected, due to their bad placement in the environment
• false marker detections

In a subsequent implementation of the algorithm, false detections were ruled out by implementing a number of checks on the placement of the markers: for instance, markers are only accepted within a range of 2.5 m, since the accuracy of the Kinect sensor degrades at larger distances; moreover, marker detections which are not aligned with the walls are discarded. Marker placement is crucial, as markers need to be placed at the end of long symmetric corridors and should face the camera mounted on the robot, in order to be correctly detected. From the results we saw that the robot is able to recover its pose estimate at the end of the longer corridors most of the time.
Figure 2.20 shows an example of an experiment in this environment. In 2.20(a) the robot is travelling along a corridor with few asymmetries (open doors); moreover, many of the doors in the corridor are in a different state (open, closed) than in the map. The real robot position in the environment is shown as a blue square in the simulator (right), while the position estimated by the localization algorithm is represented by the red particles (left). It can be seen that the estimated position is incorrect, due to inaccurate odometry and the lack of map features. An ARTag is attached to the wall in the corridor in front of the robot, and its pose (position and orientation) inside the map of the environment is known (it has been measured by hand). In 2.20(b), as the robot moves forward, it is able to detect and recognize
the marker using the onboard Kinect sensor. The pose of the marker is estimated relative to the robot, and the pose of the robot relative to the map is retrieved. In this example we use the loose integration, and the particle filter is re-initialized around the estimated robot pose with a suitable covariance, in order to take detection errors into account. In 2.20(c) the robot continues to move along the corridor and is able to recover a correct pose estimate.
Another test was performed inside the corridors of the data center. Although the Corobot platform was able to localize correctly even without markers, two markers were placed at key points of the environment to improve robustness and implement a fail-safe service.
In a third experiment, we tested localization and path planning performance in another data-center room, with a previously created map. This particular scenario presents some challenges for localization, such as its symmetry and irregular surfaces. Data-center areas are organized as regular grids, with rows of racks and corridors. In particular, server racks are usually covered with metallic grids, which introduce large errors in range measurements, as laser rays pass through the grids. Moreover, the symmetry of the environment poses a challenge for loop-closing detection.
A path composed of a certain number of waypoints was created. The tasks associated with each waypoint were thermal camera image acquisition and temperature measurement. Figure 2.21 shows the results of a typical experiment. It can be noted that the robot correctly localized itself and was able to follow the given path.
We measured the localization accuracy at different points inside the map. The ground-truth for each reference point (position and orientation) was measured by hand. Table 2.3 shows that the average position and rotation errors are higher than ..., but are still adequate for the subsequent path planning, as shown in the next experiment.
2.5.1 Introduction
Localization and map building are closely linked problems, often referred to jointly as Simultaneous Localization and Mapping (SLAM) (see Chapter 3). While building maps when the robot poses are known is a tractable problem with limited computational complexity, in the SLAM case the uncertainty in the measurements and in the robot pose estimates makes the problem much more complex in terms of data association and computational requirements. Moreover, the vast majority of the effort has been devoted to the mapping of static environments.
Many real applications require up-to-date maps of an environment that varies over time, starting from a given initial condition. This is for instance the case of robotic applications in logistic spaces, where robots have to track the presence of goods in a certain area. The goods are stored in appropriate places, but during the day they can be removed and substituted by other items many times.
This work deals with the problem of keeping the map updated, in order to guarantee the robots' localization; specific goods-tracking procedures are not investigated.
The initial map can be a-priori known or built using a classical SLAM algorithm.
In the latter case, the environment is supposed to be static during the initial SLAM
process. Then the environment starts to change, and the map needs to be updated,
as modifications are sensed by the robot.
During the map update process, there are mainly two issues to be considered:
• the map updating process has to remain consistent over long periods of operation, without diverging;
• the algorithm performing the map update has to be computationally light and to use limited memory.
The first item is mandatory since in real applications map variations may occur
continuously for a long period of time (even an entire day), and the map updating
process should be carried out as long as possible without diverging.
The second item is very important when the map updating process has to be carried
out in parallel with other tasks, (e.g., team coordination and planning, surveillance).
A combined algorithm for map update and robot localization is proposed in [90], while a method for updating the map dynamically during the process of localization is developed in [63]. This method seems very promising, but its experimental validation is still only partial. The problem of grid mapping through belief propagation is investigated in [76], assuming that good estimates of the robot position are available. A spatial Markov random field model is applied to minefield mapping. Two things are required: a list of the types of objects that populate the environment, and a-priori knowledge about where the objects are positioned.
Finally [18] introduces a dynamic map representation for mobile robots that adapts
continuously over time. It solves the stability-plasticity dilemma (the trade-off be-
tween adaptation to new patterns and preservation of old patterns) by representing
the environment over multiple timescales simultaneously.
In this Section we propose a methodology that is able to:
• detect persistent variations in the environment;
• build local maps of the areas where variations occurred;
• merge the local map with the global one used for localization.
The variations of the environment are detected using a technique called weighted
recency averaging, while the local maps are merged employing the Hough transform.
Let the estimated robot pose at time t be denoted by

p̂(t) = {x̂(t), ŷ(t), θ̂(t)}.  (2.11)
By the expression correctly localized robot we indicate a robot that is in the position
tracking state, as defined in [11] and [12].
An occupancy grid map of the environment (used in the localization algorithm to
track the robot position over time) is available to the robot. Such a map could have
been manually created or previously built by a SLAM algorithm.
At discrete instants k the environment changes, and consequently the robot has to
modify its map, to take into account the variation. We call this phase a ∆-mapping
step.
We define the set of new maps collected up to time K as

M(K) = {M_k}, k = 0, . . . , K.
M0 is the initial map, obtained by the SLAM procedure. The goal of the algorithm
developed in the next section is to provide an estimate M̂k of the map at each time
step k.
∆-Awareness
The ∆-Awareness block detects persistent variations in the environment, using a
technique called weighted recency averaging, which is normally applied in problems
of tracking non-stationary processes.
Figure 2.23. Comparison between the trend of w_avg(t) over time when no modifications in the map occur (a) and when the robot passes near a variation (b).
Store Scan
The purpose of the Store Scan block is to select the laser scans suitable for building the local updated sub-maps. These scans are stored in a set called S(k).
In order to maximize the probability of storing scans when modifications in the map have occurred, whenever r_d(t) > 0 a scan is stored if v < r_d(t), where v is a uniformly distributed random variable U(0,1); i.e., scans are selected with probability r_d(t).
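A minimal sketch of the ∆-Awareness and Store Scan logic follows (C++). It assumes, as in the classical augmented-MCL recovery heuristic, that w_fast(t) and w_slow(t) are exponential moving averages of the average particle weight with α_fast > α_slow, and that (2.13) has the form r_d(t) = 1 − w_fast(t)/w_slow(t); the smoothing constants are illustrative:

#include <random>

struct DeltaAwareness {
    double wFast = 0.0, wSlow = 0.0;
    double alphaFast = 0.1, alphaSlow = 0.01;   // alphaFast > alphaSlow (illustrative)
    std::mt19937 gen{std::random_device{}()};
    std::uniform_real_distribution<double> uni{0.0, 1.0};

    // Divergence ratio r_d(t): positive when the short-term average weight
    // drops below its long-term trend, i.e. a persistent variation.
    double update(double wAvg) {
        wFast += alphaFast * (wAvg - wFast);
        wSlow += alphaSlow * (wAvg - wSlow);
        return wSlow > 0.0 ? 1.0 - wFast / wSlow : 0.0;
    }

    // Store Scan block: when r_d(t) > 0, keep the scan if v < r_d(t),
    // with v ~ U(0,1), i.e. with probability r_d(t).
    bool storeScan(double rd) { return rd > 0.0 && uni(gen) < rd; }
};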
Figures 2.24 and 2.25 clarify, respectively, what happens when there are no modifications in the map and when the robot perceives a variation in the environment. In Figure 2.24 the values taken by w_fast(t) are higher than those taken by w_slow(t); from (2.13) it follows that r_d(t) is negative (in Figure 2.24 it has been set to zero) and the ∆-mapping process does not begin. In Figure 2.25 the values assumed by w_fast(t) are lower than those assumed by w_slow(t), and hence r_d(t) is greater than zero.

Figure 2.24. Trend of w_avg(t), w_fast(t), w_slow(t), r_d(t) in the absence of modifications in the environment.

Figure 2.25. Trend of w_avg(t), w_fast(t), w_slow(t), r_d(t) in the presence of modifications in the environment.
The ∆-mapping process begins when rd (t) becomes positive for the first time, and
ends when rd (t) falls again below zero.
Unfortunately, we found that this method suffers from two problems. First, even if the environment does not change, the divergence ratio may increase in specific situations, such as when the rotational velocity of the robot is greater than zero, triggering the ∆-mapping process even though the environment has not changed.
The second problem is that, even if the ∆-mapping process has correctly begun due to a variation in the environment, the quality of the scans acquired by the laser rangefinder may be insufficient.
The accuracy of the scans is related to the precision of the robot pose estimate, in particular to its heading estimate. A significant source of inconsistency is the heading variance which, if large, can cause divergence in just a few updates. In our case, scans acquired with a wrong heading may lead to badly aligned sub-maps. Moreover, the problem of low-quality scans is related to the first issue highlighted above, i.e., the robot rotating: it frequently happens that the robot encounters map variations while turning, hence when its rotational velocity is greater than zero. In this situation, even if the map has changed, some acquired scans may have a wrong heading and should therefore be discarded.
The proposed solution for properly discarding such scans is discussed in the next subsection.
The local degeneracy of the particle filter algorithm is measured by the effective sample size N_eff of the particle filter, defined as

N_eff = N_s / (1 + Var(w_t^{*i}))  (2.14)

where N_s is the number of particles and w_t^{*i} is the true weight. N_eff cannot be exactly calculated, but an estimate can be obtained by computing

N̂_eff = 1 / Σ_{i=1}^{N_s} (w_t^i)²  (2.15)
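As a small sketch (C++), the estimate (2.15) is a one-line computation over the normalized particle weights:

#include <vector>

// Effective sample size estimate (2.15): N_eff ≈ 1 / sum_i (w_i)^2,
// computed from the normalized weights; low values signal degeneracy.
double effectiveSampleSize(const std::vector<double>& w) {
    double sumSq = 0.0;
    for (double wi : w) sumSq += wi * wi;
    return sumSq > 0.0 ? 1.0 / sumSq : 0.0;
}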
A relation has been empirically found between the local degeneracy of the particle filter and the moments in which the robot is turning. Three cases have been considered to investigate such a relation. In the first one we compute the values of N_eff while the robot is going straight and some environment changes occur. In the second one the robot turns with a constant rotational velocity (without any environmental modification), while in the third one it follows the blue (not straight) path shown in Figure 2.26. The results of computing N_eff in the three cases are reported in the corresponding plots.

Figure 2.26. Path followed by the simulated robot to evaluate N_eff in the third case. On the left, the simulated robot and environment; on the right, our graphical user interface.
Scan Alignment
The Scan Alignment block produces a local map by performing a consistent registration of the collection of scans contained in S(k). The approach maintains all the local frames of data as well as the relative spatial relationships between local frames, modeled as random variables and derived from matching pairwise scans or from odometry. All the spatial relations are then combined using the Rmap algorithm (see [5]).
The output of this block is a consistently aligned map S_k.
Map Merge
The Map Merge block receives the aligned map S_k provided by the Scan Alignment block and merges it with M̂_{k−1}, which is the map the robot is currently using for localization. The output of this merge process is M̂_k. We adopted the algorithm proposed in [22], whose theoretical foundations are briefly recalled in the next subsection.
Local maxima in the cross correlation of the Hough spectra reveal how M2 should be rotated in order to align it with M1. The proposed algorithm therefore extracts a set of n local maxima (n being a specified parameter), and returns n candidate transformations.
Given a candidate rotation φ_i, the corresponding translations ∆x_i, ∆y_i can in principle be easily determined. Let M3 be the map obtained by rotating M2 by φ_i, i.e.,

M3 = T(0, 0, φ_i) M2.  (2.17)

The translations needed to overlap M3 onto M1 can be obtained by computing the bidimensional correlation in the following way. First we compute the X-spectrum and the Y-spectrum of a binary image M, having r rows and c columns, as

SX_M(j) = Σ_{i=1}^{r} M(i, j)  if 1 ≤ j ≤ c,  0 otherwise  (2.18)

SY_M(i) = Σ_{j=1}^{c} M(i, j)  if 1 ≤ i ≤ r,  0 otherwise  (2.19)
Given SX_{M1} and SX_{M3}, ∆x_i can be easily inferred by looking at the global maximum of the cross correlation between them, defined as

CCX_{M1,M3}(τ) = Σ_{l=−∞}^{+∞} SX_{M1}(l + τ) SX_{M3}(l)  (2.20)
Similarly to the case of correlation between Hough spectra, multiple local maxima
may emerge when computing the cross correlation between X-spectra. Each of the
maxima is associated with a candidate translation to align the two maps and can
be individually tracked.
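A minimal sketch of this step (C++, on a dense binary grid; the brute-force search over shifts is illustrative, not the actual implementation):

#include <vector>

// X-spectrum (2.18) of a binary occupancy image: column-wise sums.
std::vector<double> xSpectrum(const std::vector<std::vector<int>>& M) {
    std::size_t r = M.size(), c = r ? M[0].size() : 0;
    std::vector<double> sx(c, 0.0);
    for (std::size_t i = 0; i < r; ++i)
        for (std::size_t j = 0; j < c; ++j) sx[j] += M[i][j];
    return sx;
}

// Shift tau maximizing the cross correlation (2.20) between two spectra,
// searched in [-maxShift, maxShift]: it gives the candidate translation
// Delta-x aligning M3 with M1 (the y translation is found analogously).
int bestShift(const std::vector<double>& s1, const std::vector<double>& s3,
              int maxShift) {
    int best = 0;
    double bestCC = -1.0;
    for (int tau = -maxShift; tau <= maxShift; ++tau) {
        double cc = 0.0;
        for (int l = 0; l < static_cast<int>(s3.size()); ++l) {
            int idx = l + tau;
            if (idx >= 0 && idx < static_cast<int>(s1.size()))
                cc += s1[idx] * s3[l];
        }
        if (cc > bestCC) { bestCC = cc; best = tau; }
    }
    return best;
}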
Figure 2.28. The map used to validate the Map Merge block, used also in the simulation tests.

Figure 2.29. The two parts of the map in Figure 2.28 used to validate the Map Merge block.
For each submap we generated every possible x translation in the range [−10; 10] pixels with a resolution of 1 pixel; for every x translation, every possible y translation in the range [−10; 10] pixels with a resolution of 1 pixel; and for every y translation, every possible rotation in the range [0; 360] degrees with a resolution of 1 degree. To evaluate the performance of the merging procedure, given two maps M1 and M2 we consider the following simple acceptance index:
w(M1, M2) = 0 if agr(M1, M2) = 0;  agr(M1, M2) / (agr(M1, M2) + dis(M1, M2)) otherwise  (2.21)

where agr(M1, M2) is the number of cells in which the two maps agree (both free or both occupied), and dis(M1, M2) is the number of cells such that M1 is free and M2 is occupied, or vice-versa. Notice that only free or occupied cells are considered, while unknown cells are ignored.
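A minimal sketch of the acceptance index (2.21) over two grid maps stored as flat arrays (C++; the Cell encoding is an assumption):

#include <vector>

enum Cell { FREE, OCCUPIED, UNKNOWN };

// Acceptance index (2.21): agreement over agreement plus disagreement;
// cells that are unknown in either map are ignored.
double acceptanceIndex(const std::vector<Cell>& m1, const std::vector<Cell>& m2) {
    int agr = 0, dis = 0;
    for (std::size_t i = 0; i < m1.size() && i < m2.size(); ++i) {
        if (m1[i] == UNKNOWN || m2[i] == UNKNOWN) continue;
        if (m1[i] == m2[i]) ++agr; else ++dis;
    }
    return agr == 0 ? 0.0 : static_cast<double>(agr) / (agr + dis);
}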
We considered four possible merging algorithms. The first one (MA1) is the one presented in Section 2.5.3, while the other three are “robust” versions of the first one.
The second one (MA2) evaluates additional candidate solutions by rotating the ones computed by MA1 in the range [−0.5; 0.5] degrees with a step of 0.25 degrees. The third one (MA3) evaluates additional candidate solutions by rotating the ones obtained by MA1 in the range [−1; 1] degrees with a step of 0.25 degrees. In the fourth one (MA4) a further translation is added: the additional candidate solutions are determined by rotating the MA1 solution in the range [−0.25; 0.25] degrees with a step of 0.25 degrees, and adding to each rotation a translation of +1 and −1 pixel.
The results in terms of acceptance and computation time of the four merging algorithms are reported in Table 2.4.
For each method we averaged all the acceptance indexes obtained for each rotation (and also each translation for MA4). The computation time for each method was computed as the average of all the single computation times for each rotation/translation. The Map Merge validation was performed on a 2.5 GHz Intel Core 2 Duo platform with 2 GB of RAM.
After evaluating the accuracy of the four proposed methods and their computation time, we chose to use the MA1 method in the Map Merge block, because its acceptance is comparable with that of the other methods, while its computation time is lower.
The results presented hereafter are related to n_r = 10 averaged runs, and the ∆-mapping updating process lasts approximately two hours in each run.
We define the localization error of the robot as the distance between the ground-truth Cartesian position (x^{gt}(t), y^{gt}(t)) and the Cartesian position estimate given by (2.11):

e_ρ(t) = √( (x^{gt}(t) − x̂(t))² + (y^{gt}(t) − ŷ(t))² ).  (2.22)
The localization error is shown in Figure 2.30. It remains lower than 0.5 m, except for a period of time in which it increases; this is due to a loss of localization by the rover in one of the runs. However, the robot quickly recovers a correct localization.
Table 2.5 shows the acceptance index (2.21) averaged over each single run, along with the number of detected variations per run.
The number of detected variations in each run depends on the path followed by the robot. In this work the robot wanders in the environment, and there is no active mechanism to ensure the detection of all the variations occurred in the environment. Future work will be devoted to enhancing the motion strategy. The average quality of the map is comparable with the quality of a map obtained by a Rao-Blackwellized
SLAM process in static conditions, since in that case the value of the index obtained
is 0.98.
In another test, we performed a ∆-mapping process lasting approximately seven hours, for a total number of 420 variations, to check the long-term performance. Figure 2.31 shows the status of the grid map after a few variations and after many variations, respectively. The map shown in Figure 2.31(c) contains open squares because sometimes the robot may not be able to completely map a variation. Moreover, the a-priori knowledge of the geometric aspect of the elements inside the environment (e.g., the shape of the goods) is not used to complete the final map obtained by the proposed procedure.
Figure 2.32 shows the localization error of the simulated rover. Figure 2.38(b) shows the acceptance index (in the range 0-1, from worst to best) related to the map quality over time. After seven hours of operation the rover localization error remains acceptable (below 1 m), and the quality of the map is still comparable with the quality of a map obtained by a Rao-Blackwellized SLAM process.
Figure 2.31. The initial map (a), the map after a few variations (b), and the map at the end of the ∆-mapping process (c).
At discrete instants k the environment changes, and consequently the robots have to modify their map to take into account the variation. This phase is called a ∆-mapping step.
The set of new maps collected up to time K is defined as

M(K) = {M_k}, k = 0, . . . , K.
M0 is the initial map, obtained by the SLAM procedure. The goal of the devel-
oped algorithm is to provide for each robot an estimate M̂k of the map at each time
step k. In order to take advantage of the multi-robot scenario these updated maps
must be spread among the other robots, and this information has to be merged in
order to create a map that is a good estimate of the current state of the environment.
Figure 2.33. The trend of the quality of the map over time.
The guidelines of the proposed approach are described hereafter, whereas details about the specific processes of variation awareness, local ∆-mapping and map merging are given in subsection 2.6.2.
In the proposed ∆-mapping approach the concept of time-map is introduced to merge in an appropriate way the changes detected locally by a robot and the updated maps received from the other team members. In a grid map, each cell represents the belief on the occupation value of the corresponding area. Since the environment changes over time, the reliability of the stored value of each cell decreases over time. Therefore, each cell in the map is assigned a value in the range [0, 1], related to the time passed since the cell was last visited. The set of these values at each time step is called time-map and denoted by T_t.
The outline of the ∆-mapping algorithm, which runs on board each rover, is described in Algorithm 4.
The algorithm takes as inputs the previous map M̂k−1 and the time-map Tt−1 .
p and l are the current robot pose and the current laser range reading respectively,
P and L are two matrices in which the values of p and l are collected as

P = [ x̂¹ ŷ¹ θ̂¹ ; … ; x̂ⁿ ŷⁿ θ̂ⁿ ]  (2.24)

L = [ l¹ ; … ; lⁿ ]  (2.25)
where the n-th entry is the last element stored. These matrices are used to create a
local ∆-map containing the changes in the environment detected by the robot.
The time-map Tt is updated every time a laser scan is available to the robot; a
ray tracing procedure is applied for each angle of the scan, assigning a maximum
value equal to 1 to every cell crossed by a ray. At each time step all the values in Tt
are updated according to (2.26), where ∆t is the time elapsed from the last update
of Tt , and Ct is a time constant which defines the forgetting speed.
T_t(i, j) = T_{t−1}(i, j) · (1 − ∆t / C_t)  (2.26)
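A minimal sketch of the time-map bookkeeping (C++; the ray-tracing step that finds the crossed cells is left out):

#include <vector>

// Time-map stored row-major; values in [0,1], where 1 = just observed.
struct TimeMap {
    int rows, cols;
    std::vector<double> t;
    TimeMap(int r, int c) : rows(r), cols(c), t(r * c, 0.0) {}

    // Decay step (2.26): every cell loses reliability as time passes;
    // dt is the elapsed time, Ct the forgetting time constant.
    void decay(double dt, double Ct) {
        double f = 1.0 - dt / Ct;
        if (f < 0.0) f = 0.0;
        for (double& v : t) v *= f;
    }

    // Cells crossed by a laser ray (found by ray tracing, not shown) are
    // refreshed to the maximum reliability value 1.
    void refresh(int i, int j) { t[i * cols + j] = 1.0; }
};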
The time-map update depends only on ∆t; therefore a common timebase among the team members is not required, avoiding the need for synchronization techniques over the network, such as the Network Time Protocol (NTP).
The algorithm is then divided into two parts. The first part (lines 2-4) is per-
formed only when the robot receives a map from another robot member of the team,
while the second part (lines 6-15) is performed only if a variation in the environment
has been detected.
If the robot receives a new map M̂′ and the relative time-map T′, it updates the state of its map and its time-map by merging them with M̂_{k−1} and T_t respectively (line 3). At this point the resulting map contains the modifications perceived by the other robots (line 4).
If a modification is detected by the ∆-awareness block, recalled in Section 2.6.2, the algorithm stores the current robot pose and the relative laser range reading (lines 7-8).
If the ∆-awareness block does not detect any modification and P and L are not empty, a local ∆-mapping is performed, following the approach recalled in subsection 2.6.2 (line 10). The content of these matrices is used to create a local ∆-map ∆M̂; then ∆M̂ is aligned and merged with the old map M̂_{k−1}, in order to obtain an updated map M̂_k.
Finally, the resulting map M̂_k and the current time-map T_t are dispatched to the other team members.
The Scan Alignment block maintains all the local frames of data as well as the relative spatial relationships between local frames, modeled as random variables and derived from matching pairwise scans or from the rover poses stored in P.
The Map Merge block merges the output of the Scan Alignment block with the map M̂_{k−1}. The goal of this block is to find a rigid transformation that overlaps the ∆-map and M̂_{k−1}, to create the current environment occupancy map M̂_k. We adopted the algorithm proposed in [22], which uses the Discretized Hough transform and bidimensional correlation. The Discretized Hough transform finds the rotation that aligns the ∆-map with M̂_{k−1}; the bidimensional correlation is then applied to compute the translation that overlaps the two maps.
Local ∆-mapping in this work consists in the application of the Scan Alignment and Map Merge blocks.
In the updateTimeMap function in line 3 of Algorithm 4, the current maps M̂_{k−1} and T_t are updated according to the M̂′ and T′ received from the other robots. For all couples (i, j), every cell M̂_{k−1}(i, j) is updated if its value is older than that of the corresponding cell M̂′(i, j), so that the most reliable value is used. The information about the reliability is given by the time-maps T_t and T′.
When a robot receives a new map M̂′ and a time-map T′ from another teammate, it merges them with the previous map M̂_{k−1} and the local time-map T_t in order to produce M̂′_{k−1}; T_t is also updated. For all couples (i, j), the value of the cell M̂′_{k−1}(i, j) is set equal to the cell M̂′(i, j) if T′(i, j) > T_t(i, j); otherwise it is set equal to M̂_{k−1}(i, j). The value of the cell T_t(i, j) is set equal to T′(i, j) if T′(i, j) > T_t(i, j); otherwise it is not modified. Figure 2.34 shows the map merging in a typical case. It
can be noticed how changes received from another robot and local changes detected
by the local ∆-mapping are both merged in a consistent way. This happens because
cells belonging to areas that have been recently mapped have high corresponding
time-map values (close to 1), so recent changes in the map resulting from a local
∆-mapping process are not discarded.
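A minimal sketch of this merge-by-recency rule (C++; maps and time-maps are flat arrays of equal size, and the Cell encoding is the illustrative one sketched earlier):

#include <vector>

enum Cell { FREE, OCCUPIED, UNKNOWN };

// Merge a received map/time-map pair into the local ones, keeping for
// each cell the value whose time-map entry is more recent (higher).
void updateTimeMap(std::vector<Cell>& localMap, std::vector<double>& localT,
                   const std::vector<Cell>& recvMap,
                   const std::vector<double>& recvT) {
    for (std::size_t k = 0; k < localMap.size(); ++k) {
        if (recvT[k] > localT[k]) {   // received value is more recent
            localMap[k] = recvMap[k];
            localT[k] = recvT[k];
        }                              // otherwise keep the local value
    }
}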
Figure 2.34. Figures show the map merging in a typical case: 2.34(a) shows the
pose of the robots, Robot 1 receives a map from Robot 2 and it uses it to update
its map; 2.34(b) is the current map, 2.34(e) is the current time-map, 2.34(c) is the
received map, 2.34(f) is the received time-map, 2.34(d) and 2.34(g) are the resulting
map and time-map after the merging process.
Areas that need to be explored are then those for which the corresponding value of the time-map is below a given threshold. For each robot, a set of points to feed the path planning algorithm is extracted from a topological map, which is constructed from the grid-map representing the areas to be explored.
Many approaches exist to obtain a topological representation from a grid-map, such as Voronoi diagrams or topological operations. Here, a morphological skeleton representation of the map is extracted: the skeleton of an image is a good representation of the geometrical and topological properties of its shape.
Then a set of points belonging to the skeleton is identified, with the constraint that each point has to be at a minimum distance from every other point, as shown in the sketch below.
Each point becomes a goal point for a suitable path planning algorithm: in this case the wavefront algorithm [56] is used. These goal points are then allocated to the team members by the distributed market-based task allocation algorithm described in the following subsection.
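A minimal sketch of the point selection step (C++; the skeleton itself is assumed to be precomputed, and the greedy strategy is illustrative):

#include <cmath>
#include <vector>

struct Point { double x, y; };

// Greedy selection of goal points from the skeleton of the low-reliability
// areas: each accepted point must be at least minDist away from every
// point already selected.
std::vector<Point> selectGoals(const std::vector<Point>& skeleton, double minDist) {
    std::vector<Point> goals;
    for (const Point& p : skeleton) {
        bool farEnough = true;
        for (const Point& g : goals)
            if (std::hypot(p.x - g.x, p.y - g.y) < minDist) { farEnough = false; break; }
        if (farEnough) goals.push_back(p);
    }
    return goals;
}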
Figure 2.35 shows how the goal points are obtained. The time-map is displayed with inverse colors (black cells have the highest reliability and white ones the lowest). In Figure 2.35(a), red points belong to the skeleton of the areas with reliability below a given threshold. In this case the team is composed of three robots, so three goal points are obtained, as indicated in Figure 2.35(b).
Figure 2.35. Time-map and skeleton of the areas to explore (a) and final goal points for three robots (b).
The simulated rovers carry a laser range finder on board, and are able to localize themselves in the given environment. It is assumed that, once the rovers are correctly localized, a virtual fork-lift adds or removes one container every minute.
The rovers start moving with a simple obstacle avoidance policy. Then, when the ∆-mapping process starts, the rovers move according to the exploration strategy described in Section 2.6.3. The quality of the map over time and the localization error are measured. The error on the estimate of the robot pose is strictly related to the quality of the map: every ∆-mapping process induces some degradation of the map, due to the localization error, which cannot be fully compensated by the Map Merge block.
Even after a substantial number of changes in the environment, the rovers keep a map that is consistent with the environment, and therefore the localization error remains low.
The localization error of robot i is defined as

e_{ρi}(t) = √( (x_i^{gt}(t) − x̂_i(t))² + (y_i^{gt}(t) − ŷ_i(t))² ).  (2.27)

We then define the average localization error for n robots over r runs as

e_{ρ_{n,r}}(t) = (1/r) Σ_{j=1}^{r} Σ_{i=1}^{n} e_{ρi}(t) / n  (2.28)
The localization error is reported in Figure 2.37(a). It can be noticed that the mean localization error remains lower than 0.6 m after approximately 2.5 hours. The quality of the map over the duration of the test was also inspected: visual inspection is often used, and we also provide numerical results using the acceptance index described in [22], which can be used as a measure of similarity between the real map and the estimated map.
Figure 2.37(b) shows the acceptance index averaged over the n = 3 robots and over r = 10 runs. After 140 variations the value obtained is 0.97, which is comparable with the one obtained with a typical grid-based SLAM algorithm (0.98).
[Figure 2.37: average localization error e_{ρ_{n,r}}(t) [m] over time (a); acceptance index versus the number of map variations (b).]
[Figure: localization error e_{ρ_{n,1}}(t) [m] over time (a); acceptance index versus the number of map variations (b).]
received from robot 1, but this is still not sufficient to maintain a consistent map of the environment.
[Figure: localization error e_{ρi}(t) [m] over time for Robot 1, Robot 2 and Robot 3.]
The peaks in memory usage refer to the end of each local ∆-mapping, while the peaks in CPU usage only refer to the computation and assignment of the exploration points. It can be noticed that after the beginning of the ∆-mapping process the memory usage steadily increases by only 12 MB, with peaks corresponding to the last phase of each local ∆-mapping process.
Figure 2.40. CPU usage (upper plot) and memory usage (lower plot) in
a simulated experiment.
(a) Initial map; (b) dispatched map after the first change; (c) dispatched map after the second change.
Chapter 3
Simultaneous Localization and Mapping
3.1 Introduction
The problem of learning maps is one of the fundamental problems in mobile robotics. Models of the environment are needed for applications such as transportation, cleaning, rescue, and various other service robotic tasks. Learning maps passively, i.e., by perceiving sensor data only, requires the solution of two tasks: mapping and localization. Mapping is the problem of integrating the information gathered with the robot's sensors into a given representation; it can intuitively be described by the question “What does the world look like?” Central aspects in mapping are the representation of the environment and the interpretation of sensor data. Localization and mapping cannot be solved independently of each other: whenever robots operate in the real world, their observations and motion are affected by noise. Building accurate spatial models under those conditions is widely known as the Simultaneous Localization And Mapping (SLAM) problem.
SLAM problems arise when the robot has access neither to a map of the environment nor to its own poses; instead, all it is given are the measurements z_{1:t} and the controls u_{1:t}. The term “simultaneous localization and mapping” describes the resulting problem: in SLAM, the robot acquires a map of its environment while simultaneously localizing itself relative to this map. SLAM is significantly more difficult than all robotics problems discussed thus far: it is more difficult than localization, in that the map is unknown and has to be estimated along the way, and it is more difficult than mapping with known poses, since the poses are unknown and have to be estimated along the way [86].
There are two main forms of the SLAM problem, which have both been thor-
oughly addressed by the scientific community. One is known as the online SLAM
problem: it involves incrementally estimating the posterior over the current pose along with the map,

p(x_t, m | z_{1:t}, u_{1:t}),

where x_t is the robot pose at time t, m is the map, and z_{1:t}, u_{1:t} are the measurements and the controls. This is typically the case of a robot that acquires proprioceptive
and exteroceptive measurements from the environment and, at each time step, in-
crementally includes such information in the posterior describing the robot pose and
a representation of the environment.
The second SLAM problem is called the full SLAM problem. In full SLAM, the objective is to calculate a posterior over the entire path x_{1:t} along with the map, instead of just the current pose:

p(x_{1:t}, m | z_{1:t}, u_{1:t}).
Full SLAM approaches tackle the problem of retrieving the whole trajectory posterior, taking into account all the available measurements at the same time; this batch solution may be appropriate, for instance, when the data acquired during robot operation have to be processed off-line to produce a meaningful representation of the scenario.
From the algorithmic point of view, the literature on SLAM can be roughly divided into three mainstream approaches: Gaussian filter-based, particle filter-based and graph-based approaches. The first category includes Extended Kalman Filter SLAM (EKF-SLAM) [82], the Sparse Extended Information Filter [87], and delayed-state EKF-SLAM [35].
accurate proposal distribution taking into account not only the movement of the
robot but also the most recent observations. It has since become one of the most
widely used approaches for robotic mapping.
While EKF-SLAM and FastSLAM are the two most important solution methods, newer alternatives offering much potential have been proposed, including the use of the information-state form [34].
Thrun and Montemerlo [88] enabled the estimation of large maps by reducing the problem and then using conjugate gradient optimization. Konolige [50] addressed the problem of large-scale mapping by exploiting the structure of the graph. Frese et al. proposed a multilevel relaxation approach for full SLAM [41].
Olson et al. [70] introduced an efficient method for pose graph optimization by applying stochastic gradient descent. A tree-based implementation called TORO
was proposed by Grisetti et al. [45]. In [52] the authors introduced a flexible
framework for the optimization of graph-based nonlinear error functions. Some at-
tempts to exploit the mathematical structure of the optimization problem and the
properties of the underlying graph have been proposed in [33] and [32]. Recently,
Sünderhauf et al. [84] introduced the idea of switchable constraints for robust pose
graph optimization, and proposed a strategy for discarding outliers during the opti-
mization. All the aforementioned techniques are iterative, in the sense that, at each
iteration, they solve a local convex approximation of the original problem, and use
such local solution to update the configuration. This process is then repeated until
the optimization variable converges to a local minimum of the cost function. In
particular, when a linear approximation of the residual errors in the cost function is
considered, the problem becomes an unconstrained quadratic problem and the local
correction can be obtained as solution of a system of linear equations, known as
normal equation [32]. In such a case the source of complexity stems from the need to repeatedly solve large-scale linear systems. Moreover, all the mentioned techniques require the availability of an initial guess for the nonlinear optimization, which needs to be sufficiently accurate for the technique to converge to a global solution of the problem. A partial answer to these two problems (computational complexity and the need for an accurate initial guess) came from the work [20]. In [20] the authors proposed a
linear approximation for the pose graph configuration, assuming that the measure-
ment covariance matrices have a block diagonal structure. Roughly speaking, each
inter-nodal measurement describes a relative pose between two nodes in the pose
graph, and the work [20] assumes that the relative position and relative orientation
measurements (that together give the relative pose measurement) are uncorrelated.
The approach requires no initial guess and is shown to be accurate in practice.
While a large amount of work has been done on graph optimization, less work
has been devoted to the implementation of robust front-ends for the creation of the
actual graphs for real-time applications involving mobile robots using traditional
distance sensors, such as laser range finders [51, 59]. In particular, the graph optimizer assumes that all constraints are correct; a number of consistency checks are thus required to minimize the number of false detections. For example, in an office-like environment there can be a lack of significant features along a corridor, or there can be ambiguity due to symmetries in the environment and repetitive structures.
In this thesis work we propose two contributions: (i) we propose a linear approxi-
mation to the problem of graph optimization in the 2D case, extending the approach
θ_i is its orientation. The available measurements usually describe the relative pose between pairs of nodes, e.g., ξ_{i,j} = p_j ⊖ p_i. A measurement ξ̄_{i,j} between node i and node j has the form

ξ̄_{i,j} = ξ_{i,j} + ε_{i,j} = [ R_iᵀ(p_j − p_i) ; θ_j − θ_i ] + [ ε^∆_{i,j} ; ε^δ_{i,j} ],  (3.3)

where ξ_{i,j} is the true (unknown) relative pose between node i and node j, ε_{i,j} ∈ R³ is the measurement noise, R_i ∈ R^{2×2} is the planar rotation matrix of angle θ_i, and ε^∆_{i,j} and ε^δ_{i,j} are the (possibly correlated) Cartesian and orientation noise.
Now we assume that the regularization terms have been correctly computed, according to [20], and we call δ̄_{i,j} the regularized measurements, i.e., we define δ̄_{i,j} = δ̌_{i,j} − 2k_{i,j}π. Then the measurement model becomes:

[ ∆̄^l_{i,j} ; δ̄_{i,j} ] = [ R_iᵀ(p_j − p_i) ; θ_j − θ_i ] + [ ε^∆_{i,j} ; ε^δ_{i,j} ].  (3.5)
It is convenient to adopt a graph formalism to model the problem: each pose can be associated with a node of a directed graph G(V, E) (often referred to as a pose graph), where V = {v_0, . . . , v_n} is the set of nodes and E (the graph edges) is the set containing the unordered node pairs (i, j) such that a relative pose measurement exists between i and j. By convention, if an edge is directed from node i to node j, the corresponding relative measurement is expressed in the reference frame of node i. We denote with m the number of available measurements, i.e., |E| = m.
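As a small sketch of the graph bookkeeping (C++), the reduced incidence matrix used below can be built with one column per edge; the sign convention (−1 at the tail node, +1 at the head node) is an assumption consistent with standard usage:

#include <vector>

struct Edge { int i, j; };  // directed edge from node i to node j

// Reduced incidence matrix: one column per edge, -1 at the tail node and
// +1 at the head node, with the row of the anchor node v0 deleted.
std::vector<std::vector<int>> reducedIncidence(int nNodes,
                                               const std::vector<Edge>& edges) {
    std::vector<std::vector<int>> A(nNodes - 1,
                                    std::vector<int>(edges.size(), 0));
    for (std::size_t e = 0; e < edges.size(); ++e) {
        if (edges[e].i > 0) A[edges[e].i - 1][e] = -1;  // tail (anchor row skipped)
        if (edges[e].j > 0) A[edges[e].j - 1][e] = +1;  // head
    }
    return A;
}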
It is well known in the literature that the maximum likelihood estimate of the network configuration X attains the minimum of the following cost function (see [20] and the references therein):

f(X) = Σ_{(i,j)∈E} [ R_iᵀ(p_j − p_i) − ∆̄^l_{i,j} ; θ_j − θ_i − δ̄_{i,j} ]ᵀ Ω_{i,j} [ R_iᵀ(p_j − p_i) − ∆̄^l_{i,j} ; θ_j − θ_i − δ̄_{i,j} ]  (3.6)
Roughly speaking, this essentially requires that the relative position and relative orientation measurements (that together give the relative pose measurement) are uncorrelated. In order to present the subsequent derivation we need to rewrite the cost function (3.6) in a more compact form. For this purpose we define the unknown nodes' positions p = [p_1ᵀ … p_nᵀ]ᵀ, the unknown nodes' orientations θ = [θ_1 … θ_n]ᵀ, and

Ω = [ Ω^∆  Ω^{∆δ} ; Ω^{δ∆}  Ω^δ ],  (3.7)

such that Ω is the information matrix of the vector of measurements [(∆̄^l)ᵀ δ̄ᵀ]ᵀ. Then, according to [20], the cost (3.6) can be written as:

f(x) = [ A_2ᵀ p − R ∆̄^l ; Aᵀ θ − δ̄ ]ᵀ [ R Ω^∆ Rᵀ  R Ω^{∆δ} ; Ω^{δ∆} Rᵀ  Ω^δ ] [ A_2ᵀ p − R ∆̄^l ; Aᵀ θ − δ̄ ]  (3.8)
where:
• A is the reduced incidence matrix, obtained from the incidence matrix A (see
“Preliminary and Notation” Section), by deleting the row corresponding to
node v0 ;
The residual errors in the cost function (3.8) are described by the following vector, whose entries represent the mismatch between the relative poses of a given configuration x and the actual relative measurements:

r(x) = [ A_2ᵀ p − R ∆̄^l ; Aᵀ θ − δ̄ ].  (3.9)
Before presenting the proposed approach, we anticipate the main intuition behind the algorithm. The cost function is quite close to a quadratic cost function, since the last part of the residual errors in (3.9) is linear, and the overall cost function (3.8) becomes quadratic as soon as the rotation matrix R is known. Therefore, the basic idea is (i) to obtain an estimate of the nodes' orientations exploiting the linear part of the residual errors in (3.9), (ii) to use the estimated orientations to compute an estimate of R, and (iii) to solve the overall problem in the optimization variable x. This basic intuition is the same motivating [20], although here the derivation is made more complex by the presence of the correlation between the position measurements ∆̄^l and the orientation measurements δ̄.
We are now ready to present the proposed linear approximation for pose graph optimization, whose properties will be analyzed in Theorem 1.
The effectiveness of the linear approximation computed using Algorithm 5 is assessed by the following result.
Theorem 1 Given the inputs {∆̄^l, δ̄, Ω, A, A_2}, and assuming the information matrix Ω to be positive-definite, the following statements hold for the quantities computed in Algorithm 5:
The first claim assures the uniqueness of the outcome of the proposed algorithm (no indetermination in the solution of the linear systems). The second claim states that, with arbitrarily high probability α, we can bound the Mahalanobis distance between the estimate θ̂ and the true (unknown) nodes' orientations. Finally, the last claim assures that the proposed approximation improves over the initial guess x̂ by applying a Gauss-Newton step.
Ω_z z = b_z  (3.10)

with

Z = [ I_{2m}  0_{2m,n} ; 0_{m,2m}  Aᵀ ],   b_z = Zᵀ Ω [(∆̄^l)ᵀ δ̄ᵀ]ᵀ,   Ω_z = Zᵀ Ω Z.  (3.11)

Call ẑ = [(∆̂^l)ᵀ θ̂ᵀ]ᵀ the solution of the linear system.

3. Solve the following linear system in the unknown x = [pᵀ θᵀ]ᵀ, given ŷ (see (3.12)) and Ω_y (see (3.13)):

Ω_x x = b_x  (3.15)

with

B = [ A_2ᵀ  0_{2m×n} ; 0_{n×2n}  I_n ],   b_x = Bᵀ Ω_y ŷ,   Ω_x = Bᵀ Ω_y B.  (3.16)

The solution of the linear system (3.15) is the proposed linear approximation of the pose graph configuration: x* = [(p*)ᵀ (θ*)ᵀ]ᵀ.
The relative pose measurements of the datasets {FR079, CSAIL, M3500, M10000} are available online [55], while the measurements of the INTEL dataset were obtained through a scan matching procedure from the raw sensor data, also available at [55]. The INTEL dataset is the same studied in [20]. The relations available at [55] only describe the relative pose measurements, while we are interested in testing the behavior of the approaches for different measurement covariance matrices. In particular, for each dataset we consider three variants, each one corresponding to a different choice of the covariance matrix. The first variant (e.g., FR079, I) uses identity matrices as measurement covariances, i.e., the noise of the relative pose measurement between node i and node j is ε_{i,j} ∼ N(0_3, I_3). The second variant (e.g., FR079, P_s) uses a structured covariance matrix, as in eq. (3.6). The third variant (e.g., FR079, P_f) uses full covariance matrices obtained as follows.
Accuracy. We first show qualitative results for the proposed approach on each dataset. The estimated trajectory for each dataset is reported in Figure 3.2. For a quantitative evaluation of the accuracy of the approaches, we report the optimal value of the cost function (3.6) attained by each of the compared techniques. We also use the SLAM benchmark metrics proposed in [53]; such metrics provide a tool for comparing SLAM approaches that use different estimation techniques or different sensor modalities in terms of accuracy. For each dataset we consider three scenarios, corresponding to different measurement covariance matrices: in the first case the covariance matrix is chosen to be the identity matrix (P_{ij} = I_3); in
the second scenario we use a structured covariance matrix; in the third scenario the covariance matrix is full. Our implementation of the algorithm is written in C/C++ and uses the CSparse library [31]; the tests were conducted on a PC equipped with an Intel Core i7 at 3.4 GHz and 8 GB of RAM.
Figure 3.2. Estimated trajectory for each of the considered datasets: fr079 (a),
csail (b), intel (c), M3500 (d), M10000 (e)
We now compare our approach, both in the case of structured covariance matrices and full covariance matrices, with the other optimization techniques. Table 3.2 shows the results in terms of the constraint satisfaction metrics, while Table 3.3 reports the results in terms of computational time.
From the results, we can see that the Gauss-Newton approach is accurate enough to be used as a reference value, but slow. g2o is as accurate as Gauss-Newton on most scenarios, with only slight differences, while being much faster. TORO is both less accurate and slower compared to the other approaches. Our three-phase approaches are shown to be faster than all the other optimization techniques, while approaching the reference value provided by Gauss-Newton. Only in one scenario (M3500, P_f) is the accuracy lower than Gauss-Newton and g2o. While the technique proposed in this work is relatively more accurate than the approach with structured covariance matrices, this comes at the cost of a slightly higher computation time. As we use full measurement covariance matrices only in the third scenario of each dataset, we do not see any difference in accuracy between our old approach and the new one on the first two scenarios. We should also remark that, although our approach is based on a linear approximation, as opposed to the other approaches, the approximation is always more accurate than TORO in terms of the constraint satisfaction metric, and in one case more accurate than g2o, which is the state of the art. g2o is more accurate than our approach in terms of cost-function evaluation in the other cases, but slower than the linear optimization approaches.
Table 3.3. Average computation time (in seconds) for the compared approaches.

Dataset        Linear Approx.      Linear Approx.        TORO        g2o         Gauss-Newton
               (structured cov.)   (unstructured cov.)
FR079    I     5.80E-03            8.15E-03              1.99E-01    3.19E-01    1.05E-02
         Ps    5.76E-03            7.87E-03              2.00E-01    3.39E-01    1.07E-02
         Pf    5.81E-03            8.16E-03              2.00E-01    3.04E-01    1.06E-02
CSAIL    I     5.72E-03            7.55E-03              2.65E-01    2.89E-01    1.01E-02
         Ps    5.53E-03            7.46E-03              2.01E-01    2.90E-01    1.01E-02
         Pf    5.59E-03            7.50E-03              2.66E-01    2.88E-01    1.03E-02
INTEL    I     7.10E-03            9.49E-03              5.87E-01    4.15E-01    1.32E-02
         Ps    7.01E-03            9.49E-03              4.90E-01    3.89E-01    1.31E-02
         Pf    6.98E-03            9.47E-03              5.91E-01    4.01E-01    1.31E-02
M3500    I     3.26E-02            4.04E-02              5.81        1.57        7.07E-02
         Ps    3.25E-02            4.03E-02              4.84        1.61        7.06E-02
         Pf    3.26E-02            4.05E-02              5.82        1.61        7.14E-02
M10000   I     3.55E-01            4.86E-01              2.21E+02    1.73E+01    6.93E-01
         Ps    3.57E-01            4.89E-01              2.21E+02    1.83E+01    6.96E-01
         Pf    3.55E-01            4.86E-01              4.11E+02    1.77E+01    6.91E-01
Odometric edges can be built starting from the results of scan matching, from wheel odometry only, from a combination of wheel odometry and scan matching (in this case the translation estimate is given by wheel odometry, while the rotation estimate is given by scan matching), or from a combination of scan matching and IMU (the IMU being used for rotation estimation only as well) (Algorithm 5, lines 1-2). The best way to retrieve odometric edges depends on the sensors and on the nature of the environment (e.g., in a corridor-like environment the translation estimated by wheel odometry can be more accurate than the one estimated by scan matching).
For scan matching we use the Canonical Scan Matcher (CSM) algorithm described in [23]. The algorithm implements an iterative closest point (ICP) procedure using a custom point-to-line distance metric, in order to find the relative roto-translation between two laser scans. ICP can benefit from an initial guess, if present.
The pseudo-code describing this module is shown in Algorithm 5. In our implementation the initial guess for scan matching can be omitted, or given by odometric information. Odometric information can be provided by wheel odometry or by an inertial measurement unit (IMU), if present.
A new graph node is created and added to the list every nodeStep steps (line 4), and only if the robot has moved a minimum linear or angular distance since the last node was added (line 7).
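A minimal sketch of this gating rule (C++; the threshold values are illustrative, not those of the actual implementation):

#include <cmath>

struct Pose { double x, y, theta; };

// A new node is created only every nodeStep steps (line 4) and only if the
// robot has moved a minimum linear or angular distance since the last node
// was added (line 7).
bool shouldAddNode(int step, int nodeStep, const Pose& last, const Pose& cur,
                   double minLin = 0.5, double minAng = 0.25) {
    if (step % nodeStep != 0) return false;
    double lin = std::hypot(cur.x - last.x, cur.y - last.y);
    double ang = std::fabs(cur.theta - last.theta);
    return lin > minLin || ang > minAng;
}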
For each odometric constraint the covariance matrix is taken from the one estimated by CSM, if scan matching is used; otherwise it is set to default values.
We then check for possible loop closings between the current node and all previous nodes (line 11); loop detection is described in the next Section. If at least one loop is found (line 12), we save the previous graph (nodes and constraints) (line 13) and add the new loop constraints to the graph (line 14). We then optimize the graph using the optimizer described in Section 3.4 (line 15). Finally, we check the consistency of the optimized graph (see Section 3.3.2) and, in case the consistency is not validated by the checks, we remove the loops found from the graph and restore the previous graph (lines 16-17).
by CSM. Another case is when the two poses have time-stamps which are very close together (lines 11-12). For example, in the case of a robot travelling along a corridor, the estimated translation may be unreliable. For orientation-only constraints we set the translation part of the information matrix to zero. In all other cases, the constraint is treated as a full-pose constraint (i.e., the full covariance matrix is used).
Since simple scan matching between two laser scans leads to several wrong matches, some rejection procedures must be implemented in order to discard bad matches.
Visibility check
A visibility check ensures, using ray tracing, that no large obstacle is present between the current pose and the candidate one. This check establishes whether a large fraction of the laser rays lying along the edge connecting the two poses are crossed by some obstacle; in this case the loop is discarded (line 6). Experimental tests showed that the visibility check is useful only in some particular scenarios.
After these checks, scan matching is performed and a set of candidate loop closings is obtained. We then implement further checks in order to discard possible wrong matches.
Data: n, Graph
Result: loop_list
1  foreach n_i in Graph do
2      ξ̃_i = n_i.p ⊖ n.p;
3      if ‖ξ̃_i.ρ‖ > ∆L/2 then
4          continue;
5      end
6      if not visibilityCheck(n, n_i) then
7          continue;
8      end
9      ξ_i ← performICP(n_i.z, n.z, ξ̃_i);
10     ξ_i.type ← full_constraint;
11     if |ξ_i.θ| > θ_th or ξ_i.∆t < ∆t_th then
12         ξ_i.type ← orientation_only;
13     end
14     if ξ_i.ICP_err ∉ [ICP_min; ICP_max] then
15         continue;
16     end
17     if ‖ξ_i.ρ − ξ̃_i.ρ‖ > τ_ρ or |ξ_i.θ − ξ̃_i.θ| > τ_θ then
18         continue;
19     end
20     loop_list.addConstraint(n, n_i, ξ_i);
21 end
Algorithm 6: Loop closing constraints creation.
does not hold true, where ξ_i and ξ_i* are the edges before and after the optimization.
[Architecture diagram: raw data feeds the Graph Creator (graph_slam node); the resulting graph is optimized by the LAGO Optimizer; the localized node poses and the corresponding scans feed the Map Creator (grid_mapper node).]
3.7.3 Data-center
We also show the results of our algorithm in two real large-scale indoor environments. The first scenario is a data-center room of dimensions 25 × 40 m, composed of free corridors and rows of racks. In this experiment a Corobot rover was used. Wheel odometry was used for translation estimation, while an IMU was used for orientation estimation. The environment for this particular experiment was challenging for three reasons. Most of the server racks were covered by metal grills, thus making the scan-matching part prone to errors, as the laser rays sometimes passed through the grills, introducing large measurement errors. Moreover, the symmetry of the environment posed a challenge for loop closing detection. Finally, the presence of slippery grills on the floor introduced large odometric errors. The result is shown in Figure 3.6.

Figure 3.6. Mapping results for the data-center room. (a) Resulting map using our approach; (b) Robot trajectory and loop closings. Red and blue arrows represent the trajectory; yellow lines represent orientation-only constraints; green lines represent full-pose constraints.
Table 3.5. Benchmark metrics: translation error (ηc ) and angular error (ηa ).
Dataset ηc [m] ηa [rad]
INTEL 0.055 0.037
FR 79 0.065 0.032
MIT-CSAIL 0.070 0.053
Chapter 4
Human-robot interaction
• The astronaut may only move forward, facing the direction of motion
• The astronaut's speed is assumed to be less than 1 m/s when the relative distance between the astronaut crew and the robot is 1 m
• The astronaut is not supposed to wear any visual marker; however, if markers are already included in the mission equipment, they can support the tracking service
• The astronaut can be standing or lying on the floor, but a given percentage of the astronaut's body needs to be visible (not occluded by obstacles) at any time
ROI selection
The simplest technique to obtain initial object location hypotheses is the sliding-window technique, where detector windows at various scales and locations are shifted over the image. The computational costs are often too high to allow for real-time processing. Significant speed-ups can be obtained either by coupling the sliding-window approach with a classifier cascade of increasing complexity (rejection cascade), or by restricting the search space based on known camera geometry and prior information about the target object class. The latter includes application-specific constraints such as the flat-world assumption, ground-plane-based objects and the common geometry of pedestrians, e.g., object height or aspect ratio. In the case of a moving camera in a real-world environment, varying pitch can be handled by relaxing the scene constraints or by estimating the 3D camera geometry online.
Classification
After a set of initial object hypotheses has been acquired, further verification (classification) involves pedestrian appearance models, using various spatial and temporal cues. These models can be divided into generative and discriminative ones. In both approaches a given image (or a subregion thereof) is assigned to either the pedestrian or the non-pedestrian class, depending on the corresponding class posterior probabilities.
As for the classifier, two different approaches are common: Support Vector Machines (SVM), in the linear variant or with a kernel function, and AdaBoost. The latter is especially used to form rejection cascades, as mentioned before. The advantage of support vector machines is that the training is very easy, and after training a linear support vector machine is cheap to evaluate and thus faster.
Generative models try to create a full probabilistic model of all the variables and
are usually divided into shape models and combined shape and texture models.
On the other hand, discriminative models approximate the Bayesian maximum-
a-posteriori decision by learning the parameters of a discriminant function (decision
boundary) between the pedestrian and non-pedestrian classes from training exam-
ples.
Another distinction is between single-part algorithms and part-based models.
Single-part algorithms commonly use a sliding window technique. Sliding window
techniques scan the image at all relevant positions and scales to detect a person.
Consequently there are two major components: the feature component encodes the
visual appearance of the person, whereas the classifier determines for each sliding
window independently whether it contains the person or not. As typically many
positions and scales are scanned these techniques are inherently computationally
expensive. Fortunately, due to recent advances in GPUs, real-time people detection
is possible (see [19],[75]).
Part-based models are composed of two major components. The first uses low-level features or classifiers to model individual parts or limbs of a person. The second component models the topology of the human body, to enable the accumulation of part evidence. Part-based people models can outperform sliding-window based methods (such as HOG) in the presence of partial occlusions and significant articulations [80]. It should be noted, however, that part-based models tend to require a higher resolution of the person in the image than most sliding-window based approaches.
Tracking
Figure 4.1. An overview of the feature extraction and object detection chain. The detector window is tiled with a grid of overlapping blocks in which Histogram of Oriented Gradient feature vectors are extracted. The combined vectors are fed to a linear SVM for object/non-object classification. The detection window is scanned across the image at all positions and scales, and conventional non-maximum suppression is run on the output pyramid to detect object instances.
4.2.3 Detector
The detection part is based on HOG features and a linear-SVM classifier, which is trained online over a sliding window of samples taken from previous images on the basis of previous detections. In order to reduce the probability of introducing a bias in the classifier, a subset of older samples is always preserved and introduced in the training.
The classifier is re-trained only when the variance of the weights of the particles, measured by the effective sample size N_eff, increases over a given threshold. We determined experimentally that N_eff can be taken as a coarse measure of how much the performance of the detector is degrading: a degradation indicates that the appearance of the tracked object has changed significantly, and the classifier needs to be trained again on newer samples.
In order to improve the performance of the detector, at every time step only a small area around the last detection (a region of interest) is fed into the detector. This can be done only under the assumption that the tracked object does not move very fast, but we found that this assumption holds for most kinds of objects to be tracked. This simple expedient can provide frame rates up to 10-25 FPS, depending on the size of the tracked object in the image.
In order to improve the tracking of objects that are moving away from the camera,
as well as the detection of objects in low-resolution videos, we implemented a
simple rule: if the last detected window is close to the minimum detectable height
of the classifier, then the region of interest is magnified (digital zoom). The
OpenCV implementation of the HOG algorithm that we use exposes a threshold
parameter (default 0) which trades off detection accuracy: higher positive values
produce fewer false positives but more false negatives, while negative values
produce more false positives but fewer false negatives. Therefore, if no prior
detection has occurred, the threshold is set to a higher value (0.2 in our case) to
avoid false detections, and otherwise to a lower value (-0.2 in our case) to avoid
false negatives.
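A minimal sketch of this detection strategy using OpenCV's HOG detector is given below. The region-of-interest margin and the use of the stock people detector (instead of our online-trained SVM weights) are simplifying assumptions; the two threshold values mirror the rule described above.

```python
import cv2

hog = cv2.HOGDescriptor()  # default 64x128 person window
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect(frame, last_box=None):
    """Run HOG detection, restricted to a ROI around the last detection."""
    if last_box is not None:
        x, y, w, h = last_box
        m = w  # ROI margin around the last detection (assumed value)
        x0, y0 = max(0, x - m), max(0, y - m)
        roi = frame[y0:y0 + h + 2 * m, x0:x0 + w + 2 * m]
        thr = -0.2  # permissive while tracking: fewer false negatives
    else:
        roi, (x0, y0) = frame, (0, 0)
        thr = 0.2   # strict before the first detection: fewer false positives
    rects, scores = hog.detectMultiScale(roi, hitThreshold=thr,
                                         winStride=(8, 8), scale=1.05)
    # Report detections in full-frame coordinates.
    return [(x0 + rx, y0 + ry, rw, rh) for (rx, ry, rw, rh) in rects]
```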
4.2.4 Training

The weight vector of the linear SVM classifier is a linear combination of the support vectors $v_i$ with learned coefficients $\alpha_i$:

$$w = \sum_i \alpha_i v_i$$
For bootstrapping our detector in the case of people detection we train our classifier
on the INRIA Person Dataset [3]; in the case of object detection we use a suitable
object class from the Pascal dataset [4].
The training is done over a sliding window of samples taken from previous
images on the basis of previous detections. Every k1 frames a rectangular area
is taken from the current frame and put into a subset S1 of samples. The position
of the area corresponds to the current object position hypothesis with the highest
weight from the particle filter (see Section 4.2.5 for details). The dimensions of the
area are the same as those of the latest detection window, as we can assume that
the tracked object will not change much in size between subsequent frames. Every
k2 > k1 frames the rectangular area is also put into another subset S2 of samples.
This subset is used to include older samples in the training as well, thus preventing
overfitting in most cases.
When the size of the subset S1 exceeds a given threshold the classifier is re-trained
on the current dataset, and the resulting weight vector w is fed to the classifier.
The state of the particle filter is also used to decide when a new training of the
classifier is needed: when the effective sample size N_eff falls below a given threshold
the classifier is re-trained on the current dataset, and the resulting weight vector
replaces the older one.
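The following simplified sketch illustrates the two-buffer training scheme, assuming HOG feature vectors have already been extracted from the sampled windows and that negative samples are collected from background regions (a detail not covered here). The values of k1, k2 and the SVM regularization constant are placeholders, not prescribed by our method.

```python
import numpy as np
from sklearn.svm import LinearSVC

S1, S2 = [], []  # recent samples and preserved older samples

def collect_sample(frame_idx, feature, label, k1=5, k2=25):
    """Every k1 frames store the best-hypothesis window in S1;
    every k2 > k1 frames also keep it in S2 to retain old appearance."""
    if frame_idx % k1 == 0:
        S1.append((feature, label))
    if frame_idx % k2 == 0:
        S2.append((feature, label))

def retrain():
    """Re-train the linear SVM on S1 plus the preserved subset S2."""
    X, y = zip(*(S1 + S2))
    clf = LinearSVC(C=0.01).fit(np.vstack(X), np.asarray(y))
    # The weight vector w (with bias appended) can then be handed to the
    # sliding-window detector, e.g. cv2.HOGDescriptor.setSVMDetector;
    # the sign convention for the bias may need adjusting for OpenCV.
    return np.append(clf.coef_.ravel(), clf.intercept_)
```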
4.2.5 Tracking
For the tracking of the detected object we use a particle filter implementation.
Particle filters are sequential Monte Carlo methods based on point-mass
representations of probability densities. The first advantage of this method over
classic Kalman filters is the ability to cope with non-linearity and non-Gaussianity,
which are critical in the case of a moving object. Another important aspect is the
possibility of modeling multi-modal distributions.
In our approach the prediction phase is based on a simple linear motion model.
The estimate of the new state of each particle is a linear extrapolation of the
previous state plus Gaussian noise. We chose a simple motion model because, for
people and object tracking, the advantage of using a more complex stochastic model
over simpler motion models is not prominent [73].
The update phase is based on object detections. Particles are weighted based on
the distance from detection hypotheses provided by the adaptive HOG detector.
The resampling phase is based on Kullback-Leibler divergence (KLD). The num-
ber of particles representing the belief on the object pose at each step is adaptive,
so that only the number of particles sufficient to represent the belief distribution is
used.
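A minimal sketch of the prediction and update steps is shown below, assuming a particle state of [x, y, vx, vy] in image coordinates and Gaussian weighting of the particle-to-detection distance; the noise parameters are illustrative, not the values used in our tests.

```python
import numpy as np

def predict(particles, dt, sigma_pos=2.0, sigma_vel=1.0):
    """Linear extrapolation of each particle plus Gaussian noise."""
    particles[:, 0:2] += particles[:, 2:4] * dt
    particles[:, 0:2] += np.random.normal(0.0, sigma_pos, particles[:, 0:2].shape)
    particles[:, 2:4] += np.random.normal(0.0, sigma_vel, particles[:, 2:4].shape)
    return particles

def update(particles, detection, sigma_meas=10.0):
    """Weight particles by their distance from the detected object center."""
    d2 = np.sum((particles[:, 0:2] - detection) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / sigma_meas ** 2)
    return w / w.sum()
```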
We implement the adaptive particle filter approach proposed in [38] for robot
localization. Since the approach is generic, it can also be adopted for the particular
case of object tracking. The key point of this approach is to bound the error
introduced by the sample-based representation of the particle filter. The underlying
assumption is that the true posterior is given by a discrete, piecewise constant
distribution such as a discrete density tree or a multi-dimensional histogram. Under
this assumption we can determine the number of samples so that the distance
between the Maximum Likelihood Estimate based on the samples and the true
posterior does not exceed a pre-specified threshold ε. The distance between the
Maximum Likelihood Estimate and the true distribution is measured by the
Kullback-Leibler distance. As we saw in Section 2.3.2, the number of particles at
each step i can be set to

$$n_i = \frac{1}{2\epsilon}\,\chi^2_{k-1,1-\delta}$$

where $\chi^2_{k-1,1-\delta}$ is the 1 − δ quantile of the chi-square distribution with
k − 1 degrees of freedom, k being the number of histogram bins with support. This
value is the number of particles required to guarantee that, with probability 1 − δ,
the Kullback-Leibler distance between the Maximum Likelihood Estimate of the
position hypothesis and the true distribution is less than ε.
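In practice the bound can be evaluated with a chi-square quantile function, as in the sketch below; the ε and δ values in the usage example are placeholders.

```python
import math
from scipy.stats import chi2

def kld_particle_count(k, epsilon, delta):
    """Particles needed so that, with probability 1 - delta, the KL distance
    between the sample-based MLE over k occupied histogram bins and the
    true posterior stays below epsilon (KLD-sampling, Fox [38])."""
    if k <= 1:
        return 1
    return int(math.ceil(chi2.ppf(1.0 - delta, k - 1) / (2.0 * epsilon)))

# Example with placeholder bounds: 500 occupied bins, epsilon=0.05, delta=0.01.
n = kld_particle_count(500, epsilon=0.05, delta=0.01)
```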
Experiment 1
In Figure 4.2 we show a comparison of two different algorithms on some image
sequences taken from the BoBoT dataset (SeqA, SeqB, SeqI). The first algorithm
is a non-adaptive approach based on HOG features and particle filters. The second
is the proposed approach with adaptive HOG detector and adaptive particle filters.
The ground-truth data available in this dataset is in the form of rectangular
shapes surrounding the object to be tracked. Since in our approach the detection
window is of fixed size, we do not use this ground-truth directly. Instead, for each
frame we calculate the center of the detected object and measure the error between
this center and the output S_t of our particle filter.
Sequence SeqA presents significant changes in appearance and fast movements
of the tracked object; sequence SeqB presents strong changes in background
appearance; sequence SeqI is an example of people tracking with many occlusions
from crossing pedestrians.
We can see how the adaptive approach significantly outperforms the non-adaptive
approach in every image sequence. In particular the adaptive method is able to
recover more quickly from detection errors.
[Figure 4.2. Tracking error per frame for the non-adaptive and the adaptive
approach on sequences SeqA (a), SeqB (b) and SeqI (c) of the BoBoT dataset.]

In Table 4.1 we present a comparison between six different algorithms. The first
one is a simple tracking algorithm based on color histograms, the second one is
based on multi-component tracking, the next three are different versions of the
approach proposed in [49], and the last column contains the results from our
adaptive algorithm. For every approach and every sequence, the mean score over
the sequence is reported (calculated as suggested in [1]). Since our detection
window is of fixed size, the actual performance of our algorithm is generally better
than the reported score suggests. The results are comparable with those presented
in [49].
Experiment 2
In Figure 4.4(a) we show the detection time for each frame of sequence SeqI from
the BoBoT dataset. The dimensions of the video sequence are 320 x 240 and the
HOG detection window is 64 x 128. In Figure 4.4(b) we show the detection times
for sequence SeqB. The dimensions of the video sequence are 320 x 240 and the
HOG detection window is 64 x 64.
We show that by using a variable region of interest around the tracked object
a high frame rate can be achieved. The peaks in detection time correspond to the
time instants when the region of interest grows bigger or the full image is scanned.
[Figure 4.4. Detection time (ms) per frame for sequence SeqI (a) and sequence SeqB (b).]
Experiment 3
In Figure 4.5 we show how the number of particles changes over time for a single
run. The test video sequence is SeqI and the maximum number of particles was
set to 5000. We also set a lower bound of 1000 on the number of particles, in order
to avoid the convergence of the particle filter to a wrong position hypothesis in the
case of bad detections. We can see how the number of particles drops over time
and in some cases would fall below the lower bound if it were not enforced, without
any significant degradation in tracking performance.
Experiment 4
Successful tests of people tracking and following have been carried out by using a
Microsoft Kinect camera mounted on a mobile robotic platform. The user is detected
using the adaptive technique described above.
[Figure 4.5. Number of particles used at each frame for a single run on sequence SeqI.]
Figure 4.6. Example of robot following a user. (a) The algorithm in action.
(b) Estimated trajectory of the user (blue dots) and trajectory followed by
the robot (red line).
We also detect the distance of the user relative to the robot directly from the
Kinect camera. An Extended Kalman Filter is implemented for smoothing the
estimated position of the user on the ground plane. The state is composed of the
position and velocity of the user along the (x, y) axes.
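With a constant-velocity state [x, y, vx, vy] and a position-only measurement from the Kinect, both models are linear, so the filter reduces to a standard Kalman predict/update cycle; a minimal sketch follows, where the frame period and noise covariances are assumptions.

```python
import numpy as np

dt = 1.0 / 30.0                        # frame period (assumed)
F = np.eye(4); F[0, 2] = F[1, 3] = dt  # constant-velocity transition
H = np.eye(2, 4)                       # we measure only (x, y) on the ground plane
Q = 0.01 * np.eye(4)                   # process noise covariance (assumed)
R = 0.05 * np.eye(2)                   # measurement noise covariance (assumed)

def kf_step(x, P, z):
    """One predict/update cycle smoothing the user position estimate."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the measured ground-plane position z = (x, y)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P
```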
In this way, the robot is able to track the position of the user and follow it.
Figure 4.6 reports the results of an experiment in which the robot is able to track
and follow a user in real-time along a series of corridors.
Chapter 5
Conclusions
In this thesis we first presented solutions to different aspects of mobile robot naviga-
tion, in particular related to localization and Simultaneous Localization and Map-
ping, with a focus on real-world applications in industrial environments. Then we
presented some methods related to Human-Machine Interaction; in particular we
presented an adaptive approach to people and object tracking for interaction between
an autonomous robot and human operators, and finally we presented a novel method
for human weight estimation based solely on vision sensors.
In the field of robot localization we proposed a multi-robot localization approach
which is particularly suitable for large logistic environments, which present problems
due to their large scale and in particular high symmetry. We showed how it is
possible to exploit communication between the members of a team of robots in
order to spread the knowledge about small asymmetries in the environment, as well
as to provide some sort of recovery in the case of localization failures. Moreover, we
introduced a novel way for the localization algorithm to acquire insight into the
localization state of each team member. Using our approach, the team is able to
know with a certain confidence when all team members are correctly localized, so
that they can start to execute high-level tasks.
We then introduced a series of practical improvements to the localization algorithm
in the case of a single robot in the presence of high symmetry and noisy sensors.
We combined a simple sensor fusion approach, which exploits gyroscopes to correct
wheel odometry, with a marker detection algorithm which is able to correct
the pose estimate of the robot using planar markers embedded in the environment.
It should be noted that the planar markers can in principle be replaced by arbitrary
planar objects already present in the environment.
In the field of Simultaneous Localization and Mapping we proposed a full SLAM
approach based on pose graph optimization. We developed a graph optimization
algorithm which is demonstrated to be faster than most state-of-the-art approaches,
by exploiting a linear approximation of the problem that holds in the case of planar
environments. We then proposed a front-end for our approach which is based on laser
scan matching only. We showed how the full approach is able to create consistent
maps while running in real-time.
In the second part of the thesis we presented our contribution to the problem
of people detection and tracking from a mobile camera. We presented an adaptive
approach which is able to learn appearance changes over time and uses adaptive
particle filters for tracking. We also showed how this approach can be used for
tracking generic objects as well.
Finally, we presented some preliminary results on 3D body reconstruction for
weight estimation using RGB-D cameras only. We presented some results of body
reconstruction using two cameras and using a single camera, with the user moving
in front of it.
5.1 Publications
5.1.1 Journal papers
• Bona B., Carlone L., Indri M., Rosa S., Supervision and monitoring of logistic
spaces by a cooperative robotic team: methodologies, problems, and solutions,
Intelligent Service Robotics, 2014, DOI: 10.1007/s11370-014-0151-0
• Abrate F., Bona B., Indri M., Rosa S., Tibaldi F., Multirobot Localization in
Highly Symmetric Environments, Journal of Intelligent and Robotic Systems,
2013, DOI: 10.1007/s10846-012-9790-6
• Abrate F., Bona B., Indri M., Rosa S., Tibaldi F., Multi-robot map updating
in dynamic environments, in Springer Tracts in Advanced Robotics, Volume
83, 2013, DOI: 10.1007/978-3-642-32723-0
5.1.2 Conference papers
• Russo L.O., Airo Farulla G., Indaco M., Rosa S., Rolfo D., Bona B., Blurring
prediction in Monocular SLAM, 8th IEEE International Design & Test
Symposium (IDT), 2013
• Carlone L., Yin J., Rosa S., Yuan Z., Graph optimization with unstructured
covariance: fast, accurate, linear approximation. In: Simulation, Modeling,
and Programming for Autonomous Robots (SIMPAR 2012), 2012.
• Rosa S., Paleari M., Ariano P., Bona B., Object Tracking with Adaptive HOG
Detector and Adaptive Rao-Blackwellised Particle Filter. In: SPIE 8301, Intel-
ligent Robots and Computer Vision XXIX: Algorithms and Techniques, 2012.
• Margaria V., Rosa S., Ariano P., HExEC: hand exoskeleton electromyographic
control, Workshop on Human-Friendly Robotics, 2011
5.1.3 Preprints
• Rosa S., Paleari M., Velardo C., Dugelay J.L., Bona B., and Ariano P., Towards
Mass Estimation Using 3D Vision Sensors in Microgravity Environments
Bibliography
[29] N. Dalal and B. Triggs. Histograms of oriented gradients for human detec-
tion. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE
Computer Society Conference on, 2005.
[30] T. Darrell, G. Gordon, M. Harville, and J. Woodfill. Integrated person tracking
using stereo, color, and pattern detection. In Computer Vision and Pattern
Recognition, 1998. Proceedings. 1998 IEEE Computer Society Conference on,
pages 601 –608, 1998.
[31] Timothy A. Davis. Direct Methods for Sparse Linear Systems (Fundamentals of
Algorithms 2). Society for Industrial and Applied Mathematics, Philadelphia,
PA, USA, 2006.
[32] F. Dellaert, J. Carlson, V. Ila, K. Ni, and C. Thorpe. Subgraph-preconditioned
conjugate gradients for large scale SLAM. In Proc. of the IEEE-RSJ Int. Conf.
on Intelligent Robots and Systems, 2010.
[33] F. Dellaert and M. Kaess. Square root SAM: Simultaneous localization
and mapping via square root information smoothing. Int. J. Robot. Res.,
25(12):1181–1203, 2006.
[34] Hugh Durrant-Whyte and Tim Bailey. Simultaneous localisation and mapping
(SLAM): Part I. The essential algorithms. IEEE Robotics and Automation
Magazine, 13(2):99–110, 2006.
[35] R.M. Eustice, H. Singh, and J.J. Leonard. Exactly sparse delayed-state filters
for view-based SLAM. IEEE Trans. on Robotics, 22(6):1100–1114, 2006.
[36] P.F. Felzenszwalb, R.B. Girshick, and D. McAllester. Cascade object detection
with deformable part models. In Computer Vision and Pattern Recognition
(CVPR), 2010 IEEE Conference on, pages 2241 –2248, 2010.
[37] Terrence W. Fong and Illah Nourbakhsh. Interaction challenges in human-robot
space exploration. Interactions, 12(1):42–45, March 2005.
[38] D. Fox. Kld-sampling: Adaptive particle filters. In In Advances in Neural
Information Processing Systems 14, pages 713–720. MIT Press, 2001.
[39] D. Fox, W. Burgard, H. Kruppa, and S. Thrun. A probabilistic approach to
collaborative multi-robot localization. Autonomous Robots, 8(3):325–344, 2000.
[40] D. Fox, W. Burgard, and S. Thrun. Markov localization for mobile robots in
dynamic environments. Journal of Artificial Intelligence Research, 11:391–427,
1999.
[41] U. Frese, P. Larsson, and T. Duckett. A multilevel relaxation algorithm for
simultaneous localization and mapping. IEEE Trans. on Robotics, 21(2):196–
207, 2005.
[42] B. Gerkey, R.T. Vaughan, and A. Howard. The player/stage project: Tools for
multi-robot and distributed sensor systems. In 11th Int. Conf. on Advanced
Robotics (ICAR 2003), pages 317–323, 2003.
[43] D. Göhring and H.-D. Burkhard. Multi robot object tracking and self localization
using visual percept relations. In IEEE/RSJ Int. Conf. on Intelligent Robots
and Systems, 2006.
[58] F. Lu and E. Milios. Globally consistent range scan alignment for environment
mapping. Autonomous Robots, 4:333–349, 1997.
[59] M. Himstedt, S. Keil, S. Hellbach, and H.-J. Böhme. A robust graph-based
framework for building precise maps from laser range scans. In Proc. of the
Workshop on Robust and Multimodal Inference in Factor Graphs, IEEE
International Conference on Robotics and Automation (ICRA), 2013.
[60] A. Martinelli. Improving the precision on multi robot localization by using a
series of filters hierarchically distributed. In IEEE/RSJ Int. Conf. on Intelligent
Robots and Systems, pages 1053–1058, 2007.
[61] R. Martinez-Cantin, N. de Freitas, and J. Castellanos. Analysis of particle
methods for simultaneous robot localization and mapping and a new algorithm:
Marginal-SLAM. In Proc. of the IEEE lnternational Conf. on Robotics and
Automation, 2007.
[62] M. Di Marco, A. Garulli, A. Giannitrapani, and A. Vicino. Simultaneous local-
ization and map building for a team of cooperating robots: A set membership
approach. IEEE Trans. on Robotics and Automation, 19(2):238–249, 2003.
[63] Adam Milstein. Dynamic maps in monte carlo localization. In Canadian Con-
ference on AI, pages 1–12, 2005.
[64] MobileRobots Inc. Mobilesim-the mobile robots simulator, 2011.
[65] Hans Moravec and A. E. Elfes. High resolution maps from wide angle sonar.
In Proceedings of the 1985 IEEE International Conference on Robotics and
Automation, pages 116 – 121, March 1985.
[66] A.I. Mourikis and S.I. Roumeliotis. Performance analysis of multirobot coop-
erative localization. IEEE Trans. on Robotics, 22(4):666–681, 2006.
[67] Rafael Muñoz-Salinas, Eugenio Aguirre, Miguel García-Silvente, and Antonio
Gonzalez. People detection and tracking through stereo vision for human-robot
interaction. In MICAI 2005: Advances in Artificial Intelligence, volume 3789
of Lecture Notes in Computer Science, pages 337–346. Springer Berlin / Hei-
delberg, 2005.
[68] K. Murphy and S. Russell. Rao-blackwellized particle filtering for dynamic
bayesian networks. Sequential Monte Carlo Methods in Practice, Springer,
2001.
[69] José Méndez-Polanco, Angélica Muñoz-Meléndez, and Eduardo
Morales. People detection by a mobile robot using stereo vision in dynamic
indoor environments. In Arturo Aguirre, Raúl Borja, and Carlos García, ed-
itors, MICAI 2009: Advances in Artificial Intelligence, volume 5845 of Lecture
Notes in Computer Science, pages 349–359. Springer Berlin / Heidelberg, 2009.
[70] E. Olson, J.J. Leonard, and S. Teller. Fast iterative optimization of pose graphs
with poor initial estimates. In Proc. of the IEEE Int. Conf. on Robotics and
Automation, pages 2262–2269, 2006.