
Autonomous Robots (2022) 46:569–597

https://doi.org/10.1007/s10514-022-10039-8

Motion planning and control for mobile robot navigation using machine learning: a survey

Xuesu Xiao1 · Bo Liu1 · Garrett Warnell3 · Peter Stone1,2

Received: 24 March 2021 / Accepted: 25 February 2022 / Published online: 20 March 2022
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022

Abstract
Moving in complex environments is an essential capability of intelligent mobile robots. Decades of research and engineering
have been dedicated to developing sophisticated navigation systems to move mobile robots from one point to another. Despite
their overall success, a recently emerging research thrust is devoted to developing machine learning techniques to address the
same problem, based in large part on the success of deep learning. However, to date, there has not been much direct comparison
between the classical and emerging paradigms to this problem. In this article, we survey recent works that apply machine
learning for motion planning and control in mobile robot navigation, within the context of classical navigation systems. The
surveyed works are classified into different categories, which delineate the relationship of the learning approaches to classical
methods. Based on this classification, we identify common challenges and promising future directions.

Keywords Mobile robot navigation · Machine learning · Motion planning · Motion control

✉ Xuesu Xiao
[email protected]

Bo Liu
[email protected]

Garrett Warnell
[email protected]

Peter Stone
[email protected]

1 Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA
2 Sony AI, Tokyo, Japan
3 Computational and Information Sciences Directorate, Army Research Laboratory, Adelphi, MD 20783, USA

1 Introduction

Autonomous mobile robot navigation—i.e., the ability of an artificial agent to move itself towards a specified waypoint smoothly and without collision—has attracted a large amount of attention and research, leading to hundreds of approaches over several decades. Autonomous navigation lies at the intersection of many different fundamental research areas in robotics. Moving the robot from one position to another requires techniques in perception, state estimation (e.g., localization, mapping, and world representation), path planning, and motion planning and control.

While many sophisticated autonomous navigation systems have been proposed, they usually follow a classical hierarchical planning paradigm: global path planning combined with local motion control. While this paradigm has enabled autonomous navigation on a variety of different mobile robot platforms—including unmanned ground, aerial, surface, and underwater vehicles—its typical performance still lags that which can be achieved through human teleoperation. For example, mobile robots still cannot achieve robust navigation in highly constrained or highly diverse environments.

With the recent explosion in machine learning research, data-driven techniques—in stark contrast to the classical hierarchical planning framework—are also being applied to the autonomous navigation problem. Early work of this variety has produced systems that use end-to-end learning algorithms in order to find navigation systems that map directly from perceptual inputs to motion commands, thus bypassing the classical hierarchical paradigm. These systems allow a robot to navigate without any other traditional symbolic, rule-based human knowledge or engineering design. However, these methods have thus far proven to be extremely data-hungry and are typically unable to provide any safety guarantees. Moreover, they lack explainability when compared to the classical approaches. For example, after being trained on hundreds of thousands of instances of training data, during deployment, the cause of a collision may still


not be easily identifiable. This lack of explainability makes further debugging and improvement from this collision difficult. As a result, many learning-based approaches have been studied mostly in simulation and have only occasionally been applied to simple real-world environments as "proof-of-concept" systems for learned navigation.

These two schools of thought—classical hierarchical planning and machine learning—each seek to tackle the same problem of autonomous navigation, but as of yet, have not been compared systematically. Such a comparison would be useful given that each class of approach typically dominates in its own, largely separate community. The robotics community has gravitated toward classical hierarchical planners due to their ability to reliably deploy such systems in the real world, but many are perhaps unaware of recent advances in machine learning for navigation. Similarly, while the machine learning community is indeed developing new, potentially powerful data-driven approaches, they are often only compared against previous learning approaches, thus neglecting the most pressing challenges for real-world navigation. A lack of awareness of, and thus a failure to address, these challenges makes it less likely for roboticists to consider their methods for deployment in real-world environments.

In this article, we provide exactly the comparison described above based on existing literature on machine learning for motion planning and control¹ in mobile robot navigation. We qualitatively compare and contrast recent machine learning navigation approaches with the classical approaches from the navigation literature. Specifically, we adopt a robotics perspective and review works that use machine learning for motion planning and control in mobile robot navigation. By examining machine learning work in the context of the classical navigation problem, we reveal the relationship between those learning methods and existing classical methods along several different dimensions, including functional scope (e.g., whether learning is end-to-end, or focused on a specific individual subtask) and navigation performance. The goals of this survey are to highlight recent advances in applying learning methods to robot navigation problems, to organize them by their role in the overall navigation pipeline, to discuss their strengths and weaknesses, and to identify interesting future directions for research. The article is organized as follows:

– Section 2 provides a brief background of classical mobile robot navigation;
– Section 3 contains the main content, which is further divided into three subsections discussing learning methods that (i) completely replace the entire classical navigation pipeline, (ii) replace only a navigation subsystem, and (iii) adopt learning to improve upon existing components of a navigation stack;
– Section 4 re-categorizes the same papers in Sect. 3, comparing the performance between the discussed methods and the classical approaches with respect to five performance categories;
– Section 5 discusses the application domains and the input modalities of these learning methods, based on the same pool of papers;
– Section 6 provides analyses, challenges, and future research directions that the authors think will be of value.

Our survey differs from previous surveys on machine learning for robot motion (Tai and Liu 2016; Tai et al. 2016). In particular, Tai and Liu (2016) surveyed deep learning methods for mobile robots from perception to control systems. Another relevant survey (Tai et al. 2016) provided a comprehensive overview on what kinds of deep learning methods have been used for robot control, from reinforcement learning to imitation learning, and how they have varied along the learning dimension. In contrast to these two surveys from the learning perspective, our work focuses on categorizing existing learning works based on their relationships with respect to the classical navigation pipeline. We focus on works that we believe to have significant relevance to real robot navigation. In other words, we only include machine learning approaches for motion planning and control that actually move mobile robots in their (simulated or physical) environments, excluding papers that, for example, use machine learning to navigate (non-robotic) agents in gridworld-like environments (Dennis et al. 2020), address semantic mapping but without using the learned semantics to move mobile robots (Jiang et al. 2021), or focus on robotic manipulation tasks (Kroemer et al. 2021). In short, we aim to provide the robotics community with a summary of which classical navigation problems are worth examining from a machine learning perspective, which have already been investigated, and which have yet to be explored. For the learning community, we aim to highlight the mobile robot navigation problems that remain unsolved by the classical approaches despite decades of research and engineering effort.

¹ In mobile robot navigation, "motion planning" mostly focuses on relatively long-term sequences of robot positions, orientations, and their high-order derivatives, while motion control generally refers to relatively low-level motor commands, e.g., linear and angular velocities. However, the line between them is blurry, and we do not adhere to any strict distinction in terminology in this survey.

2 Classical mobile robot navigation

In general, the classical mobile robot navigation problem is to generate a sequence of motion commands A∗ = {a0, a1, a2, . . .} to move a robot from its current start location


s to a desired goal location g in a given environment E:

    A∗ = argmin_{A ∈ A} J(E, s, g),    (1)

where A is the space of all possible sequences of motion commands and J(·) is a cost function that the navigation system aims to minimize. The actions can be linear and angular velocities for differential-drive robots, steering angle for Ackermann-steer vehicles, propeller thrust for quadrotors, etc. The environment E, either in 2D or 3D for ground or aerial navigation, is instantiated by leveraging the robot's sensory perceptions (e.g., LiDAR, camera, wheel encoder, inertial sensor) and/or pre-built maps of the environment. Depending on the navigation scenario, J can take a variety of forms, which can include the robot's motion constraints, obstacle avoidance, shortest path or time, social compliance, offroad stability, and/or other domain-dependent measures. Since many navigation problems cover a large physical space and it is computationally infeasible to generate fine-grained motion sequences over long horizons, most classical navigation systems decompose J and tackle it in a hierarchical manner. We use the following analogy to human navigation to illustrate such hierarchy.

Consider the problem faced by a person who would like to navigate from her bedroom to the neighborhood park. She needs to first come up with a coarse plan connecting her current location (i.e., the bedroom) to the park. That may include going to the living room through the bedroom door, following the hallway leading to the dining room, exiting the house and turning right, walking straight until the second intersection, and then turning right into the park. This sequence of high-level steps is based upon a good representation of the world, such as knowing the layout of the house and the map of the neighborhood. When the person starts walking, she may need to go around some toys left in the hallway, greet her partner in the dining room, avoid a new construction site on the sidewalk, and/or yield to a car at a red light. These behaviors are not planned beforehand, but rather are reactions to immediate situations.

The classical robotics approach to navigation follows a similar decomposition of the overall navigation task. In particular, the bottom of Fig. 2 gives an overview of the classical mobile navigation pipeline. Based on a specific goal, the perception is processed to form a global representation of the world, which could either be based on recent perceptions or a given map. Passing the goal and perception into a parameterized global planner, a global path is produced, usually in the form of a sequence of local goals. This process corresponds to the human's coarse plan. Taking the closest local goal, the robot then uses the output of perception to create a local representation. A parameterized local planner then is responsible for generating motion commands, defined as inputs directly fed to motor controllers to generate raw motor commands, which are both safe and goal-oriented. Note that the perception may come not only from ego-centric onboard sensors, but also from other exterior systems such as GPS, acoustic transponders, or motion capture systems. This section introduces both the global and local reasoning levels of the classical mobile robot navigation pipeline.

2.1 Global planning

Global planning aims to find a coarse path leading from the current position to the final goal. The workspace of mobile robots is defined as the geometric space where the robot moves, and usually resides in the special Euclidean group SE(2) or SE(3) (LaValle 2006). The global goal is specified as the final pose, i.e., a combination of position and orientation.

The global representation is reconstructed using both current and previous perception streams. Prior knowledge can also be utilized, such as an existing map. Common perception modalities include sonar, ultrasonic sensors, LiDAR, and, more popular recently, vision. The raw perception streams are processed using classical signal processing techniques, such as filtering, noise suppression, and feature extraction, and then used to reconstruct a global representation of the environment. Techniques such as Structure from Motion (SfM) (Ullman 1979), Simultaneous Localization And Mapping (SLAM) (Durrant-Whyte and Bailey 2006), and Visual Odometry (VO) (Nistér et al. 2004) are used to generate the world representation from sensor data.

Roboticists usually construct global representations with relatively sparse structures, such as graphs. Probabilistic Road Maps (PRM) (Kavraki et al. 1996) and Rapidly-exploring Random Trees (RRT) (LaValle 1998) are mature sampling-based techniques for tackling large-scale and fine-resolution (usually continuous) workspaces in a timely manner. If the workspace is relatively small or low-resolution, a coarse occupancy grid (Elfes 1989) or costmap (Jaillet et al. 2010; Lu et al. 2014) may suffice. Occupancy grids treat the workspace as a grid of occupied or unoccupied cells, while costmaps construct continuous cost values for these cells. Note that the global representation may omit details such as obstacles or fine structures, since these can be handled by local planning and may also be subject to change. Existing knowledge or belief about the world, especially regions beyond the current perceptual range, can be used in the representation as well, and they are updated or corrected when the robot reaches these unknown regions.

Based on a coarse global representation and an evaluation criterion, such as minimum distance, global planners find a reasonable path connecting the current configuration to the global goal. To be computationally efficient, global planners usually assume the robot is a point mass and only consider its position, not its orientation. Search-based algorithms


are appropriate for this task. For example, a shortest-path algorithm could be used on a PRM representation, occupancy grid, or costmap. Example algorithms include depth- or breadth-first search, Dijkstra's algorithm (Dijkstra 1959), and A* search with heuristics (Hart et al. 1968). Note that if the global representation changes due to new observations, the global planner needs to replan quickly. Algorithms such as D* Lite (Koenig and Likhachev 2002) can be suitable for such dynamic replanning.

As the output of global reasoning, the global planner generates a coarse global path, usually in terms of a sequence of waypoints, and passes it along to the local planning phase. The local planner is then in charge of generating motions to execute this path.

2.2 Local planning

Generating low-level motion commands requires a more sophisticated model of the mobile robot (either kinematic or dynamic) and a more accurate local representation of the world, as compared to the global level. When deployed online, these models often require significant computation. Therefore local planners typically only reason about the robot's immediate vicinity.

Local planners take a local goal as input, which is typically provided by the global planner in the form of either a single position relatively close to the robot in the workspace, or a short segment of the global path starting from the robot. It is the local planner's responsibility to generate kinodynamically-feasible motion commands either to move the robot to the single local goal within an acceptable radius or to follow the segment of the global path. The local planner then moves on to the next local goal or the global path segment gets updated.

The local environment representation is usually constructed by the current perceptual stream only, unless computation and memory resources allow for incorporating some history information. This representation needs to be more precise than the global one since it directly influences the robot's actual motion and can therefore impact safety. Common representations are occupancy grids (Elfes 1989) and costmaps (Jaillet et al. 2010; Lu et al. 2014), but with a finer resolution and smaller footprint when compared to the representation used by the global planner.

Local planners typically require both the surrounding world representation and a model of the robot, e.g., holonomic, differential drive, or Ackermann-steering vehicle models. Aiming at optimizing a certain objective, such as distance to local goal and clearance from obstacles, local planners compute motion commands to move the robot toward the local goal. At the same time, they need to respect the surroundings, e.g., following the road or, most importantly, avoiding obstacles (Quinlan and Khatib 1993; Fox et al. 1997). The output of the local planner is either discrete high-level commands, such as turn 45° left, or continuous linear and angular velocities. These motion commands are finally fed into low-level motor controllers.

An illustration of an example classical navigation system, Robot Operating System (ROS) move_base (OSRF 2018), is shown in Fig. 1.

Fig. 1 Classical navigation system: given a global goal (red arrow indicating the final pose) and a global representation (grey area, based on a pre-built map or the robot's memory), the global planner produces a global path (blue line). Using the perceptual input (red dots indicating LiDAR returns), a finer and smaller local representation (cyan area) is built. Based on all this information, current motion commands (green line) are computed to move the robot (OSRF 2018) (Color figure online)

3 Scope of learning for navigation

Despite the success of using conventional approaches to tackle motion planning and control for mobile robot navigation, these approaches still require an extensive amount of engineering effort before they can reliably be deployed in the real world. As a way to potentially reduce this human engineering effort, many navigation methods based on machine learning have been recently proposed in the literature. Section 3 surveys relevant papers in which the scope of work may be categorized as using machine learning to solve problems that arise in the context of the classical mobile robot navigation pipeline described in Sect. 2.

Due to the popularity of end-to-end learning, a large body of work has been proposed to approach the navigation problem in an end-to-end fashion. That is, given raw perceptual information, these approaches seek to learn how to directly produce motion commands to move the robot either towards a pre-specified goal, or just constantly forward without any explicit intermediate processing steps. In contrast, other work in the literature has proposed to "unwrap" the navigation


pipeline and use machine learning approaches to augment or replace particular navigation subsystems or components. Therefore, we divide Sect. 3 into three major subsections:

1. Learning to replace the entire navigation stack (Sect. 3.1),
2. Learning navigation subsystems (Sect. 3.2), and
3. Learning navigation components within the navigation stack (Sect. 3.3).

A more detailed breakdown of the scope of learning targeted by each of the surveyed papers is shown in Fig. 2. The upper portion of Fig. 2 contains works that seek to completely replace the classical navigation stack, with differences such as fixed (global) goal or moving (local) goal, and discrete or continuous motion commands. The two middle portions of the table pertain to works that seek to use learning for particular navigation subsystems—including learning for global or local planning—and learning methods for individual navigation components. The shade of each cell corresponds to the number of surveyed papers in each category (darker means more). The lower portion shows the corresponding components of the classical navigation pipeline. The structure of Sect. 3 is illustrated in Fig. 3.

3.1 Learning the entire navigation stack

Due to the recent popularity of end-to-end machine learning techniques, work in this category comprises the majority of attempts to use machine learning to enable mobile robot navigation. Tackling the entire navigation problem in an end-to-end fashion is relatively straightforward: there is no need to formulate specific subcomponents of the classical system or relationships between these subcomponents. Instead, most works here treat the entire system as a black box with raw or processed perceptual signals as input and motion commands as output (either high-level discrete commands or low-level continuous commands). Moreover, whereas classical navigation systems always take as input a pre-specified, fixed goal (destination), we have found—surprisingly—that a very large number of learning-based navigation methods in this category do not use any such information. Rather, these methods merely react to the current environment around the robot, e.g., navigating the robot forward as if there were a moving goal in front of it. Therefore, we divide Sect. 3.1 into fixed-goal (Sect. 3.1.1, the first two boxes of Fig. 2) and moving-goal navigation (Sect. 3.1.2, the third and fourth boxes of Fig. 2).

3.1.1 Fixed-goal navigation

One of the earliest attempts at using end-to-end machine learning for fixed-goal navigation is Thrun's 1995 work (Thrun 1995)—an early proof-of-concept for completely replacing the classical sense-plan-act architecture with a single learned policy. He used Q-learning (Watkins and Dayan 1992) to find a policy that mapped visual, ultrasonic, and laser state information directly to discrete motion actions for the task of servoing to a designated target object. The generated motion trajectories were relatively simple, and the system sought only to move the platform toward the target rather than avoid obstacles.

Since Thrun's seminal work above, the machine learning community has proposed several other end-to-end approaches for fixed-goal navigation. Below, we organize them according to the sensory inputs presented to the learner: Geometric navigation techniques use sensory inputs that directly indicate where obstacles and free space lie in the environment (e.g., LiDAR sensors); non-geometric navigation techniques are designed for sensors that do not directly provide such information (e.g., RGB cameras); and hybrid navigation techniques utilize a combination of the two.

Geometric navigation Given the ubiquity of geometric sensors on existing robotic platforms, several end-to-end machine learning techniques for fixed-goal, geometric navigation have been proposed. Many such methods propose replacing the entire navigation system with a deep neural network.

In the single-agent setting, Pfeiffer et al. (2017) presented a representative technique of those in this category, i.e., they enabled fixed-goal navigation with collision avoidance using an end-to-end neural network that maps raw LiDAR returns and a fixed goal location to low-level velocity commands. They trained the network using Imitation Learning (IL) (Russell and Norvig 2016) with the objective to mimic the classical ROS move_base navigation stack that uses a global (Dijkstra's) and local (Dynamic Window Approach (DWA) (Fox et al. 1997)) planner. Tai et al. (2017) showed that such systems could achieve these same results even with very low-dimensional LiDAR data (10 returns) when trained using Reinforcement Learning (RL) (Russell and Norvig 2016), where the robot learns via trial-and-error. To facilitate better adaptation to new situations (e.g., changing navigation goals and environments) without the classical localization, mapping, or planning subsystems, Zhang et al. (2017a) incorporated Deep Reinforcement Learning (DRL) and successor features. With a limited, discrete action space (stay, left, right, and forward), their results showed that a Deep Convolutional Neural Network (CNN) could indeed handle new situations in both simulated and physical experiments. Zeng et al. (2019) instead considered a setting with dynamic obstacles, and found success using the Asynchronous Advantage Actor-Critic (A3C) algorithm (Mnih et al. 2016) augmented with additional strategies such as reward shaping, the inclusion of Gated Recurrent Units (GRU) enabled memory (Chung et al. 2014), and curriculum learning. Zhelo et al. (2018) also used RL to train their system and show that adding an intrin-


Fig. 2 Classical navigation pipeline. [Figure: a table mapping the surveyed papers to their scope of learning. The upper portion lists works that replace the entire navigation stack (Sect. 3.1); the middle portions list works that learn the global or local planning subsystems (Sect. 3.2) or individual components (Sect. 3.3); the lower portion shows the corresponding components of the classical navigation pipeline, from perception and global/local representations through parameterized global and local planners to discrete commands or continuous motions]

Fig. 3 Structure of Sect. 3. [Figure: a tree with three branches and the surveyed papers listed under each leaf—3.1 Entire System (3.1.1 Fixed Goal, 3.1.2 Moving Goal), 3.2 Subsystem (3.2.1 Global Planning, 3.2.2 Local Planning), and 3.3 Component (3.3.1 World Representation, 3.3.2 Planner Parameters)]


sic reward can lead to faster learning. Zhang et al. (2017b) took these augmentations even further, and proposed neural SLAM to better enable exploration behaviors (as opposed to goal-seeking behaviors) by using a neural network that contained explicit, differentiable memory structures for learning map representations. The network was trained using A3C with an exploration-encouraging reward function. A similar idea was explored by Khan et al. (2018), in which a Memory Augmented Control Network (MACN) was proposed to enable navigation performance on par with more classical A* algorithms in partially-observable environments where exploration is necessary. Another end-to-end approach in this category is the work by Wang et al. (2018), in which a modular DRL approach was adopted, i.e., one Deep Q-Network (DQN) for global and another two-stream (spatial and temporal) DQN for local planning. Interestingly, despite all the success noted above, Pfeiffer et al. (2018) performed an extensive case study of such end-to-end architectures, and ultimately concluded that such systems should not be used to completely replace a classical navigation system, particularly noting that a classical, map-based, global path planner should be used if possible, in order to reliably navigate in unseen environments.

End-to-end learning techniques for fixed-goal, geometric navigation have also been proposed for navigation systems designed to operate in multi-agent settings. For example, both Chen et al. (2017b) and Long et al. (2018) have enabled collision avoidance in multi-robot scenarios using DRL to train networks that map LiDAR information and a goal location directly to low-level navigation commands. Tai et al. (2018) and Jin et al. (2020) have shown that systems similar to the ones above can also be used to enable navigation in the presence of human pedestrians using RL and geometric sensor inputs. Ding et al. (2018) showed that a similar capability can also be achieved using RL to train a system to choose between target pursuit and collision avoidance by incorporating a Hidden Markov Model (HMM) (Stratonovich 1965)

Codevilla et al. (2018) considered the problem of road navigation for a single autonomous vehicle using RGB images and a high-level goal direction (rather than an explicit goal location), and enabled such navigation using an end-to-end deep IL approach. Their system was evaluated experimentally both in simulation and in the real world. Zhu et al. (2017) considered a slightly different problem in which they sought to enable collision-free navigation from RGB images but where the fixed goal is specified as the image the robot is expected to see when it reaches the desired pose. They showed that an end-to-end deep RL approach could successfully train a network that mapped camera images directly to discrete local waypoints for a lower-level controller to follow. A similar indoor visual navigation problem has been addressed by Sepulveda et al. (2018), in which they used imitation learning with RRT* (Karaman and Frazzoli 2011) as expert and utilized indoor semantic structures to produce motor controls. Gupta et al. (2017a) and Gupta et al. (2017b) also sought to enable this style of navigation, but more explicitly looked at incorporating more classical systems into the neural network—they jointly trained a neural-network-based mapper and planner using the Value Iteration Network (VIN) (Tamar et al. 2016) technique. This approach resulted in the network generating a latent map representation from which the learned planner was able to generate discrete actions (stay, left, right, and forward) that enabled successful navigation to a fixed goal.

Hybrid navigation Researchers studying end-to-end machine learning for fixed-goal navigation have also proposed systems that combine both geometric and non-geometric sensor input. For example, for the problem of single-agent navigation with collision avoidance, Zhou et al. (2019) modified the traditional paradigm of hierarchical global and local planning by proposing a new switching mechanism between them. In a normal operating mode, a global planner trained using Behavior Cloning (BC) used a visual image and goal as input and output one of three discrete (left, straight, and
into a hierarchical model. Considering the specific case in right) actions. However, when an obstacle was detected using
which some humans in the scene can be assumed to be com- LiDAR, the system switched to use a local planner (which
panions, Li et al. (2018) showed that end-to-end learned can output more fine-grained actions) that was trained using
approaches could also enable Socially Concomitant Navi- RL. In a continuation of their previous CADRL work Chen et
gation (SCN), i.e., navigation in which the robot not only al. (2017a, b) also proposed a hybrid approach called Socially
needs to avoid collisions as in previous work, but also needs Aware CADRL (SA- CADRL), in which DRL over both
to maintain a sense of affinity with respect to the motion of visual and LiDAR inputs is used to train a navigation system
its companion. Recently, Liang et al. (2020) used multiple that learns social norms like left- and right-handed passing.
geometric sensors (2D LiDAR and depth camera) and RL Further along that same line of research, Everett et al. (2018)
to train an end-to-end collision avoidance policy in dense used both a 2D LiDAR and three RGB-D cameras in their
crowds. proposed GA3C-CADRL technique, which relaxes assump-
Non-geometric navigation In contrast to the approaches tions about the other agents’ dynamics and allows reasoning
above that take geometric information as input, researchers about an arbitrary number of nearby agents.
have also proposed several learned, end-to-end, fixed-goal Exteroception In contrast to the previous sections which con-
navigation systems that exclusively utilize non-geometric sidered sensor input collected using sensors on the robot
sensor information such as RGB imagery. For example, itself, there also exists some end-to-end learned, fixed-goal


navigation work that instead uses exteroception, or perception data gathered using sensors external to the robot. For example, in the robot soccer domain, using low-level state information estimated from overhead cameras, Zhu et al. (2019) learned a policy trained using RL that maps the state information directly to motion commands to enable several goal-oriented motion primitives, including go-to-ball, in which a single small-size robot navigates to the ball without any obstacle or other players on the field. This work by Zhu et al. (2019) is similar to Thrun's work mentioned previously (Thrun 1995), where the robot also learned to pursue a target, but used Deep Deterministic Policy Gradient (DDPG) (Lillicrap et al. 2015) and an overhead camera instead of Q-learning and onboard perception, respectively. Godoy et al. (2018) trained a planner with RL and global tracking in simulation and modified the learned velocities with Optimal Reciprocal Collision Avoidance (ORCA) (Van Den Berg et al. 2011) for multi-robot navigation. Chen et al. (2019) also proposed a method for social navigation that used exteroceptive ground-truth simulation data and DRL to model human–robot and human–human interactions for motion planning.

The works described in Sect. 3.1.1 all sought to replace the entire navigation pipeline with a single, end-to-end learned function. The approaches use as input both a fixed goal location and sensor data (geometric, non-geometric, or both), and produce as output discrete or continuous motion commands to drive the robot to the goal location. Results from this body of work show that such learned navigation systems can enable obstacle avoidance behavior, work in the presence of other robots, consider human social norms, and explore unknown environments.

3.1.2 Moving-goal navigation

While moving to a specified destination is the ultimate objective of classical navigation systems, a large body of learning-based navigation approaches are designed to address a more simplified navigation problem, i.e., that of navigating the robot towards a moving goal that is always in front of the robot. Solutions to this type of problem can lead to behaviors such as lane keeping, obstacle avoidance, and terrain adaptation, but they cannot ensure that the robot arrives at a specified goal. We conjecture that the reason this problem has gained so much traction in the learning community is that the limited scope of such problems helps constrain the learning task, which, in turn, leads to greater success with existing machine learning methods.

Early work in this category focused on creating autonomous ground vehicles, and one of the earliest such approaches was ALVINN, an Autonomous Land Vehicle In a Neural Network (Pomerleau 1989), proposed in 1989. ALVINN was trained in a supervised manner from simulated data, and during deployment, it used both image and laser inputs to produce an output direction for the vehicle to travel. The vehicle did not navigate to any specified goal but instead learned to keep moving forward down the middle of the road. Almost two decades later, DARPA's Learning Applied to Ground Robots (LAGR) program further stimulated related research. LeCun et al. (2006) presented a similar work, in which the navigation system was trained in an end-to-end fashion to map raw input binocular images to vehicle steering angles for high-speed off-road obstacle avoidance using data collected from a human driver and a six-layer CNN. More recently, with the help of much deeper CNNs, far more data from human drivers, and more powerful computational hardware, Bojarski et al. (2016) built the DAVE-2 system for autonomous driving. DAVE-2 demonstrated that CNNs can learn lane- and road-following tasks without manual decomposition into separate perception, reasoning, and planning steps.

There are also several works that have looked at enabling various types of indoor robot navigation. For example, Sergeant et al. (2015) proposed a method that allowed a ground robot to navigate in a corridor without colliding with the walls using human demonstration data and multimodal deep autoencoders. Similarly, Tai et al. (2016) also proposed to use human demonstrations to learn obstacle avoidance, where they formulated the problem as one of mapping images directly to discrete motion commands. Kahn et al. (2018b) also sought to enable collision-free hallway navigation, and proposed a method that learns to produce a steering angle while maintaining a 2 m/s speed from training experience that included sensed collisions. Kahn et al. (2018a) further collected event cues during off-policy training and used these event cues to enable different navigation tasks during deployment, such as avoiding collisions, following headings, and reaching doors.

In the space of outdoor navigation, Siva et al. (2019) leveraged human teleoperation and combined representation learning and imitation learning together to generate appropriate motion commands from vision input when the robot was traversing different terrain types, such as concrete, grass, mud, pebbles, and rocks. Pan et al. (2020) looked at enabling agile off-road autonomous driving with only low-cost onboard sensors and continuous steering and throttle commands. Instead of human demonstration, they developed a sophisticated Model Predictive Controller (MPC) with expensive IMU and GPS and used it as an expert to generate training data for batch and online IL with Dataset Aggregation (DAgger) (Ross et al. 2011).

Similar navigation without a specific goal has also been applied to Unmanned Aerial Vehicles (UAVs), for example, using IL to learn from an expert: As an early application of DAgger, Ross et al. (2013) used a human pilot's demonstration to train a UAV to fly in real natural forest environments


with continuous UAV heading commands based on monocular vision. Aiming to navigate forest trails, Giusti et al. (2015) used data collected from three GoPros mounted on a human hiker's head facing left, straight, and right to learn a neural network to classify a quadrotor's discrete turning right, keeping straight, and turning left actions, respectively. Similarly, Loquercio et al. (2018) used labeled driving data from cars and bicycles to train DroNet, a CNN that can safely fly a drone through the streets of a city. To avoid labeled training data, Sadeghi and Levine (2017) looked at enabling collision-free UAV navigation in indoor environments with CAD2RL, in which a collision-avoidance RL policy was represented by a deep CNN that directly processes raw monocular images and outputs velocity commands. Similar to the idea of using a sophisticated MPC as an expert (Pan et al. 2020), Zhang et al. (2016) also used MPC with full access to simulated state information for RL training. During their simulated deployment, only observation (e.g., onboard LiDAR) was available, in contrast to full state information.

One special case is by Bruce et al. (2017), in which the training environment was built with a 360° camera and the robot was trained to only navigate to one fixed goal specified during training. Similar to the moving-goal approaches discussed above, this work does not take in a goal as input. This end-to-end approach is robust to slight changes during deployment, such as the same hallway on a different day, with different lighting conditions, or furniture layout.

While the works described in Sect. 3.1.2 treated the entire navigation pipeline as an end-to-end black box, they are not applicable to the problem of navigation to a specific goal. Rather, they look to enable navigation behaviors such as lane keeping and forward movement with collision avoidance. Learning such behaviors is very straightforward and simple for proofs-of-concept of different learning methods, but its applicability to real navigation problems is limited compared to fixed-goal navigation.

3.2 Learning navigation subsystems

Despite the popularity of end-to-end learning to replace the entire navigation system, there also exist learning for navigation approaches that instead target subsystems of the more classical architecture. More limited in scope than fully end-to-end navigation, the benefit of this paradigm is that the advantages of the other navigation subsystems can still be maintained, while the learning research can target more specific subproblems. Most learning approaches in this category have focused on the local planning subsystem (Sect. 3.2.2), and one work has looked at global planning (Sect. 3.2.1). Each interfaces with the other subsystem, which is implemented according to a classical approach (the fifth and sixth box of Fig. 2).

3.2.1 Learning global planning

The only approach that we are aware of in the literature that seeks to use machine learning to replace the global planning subsystem is that of Yao et al. (2019). They considered the problem of building a navigation system that could observe social rules when navigating in densely-populated environments, and used a deep neural network as a global planner, called Group-Navi GAN, to track social groups and generate motion plans that allowed the robot to join the flow of a social group by providing a local goal to the local planner. Other components of the existing navigation pipeline, such as state estimation and collision avoidance, continued functioning as usual.

3.2.2 Learning local planning

Compared to the global planner, the literature has focused much more on replacing the local planner subsystem with machine learning modules. These works use the output from classical global planners to compute a local goal, combine that goal with current perceptual information, and then use learned modules to produce continuous motion control. In the following, we discuss learning approaches in ground, aerial, and marine navigation domains.

For ground navigation, Intention-Net (Gao et al. 2017) was proposed to replace the local planning subsystem, where continuous speed and steering angle commands were computed using "intention" (local goal) from a global planner (A*) in the form of an image describing the recently traversed path, along with other perceptual information. The system was trained end-to-end via IL. A similar approach was taken by Faust et al. (2018) in their PRM-RL approach: they used PRM as a global planner, which planned over long-range navigation tasks and provided local goals to the local planner, which was an RL agent that learned short-range point-to-point navigation policies. The RL policy combined the local goal with LiDAR perception and generated continuous motion commands. In their further PRM-AutoRL work (Chiang et al. 2019), they used AutoRL, an evolutionary automation layer around RL, to search for a DRL reward and neural network architecture with large-scale hyper-parameter optimization. Xie et al. (2018) instead investigated the idea of using a classical PID controller to "kick-start" RL training to speed up the training process of the local controller, which took a local goal as input from a classical global planner. Pokle et al. (2019) targeted local planning to address social navigation: with the help of a classical global planner, the learned local planner adjusts the behavior of the robot through attention mechanisms such that it moves towards the goal, avoids obstacles, and respects the space of nearby pedestrians. Recently, Kahn et al. (2021) introduced BADGR to plan actions based on non-geometric features learned from physical interaction,


such as collision with obstacles, traversal over tall grass, and bumpiness through uneven terrains. Although this method did not require simulations or human supervision, the robot experienced catastrophic failure during training, e.g., flipping over, which required manual intervention and could have damaged the robot. Xiao et al. (2021c) proposed Learning from Hallucination (LfH) [along with a Sober Deployment (Xiao et al. 2021b) and a learned hallucination extension (Wang et al. 2021b)], a paradigm that safely learns local planners in free space and can perform agile maneuvers in highly-constrained obstacle-occupied environments. Liu et al. (2021) introduced Lifelong Navigation, in which a separate neural planner is learned from self-supervised data generated by the classical DWA planner. During deployment, the learned planner only complements the DWA planner when the latter suffers from suboptimal navigation behaviors. The neural planner, trained using Gradient Episodic Memory (GEM) (Lopez-Paz and Ranzato 2017), can avoid catastrophic forgetting after seeing new environments. More recently, Xiao et al. (2021a) utilized inertial sensing to embed terrain signatures in a continuous manner and used learning to capture elusive wheel-terrain interactions to enable accurate, high-speed, off-road navigation.

For aerial navigation, Lin et al. (2019) used IL to learn a local planning policy for a UAV to fly through a narrow gap, and then used RL to improve upon the achieved results. Becker-Ehmck et al. (2020) developed the first Model-Based Reinforcement Learning (MBRL) approach deployed on a real drone, where both the model and controller were learned by deep learning methods. This approach can achieve point-to-point flight with the assistance of an external Motion Capture (MoCap) system and without the existence of obstacles.

Moving to marine navigation, Zhao and Roh (2019) used DRL for multiship collision avoidance: a DRL policy directly mapped the fully observable states of surrounding ships to three discrete rudder angle steering commands. A unified reward function was designed for simultaneous path following (from the global planner) and collision avoidance.

The works described in Sect. 3.2.2 addressed only the local planning part of the traditional navigation stack. These works replaced the classical local planner with learning approaches, which take as input a local goal from the global planner along with the current perception.

3.3 Learning individual components

Maintaining the traditional structure of classical navigation approaches, a relatively small body of literature has used machine learning techniques to learn individual components (the seventh box of Fig. 2) rather than the entire navigation system or entire subsystems. Section 3.3 focuses on these works, including improving world representation (Sect. 3.3.1) and fine-tuning planner parameters (Sect. 3.3.2).

3.3.1 World representation

One popular component of a classical navigation system that the machine learning community has targeted is that of world representation, which, in the classical navigation stack, acts as a bridge between the perceptual streams and the planning systems.

For social navigation problems, Kim and Pineau (2016) used Inverse Reinforcement Learning (IRL) to infer cost functions: features were first extracted from RGB-D sensor data, and a local cost function over these features was learned from a set of demonstration trajectories by an expert using IRL. The system operated within the classical navigation pipeline, with a global path planned using a shortest-path algorithm, and a local path using the learned cost function to respect social constraints. Johnson et al. (2018) collected data using a classical planner (Park 2016) and observed human navigation patterns on a campus. Social norms were then learned as a probability distribution of robot states conditioned on observation and action, before being used as an additional cost to the classical obstacle cost. Other researchers tackled cost function representation at a more global level. Luber et al. (2012) utilized publicly available surveillance data to extract human motion prototypes and then learn a cost map for a Theta* planner (Daniel et al. 2010), an any-angle A* variant that produces more natural and shorter any-angle paths than A*. Henry et al. (2010) used IRL for cost function representation and enabled behaviors such as moving with the flow, avoiding high-density areas, preferring walking on the right/left side, and reaching the goal quickly. Okal and Arras (2016) developed a graph structure and used Bayesian IRL (Ramachandran and Amir 2007) to learn the cost for this representation. With the learned global representation, a traditional global planner (A*) planned a global path over this graph, and the POSQ steering function (Palmieri and Arras 2014) for differential-drive mobile robots served as a local planner. A similar approach, ClusterNav, was taken by Martins et al. (2019), which learns a socially acceptable global representation and then also uses A* for global path planning. Using RRT as global planner, Shiarlis et al. (2017) and Pérez-Higueras et al. (2018b) also used IRL to learn its cost function for social navigation. Pérez-Higueras et al. (2018a) learned a path predictor using fully convolutional neural networks from demonstrations and used the predicted path as a rough costmap to bias the RRT planner. Kretzschmar et al. (2016) used a maximum entropy probability distribution to model social agents' trajectories with geometric input and then plan based on that information. Similar to this maximum entropy probability distribution model (Kretzschmar


et al. 2016), the work by Pfeiffer et al. (2016) used RGB input.

Learned world representations have also been used for other styles of navigation beyond social compliance. In particular, Wigness et al. (2018) used human demonstration and maximum entropy IRL (Ziebart et al. 2008) to learn a local costmap so that the robot can mimic a human demonstrator's navigation style, e.g., maintaining close proximity to grass but only traversing road terrain, "covert" traversal to keep close to building edges and out of more visible, open areas, etc. Richter and Roy (2017) used an autoencoder to classify novelty of the current visual perception compared to the body of existing training data: if the visual perception is not novel, a learned model predicts collision. Otherwise, a prior estimate is used. The predicted collision serves as part of the cost function for planning. Finally, for navigating to a global goal in structured but unknown environments, Stein et al. (2018) defined subgoals to be on the boundary between known and unknown space, and used neural networks to learn the cost-to-go starting from a particular subgoal of the global goal, such as the probability of getting stuck in a dead-end after going to that subgoal.

3.3.2 Planner parameters

Another way in which machine learning approaches have targeted individual components of the classical navigation pipeline is by tuning the parameters of the existing systems. For example, classical planners have a set of tunable parameters which are designed for the planner to face different situations or planning preferences, e.g., inflation radius, sampling rate, and trajectory optimization weights. A relatively small amount of machine learning research has looked at using learning techniques to achieve parameter fine-tuning, instead of using extensive human expertise and labor.

At the global planner level, Bhardwaj et al. (2020) proposed a differentiable extension to the already differentiable Gaussian Process Motion Planning 2 (GPMP2) algorithm, so that a parameter (obstacle covariance) could be learned from expert demonstrations. Through backpropagation, GPMP2 with the learned parameter can find a global path similar to the expert's.

Similar ideas have been applied at the local planning level. Teso-Fz-Betoño et al. (2019) proposed Predictive DWA by adding a prediction window to traditional DWA (Fox et al. 1997), and used an Artificial Neuro-Fuzzy Inference System (ANFIS) to optimize each of the fixed parameters' values, i.e. DWA's optimization weights, to increase performance. Through hand-engineered features and backpropagation, the three weights could be modified individually by three ANFIS's.

Fig. 4 Learning end-to-end motion policy versus learning parameter policy (reproduced with authors' permission Xiao et al. 2021d)

More recently, Xiao et al. (2021d) proposed Adaptive Planner Parameter Learning (APPL) from different human interaction modalities, including teleoperated demonstration (APPLD Xiao et al. 2020), corrective interventions (APPLI Wang et al. 2021a), evaluative feedback (APPLE Wang et al. 2021), and reinforcement learning (APPLR Xu et al. 2021). APPL introduces the parameter learning paradigm, where the learned policy does not directly issue end-to-end motion commands to move the robot (as shown in Fig. 4 left), but interfaces with a classical motion planner through its hyper-parameters (Fig. 4 right). The learned parameter policy adjusts the planner parameters on the fly to adapt to different navigation scenarios during runtime and outperforms planners with fixed parameters fine-tuned by human experts.

A similar work to learning adaptive planner parameters is learning adaptive motion primitives (Sood et al. 2020). Although not implemented on any physical mobile robots, it is shown that on a 3 degree-of-freedom motion planning problem for navigation using the Reeds-Shepp path, the learned adaptive motion primitives in conjunction with search algorithms lead to over 2x speedup in planning time.

The works described in Sects. 3.2 and 3.3 maintained the classical navigation pipeline and many existing components. The learning approaches proposed only aimed at the global planner, local planner, or individual components, such as improving world representation and fine-tuning planner parameters. The most important benefit of these methods is that they maintain the advantages of the classical approaches, e.g., safety, explainability, and reliability, and improve upon their weaknesses simultaneously.

3.4 Analysis

In this section, we reviewed learning approaches that replace (1) the entire navigation stack, (2) navigation subsystems, and (3) individual components within the navigation stack. The scope of learning of each work is illustrated at the top of Fig. 2, with their corresponding navigation components at the bottom. For end-to-end approaches, we reviewed works that are goal-oriented (first two boxes of Fig. 2) and works that


do not consider the navigation goal (third and fourth box). In contrast to end-to-end approaches, we discussed subsystem-level learning, i.e. global (fifth box) and local (sixth box) planning, and also learning individual components (seventh box).

Considering all of these works together, we make the following observations:

1. The majority of machine learning for navigation work focuses on end-to-end approaches, despite their lack of proven reliable applicability in real-world scenarios. In particular, 43 out of the 74 surveyed papers focus on end-to-end techniques as illustrated in detail in Figs. 2 and 3. The functional blocks within the classical navigation pipeline are obscured by the end-to-end learning approaches, which directly map from sensory input to motion. An important benefit of the end-to-end paradigm is that it is very straightforward and requires relatively little robotics engineering effort. Roboticists no longer need to hand-craft the navigation components and hand-tune their parameters to adapt to different platforms and use cases. Another appeal of end-to-end methods is that they have the potential to avoid the prospect of cascading errors, such as errors in the global representation being propagated to the local planning level. With end-to-end learning, every component within the entire navigation stack is connected and learned jointly; the cascading errors are eliminated and only one end-to-end error is minimized by data-driven approaches. However, end-to-end navigation approaches have many problems as well, in addition to the intrinsic problems of learning methods in general, such as high demand for training data, overfitting, and lack of explainability.

2. One third of the end-to-end learning approaches lack the ability to navigate to user-defined goals. These approaches instead operate in the context of the robot perpetually moving forward, and seek to enable behaviors such as lane keeping and obstacle avoidance. The proposed approaches are good at enabling these reactive behaviors, but they lack the interface to a pre-specified fixed global goal. The capability to navigate to a specific goal is very common in classical navigation approaches. The lack of such capability in many learning-based systems is possibly due to a drawback of learning methods: it is very difficult for a learned model to generalize to arbitrarily defined goals not seen in its training data. Of the two main inputs to navigation systems, perception and goal, the space of the former is usually easy to capture with training data, while the latter is difficult. For example, a robot may have learned how to avoid obstacles based on the current perception of the surroundings and navigate to a goal 1 m away from it, but it may find it hard to navigate to a goal 10 m away, 100 m away, or even 1000 m away. Not being able to see similar goal configurations during training makes it hard to generalize to arbitrarily specified goals. As pointed out by Pfeiffer et al. (2018), an end-to-end learning architecture cannot completely replace a map-based path planner in large complex environments.

3. All of the non-end-to-end approaches surveyed can navigate to user-defined goals. In response to the shortcomings of end-to-end learned systems, 31 out of the 74 surveyed papers used non-end-to-end approaches, and all of them can navigate to user-defined goals. For these approaches, designers can retain the desirable properties of existing navigation components while also studying the benefits of learning other components. These benefits include alleviating the difficulty of manually setting cost functions or planner parameters and also increasing the overall performance of the system.

4. Approaches that seek to learn subsystems have focused predominantly on local planning rather than global planning. Of the 13 subsystem approaches, 12 focus on local planning, while only one considers global planning. We posit that this lack of focus on global planning may be because it is easier to generalize to a close local goal or local segment from a global path than to an unseen, faraway global goal.

The analysis of the findings in this section leads us to propose a hybrid classical and learning navigation paradigm as a very promising direction for future research (Item 1 in Sect. 6.2).

4 Comparison to classical approaches

Whereas Sect. 3 above focused on reviewing where and how existing learning approaches fit into the context of the classical navigation pipeline, this section focuses on comparing the results of learning methods to those of classical approaches. Classical navigation systems seek to use current perceptual input to produce collision-free motion commands that will navigate the robot to a pre-specified goal. While there exist many learning-based navigation approaches that seek to solve the navigation problem in this classical sense, a portion of the literature also attempts to go beyond that problem and provide additional capabilities. Therefore, we divide the discussion of this section based on these two categories: learning for navigation in the classical sense (Sect. 4.1) and learning beyond classical navigation (Sect. 4.2). Note that the same papers reviewed in Sect. 3 are analyzed in this section again. The comparison categories and the corresponding literature are shown in Fig. 5.
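The classical problem statement above — current perceptual input plus a pre-specified goal mapped to collision-free motion commands — is the interface that every system compared below, learned or classical, must implement. As a purely illustrative rendering of that interface (the function and its parameters are hypothetical, not taken from any surveyed work), a minimal reactive policy over a planar LiDAR-style scan might look like:

```python
import math

def reactive_policy(ranges, angles, goal_dist, goal_bearing,
                    max_speed=0.5, clearance=0.6):
    """Toy (perception, goal) -> (linear v, angular w) mapping.

    `ranges`/`angles` describe a planar LiDAR scan in the robot frame;
    `goal_dist`/`goal_bearing` give the goal relative to the robot.
    Hypothetical stand-in for the shared navigation interface, not a
    reimplementation of any surveyed method.
    """
    # Attraction: turn toward the goal bearing.
    w = 1.5 * goal_bearing
    # Repulsion: steer away from the closest obstacle inside the clearance.
    i = min(range(len(ranges)), key=lambda k: ranges[k])
    if ranges[i] < clearance:
        w -= 2.0 * (clearance - ranges[i]) * math.copysign(1.0, angles[i] or 1.0)
    # Slow down near the goal and while turning hard.
    v = max(0.0, min(max_speed, 0.5 * goal_dist) * (1.0 - min(abs(w), 1.0)))
    return v, w
```

Classical planners fill this mapping with explicit costmaps and optimization, while the end-to-end methods of Sect. 3.1 learn it directly from data; the comparison in this section is over exactly this input-output behavior.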


4. Learning Navigation
  4.1 Classical Navigation
    4.1.1 Duplication with Standard Interfaces and Constraints: Tai et al. (2017); Codevilla et al. (2018); Gupta et al. (2017a, b); Pfeiffer et al. (2017, 2018); Zeng et al. (2019); Thrun (1995); Khan et al. (2017); Zhang et al. (2017a); Zhou et al. (2019); Lin et al. (2019); Tai et al. (2016a); Wang et al. (2018); Zhang et al. (2017b); Zhu et al. (2019); Xie et al. (2018); Sergeant et al. (2015); Zhao and Roh (2019); Zhelo et al. (2018)
    4.1.2 Duplication with Alternate Interfaces and Constraints: Zhu et al. (2017); Bruce et al. (2017); Gao et al. (2017); Faust et al. (2018); Becker-Ehmck et al. (2020); Loquercio et al. (2018); Bojarski et al. (2016); Zhang et al. (2016); LeCun et al. (2006); Ross et al. (2013); Giusti et al. (2015); Pomerleau (1989); Pan et al. (2020); Kahn et al. (2018a, b); Sadeghi and Levine (2016); Richter and Roy (2017); Sepulveda et al. (2018)
    4.1.3 Improvement: Teso-Fz-Betoño et al. (2019); Bhardwaj et al. (2019); Xiao et al. (2020); Xiao et al. (2021a); Sood et al. (2020); Chiang et al. (2019); Stein et al. (2018); Liu et al. (2020)
  4.2 Beyond Classical
    4.2.1 Terrain-Based: Kahn et al. (2020); Siva et al. (2019); Wigness et al. (2018); Xiao et al. (2021b)
    4.2.2 Social Navigation: Chen et al. (2017a); Pfeiffer et al. (2016); Tai et al. (2018); Kretzschmar et al. (2016); Kim and Pineau (2016); Okal and Arras (2016); Martins et al. (2019); Li et al. (2018); Yao et al. (2019); Jin et al. (2019); Everett et al. (2018); Long et al. (2018); Chen et al. (2017b, 2019); Ding et al. (2018); Godoy et al. (2018); Pérez-Higueras et al. (2018a, b); Shiarlis et al. (2017); Pokle et al. (2019); Liang et al. (2020); Henry et al. (2010); Luber et al. (2012); Johnson and Kuipers (2018)

Fig. 5 Comparison to classical approaches with section headings

4.1 Learning for navigation in the classical sense

The autonomous navigation problem that has been classically considered is that of perceiving the immediate environment and producing collision-free motion commands to move the robot toward a pre-specified goal. Although many learning-based approaches for navigation do not enable goal-oriented navigation (see Sect. 3), the key aspect of the classical navigation problem we focus on here is that it seeks to produce collision-free motion based on geometric perception, such as LiDAR range readings or depth images, and a relatively accurate motion model. In this section, we discuss learning methods that seek to enable this same behavior. Broadly speaking, we have found that the machine learning literature in this area can be divided into three categories: (1) machine learning approaches that merely aim to duplicate the input-output behavior of classical approaches with standard interfaces and constraints (Sect. 4.1.1); (2) machine learning approaches that aim to provide the same capability as classical approaches but can leverage alternate interfaces and constraints (e.g., sensor data, assumptions, and/or computational constraints) compared to standard approaches (Sect. 4.1.2); and (3) machine learning approaches that explicitly aim to provide navigation behaviors that can outperform classical approaches (Sect. 4.1.3).

4.1.1 Duplication with standard interfaces and constraints

An initial goal for machine learning methods for motion planning and control in mobile robot navigation in the literature has been that of duplicating the input-output behavior of classical navigation approaches with standard interfaces and constraints. Because achieving this goal would result in a system with behavior indistinguishable from that of an existing classical approach, the literature in this area is certainly of more interest to the machine learning community than to the robotics community. However, duplicating the results of decades of engineering effort by simply using learning-based approaches—especially in an end-to-end fashion—can be a strong testimony to the power of machine learning. Thus, the work belonging to this category has mostly focused on the typical concerns of the machine learning community rather than those of the robotics community. In fact, much of the literature does not directly compare the resulting navigation performance to that achievable with classical approaches. Instead, approaches in this category are typically compared to another baseline learning approach. Therefore, we divide work in this section into two categories: (1) initial, first-step machine learning approaches for duplicating mobile navigation, and (2) later improvements upon those initial successes, where the improvements are with respect to concerns in the learning community (e.g., amount of training data).

Initial successes As first steps towards duplicating classical navigation systems, several machine learning approaches have been proposed for various versions of the navigation problem. For example, both Thrun (1995) and Zhu et al. (2019) proposed RL techniques (using Q-learning and DDPG, respectively) that aim to induce simple target pursuit without taking obstacle avoidance into account. Unfortunately, neither of these approaches compared with
classical approaches. Sergeant et al. (2015) instead considered the problem of only obstacle avoidance, without seeking to achieve a specific navigation goal. While they compared the performance of various versions of their proposed method, they also did not compare their system with any classical navigation system. Seeking to enable moving-goal navigation with obstacle avoidance, Tai et al. (2016) proposed a technique that they claimed was the first end-to-end, model-free navigation approach that used depth images instead of LiDAR. Again, however, they only compared the navigation results to a human demonstration using supervised classification error, and did not compare navigation performance to any classical approaches. Pfeiffer et al. (2017) and Zhou et al. (2019) proposed similar systems in that they are end-to-end learning approaches, but they sought to enable collision-free fixed-goal navigation using LiDAR input. They did compare their approaches to classical navigation systems: Pfeiffer et al. (2017) showed that their approach could achieve similar performance, while the approach by Zhou et al. (2019) was always outperformed by classical navigation using SLAM. For multiagent collision avoidance, Zhao and Roh (2019) proposed a decentralized DRL-based approach. While they showed experimental success in a simulation environment, no comparison to any classical technique was made.

Later improvements Building upon the initial successes of applying relatively straightforward machine learning approaches to navigation discussed above, researchers have since looked into improving these various approaches in ways of particular interest to the machine learning community. For example, building upon RL, like the approaches by Thrun (1995) and Zhu et al. (2019) above, Zhang et al. (2017a) used successor features and transfer learning in an attempt to determine whether or not CNN-based systems could indeed duplicate the performance of classical planning systems. They showed that, with increased training epochs (8 h of real experience), the task reward achieved by their approach outperformed DQN and rivaled that of classical A*. However, RL reward is possibly not an appropriate metric to evaluate navigation performance. At least in the video demonstration, the performance of RL with successor features still looked inferior to classical approaches. Zeng et al. (2019) added reward shaping, memory GRU, and curriculum learning to A3C, but only compared to other rudimentary A3C versions, not classical approaches. The improvement in terms of success rate compared to A3C was purely along the learning dimension: where it stood with respect to classical approaches is unclear. The results of the curiosity-driven DRL approach by Zhelo et al. (2018) showed that their DRL with intrinsic motivation from curiosity learned navigation policies more effectively and had better generalization capabilities in previously unseen environments, when compared with existing DRL algorithms, such as vanilla A3C. These policies were not compared with any classical approaches. Partially respecting the global versus local paradigm in the classical navigation pipeline, modular RL developed by Wang et al. (2018), in comparison to vanilla end-to-end RL, can achieve better results than the classical Potential Field Method (PFM). However, PFM is far inferior to most state-of-the-art classical navigation approaches, such as those with SLAM, as acknowledged by the authors.

To address memory in navigation, Neural SLAM by Zhang et al. (2017b) outperformed vanilla A3C stacked with two Long Short Term Memory networks (LSTMs), but was not compared with classical SLAM approaches. The Gazebo simulation test looked more or less like a toy example for proof-of-concept. MACN by Khan et al. (2018) used A* as an upper bound and compared the ratio of exploration path length of different learning methods against that of A*. This ratio was never smaller than one, meaning the performance was never better than the classical approach.

Another learning-specific way in which the community has sought to improve these approaches is by seeking better and faster convergence, especially for sample-inefficient RL approaches. One option that has been explored is to use IL to initialize the RL policy. Following Pfeiffer et al. (2017), the case study presented by Pfeiffer et al. (2018) compared the overall system performance for various combinations of expert demonstrations (IL) with RL, different reward functions, and different RL algorithms. They showed that leveraging prior expert demonstrations for pre-training can reduce the training time to reach at least the same level of performance compared to plain RL by a factor of five. Moreover, while they did not compare to classical approaches, they did admit that they do not recommend replacing classical global planning approaches with machine-learning based systems. Xie et al. (2018) used a PID controller, instead of IL, as a “training wheel” to “kick-start” the training of RL. They showed that not only does this technique train faster, it is also less sensitive to the structure of the DRL network and consistently outperforms a standard DDPG network. They also showed that their learned policy is comparable with ROS move_base (OSRF 2018), a state-of-the-art classical navigation stack.

Based on the initial success achieved by IL, such as the work by Pfeiffer et al. (2017) and Zhou et al. (2019), Lin et al. (2019) improved upon IL using RL with full access to state information from a MoCap system. Similar performance to MPC approaches could be achieved by combining IL with RL, and possibly with lower thrust, showing potential to outperform classical approaches.

4.1.2 Duplication with alternate interfaces and constraints

Our second categorization of work that uses machine learning in order to solve the classical navigation problem focuses on
techniques that use machine learning in order to leverage—or, in some cases, better leverage—alternate interfaces and constraints. For example, while vision-based navigation may be possible using classical methods and expensive or brittle computer vision modules, machine learning methods for vision-based navigation can sidestep these issues by not relying on such specific modules. More broadly speaking, we have identified three major ways in which learning methods have allowed navigation systems to leverage alternate interfaces and constraints: (1) using visual input, (2) allowing for complex dynamics, and (3) allowing execution on low-cost hardware.

Visual input The ability to effectively use visual input is one of the most important reasons that researchers have sought to use machine learning for motion planning and control in mobile navigation. Although there exist modular approaches for integrating visual information with classical methods (e.g., optical flow, visual odometry, visual SLAM, etc.), these methods usually depend on human-engineered features, incur a high computational overhead, and/or are brittle in the face of environmental variations. Machine learning methods, on the other hand, provide an alternate pathway by which system designers can incorporate visual information.

Among the first attempts to use machine learning for vision-based navigation, ALVINN (Pomerleau 1989) and DARPA’s LAGR program (LeCunn et al. 2006) proposed end-to-end machine learning approaches to map images directly to motion commands. More recently, NVIDIA’s DAVE-2 system (Bojarski et al. 2016) represents the culmination of these approaches, leveraging deep learning, large amounts of data, and access to a vast amount of computing power for training. These end-to-end approaches circumvent typical, complicated visual processing pipelines in classical navigation, and have been empirically shown to enable moving-goal navigation capabilities like lane keeping and obstacle avoidance. Codevilla et al. (2018) proposed a similar approach, but their system allows for fixed-goal navigation using conditional IL. Unfortunately, it is difficult to assess just how much better these approaches are able to leverage visual information for navigation over their classical counterparts because no such experimental comparison has yet been performed.

The diverse and informative visual features in indoor environments have led to a large body of literature that studies using machine learning to incorporate visual information for the particular problem of indoor navigation. This line of research has mostly existed within the computer vision community, and it has focused more on visual perception as opposed to motion planning and control. For example, Gupta et al. (2017a) proposed to bypass visual SLAM for indoor navigation by incorporating a latent representation within a VIN and trained the system end-to-end. In their later work (Gupta et al. 2017b), they incorporated noisy but very simple motion controls, i.e., moving forward failed with 20% probability; the agent stays in place when the action fails. Sepulveda et al. (2018) used IL to learn indoor visual navigation with RRT* (Karaman and Frazzoli 2011) as expert and utilized indoor semantic structures. Zhu et al. (2017) used RL for visual indoor navigation, where—in addition to bypassing classical visual input processing—they even allowed for the goal to be specified as an image which the robot is expected to see when it reaches the goal. One-shot RL with interactive replay for visual indoor navigation was proposed by Bruce et al. (2017), in which the training environment was built using a single traversal with a 360° camera and only the same goal used for training can be reached during deployment. Again, these works did not compare to classical approaches.

Moving from computer vision towards the robotics community, Intention-Net (Gao et al. 2017) kept the classical global and local navigation hierarchical architecture, and served to replace a local planner. It processed visual inputs using a neural network and computed navigation actions without explicit reasoning. An interesting point is that they also represented the local goal as a path image generated from the global planner. Alternatively, Kahn et al. (2018b) proposed a technique based on generalized computation graphs through which a robot could learn to drive in a hallway with vision from scratch using a moving-goal approach. They compared their approach with many learning methods, but not to any classical approaches. Kahn et al. (2018a) further collected event cues during off-policy exploration and then allowed different navigation tasks without re-training during deployment, with an onboard RGB camera. Again, they did not compare to any classical approaches. Richter and Roy (2017) used rich contextual information from visual cameras to predict collision and an autoencoder to decide if the prediction is trustworthy. In case of novel visual perception, they revert back to a safe prior estimate.

Finally, in the aerial domain, mapping directly from vision to action without any classical visual processing has also been investigated. Ross et al. (2013) used DAgger to map vision directly to UAV steering commands in order to avoid trees, without any other visual processing. Similar end-to-end visual navigation was performed by Giusti et al. (2015) using supervised data from a hiker as a classification problem to fly on forest trails. Loquercio et al. (2018) obtained their flying training set from driving data and trained DroNet. Sadeghi and Levine (2017) removed the necessity of supervised action labels for vision and used RL from simulated visual input. They then directly applied the policy to the real world. Again, however, none of these approaches were evaluated against their more classical counterparts.

Dynamics modeling Just as machine learning has been used as a promising new pathway by which to exploit visual information for navigation, researchers have also studied using
machine learning methods for navigation as an alternative way to handle complex dynamics. Classical dynamics modeling approaches can be very difficult to implement for reasons such as sensing and actuation uncertainty, environmental stochasticity, and partial observability. Learning methods, on the other hand, can bypass these difficulties by directly leveraging data from the platform.

One such approach is that proposed by Zhang et al. (2016), who addressed the problem of partial observability of UAV obstacle avoidance by training a system with access to full state information and deployment with only partial observations. Faust et al. (2018) used an RL agent to bypass the modeling of local MPC with non-trivial dynamics, i.e., differential drive ground robot and aerial cargo delivery with load displacement constraints. Becker-Ehmck et al. (2020) eliminated the entire modeling of flight dynamics, and used RL to “learn to fly” without any obstacles. The only comparison to classical approaches was done by Faust et al. (2018), but it was to a trivial straight-line local planner, instead of a better alternative.

Low-cost hardware As a final way in which the autonomous navigation research community has used machine learning approaches to leverage alternate interfaces and constraints, some approaches have sought to use learned systems as a way to relax the burdensome hardware constraints sometimes imposed by more classical systems. For example, Tai et al. (2017) trained a map-less navigator using only 10 laser beams from the LiDAR input for the purpose of enabling navigation using very low-cost sensing hardware (e.g., sonar or ultrasonic arrays instead of high-resolution LiDAR sensors). Additionally, Pan et al. (2020) used supervised training data obtained by a sophisticated MPC with high-precision sensors (IMU and GPS), and then trained an off-road vehicle with low-cost sensors (camera and wheel speed sensor) to perform agile maneuvers on a dirt track. They showed that they could achieve similar performance to the MPC expert even though the platform was less well-equipped.

4.1.3 Improvement

In this section, we discuss machine learning approaches for autonomous navigation that have explicitly sought to improve over classical approaches.

One set of approaches focuses on automatically tuning the parameters of classical navigation systems. Traditionally, these classical systems are tuned by hand in order to achieve good performance in a particular environment, and the tuning process must be done by a human expert who is intimately familiar with the inner workings of the particular planning components in use. Unfortunately, this process can prove to be tedious and time consuming—the tuning strategy often reduces to one of trial and error. However, finding the correct parameterization can often result in drastically-improved performance. Therefore, one of the main thrusts of machine learning techniques that seek to provide improved navigation performance is toward techniques that can automatically find good parameterizations of classical systems. IL is a typical learning paradigm used here, as demonstrations—even from non-experts—have been shown to provide a good learning signal for parameter tuning. Bhardwaj et al. (2020) used demonstrations and a differentiable extension to a differentiable global planner (GPMP2) to find the appropriate obstacle covariance parameter for their planner, while Teso-Fz-Betoño et al. (2019) used ANFIS and backpropagation to find the right set of optimization weights for DWA based on demonstration. While we expect that the parameters found through the learning procedure would achieve better navigation performance than random or default parameters, no experiments were actually performed on a real robot. APPL by Xiao et al. (2021d) took this paradigm a step further and demonstrated the possibility of dynamically changing parameters during deployment to adapt to the current environment. APPL parameter policies were learned from different human interaction modalities and achieved improved navigation performance compared to classical approaches under one single set of parameters, either default or manually tuned, on two robot platforms running different local planners. Instead of learning adaptive planner parameters to improve navigation performance, learning adaptive motion primitives (Sood et al. 2020) for use in conjunction with classical planners has effectively reduced computation time.

Another machine learning approach that explicitly seeks to improve navigation performance over classical approaches is PRM-AutoRL by Chiang et al. (2019). AutoRL automatically searches for an appropriate RL reward and neural architecture. While the authors’ previous PRM-RL work (Faust et al. 2018) suffered from inferior results compared to a classical system (PRM-DWA), it was claimed that after fine-tuning PRM-RL, PRM-AutoRL was able to outperform the classical approach. However, no similar effort was made to fine tune the classical approach. The LfH framework by Xiao et al. (2021c) also aimed at improving the agility of classical navigation, especially in highly-constrained spaces where sampling-based techniques are brittle. Superior navigational performance is achieved compared to the default DWA planner, and even to fine-tuned DWA by APPL (Xiao et al. 2021d) and to BC (Pfeiffer et al. 2017), both from human demonstration. Lifelong Navigation by Liu et al. (2021) claimed it is unnecessary to learn the entire local planner, and instead learned only a complementary planner that takes control when the classical approach fails. They also addressed the catastrophic forgetting problem for the neural network based planner. Default DWA used in conjunction with the learned controller performs better than both classical and learning baselines.
Lastly, for the particular task of navigating in structured but unknown environments, Stein et al. (2018) proposed a machine learning approach for determining path costs on the basis of structural features (e.g., hallways usually lead to goals, rooms are likely to result in a dead end, etc.). While defining costs associated with a particular subgoal (a boundary point between known and unknown) is very difficult to do manually, the authors showed that it was very tractable to do so using past experience. The resulting Learned Subgoal Planner (Stein et al. 2018) was shown to be able to outperform a classical optimistic planner.

4.2 Learning beyond classical navigation

While most classical approaches are already good at goal-oriented, geometric-based, collision-free navigation, more intelligent mobile robots require more complex navigational behaviors. While there have been several attempts made to formulate solutions to these problems using the framework of the classical navigation system over the years, the additional complexity of these problems has recently motivated the research community to consider learning-based solutions as well. The particular complex behaviors that the learning community has focused on are (1) non-geometric terrain-based navigation (Sect. 4.2.1), and (2) interactive social navigation in the presence of other agents, either robots or humans (Sect. 4.2.2).

4.2.1 Terrain-based navigation

One capability beyond classical geometric-based navigation is terrain-based navigation, i.e., the ability to model navigation behavior on the basis of the physical properties of the ground with which the robot interacts. Unlike geometric information, terrain-related navigation costs are hard to hand-craft in advance, but recent research in machine learning for navigation has shown that it may be easier to learn such behaviors using human demonstration and experience interacting with the environment.

For example, Siva et al. (2019) used a human demonstration of navigation on different types of outdoor unstructured terrain, and then combined representation learning and IL to generate appropriate motion on specific terrain. Using IRL, Wigness et al. (2018) learned cost functions to enable outdoor human-like navigational behaviors, such as navigating on a road but staying close to grass, or keeping out of more visible and open areas. Another approach for terrain-based navigation leveraged real interactions with the physical world to discover non-geometric features for navigation (Kahn et al. 2021): traversing over tall grass has very low cost, while the cost of going through uneven terrain is relatively high. The learned inverse kinodynamics model conditioned on inertia embeddings by Xiao et al. (2021a) was able to capture unknown world states and therefore enable accurate, high-speed, off-road navigation.

4.2.2 Social navigation

Going beyond classical single-robot navigation in a relatively static world, several learning methods have sought to enable autonomous social navigation, i.e., navigation in the presence of other agents, either robots or humans. Using classical methods in a dynamic world with other agents is difficult because (1) cost function definitions are usually elusive, e.g., due to different social norms, and (2) such navigation may also include compounding sequential interactions between the agents which are hard to anticipate in advance, i.e., the robot’s navigation decision at one time instant will often influence the navigation behavior of another agent at the next instant, which may affect the robot’s next navigation decision, and so on. As a means by which to address these difficulties, the research community has proposed learning-based solutions.

Learning cost functions The difficulties encountered in specifying appropriate static cost functions for social navigation scenarios have motivated the research community to look at using learning-based methods to find these cost functions instead. For example, in the presence of humans, traditional strategies for defining a costmap (e.g., predefined costs to induce obstacle avoidance) become clumsy because humans are not typical obstacles—robots should be allowed to come closer to humans in crowded spaces or when the robot and a human are both moving in the same direction.

While it may be difficult to manually define a cost function for social scenarios, experiments have shown that it is relatively easy for humans to demonstrate the desired navigation behavior. Therefore, as a first approach, a few research teams have proposed approaches that have adopted IL as a strategy for learning social navigation behaviors. Tai et al. (2018) used Generative Adversarial Imitation Learning (GAIL) to sidestep the problem of learning an explicit cost function, and instead directly learn a navigation policy that imitates a human performing social navigation. Compared to a baseline policy learned using a simpler IL approach, their system exhibited safer and more efficient behavior among pedestrians. Yao et al. (2019) also proposed a similar approach that bypassed cost representation through adversarial imitation learning by using Group-Navi GAN to learn how to assign local waypoints for a local planner.

Researchers have also investigated methods that seek to learn more explicit cost representations using IRL. The most straightforward application of IRL to the problem of social navigation was the approach proposed by Kim and Pineau (2016) that learns a local cost function that respects social variables over features extracted from an RGB-D sensor. Additionally, Henry et al. (2010) used IRL to learn global cost
function representation. Okal and Arras (2016) developed Trajectory Bayesian IRL to learn cost functions in social contexts as well. The work by Pérez-Higueras et al. (2018b) and Shiarlis et al. (2017) also taught the robot a cost function with IRL and then used RRT to plan.

Although not directly using IRL, Luber et al. (2012) learned a costmap from human motion prototypes extracted from publicly available surveillance data. Johnson et al. (2018) formed social norm cost in addition to obstacle cost from exploration data collected by a classical planner. The predicted path learned from demonstrations by Pérez-Higueras et al. (2018a) can also serve as a costmap and partially bias the configuration space sampling of the RRT planner. Machine learning approaches for social navigation that use learned costs more implicitly include the maximum entropy probability distribution approach from Pfeiffer et al. (2016), which was used to model agents’ trajectories for planning; and the approach from Kretzschmar et al. (2016), which sought to infer the parameters of the navigation model that best matched observed behavior. Lastly, ClusterNav (Martins et al. 2019) clustered human demonstrations to generate a pose graph, which was used to define favorable social behaviors for navigation. All the vertices on this graph were deemed to have acceptable cost in the social context. Navigation behaviors were generated using A* to search on this graph to find a path with minimum distance.

Learning controllers Another challenge that arises when trying to use classical navigation approaches for social navigation is the inherent difficulty in modeling the compounding effects of sequential interactions between agents. Most classical approaches are designed only to operate within a static world, and, even when they are contrived to handle dynamic situations, it becomes extremely difficult to encode additional rules about how current actions will affect other agents, and how the decisions made by those agents might impact future decisions. As an alternative approach, RL algorithms have been shown to provide a suitable machine learning framework that can capture the essence of such sequential social interactions.

For example, Chen et al. (2017b) proposed CADRL, a deep-learning approach for learning controllers capable of navigating multiple robots to their goals in a collision-free manner. Later, they proposed a variant of the CADRL framework (Chen et al. 2017a) which uses RL with a hand-crafted reward function to find navigation systems that incorporate social norms such as left- and right-handed passing. Everett et al. (2018) augmented this framework with an LSTM, which also allowed an RL approach to capture sequential interaction. The socially concomitant navigation (SCN) problem considered by Li et al. (2018) additionally required the robot to stay close to a companion while travelling together towards a certain goal in a social context, which increased the complexity of the sequential interactions between agents. Role-playing learning (Li et al. 2018) using RL also aimed at resolving this additional difficulty.

Utilizing RL-based approaches to capture compounding sequential interactions, researchers have shown that learning methods can achieve better performance than multi-robot navigation approaches that do not (or cannot) consider such interactions, e.g., ORCA (Van Den Berg et al. 2011), which is a state-of-the-art multi-robot collision avoidance strategy that doesn’t pay explicit attention to social awareness. For example, in simulation, Godoy et al. (2018) used RL to generate a preferred velocity that was then passed to ORCA to produce a collision-free velocity. Also in simulation, Chen et al. (2017b) (CADRL) exhibited a 26% improvement in time to reach the goal compared to ORCA. Long et al. (2018) also used DRL with a hand-crafted reward and curriculum learning for decentralized multi-robot collision avoidance, and—compared to non-holonomic ORCA—the approach achieved significant improvement in terms of success rate, average extra time, and travel speed, with up to 100 robots. Liang et al. (2020) further showed that their collision avoidance policy, using multi-sensor fusion and DRL, can achieve better performance than ROS move_base (OSRF 2018) and the work by Long et al. (2018) in social navigation. Before planning motion, Chen et al. (2019) used RL to model both human–robot and human–human interactions and their relative importance, and they showed that the resulting system outperformed ORCA, CADRL, and GA3C-CADRL. Finally, a hierarchical RL framework with an HMM to arbitrate between target pursuit and a learned collision avoidance policy (Ding et al. 2018) also achieved better results than ORCA and the work by Long et al. (2018), indicating that unwrapping the end-to-end learning black box based on human heuristics may improve performance. In contrast to the end-to-end perspective, the local trajectory planner and velocity controller learned by Pokle et al. (2019) use attention to balance goal pursuit, obstacle avoidance, and social context, and can achieve more consistent performance compared to ROS move_base (OSRF 2018).

4.3 Analysis
an LSTM (Hochreiter and Schmidhuber 1997) and GPU Having organized the machine learning for autonomous nav-
(GA3C-CADRL), both of which allow an RL algorithm to igation literature according to similarities and differences
successfully learn navigation policies that can handle the with classical autonomous navigation approaches, we now
complex sequential interaction between the robot and other provide a summary analysis. Most importantly, we find that,
pedestrians. Similarly, the ego-safety and social-safety com- while a large concentration of learning-based work is able to
ponents of the reward function designed by Jin et al. (2020) solve the classical navigation problem, very few approaches


actually improve upon classical techniques: currently, for navigation as a robotics task, learning methods have been used mainly to replicate what is already achievable by classical approaches; and for learning, navigation is mostly a good testbed to showcase and to improve learning algorithms. Additionally, a number of learning-based techniques have been proposed that have enabled relatively new navigation capabilities such as terrain-based and social navigation, which have proven difficult to achieve with classical techniques.

Specifically, as illustrated in Fig. 5:

1. While the majority of papers surveyed sought to solve the classical navigation problem, very few actually demonstrate improved performance over classical approaches. 46 of the 74 papers surveyed dealt with the classical problem, and only eight of those showed improved performance over classical solutions. The other 38 only achieved navigation in relatively simple environments (e.g., sparse obstacles, uniform topology, same training and deployment environment) and mostly did not compare with classical approaches. One explanation is that the research community initially focused on answering the question of whether or not navigation was even possible with learning approaches, and focused on the unique benefits of such approaches in that extensive human engineering (e.g., filtering, designing, modeling, tuning, etc.) is not required or that alternate interfaces and constraints can be leveraged. Very few proposed learning approaches have been compared to classical solutions, which leads us to advocate for such performance comparison to be done in future work, along with comparisons along the other dimensions, such as training versus engineering effort and neural architecture search versus parameter tuning (item 2 in Sect. 6.2).
2. Of the papers surveyed that sought to enable navigation behavior beyond that of the classical formulation, the majority focus on social navigation and only a few on terrain-based navigation. In particular, 28 out of the 74 papers surveyed sought to go beyond classical navigation, and 24 of those focused on social navigation. The other four sought to enable terrain-based navigation. Social navigation may be the predominant focus due to the relative ease with which the problem can be studied, i.e., humans and pedestrian environments are much more easily accessible to researchers than environments with challenging terrain.

5 Other taxonomies

In this section, we review the literature covered above in the context of two additional taxonomies: (1) the specific navigation task that is learned, and (2) the input modalities to the learning methods. Rather than describing each paper as in the previous sections, we will only define the categories below, and visually indicate how the papers fall into the categories via Figs. 6 and 7. Note that Sects. 3 and 4 discussed some of the literature according to these taxonomies, but only as a way to group a subset of reviewed papers in their primary categories. Here, however, we categorize each of the reviewed papers using these two taxonomies. Note that Figs. 6 and 7 only include the 74 papers selected within the specific scope described in Sect. 1 (i.e., machine learning approaches for motion planning and control that actually move mobile robots in their environments), and are not exhaustive in terms of using machine learning in their particular categories in general.

5.1 Navigational tasks

We have identified six different navigational tasks considered by the literature on machine learning for motion planning and control in mobile robot navigation. They are (1) Waypoint Navigation, (2) Obstacle Avoidance, (3) Waypoint Navigation + Obstacle Avoidance, (4) Stylistic Navigation, (5) Multi-Robot Navigation, and (6) Exploration. All the reviewed papers and their task categories are summarized in Fig. 6.

[Fig. 6 Navigational tasks: a table assigning each of the 74 surveyed papers to one of six task columns (Waypoint Navigation; Obstacle Avoidance; Waypoint Navigation + Obstacle Avoidance; Stylistic Navigation; Multi-Robot Navigation; Exploration)]

5.1.1 Waypoint navigation

The task of waypoint navigation is a fundamental building block for mobile navigation. The task definition is to move the robot to a specified goal, without considering any other constraints, e.g., obstacle avoidance. Works in this category are few, and are mostly useful as a "proof-of-concept" for learning methods, since real-world requirements for navigation typically require taking into account additional constraints.

5.1.2 Obstacle avoidance

Another important building block of navigation is the task of obstacle avoidance, which is critical to the issue of safety in mobile robotics. The works we've placed into this category consider only obstacle avoidance, and do not navigate to a pre-specified goal. Note that all Moving-Goal Navigation approaches in Sect. 3.1.2 fall into this category, and here we apply this categorization across all the papers we have surveyed. The specific task of hallway navigation is a representative example of obstacle avoidance.

5.1.3 Waypoint navigation + obstacle avoidance

Waypoint navigation with obstacle avoidance is the task in our taxonomy that most closely matches that of classical navigation: the task requires the robot to navigate to a specified goal while avoiding collisions with obstacles along the way. The majority of learning-based motion planning and control approaches for mobile robot navigation belong here. Note that the majority of works in Fixed-Goal Navigation in Sect. 3.1.1 belong to the Waypoint Navigation + Obstacle Avoidance category in this taxonomy. Again, here we apply this categorization across all the papers.

5.1.4 Stylistic navigation

Stylistic navigation is a more complex and varied task compared to those above; it refers to generating any number of navigation behaviors that systematically differ from minimum-cost, collision-free navigation, and that are often defined with respect to features in the environment beyond geometric obstacles. Examples of stylistic navigation include staying to one side of the hallway as in social navigation, and moving slower or faster depending on the type of terrain being traversed. Stylistic navigation tasks are not typically easily addressed by classical approaches, and because of the elusive nature of explicitly defining such tasks using classical static cost representations, they have mostly been enabled using IL and RL methods.

5.1.5 Multi-robot navigation

The multi-robot navigation task requires an agent to explicitly take into account the presence of other navigating agents when deciding its own course of action. Classical approaches to multi-robot navigation typically require hand-crafted, heuristic behaviors, which quickly become insufficient as more and more agents need to be considered. Machine learning approaches to this task, on the other hand, are able to utilize experiential data to find successful policies.

5.1.6 Exploration

The last task we consider for autonomously navigating mobile robots is that of pure exploration, i.e., when the navigating agent's goal is to maximize coverage of an environment for the purposes of, e.g., mapping or surveillance. For this task, machine learning techniques have been used to perform mapping (by, e.g., memorizing an explored space), or to help form predictions over unknown space based on previous training experience.

5.2 Input modalities

Finally, we also organize the literature reviewed in this survey into categories that consider the system's sensor input modality. In this regard, each of the reviewed learning approaches uses one of the following four classes of input: (1) Geometric Sensors, (2) RGB Cameras, (3) RGB + Geometry, and (4) Exteroception. The papers grouped by their input modalities can be found in Fig. 7.

5.2.1 Geometric sensors

Classically, the autonomous navigation problem has been posed in a purely geometric sense, i.e., the task is to move


through a geometric workspace to a goal located at a set of specified coordinates while avoiding obstacles of different sizes and shapes along the way. In this sense, the only geometric information needed is the location of generic obstacles within the environment. Therefore, sensors that can directly provide this information are widely used to provide perception for many navigation systems, e.g., LiDAR sensors, depth sensors, sonars, etc. This input modality categorization contains the largest number of learning-based approaches in our survey.

[Fig. 7 Input modalities: a table assigning each surveyed paper to one of four input classes (Geometric Sensor; RGB Camera; RGB + Geometry; Exteroception)]

5.2.2 RGB cameras

RGB images lack depth information, and therefore most classical navigation approaches required at least stereo vision for depth triangulation so as to derive geometric information. Learning methods have relaxed this requirement; most learning methods require only monocular vision to compute motion for navigation. Although precise geometric information is not possible with only a single camera, reasonable motion, e.g., lane keeping, turning left when facing a tree, etc., can be effectively generated. RGB cameras are low-cost, are human-friendly, and can provide rich semantic information, which has drawn the attention of the learning community.

5.2.3 RGB + geometry

The third categorization in this section includes works that combine the advantages of both RGB and geometric perception. Geometric sensors are typically used to enable obstacle avoidance, while visual cameras provide semantic information.

5.2.4 Exteroception

Work in this final category, i.e., methods that utilize input from external sensors, is typically only used in controlled environments such as a MoCap studio or a simulator. These methods assume perfect sensing and perception, and focus instead on what happens next, such as learning a model or policy, finding the right parameterization, etc.

The taxonomies proposed in this section are intended to give readers an overview of the navigational tasks for which researchers have used machine learning, and of the sensor modalities that the proposed learning approaches utilize as input. For the navigational task, although goal-oriented navigation with obstacle avoidance still comprises the largest focus, learning methods have drawn attention to interactive navigation in terms of navigation style and multi-robot navigation. Learning methods have also given rise to many navigation systems that can utilize visual—as opposed to purely geometric—input sensors, which represents a significant departure from the classical literature.

6 Analysis and future directions

We now present an analysis of the literature surveyed above. We will discuss our findings based not only on each individual organization proposed in the previous sections, but also on a perspective that unifies the proposed taxonomies. Then we provide suggestions for future research directions.
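The input modality categories above can be made concrete with a minimal sketch of how a policy in the RGB + Geometry class (Sect. 5.2.3) might fuse its two sensing streams. All names, feature sizes, and the linear policy here are our own illustrative assumptions, not an implementation of any surveyed system; in a real system, a trained network would replace both the hand-coded encoders and the weight matrix.

```python
import numpy as np

def encode_lidar(scan):
    """Compress a 1-D range scan into 16 geometric features (min range per sector)."""
    return np.array([sector.min() for sector in np.array_split(scan, 16)])

def encode_rgb(image):
    """Stand-in for a learned visual encoder: mean intensity of left/right halves."""
    w = image.shape[1]
    gray = image.mean(axis=2)
    return np.array([gray[:, : w // 2].mean(), gray[:, w // 2 :].mean()])

def policy(scan, image, weights):
    """Fuse both modalities and map them to a (linear, angular) velocity command."""
    features = np.concatenate([encode_lidar(scan), encode_rgb(image)])
    return weights @ features  # a linear map stands in for a trained network

rng = np.random.default_rng(0)
scan = rng.uniform(0.5, 10.0, size=360)            # simulated 360-beam LiDAR scan
image = rng.uniform(0.0, 255.0, size=(48, 64, 3))  # simulated RGB frame
weights = rng.normal(size=(2, 18))                 # 16 geometric + 2 visual features
v, omega = policy(scan, image, weights)
```

The division of labor mirrors the one described in Sect. 5.2.3: the geometric stream carries the obstacle information, while the visual stream supplies coarse semantic context.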


6.1 Analysis

We first recap the statistics in Sects. 3 and 4, and then provide a cross-dimensional analysis between these two sections.

6.1.1 Recap

Figure 3 provides an overview of the scope of machine learning approaches for navigation with respect to the classical navigation pipeline. We note that 43 out of the 74 surveyed papers are end-to-end approaches, and 15 of those lack the ability to navigate to user-defined goals. This inability to navigate to defined goals is not common in the classical navigation literature. The remaining 31 papers describe approaches that are not end-to-end, and each of these can navigate to user-defined goals: 13 applied machine learning to a particular navigation subsystem—one for global planning and 12 for local planning—and the other 18 applied machine learning to individual components of a classical navigation system—14 for representation and four for parameters.

Analyzing the literature from a different perspective, Fig. 5 provides a comparison between learning-based and classical approaches to the autonomous navigation problem with respect to the particular problems and goals that are considered. First, we find that 46 out of the 74 surveyed learning-based approaches consider the classical navigation problem, but a majority of those (38/46) resulted in navigation systems that were only tested in relatively simple environments and that did not outperform—or, often, were not even compared with—classical approaches. That said, a small number of these approaches (8/46) were compared to classical approaches and demonstrated some improvement over them. Second, the remaining 28 out of the 74 surveyed learning-based papers have proposed techniques for which the goal is to provide capabilities that go beyond that envisioned by classical autonomous navigation. Of these 28 papers, 24 sought to enable some form of social navigation, and the remaining four focused on building systems that could perform terrain-based navigation.

6.1.2 Cross-dimensional analysis

Using the categorizations discussed in Sects. 3 and 4 (i.e., scope of learning and comparison to classical approaches, respectively), we now analyze the relationship between them via the cross-dimensional view we provide in Fig. 8.

First, consider the upper portion of Fig. 8, which focuses on the machine learning literature that has sought to extend the capabilities of navigation systems beyond what was classically envisioned. In particular, one can see that, while many learning-based approaches have been proposed for social navigation, the scope of these approaches has either been very broad (we classify 11 works as end-to-end) or very narrow (an additional 11 works are classified as component-level, and each of them focuses specifically on learning cost function representations). In fact, only two approaches that consider social navigation have been proposed at the intermediate subsystem level: Yao et al. (2019) for global planning and Pokle et al. (2019) for local planning. The upper portion of the figure also shows that relatively few learning-based approaches have been proposed to enable terrain-based navigation, though they are distributed across our end-to-end (Siva et al. 2019), subsystem (Kahn et al. 2021; Xiao et al. 2021a), and component (Wigness et al. 2018) categories.

Analyzing the lower portion of Fig. 8 also yields some interesting insights. In particular, one can observe a correlation between the two axes, i.e., approaches that look to apply machine learning to a more limited scope of the classical navigation problem seem to, roughly speaking, yield better results. On this point, several more-specific observations can be made. For one, it is interesting to note that all end-to-end approaches fall in either the Duplication with Standard or Alternative Interfaces and Constraints categories, with no single end-to-end approach having actually demonstrated improvement over classical navigation systems. Subsystem-level approaches, on the other hand, populate each of the categories defined relative to classical navigation systems, with three works in particular (Chiang et al. 2019; Liu et al. 2021; Xiao et al. 2021c) managing to demonstrate improved performance. In line with the observed correlation stated earlier, learning-based approaches that focus on individual components of a navigation system predominantly appear in our Improvement category (with the novelty detection for collision prediction work by Richter and Roy (2017) as the only exception). We posit that this correlation exists because, while learning approaches with a wider scope provide an initial qualitative "proof-of-concept" that they can be used for navigation, it may prove too difficult to additionally be able to tune such a large system to achieve better overall performance. In contrast, learning methods that target a more specific subsystem or component of the navigation pipeline can usually improve the performance of that system because the problem itself is smaller, with the rest of the system relying on reliable components from classical approaches. The concentration of learning works in the lower left corner of the table suggests that, for classical navigation, the literature has primarily approached the problem in an end-to-end manner and accomplished what has already been achieved by classical methods.

6.2 Recommendations and future directions

Based on the analysis provided in the previous Sect. 6.1, we now provide high-level recommendations and identify promising future research directions for using machine learning to improve mobile robot navigation.
123
Autonomous Robots (2022) 46:569–597 591

[Fig. 8 Performance versus scope: a grid cross-tabulating the surveyed papers by comparison to classical approaches (rows: Social Navigation and Terrain-Based under Beyond Classical; Improvement, Duplication with Alternate Interfaces & Constraints, and Duplication with Standard Interfaces & Constraints under Classical Navigation) and by scope of learning (columns: End-to-End; Subsystem; Component)]

1. Recommendation: The current best practice is to use machine learning at the subsystem or component level. We make this recommendation based primarily on the correlation we found using the cross-dimensional analysis from Sect. 6.1.2 (Fig. 8)—the more machine learning approaches limit their focus to a particular subsystem or component of a classical navigation system, the better the overall system seems to perform. Because these best approaches still rely heavily upon several components from classical navigation systems, this observation also serves to remind us of the advantages of those systems. For example, hybrid systems can employ learning modules to adapt to new environments, and classical components to ensure that the system obeys hard safety constraints. Below, we highlight two critical aspects of classical systems—safety and explainability—that current learning-based approaches do not address, thus motivating their continued inclusion via a hybrid system.

One of the most important system properties that classical navigation system components can provide is a guarantee of safety, which is typically not present for methods that utilize machine learning only. That is, unlike classical MPC approaches that utilize bounded uncertainty models, learning methods typically cannot provide any explicit assurance of safety. For mobile robots moving in the real world—where colliding with obstacles, other robots, and even humans would be catastrophic—the importance of safety cannot be over-emphasized. That machine learning approaches lack safety guarantees is likely one of the most important reasons why they are rarely found in mission-critical systems. Currently, only 12 out of the 74 surveyed papers describe navigation systems that can provide some form of safety assurance, and each of these does so using a component from a classical system.

Another aspect of classical navigation systems that has been largely overlooked by machine learning approaches is that of explainability. Classical methods typically allow human roboticists to log and trace errors back along the navigation pipeline in an understandable and explainable fashion, which allows them to find and resolve specific issues so as to improve future system performance. However, current learning-based approaches—especially those that rely on deep learning—do not exhibit such explainability. In fact, none of the 43 end-to-end approaches surveyed even attempted to maintain any notion of explainability, while the other 31 approaches surveyed only partially maintained explainability by leveraging classical navigation subsystems or components.

One concrete example of the type of hybrid system we advocate for as a best practice is one that uses machine learning to tune the parameters of a classical system. Although classical approaches have the potential to conquer a variety of workspaces enabled by their sophisticated design, when facing a new environment, a great deal of parameter tuning is required to allow them to adapt. This tuning process is not always intuitive, and therefore requires expert knowledge of the classical navigation system along with trial and error. Without parameter tuning, a classical navigation system with default parameters generally exhibits only mediocre behavior across a range of environments. Only six (Bhardwaj et al. 2020; Liu et al. 2021; Sood et al. 2020; Teso-Fz-Betoño et al. 2019; Xiao et al. 2021a, d) out of the 74 surveyed papers investigated how to improve the adaptivity of classical approaches using machine learning; there is still plenty of work that can be done in this space.

Finally, while we claim that hybrid systems are the current best practice, there exists a large community of researchers interested in navigation systems constructed using end-to-end machine learning. With respect to these approaches, the literature has shown repeatedly that they can be successful at duplicating what has already been achieved by classical approaches, but it has not convincingly shown that end-to-end learning approaches can advance the state-of-the-art performance for the navigation task itself. While some duplication work is indeed necessary in order to provide proofs-of-concept for such systems, we suggest that this community should now move on from that goal, and instead focus explicitly on trying to improve navigation performance over that afforded by existing systems. Further, with respect to the aspects of safety and explainability highlighted above as advantages of classical systems, it remains an interesting open problem as to whether learning approaches that exhibit these same features can be designed.

2. Recommendation: More complete comparisons of learning-based and classical approaches are needed. Across the literature we surveyed, we found the evaluation methodologies applied to proposed learning-based techniques to be both inconsistent and incomplete. First, we found it surprisingly common for the experimental evaluations performed for learning-based approaches to omit classical approaches as a baseline. Second, we found that many papers failed to compare machine learning approaches to classical approaches with respect to several relevant metrics, leading to confusion regarding the relative advantages and disadvantages of each.

With respect to using classical approaches as a baseline, we found that much of the literature that proposed new learning-based approaches primarily chose to limit experimental comparison to other learning-based approaches (Zeng et al. 2019; Zhelo et al. 2018; Zhou et al. 2019), and some even only used metrics specific to machine learning concerns, e.g., reward (Zhang et al. 2017a) and sample efficiency (Pfeiffer et al. 2018). However, the fact that one learning algorithm achieves superior performance over another learning algorithm, especially only with respect to learning metrics, is, both practically and scientifically, currently of limited value to the community concerned with autonomous navigation. Only when learning methods start to outperform the state-of-the-art classical approaches does it make sense to focus on these types of improvements. Instead, when applying learning methods to navigation problems, we recommend that researchers focus primarily on metrics that measure real-world navigation performance and report where their proposed approaches stand with respect to state-of-the-art techniques from the classical literature.

With respect to comparing to classical systems using additional metrics other than real-world navigation performance, we found that the literature on learning-based navigation insufficiently acknowledges important shortcomings of these approaches. While the extensive engineering effort required by classical approaches to navigation is often used to motivate the application of machine learning approaches as an alternative (e.g., with classical approaches, even attempts to achieve slight performance increases may require substantial knowledge, laborious effort, or expensive hardware and software development), similar costs exist for learning-based approaches. However, we found that the literature on learning-based navigation did little to explicitly acknowledge or characterize the very real costs of hyperparameter search and training overhead. Manual hyperparameter search—e.g., handcrafting reward functions, finding optimal learning rates, searching for appropriate neural architectures—was required for 73 out of 74 of the surveyed approaches to be successful, and the sole remaining work (Chiang et al. 2019) took 12 days to automatically find such parameters. The costs associated with such searching, which are similar to those incurred by the manual rule definition and parameter tuning procedures found when trying to deploy classical methods, should be clearly identified in future work and compared with classical tuning when possible. Additionally, the data collection and training costs associated with learning methods are rarely made apparent: training data for navigation is difficult to collect, and training is typically done offline using high-end computational hardware over days or even weeks. For all their shortcomings, classical approaches do not incur such costs. Therefore, future work in this area should objectively consider this trade-off between the engineering costs associated with classical approaches and the costly training overhead associated with learning-based approaches.

3. Future direction: Further development of machine learning methods that target the reactive local planning level of autonomous navigation. Based on the work we surveyed, navigation components that implement predominantly reactive behaviors seem to have thus far benefited the


most from the application of machine learning (Chiang et al. 2019; Faust et al. 2018; Liu et al. 2021; Richter and Roy 2017; Xiao et al. 2021a, c). Intuitively, this success of learning for reactive behaviors is because machine learning approaches are typically at their best when applied to limited, well-defined tasks. Local planning, in which we seek to analyze local sensor data to produce immediate outputs over limited spatial windows, is exactly such a task. Global planning, on the other hand, is typically a much more deliberative process. Take, as an analogy, human navigation: humans typically perform high-level deliberation to come up with long-range plans, such as how to get from a house or apartment to a local park, but navigation becomes much more reactive at the local level when needing to respond to immediate situations, such as avoiding a running dog or moving through extremely complex or tight spaces. These reactive behaviors are difficult to model using rule-based symbolic reasoning, but are ripe for learning from experience. These reasons explain the initial thrust and successes towards learning local planners, especially to address challenging reactive situations. Only four (Faust et al. 2018; Richter and Roy 2017; Xiao et al. 2021a, c) out of the 74 surveyed papers have taken the initial step to investigate local navigation with challenging constraints; we believe there is much more work that could be done in this direction.

4. Future direction: Development of machine learning methods that can enable additional navigation behaviors that are orthogonal to those provided by classical approaches. While classical approaches are well-suited to the problem of metric (i.e., minimum-energy, collision-free) navigation in a static environment, our survey has shown that learning-based methods have started to enable qualitatively different types of navigation. In particular, we found that the community has focused thus far primarily on social navigation, and a bit on terrain-aware navigation. With these successes, we encourage researchers to investigate other types of navigation behaviors that might now be possible with machine learning. For example, APPL (Xiao et al. 2021d) has demonstrated the efficacy of dynamic parameter adjustment on the fly, which constitutes a brand-new capability for a classical motion planner. Additionally, researchers may consider things such as stylistic navigation, navigation behaviors that change in the presence of particular objects, etc.

5. Future direction: Further development of machine learning components that continually improve based on real deployment experience. While most traditional navigation systems require manual intervention by human engineers in order to overcome failures or unsatisfactory performance during deployment, learning-based systems, in theory, should be able to automatically process data generated by such events and improve without human involvement. However, most current learning-based systems still separate the training and deployment phases, even for online RL approaches (due to, e.g., onboard computational constraints). This separation means that such systems are not continually learning from deployment experience. Only one (Liu et al. 2021) of the 74 papers explicitly sought to add this capability and proposed a learning-based system that could improve by continually leveraging actual deployment experience outside of a fixed training phase and entirely onboard a robot. This ability to automatically improve during actual deployment using previous successes and failures is one that is not exhibited by classical static approaches to navigation. There is a great opportunity now to design such "phased" learning approaches blended into actual deployment for continually improving navigation performance.

7 Conclusions

This article has reviewed, in the context of the classical mobile robot navigation pipeline, the literature on machine learning approaches that have been developed for motion planning and control in mobile robot navigation. The surveyed papers have been organized in four different ways so as to highlight their relationships to the classical navigation literature: the scope of the learning methods within the structured navigation systems (Sect. 3), comparison of the learning methods to what is already achievable using classical approaches (Sect. 4), navigational tasks (Sect. 5.1), and input modalities (Sect. 5.2). We have discussed each surveyed approach from these different perspectives, and we have presented high-level analyses of the literature as a whole with respect to each perspective. Additionally, we have provided recommendations for the community and identified promising future research directions in the space of applying machine learning to problems in autonomous navigation (Sect. 6). Overall, while there has been a lot of separate research on classical mobile robot navigation and, more recently, on learning-based approaches to navigation, we find that there remain exciting opportunities for advancing the state-of-the-art by combining these two paradigms. We expect that by doing so, we will eventually see robots that are able to navigate quickly, smoothly, safely, and reliably through much more constrained, diverse, and challenging environments than is currently possible.

Acknowledgements This work has taken place in the Learning Agents Research Group (LARG) at the Artificial Intelligence Laboratory, The University of Texas at Austin. LARG research is supported in part by Grants from the National Science Foundation (CPS-1739964, IIS-1724157, NRI-1925082), the Office of Naval Research (N00014-18-2243), Future of Life Institute (RFP2-000), Army Research Office (W911NF-19-2-0333), DARPA, Lockheed Martin, General Motors,


and Bosch. The views and conclusions contained in this document are those of the authors alone. Peter Stone serves as the Executive Director of Sony AI America and receives financial compensation for this work. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research. We would also like to thank Yifeng Zhu for helpful discussions and suggestions, and Siddharth Rajesh Desai for helping edit and refine the language for this survey.

References

Becker-Ehmck, P., Karl, M., Peters, J., & van der Smagt, P. (2020). Learning to fly via deep model-based reinforcement learning. arXiv preprint arXiv:2003.08876
Bhardwaj, M., Boots, B., & Mukadam, M. (2020). Differentiable Gaussian process motion planning. In 2020 IEEE international conference on robotics and automation (ICRA) (pp. 10598–10604). IEEE.
Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L. D., Monfort, M., Muller, U., Zhang, J., et al. (2016). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316
Bruce, J., Sünderhauf, N., Mirowski, P., Hadsell, R., & Milford, M. (2017). One-shot reinforcement learning for robot navigation with interactive replay. arXiv preprint arXiv:1711.10137
Chen, C., Liu, Y., Kreiss, S., & Alahi, A. (2019). Crowd–robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning. In 2019 international conference on robotics and automation (ICRA) (pp. 6015–6022). IEEE.
Chen, Y. F., Everett, M., Liu, M., & How, J. P. (2017). Socially aware motion planning with deep reinforcement learning. In 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 1343–1350). IEEE.
Chen, Y. F., Liu, M., Everett, M., & How, J. P. (2017). Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA) (pp. 285–292). IEEE.
Chiang, H. T. L., Faust, A., Fiser, M., & Francis, A. (2019). Learning navigation behaviors end-to-end with AutoRL. IEEE Robotics and Automation Letters, 4(2), 2007–2014.
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555
Codevilla, F., Müller, M., López, A., Koltun, V., & Dosovitskiy, A. (2018). End-to-end driving via conditional imitation learning. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 1–9). IEEE.
Daniel, K., Nash, A., Koenig, S., & Felner, A. (2010). Theta*: Any-angle path planning on grids. Journal of Artificial Intelligence Research, 39, 533–579.
Dennis, M., Jaques, N., Vinitsky, E., Bayen, A., Russell, S., Critch, A., & Levine, S. (2020). Emergent complexity and zero-shot transfer via unsupervised environment design. In Advances in neural information processing systems (Vol. 33, pp. 13049–13061). Curran Associates, Inc.
Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1), 269–271.
Ding, W., Li, S., Qian, H., & Chen, Y. (2018). Hierarchical reinforcement learning framework towards multi-agent navigation. In 2018 IEEE international conference on robotics and biomimetics (ROBIO) (pp. 237–242). IEEE.
Durrant-Whyte, H., & Bailey, T. (2006). Simultaneous localization and mapping: Part I. IEEE Robotics & Automation Magazine, 13(2), 99–110.
Elfes, A. (1989). Using occupancy grids for mobile robot perception and navigation. Computer, 22(6), 46–57.
Everett, M., Chen, Y. F., & How, J. P. (2018). Motion planning among dynamic, decision-making agents with deep reinforcement learning. In 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 3052–3059). IEEE.
Faust, A., Oslund, K., Ramirez, O., Francis, A., Tapia, L., Fiser, M., & Davidson, J. (2018). PRM-RL: Long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 5113–5120). IEEE.
Fox, D., Burgard, W., & Thrun, S. (1997). The dynamic window approach to collision avoidance. IEEE Robotics & Automation Magazine, 4(1), 23–33.
Gao, W., Hsu, D., Lee, W. S., Shen, S., & Subramanian, K. (2017). Intention-net: Integrating planning and deep learning for goal-directed autonomous navigation. In Conference on robot learning (pp. 185–194). PMLR.
Giusti, A., Guzzi, J., Cireşan, D. C., He, F. L., Rodríguez, J. P., Fontana, F., et al. (2015). A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters, 1(2), 661–667.
Godoy, J., Chen, T., Guy, S. J., Karamouzas, I., & Gini, M. (2018). ALAN: Adaptive learning for multi-agent navigation. Autonomous Robots, 42(8), 1543–1562.
Gupta, S., Davidson, J., Levine, S., Sukthankar, R., & Malik, J. (2017). Cognitive mapping and planning for visual navigation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2616–2625).
Gupta, S., Fouhey, D., Levine, S., & Malik, J. (2017). Unifying map and landmark based representations for visual navigation. arXiv preprint arXiv:1712.08125
Hart, P., Nilsson, N., & Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2), 100–107. https://doi.org/10.1109/tssc.1968.300136
Henry, P., Vollmer, C., Ferris, B., & Fox, D. (2010). Learning to navigate through crowded environments. In 2010 IEEE international conference on robotics and automation (pp. 981–986). IEEE.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Jaillet, L., Cortés, J., & Siméon, T. (2010). Sampling-based path planning on configuration-space costmaps. IEEE Transactions on Robotics, 26(4), 635–646.
Jiang, P., Osteen, P., Wigness, M., & Saripalli, S. (2021). RELLIS-3D dataset: Data, benchmarks and analysis. In 2021 IEEE international conference on robotics and automation (ICRA) (pp. 1110–1116). IEEE.
Jin, J., Nguyen, N. M., Sakib, N., Graves, D., Yao, H., & Jagersand, M. (2020). Mapless navigation among dynamics with social-safety-awareness: A reinforcement learning approach from 2D laser scans. In 2020 IEEE international conference on robotics and automation (ICRA) (pp. 6979–6985). IEEE.
Johnson, C., & Kuipers, B. (2018). Socially-aware navigation using topological maps and social norm learning. In Proceedings of the 2018 AAAI/ACM conference on AI, ethics, and society (pp. 151–157).
Kahn, G., Abbeel, P., & Levine, S. (2021). BADGR: An autonomous self-supervised learning-based navigation system. IEEE Robotics and Automation Letters, 6(2), 1312–1319.
Kahn, G., Villaflor, A., Abbeel, P., & Levine, S. (2018). Composable action-conditioned predictors: Flexible off-policy learning for robot navigation. In Conference on robot learning (pp. 806–816). PMLR.
Kahn, G., Villaflor, A., Ding, B., Abbeel, P., & Levine, S. (2018). Self-supervised deep reinforcement learning with generalized


computation graphs for robot navigation. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 1–8). IEEE.
Karaman, S., & Frazzoli, E. (2011). Sampling-based algorithms for optimal motion planning. The International Journal of Robotics Research, 30(7), 846–894.
Kavraki, L. E., Svestka, P., Latombe, J. C., & Overmars, M. H. (1996). Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4), 566–580.
Khan, A., Zhang, C., Atanasov, N., Karydis, K., Kumar, V., & Lee, D. D. (2018). Memory augmented control networks. In International conference on learning representations (ICLR).
Kim, B., & Pineau, J. (2016). Socially adaptive path planning in human environments using inverse reinforcement learning. International Journal of Social Robotics, 8(1), 51–66.
Koenig, S., & Likhachev, M. (2002). D* Lite. In AAAI/IAAI (Vol. 15).
Kretzschmar, H., Spies, M., Sprunk, C., & Burgard, W. (2016). Socially compliant mobile robot navigation via inverse reinforcement learning. The International Journal of Robotics Research, 35(11), 1289–1307.
Kroemer, O., Niekum, S., & Konidaris, G. (2021). A review of robot learning for manipulation: Challenges, representations, and algorithms. Journal of Machine Learning Research, 22, 30–1.
LaValle, S. M. (1998). Rapidly-exploring random trees: A new tool for path planning.
LaValle, S. M. (2006). Planning algorithms. Cambridge University Press.
LeCun, Y., Muller, U., Ben, J., Cosatto, E., & Flepp, B. (2006). Off-road obstacle avoidance through end-to-end learning. In Advances in neural information processing systems (pp. 739–746).
Li, M., Jiang, R., Ge, S. S., & Lee, T. H. (2018). Role playing learning for socially concomitant mobile robot navigation. CAAI Transactions on Intelligence Technology, 3(1), 49–58.
Liang, J., Patel, U., Sathyamoorthy, A. J., & Manocha, D. (2020). Crowd-steer: Realtime smooth and collision-free robot navigation in densely crowded scenarios trained using high-fidelity simulation. In IJCAI (pp. 4221–4228).
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971
Lin, J., Wang, L., Gao, F., Shen, S., & Zhang, F. (2019). Flying through a narrow gap using neural network: An end-to-end planning and control approach. In 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 3526–3533). IEEE.
Liu, B., Xiao, X., & Stone, P. (2021). A lifelong learning approach to mobile robot navigation. IEEE Robotics and Automation Letters, 6(2), 1090–1096.
Long, P., Fan, T., Liao, X., Liu, W., Zhang, H., & Pan, J. (2018). Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 6252–6259). IEEE.
Lopez-Paz, D., & Ranzato, M. (2017). Gradient episodic memory for continual learning. In Advances in neural information processing systems (pp. 6467–6476).
Loquercio, A., Maqueda, A. I., Del-Blanco, C. R., & Scaramuzza, D. (2018). DroNet: Learning to fly by driving. IEEE Robotics and Automation Letters, 3(2), 1088–1095.
Lu, D. V., Hershberger, D., & Smart, W. D. (2014). Layered costmaps for context-sensitive navigation. In 2014 IEEE/RSJ international conference on intelligent robots and systems (pp. 709–715). IEEE.
Luber, M., Spinello, L., Silva, J., & Arras, K. O. (2012). Socially-aware robot navigation: A learning approach. In 2012 IEEE/RSJ international conference on intelligent robots and systems (pp. 902–907). IEEE.
Martins, G. S., Rocha, R. P., Pais, F. J., & Menezes, P. (2019). ClusterNav: Learning-based robust navigation operating in cluttered environments. In 2019 international conference on robotics and automation (ICRA) (pp. 9624–9630). IEEE.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International conference on machine learning (pp. 1928–1937).
Nistér, D., Naroditsky, O., & Bergen, J. (2004). Visual odometry. In Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, 2004. CVPR 2004 (Vol. 1, p. I). IEEE.
Okal, B., & Arras, K. O. (2016). Learning socially normative robot navigation behaviors with Bayesian inverse reinforcement learning. In 2016 IEEE international conference on robotics and automation (ICRA) (pp. 2889–2895). IEEE.
OSRF. (2018). ROS wiki move_base. http://wiki.ros.org/move_base
Palmieri, L., & Arras, K. O. (2014). Efficient and smooth RRT motion planning using a novel extend function for wheeled mobile robots. In IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 205–211).
Pan, Y., Cheng, C. A., Saigol, K., Lee, K., Yan, X., Theodorou, E. A., & Boots, B. (2020). Imitation learning for agile autonomous driving. The International Journal of Robotics Research, 39(2–3), 286–302.
Park, J. J. (2016). Graceful navigation for mobile robots in dynamic and uncertain environments. Ph.D. thesis.
Pérez-Higueras, N., Caballero, F., & Merino, L. (2018). Learning human-aware path planning with fully convolutional networks. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 1–5). IEEE.
Pérez-Higueras, N., Caballero, F., & Merino, L. (2018). Teaching robot navigation behaviors to optimal RRT planners. International Journal of Social Robotics, 10(2), 235–249.
Pfeiffer, M., Schaeuble, M., Nieto, J., Siegwart, R., & Cadena, C. (2017). From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots. In 2017 IEEE international conference on robotics and automation (ICRA) (pp. 1527–1533). IEEE.
Pfeiffer, M., Schwesinger, U., Sommer, H., Galceran, E., & Siegwart, R. (2016). Predicting actions to act predictably: Cooperative partial motion planning with maximum entropy models. In 2016 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 2096–2101). IEEE.
Pfeiffer, M., Shukla, S., Turchetta, M., Cadena, C., Krause, A., Siegwart, R., & Nieto, J. (2018). Reinforced imitation: Sample efficient deep reinforcement learning for mapless navigation by leveraging prior demonstrations. IEEE Robotics and Automation Letters, 3(4), 4423–4430.
Pokle, A., Martín-Martín, R., Goebel, P., Chow, V., Ewald, H. M., Yang, J., Wang, Z., Sadeghian, A., Sadigh, D., Savarese, S., et al. (2019). Deep local trajectory replanning and control for robot navigation. In 2019 international conference on robotics and automation (ICRA) (pp. 5815–5822). IEEE.
Pomerleau, D. A. (1989). ALVINN: An autonomous land vehicle in a neural network. In Advances in neural information processing systems (pp. 305–313).
Quinlan, S., & Khatib, O. (1993). Elastic bands: Connecting path planning and control. In [1993] Proceedings IEEE international conference on robotics and automation (pp. 802–807). IEEE.
Ramachandran, D., & Amir, E. (2007). Bayesian inverse reinforcement learning. In IJCAI (Vol. 7, pp. 2586–2591).
Richter, C., & Roy, N. (2017). Safe visual navigation via deep learning and novelty detection. In Robotics: Science and systems (RSS).
Ross, S., Gordon, G., & Bagnell, D. (2011). A reduction of imitation learning and structured prediction to no-regret online learning. In


Proceedings of the fourteenth international conference on artificial intelligence and statistics (pp. 627–635).
Ross, S., Melik-Barkhudarov, N., Shankar, K. S., Wendel, A., Dey, D., Bagnell, J. A., & Hebert, M. (2013). Learning monocular reactive UAV control in cluttered natural environments. In 2013 IEEE international conference on robotics and automation (pp. 1765–1772). IEEE.
Russell, S. J., & Norvig, P. (2016). Artificial intelligence: A modern approach. Pearson Education Limited.
Sadeghi, F., & Levine, S. (2017). CAD2RL: Real single-image flight without a single real image. In Robotics: Science and systems (RSS).
Sepulveda, G., Niebles, J. C., & Soto, A. (2018). A deep learning based behavioral approach to indoor autonomous navigation. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 4646–4653). IEEE.
Sergeant, J., Sünderhauf, N., Milford, M., & Upcroft, B. (2015). Multimodal deep autoencoders for control of a mobile robot. In Proceedings of Australasian conference for robotics and automation (ACRA).
Shiarlis, K., Messias, J., & Whiteson, S. (2017). Rapidly exploring learning trees. In 2017 IEEE international conference on robotics and automation (ICRA) (pp. 1541–1548). IEEE.
Siva, S., Wigness, M., Rogers, J., & Zhang, H. (2019). Robot adaptation to unstructured terrains by joint representation and apprenticeship learning. In Robotics: Science and systems (RSS).
Sood, R., Vats, S., & Likhachev, M. (2020). Learning to use adaptive motion primitives in search-based planning for navigation. In 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 6923–6929). IEEE.
Stein, G. J., Bradley, C., & Roy, N. (2018). Learning over subgoals for efficient navigation of structured, unknown environments. In Conference on robot learning (pp. 213–222).
Stratonovich, R. L. (1965). Conditional Markov processes. In Non-linear transformations of stochastic processes (pp. 427–453). Elsevier.
Tai, L., Li, S., & Liu, M. (2016). A deep-network solution towards model-less obstacle avoidance. In 2016 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 2759–2764). IEEE.
Tai, L., & Liu, M. (2016). Deep-learning in mobile robotics - from perception to control systems: A survey on why and why not. arXiv preprint arXiv:1612.07139
Tai, L., Paolo, G., & Liu, M. (2017). Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 31–36). IEEE.
Tai, L., Zhang, J., Liu, M., Boedecker, J., & Burgard, W. (2016). A survey of deep network solutions for learning control in robotics: From reinforcement to imitation. arXiv preprint arXiv:1612.07139
Tai, L., Zhang, J., Liu, M., & Burgard, W. (2018). Socially compliant navigation through raw depth inputs with generative adversarial imitation learning. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 1111–1117). IEEE.
Tamar, A., Wu, Y., Thomas, G., Levine, S., & Abbeel, P. (2016). Value iteration networks. In Advances in neural information processing systems (pp. 2154–2162).
Teso-Fz-Betoño, D., Zulueta, E., Fernandez-Gamiz, U., Saenz-Aguirre, A., & Martinez, R. (2019). Predictive dynamic window approach development with artificial neural fuzzy inference improvement. Electronics, 8(9), 935.
Thrun, S. (1995). An approach to learning mobile robot navigation. Robotics and Autonomous Systems, 15(4), 301–319.
Ullman, S. (1979). The interpretation of structure from motion. Proceedings of the Royal Society of London. Series B. Biological Sciences, 203(1153), 405–426.
Van Den Berg, J., Guy, S. J., Lin, M., & Manocha, D. (2011). Reciprocal n-body collision avoidance. In Robotics research (pp. 3–19). Springer.
Wang, Y., He, H., & Sun, C. (2018). Learning to navigate through complex dynamic environment with modular deep reinforcement learning. IEEE Transactions on Games, 10(4), 400–412.
Wang, Z., Xiao, X., Liu, B., Warnell, G., & Stone, P. (2021). APPLI: Adaptive planner parameter learning from interventions. In 2021 IEEE international conference on robotics and automation (ICRA) (pp. 6079–6085). IEEE.
Wang, Z., Xiao, X., Nettekoven, A. J., Umasankar, K., Singh, A., Bommakanti, S., Topcu, U., & Stone, P. (2021). From agile ground to aerial navigation: Learning from learned hallucination. In 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE.
Wang, Z., Xiao, X., Warnell, G., & Stone, P. (2021). APPLE: Adaptive planner parameter learning from evaluative feedback. IEEE Robotics and Automation Letters, 6(4), 7744–7749.
Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.
Wigness, M., Rogers, J. G., & Navarro-Serment, L. E. (2018). Robot navigation from human demonstration: Learning control behaviors. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 1150–1157). IEEE.
Xiao, X., Biswas, J., & Stone, P. (2021a). Learning inverse kinodynamics for accurate high-speed off-road navigation on unstructured terrain. IEEE Robotics and Automation Letters, 6(3), 6054–6060.
Xiao, X., Liu, B., & Stone, P. (2021b). Agile robot navigation through hallucinated learning and sober deployment. In 2021 IEEE international conference on robotics and automation (ICRA) (pp. 7316–7322). IEEE.
Xiao, X., Liu, B., Warnell, G., Fink, J., & Stone, P. (2020). APPLD: Adaptive planner parameter learning from demonstration. IEEE Robotics and Automation Letters, 5(3), 4541–4547.
Xiao, X., Liu, B., Warnell, G., & Stone, P. (2021c). Toward agile maneuvers in highly constrained spaces: Learning from hallucination. IEEE Robotics and Automation Letters, 6(2), 1503–1510.
Xiao, X., Wang, Z., Xu, Z., Liu, B., Warnell, G., Dhamankar, G., Nair, A., & Stone, P. (2021d). APPL: Adaptive planner parameter learning. arXiv preprint arXiv:2105.07620
Xie, L., Wang, S., Rosa, S., Markham, A., & Trigoni, N. (2018). Learning with training wheels: Speeding up training with a simple controller for deep reinforcement learning. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 6276–6283). IEEE.
Xu, Z., Dhamankar, G., Nair, A., Xiao, X., Warnell, G., Liu, B., Wang, Z., & Stone, P. (2021). APPLR: Adaptive planner parameter learning from reinforcement. In 2021 IEEE international conference on robotics and automation (ICRA) (pp. 6086–6092). IEEE.
Yao, X., Zhang, J., & Oh, J. (2019). Following social groups: Socially compliant autonomous navigation in dense crowds. arXiv preprint arXiv:1911.12063
Zeng, J., Ju, R., Qin, L., Hu, Y., Yin, Q., & Hu, C. (2019). Navigation in unknown dynamic environments based on deep reinforcement learning. Sensors, 19(18), 3837.
Zhang, J., Springenberg, J. T., Boedecker, J., & Burgard, W. (2017). Deep reinforcement learning with successor features for navigation across similar environments. In 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 2371–2378). IEEE.
Zhang, J., Tai, L., Boedecker, J., Burgard, W., & Liu, M. (2017). Neural SLAM: Learning to explore with external memory. arXiv preprint arXiv:1706.09520
Zhang, T., Kahn, G., Levine, S., & Abbeel, P. (2016). Learning deep control policies for autonomous aerial vehicles with MPC-guided


policy search. In 2016 IEEE international conference on robotics and automation (ICRA) (pp. 528–535). IEEE.
Zhao, L., & Roh, M. I. (2019). COLREGs-compliant multiship collision avoidance based on deep reinforcement learning. Ocean Engineering, 191, 106436.
Zhelo, O., Zhang, J., Tai, L., Liu, M., & Burgard, W. (2018). Curiosity-driven exploration for mapless navigation with deep reinforcement learning. arXiv preprint arXiv:1804.00456
Zhou, X., Gao, Y., & Guan, L. (2019). Towards goal-directed navigation through combining learning based global and local planners. Sensors, 19(1), 176.
Zhu, Y., Mottaghi, R., Kolve, E., Lim, J. J., Gupta, A., Fei-Fei, L., & Farhadi, A. (2017). Target-driven visual navigation in indoor scenes using deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA) (pp. 3357–3364). IEEE.
Zhu, Y., Schwab, D., & Veloso, M. (2019). Learning primitive skills for mobile robots. In 2019 international conference on robotics and automation (ICRA) (pp. 7597–7603). IEEE.
Ziebart, B. D., Maas, A. L., Bagnell, J. A., & Dey, A. K. (2008). Maximum entropy inverse reinforcement learning. In AAAI (Vol. 8, pp. 1433–1438).

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xuesu Xiao is a postdoctoral fellow in the Department of Computer Science at The University of Texas at Austin. Dr. Xiao's research focuses on field robotics, motion planning, and machine learning. He develops highly capable and intelligent mobile robots that are robustly deployable in the real world with minimal human supervision. Dr. Xiao received his Ph.D. in Computer Science from Texas A&M University in 2019, Master of Science in Mechanical Engineering from Carnegie Mellon University in 2015, and dual Bachelor of Science in Mechatronics Engineering from Tongji University and FH Aachen University of Applied Sciences in 2013.

Bo Liu is a Computer Science Ph.D. student at The University of Texas at Austin. Bo's research interest lies in reinforcement learning and robotics. In particular, Bo aims to develop long-term autonomous agents that continually learn over different environments. Bo received his Bachelor of Science in computer engineering from Johns Hopkins University in 2017 and his Master of Science in computer science from Stanford University in 2019.

Garrett Warnell is a research scientist with Army Research Laboratory's Computational and Information Sciences Directorate. He received BS degrees in mathematics and computer engineering from Michigan State University in 2009, and MS and Ph.D. degrees in electrical engineering from the University of Maryland in 2013 and 2014, respectively. He joined Army Research Laboratory in 2014. In 2016, he became part of the ARL South extended campus community, and joined The University of Texas at Austin Department of Computer Science as a visiting researcher. His research interests are broadly in the areas of robotics, machine learning, and artificial intelligence, with current focuses on online and human-in-the-loop machine learning.

Peter Stone is the founder and director of the Learning Agents Research Group (LARG) within the Artificial Intelligence Laboratory in the Department of Computer Science at The University of Texas at Austin, as well as associate department chair and Director of Texas Robotics. Dr. Stone's main research interest in AI is understanding how we can best create complete intelligent agents. Dr. Stone considers adaptation, interaction, and embodiment to be essential capabilities of such agents. Thus, Dr. Stone's research focuses mainly on machine learning, multiagent systems, and robotics.