Benchmarking Reinforcement Learning Techniques for Autonomous Navigation

Zifan Xu1, Bo Liu1, Xuesu Xiao2,3, Anirudh Nair1, and Peter Stone1,4

1 Department of Computer Science, University of Texas at Austin; 2 Department of Computer Science, George Mason University; 3 Everyday Robots; 4 Sony AI. This work has taken place in the Learning Agents Research Group (LARG) at UT Austin. LARG research is supported in part by NSF (CPS-1739964, IIS-1724157, NRI-1925082), ONR (N00014-18-2243), FLI (RFP2-000), ARO (W911NF-19-2-0333), DARPA, Lockheed Martin, GM, and Bosch. Peter Stone serves as the Executive Director of Sony AI America and receives financial compensation for this work. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research.

Abstract— Deep reinforcement learning (RL) has brought many successes for autonomous robot navigation. However, there still exist important limitations that prevent real-world use of RL-based navigation systems. For example, most learning approaches lack safety guarantees, and learned navigation systems may not generalize well to unseen environments. Despite a variety of recent learning techniques that tackle these challenges in general, the lack of an open-source benchmark and of reproducible learning methods specifically for autonomous navigation makes it difficult for roboticists to choose which learning methods to use for their mobile robots, and for learning researchers to identify the current shortcomings of general learning methods for autonomous navigation. In this paper, we identify four major desiderata for applying deep RL approaches to autonomous navigation: (D1) reasoning under uncertainty, (D2) safety, (D3) learning from limited trial-and-error data, and (D4) generalization to diverse and novel environments. Then, we explore four major classes of learning techniques with the purpose of achieving one or more of the four desiderata: memory-based neural network architectures (D1), safe RL (D2), model-based RL (D2, D3), and domain randomization (D4). By deploying these learning techniques in a new open-source large-scale navigation benchmark and in real-world environments, we perform a comprehensive study aimed at establishing to what extent these techniques can achieve these desiderata for RL-based navigation systems.

I. INTRODUCTION

Autonomous robot navigation, i.e., moving a robot from one point to another without colliding with any obstacle, has been studied by the robotics community for decades. Classical navigation systems [1], [2] can successfully solve such navigation problems in many real-world scenarios, e.g., handling noisy, partially observable sensory input while still providing verifiable collision-free safety guarantees. However, these systems require extensive engineering effort and can still be brittle in challenging scenarios, e.g., in highly constrained environments. This is reflected by a recent competition (The BARN Challenge [3]) held at ICRA 2022, which suggests that even experienced roboticists tend to underestimate how difficult navigation scenarios are for real robots. Recently, data-driven approaches have also been used to tackle the navigation problem [4] thanks to advances in the machine learning community. In particular, Reinforcement Learning (RL), i.e., learning from self-supervised trial-and-error data, has achieved tremendous progress on multiple fronts, including safety [5]–[7], generalizability [8]–[11], sample efficiency [12], [13], and addressing temporal data [14]–[16]. For the problem of navigation, navigation systems learned with RL [17] have the potential to relieve roboticists from the extensive engineering efforts [18]–[22] spent on developing and fine-tuning classical systems. Moreover, a simple case study conducted in five randomly generated obstacle courses where classical navigation systems often fail shows that RL-based navigation has the potential to achieve superior behaviors in terms of successful collision avoidance and goal reaching (Fig. 1 left).

Despite such promising advantages, learning-based navigation systems are far from finding their way into real-world robotics use cases, which currently still rely heavily on their classical counterparts. Such reluctance to adopt learning-based systems in the real world stems from a series of fundamental limitations of learning methods, e.g., lack of safety, explainability, and generalizability. To make things even worse, the lack of well-established comparison metrics and reproducible learning methods further obfuscates the effects of different learning approaches on navigation across both the robotics and learning communities, making it difficult to assess the state of the art and therefore to adopt learned navigation systems in the real world.

To facilitate research toward developing RL-based navigation systems that can be deployed in real-world scenarios, we introduce a new open-source large-scale navigation benchmark with a variety of challenging, highly constrained obstacle courses to evaluate different learning approaches, along with implementations of several state-of-the-art RL algorithms. The obstacle courses resemble highly constrained real-world navigation environments (Fig. 1 right) and present major challenges to existing classical navigation systems, while RL-based navigation systems have the potential to perform well in them (Fig. 1 left).

We identify four major desiderata that ought to be fulfilled by any learning-based system that is to be deployed: (D1) reasoning under uncertainty of partially observed sensory inputs, (D2) safety, (D3) learning from limited trial-and-error data, and (D4) generalization to diverse and novel environments. By deploying four major classes of learning techniques, namely memory-based neural network architectures, safe RL, model-based RL, and domain randomization, we perform extensive experiments and empirically compare a large range of RL-based methods based on the degree to which they achieve each of these desiderata. Moreover, by deploying six selected navigation systems in three qualitatively different real-world navigation environments, we investigate to what degree the conclusions drawn from the benchmark can be applied to the real world. Supplementary videos and material for this work are available on the project webpage.1

1 https://cs.gmu.edu/~xiao/Research/RLNavBenchmark/
Fig. 1: Left: Success rates of two classical navigation systems, DWA [2] (red) and E-Band [1] (blue), and vanilla end-to-end RL-based navigation systems (green, individually trained) in five randomly generated difficult obstacle courses. The insets at the top show top-down views of the five obstacle courses. Right: Navigation environments in the real world (left) and in the proposed benchmark (right) are similar in terms of the robot perception system (e.g., white/red laser scans and cyan/purple costmaps).
II. DESIDERATA FOR LEARNING-BASED NAVIGATION

In this section, we introduce four desiderata for learning-based autonomous navigation systems and briefly discuss the learning techniques that serve as their corresponding solutions.

(D1) reasoning under uncertainty of partially observed sensory inputs. Autonomous navigation without explicit mapping and localization is usually formalized as a Partially Observable Markov Decision Process (POMDP), where the agent produces the motion of the robot based only on limited sensory inputs that are usually not sufficient to recover the full state of the navigation environment. Most RL approaches solve POMDPs by maintaining a history of past observations and actions [14], [15]. Neural network architectures that process sequential data, such as Recurrent Neural Networks (RNNs), are then employed to encode this history and address partial observability. In this study, we investigate various design choices of history-dependent architectures.
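As a concrete illustration of such history-dependent architectures, the sketch below summarizes a fixed-length history of past observations and actions with a GRU before producing a motion command; transformer-style encoders over the same history are another option evaluated later in the paper. This is only a minimal sketch of the general idea: the layer sizes, history length, and input dimensions below are assumptions, not the architectures used in the benchmark.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Minimal history-dependent policy: a GRU summarizes the last H
    (observation, action) pairs, and a linear head outputs motion commands.
    Sizes are illustrative, not the ones used in the benchmark."""

    def __init__(self, obs_dim=720, act_dim=2, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden_dim), nn.ReLU())
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, act_dim)  # e.g., linear and angular velocity

    def forward(self, obs_history, act_history):
        # obs_history: (batch, H, obs_dim); act_history: (batch, H, act_dim)
        x = torch.cat([obs_history, act_history], dim=-1)
        x = self.encoder(x)
        _, h = self.gru(x)              # h: (1, batch, hidden_dim), summary of the history
        return self.head(h.squeeze(0))  # motion command conditioned on the history

# Usage: H = 4 past laser scans (720 beams each) and actions
policy = RecurrentPolicy()
obs_hist = torch.zeros(1, 4, 720)
act_hist = torch.zeros(1, 4, 2)
action = policy(obs_hist, act_hist)
```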
(D2) safety. Even though in some cases deep RL methods achieve performance comparable to classical navigation, they still suffer from poor explainability and do not guarantee collision-free navigation. The lack of a safety guarantee is a major challenge preventing RL-based navigation from being used in the real world. Prior works have addressed this challenge by formalizing navigation as a multi-objective problem that treats collision avoidance as a separate objective from reaching the goal, and solving it with Lagrangian or Lyapunov-based methods [5]. For simplicity, we only explore the Lagrangian method and investigate whether explicitly treating safety as a separate objective leads to safer and smoother learned navigation behavior.
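For reference, a minimal sketch of the Lagrangian relaxation idea: the constrained objective (maximize return subject to an expected-collision-cost budget) is turned into an unconstrained one with a learnable multiplier that grows whenever the observed cost exceeds the budget. The update rule, hyperparameters, and cost budget below are illustrative assumptions and are not taken from the paper.

```python
import torch

# Constrained RL via Lagrangian relaxation (illustrative sketch):
#   max_pi E[return]  s.t.  E[collision cost] <= cost_limit
# becomes  max_pi min_{lambda >= 0}  E[return] - lambda * (E[cost] - cost_limit)

log_lambda = torch.zeros(1, requires_grad=True)   # lambda = exp(log_lambda) >= 0
lambda_opt = torch.optim.Adam([log_lambda], lr=1e-3)
cost_limit = 0.0                                  # e.g., no expected collisions (assumed budget)

def lagrangian_policy_loss(returns, costs):
    """Policy maximizes return while the (detached) multiplier penalizes violations."""
    lam = log_lambda.exp().detach()
    return -(returns - lam * costs).mean()

def update_multiplier(episode_costs):
    """Gradient ascent on lambda: it grows when the average cost exceeds the budget."""
    lam = log_lambda.exp()
    loss = -(lam * (episode_costs.mean() - cost_limit))  # minimizing this ascends lambda
    lambda_opt.zero_grad()
    loss.backward()
    lambda_opt.step()
```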
(D3) learning from limited trial-and-error data. Although deep RL approaches can relieve roboticists from extensive engineering effort, a large amount of data is still required to train a typical deep RL agent. However, autonomous navigation data is usually expensive to collect in the real world. Therefore, data collection is usually conducted in simulation, e.g., in the Robot Operating System (ROS) Gazebo simulator, which provides an easy interface with real-world robots. However, simulating a full navigation stack from perception to actuation is computationally more expensive than other RL domains, e.g., MuJoCo or Atari games [23], [24], which places a high requirement on sample efficiency. Most prior works have used off-policy RL algorithms to improve sample efficiency with experience replay [25], [26]. In addition, model-based RL methods can explicitly improve sample efficiency and are widely used in robot control problems. In this study, we compare two common classes of model-based RL methods [12], [13] combined with an off-policy RL algorithm, and empirically study to what extent model-based approaches improve sample efficiency when provided with different amounts of data.
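To make the role of experience replay concrete, here is a minimal replay buffer of the kind used by off-policy RL algorithms, which lets the learner reuse each expensive simulated navigation transition many times; the capacity and batch size are arbitrary placeholders, not values from the paper.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (obs, action, reward, next_obs, done) transitions for off-policy updates."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size=256):
        batch = random.sample(self.buffer, batch_size)
        return list(zip(*batch))  # tuples of obs, actions, rewards, next_obs, dones
```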
(D4) generalization to diverse and novel environments. The ultimate goal of deep RL approaches for autonomous navigation is to learn a generalizable policy for all kinds of navigation environments in the real world. A simple strategy is to train the agent in as many diverse navigation environments as possible, i.e., domain randomization, but it is unclear how many training environments are necessary to efficiently achieve good generalization. Utilizing the large-scale navigation benchmark proposed in this paper, we empirically study the dependence of generalization on the number of training environments.

III. NAVIGATION BENCHMARK

This section details the proposed navigation benchmark for RL-based navigation systems, which aims to provide a unified and comprehensive testbed for future autonomous navigation research. First, Sec. III-A discusses the differences between the proposed benchmark and existing navigation benchmarks. In Sec. III-B and III-C, the navigation task is formally defined and formulated as a POMDP. More detailed background on MDPs and POMDPs can be found on the project webpage. Finally, Sec. III-D introduces the simulated and real-world environments that benchmark different aspects of navigation performance.

A. Existing Navigation Benchmarks

Our proposed benchmark differs from existing benchmarks in three aspects: (1) high-fidelity physics: the navigation tasks are simulated by Gazebo [27], which is based on realistic physical dynamics and therefore tests motion planners that directly produce low-level motion commands, i.e., linear and angular velocities, in contrast to high-level instructions such as turn left, turn right, or move forward [28], [29].
In other words, we focus on “how to navigate” (motion planning), instead of “where to navigate” (path planning); (2) ROS integration: our benchmark is based on ROS [30], which allows seamless transfer of a navigation method developed and benchmarked in simulation directly onto a physical robot with little (if any) effort; and (3) collision-free navigation: the benchmark includes both static and dynamic environments and requires collision-free navigation, whereas other benchmarks either assume that collisions are possible [29] or that collision avoidance will be addressed by other low-level controllers outside the scope of the benchmark [28]. A special case is the photo-realistic Interactive Gibson Benchmark by Xia et al. [31], which intentionally allows physical interaction with objects (e.g., pushing) and therefore poses no challenges to the collision-avoidance system.
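To illustrate the kind of interface points (1) and (2) imply, the sketch below subscribes to a 2D laser scan and publishes low-level velocity commands on standard ROS topics. The topic names, control rate, and the placeholder policy() function are assumptions rather than the benchmark's actual code.

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import LaserScan
from geometry_msgs.msg import Twist

latest_scan = None

def scan_callback(msg):
    global latest_scan
    latest_scan = msg.ranges  # raw laser readings used as the policy observation

def policy(scan):
    # Placeholder for a learned policy: returns (linear, angular) velocity.
    return 0.5, 0.0

if __name__ == "__main__":
    rospy.init_node("rl_local_planner")
    rospy.Subscriber("/scan", LaserScan, scan_callback)
    cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    rate = rospy.Rate(10)  # send motion commands at 10 Hz
    while not rospy.is_shutdown():
        if latest_scan is not None:
            v, w = policy(latest_scan)
            cmd = Twist()
            cmd.linear.x = v    # linear velocity (m/s)
            cmd.angular.z = w   # angular velocity (rad/s)
            cmd_pub.publish(cmd)
        rate.sleep()
```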
Fig. 2: Three types of navigation environments: static (left), dynamic-box (middle), and dynamic-wall (right). The red squares mark the obstacle fields, and the yellow arrows mark the direction of navigation. In dynamic-wall, the green (blue) arrows indicate the case when the two walls are moving apart (together). In dynamic-box, the red arrows indicate the velocities of the obstacles.

B. Navigation Problem Definition

Definition 1 (Robot Navigation Problem). Situated within a navigation environment e, which includes the locations of all obstacles at any time t, a start location (x_i, y_i), a start orientation θ_i, and a goal location (x_g, y_g), the navigation problem T_e is to maximize the probability p of a mobile robot reaching the goal location from the start location and orientation, under a constraint on the number of collisions with any obstacle, C < 1, and a time limit t < T_max.

A navigation problem can be formally defined as above. Given the current location (x_t, y_t), the robot is considered to have reached the goal location if and only if its distance to the goal location is smaller than a threshold, d_t < d_s, where d_t is the Euclidean distance between (x_t, y_t) and (x_g, y_g), and d_s is a constant threshold.
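The goal-reaching and constraint checks of Definition 1 translate directly into code; the sketch below assumes a fixed distance threshold d_s and time limit T_max, whose concrete values here are placeholders rather than benchmark settings.

```python
import math

D_S = 0.5      # goal-reaching distance threshold d_s (placeholder value, in meters)
T_MAX = 100.0  # time limit T_max (placeholder value, in seconds)

def episode_outcome(x_t, y_t, x_g, y_g, num_collisions, t):
    """Evaluate one step of a navigation episode against Definition 1."""
    d_t = math.hypot(x_t - x_g, y_t - y_g)  # Euclidean distance to the goal
    if num_collisions >= 1:                 # constraint C < 1: any collision fails the episode
        return "failure"
    if t >= T_MAX:                          # time limit t < T_max
        return "timeout"
    if d_t < D_S:                           # goal reached iff d_t < d_s
        return "success"
    return "running"
```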
C. POMDP Formulation

A navigation task T_e can be formulated as a POMDP. The reward function consists of three terms: a terminal reward with coefficient b_f for reaching the goal, an auxiliary progress reward with coefficient b_p proportional to the decrease in distance to the goal (d_{t-1} − d_t), and a collision penalty with coefficient b_c. The first term rewards the goal being reached by the agent, which matches the objective of the navigation task in Definition 1. The second and third terms are auxiliary rewards that facilitate training by encouraging local progress and penalizing collisions.

We perform a grid search over different values of the coefficients in this reward function. The results show that the auxiliary reward term (d_{t-1} − d_t) is necessary for successful training, and that a much smaller coefficient b_p relative to b_f leads to better asymptotic performance. The agent can learn without the collision penalty (b_c = 0), but a moderate value of b_c improves the asymptotic performance and speeds up training. For all the experiments in this paper, we fix the coefficients to b_f = 20, b_p = 1, and b_c = 4.

In our experiments, the RL algorithm solves a multi-task RL problem in which tasks are randomly sampled from a task distribution T_e ∼ p(T_e). Here, the task distribution p(T_e) := U({e_i}_{i=1}^N) is a uniform distribution over a set of N navigation environments {e_i}_{i=1}^N. The overall objective of this multi-task RL problem is to find an optimal policy

π* = argmax_π E_{T_e ∼ p(T_e), τ ∼ π} [ Σ_{t=0}^∞ γ^t R_e(s_t, a_t) ].
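As a concrete reading of this reward and task distribution, the sketch below combines a terminal goal reward, the progress term (d_{t-1} − d_t), and a collision penalty using the reported coefficients b_f = 20, b_p = 1, b_c = 4, and samples a training environment uniformly. The exact functional form (e.g., the use of indicator terms) and the argument names are our assumptions, not a specification taken from the paper.

```python
import random

B_F, B_P, B_C = 20.0, 1.0, 4.0  # reward coefficients reported in the paper

def reward(d_prev, d_curr, reached_goal, collided):
    """Sketch of the three-term reward: goal bonus, progress shaping, collision penalty.
    The indicator-based form is an assumption consistent with the described terms."""
    r = B_F * float(reached_goal)   # terminal term matching the goal-reaching objective
    r += B_P * (d_prev - d_curr)    # auxiliary reward encouraging local progress
    r -= B_C * float(collided)      # auxiliary penalty discouraging collisions
    return r

def sample_task(environments):
    """Uniform task distribution p(T_e) = U({e_i}) over N training environments."""
    return random.choice(environments)
```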
TABLE I: (D1) Success rate (%) (↑) of policies trained with different neural network architectures and history lengths. H
is the history length of the memory. Bold font indicates the best success rate for each type of environment.
Methods                  Baseline (model-free)   Lagrangian method   MPC (model-based)   DWA    TEB
Success rate (%) (↑)     65 ± 4                  74 ± 2              70 ± 3              82     70
Survival time (s) (↑)    8.0 ± 1.5               16.2 ± 2.5          55.7 ± 4.9          62.7   26.9
Traversal time (s) (↓)   7.5 ± 0.3               8.6 ± 0.2           24.7 ± 2.0          35.6   26.9

TABLE II: (D2) Success rate (↑), survival time (↑), and traversal time (↓) of policies trained with the Lagrangian method, MPC with a probabilistic transition model, and DWA. The bold font indicates the best number achieved for each type of metric.
To compare against classical navigation systems, which are believed to have better safety, we also add evaluation metrics from a classical navigation stack with the Dynamic Window Approach (DWA) [2] local planner.

The Lagrangian method reduces the gap between training and test environments. When deployed in the training environments, both the baseline MLP and the safe RL method achieve about 80% success rate. However, in the test environments, the Lagrangian method has a better success rate of 74% compared to 65% for the baseline MLP. We hypothesize that the safety constraint applied by the safe RL method acts as a form of regularization and therefore improves generalization to unseen environments.

The Lagrangian method increases the average survival time in failed episodes. As expected, the Lagrangian method increases the average survival time by 8.2s compared to the baseline MLP, at the cost of a 1.1s longer average traversal time. However, such improved safety is still worse than that of the classical navigation systems, given the best survival time of 88.6s achieved by DWA.
C. Model-based RL (D2 and D3)

To explore how model-based approaches help with autonomous navigation tasks, we implement Dyna-style, MPC, and MBPO, and evaluate these methods in static environments. The transition models are represented either by a deterministic NN or by a probabilistic NN that predicts the mean and variance of the next state. During training in static-train-50, the policies are saved when 100k, 500k, and 2000k transition samples have been collected, and are then tested in static-test. The success rates of these policies are reported in Table IV.
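The paper does not spell out its planner, so the following is only a generic sketch of MPC with a learned probabilistic dynamics model: candidate action sequences are sampled (random shooting), rolled out through the model by sampling from its predicted Gaussian, scored with the task reward, and the first action of the best sequence is executed. The horizon, population size, and the dynamics_model / reward_fn interfaces are assumptions.

```python
import numpy as np

def mpc_action(obs, dynamics_model, reward_fn, act_dim=2,
               horizon=10, num_candidates=256):
    """Random-shooting MPC with a probabilistic (Gaussian) transition model."""
    # Sample candidate action sequences, e.g., bounded linear/angular velocities.
    candidates = np.random.uniform(-1.0, 1.0,
                                   size=(num_candidates, horizon, act_dim))
    returns = np.zeros(num_candidates)
    states = np.repeat(obs[None, :], num_candidates, axis=0)
    for t in range(horizon):
        actions = candidates[:, t, :]
        # The probabilistic model predicts a mean and variance of the next state;
        # sampling from it propagates model uncertainty through the rollout.
        mean, var = dynamics_model(states, actions)
        states = mean + np.sqrt(var) * np.random.randn(*mean.shape)
        returns += reward_fn(states, actions)
    best = np.argmax(returns)
    return candidates[best, 0, :]  # execute only the first action, then replan
```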
Model-based methods do not improve sample efficiency. As shown in the second and third columns of Table IV, better success rates of 13% and 58% are achieved by the baseline MLP method when provided with only 100k and 500k transition samples, respectively. In addition, higher success rates at 500k transition samples are observed for probabilistic models compared to their deterministic counterparts, which indicates more efficient learning with probabilistic transition models. Notice that MBPO exploits the model more heavily than the Dyna-style method, which leads to much worse asymptotic performance (about 20% success rate in the end).

Model-based methods with probabilistic dynamics models improve asymptotic performance. In the last column of Table IV, both Dyna-style and MPC with probabilistic dynamics models achieve slightly better success rates of 70%, compared to 65% for the baseline MLP method, when sufficient transition samples (2000k) are given to the learning agent.

The MPC policy performs conservatively when deployed in unseen test environments and shows better safety performance. The safety performance of MPC policies with probabilistic dynamics models is also tested (see Table II). We observe that agents with MPC policies navigate very conservatively, with an average traversal time of 24.7s, about two times longer than the MLP baseline. In the meantime, MPC policies achieve improved safety with the best survival time of 55.7s among the RL-based methods.
Transition samples          100k     500k      2000k
MLP                         13 ± 7   58 ± 2    65 ± 4
Dyna-style deterministic    8 ± 2    30 ± 10   66 ± 5
MPC deterministic           0 ± 0    21 ± 10   62 ± 3
Dyna-style probabilistic    0 ± 0    48 ± 4    70 ± 1
MPC probabilistic           0 ± 0    45 ± 4    70 ± 3
MBPO                        0 ± 0    0 ± 0     21.9 ± 3

TABLE IV: (D3) Success rate (%) (↑) of policies trained with different model-based methods and different numbers of transition samples. The bold font indicates the best success rate for each number of transition samples.

D. Domain Randomization (D4)

To explore how generalization depends on the degree of randomness in the training environments, baseline MLP policies with a history length of one are trained on environment sets with 5, 10, 50, 100, and 250 training environments. The trained policies are tested on the same static-test. To investigate the performance gap between training and test, the policies trained with 50, 100, and 250 environments are also tested on static-train-50, which is part of their training sets. Fig. 4 shows the success rate of policies trained with different numbers of training environments.
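The training loop behind this experiment amounts to resampling a training environment for every episode; the sketch below shows that structure for a set of N benchmark environments. The env constructor, policy interface, and update call are placeholders, not the benchmark's API.

```python
import random

def train_with_domain_randomization(make_env, env_ids, policy, num_episodes=10_000):
    """Each episode runs in an environment drawn uniformly from the training set,
    so the policy never overfits to a single obstacle-course layout."""
    for episode in range(num_episodes):
        env = make_env(random.choice(env_ids))  # e.g., env_ids = list(range(250))
        obs, done = env.reset(), False
        while not done:
            action = policy.act(obs)
            obs, rew, done, info = env.step(action)
            policy.observe(obs, action, rew, done)  # store transition for the learner
        policy.update()                              # off-policy update after each episode
```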
The generalization to unseen environments improves with an increasing number of training environments. As shown in Fig. 4, the performance on the unseen test environments monotonically increases from 43% to 74% as the number of training environments increases from 5 to 250. Moreover, the gap between training and test environments gradually shrinks as more training environments are added, given that the policies are robust enough to maintain a similar performance of about 80% on the training environments.

Fig. 4: (D4) Success rate (%) of policies trained with different numbers of training environments.

                            real-world-1        real-world-2        real-world-3
Method       H   # envs         traversal time (s) (↓) (# successful trials (↑) / total # trials)
MLP          1   50         6.9 (1/3)           10.6 (1/3)          N (0/3)
MLP          1   250        4.6 ± 0.8 (3/3)     6.6 ± 0.6 (3/3)     22.6 ± 0.5 (3/3)
Transformer  4   50         6.1 ± 0.4 (3/3)     6.1 ± 0.1 (2/3)     20.5 ± 2 (2/3)
Lagrangian   1   50         4.4 ± 0.6 (3/3)     7.1 ± 0.1 (2/3)     26.2 (1/3)
MPC          1   50         13.2 ± 0.7 (3/3)    24.8 ± 3.7 (3/3)    N (0/3)
DWA          -   -          16.2 ± 0.7 (3/3)    35.2 ± 8.2 (2/3)    66.9 ± 0.6 (3/3)

TABLE III: Physical experiments. The table shows the traversal time (s) (↓) and the number of successful trials (↑) of 5 RL-based navigation systems and a classical navigation system (DWA) evaluated in three real-world environments. The bold font indicates the best traversal time when all three trials are successful.
E. Physical experiments

To study the consistency of the above observations between simulation and the real world, we deploy one baseline MLP policy, the best policy for each studied desideratum, and one classical navigation system (DWA [2]) in the three real-world environments introduced in Sec. III-D. Each deployment is repeated three times, and the average traversal time and the number of successful trials are reported in Table III.

Even though the best memory-based policy, a transformer architecture with a history length of 4, was only marginally better than the baseline MLP in simulation, in the real world it navigates very smoothly and fails only once each in real-world-2 and real-world-3, while the baseline MLP fails most of the trials in all the environments, including the benchmark-like environment. One possible reason for this is that simulations are typically more predictable than the real world; therefore, it is particularly important to use historical data in the real world to estimate the environment and the current state of the robot. Similarly, the MLP policy trained with 250 environments can successfully navigate in all the environments without any failures, while the baseline MLP trained with 50 environments fails most of the trials. Safe RL improves the chances of success in all the environments and navigates more safely by performing backups and small adjustments of the robot's pose. Similar to the simulation, MPC navigates very conservatively and succeeds in all the trials in real-world-1 and real-world-2, but has much more difficulty generalizing to the large-scale real-world-3.
V. CONCLUSION

In this section, we discuss the conclusions we draw from these benchmark experiments. We organize these conclusions by the desiderata as follows:

(D1) reasoning under uncertainty of partially observed sensory inputs does not obviously benefit from adding memory in simulated static environments and in very random dynamic (dynamic-box) environments, but much more significant improvements were observed in the real world and in more challenging dynamic environments (dynamic-wall).

(D2) safety is improved by both safe RL and model-based MPC methods. However, classical navigation systems still achieve the best safety performance, at the cost of very long traversal times. Whether RL-based navigation systems can achieve safety guarantees similar to those of classical navigation systems, and whether safety can be improved without significantly sacrificing traversal time, are still open questions.

(D3) the ability to learn from limited trial-and-error data is not improved by the evaluated model-based methods. Currently, we observe that model-based RL methods indeed improve sample efficiency, but only when the number of imaginary rollouts from the learned model is large (e.g., ≥ 2000k) and when they are sampled with randomness. We therefore hypothesize that the improvement comes from the robustness brought by learning on more data sampled from the learned model. Hence, this result motivates not only more accurate model learning to reduce the number of imaginary rollouts, but also a theoretical understanding of how the model helps improve the robustness or even the safety of navigation.

(D4) the generalization to diverse and novel environments is improved by increasing the randomness of the training environments. However, a noticeable gap of about 5% between training and test environments is not eliminated even when the number of training environments is increased to 250. This reflects the limitation of simple domain randomization for increasing generalization, which is, however, widely used by the community.

In summary, although the proposed benchmark is not intended to represent every real-world navigation scenario, it serves as a simple yet comprehensive testbed for RL-based navigation methods. We observed that, for every desideratum, no method can achieve a 100% success rate on all training environments, even though we ensured that every environment is indeed individually solvable. This alone indicates that there exists an optimization and generalization challenge when we have a large number of training environments, as in our proposed benchmark.
REFERENCES

[1] S. Quinlan and O. Khatib, “Elastic bands: Connecting path planning and control,” in Proceedings IEEE International Conference on Robotics and Automation. IEEE, 1993, pp. 802–807.
[2] D. Fox, W. Burgard, and S. Thrun, “The dynamic window approach to collision avoidance,” IEEE Robotics & Automation Magazine, vol. 4, no. 1, pp. 23–33, 1997.
[3] X. Xiao, Z. Xu, Z. Wang, Y. Song, G. Warnell, P. Stone, T. Zhang, S. Ravi, G. Wang, H. Karnan et al., “Autonomous ground navigation in highly constrained spaces: Lessons learned from the BARN Challenge at ICRA 2022,” arXiv preprint arXiv:2208.10473, 2022.
[4] X. Xiao, B. Liu, G. Warnell, and P. Stone, “Motion planning and control for mobile robot navigation using machine learning: a survey,” Autonomous Robots, pp. 1–29, 2022.
[5] Y. Chow, O. Nachum, A. Faust, M. Ghavamzadeh, and E. A. Duéñez-Guzmán, “Lyapunov-based safe policy optimization for continuous control,” CoRR, vol. abs/1901.10031, 2019. [Online]. Available: http://arxiv.org/abs/1901.10031
[6] G. Thomas, Y. Luo, and T. Ma, “Safe reinforcement learning by imagining the near future,” 2022.
[7] E. Rodríguez-Seda, D. Stipanovic, and M. Spong, “Lyapunov-based cooperative avoidance control for multiple Lagrangian systems with bounded sensing uncertainties,” in 2011 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), Dec. 2011, pp. 4207–4213.
[8] K. Cobbe, O. Klimov, C. Hesse, T. Kim, and J. Schulman, “Quantifying generalization in reinforcement learning,” in ICML, 2019.
[9] K. Cobbe, C. Hesse, J. Hilton, and J. Schulman, “Leveraging procedural generation to benchmark reinforcement learning,” arXiv preprint arXiv:1912.01588, 2019.
[10] N. Justesen, R. R. Torrado, P. Bontrager, A. Khalifa, J. Togelius, and S. Risi, “Illuminating generalization in deep reinforcement learning through procedural level generation,” arXiv: Learning, 2018.
[11] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 23–30.
[12] R. S. Sutton, “Dyna, an integrated architecture for learning, planning, and reacting,” SIGART Bulletin, vol. 2, no. 4, pp. 160–163, Jul. 1991. [Online]. Available: https://doi.org/10.1145/122344.122377
[13] A. Nagabandi, G. Kahn, R. S. Fearing, and S. Levine, “Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 7559–7566.
[14] M. J. Hausknecht and P. Stone, “Deep recurrent q-learning for partially observable MDPs,” in AAAI Fall Symposia, 2015.
[15] D. Wierstra, A. Förster, J. Peters, and J. Schmidhuber, “Solving deep memory POMDPs with recurrent policy gradients,” in ICANN, 2007.
[16] K. Chua, R. Calandra, R. McAllister, and S. Levine, “Deep reinforcement learning in a handful of trials using probabilistic dynamics models,” Advances in Neural Information Processing Systems, vol. 31, 2018.
[17] H.-T. L. Chiang, A. Faust, M. Fiser, and A. Francis, “Learning navigation behaviors end-to-end with AutoRL,” IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 2007–2014, 2019.
[18] X. Xiao, B. Liu, G. Warnell, J. Fink, and P. Stone, “APPLD: Adaptive planner parameter learning from demonstration,” IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 4541–4547, 2020.
[19] Z. Wang, X. Xiao, B. Liu, G. Warnell, and P. Stone, “APPLI: Adaptive planner parameter learning from interventions,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021.
[20] Z. Wang, X. Xiao, G. Warnell, and P. Stone, “APPLE: Adaptive planner parameter learning from evaluative feedback,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7744–7749, 2021.
[21] Z. Xu, G. Dhamankar, A. Nair, X. Xiao, G. Warnell, B. Liu, Z. Wang, and P. Stone, “APPLR: Adaptive planner parameter learning from reinforcement,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021.
[22] X. Xiao, Z. Wang, Z. Xu, B. Liu, G. Warnell, G. Dhamankar, A. Nair, and P. Stone, “APPL: Adaptive planner parameter learning,” Robotics and Autonomous Systems, vol. 154, p. 104132, 2022.
[23] E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 5026–5033.
[24] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, “The arcade learning environment: An evaluation platform for general agents,” Journal of Artificial Intelligence Research, vol. 47, pp. 253–279, Jun. 2013.
[25] H.-T. L. Chiang, A. Faust, M. Fiser, and A. Francis, “Learning navigation behaviors end-to-end with AutoRL,” IEEE Robotics and Automation Letters, vol. 4, pp. 2007–2014, 2019.
[26] A. Wahid, A. Toshev, M. Fiser, and T.-W. E. Lee, “Long range neural navigation policies for the real world,” 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 82–89, 2019.
[27] N. Koenig and A. Howard, “Design and use paradigms for Gazebo, an open-source multi-robot simulator,” in 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 3. IEEE, 2004, pp. 2149–2154.
[28] Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, L. Fei-Fei, and A. Farhadi, “Target-driven visual navigation in indoor scenes using deep reinforcement learning,” in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 3357–3364.
[29] L. Harries, S. Lee, J. Rzepecki, K. Hofmann, and S. Devlin, “MazeExplorer: A customisable 3D benchmark for assessing generalisation in reinforcement learning,” in 2019 IEEE Conference on Games (CoG). IEEE, 2019, pp. 1–4.
[30] Stanford Artificial Intelligence Laboratory et al., “Robotic operating system.” [Online]. Available: https://www.ros.org
[31] F. Xia, W. B. Shen, C. Li, P. Kasimbeg, M. E. Tchapmi, A. Toshev, R. Martín-Martín, and S. Savarese, “Interactive Gibson benchmark: A benchmark for interactive navigation in cluttered environments,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 713–720, 2020.
[32] S. Fujimoto, H. van Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” 2018.
[33] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International Conference on Machine Learning. PMLR, 2018, pp. 1861–1870.
[34] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.