Zero-Shot Terrain Generalization For Visual Locomotion Policies
Alejandro Escontrela¹,², George Yu¹, Peng Xu¹, Atil Iscen¹, Jie Tan¹
I. INTRODUCTION
Fig. 1. A Laikago robot navigating a variety of complex terrains not encountered during training. (Panels include (c) Office 2, (d) Hilly, (e) Mountainous, and (f) Maze.)

The ability to traverse unstructured terrains makes legged robots an appealing solution to a wide variety of tasks, including disaster relief, last-mile delivery, industrial inspection, and planetary exploration [1], [2]. To deploy robots in these settings successfully, we must design controllers that work well across many different terrains. Due to the diversity of environments that a legged robot can operate in, hand-engineering such a controller presents unique challenges. Deep Reinforcement Learning (DRL) has proven itself capable of automatically acquiring control policies to accomplish a large variety of challenging locomotion tasks. However, many of these approaches learn control policies that succeed in a single type of terrain with limited variations. This limits the robot's ability to generalize to new or unseen environments, which is a crucial feature of a useful locomotion controller.

In this paper, we develop an end-to-end reinforcement learning system that enables legged robots to traverse a large variety of terrains. To facilitate learning generalizable policies, we make two purposeful design decisions for our learning system. First, we formulate the problem as a Multi-Task Partially Observable Markov Decision Problem and show that the robot learns a robust policy that works well across a wide variety of tasks (terrains). To this end, we develop a novel procedural terrain generation method, which can efficiently generate a large variety of terrains for training. Second, we design an end-to-end neural network architecture that can handle both perception and locomotion. We call this parameterization a visual-locomotion policy. While many prior works in the legged robot literature focused on blind walking, which does not involve exteroceptive sensors (e.g., camera, LiDAR), we find that exteroceptive perception is essential for robots to navigate in diverse environments. Our end-to-end visual-locomotion policy takes both exteroceptive (a LiDAR scan) and proprioceptive information of the robot and outputs low-level motor commands. We embed the Policies Modulating Trajectory Generator (PMTG) [3] framework into our policy architecture to generate cyclic and smooth actuation patterns, and to facilitate the learning of robust locomotion policies.

We evaluate our learning system using a high-fidelity physics simulator [4] and visually-realistic indoor scans [5] (Figure 1). We test the learned policy in thirteen realistic simulation environments (five training and eight testing). Our system learns highly generalizable locomotion policies, which demonstrate zero-shot generalization to unseen testing environments. We also show that our visual-locomotion policy's parameterization is key to generalization and yields far better performance than commonly-used reactive policies. This paper's main contributions include an end-to-end visual-locomotion policy parameterization and a complete multi-task learning system, with which a quadruped robot learns a single locomotion policy that can traverse a diverse set of terrains.

¹ Google Brain Robotics, {georgeyu,pengxu,atil,jietan}@google.com
² Georgia Institute of Technology, [email protected]
Work performed while Alejandro was an intern at Google Brain.
II. RELATED WORK

A. Legged Locomotion

Locomotion controllers can be developed using trajectory optimization [6], whole-body control [7], model predictive control [8], and state-machines [9]. While the controllers developed by these techniques can generalize to a certain degree, expertise and manual tuning are often needed to adapt them to different terrains.

In contrast, Deep Reinforcement Learning [10] can automatically learn agile and robust locomotion skills [11], [12], [13], [14]. Prior work in RL has learned policies that are specific to a single environment [15], or generalize to variations of a single type of terrain [16], [17], [18]. Recently, Lee et al. [14] combined various techniques, such as ActuatorNet [13], PMTG [3], curriculum learning, and “learning by cheating” [19], which successfully performed zero-shot transfer from simulation to many challenging terrains in the real world. While our paper's high-level goal is similar to this prior work, our approach incorporates exteroceptive sensors that enable the robot to navigate in cluttered indoor environments where blind walking may have difficulties.
B. Multi-Task Reinforcement Learning

Multi-task reinforcement learning (MTRL) [20] is a promising approach to train generalizable policies that can accomplish a wide variety of tasks. Hessel et al. [21] learned a single policy that achieves state-of-the-art performance on 57 Atari games. Yu et al. [22] evaluated the performance of various RL algorithms on a grasping and manipulation benchmark and demonstrated that a single control policy is capable of completing a variety of complex robotic manipulation tasks. In this paper, we apply MTRL to develop a learning system for locomotion that enables legged robots to navigate in a large variety of environments.
III. METHODS

In this work, we frame legged locomotion as a multi-task reinforcement learning (MTRL) problem and define each task as a type of terrain that the legged robot (agent) must traverse. To learn generalizable locomotion policies, our learning system consists of a procedural terrain generator that can efficiently generate diverse training environments, and an end-to-end visual-locomotion policy architecture that directly maps the robot's exteroceptive and proprioceptive observations to motor commands.

A. Multi-Task Reinforcement Learning Formulation

Given a distribution of tasks M, each task Mi ∈ M is a Partially Observable Markov Decision Process (POMDP). A POMDP is a tuple Mi = ⟨S, O, A, Ti, Ri⟩, where S is the state space, O is the observation space, A is the action space, Ti : S × A × S → R+ is the transition probability function, and Ri : S × A → R is the reward function. During training, the agent is presented with randomly sampled tasks Mi ∈ M (Section III-B). The solution of the multi-task POMDP is a stochastic policy π : O × A → R+ that maximizes the expected accumulated reward over the episode length T:

π* = arg max_π E_{Mi∈M} [ Σ_{t=0}^{T} r(st, at) ]

Our problem is partially observable because of the limited sensors onboard the robot¹. The robot is equipped with a LiDAR sensor to perceive the distances d to the surrounding environment. Proprioceptive information comes from a simulated IMU sensor, which includes measurements of the roll φ, pitch θ, and the angular velocity of the torso βω = (φ̇, θ̇, ψ̇), and from motor encoders that measure the robot's 12 joint angles q. The complete observation at timestep t is

st = [aᵀt−1, ot, sᵀTG, gd,t, gh,t],

where ot = [dᵀt, βωᵀt, qᵀt, φt, θt] are the sensor observations, gd and gh are the distance and relative heading to the target, at−1 is the action at the last timestep, and sTG are the parameters of the trajectory generator (Section III-C). Unlike some prior work in MTRL, where the task ID is part of the observation [22], [23], we purposefully choose not to leverage such information, because identifying tasks automatically in the real world is challenging. Instead, we would like to train a policy that can rely on its own perception input and demonstrates zero-shot generalization to new tasks, without knowing the task ID explicitly. In Section IV, we demonstrate that perception is crucial in learning policies which generalize well to new tasks. The output action at of the policy specifies the desired joint angles, which are tracked by PD controllers on the simulated robot.

¹ Although we use a simulated robot due to limited access to the physical robot during COVID-19, we strive to make the simulation, including the sensor measurement, as faithful as possible to the real robot.

We employ a simple reward function, which encourages the agent to navigate to a target location g = (xg, yg, zg) (the red ball in Figure 1):

rt = (gd,t−1 − gd,t) / ∆t,

where gd,t is the Euclidean distance from the robot to the target location at timestep t, and ∆t is the timestep duration. This reward can be interpreted as the speed at which the robot is moving towards the target location. Once the robot's center of mass is within a threshold distance of the target location, the task is complete.
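To make the observation and reward definitions concrete, the following Python sketch shows one way they could be assembled. It is an illustrative sketch only, not the authors' implementation; the argument names and shapes are assumptions, and the raw sensor values are taken as already-available NumPy arrays.

import numpy as np

# Illustrative sketch of s_t = [a_{t-1}, o_t, s_TG, g_d, g_h] and of the
# navigation reward described above. Argument names are hypothetical.
def build_observation(prev_action, lidar_d, torso_ang_vel, joint_q,
                      roll, pitch, tg_state, goal_dist, goal_heading):
    # o_t = [d_t, omega_t, q_t, phi_t, theta_t]
    o_t = np.concatenate([lidar_d, torso_ang_vel, joint_q, [roll, pitch]])
    return np.concatenate([prev_action, o_t, tg_state, [goal_dist, goal_heading]])

def navigation_reward(goal_dist_prev, goal_dist, dt):
    # Speed of progress toward the target: r_t = (g_{d,t-1} - g_{d,t}) / dt
    return (goal_dist_prev - goal_dist) / dt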
B. Terrain Parameterization and Procedural Task Generation

We develop a procedural terrain generator to generate diverse and challenging terrains that provide the robot with a large quantity of rich training data. The environment is composed of m × n pillars, each pillar having cross-sectional dimensions l and w, and height h. We denote H = {hi,j} ∈ R^{m×n} as the height field for all the pillars. During training, we select a task Mi and adjust each pillar's height to reflect the chosen task. Each task is a set of randomly generated terrains that belong to the same type (e.g., flat, stairs). Each type of terrain is described by a parameter vector φi, which provides the lower and upper bounds for the random sampling. The terrain generator constructs the heightfield H from the given parameter vector φ. For example, the parameter vector φ for the rugged terrain task (Fig. 3b) includes the minimum and maximum values of the heightfield; for the stairs task, the parameter vector defines the height and length of each step. Table I summarizes the parameters and generation rules for selected terrain types. With this simple parameterization, we can generate over ten different types of terrains that a robot may encounter in the real world. Our procedural terrain generation algorithm provides a rich set of training data essential for generalizable policies to emerge.

TABLE I
TERRAIN PARAMETERIZATION AND GENERATION FOR SELECTED EXAMPLES.

Terrain    | Parameters φ                                                              | Terrain generation
Flat       | No parameters                                                             | H = 0
Rugged     | Min terrain height hmin; max terrain height hmax; Gaussian kernel std σ  | H ∼ Um,n(hmin, hmax); apply Gaussian smoothing with σ on H
Holes      | Number of holes n; hole depth h                                           | H = 0; sample n index pairs (i, j); H(i, j) = h
Obstacles  | Number of obstacles n; obstacle height h                                  | H = 0; sample n index pairs (i, j); H(i, j) = h
Stairs     | Stair step height h; stair step length l                                  | H(0, :) = 0; set column lengths to l; H(i+1, :) = H(i, :) + h

Fig. 3. (a) Obstacles (b) Rugged (c) Stairs (d) Cliff.
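As a concrete illustration of the generation rules in Table I, the sketch below maps a terrain type and parameter vector φ to a height field H. It is a simplified sketch, not the authors' released generator; the dictionary keys, default grid size, and example parameter values are assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter

def generate_heightfield(terrain_type, phi, m=50, n=50, rng=np.random):
    H = np.zeros((m, n))                                  # Flat: H = 0
    if terrain_type == "rugged":
        H = rng.uniform(phi["h_min"], phi["h_max"], size=(m, n))
        H = gaussian_filter(H, sigma=phi["sigma"])        # Gaussian smoothing with std sigma
    elif terrain_type in ("holes", "obstacles"):
        i = rng.randint(0, m, size=phi["n"])              # sample n index pairs (i, j)
        j = rng.randint(0, n, size=phi["n"])
        H[i, j] = phi["h"]                                # negative h for holes, positive for obstacles
    elif terrain_type == "stairs":
        for i in range(1, m):                             # pillar length assumed set to the step length l
            H[i, :] = H[i - 1, :] + phi["h"]              # H(i+1, :) = H(i, :) + h
    return H

# During training a task M_i is sampled and the pillar heights are set to H, e.g.:
# H = generate_heightfield("rugged", {"h_min": -0.05, "h_max": 0.05, "sigma": 2.0})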
C. Visual-Locomotion Policy Architecture

Exteroceptive perception plays a crucial role when legged robots need to navigate different terrains and environments with obstacles and humans [24], [25]. As such, we aim to incorporate perception into our policy architecture such that information from the robot's surroundings can modulate locomotion. Additionally, the policy's low-level actuation commands need to be smooth and realizable on the physical robot. To this end, we seek to restrict the search space of possible gaits to those that are cyclic and smooth, while remaining expressive enough that perception can modulate locomotion sufficiently to work on different terrains.

Fig. 2. Overview of the visual-locomotion policy architecture. (a) Visual-locomotion policy architecture. (b) The locomotion component using PMTG [3] for smooth and cyclic actuation patterns.

In our visual-locomotion policy architecture (Fig. 2), we use two separate neural network encoders to process the proprioceptive and exteroceptive inputs. The upper branch of Fig. 2a processes the LiDAR input, while the lower branch handles the proprioceptive information. The learned lower-dimensional features are concatenated with the target information before being passed to the policy's locomotion component. We chose to use Policies Modulating Trajectory Generators (PMTG) [3] as our locomotion component architecture (Fig. 2b). PMTG encourages the policy to learn smooth and cyclic locomotion behaviors. PMTG outputs a desired trajectory for the legs that is modulated by a learned policy πθ(·): the policy observes the state of the trajectory generator (TG), stg, and the robot's observation st, then outputs the parameters of the TG, ptg (including gait frequency, swing height, and stride length), and a residual action term µfb. The final output action of our visual-locomotion policy is the combination of the trajectory generator output and the residual action: at = µtg + µfb. Please refer to the original paper [3] for more details. As detailed in [16], our visual-locomotion policy architecture achieves a separation of concerns between basic locomotion skills and terrain perception, which enables the robot to adapt its smooth locomotion behaviors according to its surrounding environments.
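The sketch below illustrates this architecture with Keras layers: two small encoders, concatenation with the target information, and a head that outputs the TG parameters ptg and the residual µfb. It is a schematic sketch rather than the authors' network; the input and output sizes are placeholders, while the encoder widths (32, 16, 4) and policy trunk (256, 128) follow the training details given in Section IV-A.

import tensorflow as tf
from tensorflow.keras import layers

def mlp(x, sizes):
    for s in sizes:
        x = layers.Dense(s, activation="relu")(x)
    return x

lidar_in = tf.keras.Input(shape=(1024,), name="lidar_scan")       # flattened LiDAR vector d (size is a placeholder)
proprio_in = tf.keras.Input(shape=(17,), name="proprioception")   # angular velocity, joint angles, roll, pitch
target_in = tf.keras.Input(shape=(2,), name="target")             # g_d, g_h

lidar_feat = mlp(lidar_in, (32, 16, 4))       # exteroceptive encoder (upper branch)
proprio_feat = mlp(proprio_in, (32, 16, 4))   # proprioceptive encoder (lower branch)
features = layers.Concatenate()([lidar_feat, proprio_feat, target_in])

trunk = mlp(features, (256, 128))                   # locomotion component
p_tg = layers.Dense(4, name="tg_params")(trunk)     # gait frequency, swing height, stride length, ... (size assumed)
mu_fb = layers.Dense(12, name="residual")(trunk)    # residual joint-angle term mu_fb

policy_net = tf.keras.Model([lidar_in, proprio_in, target_in], [p_tg, mu_fb])
# The final action a_t = mu_tg + mu_fb, where mu_tg is produced by the
# trajectory generator driven by p_tg.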
IV. EXPERIMENTAL RESULTS

We design experiments to validate the proposed system's ability to learn a visual locomotion policy that generalizes well to terrains not encountered during training. In particular, we would like to answer the following two questions:
• Can our system learn visual locomotion policies that demonstrate zero-shot generalization to new terrains?
• Can our policy architecture effectively use the LiDAR input and the PMTG parameterization to improve the generalization performance over unseen terrains?

A. Experiment Details

To answer the above questions, we evaluate our system using a simulated Unitree Laikago quadruped robot [26],
which weighs approximately 22kg and is actuated by 12 motors. We simulate the onboard Velodyne VLP-16 (Puck) LiDAR sensor, which provides the perception of the surrounding environment (see Figure 2b). The LiDAR measures the distance from the surrounding obstacles and terrain to the robot. This sensor supports 16 channels, a 360° horizontal field of view, and a 30° vertical field of view. We add Gaussian noise to the ground-truth distance readings in simulation to mimic the real-world noise model. The 3D LiDAR scan matrix D is normalized to the range [0, 1] and flattened to a vector d.
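A minimal sketch of this preprocessing (noise injection, normalization, and flattening) is shown below; the noise level and maximum range are placeholder values, not the ones used in the paper.

import numpy as np

def preprocess_lidar(D, max_range=100.0, noise_std=0.02, rng=np.random):
    D_noisy = D + rng.normal(0.0, noise_std, size=D.shape)   # Gaussian sensor noise
    D_norm = np.clip(D_noisy / max_range, 0.0, 1.0)          # normalize to [0, 1]
    return D_norm.flatten()                                  # LiDAR vector d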
Our policy computes joint target positions (at), which are converted to target joint torques by a PD controller running at 1kHz. Rigid-body dynamics and contacts are also simulated at 1kHz. In other words, the position and velocity (provided by PyBullet [4]) and the desired torque (provided by the PD controller) are sent to the actuator model every 1ms. The actuator model then computes 10 internal 100µs steps and provides the effective output torque of the actuator, which is then used by PyBullet to compute joint accelerations. The simulation environment is configured to use an action repeat of 10 steps, which means that our policy computes a new action at and receives a state st every 10ms (100Hz).
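The timing scheme above can be summarized by the following sketch of one policy step; policy, pd_controller, actuator_model, and sim are stand-in objects used for illustration, not an actual PyBullet API.

SIM_DT = 0.001       # physics and PD control at 1kHz
ACTION_REPEAT = 10   # the policy acts every 10ms (100Hz)

def policy_step(policy, pd_controller, actuator_model, sim, obs):
    action = policy(obs)                                    # desired joint angles a_t
    for _ in range(ACTION_REPEAT):                          # 10 simulation steps per action
        q, qdot = sim.joint_states()
        desired_torque = pd_controller(action, q, qdot)
        torque = actuator_model(desired_torque, q, qdot)    # 10 internal 100us substeps
        sim.apply_torques(torque)
        sim.step(SIM_DT)
    return sim.observation()                                # next state, observed at 100Hz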
We train the visual-locomotion policy using the MTRL formulation with simulated environments randomly generated by our procedural task generation method (Section III-B). We use a distributed version of Proximal Policy Optimization (PPO) [27] in TF-Agents [28] for training. We use a 2-layer fully-connected neural network of dimensions (512, 256) to parameterize the value function and another network of dimensions (256, 128) to parameterize the policy. The policy outputs the parameters of a multivariate Gaussian distribution, from which we sample actions during training. During evaluation we use a greedy policy, executing the mean of the multivariate Gaussian distribution provided by the policy network. The exteroceptive and proprioceptive input encoders both have dimensions (32, 16, 4). We use the ReLU activation function for all layers in both networks [29]. Advantages are estimated using Generalized Advantage Estimation [30].
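A minimal (non-distributed) sketch of how these networks could be configured with TF-Agents is given below, assuming a tf_env TFEnvironment that wraps the simulation; the optimizer and learning rate are placeholders, not values from the paper.

import tensorflow as tf
from tf_agents.agents.ppo import ppo_agent
from tf_agents.networks import actor_distribution_network, value_network

actor_net = actor_distribution_network.ActorDistributionNetwork(
    tf_env.observation_spec(),
    tf_env.action_spec(),
    fc_layer_params=(256, 128),       # policy network dimensions from the paper
    activation_fn=tf.nn.relu)
value_net = value_network.ValueNetwork(
    tf_env.observation_spec(),
    fc_layer_params=(512, 256),       # value network dimensions from the paper
    activation_fn=tf.nn.relu)

agent = ppo_agent.PPOAgent(
    tf_env.time_step_spec(),
    tf_env.action_spec(),
    optimizer=tf.keras.optimizers.Adam(3e-4),   # placeholder learning rate
    actor_net=actor_net,
    value_net=value_net,
    use_gae=True)                               # Generalized Advantage Estimation
agent.initialize()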
We then evaluate the trained policies on a suite of testing environments not encountered during training. Figure 1 illustrates a subset of these testing environments. These high-fidelity simulated environments are created in the PyBullet physics engine [4] with Gibson scenes [5]. A policy's ability to successfully navigate across a given terrain is measured by the task completion rate, tcr, which captures how close the agent gets to the target relative to its starting position:

tcr = 1 − gd,T / gd,0,

where gd,T is the final Euclidean distance between the robot and the target when the robot falls or completes the task, and gd,0 is the distance at the beginning of the episode. A task completion rate of 1 indicates successful navigation to the target, whereas a tcr close to zero means that the robot cannot navigate across the terrain.
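A one-line helper makes the metric explicit; for example, a robot that starts 10 m from the target and ends an episode 2 m from it achieves tcr = 1 − 2/10 = 0.8.

def task_completion_rate(initial_goal_dist, final_goal_dist):
    # tcr = 1 - g_{d,T} / g_{d,0}
    return 1.0 - final_goal_dist / initial_goal_dist

print(task_completion_rate(10.0, 2.0))   # 0.8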
TABLE II
GENERALIZATION PERFORMANCE OF OUR VISUAL-LOCOMOTION POLICY.

TABLE III
COMPARISON OF OUR PROPOSED METHOD TO OTHER POLICIES DEPLOYED IN A MTRL TRAINING REGIME. THE PERFORMANCE DECREASES WHEN THE POLICY DOES NOT USE A PMTG PARAMETERIZATION, WHEN THE POLICY IS NOT PROVIDED EXTEROCEPTIVE INPUTS FROM THE LIDAR, AND WHEN MULTI-TASK TRAINING IS PERFORMED IN A SEQUENTIAL MANNER.

B. The Impact of MTRL on Generalization

Table II shows the generalization performance of our visual-locomotion policy trained on different types of terrains (rows) and tested in unseen environments (columns), including a maze (Maze), a steep and rugged mountain (Mountain), two indoor scenarios (Office 1 and Office 2), an office space with moving humans (Dynamic Env), a forest scene with rugged terrain and obstacles (Forest), a winding path with a cliff on both sides (Cliff), and a randomly-generated continuous mesh (Continuous). Policies trained on a single type of terrain achieve a low task completion rate in the testing environments due to a lack of diverse training data. In contrast, our approach achieves much higher generalization performance. For instance, our method on average achieves a task completion rate of 67% on the mountain task, while policies trained on a single type of terrain only achieve 28% at best (see Figure 4 for a snapshot of our policy navigating up the rugged mountain trail). These results indicate that our MTRL formulation with procedural task generation, combined with the visual-locomotion policy architecture, results in superior generalization performance. The policy learned with our system can be successfully deployed in new, unseen environments.

Fig. 4. Snapshot of a Laikago robot navigating through mountainous terrain not encountered during training. Please refer to the supplementary video for more examples of the agent navigating challenging terrains.

C. Ablation Studies

We perform three ablation studies to understand the importance of each design decision in our system. Table III summarizes their impacts on the resulting generalization performance of the policy.

a) PMTG: We replace the locomotion component of the visual-locomotion policy with a reactive policy that does not have a trajectory generator. Our PMTG-parameterized visual-locomotion policy performs 28%-218% better than a pure reactive locomotion component. We find that PMTG produces smoother actions and leads to improved zero-shot generalization to new terrains.

b) Exteroceptive input: We remove the LiDAR input from the visual-locomotion policy. Observing Table III, it is clear that the exteroceptive information plays a critical role in learning generalizable locomotion policies that can adapt to a wide variety of terrains. This finding agrees with results from the field of experimental psychology, which establish the importance of exteroceptive observations in guiding foot placement when navigating over complex terrains [25], [24].

Figure 5 visualizes the trajectory produced by our visual
Fig. 5. Visualization of trajectory generated by our method in an environment with many obstacles. Foot Z positions for the left hind, right hind, left forward, and right forward feet are shown.

Fig. 6. Visualization of trajectory generated by our method in a rugged terrain. Foot Z positions for the left hind, right hind, left forward, and right forward feet are shown. The rugged terrain requires that the robot carefully place its feet to maintain balance.