CS391R Presentation Dexterous

This document summarizes a presentation on learning dexterous in-hand manipulation. It discusses (1) the motivation for focusing on dexterity, (2) the difficulties of dexterous manipulation with a 24-degree-of-freedom robot hand, (3) related work on dexterous and in-hand manipulation using various strategies, and (4) the proposed approach: training a policy with reinforcement learning in a simulated environment with domain randomization and then transferring it to a physical robot.

Learning Dexterous In-Hand Manipulation

Presenter: Franke Tang

09-27-2022

CS391R: Robot Learning (Fall 2022) 1


Motivation
Why focus on dexterity?
● Human hands are able to solve a huge array of tasks
● The world is built around human hands
● OpenAI’s goal is to build a general-purpose robot
● Start by interacting with the environment and utilizing dexterous manipulation

CS391R: Robot Learning (Fall 2022) 2


Difficulties with Dexterous Manipulation
OpenAI uses the Shadow Dexterous Hand
- 24 degrees of freedom (a typical robot arm has 7)
- 20 actuators

Real-world hardware
- Hard to simulate accurately

CS391R: Robot Learning (Fall 2022) 3


Related Works
Dexterous Manipulation
● An active area of research for decades, with many strategies:
● rolling, sliding, finger gaiting, pushing, etc.
● However, these approaches required planning and exact models of the hand and object

Dexterous In-Hand Manipulation


● Promising in-hand manipulation results in simulation, but they do not transfer to real-world robots
● Training on physical robots is slow, and learning is limited by the smaller amount of training data

Simulation to Real Transfer


● Domain adaptation methods
● Domain randomization, which makes policies more adaptive
● Adversarial training, which yields more robust policies and can help with transfer
CS391R: Robot Learning (Fall 2022) 4
Proposed Approach
Goal: Using a humanoid hand, reorient a block to a desired orientation

Utilize Reinforcement Learning to train a policy


- Requires a lot of data
- Infeasible with physical robots (very hard to scale)

Instead, train the policy in simulation first, then transfer it to the physical robot (sketched below)
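
As a rough illustration, here is a minimal, runnable Python sketch of that workflow. SimulatedHand and Policy are hypothetical stubs for illustration only; the actual system trains an LSTM policy with PPO across thousands of CPU cores.

```python
# A minimal, runnable sketch of the train-in-simulation-then-transfer idea.
# SimulatedHand and Policy are hypothetical stubs, not the paper's system.
class SimulatedHand:
    """Stands in for the MuJoCo-based simulated hand environment."""
    def reset(self):
        return [0.0]                     # toy observation
    def step(self, action):
        return [action], 1.0, True       # obs, reward, episode done

class Policy:
    def act(self, obs):
        return 0.5                       # toy action
    def update(self, rollout):
        pass                             # stand-in for a PPO gradient step

env, policy = SimulatedHand(), Policy()
for episode in range(3):                 # data collection is cheap in simulation
    obs, done, rollout = env.reset(), False, []
    while not done:
        action = policy.act(obs)
        obs, reward, done = env.step(action)
        rollout.append((obs, action, reward))
    policy.update(rollout)
# The trained policy is then run zero-shot on the physical hand.
```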

CS391R: Robot Learning (Fall 2022) 5


Simulated vs Real
The right image is the simulated environment using the MuJoCo physics engine.

The left image is the physical-world environment. The hand is in the center of the “cage”. The red circles are the RGB cameras used for position and object pose estimation.

The simulated environment is modeled after the physical environment to mimic the experiences the robot should have.

CS391R: Robot Learning (Fall 2022) 6


Reality Gap
A simulated model can never be an exact replica of the physical system.
In general, policies trained in simulation perform poorly in the real world; this is known as the Reality Gap.

Domain Randomization:
- randomize certain aspects of the simulated environment
- train policies over a wide range of environments
- avoid overfitting to one specific environment
- Key things randomized (sketched below):
  - physics randomizations
  - visual randomizations (Unity-rendered images)
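
A minimal, runnable Python sketch of per-episode physics randomization is below; the parameter names and ranges are illustrative assumptions, not the paper's actual values.

```python
import random

# Sample a fresh set of physics parameters for each training episode, so the
# policy never trains against a single fixed environment.
def sample_randomized_physics():
    return {
        "object_mass_scale":   random.uniform(0.5, 1.5),   # scale nominal mass
        "friction_scale":      random.uniform(0.7, 1.3),   # scale surface friction
        "actuator_gain_scale": random.uniform(0.75, 1.5),  # scale motor strength
        "action_delay_steps":  random.randint(0, 3),       # simulate control latency
        "obs_noise_std":       random.uniform(0.0, 0.02),  # observation noise level
    }

for episode in range(3):
    params = sample_randomized_physics()
    print(f"episode {episode}: {params}")
    # env.reset(physics=params)  # hypothetical simulator hook
```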

CS391R: Robot Learning (Fall 2022) 7


Experimental Setup

CS391R: Robot Learning (Fall 2022) 8


The control policy
Policy represented as RNN with memory
● LSTM
● Trained with PPO

Actions and Rewards


● r_t = d_t - d_{t+1}, where d_t is the rotation angle between the current and goal orientations
● +5 when the goal is achieved
● -20 for dropping the object (see the reward sketch below)
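
For concreteness, a minimal Python sketch of this reward is below; the success-threshold value is an illustrative assumption, not the paper's.

```python
# Reward shaped by the decrease in rotation distance to the goal, plus a
# bonus for reaching the goal and a penalty for dropping the object.
def reward(d_t, d_t_plus_1, dropped, success_threshold=0.1):
    r = d_t - d_t_plus_1            # positive when rotating toward the goal
    if d_t_plus_1 < success_threshold:
        r += 5.0                    # goal orientation reached
    if dropped:
        r -= 20.0                   # object fell out of the hand
    return r

print(reward(d_t=0.8, d_t_plus_1=0.6, dropped=False))    # 0.2 (progress)
print(reward(d_t=0.15, d_t_plus_1=0.05, dropped=False))  # 5.1 (goal reached)
print(reward(d_t=0.3, d_t_plus_1=0.3, dropped=True))     # -20.0 (drop)
```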

Internal framework: Rapid


● Used in OpenAI Five
● Train policy and vision models
● Policy is trained on states (not images)
● 8 GPUs were used; the worker pool contained 384 machines, each with 16 CPU cores (6,144 CPU cores in total)
CS391R: Robot Learning (Fall 2022) 9
Vision Model
3 RGB cameras at different angles
- predicts the object's position and orientation
- the pose estimator's predictions are fed to the policy (a sketch follows below)

Training
- trained until 1 million states had been processed
- used 2 GPUs for rendering and 1 GPU for training
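
For concreteness, below is a minimal PyTorch sketch of a pose estimator in this spirit: three RGB views in, position plus orientation quaternion out. The architecture, layer sizes, and shared-backbone design are illustrative assumptions, not the paper's actual network.

```python
import torch
import torch.nn as nn

class PoseEstimator(nn.Module):
    def __init__(self):
        super().__init__()
        # One convolutional backbone shared across all three camera views.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64 * 3, 7)  # fuse 3 views -> (x, y, z, qw, qx, qy, qz)

    def forward(self, views):  # views: (batch, 3 cameras, 3 channels, H, W)
        feats = [self.backbone(views[:, i]) for i in range(3)]
        out = self.head(torch.cat(feats, dim=1))
        pos, quat = out[:, :3], out[:, 3:]
        # Normalize so the orientation output is a valid unit quaternion.
        return pos, quat / quat.norm(dim=1, keepdim=True)

pos, quat = PoseEstimator()(torch.rand(2, 3, 3, 64, 64))
print(pos.shape, quat.shape)  # torch.Size([2, 3]) torch.Size([2, 4])
```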

CS391R: Robot Learning (Fall 2022) 10


Qualitative Results
The policy naturally exhibits many grasps that humans use, and it discovered many strategies for dexterous in-hand manipulation on its own.

For precision grasps, the policy tends to use the little finger more than the index and middle fingers, likely because the Shadow Hand's little finger has an extra degree of freedom.

CS391R: Robot Learning (Fall 2022) 11


Quantitative Results
Domain randomization is key to the policy performing well

CS391R: Robot Learning (Fall 2022) 12


Quantitative Results cont.
Using memory achieves better performance in randomized simulations

CS391R: Robot Learning (Fall 2022) 13


Limitations
Transfer from simulation to the physical robot is still limited.

Failures still occur:


- Most common: dropping the object when rotating the wrist pitch joint down
- Dropping the object near the beginning of the trial
- Getting stuck because the edge of the object gets caught in a screw hole

Hardware is very expensive.

Multiple GPUs and machines are required, and the Shadow Hand itself is extremely costly.

CS391R: Robot Learning (Fall 2022) 14


Future Work / Extended Readings
Solving Rubik's Cube with a Robot Hand. OpenAI et al. (2019)
- Same people, added on the challenge of solving a Rubik’s Cube
- Utilizes Automatic Domain Randomization

A System for General In-Hand Object Re-Orientation. Tao Chen et al. (2021)
- Best Paper Award for Conference on Robot Learning 2021
- 2000 distinct objects, and pick up objects with hand facing downwards

Dota 2 with Large Scale Deep Reinforcement Learning. OpenAI et al. (2019)
- Learn more about Rapid
- See how a complex game was mastered by an AI

CS391R: Robot Learning (Fall 2022) 15


Summary
This paper trains a robot to manipulate objects with a five-fingered robotic hand. Due to the complexity of the hardware, prior works were unable to both train a policy for such a hand and deploy it in the physical world. This paper utilized reinforcement learning in simulation to train the policy. Domain randomization allowed the authors to generalize the policy and transfer it to the physical world. Using Rapid as the internal training framework for both the policy and the vision model, combining the two allowed the robot to physically manipulate objects with relative success.

CS391R: Robot Learning (Fall 2022) 16
