Solving Rubik’s Cube With A Robot Hand
We’ve trained a pair of neural networks to solve the Rubik’s Cube with a human-like
robot hand. The neural networks are trained entirely in simulation, using the same
reinforcement learning code as OpenAI Five paired with a new technique called
Automatic Domain Randomization (ADR). The system can handle situations it
never saw during training, such as being prodded by a stuffed giraffe. This shows
that reinforcement learning isn’t just a tool for virtual tasks, but can solve physical-
world problems requiring unprecedented dexterity.
Human hands let us solve a wide variety of tasks. For the past 60 years of robotics, hard
tasks that humans accomplish with their hands have required designing a custom robot
for each task. As an alternative, people have spent many decades trying to use
general-purpose robotic hardware, but with limited success due to its high number of
degrees of freedom. The hardware we use here is not new: the robot hand we use has
been around for the last 15 years. What is new is the software approach.
Since May 2017, we’ve been trying to train a human-like robotic hand to solve the
Rubik’s Cube. We set this goal because we believe that successfully training such a
robotic hand to do complex manipulation tasks lays the foundation for general-purpose
robots. We solved the Rubik’s Cube in simulation in July 2017. But as of July 2018, we
could only manipulate a block on the robot. Now, we’ve reached our initial goal.
A full solve of the Rubik’s Cube. This video plays in real time and was not edited in any way.
Solving a Rubik’s Cube one-handed is a challenging task even for humans, and it takes
children several years to gain the dexterity required to master it. Our robot still hasn’t
perfected its technique though, as it solves the Rubik’s Cube 60% of the time (and only
20% of the time for a maximally difficult scramble).
Our approach
We train neural networks to solve the Rubik’s Cube in simulation using reinforcement
learning and Kociemba’s algorithm for picking the solution steps. [1] Domain
randomization enables networks trained solely in simulation to transfer to a real robot.
Domain randomization exposes the neural network to many different variants of the same problem, in this
case solving a Rubik’s Cube.
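To make the split of responsibilities concrete: the solver produces a symbolic move sequence, and the policy’s job is purely physical execution. A minimal sketch of that solver layer, assuming the open-source `kociemba` Python package (named here only as an illustration; the scramble string is an example state in the standard URFDLB facelet encoding):

```python
# A minimal sketch of the solver layer, using the open-source
# `kociemba` package (pip install kociemba).
import kociemba

# Example scrambled state in the standard URFDLB facelet encoding.
scramble = "DRLUUBFBRBLURRLRUBLRDDFDLFUFUFFDBRDUBRUFLLFDDBFLUBLRBD"

# Kociemba's two-phase algorithm returns a space-separated sequence
# of face rotations such as "D2 R' D' F2 B D R2 D2 R'".
solution = kociemba.solve(scramble)

# Each move becomes a subgoal for the policy: one 90- or 180-degree
# rotation of a single face, plus the cube flips needed to bring that
# face under the fingertips.
for move in solution.split():
    print(move)
```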
The biggest challenge we faced was to create environments in simulation diverse enough
to capture the physics of the real world. Factors like friction, elasticity, and dynamics are
incredibly difficult to measure and model for objects as complex as Rubik’s Cubes or
robotic hands, and we found that domain randomization alone is not enough.
ADR starts with a single, nonrandomized environment, in which a neural network learns
to solve the Rubik’s Cube. As the neural network gets better at the task and reaches a
performance threshold, the amount of domain randomization is increased automatically.
This makes the task harder, since the neural network must now learn to generalize to
more randomized environments. The network keeps learning until it again exceeds the
performance threshold, at which point more randomization kicks in, and the process
repeats. [2]
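In pseudocode form, ADR amounts to a loop that trains on environments sampled from per-parameter ranges and widens a range whenever the policy performs well at its boundary. The following is a minimal sketch under that reading; `train_step` and `evaluate_policy` are hypothetical stubs, and the threshold, step size, and parameter values are illustrative, not the ones used in training:

```python
# A minimal sketch of the ADR loop; stubs and constants are illustrative.
import random

def train_step(env_params):
    """Placeholder for one reinforcement learning update in the sampled environment."""

def evaluate_policy(env_params):
    """Placeholder for measuring the policy's success rate; stubbed with noise."""
    return random.random()

THRESHOLD = 0.8  # success rate required before a range is widened
STEP = 0.02      # amount by which a boundary is pushed outward

# Each physics parameter starts as a single calibrated value,
# i.e. a zero-width randomization range.
ranges = {"cube_size_cm": [5.7, 5.7], "finger_friction": [1.0, 1.0]}

for _ in range(10_000):
    # Sample one environment variant from the current ranges and train on it.
    params = {k: random.uniform(lo, hi) for k, (lo, hi) in ranges.items()}
    train_step(params)

    # Pin one parameter to a boundary of its range and evaluate there;
    # if the policy already clears the threshold, push that boundary
    # outward, making the distribution (and the task) harder.
    name = random.choice(list(ranges))
    lo, hi = ranges[name]
    if evaluate_policy({**params, name: hi}) >= THRESHOLD:
        ranges[name][1] = hi + STEP
    if evaluate_policy({**params, name: lo}) >= THRESHOLD:
        ranges[name][0] = lo - STEP
```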
One of the parameters we randomize is the size of the Rubik’s Cube. ADR begins with a
fixed cube size and gradually increases the randomization range as training progresses.
We apply the same technique to all other parameters, such as the mass of the cube, the
friction of the robot fingers, and the visual surface materials of the hand. The neural
network thus has to learn to solve the Rubik’s Cube under all of these increasingly
difficult conditions.
Number of successes on the real robot for ADR versus manual domain randomization, plotted
against ADR entropy.
We compared ADR to manual domain randomization on the block flipping task, where
we already had a strong baseline. In the beginning, ADR performs worse in terms of the
number of successes on the real robot. But as ADR increases the entropy, which is a
measure of the complexity of the environment, the transfer performance eventually
doubles over the baseline, without any human tuning.
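Entropy here quantifies how spread out the randomization distribution is. One consistent way to compute it, assuming each parameter is drawn uniformly from its range, is the mean log range width in nats per dimension, as in this sketch (the ranges shown are illustrative):

```python
# A sketch of one natural entropy measure for the randomization
# distribution: if each parameter is drawn uniformly from [lo, hi],
# the differential entropy per dimension is the mean log range width,
# in nats. Wider ranges mean a more complex distribution of environments.
import math

def adr_entropy(ranges):
    widths = [hi - lo for lo, hi in ranges.values()]
    return sum(math.log(w) for w in widths) / len(widths)

# Illustrative ranges after some training; zero-width ranges would have
# entropy of negative infinity, so this applies once randomization begins.
ranges = {"cube_size_cm": (5.4, 6.0), "finger_friction": (0.7, 1.3)}
print(f"ADR entropy: {adr_entropy(ranges):.3f} nats per dimension")
```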
Analysis
Perturbations that we apply to the real robot hand while it solves the Rubik’s Cube. All videos play in
real time.
To test the limits of our method, we experiment with a variety of perturbations while the
hand is solving the Rubik’s Cube. This tests not only the robustness of our control
network but also that of our vision network, which we use here to estimate the cube’s
position and orientation.
We find that our system trained with ADR is surprisingly robust to perturbations even
though we never trained with them: The robot can successfully perform most flips and
face rotations under all tested perturbations, though not at peak performance.
Emergent meta-learning
We believe that meta-learning, or learning to learn, is an important prerequisite for
building general-purpose systems, since it enables them to quickly adapt to changing
conditions in their environments. The hypothesis behind ADR is that a memory-
augmented network combined with a sufficiently randomized environment leads to
emergent meta-learning, where the network implements a learning algorithm that allows
it to rapidly adapt its behavior to the environment it is deployed in. [3]
To test this systematically, we measure the time to success per cube flip (rotating the
cube such that a different color faces up) for our neural network under different
perturbations, such as resetting the network’s memory, resetting the dynamics, or
breaking a joint. We perform these experiments in simulation, which allows us to
average performance over 10,000 trials in a controlled setting.
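A sketch of this measurement protocol might look as follows; `FlipEnv` and `policy` are hypothetical stand-ins for the simulated hand environment and the trained memory-augmented policy, and the timing stub is random noise:

```python
# A sketch of the measurement protocol; environment and policy are
# hypothetical stand-ins, not the actual training code.
import random
import statistics

class FlipEnv:
    """Placeholder for the simulated hand-and-cube environment."""
    def perturb(self, kind):
        pass  # e.g. "reset_memory", "reset_dynamics", "break_joint"
    def time_one_flip(self, policy):
        return random.uniform(2.0, 6.0)  # stub: seconds to complete a flip

def policy(observation=None):
    """Placeholder for the trained memory-augmented policy."""

N_TRIALS = 10_000
N_FLIPS = 20
PERTURB_AT = 10  # inject the perturbation at this flip index

def run_trial(env):
    times = []
    for flip_idx in range(N_FLIPS):
        if flip_idx == PERTURB_AT:
            env.perturb("reset_memory")
        times.append(env.time_one_flip(policy))
    return times

all_times = [run_trial(FlipEnv()) for _ in range(N_TRIALS)]
# Average the time to success at each flip index across trials; a spike
# at PERTURB_AT followed by a decline back toward baseline indicates
# that the policy re-adapts after the perturbation.
mean_per_flip = [statistics.mean(t[i] for t in all_times)
                 for i in range(N_FLIPS)]
```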
Time to success (in seconds) per cube flip for the perturbed policy versus the baseline, by flip number.
Vertical gray lines mark when perturbations are applied.
In the beginning, as the neural network successfully achieves more flips, each successive
time to success decreases because the network learns to adapt. When perturbations are
applied (vertical gray lines in the above chart), we see a spike in time to success. This is
because the strategy the network is employing doesn’t work in the changed environment.
The network then adapts to the new environment, and we again see the time to success
decrease to the previous baseline.
We also measure failure probability and perform the same experiments for face
rotations (rotating the top face 90 degrees clockwise or counterclockwise), and we find
the same pattern of adaptation. [4]
The memory of our neural network is visualized above. We use a building block from the
interpretability toolbox, namely non-negative matrix factorization, to condense this
high-dimensional vector into 6 groups and assign each a unique color. We then display
the color of the currently dominant group for every timestep.
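A hedged sketch of that visualization step, using scikit-learn’s NMF on random stand-in data in place of the policy’s recorded hidden states:

```python
# A sketch of the memory visualization, using scikit-learn's NMF on
# random stand-in data (NMF requires non-negative input, hence abs()).
import numpy as np
from sklearn.decomposition import NMF

T, H = 500, 1024                         # timesteps x hidden units
states = np.abs(np.random.randn(T, H))   # stand-in for recorded memory states

# Factor the T x H state matrix into 6 non-negative components.
nmf = NMF(n_components=6, init="nndsvd", max_iter=500)
weights = nmf.fit_transform(states)      # T x 6 per-timestep activations

# At each timestep, display the color of the currently dominant group.
dominant_group = weights.argmax(axis=1)  # e.g. array([3, 3, 3, 1, 1, ...])
```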
We find that each memory group has a semantically meaningful behavior associated
with it. For example, just by looking at the dominant group in the network’s memory, we
can tell whether it is about to spin the cube or rotate the top clockwise, before it happens.
Challenges
Solving the Rubik’s Cube with a robot hand is still not easy. Our method currently solves
the Rubik’s Cube 20% of the time when applying a maximally difficult scramble that
requires 26 face rotations. For simpler scrambles that require 15 rotations to undo, the
success rate is 60%. When the Rubik’s Cube is dropped or a timeout is reached, we
consider the attempt failed. However, our network is capable of solving the Rubik’s Cube
from any initial condition. So if the cube is dropped, it is possible to put it back into the
hand and continue solving.
We generally find that our neural network is much more likely to fail during the first few
face rotations and flips. This is the case because the neural network needs to balance
solving the Rubik’s Cube with adapting to the physical world during those early rotations
and flips.
Rubik’s Cube prototypes, from left to right: Locked cube, Face cube, Full cube, Giiker cube, regular
Rubik’s Cube. [5]
The prototypes were compared on support for tracking position and orientation and for sensing
internal degrees of freedom.
Next steps
We believe that human-level dexterity is on the path towards building general-purpose
robots and we are excited to push forward in this direction.
If you want to help make increasingly general AI systems, whether robotic or virtual,
we’re hiring!
Footnotes
1. We focus on the problems that are currently difficult for machines to master: perception and dexterous
manipulation. We therefore train our neural networks to achieve the required face rotations and cube flips
as generated by Kociemba’s algorithm.
2. Our work is strongly related to POET, which automatically generates 2D environments. However, our work
learns a joint policy over all environments, which transfers to any newly generated environment.
3. More concretely, we hypothesize that training a neural network with finite capacity on environments of
unbounded complexity forces it to learn a special-purpose learning algorithm, since it cannot memorize
solutions for each individual environment and no single robust policy works under all randomizations.
4. Please refer to our paper for full results.
5. The only modification we made was cutting out a small piece of each center cubelet’s colorful sticker. This
was necessary to break rotational symmetry.
Authors
OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur
Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry
Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba & Lei Zhang
Acknowledgments
Thanks to the following for feedback on drafts of this post and paper: Josh Achiam, Greg Brockman, Nick
Cammarata, Jack Clark, Jeff Clune, Ruben D’Sa, Harri Edwards, David Farhi, Ken Goldberg, Leslie P.
Kaelbling, Hyeonwoo Noh, Lerrel Pinto, John Schulman, Ilya Sutskever & Tao Xu.
Editor
Ashley Pilipiszyn
Design
Justin Jay Wang & Ben Barry
Photography
Eric Haines
Filed Under
Research, Milestones