Week12 RobotSystem
Lin Shao
NUS
[Figure: sensor inputs (camera image, laser scan, …) → perception → 𝑧 → state estimation → decision → 𝑎 → control]
Robot architectures. We learned to construct powerful
components for robot perception, state estimation, decision,
control, … Following the modular system design approach, we
design, implement, and test each module and integrate them.
There are many architectures for integration. An illustrative
example of the most common architecture is shown below:
[Figure: two-layer architecture; the control layer outputs linear velocity, angular velocity, … at 10-100 Hz]
This example architecture consists of two hierarchical layers.
The upper layer performs global reasoning, e.g., finding a
route from NUS to Changi airport. It runs at a relatively low
frequency. The lower layer performs local control, e.g.,
accelerating to 70 km/h. It runs at a high frequency.
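The two-layer split can be sketched as one loop in which the lower layer runs on every tick and the upper layer runs only occasionally. This is a minimal illustration under stated assumptions, not a reference design: the 1-D world, the helper names `plan_route` and `control`, the 1 Hz planning rate, and the gains are all hypothetical; only the 10-100 Hz control range comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Command:
    linear_velocity: float   # m/s
    angular_velocity: float  # rad/s


def plan_route(position: float, goal: float) -> float:
    """Upper layer (hypothetical): pick the next waypoint on the way to the goal."""
    lookahead = 5.0  # assumed waypoint spacing in metres
    return min(position + lookahead, goal)


def control(position: float, waypoint: float) -> Command:
    """Lower layer (hypothetical): proportional controller toward the waypoint."""
    gain = 0.5
    return Command(linear_velocity=gain * (waypoint - position), angular_velocity=0.0)


def run(goal: float, control_hz: int = 50, plan_hz: int = 1, seconds: int = 20):
    """Simulate both layers; time is simulated, not wall-clock."""
    dt = 1.0 / control_hz
    ticks_per_plan = control_hz // plan_hz
    position, waypoint = 0.0, 0.0
    plan_calls = control_calls = 0
    for tick in range(seconds * control_hz):
        if tick % ticks_per_plan == 0:      # low-frequency global reasoning
            waypoint = plan_route(position, goal)
            plan_calls += 1
        cmd = control(position, waypoint)   # high-frequency local control
        position += cmd.linear_velocity * dt
        control_calls += 1
    return position, plan_calls, control_calls
```

Calling `run(goal=10.0)` invokes the planner 20 times and the controller 1000 times over the same simulated 20 seconds, which is the frequency gap the two layers are meant to exploit.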
✓ • By decomposing a system into functional components
and layers, we can build complex systems incrementally
and systematically.
✓ • We can add layers and components when necessary.
✓ • Each component can be developed independently.
✓ • Each component serves a clearly defined, well-understood
purpose. This makes development and testing easier.
✗ • For most components, we need models of the world,
e.g., robot dynamics, environment geometry, sensor
noise, … Acquiring accurate world models is often
challenging and requires domain expertise and creative
thinking, in other words, the design engineer’s brain.
Instead of building models, we can build behaviors, roughly, policies.
The subsumption architecture consists of a set of interacting
behaviors.
• All behaviors coexist and act simultaneously. One behavior may
inhibit or suppress others. For example, if the robot performs an
emergency stop, it cannot follow a path at the same time.
• Behaviors can be structured hierarchically: a behavior may invoke
sub-behaviors.
[Figure: two example behaviors, follow path and emergency stop]
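The suppression idea can be sketched with a fixed-priority arbiter. This is a simplification, assuming that every behavior sees the same sensor snapshot and that a higher-priority behavior suppresses all lower ones whenever it produces an output; the original subsumption architecture wires suppression and inhibition between individual modules. The sensor keys, the 0.5 m threshold, and the steering gain are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Command = Tuple[float, float]  # (linear velocity, angular velocity)
Behavior = Callable[[dict], Optional[Command]]


def emergency_stop(sensors: dict) -> Optional[Command]:
    # Active only when an obstacle is closer than an assumed 0.5 m threshold.
    if sensors["obstacle_distance"] < 0.5:
        return (0.0, 0.0)
    return None


def follow_path(sensors: dict) -> Optional[Command]:
    # Always active: drive forward and steer against the heading error.
    return (1.0, -0.8 * sensors["heading_error"])


def arbitrate(behaviors: Sequence[Behavior], sensors: dict) -> Command:
    """Evaluate all behaviors, then let the highest-priority active one win.

    `behaviors` is ordered from highest to lowest priority.
    """
    outputs = [behavior(sensors) for behavior in behaviors]  # all behaviors run
    for output in outputs:
        if output is not None:
            return output  # suppresses every lower-priority output
    return (0.0, 0.0)  # no behavior active: stay still
```

With `[emergency_stop, follow_path]` as the priority order, a nearby obstacle yields a stop command even though `follow_path` also produced an output on the same tick.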
The subsumption architecture is also modular and offers the
usual benefits of a modular system. At the same time, it has
some unique characteristics.
✓ • As all behaviors act in parallel, the overall system response
is fast. In fact, the subsumption architecture is designed for
real-time performance.
✗ • Unfortunately, acquiring complex behaviors is even more
difficult than acquiring complex models.
✗ • As more and more behaviors are added, they interact in
complex ways that are difficult to anticipate and control.
The subsumption architecture was important historically, but it is
no longer widely practiced in its entirety. Its influence, however,
persists in many robot architectures, in various forms.
The rise of deep learning has brought about the end-to-end
(e2e) learning system. It replaces all system components with a
single giant neural network.
[Figure: a single neural network maps inputs (camera image, laser scan, encoder reading, …) to outputs (linear velocity, angular velocity, …)]
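The interface of such a system can be sketched as one function from raw sensor readings to velocity commands. This is only a toy under stated assumptions: the network is a tiny two-layer MLP with random, untrained weights, the input sizes are arbitrary, and the names `make_layer`, `forward`, and `e2e_policy` are hypothetical. A real e2e system would learn the weights from data and use a far larger model.

```python
import math
import random


def make_layer(n_in: int, n_out: int, rng: random.Random):
    """Return an n_out x n_in weight matrix with small random entries."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def forward(layers, x):
    """Apply each weight matrix in turn, with tanh on all but the last layer."""
    for index, weights in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) for row in weights]
        if index < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


def e2e_policy(layers, camera_image, laser_scan, encoder_reading):
    """Map concatenated raw sensor values directly to velocity commands."""
    observation = list(camera_image) + list(laser_scan) + list(encoder_reading)
    linear_velocity, angular_velocity = forward(layers, observation)
    return linear_velocity, angular_velocity
```

There is no separate perception, state estimation, or decision module here: everything between the sensors and the velocity command lives inside the weights.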
An autonomous driving system.
• Closing the loop for robotic grasping: a real-time, generative
grasp synthesis approach
• Training Details
Integrating LLMs with robotics
• Pros:
  • Enable open-vocabulary task planning for complex instructions
• Cons:
  • Require a manipulation skill repository/predefined APIs
Current Robot Foundation Models
• End-to-end Policy Training (RT-X, 2023)
  • Pros:
    • Enable the learning of low-level manipulation skills
    • Simple framework
  • Cons:
    • The performance needs improvement (62% in RT-X generalization tasks)
    • Lack of interpretability
• LLMs & VLMs for Reward Design (L2R, 2023; Eureka, 2023)
  • Pros:
    • Enable the learning of low-level manipulation skills
    • Borrow the knowledge from VLMs & LLMs
  • Cons:
    • Require oracle state estimators
    • Require RL training or MPC, which is infeasible for complex real-world tasks
    • Separate policies for different tasks
CV & NLP have standard data formats
• NLP: “Token”
• CV: “Pixel”
What is the “token” for robotic manipulation?
Can we find a standard/unified format for robotic manipulation?
• Diversity of Tasks
• Diversity of Objects
• Diversity of Robots
Unified Description for Manipulation
Instruction: Open the fridge and pick up the milk box
• High-Level Generative Planning: Universal Task Planning Ability
• Task Plan
• Low-Level Contact Synthesis: Universal Manipulation Ability
Bi-Level Process
• High-level Planning
• Understand visual inputs
• Think about action plans
• Imagine the outcomes of each action plan
• Choose the best action plan and execute
• Online replan
• Low-Level Control
• Execute the plan with hands and arms
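The high-level steps above (propose plans, imagine outcomes, choose the best, execute, replan online) can be sketched as a receding-horizon loop. This is a toy under stated assumptions: the state is an integer on a line, `imagine` is a hypothetical perfect world model, and `execute` stands in for low-level control with one injected disturbance at step 2, so the imagined and actual outcomes differ once.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical primitive actions


def imagine(state: int, plan) -> list:
    """World model (assumed perfect): predict the state after each action."""
    outcomes = []
    for action in plan:
        state += action
        outcomes.append(state)
    return outcomes


def cost(outcomes, goal: int) -> int:
    """Total distance to the goal over the imagined rollout."""
    return sum(abs(goal - s) for s in outcomes)


def choose_plan(state: int, goal: int, horizon: int = 3):
    """Enumerate all action plans and keep the one with the best imagined outcome."""
    candidates = product(ACTIONS, repeat=horizon)
    return min(candidates, key=lambda plan: cost(imagine(state, plan), goal))


def execute(state: int, action: int, disturbance: int = 0) -> int:
    """Stand-in for low-level control; the real outcome may deviate."""
    return state + action + disturbance


def run(state: int, goal: int, max_steps: int = 20) -> list:
    trajectory = [state]
    for step in range(max_steps):
        if state == goal:
            break
        plan = choose_plan(state, goal)          # think and imagine
        disturbance = -1 if step == 2 else 0     # injected surprise
        state = execute(state, plan[0], disturbance)
        trajectory.append(state)                 # next iteration replans online
    return trajectory
```

`run(0, 5)` returns `[0, 1, 2, 2, 3, 4, 5]`: the disturbance at step 2 cancels one move, and replanning from the observed state recovers without any special-case handling.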
Building Foundation Models for High-level Planning
Unified Description for Manipulation
How to manipulate the object to move it from the current
pose/shape to the target pose/shape (or target motion)?
Manipulation Foundation Model
• Inputs: Current Point Clouds; Target Point Clouds/Target Motion
• Manipulation FM outputs: Contact Regions, Forces, Motions
• Robot Actions: Robot Joint Values and Torques
• Object types: articulated object, rigid object, 1D deformable object, 2D deformable object, 3D deformable object
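The input/output description above can be written down as plain data types. This is a hypothetical interface sketch, not the ManiFoundation code: the class and field names are assumptions, and it further assumes that target points correspond one-to-one with current points so that a per-point target motion can be derived by subtraction.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class ManipulationQuery:
    """Model input: current point cloud plus a target shape or a target motion."""
    current_points: List[Point]
    target_points: Optional[List[Point]] = None  # target pose/shape
    target_motion: Optional[List[Point]] = None  # per-point displacement

    def __post_init__(self):
        # Exactly one of the two target descriptions must be given.
        if (self.target_points is None) == (self.target_motion is None):
            raise ValueError("give exactly one of target_points or target_motion")
        target = self.target_points if self.target_points is not None else self.target_motion
        if len(target) != len(self.current_points):
            raise ValueError("target must have one entry per current point")

    def motion(self) -> List[Point]:
        """Per-point displacement from the current to the target configuration."""
        if self.target_motion is not None:
            return self.target_motion
        return [
            tuple(t - c for t, c in zip(target, current))
            for target, current in zip(self.target_points, self.current_points)
        ]


@dataclass
class ContactAction:
    """Model output: where to touch, how hard to push, and how to move."""
    contact_point_index: int  # index into current_points (contact region)
    force: Point
    motion: Point
```

A downstream robot-specific layer would then turn each `ContactAction` into joint values and torques, which is what keeps the description itself independent of the robot.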
ManiFoundation Model Pipeline Overview
ManiFoundation Model for Contact Synthesis
Physical Properties Encoding and Effects
- Friction coefficient for all objects
- Density and elasticity coefficients for 2D deformable objects
- Young’s modulus and Poisson’s ratio for 1D and 3D
deformable objects
- Encoded using a set of MLPs.
[Figure: effect of physical properties; a square-shaped handkerchief and a square-shaped card, each with the target motion “move forward”]
Contact Region Mask and Effects
CVAE for Multi-modal Contact Synthesis
Support Dexterous Hands Optimization & Motion Planning
The network prediction serves as the initial solution to the rigid-body
contact optimization process, improving its speed and success rate.
[Figure: pipeline stages include Non-maximum Suppression, Pose Init., Iterative Closest Point, and Differentiable Contact Wrench Optimization]
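Why a good initial solution helps both speed and success rate can be seen on a toy problem. This sketch is not the contact optimization from the slide: it swaps in plain gradient descent on a hypothetical 1-D objective with two basins, where the global minimum lies near x = -1 and a worse local minimum lies near x = +1. The start value -0.9 plays the role of a network prediction; the other start values play the role of uninformed guesses.

```python
def objective(x: float) -> float:
    """Toy non-convex cost: two basins, the better one near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x: float) -> float:
    """Analytic derivative of `objective`."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def optimize(x0: float, step: float = 0.05, tol: float = 1e-6, max_iters: int = 10_000):
    """Gradient descent from x0; returns (solution, iterations used)."""
    x = x0
    for iteration in range(1, max_iters + 1):
        g = gradient(x)
        if abs(g) < tol:
            return x, iteration
        x -= step * g
    return x, max_iters


x_warm, iters_warm = optimize(-0.9)  # start close to the good solution
x_far, iters_far = optimize(-0.1)    # right basin, but far away
x_bad, iters_bad = optimize(0.5)     # wrong basin
```

Starting at -0.9 reaches the same solution as starting at -0.1 in fewer iterations (speed), while starting at 0.5 settles in the worse basin near x = +1 (a failure that a good initial guess avoids).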
Real Robot Experiments
Rope Rearrangement
Key concepts.
• Model-based modular system
• End-to-end learning system