Week 12: Robot System

CS4278/CS5478 Intelligent Robots:

Algorithms and Systems

Lin Shao

NUS

[Figure: the modular pipeline. Camera images and laser scans feed perception,
which outputs an observation z to state estimation; decision produces an
action a, which is executed by control.]
Robot architectures. We have learned to construct powerful
components for robot perception, state estimation, decision
making, control, … Following the modular system design approach, we
design, implement, and test each module separately and then integrate them.
There are many architectures for integration. An illustrative
example of the most common architecture is shown below:

[Figure: a two-layer architecture. The upper layer (perception, state
estimation, decision) outputs an action at 0.01-1 Hz; the lower control
layer outputs linear and angular velocity commands at 10-100 Hz.]
This example architecture consists of two hierarchical layers.
The upper layer performs global reasoning, e.g., finding a
route from NUS to Changi airport. It runs at a relatively low
frequency. The lower layer performs local control, e.g.,
accelerating to 70 km/h. It runs at a high frequency.
✓ • By decomposing a system into functional components
and layers, we can build complex systems incrementally
and systematically.
• We can add layers and components when necessary.
✓ • Each component can be developed independently.
✓ • Each component serves a clearly defined, well-understood
purpose. This eases development and testing.
✗ • For most components, we need models of the world,
e.g., robot dynamics, environment geometry, sensor
noise, … Acquiring accurate world models is often
challenging and requires domain expertise and creative
thinking, in other words, the design engineer's brain.
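The two-rate layering can be sketched in a few lines of code. This is a minimal illustration, not from the lecture: the names, the toy 1-D "state", and the proportional controller are all assumptions; the point is only that the upper layer replans at a low rate while the lower layer tracks the latest plan at every tick.

```python
# Toy two-layer architecture: a low-rate planner and a high-rate controller.
PLAN_PERIOD = 50  # controller ticks per planning cycle (e.g. 50 Hz vs 1 Hz)

def plan(state):
    """Upper layer: global reasoning -> a target waypoint (stub)."""
    return state + 10.0  # e.g. "advance 10 m along the route"

def control(state, target):
    """Lower layer: local control -> a velocity command (stub)."""
    return 0.1 * (target - state)  # simple proportional controller

def run(ticks):
    state, target, plans = 0.0, 0.0, 0
    for t in range(ticks):
        if t % PLAN_PERIOD == 0:             # low-frequency layer
            target = plan(state)
            plans += 1
        state += control(state, target)      # high-frequency layer, one tick
    return state, plans

final_state, num_plans = run(100)  # 100 control ticks, 2 planning cycles
```

Note that the controller keeps running against the last plan between planning cycles, which is exactly what lets the lower layer run fast while the upper layer thinks slowly.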

Instead of building models, we can build behaviors (roughly, policies).
The subsumption architecture consists of a set of interacting
behaviors.
• All behaviors coexist and act simultaneously. One behavior may
inhibit or suppress others. For example, if the robot performs an
emergency stop, it cannot follow a path at the same time.
• Behaviors can be structured hierarchically: a behavior may invoke
sub-behaviors.
[Figure: a subsumption architecture. Sensor inputs (camera image, laser
scan, encoder readings, …) feed the behaviors follow path, avoid obstacle,
and emergency stop, which together produce linear and angular velocity
commands.]
The subsumption architecture is also modular and offers the
usual benefits of a modular system. At the same time, it has
some unique characteristics.
✓ • As all behaviors act in parallel, the overall system response
is fast. In fact, the subsumption architecture is designed for
real-time performance.
✗ • Unfortunately, acquiring complex behaviors is even more
difficult than acquiring complex models.
✗ • As more and more behaviors are added, they interact in
complex ways that are difficult to anticipate and control.
The subsumption architecture was important historically, but it is no
longer widely practiced in its entirety. Its influence, however,
lives on in many robot architectures, in various forms.
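The inhibition idea above can be sketched as a priority list. The behavior names follow the figure; the strict-priority arbitration used here is a simplification of the real architecture's per-wire inhibition and suppression links, and the thresholds and commands are illustrative.

```python
# Each behavior returns a (linear, angular) velocity command, or None
# if it is not triggered by the current observation.

def emergency_stop(obs):
    return (0.0, 0.0) if obs["bumper"] else None

def avoid_obstacle(obs):
    return (0.2, 0.8) if obs["min_range"] < 0.5 else None

def follow_path(obs):
    return (1.0, 0.0)  # default behavior: always produces a command

BEHAVIORS = [emergency_stop, avoid_obstacle, follow_path]  # priority order

def arbitrate(obs):
    for behavior in BEHAVIORS:      # highest priority first
        cmd = behavior(obs)
        if cmd is not None:         # this behavior suppresses the rest
            return cmd
    return (0.0, 0.0)

cmd = arbitrate({"bumper": False, "min_range": 0.3})  # avoid_obstacle wins
```

All behaviors conceptually run in parallel; the arbitration only decides whose command reaches the motors, which is why an emergency stop prevents path following at the same time.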

The rise of deep learning has brought about the end-to-end
(e2e) learning system, which replaces all system components with a
single giant neural network.

[Figure: an end-to-end system. Camera images, laser scans, and encoder
readings map directly to linear and angular velocity commands through one
network.]
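In miniature, an end-to-end policy looks like this. The dimensions and the two-layer network are illustrative assumptions; the point is that all sensor streams are concatenated and mapped to velocity commands by a single network, with no explicit perception, state estimation, or decision modules in between.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (32, 16))   # hidden layer (untrained, for shape only)
W2 = rng.normal(0.0, 0.1, (16, 2))    # output: (linear, angular) velocity

def e2e_policy(image_feat, scan_feat, encoder_feat):
    # One network replaces the whole modular pipeline.
    x = np.concatenate([image_feat, scan_feat, encoder_feat])  # 24 + 6 + 2 dims
    h = np.tanh(x @ W1)
    return h @ W2  # [linear_velocity, angular_velocity]

cmd = e2e_policy(rng.normal(size=24), rng.normal(size=6), rng.normal(size=2))
```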
[Figure: an autonomous driving system.]
• Closing the Loop for Robotic Grasping: A Real-Time, Generative Grasp
Synthesis Approach
• Learning Synergies between Pushing and Grasping with Self-Supervised
Deep Reinforcement Learning
• ManiFoundation Model for General-Purpose Robotic Manipulation of Contact
Synthesis with Arbitrary Objects and Robots

Primitive Actions

• Parameterize each action as a motion primitive behavior (e.g., pushing or
grasping) executed at the 3D location q projected from a pixel p of the
heightmap image representation of the state.
• Pushing: q denotes the starting position of a 10 cm push in one of k = 16
directions. The trajectory of the push is straight.
• Grasping: q denotes the middle position of a top-down parallel-jaw grasp
in one of k = 16 orientations.
• Reward Function
  R_g(s_t, s_{t+1}) = 1 if a grasp is successful
  R_p(s_t, s_{t+1}) = 0.5 if a push makes detectable changes to the environment
• Training Details
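The reward function above can be written out directly. The grasp and push rewards are from the slide; treating "detectable change" as a changed-pixel count on the heightmap exceeding a threshold is an assumption of this sketch, and the threshold value is illustrative.

```python
CHANGE_THRESHOLD = 300  # changed heightmap pixels; illustrative value

def reward(action, grasp_succeeded=False, changed_pixels=0):
    if action == "grasp":
        return 1.0 if grasp_succeeded else 0.0   # R_g = 1 on success
    if action == "push":
        # R_p = 0.5 if the push made detectable changes to the environment
        return 0.5 if changed_pixels > CHANGE_THRESHOLD else 0.0
    raise ValueError(f"unknown primitive: {action}")

r = reward("push", changed_pixels=500)  # a push that moved objects
```

The small intermediate push reward encourages pushes that rearrange clutter, even though only grasps achieve the task.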
Integrating LLMs with Robotics
• LLMs & VLMs as high-level planners: SayCan (2022), Code as Policies (2022),
VoxPoser (2023), PaLM-E (2023)
  • Pros: enable open-vocabulary task planning for complex instructions.
  • Cons: require a manipulation skill repository or predefined APIs.
We need a foundation model for manipulation skills (an action model).
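The LLM-as-planner pattern can be sketched as follows. The language model emits a plan over a predefined skill repository (the "Cons" above) and a simple executor dispatches it. The skill registry, the skills, and the canned plan are all hypothetical; a real system would query an LLM where `llm_plan` is stubbed.

```python
SKILLS = {}

def skill(fn):
    """Register a manipulation skill in the predefined repository."""
    SKILLS[fn.__name__] = fn
    return fn

@skill
def open_fridge():
    return "fridge open"

@skill
def pick(obj):
    return f"picked {obj}"

def llm_plan(instruction):
    # Stand-in for an actual LLM call; returns (skill_name, args) steps.
    return [("open_fridge", ()), ("pick", ("milk box",))]

def execute(instruction):
    return [SKILLS[name](*args) for name, args in llm_plan(instruction)]

log = execute("Open the fridge and pick up the milk box")
```

The planner can only compose what the repository offers, which is exactly why a foundation model for the skills themselves is the missing piece.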
Current Robot Foundation Models
• End-to-end policy training (e.g., RT-X, 2023)
  • Pros: enables the learning of low-level manipulation skills; simple framework.
  • Cons: performance needs improvement (62% on RT-X generalization tasks);
lacks interpretability.
• LLMs & VLMs for reward design (e.g., L2R, 2023; Eureka, 2023)
  • Pros: enables the learning of low-level manipulation skills; borrows
knowledge from VLMs & LLMs.
  • Cons: requires oracle state estimators; requires RL training or MPC,
which is infeasible for complex real-world tasks; separate policies for
different tasks.
CV & NLP have standard data formats: NLP has the "token"; CV has the "pixel".
What is the "token" for robotic manipulation? Can we find a standard,
unified format for robotic manipulation, given the diversity of tasks,
objects, and robots?
Unified Description for Manipulation
• Instruction (e.g., "Open the fridge and pick up the milk box")
• High-level generative planning: universal task planning ability, producing
a task plan.
• Low-level contact synthesis: universal manipulation ability, following the
task plan.
• Across diverse objects, robots, and tasks.

Consider a human performing a manipulation task.
Bi-Level Process
• High-level planning
  • Understand visual inputs
  • Think about action plans
  • Imagine the outcomes of each action plan
  • Choose the best action plan and execute it
  • Replan online
• Low-level control
  • Execute the plan with hands and arms
Building Foundation Models for High-Level Planning
• A high-level world model for planning, with three modules
(o: observation; a: high-level plan action):
  • Action module: proposes high-level actions (plans) from the observation.
  • Dynamics module: predicts future observations, enabling iterative
planning along time.
  • Reward module: scores predicted future observations against the goal
observation.
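The three modules can be wired into a sampling-based planner. All three are stubs on a toy 1-D "observation", and the random-shooting scheme with its sample counts is an illustrative assumption, not the lecture's method: propose candidate plans, roll each out with the dynamics module, score the predicted future against the goal with the reward module, and keep the best.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_plans(n, horizon):        # action module (stub)
    return rng.uniform(-1.0, 1.0, size=(n, horizon))

def dynamics(obs, action):            # dynamics module (stub)
    return obs + action

def reward(obs, goal):                # reward module (stub)
    return -abs(goal - obs)

def plan(obs, goal, n=64, horizon=5):
    candidates = propose_plans(n, horizon)
    scores = []
    for p in candidates:              # iterative rollout along time
        o = obs
        for a in p:
            o = dynamics(o, a)        # imagine the outcome of this plan
        scores.append(reward(o, goal))
    return candidates[int(np.argmax(scores))]

best = plan(obs=0.0, goal=2.0)        # best imagined plan toward the goal
```

Replanning online amounts to calling `plan` again from the newly observed state after executing the first step.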
Unified Description for Manipulation
How do we manipulate an object to move it from its current pose/shape to a
target pose/shape (or target motion)? The description is a pair: the current
object point cloud at time t and the target object point cloud at time t+1.
• Rigid object: translation, rotation, …
• Articulated rigid object: rotation
• Deformable object: shape deformation
  • 1D deformable: rope
  • 2D deformable: cloth
  • 3D deformable: sponge
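As a data record, the unified description is just this pair (field names hypothetical): the manipulation "token" is (current point cloud at time t, target point cloud at time t+1), from which a per-point target motion follows. Rotations and shape deformations are expressed the same way, by moving the points.

```python
import numpy as np

def make_task(current_pts, target_pts):
    assert current_pts.shape == target_pts.shape   # both N x 3
    return {"current": current_pts,
            "target": target_pts,
            "motion": target_pts - current_pts}    # per-point target motion

# A rigid translation: every point moves by the same vector.
cur = np.zeros((100, 3))
task = make_task(cur, cur + np.array([0.1, 0.0, 0.0]))
```

An articulated or deformable task differs only in that the per-point motions are no longer identical across the cloud.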
Software: Manipulation Foundation Model
• Inputs: current point clouds; target point clouds (or target motion).
• Output actions: contact regions, forces, motions; robot joint values and
torques.
• Supports any object and any task: rigid, articulated, and 1D/2D/3D
deformable objects.
• Supports any robot.

ManiFoundation Model Pipeline Overview

ManiFoundation Model for Contact Synthesis
Physical Properties Encoding and Effects
• Friction coefficient for all objects
• Density and elasticity coefficients for 2D deformable objects
• Young's modulus and Poisson's ratio for 1D and 3D deformable objects
• Encoded using a set of MLPs.

[Figure: different contact predictions for the same object geometry with
different physical properties, e.g., a square-shaped handkerchief vs. a
square-shaped card, both commanded to move forward.]
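The slide says the physical properties are encoded "using a set of MLPs"; here is a minimal sketch with one small MLP. The single shared network, the log-scaling of Young's modulus, and all dimensions are assumptions of this sketch, not details from the model.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0.0, 0.1, (3, 32)), np.zeros(32)
W2, b2 = rng.normal(0.0, 0.1, (32, 64)), np.zeros(64)

def encode_properties(friction, youngs_modulus, poisson_ratio):
    # Log-scale Young's modulus so soft and stiff materials are comparable.
    x = np.array([friction, np.log10(youngs_modulus), poisson_ratio])
    h = np.maximum(0.0, x @ W1 + b1)      # ReLU hidden layer
    return h @ W2 + b2                    # 64-d property embedding

emb_soft = encode_properties(0.5, 1e4, 0.45)  # sponge-like material
emb_hard = encode_properties(0.5, 1e9, 0.30)  # rigid-like material
```

Conditioning contact prediction on such an embedding is what lets the same geometry (handkerchief vs. card) receive different contact predictions.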
Contact Region Mask and Effects
• Restricts predicted contacts to allowed regions, e.g., when the object is
placed in a constrained environment or to respect user preferences for
interaction.
CVAE for Multi-Modal Contact Synthesis
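Why a CVAE yields multi-modal contact predictions, in miniature: at test time, different latent samples z, decoded under the same condition (the object observation), give different contact hypotheses. The "decoder" below is a fixed random linear map and the dimensions are made up, purely to illustrate the sampling mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, COND_DIM, OUT_DIM = 8, 16, 3
W = rng.normal(0.0, 1.0, (LATENT_DIM + COND_DIM, OUT_DIM))

def decode(z, condition):
    # Decoder: latent sample + condition -> one contact hypothesis (3-D point).
    return np.concatenate([z, condition]) @ W

condition = rng.normal(size=COND_DIM)            # fixed object observation
samples = [decode(rng.normal(size=LATENT_DIM), condition)
           for _ in range(5)]                    # five distinct hypotheses
```

A deterministic network would collapse these five samples into one answer; the latent variable is what preserves the multiple valid ways to grasp or push the same object.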
Supporting Dexterous Hands: Optimization & Motion Planning
The network prediction serves as the initial solution to the rigid-body
contact optimization process, improving its speed and success rate:
NMS (non-maximum suppression) → ICP (iterative closest point) → pose
initialization → differentiable contact wrench optimization.
The wrench optimization is applied to the oriented point clouds and
outputs robot joints and torques.
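The refinement chain above can be expressed as a simple function composition. Every stage here is a placeholder that only tags the result, to show the data flow from network prediction to executable joints and torques; none of the stage internals are from the model.

```python
def nms(pred):        return {**pred, "nms": True}        # prune duplicate contacts
def icp(pred):        return {**pred, "icp": True}        # align contacts to surface
def pose_init(pred):  return {**pred, "pose": True}       # initial hand pose
def wrench_opt(pred): return {**pred, "joints": [0.0] * 7}  # joints & torques

def refine(network_prediction):
    # The network prediction is only an initial solution; the chain
    # refines it into executable robot joint values and torques.
    out = network_prediction
    for stage in (nms, icp, pose_init, wrench_opt):
        out = stage(out)
    return out

result = refine({"contacts": [(0.1, 0.2, 0.3)], "score": 0.9})
```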
Real Robot Experiments
• Get milk from the fridge
• Toast the bread
• Rope rearrangement
• Cloth folding
1D/2D/3D deformable, articulated, and rigid objects, with grippers and
hands: all in one ManiFM!


Summary.
• There are many considerations in choosing a robot architecture:
  • Modularity and scalability
  • Robustness
  • Real-time response
• The technical merits of model-based modular systems versus data-driven
e2e learning systems are hotly debated. They are, however, not mutually
exclusive. Integrating model-based and data-driven approaches will likely
lead to the best performance.
Key concepts.
• Model-based modular system
• End-to-end learning system
