
Introduction to Robotics

Marc Toussaint

April 2016

This is a direct concatenation and reformatting of all lecture slides and exercises from
the Robotics course (winter term 2014/15, U Stuttgart), including a bullet point list to
help prepare for exams.

Contents
1 Introduction 5

2 Kinematics 13
Mobile robotics vs. Manipulation vs. Kinematic/Dynamic motion control (2:1)

2.1 Basic 3D geometry & notation . . . . . . . . . . . . . . . . . . . . . . . 14


Coordinate frames and transforms (2:6) Homogeneous transformation (2:11) Com-
position of transforms (2:13)

2.2 Kinematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Forward kinematics (2:15) Joint types (2:16) Kinematic map (2:18) Jacobian (2:20)

2.3 Inverse Kinematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21


Inverse kinematics as optimization problem (2:26) Inverse kinematics solution (2:29)
Singularity (2:35) Null space, task space, operational space, joint space (2:36) Motion
rate control (2:37) Sine motion profile (2:39) Joint space trajectory interpolation (2:41)
Task space trajectory interpolation (2:42)

2.4 Multiple tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28


The ’big’ task vector (2:47) Inverse kinematics for all tasks (2:48) Hierarchical inverse
kinematics, nullspace motion (2:50) Reference of task maps and their Jacobians (2:52)

3 Dynamics 36
Dynamic vs. non-dynamic robots (3:1) Definition of a dynamics equation (3:3)

3.1 PID and a 1D point mass . . . . . . . . . . . . . . . . . . . . . . . . . . 37


1D point mass (3:6) P gain (3:7) D gain (3:10) oscillatory-damped, critically damped,
over-damped (3:13) Damping ratio, wave length (3:14) I gain (3:15) PID control
(3:16)
3.2 Dynamics of mechanical systems . . . . . . . . . . . . . . . . . . . . . 42
Euler-Lagrange equation (3:20) Pendulum example for Euler-Lagrange (3:21) Gen-
eral structure of Euler-Lagrange (3:22) Newton-Euler recursion (3:23)
3.3 Controlling a dynamic robot . . . . . . . . . . . . . . . . . . . . . . . . 45
General structure of robot dynamics (3:28) Inverse and forward dynamics (3:29) Fol-
lowing a reference trajectory in joint space (3:30) Following a reference trajectory in
task space (3:31) Operational space control (3:32) Multiple tasks (3:35)

4 Path Planning 50
Path finding examples (4:1) Feedback control vs path finding vs trajectory optimiza-
tion (4:5)
4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Bug algorithms (4:8) Potential fields (4:13) Dijkstra (4:18)

4.2 Sample-based Path Finding . . . . . . . . . . . . . . . . . . . . . . . . 68


Probabilistic Road Maps (PRMs) (4:28) Planning/testing local connections (4:32)
Probabilistic completeness of PRMs (4:34) Other PRM sampling strategies (4:35)
Rapidly Exploring Random Trees (RRTs) (4:39) Goal-oriented RRT (4:40) Bi-
directional RRT (4:42)
4.3 Non-holonomic systems . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Definition of non-holonomic (4:47) Path finding for a non-holonomic system (4:50)
Control-based sampling (4:52) RRTs with differential constraints (4:58) Configura-
tion state metrics (4:59) Side story: Dubins curves (4:60)

5 Path Optimization – briefly 90


From inverse kinematics to path costs (5:2) General k-order cost terms (5:3) Choice
of optimizer (5:5)

6 Probabilities 92
Probabilities as (subjective) information calculus (6:1) Frequentist vs Bayesian (6:3)

6.1 Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93


Definitions based on sets (6:5) Random variables (6:6) Probability distribution (6:7)
Joint distribution (6:8) Marginal (6:8) Conditional distribution (6:8) Bayes’ Theorem
(6:10) Multiple RVs, conditional independence (6:11)
6.2 Probability distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 95

Bernoulli and Binomial distributions (6:14) Beta (6:15) Multinomial (6:18) Dirichlet
(6:19) Conjugate priors (6:23) Dirac distribution (6:26) Gaussian (6:27) Particle ap-
proximation of a distribution (6:30) Student’s t, Exponential, Laplace, Chi-squared,
Gamma distributions (6:33)

7 Mobile Robotics 104


Darpa challenge (7:1)

7.1 State Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107


Applying Bayes’ rule to state estimation (7:12) Bayes Filter (7:13) Odometry: Filtering
without observations (7:17) Particle Filter = Bayes Filter with Particles (7:22) Kalman
Filter = Bayes Filter with Gaussian (7:26) Kalman filter equations (7:27) Extended KF,
Unscented Transform (7:28)
7.2 Simultaneous Localization and Mapping (SLAM) . . . . . . . . . . . 118
Definition of a map, definition of SLAM problem (7:34) Joint Bayes Filter over state
and map (Kalman SLAM) (7:36) Particle SLAM (7:39) Particles carry their own his-
tory, their own map (7:39) Graph-based SLAM (7:45)

8 Control Theory 126


Topics in control theory (8:3)

8.1 Optimal control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127


Discrete time finite horizon definition (8:7) Bellman equation (discrete time) (8:9)
Continuous time finite horizon definition (8:10) Hamilton-Jacobi-Bellman equation
(8:11) Infinite horizon case (8:12) Linear-Quadratic Optimal Control (8:15) Riccati
differential eq = HJB eq in LQ case (8:17) Algebraic Riccati equation (ARE) for infinite
horizon (8:19) Riccati equations (also discrete time) (8:19)

8.2 Controllability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135


Definition of controllability (8:27)

8.3 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137


Definition of closed loop system equation (8:33) Lyapunov and exponential stability
(8:35) Linear stability analysis using eigenvalues (8:36) Definition of Lyapunov func-
tion (8:40) Energy as Lyapunov function (8:42) Value as Lyapunov function (8:43)

9 Reinforcement Learning in Robotics – briefly 143


Markov Decision Process (9:4) Relation to optimal control (9:5) Five approaches to RL
(9:7) Imitation Learning (9:9) Inverse RL (9:12) Policy Search with Policy Gradients
(9:16) Policy Search with Black-Box Optimization (9:21)

10 SKIPPED THIS TERM – Grasping (brief intro) 157


Force Closure (10:5) Form Closure, Caging (10:7) The “mitten thought experiment”
(10:10) Compliant/soft hands (10:12)

11 SKIPPED THIS TERM – Legged Locomotion (brief intro) 163


Statically stable walk (11:7) Zero moment point (ZMP) (11:8) Human locomotion
(11:14) Passive dynamic walking: Compass Gait (11:19) Impact models (11:22)

12 Exercises 178
12.1 Exercise 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
12.2 Exercise 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
12.3 Exercise 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
12.4 Exercise 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
12.5 Exercise 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
12.6 Exercise 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
12.7 Exercise 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
12.8 Exercise 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
12.9 Exercise 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
12.10 Exercise 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
12.11 Exercise 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
12.12 Exercise 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
12.13 Exercise extra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

13 Bullet points to help learning 195


13.1 Kinematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
13.2 Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
13.3 Path planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
13.4 Mobile Robotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
13.5 Control Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

Index 199

1 Introduction

Why Robotics?
1:1

1:2

http://www.saintpatrick.org/index.aspx/Health_Services/da_Vinci_Surgery
1:3

(robot “wife” aico)


1:4

http://people.csail.mit.edu/nikolaus/drg/
1:5

Why Robotics?

• Commercial:
Industrial, health care, entertainment, agriculture, surgery, etc

• Critical view:
– International Committee for Robot Arms Control http://icrac.net/
– Noel Sharkey’s articles on robot ethics (child care robots PaPeRo...)
http://www.nec.co.jp/products/robot/en/
1:6

Robotics as intelligence research

AI in the real world

AI: Machine Learning, probabilistic reasoning, optimization


Real World: Interaction, manipulation, perception, navigation, etc
1:7

Why AI needs to go real world

Tunicates digest their brain once they settle!


• Motion was the driving force to develop intelligence
– motion needs control & decision making ↔ fast information processing
– motion needs anticipation & planning
– motion needs perception
– motion needs spatial representations
• Manipulation requires acknowledging the structure (geometry, physics, objects) of
the real world. Classical AI does not
1:8

Robotics as intelligence research


• Machine Learning and AI are computational disciplines, which had great suc-
cess with statistical modelling, analysis of data sets, symbolic reasoning. But
they have not solved autonomous learning, acting & reasoning in real worlds.
• Neurosciences and psychology are descriptive sciences, either on the biological
or cognitive level, e.g. with great successes in describing and curing certain diseases.
But they are not sufficient to create intelligent systems.
• Robotics is the only synthetic discipline to understand intelligent behavior in
natural worlds. Robotics tells us what the actual problems are when trying to
organize behavior in natural worlds.


1:9

History

• little movie...

( http://www.csail.mit.edu/videoarchive/history/aifilms http://www.ai.sri.com/shakey/ )
1:10

Four chapters
• Kinematics & Dynamics
goal: orchestrate joint movements for desired movement in task spaces

Kinematic map, Jacobian, optimality principle of inverse kinematics, singularities, configuration/operational/null space, multiple si-
multaneous tasks, special task variables, trajectory interpolation, motion profiles; 1D point mass, damping & oscillation, PID, general
dynamic systems, Newton-Euler, joint space control, reference trajectory following, optimal operational space control

• Planning & optimization


goal: planning around obstacles, optimizing trajectories

Path finding vs. trajectory optimization, local vs. global, Dijkstra, Probabilistic Roadmaps, Rapidly Exploring Random Trees, differen-
tial constraints, metrics; trajectory optimization, general cost function, task variables, transition costs, gradient methods, 2nd order
methods, Dynamic Programming

• Control Theory
theory on designing optimal controllers

Topics in control theory, optimal control, HJB equation, infinite horizon case, Linear-Quadratic optimal control, Riccati equations

(differential, algebraic, discrete-time), controllability, stability, eigenvalue analysis, Lyapunov function

• Mobile robots
goal: localize and map yourself

State estimation, Bayes filter, odometry, particle filter, Kalman filter, Bayes smoothing, SLAM, joint Bayes filter, EKF SLAM, particle
SLAM, graph-based SLAM

1:11

Last Year’s script

see https://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/13-Robotics/13-Robotics-script.pdf
1:12

Prerequisites
• Is this a practical or theoretical course?
“There is nothing more practical than a good theory.”
(Vapnik, others...)

• Essentially, the whole course is about


reducing real-world problems to mathematical problems

that can be solved efficiently


• Required: Basics in
– Linear Algebra
– Probability theory
– Optimization
1:13

Books
There is no reference book for this lecture. But a basic well-known standard
text book is:

Craig, J.J.: Introduction to robotics: me-


chanics and control. Addison-Wesley New
York, 1989. (3rd edition 2006)

1:14

Books
An advanced text book on planning is this:

Steven M. LaValle: Planning Algorithms.


Cambridge University Press, 2006.
online: http://planning.cs.uiuc.edu/

1:15

Online resources
• VideoLecture by Oussama Khatib: http://academicearth.org/courses/introduction-to-robo
http://www.virtualprofessors.com/introduction-to-robotics-stanford-cs223a-khat
(focus on kinematics, dynamics, control)
• Oliver Brock’s lecture http://courses.robotics.tu-berlin.de/mediawiki/index.php/Robotics:_Schedule_WT09
• Stefan Schaal’s lecture Introduction to Robotics: http://www-clmc.usc.edu/Teaching/TeachingIn
(focus on control, useful: Basic Linear Control Theory (analytic solution to simple dynamic model
→ PID), chapter on dynamics)
• Chris Atkeson’s “Kinematics, Dynamic Systems, and Control” http://www.cs.cmu.edu/~cga/kdc/
(uses Schaal’s slides and LaValle’s book, useful: slides on 3d kinematics http://www.cs.cmu.edu/~cga/kdc/ewhitman1.pptx)
• CMU lecture “introduction to robotics” http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/16311/www/current/syllabus.html
(useful: PID control, simple BUGs algorithms for motion planning, non-holonomic constraints)

• Handbook of Robotics (partially online at Google books) http://tiny.cc/u6tzl


• LaValle’s Planning Algorithms http://planning.cs.uiuc.edu/
1:16

Organization
• Course Webpage:
http://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/
– Slides, Exercises & Software (C++)
– Links to books and other resources
• Admin things, please first ask:
Carola Stahl, [email protected], Room 2.217
• Tutorials: Wednesdays, 15:45-17:15 & 17:30-19:00 (0.108)
• Rules for the tutorials:

– Doing the exercises is crucial!


– At the beginning of each tutorial:
– sign into a list
– mark which exercises you have (successfully) worked on
– Students are randomly selected to present their solutions
– You need 50% of completed exercises to be allowed to the exam
– Please check 2 weeks before the end of the term, if you can take the exam
1:17

2 Kinematics

• Two “types of robotics”:

1) Mobile robotics – is all about localization & mapping
2) Manipulation – is all about interacting with the world
0) Kinematic/Dynamic Motion Control: same as 2) without ever making it to
interaction...

• Typical manipulation robots (and animals) are kinematic trees


Their pose/state is described by all joint angles
2:1

Basic motion generation problem


• Move all joints in a coordinated way so that the endeffector makes a desired
movement

01-kinematics: ./x.exe -mode 2/3/4


2:2

Outline

• Basic 3D geometry and notation

• Kinematics: φ : q ↦ y
• Inverse Kinematics: y* ↦ q* = argmin_q ||φ(q) − y*||²_C + ||q − q0||²_W
• Basic motion heuristics: Motion profiles

• Additional things to know


– Many simultaneous task variables

– Singularities, null space,


2:3

2.1 Basic 3D geometry & notation


2:4

Pose (position & orientation)

• A pose is described by a translation p ∈ R³ and a rotation R ∈ SO(3)
– R is an orthonormal matrix (orthogonal vectors stay orthogonal, unit vectors stay unit)
– R⁻¹ = Rᵀ
– columns and rows are orthogonal unit vectors
– det(R) = 1
– R = [[R11, R12, R13],
       [R21, R22, R23],
       [R31, R32, R33]]
2:5

Frame and coordinate transforms

• Let (o, e1:3) be the world frame, (o′, e′1:3) be the body’s frame.
The new basis vectors are the columns of R, that is, e′1 = R11 e1 + R21 e2 + R31 e3, etc.
• x = coordinates in world frame (o, e1:3)
x′ = coordinates in body frame (o′, e′1:3)
p = coordinates of o′ in world frame (o, e1:3)

x = p + Rx′
2:6
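To make the coordinate transformation concrete, here is a minimal sketch in C++ with Eigen (not part of the course code; all frame values are made up) that maps body coordinates x′ to world coordinates via x = p + Rx′ and back:

#include <Eigen/Dense>
#include <cmath>
#include <iostream>

int main() {
  // body frame: origin p (in world coordinates) and rotation R
  // (the columns of R are the body basis vectors e'_i in world coordinates)
  Eigen::Vector3d p(1.0, 0.5, 0.0);
  Eigen::Matrix3d R = Eigen::AngleAxisd(M_PI/4, Eigen::Vector3d::UnitZ()).toRotationMatrix();
  Eigen::Vector3d x_body(0.2, 0.0, 0.1);     // coordinates in the body frame
  Eigen::Vector3d x_world = p + R * x_body;  // x = p + R x'
  // inverse transform, using R^-1 = R^T for rotations: x' = R^T (x - p)
  Eigen::Vector3d x_back = R.transpose() * (x_world - p);
  std::cout << x_world.transpose() << "\n" << x_back.transpose() << "\n";
  return 0;
}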

Briefly: Alternative Rotation Representations

(See the “geometry notes” for more details!)


2:7

Euler angles
• Describe a rotation by consecutive rotations about different axes:
– 3-1-3 or 3-1-2 conventions, yaw-pitch-roll (3-2-1) in air flight
– first rotate φ about e3, then θ about the new e′1, then ψ about the new e″3
• Euler angles have a severe problem (Gimbal Lock):
– if two axes align: blocks 1 DoF of rotation!!
– “singularity” of Euler angles
– Example: 3-1-3 and second rotation 0 or π
2:8

Rotation vector
• vector w ∈ R³
– length |w| = θ is the rotation angle (in radians)
– direction w̄ = w/θ is the rotation axis
• Application to a vector v (Rodrigues’ formula):

w · v = cos θ v + sin θ (w̄ × v) + (1 − cos θ) w̄(w̄ᵀv)

• Conversion to matrix:

R(w) = exp(ŵ)
= cos θ I + sin θ ŵ/θ + (1 − cos θ) ww>/θ2
ŵ := [[0, −w3, w2],
      [w3, 0, −w1],
      [−w2, w1, 0]]
(ŵ is called the skew matrix, with the property ŵv = w × v; exp(·) is called the exponential map)
• Composition: convert to matrix first
• Drawback: singularity for small rotations
2:9
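As a sanity check of the formulas above, the following C++/Eigen sketch (assumed setup, not course code) applies a rotation vector to a point once via Rodrigues’ formula and once via the rotation matrix Eigen builds from angle and axis; both must agree:

#include <Eigen/Dense>
#include <cmath>
#include <iostream>

// apply rotation vector w (angle = |w|, axis = w/|w|) to v via Rodrigues' formula
Eigen::Vector3d rodrigues(const Eigen::Vector3d& w, const Eigen::Vector3d& v) {
  double theta = w.norm();
  if (theta < 1e-12) return v;      // the small-rotation singularity mentioned above
  Eigen::Vector3d u = w / theta;    // unit rotation axis
  return std::cos(theta) * v + std::sin(theta) * u.cross(v)
         + (1.0 - std::cos(theta)) * u * u.dot(v);
}

int main() {
  Eigen::Vector3d w(0.1, 0.4, -0.2), v(1.0, 0.0, 0.0);
  Eigen::Matrix3d R = Eigen::AngleAxisd(w.norm(), w.normalized()).toRotationMatrix();
  std::cout << (rodrigues(w, v) - R * v).norm() << "\n";  // ~0: both agree
  return 0;
}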

Quaternion
• A quaternion is r ∈ R⁴ with unit length |r|² = r0² + r1² + r2² + r3² = 1
• r = (r0, r̄) with r0 = cos(θ/2) and r̄ = sin(θ/2) w̄, where w̄ is the unit length rotation axis
• Conversion to matrix:
R(r) = [[1 − r22 − r33, r12 − r03, r13 + r02],
        [r12 + r03, 1 − r11 − r33, r23 − r01],
        [r13 − r02, r23 + r01, 1 − r11 − r22]]
with rij := 2 ri rj, and inversely
r0 = ½√(1 + tr R) , r3 = (R21 − R12)/(4r0) , r2 = (R13 − R31)/(4r0) , r1 = (R32 − R23)/(4r0)
• Composition:
r ∘ r′ = (r0 r0′ − r̄ᵀr̄′ , r0 r̄′ + r0′ r̄ + r̄′ × r̄)
• Application to a vector v: convert to matrix first
• Benefits: fast composition. No sin/cos computations. Use this!
2:10
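A short C++/Eigen sketch (illustrative values, not course code) of the two quaternion operations named above, composition and application to a vector:

#include <Eigen/Dense>
#include <iostream>

int main() {
  // two rotations, stored as unit quaternions (r0 = cos(theta/2), rbar = sin(theta/2) w)
  Eigen::Quaterniond r(Eigen::AngleAxisd(0.7, Eigen::Vector3d::UnitZ()));
  Eigen::Quaterniond r2(Eigen::AngleAxisd(0.3, Eigen::Vector3d::UnitX()));
  Eigen::Quaterniond c = r * r2;                  // composition: no sin/cos evaluations
  Eigen::Vector3d v(1.0, 2.0, 3.0);
  Eigen::Vector3d v1 = c.toRotationMatrix() * v;  // application: convert to matrix first
  Eigen::Vector3d v2 = r * (r2 * v);              // Eigen's quaternion-vector product
  std::cout << (v1 - v2).norm() << "\n";          // ~0: same rotation
  return 0;
}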

Homogeneous transformations
• x_A = coordinates of a point in frame A
x_B = coordinates of a point in frame B

• Translation and rotation: x_A = t + R x_B

• Homogeneous transform T ∈ R⁴ˣ⁴:

T_{A→B} = [[R, t], [0, 1]]

x_A = T_{A→B} x_B = [[R, t], [0, 1]] (x_B; 1) = (R x_B + t; 1)

where in homogeneous coordinates we append a 1 to all coordinate vectors
2:11

Is TA→B forward or backward?


• TA→B describes the translation and rotation of frame B relative to A
That is, it describes the forward FRAME transformation (from A to B)

• TA→B describes the coordinate transformation from xB to xA


That is, it describes the backward COORDINATE transformation

• Confused? Vectors (and frames) transform covariant, coordinates contra-variant.


See “geometry notes” or Wikipedia for more details, if you like.
2:12

Composition of transforms

TW →C = TW →A TA→B TB→C
xW = TW →A TA→B TB→C xC

2:13
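The composition rule is just matrix multiplication of the 4×4 homogeneous transforms. A minimal C++/Eigen sketch (made-up frames, not course code):

#include <Eigen/Dense>
#include <iostream>

// build a homogeneous transform from rotation R and translation t
Eigen::Matrix4d homog(const Eigen::Matrix3d& R, const Eigen::Vector3d& t) {
  Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
  T.topLeftCorner<3,3>() = R;
  T.topRightCorner<3,1>() = t;
  return T;
}

int main() {
  using namespace Eigen;
  Matrix4d T_WA = homog(AngleAxisd(0.5, Vector3d::UnitZ()).toRotationMatrix(), Vector3d(1, 0, 0));
  Matrix4d T_AB = homog(AngleAxisd(-0.2, Vector3d::UnitY()).toRotationMatrix(), Vector3d(0, 1, 0));
  Vector4d xB(0.3, 0.1, 0.0, 1.0);     // homogeneous coordinates: a 1 is appended
  Vector4d xW = T_WA * T_AB * xB;      // x_W = T_{W->A} T_{A->B} x_B
  std::cout << xW.transpose() << "\n";
  return 0;
}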

2.2 Kinematics
2:14

Kinematics
(figure: a kinematic chain from the world frame W through links A, B, C to the endeffector eff, alternating rigid link transforms and q-dependent joint transforms A→A′, B→B′, C→C′)

• A kinematic structure is a graph (usually tree or chain) of rigid links and joints

T_{W→eff}(q) = T_{W→A} T_{A→A′}(q) T_{A′→B} T_{B→B′}(q) T_{B′→C} T_{C→C′}(q) T_{C′→eff}

2:15

Joint types
• Joint transformations: T_{A→A′}(q) depends on q ∈ Rⁿ

revolute joint: joint angle q ∈ R determines rotation about the x-axis:

T_{A→A′}(q) = [[1, 0, 0, 0],
               [0, cos q, −sin q, 0],
               [0, sin q, cos q, 0],
               [0, 0, 0, 1]]

prismatic joint: offset q ∈ R determines translation along the x-axis:

T_{A→A′}(q) = [[1, 0, 0, q],
               [0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]]

others: screw (1 dof), cylindrical (2 dof), spherical (3 dof), universal (2 dof)


2:16

2:17

Kinematic Map

• For any joint angle vector q ∈ Rn we can compute TW →eff (q)


by forward chaining of transformations

TW →eff (q) gives us the pose of the endeffector in the world frame

• In general, a kinematic map is any (differentiable) mapping


φ : q 7→ y

that maps to some arbitrary feature y ∈ Rd of the pose q ∈ Rn


2:18

Kinematic Map
• The three most important examples of a kinematic map φ are:
– A position v on the endeffector transformed to world coordinates:

φ^pos_{eff,v}(q) = T_{W→eff}(q) v ∈ R³

– A direction v ∈ R³ attached to the endeffector, in world coordinates:

φ^vec_{eff,v}(q) = R_{W→eff}(q) v ∈ R³

where R_{A→B} is the rotation in T_{A→B}.
– The (quaternion) orientation of the endeffector:

φ^quat_eff(q) = the rotation R_{W→eff}(q) expressed as a quaternion ∈ R⁴

• See the technical reference later for more kinematic maps, especially relative
position, direction and quaternion maps.
2:19

Jacobian
• When we change the joint angles, δq, how does the effector position change, δy?

• Given the kinematic map y = φ(q) and its Jacobian J(q) = ∂φ(q)/∂q, we have:

δy = J(q) δq

J(q) = ∂φ(q)/∂q = [[∂φ1/∂q1, ∂φ1/∂q2, ..., ∂φ1/∂qn],
                   [∂φ2/∂q1, ∂φ2/∂q2, ..., ∂φ2/∂qn],
                   [...],
                   [∂φd/∂q1, ∂φd/∂q2, ..., ∂φd/∂qn]] ∈ R^{d×n}
2:20
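A generic way to check (or approximate) such a Jacobian is by finite differences: perturb each joint separately and measure the change of φ. The sketch below (C++/Eigen; the 2-link planar arm is a hypothetical stand-in for a real kinematic map) builds J column by column:

#include <Eigen/Dense>
#include <cmath>
#include <functional>
#include <iostream>

// finite-difference Jacobian of an arbitrary kinematic map phi: R^n -> R^d
Eigen::MatrixXd numJacobian(const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& phi,
                            const Eigen::VectorXd& q, double eps = 1e-6) {
  Eigen::VectorXd y0 = phi(q);
  Eigen::MatrixXd J(y0.size(), q.size());
  for (int i = 0; i < q.size(); ++i) {
    Eigen::VectorXd dq = q;
    dq(i) += eps;
    J.col(i) = (phi(dq) - y0) / eps;  // column i approximates d phi / d q_i
  }
  return J;
}

int main() {
  // toy kinematic map: endeffector position of a planar 2-link arm, unit link lengths
  auto phi = [](const Eigen::VectorXd& q) {
    Eigen::VectorXd y(2);
    y << std::cos(q(0)) + std::cos(q(0) + q(1)),
         std::sin(q(0)) + std::sin(q(0) + q(1));
    return y;
  };
  Eigen::VectorXd q(2);
  q << 0.3, 0.5;
  std::cout << numJacobian(phi, q) << "\n";
  return 0;
}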

Jacobian for a rotational degree of freedom

(figure: endeffector with attached point and vector; the i-th joint with its rotation axis)

• Assume the i-th joint is located at p_i = t_{W→i}(q) and has rotation axis a_i = R_{W→i}(q) (1, 0, 0)ᵀ
• We consider an infinitesimal variation δq_i ∈ R of the i-th joint and see how an
endeffector position p_eff = φ^pos_{eff,v}(q) and attached vector a_eff = φ^vec_{eff,v}(q) change.
2:21

Jacobian for a rotational degree of freedom

Consider a variation δq_i → the whole sub-tree rotates:

δp_eff = [a_i × (p_eff − p_i)] δq_i
δa_eff = [a_i × a_eff] δq_i

⇒ Position Jacobian (one column per joint):

J^pos_{eff,v}(q) = [a_1 × (p_eff − p_1), a_2 × (p_eff − p_2), ..., a_n × (p_eff − p_n)] ∈ R^{3×n}

⇒ Vector Jacobian:

J^vec_{eff,v}(q) = [a_1 × a_eff, a_2 × a_eff, ..., a_n × a_eff] ∈ R^{3×n}
2:22
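For the same toy 2-link planar arm as before (both rotation axes along world z), the cross-product formulas above give the position Jacobian directly; a sketch under these assumptions:

#include <Eigen/Dense>
#include <cmath>
#include <iostream>

int main() {
  using namespace Eigen;
  double q1 = 0.3, q2 = 0.5;                    // joint angles
  Vector3d p1(0, 0, 0);                         // position of joint 1
  Vector3d p2(std::cos(q1), std::sin(q1), 0);   // position of joint 2
  Vector3d peff = p2 + Vector3d(std::cos(q1 + q2), std::sin(q1 + q2), 0);
  Vector3d a = Vector3d::UnitZ();               // rotation axis of both joints
  MatrixXd J(3, 2);
  J.col(0) = a.cross(peff - p1);                // column i: a_i x (p_eff - p_i)
  J.col(1) = a.cross(peff - p2);
  std::cout << J << "\n";  // rows 1-2 match the finite-difference Jacobian from before
  return 0;
}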

Jacobian for general degrees of freedom


• Every degree of freedom qi generates (infinitesimally, at a given q)
– a rotation around axis ai at point pi
– and/or a translation along the axis bi
For instance:
– the DOF of a hinge joint just creates a rotation around a_i at p_i
– the DOF of a prismatic joint creates a translation along b_i
– the DOF of a rolling cylinder creates rotation and translation
– the first DOF of a cylindrical joint generates a translation, its second DOF a rotation

• We can compute all Jacobians from knowing ai , pi and bi for all DOFs (in the
current configuration q ∈ Rn )
2:23

2.3 Inverse Kinematics


2:24

Inverse Kinematics problem


• Generally, the aim is to find a robot configuration q such that φ(q) = y ∗

• If φ is invertible:
q* = φ⁻¹(y*)

• But in general, φ will not be invertible:

1) The pre-image φ⁻¹(y*) may be empty: no configuration can generate the desired y*

2) The pre-image φ⁻¹(y*) may be large: many configurations can generate the desired y*
2:25

Inverse Kinematics as optimization problem


• We formalize the inverse kinematics problem as an optimization problem

q* = argmin_q ||φ(q) − y*||²_C + ||q − q0||²_W

• The 1st term ensures that we find a configuration even if y* is not exactly reachable
The 2nd term disambiguates the configurations if there are many in the pre-image φ⁻¹(y*)

2:26

Inverse Kinematics as optimization problem

q* = argmin_q ||φ(q) − y*||²_C + ||q − q0||²_W

• The formulation of IK as an optimization problem is very powerful and has


many nice properties
• We will be able to take the limit C → ∞, enforcing exact φ(q) = y* if possible
• Non-zero C⁻¹ and W correspond to a regularization that ensures numeric stability
• Classical concepts can be derived as special cases:
– Null-space motion
– regularization; singularity robustness
– multiple tasks
– hierarchical tasks
2:27

Solving Inverse Kinematics


• The obvious choice of optimization method for this problem is Gauss-Newton,
using the Jacobian of φ
• We first describe just one step of this, which leads to the classical equations for
inverse kinematics using the local Jacobian...
2:28

Solution using the local linearization


• When using the local linearization of φ at q0 ,

φ(q) ≈ y0 + J (q − q0 ) , y0 = φ(q0 )

• We can derive the optimum as

f (q) = ||φ(q) − y ∗ ||2C + ||q − q0 ||2W


= ||y0 − y ∗ + J (q − q0 )||2C + ||q − q0 ||2W

f (q) = 0> = 2(y0 − y ∗ + J (q − q0 ))>CJ + 2(q − q0 )T W
∂q
J>C (y ∗ − y0 ) = (J>CJ + W ) (q − q0 )

q ∗ = q0 + J ] (y ∗ − y0 )

with J ] = (J>CJ + W )-1 J>C = W -1 J>(JW -1 J> + C -1 )-1 (Woodbury identity)

– For C → ∞ and W = I, J ] = J>(JJ>)-1 is called pseudo-inverse


– W generalizes the metric in q-space
– C regularizes this pseudo-inverse (see later section on singularities)
2:29
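The closed-form step above translates directly into code. A sketch in C++/Eigen (isotropic metrics W = w·I and C⁻¹ = cInv·I are assumed here for simplicity):

#include <Eigen/Dense>

// one inverse kinematics step q* = q0 + J# (y* - y0),
// with J# = W^-1 J^T (J W^-1 J^T + C^-1)^-1 (regularized pseudo-inverse)
Eigen::VectorXd ikStep(const Eigen::VectorXd& q0, const Eigen::VectorXd& y0,
                       const Eigen::VectorXd& ystar, const Eigen::MatrixXd& J,
                       double w = 1.0, double cInv = 1e-4) {
  const int d = y0.size();
  Eigen::MatrixXd A = J * J.transpose() / w
                    + cInv * Eigen::MatrixXd::Identity(d, d);   // J W^-1 J^T + C^-1
  // q* = q0 + W^-1 J^T A^-1 (y* - y0); solve instead of explicitly inverting A
  return q0 + J.transpose() * A.llt().solve(ystar - y0) / w;
}

Because of the cInv regularization the step stays finite near singularities (see the singularity slide later in this section).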

“Small step” application


• This approximate solution to IK makes sense
– if the local linearization of φ at q0 is “good”
– if q0 and q ∗ are close
• This equation is therefore typically used to iteratively compute small steps in
configuration space

q_{t+1} = q_t + J♯(y_{t+1} − φ(q_t))

where the target yt+1 moves smoothly with t
2:30

Example: Iterating IK to follow a trajectory


• Assume initial posture q0 . We want to reach a desired endeff position y ∗ in T
steps:

Input: initial state q0, desired y*, methods φ^pos and J^pos
Output: trajectory q_{0:T}
1: Set y0 = φ^pos(q0) // starting endeff position
2: for t = 1 : T do
3: y ← φ^pos(q_{t-1}) // current endeff position
4: J ← J^pos(q_{t-1}) // current endeff Jacobian
5: ŷ ← y0 + (t/T)(y* − y0) // interpolated endeff target
6: q_t = q_{t-1} + J♯(ŷ − y) // new joint positions
7: Command q_t to all robot motors and compute all T_{W→i}(q_t)
8: end for

01-kinematics: ./x.exe -mode 2/3

• Why does this not follow the interpolated trajectory ŷ0:T exactly?
– What happens if T = 1 and y ∗ is far?
2:31
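In C++ the loop might look as follows (a sketch; phi and jac are hypothetical callbacks standing in for the course’s φ^pos and J^pos, and a damped pseudo-inverse replaces J♯):

#include <Eigen/Dense>
#include <functional>
#include <vector>

std::vector<Eigen::VectorXd> ikTrajectory(
    Eigen::VectorXd q,                                                  // q0
    const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& phi,  // endeff position
    const std::function<Eigen::MatrixXd(const Eigen::VectorXd&)>& jac,  // endeff Jacobian
    const Eigen::VectorXd& ystar, int T) {
  std::vector<Eigen::VectorXd> traj{q};
  const Eigen::VectorXd y0 = phi(q);
  for (int t = 1; t <= T; ++t) {
    Eigen::VectorXd y = phi(q);
    Eigen::MatrixXd J = jac(q);
    Eigen::VectorXd yhat = y0 + (double(t) / T) * (ystar - y0);  // interpolated target
    Eigen::MatrixXd A = J * J.transpose()                        // damped pseudo-inverse
                      + 1e-4 * Eigen::MatrixXd::Identity(y.size(), y.size());
    q += J.transpose() * A.llt().solve(yhat - y);  // q_t = q_{t-1} + J#(yhat - y)
    traj.push_back(q);  // on a real robot: command q_t to the motors here
  }
  return traj;
}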

Two additional notes


• What if we linearize at some arbitrary q′ instead of q0?
φ(q) ≈ y′ + J(q − q′) , y′ = φ(q′)
q* = argmin_q ||φ(q) − y*||²_C + ||q − q′ + (q′ − q0)||²_W
   = q′ + J♯(y* − y′) + (I − J♯J) h , h = q0 − q′ (1)

• What if we want to find the exact (local) optimum? E.g. what if we want to
compute a big step (where q* will be remote from the linearization point) and
we cannot rely only on the local linearization approximation?

– Iterate equation (1) (optionally with a step size < 1 to ensure convergence) by setting the linearization point q′ to the current q*
– This is equivalent to the Gauss-Newton algorithm
2:32

Where are we?


• We’ve derived a basic motion generation principle in robotics from
– an understanding of robot geometry & kinematics
– a basic notion of optimality

• In the remainder:
A. Discussion of classical concepts
B. Heuristic motion profiles for simple trajectory generation
C. Extension to multiple task variables
2:33

Discussion of classical concepts

– Singularity and singularity-robustness


– Nullspace, task/operational space, joint space
– “inverse kinematics” ↔ “motion rate control”
2:34

Singularity
• In general: A matrix J singular ⇐⇒ rank(J) < d
– rows of J are linearly dependent
– dimension of image is < d
– δy = Jδq ⇒ dimensions of δy limited
– Intuition: arm fully stretched

• Implications:
det(JJᵀ) = 0
→ the pseudo-inverse Jᵀ(JJᵀ)⁻¹ is ill-defined!
→ inverse kinematics δq = Jᵀ(JJᵀ)⁻¹δy computes “infinite” steps!

• Singularity robust pseudo-inverse Jᵀ(JJᵀ + εI)⁻¹
The term εI is called regularization
• Recall our general solution (for W = I)
J♯ = Jᵀ(JJᵀ + C⁻¹)⁻¹

is already singularity robust


2:35

Null/task/operational/joint/configuration spaces
• The space of all q ∈ Rn is called joint/configuration space
The space of all y ∈ Rd is called task/operational space
Usually d < n, which is called redundancy

• For a desired endeffector state y ∗ there exists a whole manifold (assuming φ is


smooth) of joint configurations q:

nullspace(y ∗ ) = {q | φ(q) = y ∗ }

• We have

δq* = argmin_δq ||δq − a||²_W + ||Jδq − δy||²_C
    = J♯δy + (I − J♯J)a , J♯ = W⁻¹Jᵀ(JW⁻¹Jᵀ + C⁻¹)⁻¹

In the limit C → ∞ it is guaranteed that Jδq = δy (we are exactly on the manifold). The term a introduces additional “nullspace motion”.
2:36

Inverse Kinematics and Motion Rate Control


Some clarification of concepts:

• The notion “kinematics” describes the mapping φ : q 7→ y, which usually is a


many-to-one function.
• The notion “inverse kinematics” in the strict sense describes some mapping
g : y 7→ q such that φ(g(y)) = y, which usually is non-unique or ill-defined.
• In practice, one often refers to δq = J♯δy as inverse kinematics.

• When iterating δq = J♯δy in a control cycle with time step τ (typically τ ≈ 1−10
msec), then ẏ = δy/τ and q̇ = δq/τ, hence q̇ = J♯ẏ. Therefore the control cycle
effectively controls the endeffector velocity; this is why it is called motion rate
control.
2:37

Heuristic motion profiles


2:38

Heuristic motion profiles


• Assume initially x = 0, ẋ = 0. After 1 second you want x = 1, ẋ = 0.
How do you move from x = 0 to x = 1 in one second?

The sine profile x_t = x0 + ½[1 − cos(πt/T)](x_T − x0) is a compromise between low
max-acceleration and low max-velocity
Taken from http://www.20sim.com/webhelp/toolboxes/mechatronics_toolbox/motion_profile_wizard/motionprofiles.htm
2:39

Motion profiles
• Generally, let’s define a motion profile as a mapping

MP : [0, 1] ↦ [0, 1]

with MP(0) = 0 and MP(1) = 1, such that the interpolation is given as

x_t = x0 + MP(t/T) (x_T − x0)

• For example:

MP_ramp(s) = s
MP_sin(s) = ½[1 − cos(πs)]
2:40
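Both profiles are one-liners. The sketch below (plain C++, toy values) prints the sine-profile interpolation from x0 = 0 to x_T = 1, which starts and ends with zero velocity:

#include <cmath>
#include <iostream>

// motion profiles MP: [0,1] -> [0,1] with MP(0) = 0 and MP(1) = 1
double MPramp(double s) { return s; }
double MPsin(double s) { return 0.5 * (1.0 - std::cos(M_PI * s)); }

int main() {
  const double x0 = 0.0, xT = 1.0;
  const int T = 10;
  for (int t = 0; t <= T; ++t) {
    // interpolation x_t = x_0 + MP(t/T) (x_T - x_0)
    std::cout << x0 + MPsin(double(t) / T) * (xT - x0) << " ";
  }
  std::cout << "\n";
  return 0;
}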

Joint space interpolation



1) Optimize a desired final configuration qT :


Given a desired final task value y_T, optimize a final joint state q_T to minimize the function

f(q_T) = ||q_T − q0||²_{W/T} + ||y_T − φ(q_T)||²_C

– The metric (1/T)W is consistent with T cost terms with step metric W.
– In this optimization, q_T will end up remote from q0. So we need to iterate Gauss-Newton, as described on slide 2.3.

2) Compute q_{0:T} as interpolation between q0 and q_T:

Given the initial configuration q0 and the final q_T, interpolate on a straight line with some motion profile. E.g.,

q_t = q0 + MP(t/T) (q_T − q0)

2:41

Task space interpolation

1) Compute y_{0:T} as interpolation between y0 and y_T:

Given an initial task value y0 and a desired final task value y_T, interpolate on a straight line with some motion profile. E.g.,

y_t = y0 + MP(t/T) (y_T − y0)

2) Project y_{0:T} to q_{0:T} using inverse kinematics:

Given the task trajectory y_{0:T}, compute a corresponding joint trajectory q_{0:T} using inverse kinematics

q_{t+1} = q_t + J♯(y_{t+1} − φ(q_t))

(As steps are small, we should be OK with just using this local linearization.)
2:42

peg-in-a-hole demo
2:43

2.4 Multiple tasks


2:44

Multiple tasks

2:45

Multiple tasks

(figure: humanoid with two simultaneous tasks, RightHand position and LeftHand position)

2:46

Multiple tasks
• Assume we have m simultaneous tasks; for each task i we have:
– a kinematic map φ_i : Rⁿ → R^{d_i}
– a current value φ_i(q_t)
– a desired value y_i*
– a precision ϱ_i (equiv. to a task cost metric C_i = ϱ_i I)

• Each task contributes a term to the objective function

q* = argmin_q ||q − q0||²_W + ϱ_1||φ_1(q) − y_1*||² + ϱ_2||φ_2(q) − y_2*||² + ···

which we can also write as

q* = argmin_q ||q − q0||²_W + ||Φ(q)||²

where Φ(q) := (√ϱ_1 (φ_1(q) − y_1*), √ϱ_2 (φ_2(q) − y_2*), ...)ᵀ ∈ R^{Σ_i d_i} stacks all scaled task errors
2:47

Multiple tasks
• We can “pack” together all tasks in one “big task” Φ.

Example: We want to control the 3D position of the left hand and of the right hand.
Both are “packed” to one 6-dimensional task vector which becomes zero if both tasks
are fulfilled.

• The big Φ is scaled/normalized in a way that


– the desired value is always zero
– the cost metric is I

• Using the local linearization of Φ at q0, J = ∂Φ(q0)/∂q, the optimum is

q* = argmin_q ||q − q0||²_W + ||Φ(q)||²
   ≈ q0 − (JᵀJ + W)⁻¹Jᵀ Φ(q0) = q0 − J♯ Φ(q0)

2:48

Multiple tasks

• We learnt how to “puppeteer a robot”

• We can handle many task variables (but specifying their precisions ϱ_i becomes cumbersome...)

• In the remainder:
A. Classical limit of “hierarchical IK” and nullspace motion
B. What are interesting task variables?

2:49

Hierarchical IK & nullspace motion


• In the classical view, tasks should be executed exactly, which means taking the limit
ϱ_i → ∞ in some prespecified hierarchical order.
• We can rewrite the solution in a way that allows for such a hierarchical limit:
• One task plus “nullspace motion”:

f(q) = ||q − a||²_W + ϱ_1||J_1 q − y_1||²

q* = [W + ϱ_1 J_1ᵀJ_1]⁻¹[W a + ϱ_1 J_1ᵀ y_1]
   = J_1♯ y_1 + (I − J_1♯ J_1) a
J_1♯ = (W/ϱ_1 + J_1ᵀJ_1)⁻¹J_1ᵀ = W⁻¹J_1ᵀ(J_1 W⁻¹J_1ᵀ + I/ϱ_1)⁻¹

• Two tasks plus nullspace motion:

f(q) = ||q − a||²_W + ϱ_1||J_1 q − y_1||² + ϱ_2||J_2 q − y_2||²

q* = J_1♯ y_1 + (I − J_1♯ J_1)[J_2♯ y_2 + (I − J_2♯ J_2) a]
J_2♯ = (W/ϱ_2 + J_2ᵀJ_2)⁻¹J_2ᵀ = W⁻¹J_2ᵀ(J_2 W⁻¹J_2ᵀ + I/ϱ_2)⁻¹

• etc...
2:50

Hierarchical IK & nullspace motion


• The previous slide did nothing but rewrite the nice solution q* = q0 − J♯Φ(q0)
(for the “big” Φ) in a strange hierarchical way that allows one to “see” the nullspace
projection

• The benefit of this hierarchical way of writing the solution is that one can take
the hierarchical limit ϱ_i → ∞ and retrieve classical hierarchical IK

• The drawbacks are:
– It is somewhat ugly
– In practice, I would recommend regularization in any case (for numeric stability).
Regularization corresponds to NOT taking the full limit ϱ_i → ∞. Then the hierarchical
way of writing the solution is unnecessary. (However, it points to a “hierarchical
regularization”, which might be numerically more robust for very small regularization?)
– The general solution allows for arbitrary blending of tasks
2:51

Reference: interesting task variables

The following slides will define 10 different types of task variables.


This is meant as a reference and to give an idea of possibilities...
2:52

Position
Position of some point attached to link i
dimension: d = 3
parameters: link index i, point offset v
kin. map: φ^pos_iv(q) = T_{W→i} v
Jacobian: J^pos_iv(q)_{·k} = [k ≺ i] a_k × (φ^pos_iv(q) − p_k)

Notation:
– a_k, p_k are axis and position of joint k
– [k ≺ i] indicates whether joint k is between root and link i
– J_{·k} is the k-th column of J
2:53

Vector
Vector attached to link i
dimension: d = 3
parameters: link index i, attached vector v
kin. map: φ^vec_iv(q) = R_{W→i} v
Jacobian: J^vec_iv(q) = A_i × φ^vec_iv(q)

Notation:
– A_i is a matrix with columns (A_i)_{·k} = [k ≺ i] a_k containing the joint axes or zeros
– the short notation “A × p” means that each column in A takes the cross-product with p
2:54

Relative position
Position of a point on link i relative to a point on link j
dimension: d = 3
parameters: link indices i, j, point offset v in i and w in j
kin. map: φ^pos_{iv|jw}(q) = R_j⁻¹(φ^pos_iv − φ^pos_jw)
Jacobian: J^pos_{iv|jw}(q) = R_j⁻¹[J^pos_iv − J^pos_jw − A_j × (φ^pos_iv − φ^pos_jw)]

Derivation:
For y = Rp the derivative w.r.t. a rotation around axis a is y′ = Rp′ + R′p = Rp′ + a × Rp.
For y = R⁻¹p the derivative is y′ = R⁻¹p′ − R⁻¹(R′)R⁻¹p = R⁻¹(p′ − a × p). (For details see
http://ipvs.informatik.uni-stuttgart.de/mlr/marc/notes/3d-geometry.pdf)

2:55

Relative vector
Vector attached to link i relative to link j
dimension: d = 3
parameters: link indices i, j, attached vector v in i
kin. map: φ^vec_{iv|j}(q) = R_j⁻¹ φ^vec_iv
Jacobian: J^vec_{iv|j}(q) = R_j⁻¹[J^vec_iv − A_j × φ^vec_iv]
2:56

Alignment
Alignment of a vector attached to link i with a reference v*
dimension: d = 1
parameters: link index i, attached vector v, world reference v*
kin. map: φ^align_iv(q) = v*ᵀ φ^vec_iv
Jacobian: J^align_iv(q) = v*ᵀ J^vec_iv

Note: φ^align = 1 ↔ align, φ^align = −1 ↔ anti-align, φ^align = 0 ↔ orthogonal


2:57

Relative Alignment
Alignment of a vector attached to link i with a vector attached to link j
dimension: d = 1
parameters: link indices i, j, attached vectors v, w
kin. map: φ^align_{iv|jw}(q) = (φ^vec_jw)ᵀ φ^vec_iv
Jacobian: J^align_{iv|jw}(q) = (φ^vec_jw)ᵀ J^vec_iv + (φ^vec_iv)ᵀ J^vec_jw
2:58

Joint limits
Penetration of joint limits
dimension: d = 1
parameters: joint limits q_low, q_hi, margin m
kin. map: φ^limits(q) = (1/m) Σ_{i=1}^n [m − q_i + q_low]⁺ + [m + q_i − q_hi]⁺
Jacobian: J^limits(q)_{1,i} = −(1/m)[m − q_i + q_low > 0] + (1/m)[m + q_i − q_hi > 0]

where [x]⁺ = (x > 0 ? x : 0) and [···] is the indicator function

2:59

Collision limits
Penetration of collision limits
dimension: d = 1
parameters: margin m
kin. map: φ^col(q) = (1/m) Σ_{k=1}^K [m − |p_k^a − p_k^b|]⁺
Jacobian: J^col(q) = (1/m) Σ_{k=1}^K [m − |p_k^a − p_k^b| > 0] (−J^pos_{p_k^a} + J^pos_{p_k^b})ᵀ (p_k^a − p_k^b)/|p_k^a − p_k^b|

A collision detection engine returns a set {(a, b, p^a, p^b)_{k=1}^K} of potential collisions
between links a_k and b_k, with nearest points p_k^a on a and p_k^b on b.
2:60

Center of gravity
Center of gravity of the whole kinematic structure
dimension: d = 3
parameters: (none)
kin. map: φ^cog(q) = Σ_i mass_i φ^pos_{i,c_i}
Jacobian: J^cog(q) = Σ_i mass_i J^pos_{i,c_i}

where c_i denotes the center-of-mass of link i (in its own frame)
2:61

Homing

The joint angles themselves
dimension: d = n
parameters: (none)
kin. map: φ^qitself(q) = q
Jacobian: J^qitself(q) = I_n

Example: Set the target y* = 0 and the precision ϱ very low → this task describes
posture comfortness in terms of deviation from the joints’ zero position. In the classical
view, it induces “nullspace motion”.
2:62

Task variables – conclusions

• There is much space for creativity in defining task variables! Many are extensions
of φ^pos and φ^vec, and their Jacobians combine the basic Jacobians.

• What the right task variables are to design/describe motion is a very hard problem!
In what task space do humans control their motion? Possible to learn from data
(“task space retrieval”) or perhaps via Reinforcement Learning.

• In practice: robot motion design (including grasping) may require cumbersome
hand-tuning of such task variables.
2:63

• Technical Reference: the four rotation axes of a quaternion joint:

A quaternion joint has four DOFs. If it is currently in configuration q ∈ R⁴, the i-th DOF generates
(infinitesimally) a rotation around the axis

a_i = −2/√(q²) [e_i ∘ q⁻¹]_{1:3}

where e_i ∈ R⁴ is the i-th unit vector, ∘ is the concatenation of quaternions, q⁻¹ the inverse quaternion,
q² the quaternion 2-norm (in case it is not normalized), and [·]_{1:3} picks the vector elements
of the quaternion (derivation: see geometry notes). As for the hinge joint, these four axes are further
transformed to world coordinates, a_i ← R_{W→j} a_i, if the quaternion joint is located in the
coordinate frame j.
2:64

3 Dynamics

Kinematic vs. dynamic motion control:

• Kinematic: instantly set joint velocities q̇: δq_t = J♯(y* − φ(q_t));
accounts for kinematic coupling of joints but ignores inertia, forces, torques;
gears, stiff, all of industrial robots

• Dynamic: instantly set joint torques u: u = ?;
accounts for dynamic coupling of joints and full Newtonian physics;
future robots, compliant, few research robots

3:1

When velocities cannot be changed/set arbitrarily


• Examples:
– An air plane flying: You cannot command it to hold still in the air, or to move
straight up.
– A car: you cannot command it to move side-wards.
– Your arm: you cannot command it to throw a ball with arbitrary speed (force limits).
– A torque controlled robot: You cannot command it to instantly change velocity (infinite acceleration/torque).

• What all examples have in common:


– One can set controls u_t (air plane’s control stick, car’s steering wheel, your muscle
activations, torque/voltage/current sent to a robot’s motors)

– But these controls only indirectly influence the dynamics of state

xt+1 = f (xt , ut )
3:2

Dynamics
• The dynamics of a system describes how the controls ut influence the change-
of-state of the system
xt+1 = f (xt , ut )

– The notation xt refers to the dynamic state of the system: e.g., joint positions and
velocities xt = (qt , q̇t ).
– f is an arbitrary function, often smooth
3:3

Outline
• We start by discussing a 1D point mass for 3 reasons:
– The most basic force-controlled system with inertia
– We can introduce and understand PID control
– The behavior of a point mass under PID control is a reference that we can also follow
with arbitrary dynamic robots (if the dynamics are known)

• We discuss computing the dynamics of general robotic systems


– Euler-Lagrange equations
– Euler-Newton method
• We derive the dynamic equivalent of inverse kinematics:
– operational space control
3:4

3.1 PID and a 1D point mass


3:5

The dynamics of a 1D point mass


• Start with simplest possible example: 1D point mass
(no gravity, no friction, just a single mass)

• The state x(t) = (q(t), q̇(t)) is described by:


– position q(t) ∈ R
– velocity q̇(t) ∈ R

• The control u(t) is the force we apply on the mass point

• The system dynamics is:

q̈(t) = u(t)/m

3:6

1D point mass – proportional feedback


• Assume current position is q.
The goal is to move it to the position q ∗ .

What can we do?

• Idea 1:
“Always pull the mass towards the goal q ∗ :”

u = Kp (q ∗ − q)

3:7

1D point mass – proportional feedback


• What’s the effect of this control law?

m q̈ = u = Kp(q* − q)

q = q(t) is a function of time; this is a second order differential equation

• Solution: assume q(t) = a + b e^{ωt}
(a “non-imaginary” alternative would be q(t) = a + b e^{−λt} cos(ωt))

m b ω² e^{ωt} = Kp q* − Kp a − Kp b e^{ωt}
(m b ω² + Kp b) e^{ωt} = Kp(q* − a)
⇒ (m b ω² + Kp b) = 0 ∧ (q* − a) = 0
⇒ ω = i √(Kp/m)

q(t) = q* + b e^{i√(Kp/m) t}

This is an oscillation around q* with amplitude b = q(0) − q* and frequency √(Kp/m)!
3:8

1D point mass – proportional feedback

m q̈ = u = Kp(q* − q)

q(t) = q* + b e^{i√(Kp/m) t}

Oscillation around q* with amplitude b = q(0) − q* and frequency √(Kp/m)

(figure: undamped cosine oscillation)

3:9

1D point mass – derivative feedback


• Idea 2
“Pull less, when we’re heading the right direction already:”
“Damp the system:”

u = Kp (q ∗ − q) + Kd (q̇ ∗ − q̇)

q̇ ∗ is a desired goal velocity


For simplicity we set q̇ ∗ = 0 in the following.

3:10

1D point mass – derivative feedback


• What’s the effect of this control law?

m q̈ = u = Kp(q* − q) + Kd(0 − q̇)

• Solution: again assume q(t) = a + b e^{ωt}

m b ω² e^{ωt} = Kp q* − Kp a − Kp b e^{ωt} − Kd b ω e^{ωt}
(m b ω² + Kd b ω + Kp b) e^{ωt} = Kp(q* − a)
⇒ (m ω² + Kd ω + Kp) = 0 ∧ (q* − a) = 0
⇒ ω = (−Kd ± √(Kd² − 4mKp)) / 2m

q(t) = q* + b e^{ωt}

The term −Kd/2m in ω is real ↔ exponential decay (damping)
3:11

1D point mass – derivative feedback


q(t) = q* + b e^{ωt} , ω = (−Kd ± √(Kd² − 4mKp)) / 2m

• Effect of the second term √(Kd² − 4mKp) / 2m in ω:

Kd² < 4mKp ⇒ ω has an imaginary part:
oscillating with frequency √(Kp/m − Kd²/4m²),
q(t) = q* + b e^{−Kd/2m t} e^{i√(Kp/m − Kd²/4m²) t}

Kd² > 4mKp ⇒ ω real:
strongly damped

Kd² = 4mKp ⇒ second term zero:
only exponential decay
3:12

1D point mass – derivative feedback

illustration from O. Brock’s lecture


3:13

1D point mass – derivative feedback


Alternative parameterization:
Instead of the gains Kp and Kd it is sometimes more intuitive to set

• the wave length λ = 1/ω0 = 1/√(Kp/m) , i.e. Kp = m/λ²
• the damping ratio ξ = Kd/√(4mKp) = λKd/2m , i.e. Kd = 2mξ/λ

ξ > 1: over-damped
ξ = 1: critically damped
ξ < 1: oscillatory-damped

q(t) = q* + b e^{−ξt/λ} e^{i√(1−ξ²) t/λ}
3:14
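The following sketch (plain C++, made-up mass and gains, not course code) simulates the 1D point mass under PD control with simple Euler integration, parameterized by wave length and damping ratio as above:

#include <iostream>

int main() {
  const double m = 1.0, lambda = 0.2;       // mass and desired wave length
  const double Kp = m / (lambda * lambda);  // from lambda = 1/sqrt(Kp/m)
  const double xi = 1.0;                    // damping ratio: 1 = critically damped
  const double Kd = 2.0 * m * xi / lambda;
  double q = 0.0, qd = 0.0;
  const double qstar = 1.0, tau = 0.001;    // target and time step
  for (int t = 0; t < 3000; ++t) {
    double u = Kp * (qstar - q) + Kd * (0.0 - qd);  // PD law, desired velocity 0
    qd += tau * (u / m);                            // Euler integration of q'' = u/m
    q  += tau * qd;
    if (t % 500 == 0) std::cout << q << " ";
  }
  std::cout << "\n";  // approaches q* = 1 without overshoot for xi = 1
  return 0;
}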

1D point mass – integral feedback


• Idea 3
“Pull if the position error accumulated large in the past:”

u = Kp(q* − q) + Kd(q̇* − q̇) + Ki ∫₀ᵗ (q*(s) − q(s)) ds

• This is not a linear ODE w.r.t. x = (q, q̇).
However, when we extend the state to x = (q, q̇, e) we have the ODE

q̇ = q̇
q̈ = u/m = Kp/m (q* − q) + Kd/m (q̇* − q̇) + Ki/m e
ė = q* − q

(no explicit discussion here)


3:15

1D point mass – PID control

u = Kp(q* − q) + Kd(q̇* − q̇) + Ki ∫₀ᵗ (q*(s) − q(s)) ds

• PID control
– Proportional Control (“Position Control”): u ∝ Kp(q* − q)
– Derivative Control (“Damping”): u ∝ Kd(q̇* − q̇) (q̇* = 0 → damping)
– Integral Control (“Steady State Error”): u ∝ Ki ∫₀ᵗ (q*(s) − q(s)) ds
3:16
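A minimal PID controller as a C++ struct (a sketch, not the course code); the integral term is what removes a steady-state error, emulated here by a constant disturbance force:

#include <iostream>

struct PID {
  double Kp, Kd, Ki;
  double integral = 0.0;  // accumulated position error
  double operator()(double q, double qd, double qstar, double qdstar, double tau) {
    integral += tau * (qstar - q);
    return Kp * (qstar - q) + Kd * (qdstar - qd) + Ki * integral;
  }
};

int main() {
  PID pid{10.0, 2.0, 0.5};
  const double m = 1.0, tau = 0.001, disturbance = -0.5;
  double q = 0.0, qd = 0.0;
  for (int t = 0; t < 20000; ++t) {
    double u = pid(q, qd, 1.0, 0.0, tau);
    qd += tau * (u / m + disturbance);  // without Ki, q would settle below 1
    q  += tau * qd;
  }
  std::cout << q << "\n";  // close to 1 despite the constant disturbance
  return 0;
}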

Controlling a 1D point mass – lessons learnt


• Proportional and derivative feedback (PD control) are like adding a spring and
damper to the point mass

• PD control is a linear control law

(q, q̇) 7→ u = Kp (q ∗ − q) + Kd (q̇ ∗ − q̇)


(linear in the dynamic system state x = (q, q̇))

• With such linear control laws we can design approach trajectories (by tuning
the gains)
– but no optimality principle behind such motions
3:17

3.2 Dynamics of mechanical systems


3:18

Two ways to derive dynamics equations for mechanical systems


• The Euler-Lagrange equation, with L = L(t, q(t), q̇(t)):

d/dt (∂L/∂q̇) − ∂L/∂q = u
Used when you want to derive analytic equations of motion (“on paper”)

• The Newton-Euler recursion (and related algorithms)

fi = mv̇i , ui = Ii ẇ + w × Iw
Algorithms that “propagate” forces through a kinematic tree and numerically compute
the inverse dynamics u = NE(q, q̇, q̈) or forward dynamics q̈ = f (q, q̇, u).
3:19

The Euler-Lagrange equation

d/dt (∂L/∂q̇) − ∂L/∂q = u
• L(q, q̇) is called Lagrangian and defined as
L=T −U
where T =kinetic energy and U =potential energy.
• q is called generalized coordinate – any coordinates such that (q, q̇) describes
the state of the system. Joint angles in our case.
• u are external forces
3:20

Example: A pendulum

• Generalized coordinates: angle q = (θ)

• Kinematics:
– velocity of the mass: v = (lθ̇ cos θ, 0, lθ̇ sin θ)
– angular velocity of the mass: w = (0, −θ̇, 0)
• Energies:

T = ½mv² + ½wᵀIw = ½(ml² + I₂)θ̇² , U = −mgl cos θ

• Euler-Lagrange equation:

u = d/dt (∂L/∂θ̇) − ∂L/∂θ
  = d/dt [(ml² + I₂)θ̇] + mgl sin θ = (ml² + I₂)θ̈ + mgl sin θ
3:21
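Solving this pendulum equation forward in time makes a nice check: with u = 0, the total energy T + U should stay (approximately) constant under a small integration step. A sketch with made-up parameters:

#include <cmath>
#include <iostream>

int main() {
  // pendulum from the slide: (m l^2 + I2) theta'' + m g l sin(theta) = u
  const double m = 1.0, l = 1.0, I2 = 0.1, g = 9.81, u = 0.0;
  double th = 0.5, thd = 0.0;
  const double tau = 1e-4;
  auto energy = [&]() { return 0.5 * (m*l*l + I2) * thd * thd - m*g*l * std::cos(th); };
  const double E0 = energy();
  for (int t = 0; t < 100000; ++t) {  // simulate 10 seconds
    double thdd = (u - m*g*l * std::sin(th)) / (m*l*l + I2);
    thd += tau * thdd;
    th  += tau * thd;
  }
  std::cout << "energy drift: " << energy() - E0 << "\n";  // small for small tau
  return 0;
}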

The Euler-Lagrange equation


• How is this typically done?
• First, describe the kinematics and Jacobians for every link i:
(q, q̇) 7→ {TW →i (q), vi , wi }
Recall TW →i (q) = TW →A TA→A0 (q) TA0 →B TB→B 0 (q) · · ·
Further, we know that a link’s velocity vi = Ji q̇ can be described via its position Jacobian.
Similarly we can describe the link’s angular velocity wi = Jiw q̇ as linear in q̇.

• Second, formulate the kinetic energy

T = Σ_i ½ m_i v_i² + ½ w_iᵀ I_i w_i = ½ q̇ᵀ M q̇ ,
M = Σ_i (J_i; J_i^w)ᵀ [[m_i I₃, 0], [0, I_i]] (J_i; J_i^w)

where I_i = R_i Ī_i R_iᵀ and Ī_i is the inertia tensor in link coordinates

• Third, formulate the potential energies (typically independent of q̇)

U = Σ_i g m_i height(i)

• Fourth, compute the partial derivatives analytically to get something like

u = d/dt (∂L/∂q̇) − ∂L/∂q = M q̈ + Ṁ q̇ − ∂T/∂q + ∂U/∂q

where u is the control, M q̈ the inertia term, Ṁ q̇ − ∂T/∂q the Coriolis term, and ∂U/∂q the gravity term,
which relates accelerations q̈ to the forces


3:22

Newton-Euler recursion
• An algorithm that computes the inverse dynamics

u = NE(q, q̇, q̈ ∗ )

by recursively computing force balance at each joint:


– Newton’s equation expresses the force acting at the center of mass for an acceler-
ated body:
fi = mv̇i

– Euler’s equation expresses the torque (=control) acting on a rigid body given an
angular velocity and angular acceleration:

ui = Ii ẇ + w × Iw

• Forward recursion: (≈ kinematics)


Compute (angular) velocities (vi , wi ) and accelerations (v̇i , ẇi ) for every link
(via forward propagation; see geometry notes for details)

• Backward recursion:
For the leaf links, we now know the desired accelerations q̈* and can compute
the necessary joint torques. Recurse backward.
3:23

Numeric algorithms for forward and inverse dynamics


• Newton-Euler recursion: very fast (O(n)) method to compute inverse dynam-
ics
u = NE(q, q̇, q̈ ∗ )
Note that we can use this algorithm to also compute
– gravity terms: u = NE(q, 0, 0) = G(q)
– Coriolis terms: u = NE(q, q̇, 0) = C(q, q̇) q̇ + G(q)
– columns of the inertia matrix: u = NE(q, 0, ei ) = M (q) ei

• Articulated-Body Dynamics: fast method (O(n)) to compute forward dynamics q̈ = f(q, q̇, u)
3:24
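Given any inverse-dynamics routine u = NE(q, q̇, q̈), the terms of the dynamics equation can be extracted by the calls listed above. The sketch below does this for a hypothetical 1-DOF pendulum standing in for a real Newton-Euler implementation; note that NE(q, 0, e_i) also contains the gravity term, which is subtracted here to isolate the column of M:

#include <Eigen/Dense>
#include <cmath>
#include <iostream>

// toy inverse dynamics u = NE(q, qd, qdd) for a 1-link pendulum
Eigen::VectorXd NE(const Eigen::VectorXd& q, const Eigen::VectorXd& qd,
                   const Eigen::VectorXd& qdd) {
  const double m = 1.0, l = 1.0, I2 = 0.1, g = 9.81;
  Eigen::VectorXd u(1);
  u(0) = (m*l*l + I2) * qdd(0) + m*g*l * std::sin(q(0));  // no Coriolis term in 1 DOF
  return u;
}

int main() {
  const int n = 1;
  Eigen::VectorXd q(n), zero = Eigen::VectorXd::Zero(n);
  q << 0.7;
  Eigen::VectorXd G = NE(q, zero, zero);  // gravity terms: NE(q, 0, 0)
  Eigen::MatrixXd M(n, n);
  for (int i = 0; i < n; ++i)
    M.col(i) = NE(q, zero, Eigen::VectorXd::Unit(n, i)) - G;  // i-th column of M
  // forward dynamics at u = 0 via qdd = M^-1 (u - F), with F = G here:
  Eigen::VectorXd u = Eigen::VectorXd::Zero(n);
  std::cout << M.llt().solve(u - G).transpose() << "\n";
  return 0;
}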

Some last practical comments


• [demo]
• Use energy conservation to check the accuracy of a physical simulation
• Physical simulation engines (developed for games):
– ODE (Open Dynamics Engine)
– Bullet (originally focussed on collision only)
– Physx (Nvidia)
Differences of these engines to Lagrange, NE or ABD:
– Game engine can model much more: Contacts, tissues, particles, fog, etc
– (The way they model contacts looks ok but is somewhat fictional)
– On kinematic trees, NE or ABD are much more precise than game engines
– Game engines do not provide inverse dynamics, u = NE(q, q̇, q̈)

• Proper modelling of contacts is really really hard


3:25

3.3 Controlling a dynamic robot


3:26

• We previously learnt the effect of PID control on a 1D point mass

• Robots are not a 1D point mass


– Neither is each joint a 1D point mass
– Applying separate PD control in each joint neglects force coupling
(Poor solution: Apply very high gains separately in each joint ↔ make joints stiff,
as with gears.)

• However, knowing the robot dynamics we can transfer our understanding of


PID control of a point mass to general systems
3:27

General robot dynamics


• Let (q, q̇) be the dynamic state and u ∈ Rn the controls (typically joint torques
in each motor) of a robot
• Robot dynamics can generally be written as:

M (q) q̈ + C(q, q̇) q̇ + G(q) = u

M(q) ∈ R^{n×n} is the positive definite inertia matrix
(can be inverted → forward simulation of dynamics)
C(q, q̇) ∈ Rⁿ are the centripetal and Coriolis forces
G(q) ∈ Rⁿ are the gravitational forces
u are the joint torques
(cf. the Euler-Lagrange equation on slide 3:22)
• We often write more compactly:
M (q) q̈ + F (q, q̇) = u
3:28

Controlling a general robot


• From now on we just assume that we have algorithms to efficiently compute
M (q) and F (q, q̇) for any (q, q̇)
• Inverse dynamics: If we know the desired q̈ ∗ for each joint,

u = M (q) q̈ ∗ + F (q, q̇)

gives the necessary torques


• Forward dynamics: If we know which torques u we apply, use

q̈ = M(q)⁻¹(u − F(q, q̇))

to simulate the dynamics of the system (e.g., using Runge-Kutta)


3:29

Following a reference trajectory in joint space


• Where could we get the desired q̈* from?
Assume we have a nice smooth reference trajectory q_{0:T}^ref (generated with some
motion profile or alike); we can at each t read off the desired acceleration as

q̈_t^ref := (1/τ)[(q_{t+1} − q_t)/τ − (q_t − q_{t-1})/τ] = (q_{t-1} + q_{t+1} − 2q_t)/τ²

However, tiny errors in acceleration will accumulate greatly over time! This is unstable!
• Choose a desired acceleration q̈_t* that implies a PD-like behavior around the reference trajectory!

q̈_t* = q̈_t^ref + Kp(q_t^ref − q_t) + Kd(q̇_t^ref − q̇_t)

This is a standard and very convenient heuristic to track a reference trajectory when the
robot dynamics are known: all joints will exactly behave like a 1D point particle around the
reference trajectory!
3:30
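This heuristic is often called computed-torque control. A sketch (C++/Eigen; M and F are assumed to come from the robot’s dynamics model, e.g. a Newton-Euler routine as above):

#include <Eigen/Dense>
#include <iostream>

// u = M(q) qdd* + F(q, qd), with the PD-shaped desired acceleration
// qdd* = qdd_ref + Kp (q_ref - q) + Kd (qd_ref - qd)
Eigen::VectorXd computedTorque(const Eigen::MatrixXd& M, const Eigen::VectorXd& F,
                               const Eigen::VectorXd& q,    const Eigen::VectorXd& qd,
                               const Eigen::VectorXd& qref, const Eigen::VectorXd& qdref,
                               const Eigen::VectorXd& qddref, double Kp, double Kd) {
  Eigen::VectorXd qddstar = qddref + Kp * (qref - q) + Kd * (qdref - qd);
  return M * qddstar + F;  // inverse dynamics of the corrected acceleration
}

int main() {  // toy 1-DOF usage with made-up model terms
  Eigen::MatrixXd M(1, 1); M << 1.1;
  Eigen::VectorXd F(1), q(1), qd(1), qr(1), qdr(1), qddr(1);
  F << 6.2; q << 0.5; qd << 0.0; qr << 0.6; qdr << 0.0; qddr << 0.0;
  std::cout << computedTorque(M, F, q, qd, qr, qdr, qddr, 100.0, 20.0) << "\n";
  return 0;
}

Around the reference, each joint then behaves like the PD-controlled 1D point mass of Section 3.1.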

Following a reference trajectory in task space


• Recall the inverse kinematics problem:
– We know the desired step δy ∗ (or velocity ẏ ∗ ) of the endeffector.
– Which step δq (or velocities q̇) should we make in the joints?

• Equivalent dynamic problem:


– We know the desired acceleration ÿ* of the endeffector.
– What controls u should we apply?
3:31

Operational space control

• Inverse kinematics:

q* = argmin_q ||φ(q) − y*||²_C + ||q − q0||²_W

• Operational space control (one might call it “inverse task space dynamics”):

u* = argmin_u ||φ̈(q) − ÿ*||²_C + ||u||²_H

3:32

Operational space control

• We can derive the optimum perfectly analogously to inverse kinematics.

We identify the minimum of a locally squared potential, using the local linearization
(and approx. J̈ = 0):

φ̈(q) = d/dt φ̇(q) ≈ d/dt (Jq̇ + J̇q) ≈ Jq̈ + 2J̇q̇ = JM⁻¹(u − F) + 2J̇q̇

We get

u* = T♯(ÿ* − 2J̇q̇ + TF)
with T = JM⁻¹ , T♯ = (TᵀCT + H)⁻¹TᵀC

(C → ∞ ⇒ T♯ = H⁻¹Tᵀ(TH⁻¹Tᵀ)⁻¹)
3:33

Controlling a robot – operational space approach


• Where could we get the desired ÿ* from?
– Reference trajectory y_{0:T}^ref in operational space
– PD-like behavior in each operational space:
ÿ_t* = ÿ_t^ref + Kp(y_t^ref − y_t) + Kd(ẏ_t^ref − ẏ_t)

illustration from O. Brock’s lecture

• Operational space control: Let the system behave as if we could directly “apply a 1D
point mass behavior” to the endeffector
3:34

Multiple tasks
• Recall the trick last time: we defined a “big kinematic map” Φ(q) such that

q* = argmin_q ||q − q0||²_W + ||Φ(q)||²

• Works analogously in the dynamic case:

u* = argmin_u ||u||²_H + ||Φ(q)||²

3:35

What have we learned? What not?


• More theory
– Contacts → Inequality constraints on the dynamics
– Switching dynamics (e.g. for walking)
– Controlling contact forces

• Hardware limits
– I think: the current success stories on highly dynamic robots are all anchored in
novel hardware approaches
3:36

4 Path Planning

Path finding examples

Alpha-Puzzle, solved with James Kuffner’s RRTs


4:1

Path finding examples

J. Kuffner, K. Nishiwaki, S. Kagami, M. Inaba, and H. Inoue. Footstep Planning


Among Obstacles for Biped Robots. Proc. IEEE/RSJ Int. Conf. on Intelligent
Robots and Systems (IROS), 2001.
4:2

Path finding examples



T. Bretl. Motion Planning of Multi-Limbed Robots Subject to Equilibrium Con-


straints: The Free-Climbing Robot Problem. International Journal of Robotics
Research, 25(4):317-342, Apr 2006.
4:3

Path finding examples

S. M. LaValle and J. J. Kuffner. Randomized Kinodynamic Planning. Interna-


tional Journal of Robotics Research, 20(5):378–400, May 2001.
4:4

Feedback control, path finding, trajectory optim.



(figure: from start to goal; feedback control acts locally, path finding yields some valid path, trajectory optimization refines it)

• Feedback Control: e.g., q_{t+1} = q_t + J♯(y* − φ(q_t))
• Trajectory Optimization: argmin_{q_{0:T}} f(q_{0:T})
• Path Finding: find some q_{0:T} with only valid configurations
• Path Finding: Find some q0:T with only valid configurations
4:5

Outline

• Really briefly: Heuristics & Discretization (slides from Howie Choset’s CMU lectures)
– Bugs algorithm
– Potentials to guide feedback control
– Dijkstra

• Sample-based Path Finding


– Probabilistic Roadmaps
– Rapidly Exploring Random Trees

• Non-holonomic systems
4:6

4.1 Background

4:7

A better bug?
“Bug 2” Algorithm:

1) head toward the goal on the m-line
2) if an obstacle is in the way, follow it until you encounter the m-line again
3) leave the obstacle and continue toward the goal

OK?

4:8

A better bug?
“Bug 2” Algorithm:

1) head toward the goal on the m-line
2) if an obstacle is in the way, follow it until you encounter the m-line again
3) leave the obstacle and continue toward the goal

Better or worse than Bug 1?

4:9

A better bug?
“Bug 2” Algorithm:

1) head toward the goal on the m-line
2) if an obstacle is in the way, follow it until you encounter the m-line again
3) leave the obstacle and continue toward the goal

NO! How do we fix this?

4:10

A better bug?
“Bug 2” Algorithm:

1) head toward the goal on the m-line
2) if an obstacle is in the way, follow it until you encounter the m-line again closer to the goal
3) leave the obstacle and continue toward the goal

Better or worse than Bug 1?

4:11

BUG algorithms – conclusions

• Other variants: TangentBug, VisBug, RoverBug, WedgeBug, . . .


• only 2D! (TangentBug has extension to 3D)
• Guaranteed convergence
• Still active research:
K. Taylor and S.M. LaValle: I-Bug: An Intensity-Based Bug Algorithm

⇒ Useful for minimalistic, robust 2D goal reaching


– not useful for finding paths in joint space
4:12

Start-Goal Algorithm:
Potential Functions

4:13

Repulsive Potential

4:14

Total Potential Function


U(q) = U_att(q) + U_rep(q)

F(q) = −∇U(q)

4:15

Local Minimum Problem with the Charge Analogy

4:16

Potential fields – conclusions

• Very simple, therefore popular


• In our framework: combining a goal (endeffector) task variable with a constraint (collision avoidance) task variable, and then using inverse kinematics, is exactly the same as “Potential Fields”

⇒ Does not solve locality problem of feedback control.

4:17


The Wavefront in Action (Part 1)


• Starting with the goal, set all adjacent cells with
“0” to the current cell + 1
– 4-Point Connectivity or 8-Point Connectivity?
– Your Choice. We’ll use 8-Point Connectivity in our example

4:19

The Wavefront in Action (Part 2)


• Now repeat with the modified cells
– This will be repeated until no 0’s are adjacent to cells
with values >= 2
• 0’s will only remain when regions are unreachable

4:20

The Wavefront in Action (Part 3)


• Repeat again...

4:21

The Wavefront in Action (Part 4)


• And again...

4:22

The Wavefront in Action (Part 5)


• And again until...

4:23

The Wavefront in Action (Done)


• You’re done
– Remember, 0’s should only remain if unreachable
regions exist

4:24

The Wavefront, Now What?


• To find the shortest path, according to your metric, simply always
move toward a cell with a lower number
– The numbers generated by the Wavefront planner are roughly proportional to their
distance from the goal

(figure: two possible shortest paths shown)

4:25
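A compact sketch of this wavefront planner (the grid convention — 0 free, 1 obstacle, goal value 2, 8-point connectivity — follows the slides; the function itself is illustrative):

```python
from collections import deque

def wavefront(grid, goal):
    """Wavefront planner on an occupancy grid (0 = free, 1 = obstacle).

    Returns the value grid: goal = 2, each free neighbor = value + 1;
    unreachable free cells keep value 0.
    """
    rows, cols = len(grid), len(grid[0])
    value = [[1 if grid[r][c] else 0 for c in range(cols)] for r in range(rows)]
    value[goal[0]][goal[1]] = 2
    queue = deque([goal])
    while queue:                              # breadth-first expansion
        r, c = queue.popleft()
        for dr in (-1, 0, 1):                 # 8-point connectivity
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and value[rr][cc] == 0:
                    value[rr][cc] = value[r][c] + 1
                    queue.append((rr, cc))
    return value
```

A path is then extracted by repeatedly stepping from the start to any neighbor with a smaller value.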

Dijkstra Algorithm
• Is efficient in discrete domains
– Given start and goal node in an arbitrary graph
– Incrementally label nodes with their distance-from-start

• Produces optimal (shortest) paths

• Applying this to continuous domains requires discretization


– Grid-like discretization in high-dimensions is daunting! (curse of dimensionality)
– What are other ways to “discretize” space more efficiently?
4:26

4.2 Sample-based Path Finding


4:27

Probabilistic Road Maps

[Kavraki, Švestka, Latombe, Overmars, 95]

q ∈ Rn describes configuration
Qfree is the set of configurations without collision

4:28

Probabilistic Road Maps

[Kavraki, Švestka, Latombe, Overmars, 95]



Probabilistic Road Map


• generates a graph G = (V, E) of configurations
– such that configurations along each edge are ∈ Qfree
4:29

Probabilistic Road Maps

Given the graph, use (e.g.) Dijkstra to find path from qstart to qgoal .
4:30

Probabilistic Road Maps – generation

Input: number n of samples, number k of nearest neighbors


Output: PRM G = (V, E)
1: initialize V = ∅, E = ∅
2: while |V | < n do // find n collision free points qi
3: q ← random sample from Q
4: if q ∈ Qfree then V ← V ∪ {q}
5: end while
6: for all q ∈ V do // check if near points can be connected
7: Nq ← k nearest neighbors of q in V
8: for all q′ ∈ Nq do
9: if path(q, q′) ∈ Qfree then E ← E ∪ {(q, q′)}
10: end for
11: end for

where path(q, q′) is a local planner (easiest: straight line)
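A direct transcription of this pseudocode into Python (sample, is_free, and path_free are hypothetical stand-ins for the sampler, the collision checker, and the local planner):

```python
import numpy as np

def prm(n, k, sample, is_free, path_free):
    """Probabilistic roadmap, following the pseudocode above.

    sample()          -- draws a random configuration q from Q
    is_free(q)        -- True iff q is collision-free
    path_free(q, q2)  -- local planner: True iff the straight line is in Q_free
    """
    V = []
    while len(V) < n:                       # find n collision-free samples
        q = sample()
        if is_free(q):
            V.append(q)
    E = []
    for i, q in enumerate(V):               # try to connect k nearest neighbors
        d = [np.linalg.norm(q - q2) for q2 in V]
        for j in np.argsort(d)[1:k + 1]:    # skip q itself at index 0
            if path_free(q, V[j]):
                E.append((i, int(j)))
    return V, E
```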


4:31

Local Planner

tests collisions up to a specified resolution δ


4:32

Problem: Narrow Passages

The smaller the gap (clearance ϱ), the more unlikely it is to sample such points.
4:33

PRM theory
(for uniform sampling in d-dim space)
• Let a, b ∈ Qfree and γ a path in Qfree connecting a and b.
Then the probability that PRM finds the path after n samples is

P(PRM-success | n) ≥ 1 − (2|γ| / ϱ) e^{−σ ϱ^d n}

σ = |B1| / (2^d |Qfree|) ,  ϱ = clearance of γ (distance to obstacles)
(roughly: the terms are “volume ratios”)

• This property is called probabilistic completeness (one can achieve any success probability with high enough n)
• For a given success probability, n needs to be exponential in d
4:34

Other PRM sampling strategies

illustration from O. Brock’s lecture

Gaussian: q1 ∼ U; q2 ∼ N(q1, σ); if q1 ∈ Qfree and q2 ∉ Qfree, add q1 (or vice versa).
Bridge: q1 ∼ U; q2 ∼ N(q1, σ); q3 = (q1 + q2)/2; if q1, q2 ∉ Qfree and q3 ∈ Qfree, add q3.
• Sampling strategy can be made more intelligent: “utility-based sampling”
• Connection sampling
(once earlier sampling has produced connected components)
4:35

Probabilistic Roadmaps – conclusions


• Pros:
– Algorithmically very simple
– Highly explorative
– Allows probabilistic performance guarantees
– Good to answer many queries in an unchanged environment

• Cons:
– Precomputation of exhaustive roadmap takes a long time
(but not necessary for “Lazy PRMs”)

4:36

Rapidly Exploring Random Trees

2 motivations:

• Single Query path finding: Focus computational efforts on paths for specific
(qstart , qgoal )

• Use actually controllable DoFs to incrementally explore the search space: control-based path finding.

(Ensures that RRTs can be extended to handling differential constraints.)


4:37
(figures: RRT growth for n = 1, 100, 300, 600, 1000, 2000 samples)


4:38

Rapidly Exploring Random Trees

Simplest RRT with straight line local planner and step size α

Input: qstart , number n of nodes, stepsize α


Output: tree T = (V, E)
1: initialize V = {qstart }, E = ∅
2: for i = 0 : n do
3: qtarget ← random sample from Q
4: qnear ← nearest neighbor of qtarget in V
5: qnew ← qnear + (α / |qtarget − qnear|) (qtarget − qnear)
6: if qnew ∈ Qfree then V ← V ∪ {qnew }, E ← E ∪ {(qnear , qnew )}
7: end for

4:39

Rapidly Exploring Random Trees

RRT growing directedly towards the goal



Input: qstart , qgoal , number n of nodes, stepsize α, β


Output: tree T = (V, E)
1: initialize V = {qstart }, E = ∅
2: for i = 0 : n do
3: if rand(0, 1) < β then qtarget ← qgoal
4: else qtarget ← random sample from Q
5: qnear ← nearest neighbor of qtarget in V
6: qnew ← qnear + (α / |qtarget − qnear|) (qtarget − qnear)
7: if qnew ∈ Qfree then V ← V ∪ {qnew }, E ← E ∪ {(qnear , qnew )}
8: end for
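A sketch of this goal-directed variant in Python (sample and is_free are again hypothetical callbacks; configurations are numpy arrays):

```python
import numpy as np

def rrt(q_start, q_goal, n, alpha, beta, sample, is_free):
    """Goal-directed RRT, following the pseudocode above.

    Returns the vertex list V and, for each vertex, its parent index.
    """
    V, parent = [q_start], [-1]
    for _ in range(n):
        # with probability beta, pull towards the goal
        q_target = q_goal if np.random.rand() < beta else sample()
        d = [np.linalg.norm(q_target - q) for q in V]
        i_near = int(np.argmin(d))          # nearest neighbor (linear scan)
        diff = q_target - V[i_near]
        q_new = V[i_near] + alpha * diff / (np.linalg.norm(diff) + 1e-12)
        if is_free(q_new):
            V.append(q_new)
            parent.append(i_near)
    return V, parent
```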

4:40
(figures: goal-directed RRT growth for n = 1, 100, 200, 300, 400, 500 samples)


4:41

Bi-directional search

• grow two trees starting from qstart and qgoal



let one tree grow towards the other


(e.g., “choose qnew of T1 as qtarget of T2 ”)
4:42

Summary: RRTs
• Pros (shared with PRMs):
– Algorithmically very simple
– Highly explorative
– Allows probabilistic performance guarantees

• Pros (beyond PRMs):


– Focus computation on single query (qstart , qgoal ) problem
– Trees from multiple queries can be merged to a roadmap
– Can be extended to differential constraints (nonholonomic systems)

• To keep in mind (shared with PRMs):


– The metric (for nearest neighbor selection) is sometimes critical
– The local planner may be non-trivial
4:43

References
Steven M. LaValle: Planning Algorithms, http://planning.cs.uiuc.edu/

Choset et al.: Principles of Robot Motion, MIT Press.

Latombe’s “motion planning” lecture, http://robotics.stanford.edu/~latombe/cs326/2007/schedule.htm

4:44

RRT*
Sertac Karaman & Emilio Frazzoli: Incremental sampling-based algorithms for
optimal motion planning, arXiv 1005.0416 (2010).

4:45

4.3 Non-holonomic systems


4:46

Non-holonomic systems
• We define a nonholonomic system as one with differential constraints:
dim(ut ) < dim(xt )
⇒ Not all degrees of freedom are directly controllable

• Dynamic systems are an example!

• General definition of a differential constraint:


For any given state x, let Ux be the tangent space that is generated by controls:
Ux = {ẋ : ẋ = f (x, u), u ∈ U }
(non-holonomic ⇐⇒ dim(Ux ) < dim(x))

The elements of Ux are elements of Tx subject to additional differential constraints.
4:47

Car example

ẋ = v cos θ
ẏ = v sin θ
θ̇ = (v/L) tan ϕ
|ϕ| < Φ

State q = (x, y, θ)⊤ ,  controls u = (v, ϕ)⊤

System equation:  (ẋ, ẏ, θ̇)⊤ = (v cos θ, v sin θ, (v/L) tan ϕ)⊤
4:48

Car example
• The car is a non-holonomic system: not all DoFs are controlled, dim(u) < dim(q)
We have the differential constraint q̇:

ẋ sin θ − ẏ cos θ = 0
“A car cannot move directly sideways.”

• Analogy to dynamic systems: Just like a car cannot instantly move sidewards, a dy-
namic system cannot instantly change its position q: the current change in position is
constrained by the current velocity q̇.

4:49

Path finding for a non-holonomic system

Could a car follow this trajectory?

This is generated with a PRM in the state space q = (x, y, θ), ignoring the differential constraint.
4:50

Path finding with a non-holonomic system

This is a solution we would like to have:



The path respects differential constraints.


Each step in the path corresponds to setting certain controls.
4:51

Control-based sampling to grow a tree

• Control-based sampling: fulfils none of the nice exploration properties of RRTs,


but fulfils the differential constraints:

1) Select a q ∈ T from tree of current configurations

2) Pick control vector u at random

3) Integrate equation of motion over short duration (picked at random or not)

4) If the motion is collision-free, add the endpoint to the tree


4:52

Control-based sampling for the car



1) Select a q ∈ T
2) Pick v, φ, and τ
3) Integrate motion from q
4) Add result if collision-free

4:53

J. Barraquand and J.C. Latombe. Nonholonomic Multibody Robots: Controllability


and Motion Planning in the Presence of Obstacles. Algorithmica, 10:121-155, 1993.

car parking
4:54

car parking
4:55

parking with only left-steering


4:56

with a trailer
4:57

Better control-based exploration: RRTs revisited


• RRTs with differential constraints:

Input: qstart , number k of nodes, time interval τ


Output: tree T = (V, E)
1: initialize V = {qstart }, E = ∅
2: for i = 0 : k do
3: qtarget ← random sample from Q
4: qnear ← nearest neighbor of qtarget in V
5: use local planner to compute controls u that steer qnear towards qtarget
6: qnew ← qnear + ∫_{t=0}^τ q̇(q, u) dt
7: if qnew ∈ Qfree then V ← V ∪ {qnew }, E ← E ∪ {(qnear , qnew )}
8: end for

• Crucial questions:
• How to measure “near” in nonholonomic systems?
• How to find controls u to steer towards the target?
4:58

Configuration state metrics



Standard/Naive metrics:
• Comparing two 2D rotations/orientations θ1 , θ2 ∈ SO(2):
a) Euclidean metric between e^{iθ1} and e^{iθ2}
b) d(θ1, θ2) = min{|θ1 − θ2|, 2π − |θ1 − θ2|}
• Comparing two configurations (x, y, θ)1,2 in R²:
Euclidean metric on (x, y, e^{iθ})
• Comparing two 3D rotations/orientations r1, r2 ∈ SO(3):
Represent both orientations as unit-length quaternions r1, r2 ∈ R⁴:
d(r1, r2) = min{|r1 − r2|, |r1 + r2|}
where | · | is the Euclidean metric.
(Recall that r1 and −r1 represent exactly the same rotation.)

• Ideal metric:
Optimal cost-to-go between two states x1 and x2 :
• Use optimal trajectory cost as metric
• This is as hard to compute as the original problem, of course!!
→ Approximate, e.g., by neglecting obstacles.
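A sketch of the standard metrics above in Python:

```python
import numpy as np

def dist_SO2(theta1, theta2):
    """Distance between two planar orientations, d in [0, pi]."""
    d = abs(theta1 - theta2) % (2 * np.pi)
    return min(d, 2 * np.pi - d)

def dist_SO3(r1, r2):
    """Distance between two unit quaternions (r and -r are the same rotation)."""
    return min(np.linalg.norm(r1 - r2), np.linalg.norm(r1 + r2))
```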
4:59

Side story: Dubins curves

• Dubins car: constant velocity, and steer ϕ ∈ [−Φ, Φ]

• Neglecting obstacles, there are only six types of trajectories that connect any
configuration ∈ R2 × S1 :
{LRL, RLR, LSL, LSR, RSL, RSR}

• annotating durations of each phase:


{Lα Rβ Lγ , Rα Lβ Rγ , Lα Sd Lγ , Lα Sd Rγ , Rα Sd Lγ , Rα Sd Rγ }
with α ∈ [0, 2π), β ∈ (π, 2π), d ≥ 0
4:60

Side story: Dubins curves



→ By testing all six types of trajectories for (q1 , q2 ) we can define a Dubins
metric for the RRT – and use the Dubins curves as controls!

• Reeds-Shepp curves are an extension for cars which can drive back.
(includes 46 types of trajectories, good metric for use in RRTs for cars)
4:61

5 Path Optimization – briefly

Outline
• These are only some very brief notes on path optimization

• The aim is to explain how to formulate the optimization problem. Concerning


the optimization algorithm itself, refer to the Optimization lecture.
5:1

From inverse kinematics to path costs


• Recall our optimality principle of inverse kinematics

argmin_q ||q − q0||^2_W + ||Φ(q)||^2

• A trajectory q0:T is a sequence of robot configurations qt ∈ Rn


• Consider the cost function

f(q0:T) = Σ_{t=0}^T ||q_{t-1} − q_t||^2_W + Σ_{t=0}^T ||Φt(qt)||^2

(where q_{-1} is a given prefix)


• ||q_{t-1} − q_t||^2_W represents control costs;
Φt(qt) represents task costs
5:2

General k-order cost terms


[Notation: xt instead of qt represents joint state]

min_{x0:T} Σ_{t=0}^T ft(x_{t−k:t})⊤ ft(x_{t−k:t})
s.t. ∀t : gt(x_{t−k:t}) ≤ 0 ,  ht(x_{t−k:t}) = 0

5:3

Cost terms
• The ft (xt−k:t ) terms can penalize various things:

k=1: ft(x_{t-1}, xt) = xt − x_{t-1}  (penalize velocity)
k=2: ft(x_{t-2}, .., xt) = xt − 2x_{t-1} + x_{t-2}  (penalize acceleration)
k=3: ft(x_{t-3}, .., xt) = xt − 3x_{t-1} + 3x_{t-2} − x_{t-3}  (penalize jerk)

or in some arbitrary task space φ:

k=0: ft(xt) = φ(xt) − y∗  (penalize offset in some task space)
k=1: ft(x_{t-1}, xt) = φ(xt) − φ(x_{t-1})
k=2: ft(x_{t-2}, .., xt) = φ(xt) − 2φ(x_{t-1}) + φ(x_{t-2})
k=3: ft(x_{t-3}, .., xt) = φ(xt) − 3φ(x_{t-1}) + 3φ(x_{t-2}) − φ(x_{t-3})

• And terms ft can be stacked arbitrarily
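Such stacked finite-difference terms are easy to generate; a sketch (the (T+1, n) trajectory array layout is an illustrative assumption):

```python
import numpy as np

def control_costs(x, k=2):
    """Stack k-order finite-difference terms f_t over a trajectory x[0..T].

    k=1 penalizes velocities, k=2 accelerations, k=3 jerk.
    x is a (T+1, n) array; returns one difference vector per time step.
    """
    coeffs = {1: [1, -1], 2: [1, -2, 1], 3: [1, -3, 3, -1]}[k]
    return np.stack([sum(c * x[t - i] for i, c in enumerate(coeffs))
                     for t in range(k, len(x))])
```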


5:4

Choice of optimizer

min_{x0:T} Σ_{t=0}^T ft(x_{t−k:t})⊤ ft(x_{t−k:t})
s.t. ∀t : gt(x_{t−k:t}) ≤ 0 ,  ht(x_{t−k:t}) = 0

• Constrained optimization methods:


– Log-barrier, squared penalties
– Augmented Lagrangian

• Note: the Lagrangian also has the so-called Gauss-Newton form; the pseudo-Hessian is a banded, symmetric, positive-definite matrix.
5:5

6 Probabilities

Probability Theory
• Why do we need probabilities?

– Obvious: to express inherent (objective) stochasticity of the world

• But beyond this: (also in a “deterministic world”):


– lack of knowledge!
– hidden (latent) variables
– expressing uncertainty
– expressing information (and lack of information)
– Subjective Probability

• Probability Theory: an information calculus


6:1

Outline
• Basic definitions
– Random variables
– joint, conditional, marginal distribution
– Bayes’ theorem
• Probability distributions:
– Gauss
– Dirac & Particles
These are generic slides on probabilities I use throughout my lecture. Only parts are
relevant for this course.
6:2

Probability: Frequentist and Bayesian


• Frequentist probabilities are defined in the limit of an infinite number of trials
Example: “The probability of a particular coin landing heads up is 0.43”

• Bayesian (subjective) probabilities quantify degrees of belief


Example: “The probability of it raining tomorrow is 0.3”
– Not possible to repeat “tomorrow”
6:3

6.1 Basic definitions


6:4

Probabilities & Sets


• Sample Space/domain Ω, e.g. Ω = {1, 2, 3, 4, 5, 6}

• Probability P : A ⊆ Ω ↦ [0, 1]
e.g., P({1}) = 1/6, P({4}) = 1/6, P({2, 5}) = 1/3

• Axioms: ∀A, B ⊆ Ω
– Nonnegativity P (A) ≥ 0
– Additivity P (A ∪ B) = P (A) + P (B) if A ∩ B = ∅
– Normalization P (Ω) = 1

• Implications
0 ≤ P (A) ≤ 1
P (∅) = 0
A ⊆ B ⇒ P (A) ≤ P (B)
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
P (Ω \ A) = 1 − P (A)
6:5

Probabilities & Random Variables


• For a random variable X with discrete domain dom(X) = Ω we write:
∀x∈Ω : 0 ≤ P (X = x) ≤ 1
Σ_{x∈Ω} P(X = x) = 1

Example: A dice can take values Ω = {1, .., 6}.


X is the random variable of a dice throw.
P (X = 1) ∈ [0, 1] is the probability that X takes value 1.

• A bit more formally: a random variable is a map from a measurable space to a domain (sample space) and thereby introduces a probability measure on the domain (“assigns a probability to each possible value”)
6:6

Probabilty Distributions
• P (X = 1) ∈ R denotes a specific probability

P (X) denotes the probability distribution (function over Ω)

Example: A dice can take values Ω = {1, 2, 3, 4, 5, 6}.


By P(X) we describe the full distribution over possible values {1, .., 6}. These are 6 numbers that sum to one, usually stored in a table, e.g.: [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]

• In implementations we typically represent distributions over discrete random


variables as tables (arrays) of numbers

• Notation for summing over a RV:


In equations we often need to sum over RVs. We then write Σ_X P(X) · · ·
as shorthand for the explicit notation Σ_{x∈dom(X)} P(X = x) · · ·
6:7

Joint distributions
Assume we have two random variables X and Y

• Definitions:
Joint: P(X, Y)
Marginal: P(X) = Σ_Y P(X, Y)
Conditional: P(X|Y) = P(X, Y) / P(Y)

The conditional is normalized: ∀Y : Σ_X P(X|Y) = 1

• X is independent of Y iff: P (X|Y ) = P (X)


(table thinking: all columns of P (X|Y ) are equal)
6:8

Joint distributions
joint: P(X, Y)
marginal: P(X) = Σ_Y P(X, Y)
conditional: P(X|Y) = P(X, Y) / P(Y)

• Implications of these definitions:


Product rule: P (X, Y ) = P (X|Y ) P (Y ) = P (Y |X) P (X)
Bayes’ Theorem: P(X|Y) = P(Y|X) P(X) / P(Y)

6:9

Bayes’ Theorem

P(X|Y) = P(Y|X) P(X) / P(Y)

posterior = likelihood · prior / normalization
6:10

Multiple RVs:
• Analogously for n random variables X1:n (stored as a rank n tensor)
Joint: P(X1:n)
Marginal: P(X1) = Σ_{X2:n} P(X1:n)
Conditional: P(X1|X2:n) = P(X1:n) / P(X2:n)

• X is conditionally independent of Y given Z iff:


P (X|Y, Z) = P (X|Z)

• Product rule and Bayes’ Theorem:
P(X1:n) = Π_{i=1}^n P(Xi | X_{i+1:n})
P(X1 | X2:n) = P(X2 | X1, X3:n) P(X1 | X3:n) / P(X2 | X3:n)
P(X, Z, Y) = P(X|Y, Z) P(Y|Z) P(Z)
P(X|Y, Z) = P(Y|X, Z) P(X|Z) / P(Y|Z)
P(X, Y|Z) = P(X, Z|Y) P(Y) / P(Z)
6:11

6.2 Probability distributions


6:12

• We skip Bernoulli & Beta, Multinomial & Dirichlet..


6:13

Bernoulli & Binomial


• We have a binary random variable x ∈ {0, 1} (i.e. dom(x) = {0, 1})
The Bernoulli distribution is parameterized by a single scalar µ,

P(x = 1 | µ) = µ ,  P(x = 0 | µ) = 1 − µ
Bern(x | µ) = µ^x (1 − µ)^{1−x}

• We have a data set of random variables D = {x1 , .., xn }, each xi ∈ {0, 1}. If
each xi ∼ Bern(xi | µ) we have
P(D | µ) = Π_{i=1}^n Bern(xi | µ) = Π_{i=1}^n µ^{xi} (1 − µ)^{1−xi}

argmax_µ log P(D | µ) = argmax_µ Σ_{i=1}^n [xi log µ + (1 − xi) log(1 − µ)] = (1/n) Σ_{i=1}^n xi
• The Binomial distribution is the distribution over the count m = Σ_{i=1}^n xi:

Bin(m | n, µ) = (n choose m) µ^m (1 − µ)^{n−m} ,  (n choose m) = n! / ((n − m)! m!)
6:14

Beta
How to express uncertainty over a Bernoulli parameter µ
• The Beta distribution is over the interval [0, 1], typically over the parameter µ of a Bernoulli:

Beta(µ | a, b) = (1/B(a, b)) µ^{a−1} (1 − µ)^{b−1}

with mean ⟨µ⟩ = a/(a+b) and mode µ∗ = (a−1)/(a+b−2) for a, b > 1

• The crucial point is:


– Assume we are in a world with a “Bernoulli source” (e.g., binary bandit), but don’t
know its parameter µ
– Assume we have a prior distribution P (µ) = Beta(µ | a, b)
– Assume we collected some data D = {x1, .., xn}, xi ∈ {0, 1}, with counts a_D = Σ_i xi of [xi = 1] and b_D = Σ_i (1 − xi) of [xi = 0]
– The posterior is

P(µ | D) = (P(D | µ) / P(D)) P(µ) ∝ Bin(D | µ) Beta(µ | a, b)
∝ µ^{a_D} (1 − µ)^{b_D} µ^{a−1} (1 − µ)^{b−1} = µ^{a−1+a_D} (1 − µ)^{b−1+b_D}
= Beta(µ | a + a_D, b + b_D)
6:15

Beta

The prior is Beta(µ | a, b), the posterior is Beta(µ | a + aD , b + bD )

• Conclusions:
– The semantics of a and b are counts of [xi = 1] and [xi = 0], respectively
– The Beta distribution is conjugate to the Bernoulli (explained later)
– With the Beta distribution we can represent beliefs (state of knowledge) about un-
certain µ ∈ [0, 1] and know how to update this belief given data
6:16

Beta

from Bishop
6:17

Multinomial
• We have an integer random variable x ∈ {1, .., K}
The probability of a single x can be parameterized by µ = (µ1, .., µK):
P(x = k | µ) = µk
with the constraint Σ_{k=1}^K µk = 1 (probabilities need to be normalized)
• We have a data set of random variables D = {x1, .., xn}, each xi ∈ {1, .., K}. If each xi ∼ P(xi | µ) we have

P(D | µ) = Π_{i=1}^n µ_{xi} = Π_{i=1}^n Π_{k=1}^K µk^{[xi=k]} = Π_{k=1}^K µk^{mk}

where mk = Σ_{i=1}^n [xi = k] is the count of [xi = k]. The ML estimator is

argmax_µ log P(D | µ) = (1/n) (m1, .., mK)

• The Multinomial distribution is this distribution over the counts mk:

Mult(m1, .., mK | n, µ) ∝ Π_{k=1}^K µk^{mk}

6:18

Dirichlet
How to express uncertainty over a Multinomial parameter µ
• The Dirichlet distribution is over the K-simplex, that is, over µ1, .., µK ∈ [0, 1] subject to the constraint Σ_{k=1}^K µk = 1:

Dir(µ | α) ∝ Π_{k=1}^K µk^{αk−1}

It is parameterized by α = (α1, .., αK), has mean ⟨µi⟩ = αi / Σ_j αj and mode µi∗ = (αi − 1) / (Σ_j αj − K) for αi > 1.

• The crucial point is:


– Assume we are in a world with a “Multinomial source” (e.g., an integer bandit), but
don’t know its parameter µ
– Assume we have a prior distribution P (µ) = Dir(µ | α)
– Assume we collected some data D = {x1, .., xn}, xi ∈ {1, .., K}, with counts mk = Σ_i [xi = k]
– The posterior is

P(µ | D) = (P(D | µ) / P(D)) P(µ) ∝ Mult(D | µ) Dir(µ | α)
∝ Π_{k=1}^K µk^{mk} Π_{k=1}^K µk^{αk−1} = Π_{k=1}^K µk^{αk−1+mk}
= Dir(µ | α + m)

6:19

Dirichlet

The prior is Dir(µ | α), the posterior is Dir(µ | α + m)

• Conclusions:
– The semantics of α is the counts of [xi = k]
– The Dirichlet distribution is conjugate to the Multinomial

– With the Dirichlet distribution we can represent beliefs (state of knowledge) about
uncertain µ of an integer random variable and know how to update this belief given
data
6:20

Dirichlet
Illustrations for α = (0.1, 0.1, 0.1), α = (1, 1, 1) and α = (10, 10, 10):

from Bishop
6:21

Motivation for Beta & Dirichlet distributions


• Bandits:
– If we have binary [integer] bandits, the Beta [Dirichlet] distribution is a way to
represent and update beliefs
– The belief space becomes discrete: The parameter α of the prior is continuous, but
the posterior updates live on a discrete “grid” (adding counts to α)
– We can in principle do belief planning using this
• Reinforcement Learning:
– Assume we know that the world is a finite-state MDP, but do not know its transition probability P(s′ | s, a). For each (s, a), P(s′ | s, a) is a distribution over the integer s′
– Having a separate Dirichlet distribution for each (s, a) is a way to represent our
belief about the world, that is, our belief about P (s0 | s, a)
– We can in principle do belief planning using this → Bayesian Reinforcement Learning
• Dirichlet distributions are also used to model texts (word distributions in text),
images, or mixture distributions in general
6:22

Conjugate priors

• Assume you have data D = {x1 , .., xn } with likelihood

P (D | θ)

that depends on an uncertain parameter θ



Assume you have a prior P (θ)

• The prior P (θ) is conjugate to the likelihood P (D | θ) iff the posterior

P (θ | D) ∝ P (D | θ) P (θ)

is in the same distribution class as the prior P (θ)

• Having a conjugate prior is very convenient, because then you know how to
update the belief given data
6:23

Conjugate priors

likelihood                      conjugate prior
Binomial Bin(D | µ)             Beta Beta(µ | a, b)
Multinomial Mult(D | µ)         Dirichlet Dir(µ | α)
Gauss N(x | µ, Σ)               Gauss N(µ | µ0, A)
1D Gauss N(x | µ, λ^-1)         Gamma Gam(λ | a, b)
nD Gauss N(x | µ, Λ^-1)         Wishart Wish(Λ | W, ν)
nD Gauss N(x | µ, Λ^-1)         Gauss-Wishart N(µ | µ0, (βΛ)^-1) Wish(Λ | W, ν)
6:24

Distributions over continuous domain


6:25

Distributions over continuous domain


• Let x be a continuous RV. The probability density function (pdf) p(x) ∈ [0, ∞) defines the probability

P(a ≤ x ≤ b) = ∫_a^b p(x) dx ∈ [0, 1]

The (cumulative) probability distribution F(y) = P(x ≤ y) = ∫_{−∞}^y p(x) dx ∈ [0, 1] is the cumulative integral with lim_{y→∞} F(y) = 1

(In the discrete domain: probability distribution and probability mass function P(x) ∈ [0, 1] are used synonymously.)

• Two basic examples:

Gaussian: N(x | µ, Σ) = (1 / |2πΣ|^{1/2}) exp{− 1/2 (x − µ)⊤ Σ^-1 (x − µ)}

Dirac or δ (“point particle”): δ(x) = 0 except at x = 0, ∫ δ(x) dx = 1;
δ(x) = ∂_x H(x) where H(x) = [x ≥ 0] is the Heaviside step function
6:26

Gaussian distribution

• 1-dim: N(x | µ, σ^2) = (1 / |2πσ^2|^{1/2}) exp{− (x − µ)^2 / (2σ^2)}  (figure: bell curve N(x | µ, σ^2) centered at µ)

• n-dim Gaussian in normal form:

N(x | µ, Σ) = (1 / |2πΣ|^{1/2}) exp{− 1/2 (x − µ)⊤ Σ^-1 (x − µ)}

with mean µ and covariance matrix Σ. In canonical form:

N[x | a, A] = (exp{− 1/2 a⊤A^-1 a} / |2πA^-1|^{1/2}) exp{− 1/2 x⊤A x + x⊤a}

with precision matrix A = Σ^-1 and coefficient a = Σ^-1 µ (and mean µ = A^-1 a).

• Gaussian identities: see http://ipvs.informatik.uni-stuttgart.de/mlr/marc/notes/gaussians.pdf
6:27

Gaussian identities
Symmetry: N(x | a, A) = N(a | x, A) = N(x − a | 0, A)

Product:
N(x | a, A) N(x | b, B) = N[x | A^-1 a + B^-1 b, A^-1 + B^-1] N(a | b, A + B)
N[x | a, A] N[x | b, B] = N[x | a + b, A + B] N(A^-1 a | B^-1 b, A^-1 + B^-1)

“Propagation”:
∫_y N(x | a + F y, A) N(y | b, B) dy = N(x | a + F b, A + F B F⊤)

Transformation:
N(F x + f | a, A) = (1/|F|) N(x | F^-1 (a − f), F^-1 A F^-⊤)

Marginal & conditional:
N( [x; y] | [a; b], [A C; C⊤ B] ) = N(x | a, A) · N(y | b + C⊤A^-1 (x − a), B − C⊤A^-1 C)

More Gaussian identities: see http://ipvs.informatik.uni-stuttgart.de/mlr/marc/notes/gaussians.pdf
6:28

Motivation for Gaussian distributions


• Gaussian Bandits
• Control theory, Stochastic Optimal Control
• State estimation, sensor processing, Gaussian filtering (Kalman filtering)
• Machine Learning
• etc
6:29

Particle Approximation of a Distribution


• We approximate a distribution p(x) over a continuous domain Rn

• A particle distribution q(x) is a weighted set S = {(x^i, w^i)}_{i=1}^N of N particles
– each particle has a “location” x^i ∈ R^n and a weight w^i ∈ R
– weights are normalized: Σ_i w^i = 1

q(x) := Σ_{i=1}^N w^i δ(x − x^i)

where δ(x − x^i) is the δ-distribution.

• Given weighted particles, we can estimate for any (smooth) f:

⟨f(x)⟩_p = ∫_x f(x) p(x) dx ≈ Σ_{i=1}^N w^i f(x^i)

See An Introduction to MCMC for Machine Learning, www.cs.ubc.ca/~nando/papers/mlintro.pdf
6:30

Particle Approximation of a Distribution


Histogram of a particle representation:

6:31

Motivation for particle distributions


• Numeric representation of “difficult” distributions
– Very general and versatile
– But often needs many samples
• Distributions over games (action sequences), sample based planning, MCTS
• State estimation, particle filters
• etc
6:32

Some more continuous distributions*


Gaussian: N(x | a, A) = (1 / |2πA|^{1/2}) e^{− 1/2 (x−a)⊤A^-1(x−a)}
Dirac or δ: δ(x) = ∂_x H(x)
Student’s t: p(x; ν) ∝ [1 + x^2/ν]^{−(ν+1)/2}  (= Gaussian for ν → ∞, otherwise heavy tails)
Exponential: p(x; λ) = [x ≥ 0] λ e^{−λx}  (distribution over a single event time)
Laplace (“double exponential”): p(x; µ, b) = (1/2b) e^{−|x−µ|/b}
Chi-squared: p(x; k) ∝ [x ≥ 0] x^{k/2−1} e^{−x/2}
Gamma: p(x; k, θ) ∝ [x ≥ 0] x^{k−1} e^{−x/θ}
6:33

7 Mobile Robotics

http://www.darpa.mil/grandchallenge05/
DARPA Grand Challenge 2005

7:1

http://www.darpa.mil/grandchallenge/index.asp
DARPA Grand Urban Challenge 2007
7:2

http://www.slawomir.de/publications/grzonka09icra/grzonka09icra.pdf
Quadcopter Indoor Localization
7:3

http://stair.stanford.edu/multimedia.php
STAIR: STanford Artificial Intelligence Robot
7:4

Types of Robot Mobility



7:5

• Each type of robot mobility corresponds to a


system equation xt+1 = xt + τ f (xt , ut )
or, if the dynamics are stochastic,

P (xt+1 | ut , xt ) = N(xt+1 | xt + τ f (xt , ut ), Σ)

• We considered control, path finding, and trajectory optimization

For this we always assumed we know the state xt of the robot (e.g., its posture/position)!
7:6

Outline
• PART I:
A core challenge in mobile robotics is state estimation
→ Bayesian filtering & smoothing
particles, Kalman

• PART II:
Another challenge is to build a map while exploring
→ SLAM (simultaneous localization and mapping)
7:7

7.1 State Estimation


7:8

State Estimation Problem


• Our sensory data does not provide sufficient information to determine our lo-
cation.

• Given the local sensor readings yt , the current state xt (location, position) is
uncertain.

– which hallway?
– which door exactly?
– which heading direction?

7:9

State Estimation Problem

• What is the probability of being in front


of room 154, given we see what is shown
in the image?

• What is the probability given that we


were just in front of room 156?

• What is the probability given that we


were in front of room 156 and moved 15
meters?

7:10

Recall Bayes’ theorem

P(X|Y) = P(Y|X) P(X) / P(Y)

posterior = likelihood · prior / normalization
7:11

• How can we apply this to the

State Estimation Problem?

Using Bayes Rule:


P(location | sensor) = P(sensor | location) P(location) / P(sensor)

7:12

Bayes Filter
xt = state (location) at time t
yt = sensor readings at time t
ut-1 = control command (action, steering, velocity) at time t-1

• Given the history y0:t and u0:t-1 , we want to compute the probability distribu-
tion over the state at time t

pt (xt ) := P (xt | y0:t , u0:t-1 )



• Generally:
Filtering: P(xt | y0:t)
Smoothing: P(xt | y0:T)  (with T > t, i.e., also using future observations)
Prediction: P(xt | y0:s)  (with s < t)

7:13

Bayes Filter

pt(xt) := P(xt | y0:t, u0:t-1)
= ct P(yt | xt, y0:t-1, u0:t-1) P(xt | y0:t-1, u0:t-1)
= ct P(yt | xt) P(xt | y0:t-1, u0:t-1)
= ct P(yt | xt) ∫_{xt-1} P(xt, xt-1 | y0:t-1, u0:t-1) dxt-1
= ct P(yt | xt) ∫_{xt-1} P(xt | xt-1, y0:t-1, u0:t-1) P(xt-1 | y0:t-1, u0:t-1) dxt-1
= ct P(yt | xt) ∫_{xt-1} P(xt | xt-1, ut-1) P(xt-1 | y0:t-1, u0:t-1) dxt-1
= ct P(yt | xt) ∫_{xt-1} P(xt | ut-1, xt-1) pt-1(xt-1) dxt-1

using Bayes rule P (X|Y, Z) = c P (Y |X, Z) P (X|Z) with some normalization


constant ct
uses conditional independence of the observation on past observations and
controls
by definition of the marginal
by definition of a conditional
given xt-1 , xt depends only on the controls ut-1 (Markov Property)
• A Bayes filter updates the posterior belief pt (xt ) in each time step using the:
observation model P (yt | xt )
transition model P (xt | ut-1 , xt-1 )
7:14

Bayes Filter

pt(xt) ∝ P(yt | xt) ∫_{xt-1} P(xt | ut-1, xt-1) pt-1(xt-1) dxt-1

(P(yt | xt) is the new information; the integral over the old estimate pt-1(xt-1) is the predictive estimate p̂t(xt))

1. We have a belief pt-1(xt-1) of our previous position

2. We use the motion model to predict the current position:
p̂t(xt) ∝ ∫_{xt-1} P(xt | ut-1, xt-1) pt-1(xt-1) dxt-1

3. We integrate this with the current observation to get a better belief:
pt(xt) ∝ P(yt | xt) p̂t(xt)
7:15

• Typical transition model P (xt | ut-1 , xt-1 ) in robotics:

(from Robust Monte Carlo localization for mobile robots Sebastian Thrun, Dieter
Fox, Wolfram Burgard, Frank Dellaert)
7:16

Odometry (“Dead Reckoning”): Filtering without observations


• The predictive distributions p̂t (xt ) without integrating observations (removing
the P (yt |xt ) part from the Bayesian filter)

(from Robust Monte Carlo localization for mobile robots Sebastian Thrun, Dieter
Fox, Wolfram Burgard, Frank Dellaert)

7:17

Again, predictive distributions p̂t (xt ) without integrating landmark observations

7:18

The Bayes-filtered distributions pt (xt ) integrating landmark observations

7:19

Bayesian Filters
• How to represent the belief pt (xt ):

• Gaussian

• Particles

7:20

Recall: Particle Representation of a Distribution

• Weighted set of N particles {(x^i, w^i)}_{i=1}^N:

p(x) ≈ q(x) := Σ_{i=1}^N w^i δ(x − x^i)

7:21

Particle Filter := Bayesian Filtering with Particles


(Bayes Filter: pt(xt) ∝ P(yt | xt) ∫_{xt-1} P(xt | ut-1, xt-1) pt-1(xt-1) dxt-1)

1. Start with N particles {(x^i_{t-1}, w^i_{t-1})}_{i=1}^N
2. Resample particles to get N weight-1-particles: {x̂^i_{t-1}}_{i=1}^N
3. Use the motion model to get new “predictive” particles {x^i_t}_{i=1}^N, each x^i_t ∼ P(xt | ut-1, x̂^i_{t-1})
4. Use the observation model to assign new weights w^i_t ∝ P(yt | x^i_t)
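A minimal sketch of one such update (sample_transition and obs_likelihood are hypothetical stand-ins for the transition and observation models; plain multinomial resampling is used for brevity — see the note on residual resampling below):

```python
import numpy as np

def particle_filter_step(X, w, u, y, sample_transition, obs_likelihood):
    """One particle filter update, following steps 1-4 above.

    X -- (N, dim) particle locations,  w -- normalized weights
    sample_transition(x, u) -- draws x_t ~ P(x_t | u, x_{t-1})
    obs_likelihood(y, x)    -- evaluates P(y_t | x_t)
    """
    N = len(X)
    idx = np.random.choice(N, size=N, p=w)                   # 1-2: resample
    X = np.array([sample_transition(X[i], u) for i in idx])  # 3: predict
    w = np.array([obs_likelihood(y, x) for x in X])          # 4: reweight
    return X, w / w.sum()
```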
7:22

• “Particle Filter”

aka Monte Carlo Localization in the mobile robotics community



Condensation Algorithm in the vision community

• Efficient resampling is important:


Typically “Residual Resampling”:

Instead of sampling directly n̂_i ∼ Multi({N w_i}), set n̂_i = ⌊N w_i⌋ + n̄_i with n̄_i ∼ Multi({N w_i − ⌊N w_i⌋})
Liu & Chen (1998): Sequential Monte Carlo Methods for Dynamic Systems.
Douc, Cappé & Moulines: Comparison of Resampling Schemes for Particle Filtering.
7:23

Example: Quadcopter Localization

http://www.slawomir.de/publications/grzonka09icra/grzonka09icra.pdf
Quadcopter Indoor Localization
7:24

Typical Pitfall in Particle Filtering


• Predicted particles {x^i_t}_{i=1}^N have very low observation likelihood P(yt | x^i_t) ≈ 0
(“particles die over time”)

• Classical solution: generate particles also with other than the purely forward proposal P(xt | ut-1, xt-1):

– Choose a proposal that depends on the new observation yt, ideally approximating P(xt | yt, ut-1, xt-1)

– Or mix particles sampled directly from P (yt | xt ) and from P (xt | ut-1 , xt-1 ).
(Robust Monte Carlo localization for mobile robots. Sebastian Thrun, Dieter Fox, Wolfram Burgard,
Frank Dellaert)

7:25

Kalman filter := Bayesian Filtering with Gaussians


Bayes Filter: pt(xt) ∝ P(yt | xt) ∫_{xt-1} P(xt | ut-1, xt-1) pt-1(xt-1) dxt-1

• Can be computed analytically for linear-Gaussian observations and transitions:


P (yt | xt ) = N(yt | Cxt + c, W )
P (xt | ut-1 , xt-1 ) = N(xt | A(ut-1 ) xt-1 + a(ut-1 ), Q)

Definition:
N(x | a, A) = (1 / |2πA|^{1/2}) exp{− 1/2 (x − a)⊤ A^-1 (x − a)}
Product:
N(x | a, A) N(x | b, B) = N(x | B(A+B)^-1 a + A(A+B)^-1 b, A(A+B)^-1 B) N(a | b, A + B)
“Propagation”:
∫_y N(x | a + F y, A) N(y | b, B) dy = N(x | a + F b, A + F B F⊤)
Transformation:
N(F x + f | a, A) = (1/|F|) N(x | F^-1 (a − f), F^-1 A F^-⊤)

(more identities: see “Gaussian identities” http://ipvs.informatik.uni-stuttgart.de/mlr/marc/notes/gaussians.pdf)
7:26

Kalman filter derivation

pt(xt) = N(xt | st, St)
P(yt | xt) = N(yt | C xt + c, W)
P(xt | ut-1, xt-1) = N(xt | A xt-1 + a, Q)

pt(xt) ∝ P(yt | xt) ∫_{xt-1} P(xt | ut-1, xt-1) pt-1(xt-1) dxt-1
= N(yt | C xt + c, W) ∫_{xt-1} N(xt | A xt-1 + a, Q) N(xt-1 | st-1, St-1) dxt-1
= N(yt | C xt + c, W) N(xt | ŝt, Ŝt)  with  ŝt := A st-1 + a ,  Ŝt := Q + A St-1 A⊤
= N(C xt + c | yt, W) N(xt | ŝt, Ŝt)
= N[xt | C⊤W^-1(yt − c), C⊤W^-1C] N(xt | ŝt, Ŝt)
= N(xt | st, St) · (terms indep. of xt)

St = (C⊤W^-1C + Ŝt^-1)^-1 = Ŝt − Ŝt C⊤(W + C Ŝt C⊤)^-1 C Ŝt ,  with “Kalman gain” K := Ŝt C⊤(W + C Ŝt C⊤)^-1
st = St [C⊤W^-1(yt − c) + Ŝt^-1 ŝt] = ŝt + K(yt − C ŝt − c)

The second-to-last line uses the general Woodbury identity.
The last line uses St C⊤W^-1 = K and St Ŝt^-1 = I − KC.
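The resulting update is a few lines of numpy (a sketch; A and a may depend on u_{t-1} as above):

```python
import numpy as np

def kalman_step(s, S, y, A, a, Q, C, c, W):
    """One Kalman filter update, following the derivation above.

    Belief p(x_{t-1}) = N(x | s, S); transition x_t = A x_{t-1} + a + noise(Q);
    observation y_t = C x_t + c + noise(W).
    """
    # predict
    s_hat = A @ s + a
    S_hat = Q + A @ S @ A.T
    # Kalman gain K = S_hat C' (W + C S_hat C')^-1
    K = S_hat @ C.T @ np.linalg.inv(W + C @ S_hat @ C.T)
    # update
    s_new = s_hat + K @ (y - C @ s_hat - c)
    S_new = S_hat - K @ C @ S_hat
    return s_new, S_new
```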

7:27

Extended Kalman filter (EKF) and Unscented Transform


Bayes Filter: pt(xt) ∝ P(yt | xt) ∫_{xt-1} P(xt | ut-1, xt-1) pt-1(xt-1) dxt-1

• Can be computed analytically for linear-Gaussian observations and transitions:


P (yt | xt ) = N(yt | Cxt + c, W )
P (xt | ut-1 , xt-1 ) = N(xt | A(ut-1 )xt-1 + a(ut-1 ), Q)

• If P (yt | xt ) or P (xt | ut-1 , xt-1 ) are not linear:


P (yt | xt ) = N(yt | g(xt ), W )
P (xt | ut-1 , xt-1 ) = N(xt | f (xt-1 , ut-1 ), Q)
– approximate f and g as locally linear (Extended Kalman Filter)
– or sample locally from them and reapproximate as Gaussian (Unscented Transform)
7:28

Bayes smoothing

Filtering: P(xt | y0:t)
Smoothing: P(xt | y0:T)  (with T > t)
Prediction: P(xt | y0:s)  (with s < t)

7:29

Bayes smoothing
• Let P = y0:t past observations, F = yt+1:T future observations

P(xt | P, F, u0:T) ∝ P(F | xt, P, u0:T) P(xt | P, u0:T)
= P(F | xt, ut:T) P(xt | P, u0:t-1)  =: βt(xt) · pt(xt)

Bayesian smoothing fuses a forward filter pt (xt ) with a backward “filter” βt (xt )

• Backward recursion (derivation analogous to the Bayesian filter)

βt(xt) := P(yt+1:T | xt, ut:T)
= ∫_{xt+1} βt+1(xt+1) P(yt+1 | xt+1) P(xt+1 | xt, ut) dxt+1

7:30

7.2 Simultaneous Localization and Mapping (SLAM)


7:31

Localization and Mapping


• The Bayesian filter requires an observation model P (yt | xt )

• A map is something that provides the observation model:


A map tells us for each xt what the sensor readings yt might look like
7:32

Types of maps

Grid map
Landmark map

K. Murphy (1999): Bayesian map learning


in dynamic environments.
Grisetti, Tipaldi, Stachniss, Burgard, Nardi:
Fast and Accurate SLAM with
Rao-Blackwellized Particle Filters

Laser scan map Victoria Park data set


M. Montemerlo, S. Thrun, D. Koller, & B. Wegbreit
(2003): FastSLAM 2.0: An improved particle filter-
ing algorithm for simultaneous localization and mapping
that provably converges. IJCAI, 1151–1156.

7:33

Simultaneous Localization and Mapping Problem


• Notation:
xt = state (location) at time t
yt = sensor readings at time t

ut-1 = control command (action, steering, velocity) at time t-1


m = the map; formally: a map is the parameters that define P (yt | xt , m)

• Given the history y0:t and u0:t-1 , we want to compute the belief over the pose
AND THE MAP m at time t

pt (xt , m) := P (xt , m | y0:t , u0:t-1 )

• We assume to know the:


– transition model P (xt | ut-1 , xt-1 )
– observation model P (yt | xt , m) (defined by the map)
7:34

SLAM: classical “chicken or egg problem”


• If we knew the state trajectory x0:t we could efficiently compute the belief over
the map
P (m | x0:t , y0:t , u0:t-1 )

• If we knew the map we could use a Bayes filter to compute the belief over the
state
P (xt | m, y0:t , u0:t-1 )

• SLAM requires to tie state estimation and map building together:


1) Joint inference on xt and m (→ Kalman-SLAM)
2) Tie a state hypothesis (=particle) to a map hypothesis
(→ particle SLAM)
3) Frame everything as a graph optimization problem (→ graph SLAM)
7:35

Joint Bayesian Filter over x and m


• A (formally) straight-forward approach is the joint Bayesian filter
pt(xt, m) ∝ P(yt | xt, m) ∫_{xt-1} P(xt | ut-1, xt-1) pt-1(xt-1, m) dxt-1

But: How to represent a belief over the high-dimensional (xt, m)?


7:36

Map uncertainty
• In the case the map m = (θ1 , .., θN ) is a set of N landmarks, θj ∈ R2

• Use Gaussians to represent the uncertainty of landmark positions


7:37

(Extended) Kalman Filter SLAM


• Analogous to Localization with Gaussian for the pose belief pt (xt )
– But now: joint belief pt (xt , θ1:N ) is 3 + 2N -dimensional Gaussian
– Assumes the map m = (θ1 , .., θN ) is a set of N landmarks, θj ∈ R2
– Exact update equations (under the Gaussian assumption)
– Conceptually very simple

• Drawbacks:
– Scaling (full covariance matrix is O(N 2 ))
– Sometimes non-robust (uni-modal, “data association problem”)
– Lacks advantages of Particle Filter
(multiple hypothesis, more robust to non-linearities)
7:38

SLAM with particles


Core idea: Each particle carries its own map belief

• Use a conditional representation “pt (xt , m) = pt (xt ) pt (m | xt )”


(This notation is flaky... the below is more precise)

• As for the Localization Problem use particles to represent the pose belief pt (xt )
Note: Each particle actually “has a history xi0:t ” – a whole trajectory!

• For each particle separately, estimate the map belief pit (m) conditioned on the
particle history xi0:t .
The conditional beliefs pit (m) may be factorized over grid points or landmarks
of the map

K. Murphy (1999): Bayesian map learning in dynamic environments.


http://www.cs.ubc.ca/˜murphyk/Papers/map_nips99.pdf

7:39

Map estimation for a given particle history


• Given x0:t (e.g. a trajectory of a particle), what is the posterior over the map m?

→ simplified Bayes Filter:

pt (m) := P (m | x0:t , y0:t ) ∝ P (yt | m, xt ) pt-1 (m)

(no transition model: assumption that the map is constant)

• In the case of landmarks (FastSLAM):


m = (θ1 , .., θN )
θj = position of the jth landmark, j ∈ {1, .., N }
nt = which landmark we observe at time t, nt ∈ {1, .., N }

We can use a separate (Gaussian) Bayes filter for each θj:
conditioned on x0:t, each θj is independent of each θk:

P(θ1:N | x0:t, y0:t, n0:t) = Π_j P(θj | x0:t, y0:t, n0:t)

7:40

Particle likelihood in SLAM


• Particle likelihood for Localization Problem:
w^i_t = P(yt | x^i_t)
(this determines the new importance weight w^i_t)

• In SLAM the map is uncertain → each particle is weighted with the expected likelihood:
w^i_t = ∫ P(yt | x^i_t, m) p_{t−1}(m) dm

• In case of landmarks (FastSLAM):
w^i_t = ∫ P(yt | x^i_t, θ_{nt}, nt) p_{t−1}(θ_{nt}) dθ_{nt}

• Data association problem (actually we don’t know nt ):


For each particle separately choose n^i_t = argmax_{nt} w^i_t(nt)

7:41

Particle-based SLAM summary

• We have a set of N particles {(x^i, w^i)}_{i=1}^N to represent the pose belief pt(xt)

• For each particle we have a separate map belief pit (m); in the case of landmarks,
this factorizes in N separate 2D-Gaussians

• Iterate:
1. Resample particles to get N weight-1-particles: {x̂^i_{t-1}}_{i=1}^N
2. Use the motion model to get new “predictive” particles {x^i_t}_{i=1}^N
3. Update the map belief p^i_t(m) ∝ P(yt | m, xt) p^i_{t-1}(m) for each particle
4. Compute new importance weights w^i_t ∝ ∫ P(yt | x^i_t, m) p_{t−1}(m) dm using the observation model and the map belief

7:42

Demo: Visual SLAM

• Map building from a freely moving camera



7:43

Demo: Visual SLAM


• Map building from a freely moving camera

– SLAM has become a big topic in the vision community..


– features are typically landmarks θ1:N with SURF/SIFT features
– PTAM (Parallel Tracking and Mapping) parallelizes computations...

(videos: PTAM 1, PTAM 2, DTAM, KinectFusion, LSD-SLAM)

Klein & Murray: Parallel Tracking and Mapping for Small AR Workspaces, http://www.robots.ox.ac.uk/~gk/PTAM/
Newcombe, Lovegrove & Davison: DTAM: Dense Tracking and Mapping in Real-Time, ICCV 2011. http://www.doc.ic.ac.uk/~rnewcomb/
Engel, Schöps & Cremers: LSD-SLAM: Large-Scale Direct Monocular SLAM, ECCV 2014. http://vision.in.tum.de/_media/spezial/bib/engel14eccv.pdf
7:44

Alternative SLAM approach: Graph-based



• Represent the previous trajectory as a graph


– nodes = estimated positions & observations
– edges = transition & step estimation based on scan matching
• Loop Closing: check if some nodes might coincide → new edges
• Classical Optimization:
The whole graph defines an optimization problem: Find poses that minimize
sum of edge & node errors
7:45

Loop Closing Problem


(Doesn’t explicitly exist in Particle Filter methods: If particles cover the belief,
then “data association” solves the “loop closing problem”)

7:46

Graph-based SLAM

Life-long Map Learning for Graph-based SLAM Approaches in Static Environments Kretzschmar, Grisetti,
Stachniss
7:47

SLAM code
• Graph-based and grid map methods:
http://openslam.org/

• Visual SLAM
see previous links
e.g. http://ewokrampage.wordpress.com/
7:48

8 Control Theory

Cart pole example

state x = (x, ẋ, θ, θ̇)

θ̈ = [ g sin(θ) + cos(θ) (−c1 u − c2 θ̇^2 sin(θ)) ] / [ (4/3) l − c2 cos^2(θ) ]

ẍ = c1 u + c2 [ θ̇^2 sin(θ) − θ̈ cos(θ) ]

8:1

Control Theory
• Concerns controlled systems of the form

ẋ = f (x, u) + noise(x, u)

and a controller of the form


π : (x, t) ↦ u

• We’ll neglect stochasticity here

• When analyzing a given controller π, one analyzes the closed-loop system as described by the differential equation

ẋ = f (x, π(x, t))

(E.g., analysis for convergence & stability)


8:2

Core topics in Control Theory


• Stability*
Analyze the stability of a closed-loop system
→ Eigenvalue analysis or Lyapunov function method
• Controllability*
Analyze which dimensions (DoFs) of the system can actually in principle be controlled
• Transfer function
Analyze the closed-loop transfer function, i.e., “how frequencies are transmitted through
the system”. (→ Laplace transformation)
• Controller design
Find a controller with desired stability and/or transfer function properties
• Optimal control*
Define a cost function on the system behavior. Optimize a controller to minimize costs
8:3

Control Theory references

• Robert F. Stengel: Optimal control and estimation


Online lectures: http://www.princeton.edu/~stengel/MAE546Lectures.html (esp. lectures 3, 4 and 7-9)

• From robotics lectures:


Stefan Schaal’s lecture Introduction to Robotics: http://www-clmc.usc.edu/Teaching/TeachingIntroductionToRoboticsSyllabus
Drew Bagnell’s lecture on Adaptive Control and Reinforcement Learning: http://robotwhisperer.org/acrls11/
8:4

Outline
• We’ll first consider optimal control
Goal: understand Algebraic Riccati equation
significance for local neighborhood control

• Then controllability & stability


8:5

8.1 Optimal control


8:6

Optimal control (discrete time)


Given a controlled dynamic system

xt+1 = f (xt , ut )

we define a cost function


J^π = Σ_{t=0}^T c(xt, ut) + φ(x_T)

where x0 and the controller π : (x, t) ↦ u are given, which determine x1:T and u0:T
8:7

Dynamic Programming & Bellman principle


An optimal policy has the property that whatever the initial state and initial decision
are, the remaining decisions must constitute an optimal policy with regard to the state
resulting from the first decision.

(figure: a Start→Goal graph with edge costs; dynamic programming labels each node with its optimal cost-to-go)

“V(state) = min_edge [c(edge) + V(next-state)]”


8:8

Bellman equation (discrete time)


• Define the value function or optimal cost-to-go function

Vt(x) = min_π [ Σ_{s=t}^T c(xs, us) + φ(x_T) ]  subject to xt = x

• Bellman equation

Vt(x) = min_u [ c(x, u) + Vt+1(f(x, u)) ]

The argmin gives the optimal control signal: πt∗(x) = argmin_u [ · · · ]

Derivation:
Vt(x) = min_π [ Σ_{s=t}^T c(xs, us) + φ(x_T) ]
= min_{ut} [ c(x, ut) + min_π [ Σ_{s=t+1}^T c(xs, us) + φ(x_T) ] ]
= min_{ut} [ c(x, ut) + Vt+1(f(x, ut)) ]
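For finite state and control sets this backward recursion is directly implementable; a sketch (f, c, and phi are hypothetical problem callbacks, and f is assumed to map into the given state set):

```python
def value_iteration(states, controls, f, c, phi, T):
    """Backward dynamic programming for the Bellman equation above.

    f(x, u) -- deterministic transition, c(x, u) -- cost, phi(x) -- final cost.
    Returns the value functions V[t][x] for t = 0..T.
    """
    V = [None] * (T + 1)
    V[T] = {x: phi(x) for x in states}
    for t in range(T - 1, -1, -1):          # backward in time
        V[t] = {x: min(c(x, u) + V[t + 1][f(x, u)] for u in controls)
                for x in states}
    return V
```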
8:9

Optimal Control (continuous time)


Given a controlled dynamic system
ẋ = f (x, u)
we define a cost function with horizon T
J^π = ∫_0^T c(x(t), u(t)) dt + φ(x(T))

where the start state x(0) and the controller π : (x, t) ↦ u are given, which
determine the closed-loop system trajectory x(t), u(t) via ẋ = f (x, π(x, t)) and
u(t) = π(x(t), t)
8:10

Hamilton-Jacobi-Bellman equation (continuous time)


• Define the value function or optimal cost-to-go function
V(x, t) = min_π [ ∫_t^T c(x(s), u(s)) ds + φ(x(T)) ]  subject to x(t) = x

• Hamilton-Jacobi-Bellman equation
−(∂/∂t) V(x, t) = min_u [ c(x, u) + (∂V/∂x) f(x, u) ]

The argmin gives the optimal control signal: π∗(x) = argmin_u [ · · · ]

Derivation: Apply the discrete-time Bellman equation for Vt and Vt+dt:

V(x, t) = min_u [ ∫_t^{t+dt} c(x, u) dt + V(x(t+dt), t+dt) ]
= min_u [ ∫_t^{t+dt} c(x, u) dt + V(x, t) + ∫_t^{t+dt} (dV(x, t)/dt) dt ]
0 = min_u ∫_t^{t+dt} [ c(x, u) + ∂V/∂t + (∂V/∂x) ẋ ] dt
0 = min_u [ c(x, u) + ∂V/∂t + (∂V/∂x) f(x, u) ]
8:11

Infinite horizon case

J^π = ∫_0^∞ c(x(t), u(t)) dt

• This cost function is stationary (time-invariant)!


→ the optimal value function is stationary (V (x, t) = V (x))
→ the optimal control signal depends on x but not on t
→ the optimal controller π ∗ is stationary

• The HJB and Bellman equations remain “the same”, but with the same (stationary) value function independent of t:

0 = min_u [ c(x, u) + (∂V/∂x) f(x, u) ]  (cont. time)

V(x) = min_u [ c(x, u) + V(f(x, u)) ]  (discrete time)

The argmin gives the optimal control signal: π∗(x) = argmin_u [ · · · ]
8:12

Infinite horizon examples


• Cart-pole balancing:
– You always want the pole to be upright (θ ≈ 0)
– You always want the car to be close to zero (x ≈ 0)
– You want to spare energy (apply low torques) (u ≈ 0)
You might define a cost
J^π = ∫_0^∞ ( ||θ||^2 + ||x||^2 + ρ||u||^2 ) dt

• Reference following:
– You always want to stay close to a reference trajectory r(t)
Define x̃(t) = x(t) − r(t) with dynamics (d/dt) x̃(t) = f(x̃(t) + r(t), u) − ṙ(t)
Define a cost
J^π = ∫_0^∞ ( ||x̃||^2 + ρ||u||^2 ) dt

• Many many problems in control can be framed this way


8:13

Comments
• The Bellman equation is fundamental in optimal control theory, but also Rein-
forcement Learning

• The HJB eq. is a differential equation for V (x, t) which is in general hard to
solve
• The (time-discretized) Bellman equation can be solved by Dynamic Program-
ming starting backward:
V_T(x) = φ(x) ,  V_{T-1}(x) = min_u [ c(x, u) + V_T(f(x, u)) ] ,  etc.

But it might still be hard or infeasible to represent the functions Vt (x) over
continuous x!

• Both become significantly simpler under linear dynamics and quadratic costs:
→ Riccati equation
8:14

Linear-Quadratic Optimal Control


linear dynamics
ẋ = f (x, u) = Ax + Bu

quadratic costs

c(x, u) = x⊤Qx + u⊤Ru ,  φ(x_T) = x_T⊤ F x_T

• Note: Dynamics neglects constant term; costs neglect linear and constant terms.
This is because
– constant costs are irrelevant
– linear cost terms can be eliminated by redefining x or u
– constant dynamic term only introduces a constant drift
8:15

Linear-Quadratic Control as Local Approximation


• LQ control is important also to control non-LQ systems in the neighborhood
of a desired state!

Let x∗ be such a desired state (e.g., cart-pole: x∗ = (0, 0, 0, 0))


– linearize the dynamics around x∗
– use 2nd order approximation of the costs around x∗
– control the system locally as if it was LQ
– pray that the system will never leave this neighborhood!
8:16

Riccati differential equation = HJB eq. in LQ case


• In the Linear-Quadratic (LQ) case, the value function always is a quadratic function of x!

Let V(x, t) = x⊤P(t)x. Then the HJB equation becomes

−(∂/∂t) V(x, t) = min_u [ c(x, u) + (∂V/∂x) f(x, u) ]
−x⊤Ṗ(t)x = min_u [ x⊤Qx + u⊤Ru + 2x⊤P(t)(Ax + Bu) ]
0 = (∂/∂u) [ x⊤Qx + u⊤Ru + 2x⊤P(t)(Ax + Bu) ] = 2u⊤R + 2x⊤P(t)B
u∗ = −R^-1 B⊤P x

⇒ Riccati differential equation

−Ṗ = A⊤P + PA − PBR^-1B⊤P + Q
8:17

Riccati differential equation

−Ṗ = A>P + P A − P BR-1 B>P + Q


• This is a differential equation for the matrix P (t) describing the quadratic value
function. If we solve it with the finite horizon constraint P (T ) = F we solved
the optimal control problem

• The optimal control u∗ = −R-1 B>P x is called Linear Quadratic Regulator

Note: If the state is dynamic (e.g., x = (q, q̇)) this control is linear in the positions and linear in the velocities and is an instance of PD control
The matrix K = R-1 B>P is therefore also called gain matrix
For instance, if x(t) = (q(t) − r(t), q̇(t) − ṙ(t)) for a reference r(t) and K = [Kp Kd ]
then
u∗ = Kp (r(t) − q(t)) + Kd (ṙ(t) − q̇(t))
8:18

Riccati equations
• Finite horizon continuous time
Riccati differential equation

−Ṗ = A>P + P A − P BR-1 B>P + Q , P (T ) = F



• Infinite horizon continuous time


Algebraic Riccati equation (ARE)
0 = A>P + P A − P BR-1 B>P + Q

• Finite horizon discrete time (J^π = Σ_{t=0}^T ||xt||^2_Q + ||ut||^2_R + ||x_T||^2_F):

P_{t-1} = Q + A⊤[P_t − P_t B(R + B⊤P_t B)^-1 B⊤P_t]A ,  P_T = F

• Infinite horizon discrete time (J^π = Σ_{t=0}^∞ ||xt||^2_Q + ||ut||^2_R):

P = Q + A⊤[P − P B(R + B⊤P B)^-1 B⊤P]A

8:19

Example: 1D point mass


• Dynamics:
q̈(t) = u(t)/m

x = (q, q̇) ,  ẋ = (q̇, q̈) = Ax + Bu  with  A = [0 1; 0 0] ,  B = [0; 1/m]

• Costs:

c(x, u) = ||x||^2 + ϱ||u||^2 ,  Q = I ,  R = ϱI

• Algebraic Riccati equation: with

P = [a c; c b] ,  u∗ = −R^-1B⊤P x = −(1/(ϱm)) (c q + b q̇)

0 = A⊤P + PA − PBR^-1B⊤P + Q
  = [0 0; a c] + [0 a; 0 c] − (1/(ϱm^2)) [c^2 bc; bc b^2] + [1 0; 0 1]
8:20

Example: 1D point mass (cont.)



• Algebraic Riccati equation: with

P = [a c; c b] ,  u∗ = −R^-1B⊤P x = −(1/(ϱm)) (c q + b q̇)

0 = A⊤P + PA − PBR^-1B⊤P + Q
  = [0 0; a c] + [0 a; 0 c] − (1/(ϱm^2)) [c^2 bc; bc b^2] + [1 0; 0 1]

First solve the (1,1) entry for c = m√ϱ, then the (2,2) entry for b = m√ϱ √(2c + 1). Whether the damping ratio ξ = b/√(4mc) makes the closed loop over- or under-damped depends on the choice of ϱ.

• The Algebraic Riccati equation is usually solved numerically. (E.g. are, care
or dare in Octave)
8:21
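To illustrate the numerical route, the following sketch computes the stationary P for
the 1D point mass by integrating the Riccati differential equation in backward time from
P = 0 until it converges; m and % are arbitrary illustrative values, and the analytic
entries c = m√% and b = m√% √(2c + 1) follow from the ARE above assuming Q = I as
on the slide:

#include <cstdio>
#include <cmath>

// Stationary ARE solution for the 1D point mass (A=[0 1; 0 0], B=[0; 1/m],
// Q=I, R=rho*I): integrate -Pdot = A'P + PA - P B R^-1 B' P + Q in backward
// time, starting from P=0, until Pdot vanishes. P = [a c; c b].
int main(){
  double m = 3.0, rho = 0.1;
  double a = 0, c = 0, b = 0, ds = 1e-3;
  for(int i = 0; i < 1000000; i++){
    double da = 1.0 - c*c/(rho*m*m);           // (1,1) entry of the ARE residual
    double dc = a - b*c/(rho*m*m);             // (1,2) entry
    double db = 2.0*c + 1.0 - b*b/(rho*m*m);   // (2,2) entry
    a += ds*da;  c += ds*dc;  b += ds*db;
    if(fabs(da)+fabs(dc)+fabs(db) < 1e-10) break;
  }
  double c_ana = m*sqrt(rho), b_ana = m*sqrt(rho*(2.0*c_ana + 1.0));
  printf("numeric:  c=%f  b=%f\n", c, b);
  printf("analytic: c=%f  b=%f\n", c_ana, b_ana);
  return 0;
}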

Optimal control comments


• HJB or Bellman equation are very powerful concepts

• Even if we can solve the HJB eq. and have the optimal control, we still don’t
know how the system will really behave:
– Will it actually reach a “desired state”?
– Will it be stable?
– Is it actually “controllable” at all?

• Last note on optimal control:
Formulate it as a constrained optimization problem with objective function J π and
constraint ẋ = f (x, u); λ(t) are the Lagrange multipliers. It turns out that
∂V (x, t)/∂x = λ(t). (See Stengel.)
8:22

Relation to other topics

• Optimal Control:

min_π J π = ∫_0^T c(x(t), u(t)) dt + φ(x(T ))

• Inverse Kinematics:

min_q f (q) = ||q − q0 ||2W + ||φ(q) − y ∗ ||2C

• Operational space control:

min_u f (u) = ||u||2H + ||φ̈(q) − ÿ ∗ ||2C

• Trajectory Optimization: (control hard constraints could be included)

min_{q0:T} f (q0:T ) = Σ_{t=0}^T ||Ψt (qt-k , .., qt )||2 + Σ_{t=0}^T ||Φt (qt )||2

• Reinforcement Learning:
– Markov Decision Processes ↔ discrete time stochastic controlled system P (xt+1 | ut , xt )
– Bellman equation → Basic RL methods (Q-learning, etc)
8:23

8.2 Controllability
8:24

Controllability
• As a starting point, consider the claim:
“Intelligence means to gain maximal controllability over all degrees of freedom over the
environment.”

Note:
– controllability (ability to control) ≠ control
– What does controllability mean exactly?

• I think the general idea of controllability is really interesting
– Linear control theory provides one specific definition of controllability, which
we introduce next.
8:25

Controllability
• Consider a linear controlled system

ẋ = Ax + Bu

How can we tell from the matrices A and B whether we can control x to eventually
reach any desired state?

• Example: x is 2-dim, u is 1-dim:

(ẋ1 ; ẋ2 ) = [0 0; 0 0] (x1 ; x2 ) + (1; 0) u

Is x “controllable”?

(ẋ1 ; ẋ2 ) = [0 1; 0 0] (x1 ; x2 ) + (0; 1) u

Is x “controllable”?
8:26

Controllability
We consider a linear stationary (=time-invariant) controlled system

ẋ = Ax + Bu

• Complete controllability: All elements of the state can be brought from arbitrary
initial conditions to zero in finite time
• A system is completely controllable iff the controllability matrix

C := [B  AB  A2 B  · · ·  An-1 B]

has full rank dim(x) (that is, all rows are linearly independent)

• Meaning of C:
The ith row describes how the ith element xi can be influenced by u
“B”: ẋi is directly influenced via B
“AB”: ẍi is “indirectly” influenced via AB (u directly influences some ẋj
via B; the dynamics A then influence ẍi depending on ẋj )
“A2 B”: the third derivative of xi is “double-indirectly” influenced
etc...
Note: ẍ = Aẋ + B u̇ = AAx + ABu + B u̇
and the third derivative is A3 x + A2 Bu + AB u̇ + B ü
8:27

Controllability
• When all rows of the controllability matrix are linearly independent ⇒ (u, u̇, ü, ...)
can influence all elements of x independently
• If a row is zero → this element of x cannot be controlled at all
• If 2 rows are linearly dependent → these two elements of x will remain coupled
forever
8:28

Controllability examples

(ẋ1 ; ẋ2 ) = [0 0; 0 0] (x1 ; x2 ) + (1; 1) u    C = [1 0; 1 0]    rows linearly dependent

(ẋ1 ; ẋ2 ) = [0 0; 0 0] (x1 ; x2 ) + (1; 0) u    C = [1 0; 0 0]    2nd row zero

(ẋ1 ; ẋ2 ) = [0 1; 0 0] (x1 ; x2 ) + (0; 1) u    C = [0 1; 1 0]    good!
8:29
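For the 2-dimensional examples above this check is tiny: the controllability matrix is
C = [B AB], and full rank is equivalent to det(C) ≠ 0. A minimal sketch (the three
example systems are exactly the ones above):

#include <cstdio>
#include <cmath>

// Controllability check for a 2D system xdot = A x + B u with 1D control:
// C = [B AB] is 2x2, and the system is completely controllable iff det(C) != 0.
bool controllable(const double A[2][2], const double B[2]){
  double AB[2] = { A[0][0]*B[0] + A[0][1]*B[1],
                   A[1][0]*B[0] + A[1][1]*B[1] };
  double det = B[0]*AB[1] - B[1]*AB[0];   // det of [B AB]
  return fabs(det) > 1e-12;
}

int main(){
  double A1[2][2] = {{0,0},{0,0}}, B1[2] = {1,1};  // rows linearly dependent
  double A2[2][2] = {{0,0},{0,0}}, B2[2] = {1,0};  // 2nd row zero
  double A3[2][2] = {{0,1},{0,0}}, B3[2] = {0,1};  // controllable
  printf("%d %d %d\n", controllable(A1,B1), controllable(A2,B2), controllable(A3,B3));
  // prints: 0 0 1
  return 0;
}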

Controllability
Why is it important/interesting to analyze controllability?

• The Algebraic Riccati Equation will always return an “optimal” controller – but
controllability tells us whether such a controller even has a chance to control x

• “Intelligence means to gain maximal controllability over all degrees of freedom over the
environment.”
– real environments are non-linear
– “to have the ability to gain controllability over the environment’s DoFs”
8:30

8.3 Stability
8:31

Stability
• One of the most central topics in control theory

• Instead of designing a controller by first designing a cost function and then
applying Riccati, design a controller such that the desired state is provably a
stable equilibrium point of the closed loop system
8:32

Stability
• Stability is an analysis of the closed loop system. That is: for this analysis we
don’t need to distinguish between system and controller explicitly. Both to-
gether define the dynamics

ẋ = f (x, π(x, t)) =: f (x)


• The following will therefore discuss stability analysis of general differential
equations ẋ = f (x)

• What follows:
– 3 basic definitions of stability
– 2 basic methods for analysis by Lyapunov
8:33

Aleksandr Lyapunov (1857–1918)


8:34

Stability – 3 definitions

ẋ = f (x) with equilibrium point x0 = 0
• x0 is an equilibrium point ⇐⇒ f (x0 ) = 0

• Lyapunov stable or uniformly stable ⇐⇒

∀ε : ∃δ s.t. ||x(0)|| ≤ δ ⇒ ∀t : ||x(t)|| ≤ ε

(when it starts off δ-near to x0 , it will remain ε-near forever)

• asymptotically stable ⇐⇒
Lyapunov stable and limt→∞ x(t) = 0

• exponentially stable ⇐⇒
asymptotically stable and ∃α, a s.t. ||x(t)|| ≤ ae−αt ||x(0)||

(→ the “error” time integral ∫_0^∞ ||x(t)|| dt ≤ (a/α) ||x(0)|| is bounded!)
8:35

Linear Stability Analysis

(“Linear” ↔ “local” for a system linearized at the equilibrium point.)
• Given a linear system
ẋ = Ax
Let λi be the eigenvalues of A
– The system is asymptotically stable ⇐⇒ ∀i : real(λi ) < 0
– The system is unstable ⇐⇒ ∃i : real(λi ) > 0
– The system is marginally stable ⇐⇒ ∀i : real(λi ) ≤ 0

• Meaning: An eigenvalue describes how the system behaves along one state dimension
(along the eigenvector):
ẋi = λi xi
As for the 1D point mass the solution is xi (t) = aeλi t and
– imaginary λi → oscillation
– negative real(λi ) → exponential decay ∝ e−|λi |t
– positive real(λi ) → exponential explosion ∝ e|λi |t
8:36

Linear Stability Analysis: Example

• Let’s take the 1D point mass q̈ = u/m in closed loop with a PD u = −Kp q − Kd q̇

• Dynamics:

ẋ = (q̇; q̈) = [0 1; 0 0] x + (1/m) [0 0; −Kp −Kd ] x

A = [0 1; −Kp /m −Kd /m]

• Eigenvalues:

The equation λ (q; q̇) = [0 1; −Kp /m −Kd /m] (q; q̇) leads to the equation
λq̇ = λ2 q = −Kp /m q − Kd /m λq, or mλ2 + Kd λ + Kp = 0, with solution (compare
slide 05:10)

λ = (−Kd ± √(Kd2 − 4mKp )) / (2m)

For Kd2 − 4mKp negative, real(λ) = −Kd /2m

⇒ Positive derivative gain Kd makes the system stable.
8:37
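This analysis is easy to reproduce numerically. A minimal sketch that evaluates the
closed-loop eigenvalues via the quadratic formula (the gains are illustrative):

#include <cstdio>
#include <cmath>

// Closed-loop eigenvalues of the PD-controlled point mass:
// m*lambda^2 + Kd*lambda + Kp = 0.
int main(){
  double m = 1.0, Kp = 10.0, Kd = 2.0;   // example mass and gains
  double disc = Kd*Kd - 4.0*m*Kp;
  if(disc >= 0){
    double l1 = (-Kd + sqrt(disc)) / (2.0*m);
    double l2 = (-Kd - sqrt(disc)) / (2.0*m);
    printf("real eigenvalues: %f, %f\n", l1, l2);
    printf("asymptotically stable: %d\n", l1 < 0 && l2 < 0);
  }else{
    double re = -Kd / (2.0*m), im = sqrt(-disc) / (2.0*m);
    printf("complex eigenvalues: %f +- %f i (oscillation)\n", re, im);
    printf("asymptotically stable: %d\n", re < 0);   // true iff Kd > 0
  }
  return 0;
}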

Side note: Stability for discrete time systems

• Given a discrete time linear system

xt+1 = Axt

Let λi be the eigenvalues of A
– The system is asymptotically stable ⇐⇒ ∀i : |λi | < 1
– The system is unstable ⇐⇒ ∃i : |λi | > 1
– The system is marginally stable ⇐⇒ ∀i : |λi | ≤ 1
8:38

Linear Stability Analysis comments

• The same type of analysis can be done locally for non-linear systems, as we
will do for the cart-pole in the exercises

• We can design a controller that minimizes the (negative) eigenvalues of A:
↔ controller with fastest asymptotic convergence

This is a real alternative to optimal control!
8:39

Lyapunov function method


• A method to analyze/prove stability for general non-linear systems is the famous
“Lyapunov’s second method”

Let D be a region around the equilibrium point x0

• A Lyapunov function V (x) for a system dynamics ẋ = f (x) is
– positive, V (x) > 0, everywhere in D except...
at the equilibrium point where V (x0 ) = 0
– always decreasing, V̇ (x) = (∂V (x)/∂x) ẋ < 0, in D except...
at the equilibrium point where f (x) = 0 and therefore V̇ (x) = 0

• If there exists a D and a Lyapunov function ⇒ the system is asymptotically stable

If D is the whole state space, the system is globally stable
8:40

Lyapunov function method


• The Lyapunov function method is very general. V (x) could be “anything”
(energy, cost-to-go, whatever). Whenever one finds some V (x) that decreases,
this proves stability

• The problem though is to think of some V (x) given a dynamics!


(In that sense, the Lyapunov function method is rather a method of proof than
a concrete method for stability analysis.)

• In standard cases, a good guess for the Lyapunov function is either the energy
or the value function
8:41

Lyapunov function method – Energy Example

• Let’s take the 1D point mass q̈ = u/m in closed loop with a PD u = −Kp q − Kd q̇,
which has the solution (slide 05:14):

q(t) = be−ξ/λ t eiω0 √(1−ξ²) t

• Energy of the 1D point mass: V (q, q̇) := ½ mq̇ 2

V (x(t)) = e−2ξ/λ t V (x(0))

(using that the energy of an undamped oscillator is conserved)
• V̇ (x) < 0 ⇐⇒ ξ > 0 ⇐⇒ Kd > 0
Same result as for the eigenvalue analysis
8:42
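The decay is easy to check numerically. The following sketch simulates the closed loop
and tracks the standard mechanical-energy Lyapunov candidate V = ½mq̇ 2 + ½Kp q 2 (for
which V̇ = −Kd q̇ 2 ≤ 0 holds exactly), a common choice for this system; mass, gains and
step size are illustrative:

#include <cstdio>

// Numerical check of the Lyapunov (energy) argument for the PD-controlled
// point mass m*qdd = -Kp*q - Kd*qd. We track V = 1/2 m qd^2 + 1/2 Kp q^2;
// analytically Vdot = -Kd*qd^2 <= 0, so V decays to zero whenever Kd > 0.
int main(){
  double m = 1.0, Kp = 10.0, Kd = 1.0;   // try Kd = 0: V stays constant
  double q = 1.0, qd = 0.0, tau = 1e-4;
  for(int t = 0; t <= 100000; t++){
    if(t % 10000 == 0){
      double V = 0.5*m*qd*qd + 0.5*Kp*q*q;
      printf("t=%5.1f  V=%g\n", t*tau, V);
    }
    double qdd = (-Kp*q - Kd*qd) / m;
    qd += tau*qdd;                       // semi-implicit Euler integration
    q  += tau*qd;
  }
  return 0;
}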
Lyapunov function method – value function Example


• Consider infinite horizon linear-quadratic optimal control. The solution of the
Algebraic Riccati equation gives the optimal controller.
• The value function satisfies
V (x) = x>P x
V̇ (x) = [Ax + Bu∗ ]>P x + x>P [Ax + Bu∗ ]
u∗ = −R-1 B>P x = Kx
V̇ (x) = x>[(A + BK)>P + P (A + BK)]x
= x>[A>P + P A + (BK)>P + P (BK)]x
0 = A>P + P A − P BR-1 B>P + Q
V̇ (x) = x>[P BR-1 B>P − Q + (P BK)> + P BK]x
= −x>[Q + K>RK]x

(We could have derived this more easily! x>Qx are the immediate state costs, and
x>K>RKx = u>Ru are the immediate control costs—and V̇ (x) = −c(x, u∗ )! See slide
11, bottom.)
• That is: V is a Lyapunov function if Q + K>RK is positive definite!
8:43

Observability & Adaptive Control


• When some state dimensions are not directly observable: analyze higher-order
derivatives to infer them.
Very closely related to controllability: Just like the controllability matrix tells
whether state dimensions can (indirectly) be controlled, an observation matrix
tells whether state dimensions can (indirectly) be inferred.

• Adaptive Control: When the system dynamics ẋ = f (x, u, β) has unknown parameters β:
– One approach is to estimate β from the data so far and use optimal control.
– Another is to design a controller that has an additional internal update equation
for an estimate β̂ and is provably stable. (See Schaal’s lecture, for instance.)
8:44

9 Reinforcement Learning in Robotics – briefly

RL = Learning to Act

from Satinder Singh’s Introduction to RL, videolectures.com

9:1

(figure: around 2000, by Schaal, Atkeson, Vijayakumar)

(figure: 2007, Andrew Ng et al.)
9:2

Applications of RL
• Robotics
– Navigation, walking, juggling, helicopters, grasping, etc...
• Games
– Backgammon, Chess, Othello, Tetris, ...
• Control
– factory processes, resource control in multimedia networks, elevators, ....
• Operations Research
– Warehousing, transportation, scheduling, ...
9:3

Markov Decision Process

(graphical model: states s0 , s1 , s2 , . . . , actions a0 , a1 , a2 , . . . , rewards r0 , r1 , r2 , . . .)

P (s0:T +1 , a0:T , r0:T ; π) = P (s0 ) ∏_{t=0}^T P (at |st ; π) P (rt |st , at ) P (st+1 |st , at )

– world’s initial state distribution P (s0 )
– world’s transition probabilities P (st+1 | st , at )
– world’s reward probabilities P (rt | st , at )
– agent’s policy π(at | st ) = P (at |st ; π) (or deterministic at = π(st ))

• Stationary MDP:
– We assume P (s′ | s, a) and P (r|s, a) independent of time
– We also define R(s, a) := E{r|s, a} = ∫ r P (r|s, a) dr
9:4

... in the notation of control theory


• We have a (potentially stochastic) controlled system

ẋ = f (x, u) + noise(x, u)

• We have costs (neg-rewards), e.g. in the finite horizon case:

J π = ∫_0^T c(x(t), u(t)) dt + φ(x(T ))

• We want a policy (“controller”)

π : (x, t) ↦ u
9:5

Reinforcement Learning = the dynamics f and costs c are unknown

• All the agent can do is collect data

D = {(xt , ut , ct )}_{t=0}^T

What can we do with this data?


9:6

Five approaches to RL

experience data D = {(xt , ut , ct )}_{t=0}^T     demonstration data D = {(x0:T , u0:T )d }_{d=1}^n

(overview diagram, in control notation:)
– Model-based: learn model P (x′|u, x), c(x, u) → dynamic prog. Vt (x) → policy πt (x)
– Model-free: learn value fct. V (x) → policy π(x)
– Policy Search: optimize policy π(x) directly
– Imitation Learning: learn policy π(x) from demonstration data
– Inverse OC: learn latent costs c(x) → dynamic prog. Vt → policy π(x)
9:7

Five approaches to RL

experience data D = {(st , at , rt )}_{t=0}^T     demonstration data D = {(s0:T , a0:T )d }_{d=1}^n

(the same diagram, in MDP notation:)
– Model-based: learn model P (s′|s, a), R(s, a) → dynamic prog. V (s) → policy π(s)
– Model-free: learn value fct. V (s) → policy π(s)
– Policy Search: optimize policy π(s) directly
– Imitation Learning: learn policy π(s) from demonstration data
– Inverse RL: learn latent costs R(s, a) → dynamic prog. V (s) → policy π(s)

9:8

Imitation Learning

D = {(s0:T , a0:T )d }_{d=1}^n  −→ learn/copy −→  π(s)

• Use ML to imitate demonstrated state trajectories x0:T

Literature:

Atkeson & Schaal: Robot learning from demonstration (ICML 1997)


Schaal, Ijspeert & Billard: Computational approaches to motor learning by imitation (Philosophical
Transactions of the Royal Society of London. Series B: Biological Sciences 2003)
Grimes, Chalodhorn & Rao: Dynamic Imitation in a Humanoid Robot through Nonparametric Probabilistic Inference. (RSS 2006)
Rüdiger Dillmann: Teaching and learning of robot tasks via observation of human performance
(Robotics and Autonomous Systems, 2004)
9:9

Imitation Learning
• There are many ways to imitate/copy the observed policy:

Learn a density model P (at | st ) P (st ) (e.g., with mixture of Gaussians) from the
observed data and use it as policy (Billard et al.)

Or trace observed trajectories by minimizing perturbation costs (Atkeson &
Schaal 1997)
9:10

Imitation Learning

Atkeson & Schaal


9:11

Inverse RL

D = {(s0:T , a0:T )d }_{d=1}^n  −→ learn −→  R(s, a)  −→ DP −→  V (s)  →  π(s)

• Use ML to “uncover” the latent reward function in observed behavior

Literature:

Pieter Abbeel & Andrew Ng: Apprenticeship learning via inverse reinforcement learning (ICML
2004)
Andrew Ng & Stuart Russell: Algorithms for Inverse Reinforcement Learning (ICML 2000)
Nikolay Jetchev & Marc Toussaint: Task Space Retrieval Using Inverse Feedback Control (ICML
2011).
9:12

Inverse RL (Apprenticeship Learning)

• Given: demonstrations D = {xd0:T }_{d=1}^n
• Try to find a reward function that discriminates demonstrations from other
policies
– Assume the reward function is linear in some features: R(x) = w>φ(x)
– Iterate:
1. Given a set of candidate policies {π0 , π1 , ..}
2. Find weights w that maximize the value margin between teacher and all
other candidates

max_{w,ξ} ξ
s.t. ∀πi : w>⟨φ⟩D ≥ w>⟨φ⟩πi + ξ   (value of demonstrations ≥ value of πi plus margin)
||w||2 ≤ 1

3. Compute a new candidate policy πi that optimizes R(x) = w>φ(x) and
add it to the candidate list.
(Abbeel & Ng, ICML 2004)
9:13

9:14

Policy Search with Policy Gradients


9:15

Policy gradients
• In the continuous state/action case, represent the policy as linear in arbitrary state
features:

π(s) = Σ_{j=1}^k φj (s)βj = φ(s)>β   (deterministic)
π(a | s) = N(a | φ(s)>β, φ(s)>Σφ(s))   (stochastic)

with k features φj .

• Given an episode ξ = (st , at , rt )_{t=0}^H , we want to estimate

∂V (β)/∂β
9:16

Policy Gradients
• One approach is called REINFORCE:

∂V (β)/∂β = ∂/∂β ∫ P (ξ|β) R(ξ) dξ = ∫ P (ξ|β) ∂/∂β log P (ξ|β) R(ξ) dξ
= E_{ξ|β} { ∂/∂β log P (ξ|β) R(ξ) }
= E_{ξ|β} { Σ_{t=0}^H γ^t [∂ log π(at |st )/∂β] Σ_{t′=t}^H γ^{t′−t} rt′ }

where the inner sum is the Q-function: Qπ (st , at , t) = Σ_{t′=t}^H γ^{t′−t} rt′

• Another is the Natural Policy Gradient
– Estimate the Q-function as linear in the basis functions ∂ log π(a|s)/∂β:

Q(s, a) ≈ [∂ log π(a|s)/∂β]> w

– Then the natural gradient (∂V (β)/∂β multiplied with the inverse Fisher metric) updates

β new = β + αw

• Another is PoWER: Use the stochastic policy π(a | s), let at = φ(st )>(β + εt ) where
εt ∼ N(0, Σ) is the sampled noise; then ∂V (β)/∂β = 0 implies

β ← β + E_{ξ|β} { Σ_{t=0}^H εt Qπ (st , at , t) } / E_{ξ|β} { Σ_{t=0}^H Qπ (st , at , t) }
9:17
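To make the REINFORCE estimator concrete, here is a toy sketch for 1-step episodes
(H = 0) with a scalar Gaussian policy a ∼ N(β, σ 2 ) and reward r = −(a − 2)2 ; then
∂ log π(a)/∂β = (a − β)/σ 2 , and the mean reward is subtracted as a baseline to reduce
variance. All numbers are illustrative:

#include <cstdio>
#include <cstdlib>
#include <cmath>

// Toy REINFORCE: scalar Gaussian policy a ~ N(beta, sigma^2), 1-step episodes
// with reward r = -(a-2)^2. The policy gradient is estimated as the sample
// average of dlog pi(a)/dbeta * (r - baseline), with dlog pi/dbeta = (a-beta)/sigma^2.
double gauss(){  // standard normal sample (Box-Muller)
  double u1 = (rand()+1.0)/(RAND_MAX+2.0), u2 = (rand()+1.0)/(RAND_MAX+2.0);
  return sqrt(-2.0*log(u1)) * cos(2.0*M_PI*u2);
}

int main(){
  double beta = 0.0, sigma = 0.5, alpha = 0.05;   // illustrative values
  const int N = 100;                              // episodes per update
  for(int iter = 0; iter < 200; iter++){
    double a[N], r[N], rmean = 0;
    for(int i = 0; i < N; i++){
      a[i] = beta + sigma*gauss();
      r[i] = -(a[i]-2.0)*(a[i]-2.0);
      rmean += r[i]/N;
    }
    double grad = 0;
    for(int i = 0; i < N; i++)
      grad += (a[i]-beta)/(sigma*sigma) * (r[i]-rmean) / N;  // REINFORCE estimate
    beta += alpha*grad;                                      // gradient ascent
    if(iter % 50 == 0) printf("iter %3d  beta=%f\n", iter, beta);
  }
  printf("final beta=%f (optimum is 2)\n", beta);
  return 0;
}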

Kober & Peters: Policy Search for Motor Primitives in Robotics, NIPS 2008.

(serious reward shaping!)


9:18

Learning to walk in 20 Minutes

• Policy Gradient method (Reinforcement Learning)
Stationary policy parameterized as linear in features: u = Σi wi φi (q, q̇)
• Problem: find parameters wi to minimize expected costs
cost = mimic the (q, q̇) of the passive down-hill walker at a “certain point in cycle”

Learning To Walk

Tedrake, Zhang & Seung: Stochastic policy gradient reinforcement learning on a simple 3D biped. IROS,
2849-2854, 2004. http://groups.csail.mit.edu/robotics-center/public_papers/Tedrake0
pdf
9:19

Policy Gradients – references


Peters & Schaal (2008): Reinforcement learning of motor skills with policy gradients, Neural Networks.

Kober & Peters: Policy Search for Motor Primitives in Robotics, NIPS 2008.

Vlassis, Toussaint (2009): Learning Model-free Robot Control by a Monte Carlo EM Algorithm. Autonomous Robots 27, 123-130.

Rawlik, Toussaint, Vijayakumar (2012): On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference. RSS 2012. (ψ-learning)

• These methods are sometimes called white-box optimization:
They optimize the policy parameters β for the total reward R = Σt γ^t rt while
trying to exploit knowledge of how the process is actually parameterized
9:20

Policy Search with Black-Box Optimization [skipped]


9:21

“Black-Box Optimization”
• The term is not really well defined
– I use it to express that only f (x) can be evaluated
– ∇f (x) or ∇2 f (x) are not (directly) accessible
More common terms:

• Global optimization
– This usually emphasizes that methods should not get stuck in local optima
– Very very interesting domain – close analogies to (active) Machine Learning, ban-
dits, POMDPs, optimal decision making/planning, optimal experimental design
– Usually mathematically well founded methods

• Stochastic search or Evolutionary Algorithms or Local Search


– Usually these are local methods (extensions trying to be “more” global)
– Various interesting heuristics
– Some of them (implicitly or explicitly) locally approximate gradients or 2nd-order
models
9:22

Black-Box Optimization
• Problem: Let x ∈ Rn , f : Rn → R, find

min_x f (x)

where we can only evaluate f (x) for any x ∈ Rn

• A constrained version: Let x ∈ Rn , f : Rn → R, g : Rn → {0, 1}, find

min_x f (x) s.t. g(x) = 1

where we can only evaluate f (x) and g(x) for any x ∈ Rn

I haven’t seen much work on this. Would be interesting to consider this more rigorously.
9:23

A zoo of approaches
• People with many different backgrounds drawn into this
Ranging from heuristics and Evolutionary Algorithms to heavy mathematics

– Evolutionary Algorithms, esp. Evolution Strategies, Covariance Matrix Adaptation,


Estimation of Distribution Algorithms
– Simulated Annealing, Hill Climbing, Downhill Simplex
– local modelling (gradient/Hessian), global modelling
9:24

Optimizing and Learning


• Black-Box optimization is strongly related to learning:
• When we have a local gradient or Hessian, we can take that local information
and run – no need to keep track of the history or learn (exception: BFGS)
• In the black-box case we have no local information directly accessible
→ one needs to account for the history in some way or another to have an idea
where to continue search
• “Accounting for the history” very often means learning: Learning a local or
global model of f itself, learning which steps have been successful recently
(gradient estimation), or which step directions, or other heuristics
9:25

Stochastic Search
9:26

Stochastic Search
• The general recipe:
– The algorithm maintains a probability distribution pθ (x)
– In each iteration it takes n samples {xi }_{i=1}^n ∼ pθ (x)
– Each xi is evaluated → data {(xi , f (xi ))}_{i=1}^n
– That data is used to update θ

• Stochastic Search:

Input: initial parameter θ, function f (x), distribution model pθ (x), update heuristic
h(θ, D)
Output: final θ and best point x
1: repeat
2:   Sample {xi }_{i=1}^n ∼ pθ (x)
3:   Evaluate samples, D = {(xi , f (xi ))}_{i=1}^n
4: Update θ ← h(θ, D)
5: until θ converges

9:27

Stochastic Search
• The parameter θ is the only “knowledge/information” that is being propagated
between iterations
θ encodes what has been learned from the history
θ defines where to search in the future

• Evolutionary Algorithms: θ is a parent population
Evolution Strategies: θ defines a Gaussian with mean & variance
Estimation of Distribution Algorithms: θ are parameters of some distribution
model, e.g. a Bayesian Network
Simulated Annealing: θ is the “current point” and a temperature
9:28

Example: Gaussian search distribution (µ, λ)-ES

From the 1960s/70s, Rechenberg/Schwefel
• Perhaps the simplest type of distribution model

θ = (x̂) ,  pθ (x) = N(x|x̂, σ 2 )

an n-dimensional isotropic Gaussian with fixed deviation σ

• Update heuristic:
– Given D = {(xi , f (xi ))}_{i=1}^λ , select the µ best: D′ = bestOfµ (D)
– Compute the new mean x̂ from D′

• This algorithm is called “Evolution Strategy (µ, λ)-ES”
– The Gaussian is meant to represent a “species”
– λ offspring are generated
– the best µ are selected


9:29
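A minimal sketch of (µ, λ)-ES on the sphere function f (x) = ||x||2 (all parameters are
illustrative). Note how the fixed σ limits the final precision, which is exactly the critique
that motivates the adaptation mechanisms of CMA-ES below:

#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <algorithm>

// Minimal (mu,lambda)-ES on f(x) = ||x||^2: sample lambda offspring from an
// isotropic Gaussian around the mean, select the mu best, recompute the mean.
// With fixed sigma the mean only converges to a sigma-sized neighborhood.
const int n = 5, lambda = 20, mu = 5;

double f(const double* x){ double s=0; for(int j=0;j<n;j++) s+=x[j]*x[j]; return s; }
double gauss(){
  double u1=(rand()+1.0)/(RAND_MAX+2.0), u2=(rand()+1.0)/(RAND_MAX+2.0);
  return sqrt(-2.0*log(u1))*cos(2.0*M_PI*u2);
}

int main(){
  double mean[n], sigma = 0.3;
  for(int j=0;j<n;j++) mean[j] = 1.0;            // initial mean
  double X[lambda][n], F[lambda];
  int idx[lambda];
  for(int iter=0; iter<100; iter++){
    for(int i=0;i<lambda;i++){                   // sample offspring
      for(int j=0;j<n;j++) X[i][j] = mean[j] + sigma*gauss();
      F[i] = f(X[i]);
      idx[i] = i;
    }
    std::sort(idx, idx+lambda, [&](int a,int b){ return F[a]<F[b]; });
    for(int j=0;j<n;j++){                        // new mean = mean of mu best
      mean[j] = 0;
      for(int i=0;i<mu;i++) mean[j] += X[idx[i]][j]/mu;
    }
    if(iter%20==0) printf("iter %3d  f(mean)=%g\n", iter, f(mean));
  }
  return 0;
}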

Covariance Matrix Adaptation (CMA-ES)


• An obvious critique of the simple Evolution Strategies:
– The search distribution N(x|x̂, σ 2 ) is isotropic
(no going forward, no preferred direction)
– The variance σ is fixed!

• Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

9:30

Covariance Matrix Adaptation (CMA-ES)


• In Covariance Matrix Adaptation

θ = (x̂, σ, C, pσ , pC ) , pθ (x) = N(x|x̂, σ 2 C)

where C is the covariance matrix of the search distribution


• The θ maintains two more pieces of information: pσ and pC capture the “path”
(motion) of the mean x̂ in recent iterations
• Rough outline of the θ-update:
– Let D′ = bestOfµ (D) be the set of selected points
– Compute the new mean x̂ from D′
– Update pσ and pC proportional to x̂k+1 − x̂k
– Update σ depending on |pσ |
– Update C depending on pC pC> (rank-1 update) and Var(D′)
9:31

CMA references
Hansen, N. (2006), ”The CMA evolution strategy: a comparing review”
Hansen et al.: Evaluating the CMA Evolution Strategy on Multimodal Test
Functions, PPSN 2004.

• For “large enough” populations local minima are avoided

• An interesting variant:
Igel et al.: A Computational Efficient Covariance Matrix Update and a (1 + 1)-
CMA for Evolution Strategies, GECCO 2006.
9:32

CMA conclusions
• It is a good starting point for an off-the-shelf black-box algorithm
• It includes components like estimating the local gradient (pσ , pC ), the local
“Hessian” (Var(D′)), smoothing out local minima (large populations)
9:33

Stochastic search conclusions

Input: initial parameter θ, function f (x), distribution model pθ (x), update heuristic
h(θ, D)
Output: final θ and best point x
1: repeat
2:   Sample {xi }_{i=1}^n ∼ pθ (x)
3:   Evaluate samples, D = {(xi , f (xi ))}_{i=1}^n
4: Update θ ← h(θ, D)
5: until θ converges

• The framework is very general


• The crucial difference between algorithms is their choice of pθ (x)
9:34

RL under Partial Observability


• Data:

D = {(ut , ct , yt )}_{t=0}^T

→ state xt not observable

• Model-based RL is daunting: learning P (x′|u, x) and P (y|u, x) with latent x is
very hard

• Model-free: The policy needs to map the history to a new control

π : (y_{t−h,..,t-1} , u_{t−h,..,t-1} ) ↦ ut

or any features of the history

ut = φ(y_{t−h,..,t-1} , u_{t−h,..,t-1} )>w

9:35

10 SKIPPED THIS TERM – Grasping (brief intro)

Grasping
• The most elementary type of interaction with (manipulation of) the environment.
– Basis for intelligent behavior.

• In industrial settings with high precision sensors and actuators: very fast and
precise.

• In general real world with uncertain actuators and perception, still a great re-
search challenge, despite all the theory that has been developed.
10:1

Pick-and-place in industry

(This type of kinematics is called “Delta Robot”, which is a “parallel robot” just
as the Stewart platform.)
10:2

Research
• Using ultra high speed and precise cameras and localization: High speed robot
hand from the Ishikawa Komuro’s “Sensor Fusion” Lab

http://www.k2.t.u-tokyo.ac.jp/fusion/index-e.html

• Asimo’s grasping:

10:3

Outline
• Introduction to the basic classical concepts for grasping (force closure)

• Discussion and alternative views

• References:
Craig’s Introduction to robotics: mechanics and control – chapter 3.
Matt Mason’s lecture: Static and Quasistatic Manipulation
www.cs.cmu.edu/afs/cs/academic/class/16741-s07/www/lecture18.pdf
Daniela Rus and Seth Teller’s lecture: Grasping and Manipulation
courses.csail.mit.edu/6.141/spring2011/pub/lectures/Lec13-Manipulation-I.pdf
10:4

Force Closure
• Which of these objects is in a “tight grip”?

Defining “tight grip”: Assume fingers (vectors) have no friction – but can exert
arbitrary normal forces. Can we generate (or counter-act) arbitrary forces on the
object?
10:5

Force Closure – more rigorously


• Assume that each “finger” is a point that can apply forces on the object as
described by the friction cone

• Each finger is described by a point pi and a force fi ∈ Fi in the finger’s friction
cone. Together they can exert the force and torque:

f total = Σi fi ,   τ total = Σi fi × (pi − c)

• Force closure ⇐⇒ we can generate (counter-act) arbitrary f total and τ total by
choosing fi ∈ Fi appropriately.
↔ Check whether the positive linear span of the friction cones covers the whole
space.
10:6
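The positive-span check can be approximated numerically. The following sketch tests a
planar grasp: each contact contributes the two edge vectors of its (linearized) friction
cone, mapped to wrenches w = (fx , fy , τ ); the positive span covers the whole wrench
space iff there is no direction d with w·d ≤ 0 for all wrenches, which is tested here by
random sampling. The example geometry (three contacts on a unit disc) and µ are
illustrative assumptions:

#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <vector>

// Approximate force-closure test for a planar object with point contacts.
// A force f at point p gives the wrench w = (fx, fy, tau), tau = px*fy - py*fx.
// Force closure <=> the positive span of all contact wrenches is R^3, i.e.
// no separating direction d with w_i . d <= 0 for all i (sampled check).
struct W { double x, y, t; };

int main(){
  double mu = 0.3;  // friction coefficient (tan of the cone half-angle)
  std::vector<W> ws;
  for(int k = 0; k < 3; k++){
    double ang = 2.0*M_PI*k/3.0;
    double px = cos(ang), py = sin(ang);      // contact point on the disc
    double nx = -px, ny = -py;                // inward normal
    for(int s = -1; s <= 1; s += 2){          // the two friction cone edges
      double fx = nx + s*mu*(-ny), fy = ny + s*mu*nx;
      ws.push_back({fx, fy, px*fy - py*fx});
    }
  }
  bool closure = true;
  for(int i = 0; i < 100000; i++){            // sampled candidate directions
    double d[3]; for(double &v : d) v = 2.0*rand()/RAND_MAX - 1.0;
    bool covered = false;
    for(const W& w : ws)
      if(w.x*d[0] + w.y*d[1] + w.t*d[2] > 1e-12){ covered = true; break; }
    if(!covered){ closure = false; break; }
  }
  printf("force closure (approx): %d\n", closure);  // prints 1 for this grasp
  return 0;
}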

Form & Force Closure & Caging



• Force closure: The contacts can apply an arbitrary wrench (=force-torque) to
the object.

• Form closure: The object is at an isolated point in configuration space. Note:
form closure ⇐⇒ frictionless force closure

• Caging: The object is not fixated, but cannot escape

• Equilibrium: The contact forces can balance the object’s weight and other external
forces.
10:7

Traditional research into force closure

• Theorem (Mishra, Schwartz and Sharir, 1987):
For any bounded shape that is not a surface of revolution, a force closure (or first order
form closure) grasp exists.

• Guaranteed synthesis:
1) put fingers “everywhere”
2) while a redundant finger exists, delete any redundant finger
(A finger is redundant if it can be deleted without reducing the positive linear
span.)

Theorem (Mishra, Schwartz, and Sharir, 1987):
For any surface not a surface of revolution, [the above method] yields a grasp with at
most 6 fingers in the plane, at most 12 fingers in three space.
10:8

Traditional research into force closure

• Force closure turned into a continuous optimization criterion:
– Constrain the absolute forces each finger can apply
(cut the friction cones)
– The friction cones define a finite convex polygon in 6D wrench space
→ What is the inner radius of this convex wrench polygon?

Illustration from Suárez, Roa, Cornellà (2006): Grasp Quality Measures


10:9

The “mitten thought experiment”


• From Oliver Brock’s research website:
“Our approach to grasping is motivated by the ”mitten thought experiment”. This ex-
periment illustrates that a sensory information-deprived subject (blindfolded, wearing
a thick mitten to eliminate tactile feedback) is able to grasp a large variety of objects
reliably by simply closing the hand, provided that a second experimenter appropri-
ately positioned the object relative to the hand. This thought experiments illustrates
that an appropriate perceptual strategy (the experimenter) in conjunction with a simple
compliance-based control strategy (the mitten hand) can lead to outstanding grasping
performance.”

Illustration from O. Brock’s page


10:10

Food for thought


• Are point contacts a good model?
• Is the whole idea of “arranging friction cones” the right approach?
• What about biomechanics?
10:11

Biomechanics of the human hand


• Finger tendons:

J.N.A.L. Leijnse, 2005

De Bruijne et al. 1999

See Just Herder’s lecture on “Biograsping”


http://www.slideshare.net/DelftOpenEr/bio-inspired-design-lecture6-bio-graspin
10:12

Tendon-based hand mechanisms

Jaster Schuurmans, 2004


Shape Gripper, Shigeo Hirose, TITECH

DLR hand
10:13

11 SKIPPED THIS TERM – Legged Locomotion (brief intro)

Legged Locomotion

• Why legs?

Bacterial Flagellum: (rotational “motors” in Biology?)

11:1

Legged Locomotion

• Why legs?

– Human/Animal Locomotion Research

– Potentially less weight

– Better handling of rough terrains


(climbing, isolated footholds, ladders, stairs)
11:2

Rolling vs walking

11:3

One-legged locomotion

• Three separate controllers for:
– hopping height
– horizontal velocity (foot placement)
– attitude (hip torques during stance)

• Each a simple (PD-like) controller

Raibert et al.: Dynamically Stable Legged Locomotion. 1985. http://dspace.mit.edu/handle/1721.1/6820

Tedrake: Applied Optimal Control for Dynamically Stable Legged Locomotion. PhD thesis (2004). http://groups.csail.mit.edu/robotics-center/public_papers/Tedrake04b.pdf
11:4

Biped locomotion
• Walking vs Running
Walking := in all instances at least one foot is on the ground
Running := otherwise

• 2 phases of Walking
– double-support phase (in Robotics often statically stable)
– single-support phase (statically unstable)
11:5

Asimo

11:6

Statically Stable Walk


• You could rest (hold pose) at any point in time and not fall over

⇐⇒ the CoG projected on the ground is within the support polygon

CoG = center of gravity of all body masses
support polygon = hull of foot contact points

• Try yourself: Move as slow as you can but make normal length steps...
11:7

Zero Moment Point



• Vukobratovic’s view on walking, leading to ZMP ideas:


“Basic characteristics of all biped locomotion systems are:

(i) the possibility of rotation of the overall system about one of the foot edges
caused by strong disturbances, which is equivalent to the appearance of an
unpowered (passive) DOF,

(ii) gait repeatability (symmetry), which is related to regular gait only

(iii) regular interchangeability of single- and double-support phases”

Vukobratovic & Borovac: Zero-moment point—Thirty five years of its life. International Journal of Humanoid Robotics 1, 157-173, 2004. http://www.cs.cmu.edu/~cga/legs/vukobratovic.pdf
11:8

Zero Moment Point

11:9

Zero Moment Point



• “ZMP is defined as that point on the ground at which the net moment of the
inertial forces and the gravity forces has no component along the horizontal
axes.” (Vukobratovic & Borovac, 2004)

• fi = force vector acting on body i (gravity plus external)
wi = angular velocity vector of body i
Ii = inertia tensor (∈ R3×3 ) of body i
ri = pi − pZMP = relative position of body i to the ZMP

• Definition: pZMP is the point on the ground for which

Σi [ ri × fi + Ii ẇi + wi × Ii wi ] = (0, 0, ∗)>

See also: Popovic, Goswami & Herr: Ground Reference Points in Legged Locomotion: Definitions, Biological Trajectories and Control Implications. International Journal of Robotics Research 24(12), 2005.
http://www.cs.cmu.edu/~cga/legs/Popovic_Goswami_Herr_IJRR_Dec_2005.pdf
11:10

Zero Moment Point


• If the ZMP is outside the support polygon → the foot rolls over the edge
(presumes the foot is a rigid body, and non-compliant control)

• Locomotion is dynamically stable if the ZMP remains within the foot-print polygons.
(↔ the support can always apply some momentum, if necessary)

• Computing the ZMP in practice:
– either exact robot model
– or foot pressure sensors
11:11

Zero Moment Point – example


• combine ZMP with a 3D linear inverted pendulum model and model-predictive
control

HRP-2 stair climbing

Kajita et al.: Biped Walking Pattern Generation by using Preview Control of Zero-Moment Point. ICRA 2003. http://eref.uqu.edu.sa/files/eref2/folder1/biped_walking_pattern_generation_by_usin_53925.pdf
11:12

ZMP Summary
• ZMP is the “rescue” for conventional methods:
– ZMP-stability → the robot can be controlled as if the foot is “glued” (virtually) to
the ground!
– The whole body behaves like a “conventional arm”
– Can accelerate any DoF q̈ → conventional dynamic control of a
fully actuated system

• Limitations:
– Humans don’t use ZMP stability, we allow our feet to roll
(toe-off, heel-strike: ZMP at edge of support polygon)
– Can’t describe robots with point feet (walking on stilts)
11:13

Models of human bipedal locomotion

The following illustrations are from:

McMahon: Mechanics of Locomotion. IJRR 3:4-28, 1984
http://www.cs.cmu.edu/~cga/legs/mcmahon1.pdf
http://www.cs.cmu.edu/~cga/legs/mcmahon2.pdf
11:14

Walking research from Marey 1874:

11:15

Six determinants of gait

following Saunders, Inman & Eberhart (1953)


1. Compass Gait:

2. Pelvic Rotation:

3. Pelvic Tilt:

4. Stance Knee Flexion:



5. Stance Ankle Flexion:

6. Pelvis Lateral Displacement:



11:16

Models of human bipedal locomotion


• Human model with 23 DoFs, 54 muscles
– Compare human walking data with model
– Model: optimize energy-per-distance
– Energy estimated based on metabolism and muscle heat rate models

Anderson & Pandy: Dynamic Optimization of Human Walking. Journal of Biomechanical Engineering 123:381-390, 2001. http://e.guigon.free.fr/rsc/article/AndersonPandy01.pdf
Anderson & Pandy: Static and dynamic optimization solutions for gait are practically equivalent. Journal of Biomechanics 34 (2001) 153-161. http://www.bme.utexas.edu/faculty/pandy/StaticOptWalk….pdf
11:17

Models of human bipedal locomotion



• Suggest different principles of human motion:


– passive dynamics (Compass Gait) ↔ underactuated system
– modulation of basic passive dynamics
– Energy minimization
11:18

Passive dynamic walking: Compass Gait

• Basic 2D planar model of the Compass Gait:

The pose is described by q = (θs , θns ), the state by (q, q̇)

Goswami, Thuilot & Espiau: A study of the passive gait of a compass-like biped robot: symmetry and
chaos. International Journal of Robotics Research 17, 1998.
http://www.ambarish.com/paper/COMPASS_IJRR_Goswami.pdf

11:19

Passive dynamic walking: Compass Gait

• The swing phase has analytic equations of motion

M (q)q̈ + C(q, q̇)q̇ + G(q) = 0

but they can’t be solved analytically...
• Phase space plot of numeric solution:

11:20

Passive walker examples


compass gait simulation
controlled on a circle
passive walker

• Minimally actuated: Minimal Control on rough terrain


11:21

Impact Models in the Compass Gait


• Switch between two consecutive swing phases: depends on slope!
• Typical assumptions made in simulation models:
– The contact of the swing leg with the ground results in no rebound and no slipping of
the swing leg.
– At the moment of impact, the stance leg lifts from the ground without interaction.
– The impact is instantaneous.
– The external forces during the impact can be represented by impulses.
– The impulsive forces may result in an instantaneous change in the velocities, but there
is no instantaneous change in the configuration.
– The actuators cannot generate impulses and, hence, can be ignored during impact.

Westervelt, Grizzle & Koditschek: Hybrid Zero Dynamics of Planar Biped Walkers. IEEE Trans. on Automatic Control 48(1), 2003.
http://repository.upenn.edu/cgi/viewcontent.cgi?article=1124&context=ese_papers
11:22

Implausibility of the stiff Compass Gait leg

Geyer, Seyfarth & Blickhan: Compliant leg behaviour explains basic dynamics of walking and running. Proc. Roy. Soc. Lond. B, 273(1603): 2861-2867, 2006. http://www.cs.cmu.edu/~cga/legs/GeyerEA06RoySocBiolSci.pdf
11:23

Learning to walk in 20 Minutes

• Policy Gradient method (Reinforcement Learning)
Stationary policy parameterized as linear in features: u = Σi wi φi (q, q̇)
• Problem: find parameters wi to minimize expected costs
cost = mimic the (q, q̇) of the passive down-hill walker at a “certain point in cycle”

Learning To Walk

Tedrake, Zhang & Seung: Stochastic policy gradient reinforcement learning on a simple 3D biped. IROS,
2849-2854, 2004. http://groups.csail.mit.edu/robotics-center/public_papers/Tedrake0
pdf
11:24

Summary

• ZMP type walking was successful (ASIMO, HRP-2, etc), but limited

• Future types of walking:
– Exploit passive dynamics, cope with underactuation
– Follow some general optimality principle (but real-time!)
– Learn (esp. Reinforcement Learning)
– Compliant hardware! (controllable elasticity & damping)

• Recommended reading: Tedrake: Underactuated Robotics: Learning, Planning,
and Control for Efficient and Agile Machines. Course Notes for MIT 6.832
www.cs.berkeley.edu/~pabbeel/cs287-fa09/readings/Tedrake-Aug09.pdf

11:25

Finally, multi-legged locomotion

11:26

Finally, multi-legged locomotion



http://www.bostondynamics.com/robot_rise.html
11:27

12 Exercises

12.1 Exercise 1
No need to prepare for this first tutorial. We’ll do the exercises together on the fly.

12.1.1 Prerequisites: Matrix equations

a) Let X, A be arbitrary matrices, A invertible. Solve for X:

XA + A> = I

b) Let X, A, B be arbitrary matrices, (C − 2A>) invertible. Solve for X:

X>C = [2A(X + B)]>

c) Let x ∈ Rn , y ∈ Rd , A ∈ Rd×n . A obviously not invertible, but let A>A be invert-


ible. Solve for x:
(Ax − y)>A = 0>n

d) As above, additionally B ∈ Rn×n , B positive-definite. Solve for x:

(Ax − y)>A + x>B = 0>n

12.1.2 Prerequisites: Vector derivatives

Let x ∈ Rn , y ∈ Rd , f, g : Rn → Rd , A ∈ Rd×n , C ∈ Rd×d . (Also provide the
dimensionality of the results.)

a) What is ∂/∂x x ?
b) What is ∂/∂x [x>x] ?
c) What is ∂/∂x [f (x)>f (x)] ?
d) What is ∂/∂x [f (x)>Cg(x)] ?
e) Let B and C be symmetric (and pos.def.). What is the minimum of (Ax −
y)>C(Ax − y) + x>Bx ?

12.1.3 Optimization

Given x ∈ Rn , f : Rn → R, we want to find argminx f (x). (We assume f is


uni-modal.)
a) What 1st-order optimization methods (querying f (x), ∇f (x) in each iteration) do
you know?

b) What 2nd-order optimization methods (querying f (x), ∇f (x), ∇2 f (x) in each it-
eration) do you know?
c) What is backtracking line search?

12.2 Exercise 2
12.2.1 Geometry

a) Read the notes on basic 3D geometry at
http://ipvs.informatik.uni-stuttgart.de/mlr/marc/notes/3d-geometry.pdf
at least until section 2. Prepare questions for the exercises if you have any.
b) You have a book (coordinate frame B) lying on the table (world frame W ). Ini-
tially B and W are identical. Now you move the book 1 unit to the right, then rotate
it by 45◦ counter-clock-wise around its origin. Given a dot p marked on the book at
position pB = (1, 1) in the book coordinate frame, what are the coordinates pW of
that dot with respect to the world frame?
c) Given a point x with coordinates xW = (0, 1) in world frame, what are its coor-
dinates xB in the book frame?
d) What is the coordinate transformation from world frame to book frame, and from
book frame to world frame?
Please use homogeneous coordinates to derive these answers.

12.2.2 Subscribe to the Email list

Please subscribe here:


https://mailman.informatik.uni-stuttgart.de/mailman/listinfo/lehre14-rob

12.2.3 Simulation software

Future exercises will require to code some examples in C/C++. Test if you can
compile and run the lib that accompanies this lecture. Report on problems with
installation.
On Ubuntu:

• install the packages


liblapack-dev freeglut3-dev libqhull-dev libf2c2-dev libann-dev gnuplot
doxygen libglew1.6-dev

• get the code from
http://ipvs.informatik.uni-stuttgart.de/mlr/marc/source-code/libRoboticsCourse.14.tgz

• tar xvzf libRoboticsCourse.14.tgz


cd share/examples/Ors/ors
make
./x.exe

12.3 Exercise 3
12.3.1 Task spaces and Jacobians

In the lecture we introduced the basic kinematic maps φ^pos_{i,v}(q) and φ^vec_{i,v}(q), and their
Jacobians, J^pos_{i,v}(q) and J^vec_{i,v}(q), where i may refer to any part of the robot and v is
any point or direction on this part. In the following you may assume that we have
routines to compute these for any q. The problem is to express other kinematic
maps and their Jacobians in terms of these knowns. In the following you’re asked
to define more involved kinematics maps (a.k.a. task maps) φ(q) to solve certain
motion problems. Please formulate all these maps such that the overall optimization
problem is

f (q) = ||q − q0 ||2W + ||φ(q)||2

that is, the motion aims to minimize φ(q) to zero (absorb the y ∗ in the definition of
φ).
a) Assume you would like to control the pointing direction of the robot’s head (e.g.,
its eyes) to point to a desired external world point xW . What task map can you
define to achieve this? What is the Jacobian?
b) You would like the two hands or the robot to become parallel (e.g. for clapping).
What task map can you define to achieve this? What is the Jacobian?
c) You would like to control a standard endeffector position peff to be at y ∗ , as usual.
Can you define a 1-dimensional task map φ : Rn → R to achieve this? What is its
Jacobian?

12.3.2 IK in the simulator

Download the simulator code from
http://ipvs.informatik.uni-stuttgart.de/mlr/marc/source-code/libRoboticsCourse.14.tgz.
(See the last exercise for instructions. The libRoboticsCourse.13.msvc.tgz might
include a project file for MSVC.) The header <src/Ors/roboticsCourse.h> provides
a very simple interface to the simulator—we will use only this header and some generic
matrix functionalities. In share/examples/Core/array you find many examples on
how to use the arr for working with vectors and matrices (many conventions are
similar to Matlab).
Consider the example in teaching/RoboticsCourse/01-kinematics (rename
main.problem.cpp to main.cpp). The goal is to reach the coordinates y ∗ =
(−0.2, −0.4, 1.1) with the right hand of the robot. Assume W = I and σ = .01.

a) The example solution generates a motion in one step using inverse kinematics
δq = J# δy with J# = (J>J + σ 2 W )-1 J>. Record the task error, that is, the deviation
of the hand position from y ∗ after each step. Why is it initially large?
b) Try to do 100 smaller steps δq = αJ ] δy with α = .1 (each step starting with the
outcome of the previous step). How does the task error evolve over time?
c) Generate a nice trajectory composed of T = 100 time steps. Interpolate the target
linearly ŷ ← y0 + (t/T )(y ∗ − y0 ) in each time step.
d) Generate a trajectory that moves the right hand in a circle centered at (−0.2, −0.4, 1.1),
aligned with the xz-plane, with radius 0.2.

12.4 Exercise 4
12.4.1 Verify some things from the lecture

a) On slide 02:46 it says that

argmin_q ||q − q0 ||2W + ||Φ(q)||2 ≈ q0 − (J>J + W )-1 J> Φ(q0 ) = q0 − J# Φ(q0 )

where the approximation ≈ means that we use the local linearization Φ(q) = Φ(q0 ) +
J(q − q0 ). Verify this.
b) On slide 02:34 there is a term (I − J # J) called nullspace projection. Verify that
for % → ∞ (and C = %I) and any choice of a ∈ Rn

δq = (I − J # J)a ⇒ δy = 0

(assuming linearity of φ, i.e., Jδq = δy). Note: this means that for any choice of a, the
motion (I − J# J)a will not change the “endeffector position” y.
c) On slide 02:27 we derived the basic inverse kinematics law. Verify that (assuming
linearity of φ) for C → ∞ the desired task is fulfilled exactly, i.e., φ(q ∗ ) = y ∗ . By
writing C → ∞ we mean that C is a matrix of the form C = %C0 , % ∈ R, and we
take the limit % → ∞.

12.4.2 Multiple task variables

Consider again the last week’s exercise where the robot moved his hand in a circle
(Exercise 3.2d).
a) We’ve seen that the solution does track the circle nicely, but the trajectory “gets
lost” in joint space, leading to very strange postures. We can fix this by adding
more tasks, esp. a task that permanently tries to (moderately) minimize the distance
of the configuration q to a natural posture qhome . Realize this by adding a respective
task to the code given below.

b) Make the robot simultaneously point upward with the left hand. Use kinematicsVec
and jacobianVec for this.

void multiTask(){
  Simulator S("man.ors");
  arr q, q_home, y_target, y, J, W, Phi, PhiJ;
  uint n = S.getJointDimension();

  S.getJointAngles(q);
  W.setDiag(1., n);  //we define W as identity matrix
  q_home = q;        //we store the initial posture as q_home

  for(uint i=0; i<10000; i++){
    Phi.clear();
    PhiJ.clear();

    //1st task: track with right hand
    y_target = ARR(-0.2, -0.4, 1.1);
    y_target += .2 * ARR(cos((double)i/20), 0, sin((double)i/20));
    S.kinematicsPos(y, "handR");
    S.jacobianPos (J, "handR");

    Phi .append( 1e2 * (y - y_target) );  //rho = 1e4 (cp. slide 02:45)
    PhiJ.append( 1e2 * J );

    //add the "stay close to home" task here

    //add another task for the left hand here, if you like

    //compute joint updates
    q -= inverse(~PhiJ*PhiJ + W) * ~PhiJ * Phi;  //(cp. slide 02:46)
    S.setJointAngles(q);
  }
}

12.5 Exercise 5
In the lecture we discussed PD force control on a 1D point mass, which leads to
oscillatory behavior for high Kp and damped behavior for high Kd . Slide 03:15
replaces the parameters Kp , Kd by two other, more intuitive parameters, λ and ξ:
λ roughly denotes the time (or time steps) until the goal is reached, and ξ whether
it is reached aggressively (ξ > 1, which overshoots a bit) or by exponential decay
(ξ ≤ 1). Use this to solve the following exercise.

12.5.1 PD force control on a 1D mass point

a) Implement the system equation for a 1D point mass with mass m = 3.456. That
is, implement the Euler integration (see below) of the system dynamics that computes
xt+1 given xt and ut in each iteration, where xt = (qt , q̇t ).


There is no need for the robot simulator code—implement it directly in plain C++
or another language.
Assume a step time of τ = 0.01sec. Generate a trajectory from the start position q0 =
0 and velocity q̇0 = 0 that approaches the goal position q ∗ = 1 with high precision
within about 1 second using PD force control. Find 3 different parameter sets for
Kp and Kd to get oscillatory, overdamped and critical damped behaviors. Plot the
point trajectory (e.g. using the routine gnuplot(arr& q); MT::wait(); .)
b) Repeat for time horizon t = 2sec and t = 5sec. How should the values of Kp and
Kd change when we have more time?
c) Implement a PID controller (including the integral (stationary error) term). How
does the solution behave with only Ki turned on (Kp = Kd = 0); how with Ki and
Kd non-zero?
Note on Euler Integration: See also http://en.wikipedia.org/wiki/Euler_method.
Given a differential equation ẋ = f (x, u), Euler integration numerically
“simulates” this equation forward by iterating

xt+1 = xt + τ f (xt , ut )

for small time steps τ . A minimal sketch of this iteration is given below.
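The following sketch applies the iteration to the 1D point mass above (m = 3.456,
τ = 0.01); the constant control is only a placeholder for the PD law you are asked to
implement:

#include <cstdio>

// Euler integration of the 1D point mass qdd = u/m, state x = (q, qd).
int main(){
  double m = 3.456, tau = 0.01;
  double q = 0.0, qd = 0.0;
  for(int t = 0; t < 100; t++){     // simulate 1 second
    double u = 1.0;                 // constant force, for illustration only;
                                    // replace by the PD control of the exercise
    q  += tau * qd;                 // x_{t+1} = x_t + tau * f(x_t, u_t)
    qd += tau * (u / m);
    if(t % 20 == 0) printf("t=%4.2f  q=%f  qd=%f\n", t*tau, q, qd);
  }
  return 0;
}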

12.5.2 Following a reference trajectory with PD control

We have the same point mass as above. But now the goal position q ∗ changes over
time: we have a reference trajectory

q ∗ (t) = cos(t)

How can one use a PD controller to let the point mass robustly and precisely follow
this reference trajectory?
Simulate the problem as above, but at every full second apply a randomized impulse
(=instantaneous change in velocity) on the point mass. That is, simply set q̇ ←
q̇ + ξ for ξ ∼ N(0, .1). (In code: double xi = 0.1 * rnd.gauss();) Generate
oscillatory, overdamped and critical recovery behaviors.

12.6 Exercise 6
12.6.1 Fun with Euler-Lagrange

Consider an inverted pendulum mounted on a wheel in the 2D x-z-plane; similar


to a Segway. The exercise is to derive the Euler-Lagrange equation for this system.
Tips:

• Strictly follow the scheme we discussed on slide 03:23.


• Use the generalized coordinates
q = (x, θ) (3)
with x is the position of the wheel and θ the angle of the pendulum relative
to the world-vertical.
• The system can be parameterized by
– mA , IA , mB , IB : masses and inertias of bodies A (=wheel) and B (=pendulum)
– r: radius of the wheel
– l: length of the pendulum (height of its COM)

• Describe the pose of every body (depending on q) by the (x, z, φ) coordinates:
its position in the x-z-plane, and its rotation relative to the world-vertical.
Accordingly represent the (linear and angular) velocities as a 3-vector.
• In this 3-dim space, the mass matrix of every body is

Mi = [mi 0 0; 0 mi 0; 0 0 Ii ]                (4)

12.6.2 Optional: PD control to hold an arm steady

In our code, in 03-dynamics you find an example (rename main.problem.cpp


to main.cpp). Please change ../02-pegInAHole/pegInAHole.ors to pegArm.ors.

You will find an arm with three joints that is swinging freely under gravity (ignor-
ing collisions).
a) Apply direct PD control (without using M and F ) to each joint separately and try
to find parameters Kp and Kd (potentially different for each joint) to hold the arm
steady, i.e., q ∗ = 0 and q̇ ∗ = 0. Bonus: If you are successful, try the same for the arm
in pegArm2.ors.
Play with setDynamicsSimulationNoise and check stability.
b) As above, try to hold the arm steady at q ∗ = 0 and q̇ ∗ = 0. But now use the
knowledge of M and F in each time step. For this, decide on a desired wavelength λ
and damping behavior ξ and compute the respective Kp and Kd (assuming m = 1),
the same for each joint. Use the PD equation to determine desired accelerations q̈ ∗
(slide 03:31) and use inverse dynamics to determine the necessary u.

12.7 Exercise 7

12.7.1 RRTs for path finding

In the code in teaching/RoboticsCourse/04-rrt you find an example problem
(rename main.problem.cpp to main.cpp).
a) The code demonstrates an RRT exploration and displays the explored endeffector
positions. What is the endeffector’s exploration distribution in the limit n → ∞?
Specify such a distribution analytically for a planar 2 link arm.
b) First grow an RRT backward from the target configuration q ∗ = (0.945499, 0.431195,
−1.97…, −0.665206, −1.48356). Stop when there exists a node close (<stepSize) to the
q = 0 configuration. Read out the collision free path from the tree and display it.
Why would it be more difficult to grow the tree forward from q = 0 to q ∗ ?
c) Find a collision free path using bi-directional RRTs (that is, 2 RRTs growing to-
gether). Use q ∗ to root the backward tree and q = 0 to root the forward tree. Stop
when a newly added node is close to the other tree. Read out the collision free path
from the tree and display it.
d) (Bonus) Propose a simple method to make the found path smoother (while keep-
ing it collision free). You’re free to try anything.

12.7.2 A distance measure in phase space

Consider the 1D point mass with mass m = 1 state x = (q, q̇). The 2D space of (q, q̇)
combining position and velocity is also called phase space. In RRT’s (in line 4 on
slide 04:39) we need to find the nearest configuration qnear to a qtarget . But what does
“near” mean in phase space? This exercise explores this question.

Consider a current state x0 = (0, 1) (at position 0 with velocity 1). Pick any random
phase state xtarget ∈ R2 . How would you connect x0 with xtarget in a way that
fulfils the differential constraints of the point mass dynamics? Given this trajectory
connecting x0 with xtarget , how would you quantify/measure the distance? (If
you defined the connecting trajectory appropriately, you should be able to give an
analytic expression for this distance.) Given a set (tree) of states x1:n and you pick
the closest to xtarget , how would you “grow” the tree from this closest point towards
xtarget ?

12.8 Exercise 8
12.8.1 Bayes Basics

a) Box 1 contains 8 apples and 4 oranges. Box 2 contains 10 apples and 2 oranges.
Boxes are chosen with equal probability. What is the probability of choosing an
apple? If an apple is chosen, what is the probability that it came from box 1?
b) The Monty Hall Problem: I have three boxes. In one I put a prize, and two are
empty. I then mix up the boxes. You want to pick the box with the prize in it. You
choose one box. I then open another one of the two remaining boxes and show that
it is empty. I then give you the chance to change your choice of boxes—should you
do so?
c) Given a joint probability P (X, Y ) over 2 binary random variables as the table

         Y=0    Y=1
X=0      .06    .24
X=1      .14    .56

What are P (X) and P (Y )? Are X and Y conditionally independent?

12.8.2 Gaussians

On slide 06:17 there is the definition of a multivariate (n-dim) Gaussian distribution.
Prove the following using only the definition. (You may ignore terms independent
of x.)
a) Prove that:

N(x | a, A) = N(a | x, A)
N(x | a, A) = |F | N(F x | F a, F AF>)
N(F x + f | a, A) = (1/|F |) N(x | F -1 (a − f ), F -1 AF -> )
b) Multiplying two Gaussians is essential in many algorithms (typically, a prior and
a likelihood, to get a posterior). Prove the general rule

N(x | a, A) N(x | b, B)
∝ N(x | (A-1 + B -1 )-1 (A-1 a + B -1 b), (A-1 + B -1 )-1 ) ,        (5)

where the proportionality ∝ allows you to drop all terms independent of x.
Note: The so-called canonical form of a Gaussian is defined as N[x | ā, Ā] = N(x | Ā-1 ā, Ā-1 );
in this convention the product reads much nicer: N[x | ā, Ā] N[x | b̄, B̄] ∝ N[x | ā + b̄, Ā + B̄].
c) The “forward propagation” of a Gaussian belief N(y | b, B) along a stochastic
linear dynamics N(x | a + F y, A) is given as

∫_y N(x | a + F y, A) N(y | b, B) dy = N(x | a + F b, A + F BF>)        (6)

Prove the equation.

12.9 Exercise 9

12.9.1 Particle Filtering the location of a car

Start from the code in RoboticsCourse/05-car.
The CarSimulator simulates a car exactly as described on slide 04:48 (using Euler
integration with step size 1sec). At each time step a control signal u = (v, φ)
moves the car a bit and Gaussian noise with standard deviation σdynamics = .03 is
added to x, y and θ. Then, in each step, the car measures the relative positions of
m landmark points, resulting in an observation yt ∈ Rm×2 ; these observations are
Gaussian-noisy with standard deviation σobservation = .5. In the current implementation
the control signal ut = (.1, .2) is fixed (roughly driving circles).
a) Odometry (dead reckoning): First write a particle filter (with N = 100 particles)
that ignores the observations. For this you need to use the cars system dynamics
(described on 04:48) to propagate each particle, and add some noise σdynamics to
each particle (step 3 on slide 07:22). Draw the particles (their x, y component) into
the display. Expected is that the particle cloud becomes larger and larger.
b) Next implement the likelihood weights

wi ∝ P (yt |x_t^i ) = N(yt | y(x_t^i ), σ) ∝ exp( −(yt − y(x_t^i ))2 / (2σ 2 ) )

where y(x_t^i ) is the (ideal) observation the car would have if it were in the particle
position x_t^i . Since the weights must fulfil Σi wi = 1, normalize them after this
computation.
c) Test the full particle filter including the likelihood weights (step 4) and resam-
pling (step 2). Test using a larger (10σobservation ) and smaller (σobservation /10) variance
in the computation of the likelihood.

12.9.2 Kalman filter

We consider the same car example as above, but track the car using a Kalman filter.

a) To apply a Kalman filter (slide 07:27) we need Gaussian models for P(xt | xt-1, ut-1) as well as P(yt | xt). We assume that the dynamics model is given as a local Gaussian of the form

P(xt+1 | xt, ut) = N(xt+1 | xt + B(xt) ut, σdynamics)

where the matrix B(xt) = ∂xt+1(xt, ut)/∂ut gives the local linearization of the deterministic discrete time car dynamics

                   | u1 cos x3     |
xt+1(x, u) = x + τ | u1 sin x3     |
                   | (u1/L) tan u2 |

What is B(xt) for the car dynamics?
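As a check (a worked step, not part of the original exercise text; strictly, this Jacobian also depends on the current control u, which the notation B(xt) glosses over), differentiating the three components w.r.t. u = (u1, u2) gives

           | cos x3         0              |
B(xt) = τ  | sin x3         0              |
           | (1/L) tan u2   u1/(L cos²u2)  |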


b) Concerning the observation likelihood P(yt | xt) we assume

P(yt | xt, θ1:N) = N(yt | C(xt) xt + c(xt), σobservation)

What is the matrix C(xt) (the Jacobian of the landmark positions w.r.t. the car state) in our example?

c) Start with the code in RoboticsCourse/06-kalmanSLAM. Write a Kalman filter to track the car. You can use the routine getObservationJacobian to access C(xt) = ∂y/∂x. Note that c(xt) = ŷt − C(xt) xt, where ŷt is the mean observation in state xt (there is another routine for this).
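The structure of one filter step could look as follows. This is only a structural sketch under assumptions: ~C denotes the transpose as in the course's arr class, Sigma_d and Sigma_o stand for the diagonal covariances built from σdynamics and σobservation, B(mu) for the dynamics Jacobian above, and getMeanObservation is a placeholder name for the unnamed routine mentioned above.

// one extended Kalman step (predict + update) for the car
void kalmanStep(arr &mu, arr &Sigma, const arr &u, const arr &y){
  // predict: push the mean through the dynamics; the Jacobian w.r.t. x
  // is the identity in the model N(x_{t+1} | x_t + B(x_t) u_t, sigma_dynamics)
  mu    = mu + B(mu)*u;
  Sigma = Sigma + Sigma_d;
  // update with the current observation y
  arr C    = getObservationJacobian(mu);  // C(x) = dy/dx
  arr yhat = getMeanObservation(mu);      // mean observation at the predicted mean
  arr K    = Sigma*~C*inverse(C*Sigma*~C + Sigma_o);  // Kalman gain
  mu    = mu + K*(y - yhat);
  Sigma = Sigma - K*C*Sigma;              // (I - K C) Sigma
}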

12.10 Exercise 10

12.10.1 The value function in Markov Decision Processes

On slide 5 of the RL lecture we defined MDPs as

P(s0:T+1, a0:T, r0:T; π) = P(s0) ∏_{t=0}^{T} P(at | st; π) P(rt | st, at) P(st+1 | st, at) ,   (7)

where P(at | st; π) describes the agent's policy. We assume a deterministic agent and write at = π(st). The value of a state s is defined as the expected discounted sum of future rewards,

V^π(s) = E{r0 + γ r1 + γ² r2 + ··· | s0 = s} ,   (8)

given that the agent starts in state s and from there executes the policy π.

a) Prove

V^π(s) = E{r0 | s, π(s)} + γ Σ_{s'} P(s' | s, π(s)) V^π(s') .   (9)

b) Consider the following T-maze:


[Figure: a binary T-maze; the four leaf states receive the rewards 4, 0, 2, 3]

We distinguish 7 states s1, ..., s7 in the maze. The first 3 states are the T-junctions; the last 4 states receive the rewards (4, 0, 2, 3). At each T-junction we have two possible actions: left, right. Everything is deterministic. Assume a discount factor γ = 0.5. Compute (by hand) the value function V^π over all states when π is the random policy (50/50 left/right at each junction).
c) Bellman's principle of optimality says that the optimal policy π* has a value function V^{π*}(s) = V*(s),

V*(s) = max_a [ E{r0 | s, a} + γ Σ_{s'} P(s' | s, a) V*(s') ] .   (10)

Compute (by hand) the optimal value function V* over all states for the example above.
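To verify the hand computations for b) and c), here is a tiny self-contained C++ sketch (not part of the original exercise). It assumes one specific reward-timing convention, namely that the reward is collected on the transition into a leaf, junctions yield zero reward, and leaves are terminal; under a different convention the numbers shift by a factor of γ.

#include <algorithm>
#include <cstdio>

int main(){
  double g = 0.5;                        // discount gamma
  double r2L=4, r2R=0, r3L=2, r3R=3;     // leaf rewards below junctions s2 and s3
  // random policy: expected reward on leaving s2/s3; leaves are terminal (V=0)
  double V2 = .5*(r2L+r2R), V3 = .5*(r3L+r3R);
  double V1 = .5*g*V2 + .5*g*V3;         // zero immediate reward at s1
  printf("random policy:  V1=%g V2=%g V3=%g\n", V1, V2, V3);
  // Bellman optimality: max over the two actions at each junction
  double V2s = std::max(r2L, r2R), V3s = std::max(r3L, r3R);
  double V1s = std::max(g*V2s, g*V3s);
  printf("optimal policy: V1=%g V2=%g V3=%g\n", V1s, V2s, V3s);
  return 0;
}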
d) Now consider continuous state s and action a. Let the policy be stochastic and linear in features φ(s) ∈ Rk, that is,

π(a | s; β) = N(a | φ(s)>β, φ(s)>Σφ(s)) .   (11)

The covariance matrix φ(s)>Σφ(s) describes that each action at = φ(st)>(β + εt) was generated by adding a noise term εt ∼ N(0, Σ) to the parameter β. We always start in the same state ŝ and the value V^π(ŝ) is

V^π(ŝ) = E{ Σ_{t=0}^{H} rt | s0 = ŝ }   (12)

(no discounting, but only a finite horizon H).


Optimality now requires that ∂V^{π(β)}/∂β = 0. Assume that a ∈ R is just 1-dimensional and Σ ∈ R just a number. Try to prove (see slide 18) that from ∂V^{π(β*)}/∂β = 0 we can derive

β* = β_old + [ E_{ξ|β}{ Σ_{t=0}^{H} W(st) Q^{π(β)}(st, at, t) } ]^{-1} E_{ξ|β}{ Σ_{t=0}^{H} W(st) εt Q^{π(β)}(st, at, t) }

(in the scalar case, W(s) = 1). As a first step, derive ∂ log π(a|s) / ∂β. Then insert at = φ(st)>(β_old + εt) and solve

E_{ξ|β}{ Σ_{t=0}^{H} ∂/∂β log π(at|st) Q(st, at, t) } = 0

for β. This shows how you can get the optimal policy parameters β* based on samples generated with β.
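As a hint for the first step (a worked intermediate result, using only the Gaussian density in (11)): taking the log and differentiating w.r.t. β gives

∂/∂β log π(a|s) = φ(s) (a − φ(s)>β) / (φ(s)>Σφ(s)) ,

so inserting at = φ(st)>(β_old + εt) turns the numerator into φ(st) φ(st)>(β_old + εt − β).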

12.11 Exercise 11
12.11.1 Local linearization and Algebraic Riccati equation

Slide 08:02 describes the cart pole, which is similar to the Segway-type system of
Exercise 6, but a little simpler. We’ll solve the cart pole in this exercise.
The state of the cart-pole is given by x = (p, ṗ, θ, θ̇), with p ∈ R the position of the cart, θ ∈ R the pendulum's angular deviation from the upright position, and ṗ, θ̇ their respective temporal derivatives. The only control signal u ∈ R is the force applied on the cart. The analytic model of the cart pole is

θ̈ = [ g sin(θ) + cos(θ) (−c1 u − c2 θ̇² sin(θ)) ] / [ (4/3) l − c2 cos²(θ) ]   (13)

p̈ = c1 u + c2 [ θ̇² sin(θ) − θ̈ cos(θ) ]   (14)

with g = 9.8 m/s² the gravitational constant, l = 1 m the pendulum length, and constants c1 = (Mp + Mc)-1 and c2 = l Mp (Mp + Mc)-1, where Mp = Mc = 1 kg are the pendulum and cart masses, respectively.
a) Derive the local linearization of these dynamics around x* = (0, 0, 0, 0). The eventual dynamics should be in the form

ẋ = Ax + Bu

Note that

    | 0       1       0       0       |        | 0      |
A = | ∂p̈/∂p  ∂p̈/∂ṗ  ∂p̈/∂θ  ∂p̈/∂θ̇  | ,  B = | ∂p̈/∂u |
    | 0       0       0       1       |        | 0      |
    | ∂θ̈/∂p  ∂θ̈/∂ṗ  ∂θ̈/∂θ  ∂θ̈/∂θ̇  |        | ∂θ̈/∂u |

where all partial derivatives are taken at the point p = ṗ = θ = θ̇ = 0.

The solution (to continue with the other parts) is

    | 0   1   0                     0 |        | 0                        |
A = | 0   0   −g c2/((4/3)l − c2)   0 | ,  B = | c1 + c1 c2/((4/3)l − c2) |
    | 0   0   0                     1 |        | 0                        |
    | 0   0   g/((4/3)l − c2)       0 |        | −c1/((4/3)l − c2)        |

b) We assume a stationary infinite-horizon cost function of the form

J^π = ∫_0^∞ c(x(t), u(t)) dt
c(x, u) = x>Qx + u>Ru
Q = diag(c, 0, 1, 0) ,  R = I .

That is, we penalize the position offset c||p||² and the pole angle offset ||θ||². Choose c = ϱ = 1 to start with.
Solve the Algebraic Riccati equation

0 = A>P + P>A − P BR-1 B>P + Q

by initializing P = Q and iterating using the following iteration:

P_{k+1} = P_k + ε [ A>P_k + P_k>A − P_k BR-1 B>P_k + Q ]

Choose ε = 1/1000 and iterate until convergence. Output the gains K = −R-1 B>P. (Why should this iteration converge to the solution of the ARE?)
c) Solve the same Algebraic Riccati equation by calling the are routine of the Octave control package (or a similar method in Matlab). For Octave, install the Ubuntu packages octave3.2, octave-control, and qtoctave; use pkg load control and help are in Octave to ensure everything is installed; then use P=are(A,B*inverse(R)*B',Q) to solve the ARE. Output K = −R-1 B>P and compare to b).
(I found the solution K = (1.0000, 3.8271, 48.3070, 17.615).)
d) Implement the optimal Linear Quadratic Regulator u* = −R-1 B>P x on the cart pole simulator. In the Robotics code base, under 07-cartPole, there is some code, but the main.problem.cpp there is outdated; please use the one from the website or from the email.
Simulate the optimal LQR and test it for various noise levels.

12.12 Exercise 13
On Wed. 29th we meet as usual for the exercise.
On Tue. 28th, 14:00, in my office, interested students are invited to try whatever they like on the racer hardware, play around, etc.

12.12.1 Policy Search for the Racer

We consider again the simulation of the racer as given in 09-racer in your code
repo.

In this exercise the goal is to find a policy

π : y ↦ u = φ(y)>w

that maps features φ(y) of the (history of the) direct sensor signals y to the control signal u.
Use black-box optimization to find parameters w that robustly balance the racer.
Some notes:

• Features: In the lecture I suggested that a range of interesting features is

(yt, ẏt, ⟨y⟩0.5, ⟨ẏ⟩0.5, ⟨y⟩0.9, ⟨ẏ⟩0.9, ut, ut−1)

However, I noticed that a balancing policy can also be found for the direct sensor signals only, that is,

φ(y) = (1, y) ∈ R5

(the augmentation by 1 is definitely necessary).

• Cost function: Realistically, the running costs ct would have to be defined on the sensor signals only. (On the real robot we don't know the real state; if the robot is to reward itself it needs to rely on sensor signals only.) I tried
costs += .1*MT::sqr(y(3)) + 1.*MT::sqr(y(2));
which combines the wheel encoder y(3) and the gyroscope reading y(2).
That worked only moderately well. For a start, cheat and directly use the state of the simulator to compute costs:
costs += 1.*MT::sqr(R.q(0)) + 10.*MT::sqr(R.q(1));

• Episodes and duration costs: To compute the cost for a given w you need
to simulate the racer for a couple of time steps. For the optimization it is
really bad if an episode is so long that it includes a complete failure and
wrapping around of the inverted pendulum. Therefore, abort an episode if
fabs(R.q(1)) becomes too large, and penalize an abortion with an extra cost, e.g.,
proportional to T − t. Try different episode horizons T up to 500; maybe
increase this horizon stage-wise.

• Optimizer: You are free to use any optimizer you like. On the webpage
you find a reference implementation of CMA by Niko Hansen (with wrap-
per using our arr), which you may use. In that case, add cmaes.o and
search_CMA.o in the Makefile. The typical loop for optimization is
SearchCMA cma;
cma.init(5, -1, -1, -0.1, 0.1);
arr samples, values;

uint T=500;
for(uint t=0;t<1000;t++){
  cma.step(samples, values);
  for(uint i=0;i<samples.d0;i++) values(i) = evaluateControlParameters(samples[i], T);
  uint i=values.minIndex();
  cout <<t <<' ' <<values(i) <<' ' <<samples[i] <<endl;
}
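What evaluateControlParameters could look like is sketched below. The details are hypothetical: RacerSimulator, R.getObservation(y) and R.stepRobot(u) stand in for whatever 09-racer actually provides, and the abort threshold and penalty are example choices; only R.q and MT::sqr are taken from the snippets above.

double evaluateControlParameters(const arr& w, uint T){
  RacerSimulator R;                    // fresh episode (hypothetical class name)
  double costs=0.;
  arr y, phi;
  for(uint t=0;t<T;t++){
    R.getObservation(y);               // direct sensor signals y
    phi = ARR(1.);  phi.append(y);     // features phi(y) = (1, y)
    double u = scalarProduct(phi, w);  // linear policy u = phi(y)^T w
    R.stepRobot(u);
    costs += 1.*MT::sqr(R.q(0)) + 10.*MT::sqr(R.q(1));
    if(fabs(R.q(1))>1.){ costs += 100.*(T-t); break; }  // abort on failure
  }
  return costs;
}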

12.13 Exercise extra


12.13.1 Kalman SLAM

Slide 07:38 outlines how to use a high-dimensional Kalman filter to simultaneously estimate the robot position (localization) and the landmark positions (mapping).
a) Concerning P(yt | xt, θ1:N) we assume

P(yt | xt, θ1:N) = N(yt | D(xt) θ + d(xt), σobservation)

where D(xt) = ∂y/∂θ is the observation Jacobian w.r.t. the unknown landmarks, and θ ∈ R^{2N} is the same as θ1:N written as a 2N-dim vector.
Write a pseudo-code for Kalman SLAM. (There is much freedom in how to orga-
nize the code, choices of notation and variables, etc. Try to write it as concise as
possible.)
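One possible shape of the answer, as compact pseudo-code over the joint state (a sketch; many organizations are valid): let μ = (x̂, θ̂) ∈ R^{3+2N} and Σ ∈ R^{(3+2N)×(3+2N)} be the joint Gaussian belief; then one SLAM step reads

predict:  x̂ ← x̂ + B(x̂) u                  (landmark estimate θ̂ unchanged)
          Σ ← Σ + diag(σ²dynamics I3, 0)
update:   Ĉ ← [ C(x̂)  D(x̂) ]              (joint observation Jacobian, ∈ R^{2m×(3+2N)})
          K ← Σ Ĉ>(Ĉ Σ Ĉ> + σ²observation I)^{-1}
          μ ← μ + K (yt − ŷ(μ))
          Σ ← (I − K Ĉ) Σ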
b) Try to implement Kalman SLAM, which tracks the car simultaneously with the landmarks. You should now access the routines getMeanObservationAtStateAndLandmarks and getObservationJacobianAtStateAndLandmarks to retrieve the mean observation and the necessary Jacobians given the current mean estimate θ of the landmarks.

13 Bullet points to help learning


This is a summary list of core topics in the lecture and intended as a guide for preparation
for the exam. Test yourself also on the bullet points in the table of contents. Going through
all exercises is equally important.

13.1 Kinematics
• 3D geometry
– Definition of an object pose, frame, transformations
– Homogeneous transformations (4×4 matrix)
– Composition of transformations, notation xW = TW→A TA→B TB→C xC
• Fwd kinematics & Jacobian
– Fwd kinematics as composition of transformations
– Transformations of a rotational joint
– Kinematic maps φpos : q ↦ y and φvec
– Definition of a Jacobian
– Derivation of the position and vector Jacobians J pos , J vec
• Inverse kinematics (IK)
– Optimality criterion for IK
– Using the local linearization to find the optimum
– Pseudo code of following a task trajectory in small steps using Inverse Kine-
matics
– Definition & example for a singularity
• Motion profiles & Interpolation
– Motion profiles (esp. sine profile)
– Joint space vs. task space interpolation of a trajectory [e.g. using a motion
profile in one or the other space]
• Multiple Tasks
– How to incorporate multiple tasks
– What are interesting task maps; know at least about pos, vec, align, and
limits
• Further
– Be able to explain the consequences of the local linearization in IK (big steps
→ errors)

13.2 Dynamics

• 1D point mass & PID control


– General form of a dynamic system
– Dynamics of a 1D point mass
– Position, derivative and integral feedback to control it to a desired state
– Solution to the closed-loop PD system equations
– Qualitative behaviors: oscillatory-damped, over-damped, critically damped

• Euler-Lagrange equation
– Definition
– Roughly: structure of the Euler-Lagrange equation for systems; understand at least T = ½ q̇>M q̇
– Be able to apply on minimalistic system (e.g. pendulum, point mass)

• Robot dynamics & joint/operational space control


– General form of the dynamics equation
– Joint space control: given desired q̈ ∗ , choose u∗ = M (q) q̈ ∗ + F (q, q̇)
– Operational space control: given desired ÿ*, choose u* = T♯(ÿ* − J̇q̇ + T F) (5:31, 32, 34)
– Following a reference trajectory q^ref_0:T in joint space by imposing PD behavior around it

13.3 Path planning

• Basics
– Path finding vs. trajectory optimization vs. feedback control
– Roughly: BUG algorithms
– Potential functions, and that they’re nothing but IK with special task vari-
ables
– Dijkstra Algorithm

• Probabilistic Road Maps (PRMs)


– Definition & Generation
– Importance of local planner
– Roughly: knowing about probabilistic completeness
– Roughly: alternative sampling strategies

• Rapidly Exploring Random Trees (RRTs)


– Algorithm
– Goal-directed & bi-directional extensions

• Non-holonomic Systems
– Definition of non-holonomicity. Be able to give example
– Path finding: control-based sampling
– RRT extension for control-based exploration
– Roughly: Intricacies with metrics for non-holonomic systems

13.4 Mobile Robotics

• Probability Basics
– Definitions of random variable, probability distribution, joint, marginal,
conditional distribution, independence
– Bayes’ Theorem
– Continuous distributions, Gaussian, particles

• State Estimation
– Formalization of the state estimation problem
– The Bayes Filter as the general analytic solution
– Gaussians and particles to approximate the Bayes filter and make it compu-
tationally feasible:
– Details of a Particle Filter
– Kalman filter (esp. assumptions made, not eq. or derivation)
– Extended KF (assumptions made)
– Odometry (dead reckoning) as “Bayes filter without observations”

• SLAM
– Definition of the SLAM problem
– In what sense SLAM is a “chicken or egg problem”
– Joint filtering over x and m: Extended Kalman SLAM
– Particle-based SLAM (map belief for each particle)
– Roughly: graph-based SLAM & loop closing

13.5 Control Theory


• Optimal Control
– Definition of the (continuous time) optimal control problem
– Concept & definition of the value function
– HJB equation
– Infinite horizon → stationary solution
– Awareness that optimal control is not the only approach; it shifts the prob-
lem of designing a controller to designing a cost function.
• Linear-Quadratic Optimal Control
– Definition of problem (esp. assumptions made)
– Be able to express system dynamics in (locally linearized) standard form
ẋ = Ax + Bu
– Fact that the value function is quadratic V (x, t) = x>P (t)x
– Riccati differential equation
– How P gives the optimal Linear-Quadratic Regulator
– Algebraic (infinite horizon) Riccati equation

• Controllability
– Definition and understanding/interpretation of the controllability matrix C
– Definition of controllability
– Be able to apply on simple examples

• Stability
– Definitions of stability
– Eigenvalue analysis for linear systems
– Optimize controllers for negativity of eigenvalues
– Definition of a Lyapunov function
– Lyapunov's theorem: ∃ Lyapunov fct. → stability
– Energy and value function as candidates for a Lyapunov fct.
Index
1D point mass (3:6)
Algebraic Riccati equation (ARE) for infinite horizon (8:19)
Applying Bayes’ rule to state estimation (7:12)
Bayes Filter (7:13)
Bayes’ Theorem (6:10)
Bellman equation (discrete time) (8:9)
Bi-directional RRT (4:42)
Bug algorithms (4:8)
Choice of optimizer (5:5)
Compliant/soft hands (10:12)
Composition of transforms (2:13)
Conditional distribution (6:8)
Configuration state metrics (4:59)
Continuous time finite horizon definition (8:10)
Control-based sampling (4:52)
Coordinate frames and transforms (2:6)
D gain (3:10)
Damping ratio, wave length (3:14)
Darpa challenge (7:1)
Definition of a dynamics equation (3:3)
Definition of a map, definition of SLAM problem (7:34)
Definition of closed loop system equation (8:33)
Definition of controllability (8:27)
Definition of Lyapunov function (8:40)
Definition of non-holonomic (4:47)
Definitions based on sets (6:5)
Dijkstra (4:18)
Discrete time finite horizon definition (8:7)
Dynamic vs. non-dynamic robots (3:1)
Energy as Lyapunov function (8:42)
Euler-Lagrange equation (3:20)
Extended KF, Unscented Transform (7:28)
Feedback control vs path finding vs trajectory optimization (4:5)
Five approaches to RL (9:7)
Following a reference trajectory in joint space (3:30)
Following a reference trajectory in task space (3:31)
Force Closure (10:5)
Form Closure, Caging (10:7)
Forward kinematics (2:15)
Frequentist vs Bayesian (6:3)
From inverse kinematics to path costs (5:2)
General k-order cost terms (5:3)
General structure of Euler-Lagrange (3:22)
General structure of robot dynamics (3:28)
Goal-oriented RRT (4:40)
Graph-based SLAM (7:45)
Hamilton-Jacobi-Bellman equation (8:11)
Hierarchical inverse kinematics, nullspace motion (2:50)
Homogeneous transformation (2:11)
Human locomotion (11:14)
I gain (3:15)
Imitation Learning (9:9)
Impact models (11:22)
Infinite horizon case (8:12)
Inverse and forward dynamics (3:29)
Inverse kinematics as optimization problem (2:26)
Inverse kinematics for all tasks (2:48)
Inverse kinematics solution (2:29)
Inverse RL (9:12)
Jacobian (2:20)
Joint Bayes Filter over state and map (Kalman SLAM) (7:36)
Joint distribution (6:8)
Joint space trajectory interpolation (2:41)
Joint types (2:16)
Kalman Filter = Bayes Filter with Gaussian (7:26)
Kalman filter equations (7:27)
Kinematic map (2:18)
Linear stability analysis using eigenvalues (8:36)
Linear-Quadratic Optimal Control (8:15)
Lyapunov and exponential stability (8:35)
Marginal (6:8)
Markov Decision Process (9:4)
Mobile robotics vs. Manipulation vs. Kinematic/Dynamic motion control (2:1)
Motion rate control (2:37)
Multiple RVs, conditional independence (6:11)
Multiple tasks (3:35)
Newton-Euler recursion (3:23)
Null space, task space, operational space, joint space (2:36)
Odometry: Filtering without observations (7:17)
Operational space control (3:32)
oscillatory-damped, critically damped, over-damped (3:13)
Other PRM sampling strategies (4:35)
P gain (3:7)
Particle Filter = Bayes Filter with Particles (7:22)
Particle SLAM (7:39)
Particles carry their own history, their own map (7:39)
Passive dynamic walking: Compass Gait (11:19)
Path finding examples (4:1)
Path finding for a non-holonomic system (4:50)
Pendulum example for Euler-Lagrange (3:21)
PID control (3:16)
Planning/testing local connections (4:32)
Policy Search with Black-Box Optimization (9:21)
Policy Search with Policy Gradients (9:16)
Potential fields (4:13)
Probabilistic completeness of PRMs (4:34)
Probabilistic Road Maps (PRMs) (4:28)
Probabilities as (subjective) information calculus (6:1)
Probability distribution (6:7)
Random variables (6:6)
Rapidly Exploring Random Trees (RRTs) (4:39)
Reference of task maps and their Jacobians (2:52)
Relation to optimal control (9:5)
Riccati differential eq = HJB eq in LQ case (8:17)
Riccati equations (also discrete time) (8:19)
RRTs with differential constraints (4:58)
Side story: Dubins curves (4:60)
Sine motion profile (2:39)
Singularity (2:35)
Statically stable walk (11:7)
Task space trajectory interpolation (2:42)
The ’big’ task vector (2:47)
The “mitten thought experiment” (10:10)
Topics in control theory (8:3)
Value as Lyapunov function (8:43)
Zero moment point (ZMP) (11:8)
