Lecture Robotics
Marc Toussaint
April 2016
This is a direct concatenation and reformatting of all lecture slides and exercises from
the Robotics course (winter term 2014/15, U Stuttgart), including a bullet point list to
help prepare for exams.
Contents
1 Introduction 5
2 Kinematics 13
Mobile robotics vs. Manipulation vs. Kinematic/Dynamic motion control (2:1)
2.2 Kinematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Forward kinematics (2:15) Joint types (2:16) Kinematic map (2:18) Jacobian (2:20)
3 Dynamics 36
Dynamic vs. non-dynamic robots (3:1) Definition of a dynamics equation (3:3)
4 Path Planning 50
Path finding examples (4:1) Feedback control vs path finding vs trajectory optimiza-
tion (4:5)
4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Bug algorithms (4:8) Potential fields (4:13) Dijkstra (4:18)
6 Probabilities 92
Probabilities as (subjective) information calculus (6:1) Frequentist vs Bayesian (6:3)
Bernoulli and Binomial distributions (6:14) Beta (6:15) Multinomial (6:18) Dirichlet
(6:19) Conjugate priors (6:23) Dirac distribution (6:26) Gaussian (6:27) Particle ap-
proximation of a distribution (6:30) Student’s t, Exponential, Laplace, Chi-squared,
Gamma distributions (6:33)
12 Exercises 178
12.1 Exercise 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
12.2 Exercise 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
12.3 Exercise 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
12.4 Exercise 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
12.5 Exercise 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
12.6 Exercise 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
12.7 Exercise 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
12.8 Exercise 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
12.9 Exercise 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
12.10 Exercise 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
12.11 Exercise 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
12.12 Exercise 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
12.13 Exercise extra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Index 199
1 Introduction
Why Robotics?
1:1
1:2
http://www.saintpatrick.org/index.aspx/Health_Services/da_Vinci_Surgery
1:3
http://people.csail.mit.edu/nikolaus/drg/
1:5
Why Robotics?
• Commercial:
Industrial, health care, entertainment, agriculture, surgery, etc
• Critical view:
– International Committee for Robot Arms Control
http://icrac.net/
http://www.nec.co.jp/products/robot/en/
1:6
History
• little movie...
(http://www.csail.mit.edu/videoarchive/history/aifilms, http://www.ai.sri.com/shakey/)
1:10
Four chapters
• Kinematics & Dynamics
goal: orchestrate joint movements for desired movement in task spaces
Kinematic map, Jacobian, optimality principle of inverse kinematics, singularities, configuration/operational/null space, multiple si-
multaneous tasks, special task variables, trajectory interpolation, motion profiles; 1D point mass, damping & oscillation, PID, general
dynamic systems, Newton-Euler, joint space control, reference trajectory following, optimal operational space control
Path finding vs. trajectory optimization, local vs. global, Dijkstra, Probabilistic Roadmaps, Rapidly Exploring Random Trees, differen-
tial constraints, metrics; trajectory optimization, general cost function, task variables, transition costs, gradient methods, 2nd order
methods, Dynamic Programming
• Control Theory
theory on designing optimal controllers
Topics in control theory, optimal control, HJB equation, infinite horizon case, Linear-Quadratic optimal control, Riccati equations
• Mobile robots
goal: localize and map yourself
State estimation, Bayes filter, odometry, particle filter, Kalman filter, Bayes smoothing, SLAM, joint Bayes filter, EKF SLAM, particle
SLAM, graph-based SLAM
1:11
see https://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/13-Robotics/13-Robotics-script.pdf
1:12
Prerequisites
• Is this a practical or theoretical course?
“There is nothing more practical than a good theory.”
(Vapnik, others...)
Books
There is no reference book for this lecture. But a basic well-known standard
text book is:
1:14
Books
An advanced text book on planning is this:
1:15
Online resources
• Video lectures by Oussama Khatib: http://academicearth.org/courses/introduction-to-robo
http://www.virtualprofessors.com/introduction-to-robotics-stanford-cs223a-khat
(focus on kinematics, dynamics, control)
• Oliver Brock's lecture: http://courses.robotics.tu-berlin.de/mediawiki/index.php/Robotics:_Schedule_WT09
• Stefan Schaal's lecture Introduction to Robotics: http://www-clmc.usc.edu/Teaching/TeachingIn
(focus on control; useful: basic linear control theory (analytic solution to a simple dynamic model → PID), chapter on dynamics)
• Chris Atkeson's "Kinematics, Dynamic Systems, and Control": http://www.cs.cmu.edu/~cga/kdc/
(uses Schaal's slides and LaValle's book; useful: slides on 3D kinematics http://www.cs.cmu.edu/~cga/kdc/ewhitman1.pptx)
• CMU lecture "introduction to robotics": http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/16311/www/current/syllabus.html
(useful: PID control, simple Bug algorithms for motion planning, non-holonomic constraints)
Organization
• Course Webpage:
http://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/
– Slides, Exercises & Software (C++)
– Links to books and other resources
• Admin things, please first ask:
Carola Stahl, [email protected], Raum 2.217
• Tutorials: Wednesdays, 15:45-17:15 & 17:30-19:00 (0.108)
• Rules for the tutorials:
2 Kinematics
Outline
• Kinematics: φ : q 7→ y
• Inverse Kinematics: y ∗ 7→ q ∗ = argminq ||φ(q) − y ∗ ||2C + ||q − q0 ||2W
• Basic motion heuristics: Motion profiles
– Rotation matrix:
$$R = \begin{pmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{pmatrix}$$
2:5
• Let (o, e_{1:3}) be the world frame, (o′, e′_{1:3}) be the body's frame.
The new basis vectors are the columns of R, that is, e′₁ = R₁₁e₁ + R₂₁e₂ + R₃₁e₃, etc.
• x = coordinates in world frame (o, e_{1:3})
x′ = coordinates in body frame (o′, e′_{1:3})
$$x = p + R\, x'$$
2:6
Euler angles
• Describe rotation by consecutive rotation about different axis:
– 3-1-3 or 3-1-2 conventions; yaw-pitch-roll (3-2-1) in air flight
– first rotate φ about e₃, then θ about the new e′₁, then ψ about the new e″₃
• Gimbal Lock
Rotation vector
• vector w ∈ R3
– length |w| = θ is the rotation angle (in radians)
– direction w̄ = w/θ is the rotation axis
• Application on a vector v (Rodrigues' formula):
$$w \cdot v = \cos\theta\; v + \sin\theta\; (\bar w \times v) + (1-\cos\theta)\; \bar w (\bar w^\top v)$$
• Conversion to matrix:
$$R(w) = \exp(\hat w) = \cos\theta\; I + \sin\theta\; \hat w/\theta + (1-\cos\theta)\; w w^\top\!/\theta^2$$
$$\hat w := \begin{pmatrix} 0 & -w_3 & w_2 \\ w_3 & 0 & -w_1 \\ -w_2 & w_1 & 0 \end{pmatrix}$$
(ŵ is called the skew matrix, with the property ŵv = w × v; exp(·) is called the exponential map)
• Composition: convert to matrix first
• Drawback: singularity for small rotations
2:9
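As a concrete illustration of these formulas, here is a minimal numpy sketch (an added example, not part of the original script) that builds ŵ and evaluates R(w) via Rodrigues' formula:

```python
import numpy as np

def skew(w):
    # skew matrix: skew(w) @ v equals np.cross(w, v)
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rotation_matrix(w):
    # R(w) = cos(t) I + sin(t) skew(w)/t + (1-cos(t)) w w^T / t^2, t = |w|
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)  # the small-rotation singularity: return identity
    return (np.cos(theta) * np.eye(3)
            + np.sin(theta) / theta * skew(w)
            + (1.0 - np.cos(theta)) / theta**2 * np.outer(w, w))
```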
Quaternion
• A quaternion is r ∈ R⁴ with unit length |r| = 1, i.e., r₀² + r₁² + r₂² + r₃² = 1
$$r = \begin{pmatrix} r_0 \\ \bar r \end{pmatrix}, \quad r_0 = \cos(\theta/2)\,, \quad \bar r = \sin(\theta/2)\; w$$
• Composition:
$$r \circ r' = \begin{pmatrix} r_0 r'_0 - \bar r^\top \bar r' \\ r_0 \bar r' + r'_0 \bar r + \bar r \times \bar r' \end{pmatrix}$$
• Application to vector v: convert to matrix first
• Benefits: fast composition. No sin/cos computations. Use this!
2:10
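A minimal sketch of quaternion composition and application (an added illustration; it assumes the (r₀, r̄) layout above, and applies a rotation directly as r ∘ (0, v) ∘ r⁻¹, which for unit quaternions is equivalent to converting to a matrix first):

```python
import numpy as np

def quat_mult(r, s):
    # composition r ∘ s: scalar r0*s0 - r̄·s̄, vector r0*s̄ + s0*r̄ + r̄ × s̄
    r0, rv = r[0], r[1:]
    s0, sv = s[0], s[1:]
    return np.concatenate(([r0 * s0 - rv @ sv],
                           r0 * sv + s0 * rv + np.cross(rv, sv)))

def quat_apply(r, v):
    # rotate v by unit quaternion r: vector part of r ∘ (0, v) ∘ r^-1
    r_inv = np.concatenate(([r[0]], -r[1:]))  # conjugate = inverse for |r| = 1
    return quat_mult(quat_mult(r, np.concatenate(([0.0], v))), r_inv)[1:]
```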
Homogeneous transformations
• xA = coordinates of a point in frame A
xB = coordinates of a point in frame B
$$T_{A\to B} = \begin{pmatrix} R & t \\ 0 & 1 \end{pmatrix}$$
$$x^A = T_{A\to B}\; x^B\,, \qquad \begin{pmatrix} x^A \\ 1 \end{pmatrix} = \begin{pmatrix} R & t \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x^B \\ 1 \end{pmatrix} = \begin{pmatrix} R x^B + t \\ 1 \end{pmatrix}$$
Composition of transforms
TW →C = TW →A TA→B TB→C
xW = TW →A TA→B TB→C xC
2:13
2.2 Kinematics
2:14
Kinematics
[figure: kinematic chain from the world frame W to the endeffector, alternating constant link-offset transforms and q-dependent joint transforms through frames A→A′, B→B′, C→C′]
$$T_{W\to\text{eff}}(q) = T_{W\to A}\; T_{A\to A'}(q)\; T_{A'\to B}\; T_{B\to B'}(q)\; T_{B'\to C}\; T_{C\to C'}(q)\; T_{C'\to\text{eff}}$$
2:15
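A minimal sketch of this product of transformations for a hypothetical planar arm (an added illustration, not the course's C++ framework; hinge joints about the local z-axis and link offsets along x are assumptions chosen for brevity):

```python
import numpy as np

def joint_T(q):
    # q-dependent joint transform: hinge rotation about the local z-axis
    c, s = np.cos(q), np.sin(q)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def link_T(l):
    # constant link-offset transform: translation l along the local x-axis
    T = np.eye(4)
    T[0, 3] = l
    return T

def fk(q, link_lengths):
    # T_{W->eff}(q): alternate joint transforms and constant link transforms
    T = np.eye(4)
    for qi, li in zip(q, link_lengths):
        T = T @ joint_T(qi) @ link_T(li)
    return T

T_eff = fk([0.3, -0.5], [1.0, 1.0])   # endeffector pose of a 2-link arm
```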
Joint types
• Joint transformations: T_{A→A′}(q) depends on q ∈ Rⁿ
For example, a translational joint along the x-axis:
$$T_{A\to A'}(q) = \begin{pmatrix} 1 & 0 & 0 & q \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
2:17
Kinematic Map
TW →eff (q) gives us the pose of the endeffector in the world frame
Kinematic Map
• The three most important examples for a kinematic map φ are:
– A position v on the endeffector transformed to world coordinates:
$$\phi^{\text{pos}}_{\text{eff},v}(q) = T_{W\to\text{eff}}(q)\; v \;\in \mathbb{R}^3$$
– A vector v attached to the endeffector transformed to world coordinates:
$$\phi^{\text{vec}}_{\text{eff},v}(q) = R_{W\to\text{eff}}(q)\; v \;\in \mathbb{R}^3$$
– The endeffector orientation, represented as a quaternion:
$$\phi^{\text{quat}}_{\text{eff}}(q) = R_{W\to\text{eff}}(q) \;\in \mathbb{R}^4$$
• See the technical reference later for more kinematic maps, especially relative position, direction and quaternion maps.
2:19
Jacobian
• When we change the joint angles, δq, how does the effector position change,
δy?
• Given the kinematic map y = φ(q) and its Jacobian J(q) = ∂φ(q)/∂q, we have:
$$\delta y = J(q)\; \delta q$$
$$J(q) = \frac{\partial}{\partial q}\phi(q) = \begin{pmatrix}
\frac{\partial \phi_1(q)}{\partial q_1} & \frac{\partial \phi_1(q)}{\partial q_2} & \cdots & \frac{\partial \phi_1(q)}{\partial q_n} \\
\frac{\partial \phi_2(q)}{\partial q_1} & \frac{\partial \phi_2(q)}{\partial q_2} & \cdots & \frac{\partial \phi_2(q)}{\partial q_n} \\
\vdots & & & \vdots \\
\frac{\partial \phi_d(q)}{\partial q_1} & \frac{\partial \phi_d(q)}{\partial q_2} & \cdots & \frac{\partial \phi_d(q)}{\partial q_n}
\end{pmatrix} \in \mathbb{R}^{d\times n}$$
2:20
[figure: endeffector point and the rotation axis of the i-th joint]
• Assume the i-th joint is located at p_i = t_{W→i}(q) and has rotation axis
$$a_i = R_{W\to i}(q) \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$$
• We consider an infinitesimal variation δq_i ∈ R of the i-th joint and see how an endeffector position p_eff = φ^pos_{eff,v}(q) and an attached vector a_eff = φ^vec_{eff,v}(q) change.
2:21
Consider a variation δq_i → the whole sub-tree rotates:
$$\delta p_{\text{eff}} = [a_i \times (p_{\text{eff}} - p_i)]\; \delta q_i$$
$$\delta a_{\text{eff}} = [a_i \times a_{\text{eff}}]\; \delta q_i$$
⇒ Position Jacobian:
$$J^{\text{pos}}_{\text{eff},v}(q) = \Big( [a_1 \times (p_{\text{eff}}-p_1)] \;\; [a_2 \times (p_{\text{eff}}-p_2)] \;\; \cdots \;\; [a_n \times (p_{\text{eff}}-p_n)] \Big) \in \mathbb{R}^{3\times n}$$
⇒ Vector Jacobian:
$$J^{\text{vec}}_{\text{eff},v}(q) = \Big( [a_1 \times a_{\text{eff}}] \;\; [a_2 \times a_{\text{eff}}] \;\; \cdots \;\; [a_n \times a_{\text{eff}}] \Big) \in \mathbb{R}^{3\times n}$$
2:22
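These column formulas translate directly into code; a minimal numpy sketch (an added illustration), given the world-frame joint axes aₖ and joint positions pₖ:

```python
import numpy as np

def position_jacobian(axes, points, p_eff):
    # k-th column: a_k x (p_eff - p_k)
    return np.stack([np.cross(a, p_eff - p) for a, p in zip(axes, points)],
                    axis=1)

def vector_jacobian(axes, a_eff):
    # k-th column: a_k x a_eff
    return np.stack([np.cross(a, a_eff) for a in axes], axis=1)
```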
• We can compute all Jacobians from knowing ai , pi and bi for all DOFs (in the
current configuration q ∈ Rn )
2:23
• Iff φ is invertible:
$$q^* = \phi^{-1}(y^*)$$
• But in general φ will not be invertible:
1) The pre-image φ⁻¹(y*) may be empty: no configuration can generate the desired y*
2) The pre-image φ⁻¹(y*) may be large: many configurations can generate the desired y*
2:25
• The 1st term ensures that we find a configuration even if y ∗ is not exactly reach-
able
The 2nd term disambiguates if there are many configurations in the pre-image φ⁻¹(y*)
2:26
$$\phi(q) \approx y_0 + J (q - q_0)\,, \quad y_0 = \phi(q_0)$$
$$q^* = q_0 + J^\# (y^* - y_0)$$
• Why does this not follow the interpolated trajectory ŷ0:T exactly?
– What happens if T = 1 and y ∗ is far?
2:31
$$q^* = q' + J^\# (y^* - y') + (I - J^\# J)\, h\,, \quad h = q_0 - q' \tag{1}$$
• What if we want to find the exact (local) optimum? E.g., what if we want to compute a big step (where q* will be remote from q) and we cannot rely only on the local linearization approximation?
– Iterate equation (1) (optionally with a step size < 1 to ensure convergence) by setting the point of linearization q′ to the current q*
– This is equivalent to the Gauss-Newton algorithm
2:32
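A minimal sketch of this Gauss-Newton iteration (an added illustration; `phi` and `jacobian` stand for a concrete kinematic map, and W, C are the regularization weights used on these slides):

```python
import numpy as np

def ik_step(q, phi, jacobian, y_target, W=1.0, C=1e3):
    # q* = q + J# (y* - phi(q)),  J# = W^-1 J^T (J W^-1 J^T + C^-1)^-1
    J = jacobian(q)
    d = J.shape[0]
    J_sharp = (1.0 / W) * J.T @ np.linalg.inv(J @ J.T / W + np.eye(d) / C)
    return q + J_sharp @ (y_target - phi(q))

def ik_solve(q0, phi, jacobian, y_target, alpha=0.5, iters=100):
    # iterate with a step size alpha < 1, re-linearizing at the current q
    q = np.asarray(q0, float)
    for _ in range(iters):
        q = q + alpha * (ik_step(q, phi, jacobian, y_target) - q)
    return q
```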
• In the remainder:
A. Discussion of classical concepts
B. Heuristic motion profiles for simple trajectory generation
C. Extension to multiple task variables
2:33
Singularity
• In general: a matrix J is singular ⇐⇒ rank(J) < d
– rows of J are linearly dependent
– dimension of the image is < d
– δy = Jδq ⇒ the dimensions of δy are limited
– Intuition: arm fully stretched
• Implications:
det(JJᵀ) = 0
→ the pseudo-inverse Jᵀ(JJᵀ)⁻¹ is ill-defined!
→ inverse kinematics δq = Jᵀ(JJᵀ)⁻¹δy computes "infinite" steps!
Null/task/operational/joint/configuration spaces
• The space of all q ∈ Rn is called joint/configuration space
The space of all y ∈ Rd is called task/operational space
Usually d < n, which is called redundancy
$$\text{nullspace}(y^*) = \{q \mid \phi(q) = y^*\}$$
• We have
$$\delta q = J^\# \delta y + (I - J^\# J)\, a\,, \qquad J^\# = W^{-1} J^\top (J W^{-1} J^\top + C^{-1})^{-1}$$
In the limit C → ∞ it is guaranteed that Jδq = δy (we are exactly on the manifold). The term a introduces additional "nullspace motion".
2:36
Motion profiles
• Generally, let's define a motion profile as a mapping
$$\text{MP} : [0,1] \mapsto [0,1]$$
with MP(0) = 0 and MP(1) = 1, such that
$$x_t = x_0 + \text{MP}(t/T)\; (x_T - x_0)$$
• For example
$$\text{MP}_{\text{ramp}}(s) = s$$
$$\text{MP}_{\text{sin}}(s) = \tfrac{1}{2}\,[1 - \cos(\pi s)]$$
2:40
qt = q0 + MP(t/T ) (qT − q0 )
2:41
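A minimal sketch of profile-based interpolation (an added illustration):

```python
import numpy as np

def mp_sin(s):
    # smooth profile MP_sin(s) = (1 - cos(pi s)) / 2: zero velocity at both ends
    return 0.5 * (1.0 - np.cos(np.pi * s))

def interpolate(q0, qT, T):
    # q_t = q0 + MP(t/T) (qT - q0) for t = 0, ..., T
    q0, qT = np.asarray(q0, float), np.asarray(qT, float)
    return np.array([q0 + mp_sin(t / T) * (qT - q0) for t in range(T + 1)])
```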
yt = y0 + MP(t/T ) (yT − y0 )
peg-in-a-hole demo
2:43
Multiple tasks
2:45
Multiple tasks
[figure: simultaneous RightHand position and LeftHand position tasks]
2:46
Multiple tasks
• Assume we have m simultaneous tasks; for each task i we have:
– a kinematic map φi : Rn → Rdi
– a current value φi (qt )
$$q^* = \operatorname*{argmin}_q\; \|q - q_0\|^2_W + \varrho_1 \|\phi_1(q) - y_1^*\|^2 + \varrho_2 \|\phi_2(q) - y_2^*\|^2 + \cdots$$
2:47
Multiple tasks
• We can “pack” together all tasks in one “big task” Φ.
Example: We want to control the 3D position of the left hand and of the right hand.
Both are “packed” to one 6-dimensional task vector which becomes zero if both tasks
are fulfilled.
• Using the local linearization of Φ at q₀, J = ∂Φ(q₀)/∂q, the optimum is
$$q^* = q_0 - J^\# \Phi(q_0)$$
with J♯ the regularized pseudo-inverse of J as before.
2:48
Multiple tasks
• In the remainder:
A. Classical limit of "hierarchical IK" and nullspace motion
B. What are interesting task variables?
2:49
$$\delta q = J_1^\# \delta y_1 + (I - J_1^\# J_1)\, a$$
$$J_1^\# = (W/\varrho_1 + J_1^\top J_1)^{-1} J_1^\top = W^{-1} J_1^\top (J_1 W^{-1} J_1^\top + I/\varrho_1)^{-1}$$
• etc...
2:50
• The benefit of this hierarchical way of writing the solution is that one can take the hierarchical limit ϱᵢ → ∞ and retrieve classical hierarchical IK
Position
Position of a point attached to link i
dimension: d = 3
parameters: link index i, point offset v
kin. map: $\phi^{\text{pos}}_{iv}(q) = T_{W\to i}\, v$
Jacobian: $J^{\text{pos}}_{iv}(q)_{\cdot k} = [k \prec i]\; a_k \times (\phi^{\text{pos}}_{iv}(q) - p_k)$
Notation:
– aₖ, pₖ are axis and position of joint k
– [k ≺ i] indicates whether joint k is between root and link i
– J·ₖ is the k-th column of J
2:53
Vector
Vector attached to link i
dimension: d = 3
parameters: link index i, attached vector v
kin. map: $\phi^{\text{vec}}_{iv}(q) = R_{W\to i}\, v$
Jacobian: $J^{\text{vec}}_{iv}(q) = A_i \times \phi^{\text{vec}}_{iv}(q)$
Notation:
– Aᵢ is a matrix with columns (Aᵢ)·ₖ = [k ≺ i] aₖ containing the joint axes or zeros
– the short notation "A × p" means that each column in A takes the cross-product with p
2:54
Relative position
Position of a point on link i relative to a point on link j
dimension: d = 3
parameters: link indices i, j, point offset v in i and w in j
kin. map: $\phi^{\text{pos}}_{iv|jw}(q) = R_j^{-1} (\phi^{\text{pos}}_{iv} - \phi^{\text{pos}}_{jw})$
Jacobian: $J^{\text{pos}}_{iv|jw}(q) = R_j^{-1} \big[ J^{\text{pos}}_{iv} - J^{\text{pos}}_{jw} - A_j \times (\phi^{\text{pos}}_{iv} - \phi^{\text{pos}}_{jw}) \big]$
Derivation:
For y = Rp the derivative w.r.t. a rotation around axis a is y′ = Rp′ + R′p = Rp′ + a × Rp.
For y = R⁻¹p the derivative is y′ = R⁻¹p′ − R⁻¹(R′)R⁻¹p = R⁻¹(p′ − a × p). (For details see http://ipvs.informatik.uni-stuttgart.de/mlr/marc/notes/3d-geometry.pdf)
2:55
Relative vector
Vector attached to link i relative to link j
dimension: d = 3
parameters: link indices i, j, attached vector v in i
kin. map: $\phi^{\text{vec}}_{iv|j}(q) = R_j^{-1} \phi^{\text{vec}}_{iv}$
Jacobian: $J^{\text{vec}}_{iv|j}(q) = R_j^{-1} \big[ J^{\text{vec}}_{iv} - A_j \times \phi^{\text{vec}}_{iv} \big]$
2:56
Alignment
Alignment of a vector attached to link i with a reference v*
dimension: d = 1
parameters: link index i, attached vector v, world reference v*
kin. map: $\phi^{\text{align}}_{iv}(q) = v^{*\top} \phi^{\text{vec}}_{iv}$
Jacobian: $J^{\text{align}}_{iv}(q) = v^{*\top} J^{\text{vec}}_{iv}$
Relative Alignment
Alignment of a vector attached to link i with a vector attached to link j
dimension: d = 1
parameters: link indices i, j, attached vectors v, w
kin. map: $\phi^{\text{align}}_{iv|jw}(q) = (\phi^{\text{vec}}_{jw})^\top \phi^{\text{vec}}_{iv}$
Jacobian: $J^{\text{align}}_{iv|jw}(q) = (\phi^{\text{vec}}_{jw})^\top J^{\text{vec}}_{iv} + (\phi^{\text{vec}}_{iv})^\top J^{\text{vec}}_{jw}$
2:58
Joint limits
Penetration of joint limits
dimension: d = 1
parameters: joint limits q_low, q_hi, margin m
kin. map: $\phi^{\text{limits}}(q) = \frac{1}{m} \sum_{i=1}^n \big( [m - q_i + q_{\text{low}}]^+ + [m + q_i - q_{\text{hi}}]^+ \big)$
Jacobian: $J^{\text{limits}}(q)_{1,i} = -\frac{1}{m}\,[m - q_i + q_{\text{low}} > 0] + \frac{1}{m}\,[m + q_i - q_{\text{hi}} > 0]$
2:59
Collision limits
Penetration of collision margins
dimension: d = 1
parameters: margin m
kin. map: $\phi^{\text{col}}(q) = \frac{1}{m} \sum_{k=1}^K \big[ m - |p_k^a - p_k^b| \big]^+$
Jacobian: $J^{\text{col}}(q) = \frac{1}{m} \sum_{k=1}^K \big[ m - |p_k^a - p_k^b| > 0 \big]\; \big( -J^{\text{pos}}_{p_k^a} + J^{\text{pos}}_{p_k^b} \big)^{\!\top} \frac{p_k^a - p_k^b}{|p_k^a - p_k^b|}$
Center of gravity
Center of gravity of the whole kinematic structure
dimension: d = 3
parameters: (none)
kin. map: $\phi^{\text{cog}}(q) = \sum_i \text{mass}_i\; \phi^{\text{pos}}_{i c_i}$
Jacobian: $J^{\text{cog}}(q) = \sum_i \text{mass}_i\; J^{\text{pos}}_{i c_i}$
Homing
The joint configuration itself: kin. map φ(q) = q, Jacobian J(q) = I.
Example: Set the target y* = 0 and the precision ϱ very low → this task describes posture comfortness in terms of deviation from the joints' zero position. In the classical view, it induces "nullspace motion".
2:62
A quaternion joint has four DOFs. If it is currently in configuration q ∈ R⁴, the i-th DOF generates (infinitesimally) a rotation around the axis
$$a_i = \frac{-2}{\sqrt{q^2}}\; [e_i \circ q^{-1}]_{1:3}$$
where eᵢ ∈ R⁴ is the i-th unit vector, ∘ is the concatenation of quaternions, q⁻¹ the inverse quaternion, q² the quaternion 2-norm (in case it is not normalized), and [·]₁:₃ picks the vector elements of the quaternion (derivation: see the geometry notes). As for the hinge joint, these four axes are further transformed to world coordinates, aᵢ ← R_{W→j} aᵢ, if the quaternion joint is located in the coordinate frame j.
2:64
3 Dynamics
[figure: a kinematic robot vs. a dynamic robot]
3:1
xt+1 = f (xt , ut )
3:2
Dynamics
• The dynamics of a system describes how the controls ut influence the change-
of-state of the system
xt+1 = f (xt , ut )
– The notation xt refers to the dynamic state of the system: e.g., joint positions and
velocities xt = (qt , q̇t ).
– f is an arbitrary function, often smooth
3:3
Outline
• We start by discussing a 1D point mass for 3 reasons:
– The most basic force-controlled system with inertia
– We can introduce and understand PID control
– The behavior of a point mass under PID control is a reference that we can also follow
with arbitrary dynamic robots (if the dynamics are known)
q̈(t) = u(t)/m
3:6
• Idea 1:
“Always pull the mass towards the goal q ∗ :”
u = Kp (q ∗ − q)
3:7
$$m\ddot q = u = K_p (q^* - q)$$
With the ansatz q(t) = a + b e^{ωt}:
$$m b \omega^2 e^{\omega t} = K_p q^* - K_p a - K_p b\, e^{\omega t}$$
$$(m b \omega^2 + K_p b)\; e^{\omega t} = K_p (q^* - a)$$
$$\Rightarrow \; (m b \omega^2 + K_p b) = 0 \;\wedge\; (q^* - a) = 0$$
$$\Rightarrow \; \omega = i\sqrt{K_p/m}$$
$$q(t) = q^* + b\, e^{i\sqrt{K_p/m}\; t}$$
This is an undamped oscillation around q* with frequency ω₀ = √(K_p/m).
[plot: cos-shaped undamped oscillation over t ∈ [0, 14]]
3:9
u = Kp (q ∗ − q) + Kd (q̇ ∗ − q̇)
3:10
$$m\ddot q = u = K_p (q^* - q) + K_d (0 - \dot q)$$
With the same ansatz q(t) = a + b e^{ωt}:
$$\Rightarrow \; (m \omega^2 + K_d \omega + K_p) = 0 \;\wedge\; (q^* - a) = 0$$
$$\Rightarrow \; \omega = \frac{-K_d \pm \sqrt{K_d^2 - 4 m K_p}}{2m}$$
$$q(t) = q^* + b\, e^{\omega t}$$
The term −K_d/2m in ω is real ↔ exponential decay (damping)
3:11
• wave length $\lambda = \frac{1}{\omega_0} = \frac{1}{\sqrt{K_p/m}}$, i.e., $K_p = m/\lambda^2$
• damping ratio $\xi = K_d / \sqrt{4 m K_p}$:
ξ > 1: over-damped
ξ = 1: critically damped
ξ < 1: oscillatory-damped
$$q(t) = q^* + b\, e^{-\xi\, t/\lambda}\; e^{i\sqrt{1-\xi^2}\; t/\lambda}$$
3:14
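To see the three regimes numerically, here is a minimal Euler-integration sketch of the PD-controlled point mass (an added illustration with arbitrary gains; m = 1, K_p = 10, K_d = 2 gives ξ = K_d/√(4mK_p) ≈ 0.32, i.e., oscillatory-damped):

```python
import numpy as np

def simulate_pd(m=1.0, Kp=10.0, Kd=2.0, q_star=1.0, dt=0.001, steps=10000):
    # Euler-integrate m q'' = Kp (q* - q) - Kd q', starting at rest in q = 0
    q, qdot, trace = 0.0, 0.0, []
    for _ in range(steps):
        qddot = (Kp * (q_star - q) - Kd * qdot) / m
        qdot += dt * qddot
        q += dt * qdot
        trace.append(q)
    return np.array(trace)  # approaches q* with a damped oscillation
```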
$$u = K_p (q^* - q) + K_d (\dot q^* - \dot q) + K_i \int_{s=0}^t (q^*(s) - q(s))\; ds$$
• PID control
– Proportional Control ("Position Control"): u ∝ K_p (q* − q)
• With such linear control laws we can design approach trajectories (by tuning the gains)
– but there is no optimality principle behind such motions
3:17
$$f_i = m_i \dot v_i\,, \qquad u_i = I_i \dot w_i + w_i \times I_i w_i$$
Algorithms that "propagate" forces through a kinematic tree and numerically compute the inverse dynamics u = NE(q, q̇, q̈) or forward dynamics q̈ = f(q, q̇, u).
3:19
$$\frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = u$$
• L(q, q̇) is called Lagrangian and defined as
L=T −U
where T =kinetic energy and U =potential energy.
• q is called generalized coordinate – any coordinates such that (q, q̇) describes
the state of the system. Joint angles in our case.
• u are external forces
3:20
Example: A pendulum
$$T = \frac{1}{2}\sum_i \big( m_i v_i^2 + w_i^\top I_i w_i \big) = \frac{1}{2}\dot q^\top \Big(\sum_i M_i\Big) \dot q\,, \quad M_i = \begin{pmatrix} J_i \\ J_i^w \end{pmatrix}^{\!\top} \begin{pmatrix} m_i I_3 & 0 \\ 0 & I_i \end{pmatrix} \begin{pmatrix} J_i \\ J_i^w \end{pmatrix}$$
$$U = \sum_i g\, m_i\, \text{height}(i)$$
$$\underbrace{u}_{\text{control}} = \frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = \underbrace{M\ddot q}_{\text{inertia}} + \underbrace{\dot M \dot q - \frac{\partial T}{\partial q}}_{\text{Coriolis}} + \underbrace{\frac{\partial U}{\partial q}}_{\text{gravity}}$$
Newton-Euler recursion
• An algorithm that computes the inverse dynamics
u = NE(q, q̇, q̈ ∗ )
– Euler’s equation expresses the torque (=control) acting on a rigid body given an
angular velocity and angular acceleration:
$$u_i = I_i \dot w_i + w_i \times I_i w_i$$
• Backward recursion:
For the leaf links, we now know the desired accelerations q̈* and can compute the necessary joint torques. Recurse backward.
3:23
$$\ddot q_t^{\text{ref}} := \frac{1}{\tau}\Big[ (q_{t+1} - q_t)/\tau - (q_t - q_{t\text{-}1})/\tau \Big] = (q_{t\text{-}1} + q_{t+1} - 2 q_t)/\tau^2$$
However, tiny errors in acceleration will accumulate greatly over time! This is unstable!
• Choose a desired acceleration q̈t∗ that implies a PD-like behavior around the refer-
ence trajectory!
$$\ddot q_t^* = \ddot q_t^{\text{ref}} + K_p (q_t^{\text{ref}} - q_t) + K_d (\dot q_t^{\text{ref}} - \dot q_t)$$
This is a standard and very convenient heuristic to track a reference trajectory when the
robot dynamics are known: All joints will exactly behave like a 1D point particle around the
reference trajectory!
3:30
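A one-line sketch of this heuristic (an added illustration); the resulting q̈* is then fed into the inverse dynamics u = NE(q, q̇, q̈*):

```python
def desired_acceleration(q, qdot, q_ref, qdot_ref, qddot_ref,
                         Kp=100.0, Kd=20.0):
    # PD behavior around the reference: qddot* = qddot_ref + Kp e + Kd edot
    return qddot_ref + Kp * (q_ref - q) + Kd * (qdot_ref - qdot)
```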
• Inverse kinematics:
$$q^* = \operatorname*{argmin}_q\; \|\phi(q) - y^*\|^2_C + \|q - q_0\|^2_W$$
• Operational space control (one might call it "inverse task space dynamics"):
$$u^* = \operatorname*{argmin}_u\; \|\ddot\phi(q) - \ddot y^*\|^2_C + \|u\|^2_H$$
3:32
$$\ddot\phi(q) = \frac{d}{dt}\dot\phi(q) \approx \frac{d}{dt}\big(J\dot q + \dot J q\big) \approx J\ddot q + 2\dot J\dot q = J M^{-1}(u - F) + 2\dot J\dot q$$
We get
$$u^* = T^\#\big(\ddot y^* - 2\dot J\dot q + T F\big)\,, \quad T = J M^{-1}\,, \quad T^\# = (T^\top C T + H)^{-1} T^\top C$$
$$\big(C \to \infty \;\Rightarrow\; T^\# = H^{-1} T^\top (T H^{-1} T^\top)^{-1}\big)$$
3:33
• Operational space control: Let the system behave as if we could directly “apply a 1D
point mass behavior” to the endeffector
3:34
Multiple tasks
• Recall trick last time: we defined a “big kinematic map” Φ(q) such that
3:35
• Hardware limits
– I think: the current success stories on highly dynamic robots are all anchored in
novel hardware approaches
3:36
4 Path Planning
[figure: the start-to-goal problem at three levels – path finding, trajectory optimization, feedback control]
Outline
• Really briefly: heuristics & discretization (slides from Howie Choset's CMU lectures)
– Bugs algorithm
– Potentials to guide feedback control
– Dijkstra
• Non-holonomic systems
4:6
4.1 Background
4:7
A better bug?
“Bug 2” Algorithm
[figure: Bug 2 following the "m-line" from start to goal – OK?]
4:8
A better bug?
“Bug 2” Algorithm
4:9
A better bug?
“Bug 2” Algorithm
[figure: a case where Bug 2 fails to reach the Goal]
NO! How do we fix this?
4:10
A better bug?
“Bug 2” Algorithm
4:11
Start-Goal Algorithm:
Potential Functions
4:13
Repulsive Potential
4:14
$$F(q) = -\nabla U(q)$$
[figure: attractive potential + repulsive potential = total potential]
4:15
4:16
4:17
4:18
4:19
4:20
4:21
4:22
4:23
4:24
[figure: two possible shortest paths shown]
4:25
Dijkstra Algorithm
• Is efficient in discrete domains
– Given start and goal node in an arbitrary graph
– Incrementally label nodes with their distance-from-start
q ∈ Rn describes configuration
Qfree is the set of configurations without collision
4:28
Given the graph, use (e.g.) Dijkstra to find path from qstart to qgoal .
4:30
Local Planner
The smaller the gap (clearance ϱ), the more unlikely it is to sample such points.
4:33
PRM theory
(for uniform sampling in d-dim space)
• Let a, b ∈ Q_free and γ a path in Q_free connecting a and b. Then:
$$P(\text{PRM-success} \mid n) \;\ge\; 1 - \frac{2|\gamma|}{\varrho}\; e^{-\sigma \varrho^d n}$$
$$\sigma = \frac{|B_1|}{2^d\, |Q_{\text{free}}|}\,, \qquad \varrho = \text{clearance of } \gamma \text{ (distance to obstacles)}$$
(roughly: the exponential term is a "volume ratio")
• This property is called probabilistic completeness (one can achieve any success probability with high enough n)
• For a given success probability, n needs to be exponential in d
4:34
Gaussian: q₁ ∼ U; q₂ ∼ N(q₁, σ); if q₁ ∈ Q_free and q₂ ∉ Q_free, add q₁ (or vice versa).
Bridge: q₁ ∼ U; q₂ ∼ N(q₁, σ); q₃ = (q₁ + q₂)/2; if q₁, q₂ ∉ Q_free and q₃ ∈ Q_free, add q₃.
• Sampling strategies can be made more intelligent: "utility-based sampling"
• Connection sampling
(once earlier sampling has produced connected components)
4:35
• Cons:
– Precomputation of exhaustive roadmap takes a long time
(but not necessary for “Lazy PRMs”)
4:36
2 motivations:
• Single Query path finding: Focus computational efforts on paths for specific
(qstart , qgoal )
• Use actually controllable DoFs to incrementally explore the search space: control-
based path finding.
Simplest RRT with straight line local planner and step size α
4:39
4:40
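A minimal sketch of this simplest RRT (an added illustration; the unit-cube configuration space, the goal-biased sampling, and the user-supplied straight-line collision check are assumptions of this sketch):

```python
import numpy as np

def rrt(q_start, q_goal, collision_free, alpha=0.05, n_iter=5000, seed=0):
    # grow a tree by steps of size alpha toward random samples
    rng = np.random.default_rng(seed)
    q_start, q_goal = np.asarray(q_start, float), np.asarray(q_goal, float)
    nodes, parent = [q_start], {0: None}
    for _ in range(n_iter):
        q_rand = q_goal if rng.random() < 0.1 else rng.random(len(q_start))
        i = int(np.argmin([np.linalg.norm(n - q_rand) for n in nodes]))
        d = q_rand - nodes[i]
        q_new = nodes[i] + alpha * d / (np.linalg.norm(d) + 1e-12)
        if collision_free(nodes[i], q_new):   # straight-line local planner
            parent[len(nodes)] = i
            nodes.append(q_new)
            if np.linalg.norm(q_new - q_goal) < alpha:
                break                         # near goal; backtrack via parent
    return nodes, parent
```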
Bi-directional search
Summary: RRTs
• Pros (shared with PRMs):
– Algorithmically very simple
– Highly explorative
– Allows probabilistic performance guarantees
References
Steven M. LaValle: Planning Algorithms, http://planning.cs.uiuc.edu/.
4:44
RRT*
Sertac Karaman & Emilio Frazzoli: Incremental sampling-based algorithms for
optimal motion planning, arXiv 1005.0416 (2010).
4:45
Non-holonomic systems
• We define a nonholonomic system as one with differential constraints:
dim(ut ) < dim(xt )
⇒ Not all degrees of freedom are directly controllable
Car example
$$\dot x = v \cos\theta\,, \quad \dot y = v \sin\theta\,, \quad \dot\theta = (v/L)\tan\varphi\,, \quad |\varphi| < \Phi$$
State $q = \begin{pmatrix} x \\ y \\ \theta \end{pmatrix}$, controls $u = \begin{pmatrix} v \\ \varphi \end{pmatrix}$
System equation:
$$\begin{pmatrix} \dot x \\ \dot y \\ \dot\theta \end{pmatrix} = \begin{pmatrix} v\cos\theta \\ v\sin\theta \\ (v/L)\tan\varphi \end{pmatrix}$$
4:48
Car example
• The car is a non-holonomic system: not all DoFs are controlled, dim(u) < dim(q)
We have the differential constraint on q̇:
$$\dot x \sin\theta - \dot y \cos\theta = 0$$
• Analogy to dynamic systems: Just like a car cannot instantly move sidewards, a dy-
namic system cannot instantly change its position q: the current change in position is
constrained by the current velocity q̇.
4:49
This is generated with a PRM in the state space q = (x, y, θ), ignoring the differential constraint.
4:50
1) Select a q ∈ T
2) Pick v, φ, and τ
3) Integrate motion from q
4) Add result if collision-free
4:53
car parking
4:54
car parking
4:55
with a trailer
4:57
• Crucial questions:
– How to measure nearness in nonholonomic systems?
– How to find controls u to steer towards the target?
4:58
Standard/naive metrics:
• Comparing two 2D rotations/orientations θ₁, θ₂ ∈ SO(2):
a) Euclidean metric between e^{iθ₁} and e^{iθ₂}
b) d(θ₁, θ₂) = min{|θ₁ − θ₂|, 2π − |θ₁ − θ₂|}
• Comparing two configurations (x, y, θ)₁,₂ in R²:
Euclidean metric on (x, y, e^{iθ})
• Comparing two 3D rotations/orientations r₁, r₂ ∈ SO(3):
Represent both orientations as unit-length quaternions r₁, r₂ ∈ R⁴:
d(r₁, r₂) = min{|r₁ − r₂|, |r₁ + r₂|}
where |·| is the Euclidean metric.
(Recall that r₁ and −r₁ represent exactly the same rotation.)
• Ideal metric:
Optimal cost-to-go between two states x1 and x2 :
• Use optimal trajectory cost as metric
• This is as hard to compute as the original problem, of course!!
→ Approximate, e.g., by neglecting obstacles.
4:59
• Neglecting obstacles, there are only six types of trajectories that connect any
configuration ∈ R2 × S1 :
{LRL, RLR, LSL, LSR, RSL, RSR}
→ By testing all six types of trajectories for (q1 , q2 ) we can define a Dubins
metric for the RRT – and use the Dubins curves as controls!
• Reeds-Shepp curves are an extension for cars which can drive back.
(includes 46 types of trajectories, good metric for use in RRTs for cars)
4:61
Outline
• These are only some very brief notes on path optimization
$$\min_{x_{0:T}} \sum_{t=0}^T f_t(x_{t-k:t})^\top f_t(x_{t-k:t}) \qquad \text{s.t.} \;\; \forall t:\; g_t(x_{t-k:t}) \le 0\,, \;\; h_t(x_{t-k:t}) = 0$$
5:3
Cost terms
• The ft (xt−k:t ) terms can penalize various things:
Choice of optimizer
$$\min_{x_{0:T}} \sum_{t=0}^T f_t(x_{t-k:t})^\top f_t(x_{t-k:t}) \qquad \text{s.t.} \;\; \forall t:\; g_t(x_{t-k:t}) \le 0\,, \;\; h_t(x_{t-k:t}) = 0$$
• Note: the Lagrangian of this problem also has the so-called Gauss-Newton form. The pseudo-Hessian is a banded, symmetric, positive-definite matrix.
5:5
6 Probabilities
Probability Theory
• Why do we need probabilities?
Outline
• Basic definitions
– Random variables
– joint, conditional, marginal distribution
– Bayes’ theorem
• Probability distributions:
– Gauss
– Dirac & Particles
These are generic slides on probabilities I use throughout my lecture. Only parts are
relevant for this course.
6:2
• Probability P : A ⊆ Ω ↦ [0, 1]
e.g., P({1}) = 1/6, P({4}) = 1/6, P({2, 5}) = 1/3
• Axioms: ∀A, B ⊆ Ω
– Nonnegativity P (A) ≥ 0
– Additivity P (A ∪ B) = P (A) + P (B) if A ∩ B = ∅
– Normalization P (Ω) = 1
• Implications
0 ≤ P (A) ≤ 1
P (∅) = 0
A ⊆ B ⇒ P (A) ≤ P (B)
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
P (Ω \ A) = 1 − P (A)
6:5
• A bit more formally: a random variable is a map from a measurable space to a domain (sample space) and thereby introduces a probability measure on the domain ("assigns a probability to each possible value")
6:6
Probability Distributions
• P (X = 1) ∈ R denotes a specific probability
Joint distributions
Assume we have two random variables X and Y
• Definitions:
Joint: P(X, Y)
Marginal: $P(X) = \sum_Y P(X, Y)$
Conditional: $P(X|Y) = \frac{P(X,Y)}{P(Y)}$
The conditional is normalized: $\forall Y: \sum_X P(X|Y) = 1$
Joint distributions
joint: P(X, Y)
marginal: $P(X) = \sum_Y P(X, Y)$
conditional: $P(X|Y) = \frac{P(X,Y)}{P(Y)}$
Bayes' Theorem: $P(X|Y) = \frac{P(Y|X)\, P(X)}{P(Y)}$
6:9
Bayes’ Theorem
$$P(X|Y) = \frac{P(Y|X)\; P(X)}{P(Y)}$$
posterior = likelihood · prior / normalization
6:10
Multiple RVs:
• Analogously for n random variables X_{1:n} (stored as a rank-n tensor):
Joint: P(X_{1:n})
Marginal: $P(X_1) = \sum_{X_{2:n}} P(X_{1:n})$
Conditional: $P(X_1|X_{2:n}) = \frac{P(X_{1:n})}{P(X_{2:n})}$
$$P(x = 1 \mid \mu) = \mu\,, \qquad P(x = 0 \mid \mu) = 1 - \mu$$
$$\text{Bern}(x \mid \mu) = \mu^x (1-\mu)^{1-x}$$
• We have a data set of random variables D = {x₁, .., xₙ}, each xᵢ ∈ {0, 1}. If each xᵢ ∼ Bern(xᵢ | μ) we have
$$P(D \mid \mu) = \prod_{i=1}^n \text{Bern}(x_i \mid \mu) = \prod_{i=1}^n \mu^{x_i} (1-\mu)^{1-x_i}$$
$$\operatorname*{argmax}_\mu \log P(D \mid \mu) = \operatorname*{argmax}_\mu \sum_{i=1}^n x_i \log\mu + (1-x_i)\log(1-\mu) = \frac{1}{n}\sum_{i=1}^n x_i$$
• The Binomial distribution is the distribution over the count $m = \sum_{i=1}^n x_i$:
$$\text{Bin}(m \mid n, \mu) = \binom{n}{m}\, \mu^m (1-\mu)^{n-m}\,, \qquad \binom{n}{m} = \frac{n!}{(n-m)!\; m!}$$
6:14
Beta
How to express uncertainty over a Bernoulli parameter μ
• The Beta distribution is over the interval [0, 1], typically the parameter μ of a Bernoulli:
$$\text{Beta}(\mu \mid a, b) = \frac{1}{B(a,b)}\; \mu^{a-1} (1-\mu)^{b-1}$$
• The posterior over μ, given data D with counts a_D = #[xᵢ = 1] and b_D = #[xᵢ = 0]:
$$P(\mu \mid D) = \frac{P(D \mid \mu)}{P(D)}\; P(\mu) \propto \text{Bin}(D \mid \mu)\; \text{Beta}(\mu \mid a, b)$$
$$\propto \mu^{a_D} (1-\mu)^{b_D}\; \mu^{a-1}(1-\mu)^{b-1} = \mu^{a-1+a_D} (1-\mu)^{b-1+b_D} = \text{Beta}(\mu \mid a + a_D,\; b + b_D)$$
6:15
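The conjugate update is literally count addition; a minimal sketch (an added illustration):

```python
def beta_update(a, b, data):
    # Bernoulli observations (0/1) update a Beta(a, b) prior by adding counts
    a_D = sum(data)            # number of x_i = 1
    b_D = len(data) - a_D      # number of x_i = 0
    return a + a_D, b + b_D

a, b = beta_update(1, 1, [1, 0, 1, 1])   # uniform prior -> Beta(4, 2)
posterior_mean = a / (a + b)             # = 2/3
```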
Beta
• Conclusions:
– The semantics of a and b are counts of [xi = 1] and [xi = 0], respectively
– The Beta distribution is conjugate to the Bernoulli (explained later)
– With the Beta distribution we can represent beliefs (state of knowledge) about un-
certain µ ∈ [0, 1] and know how to update this belief given data
6:16
Beta
from Bishop
6:17
Multinomial
• We have an integer random variable x ∈ {1, .., K}
The probability of a single x can be parameterized by μ = (μ₁, .., μ_K):
$$P(x = k \mid \mu) = \mu_k$$
with the constraint $\sum_{k=1}^K \mu_k = 1$ (probabilities need to be normalized)
• We have a data set of random variables D = {x₁, .., xₙ}, each xᵢ ∈ {1, .., K}. If each xᵢ ∼ P(xᵢ | μ) we have
$$P(D \mid \mu) = \prod_{i=1}^n \mu_{x_i} = \prod_{i=1}^n \prod_{k=1}^K \mu_k^{[x_i=k]} = \prod_{k=1}^K \mu_k^{m_k}$$
where $m_k = \sum_{i=1}^n [x_i = k]$ is the count of [xᵢ = k]. The ML estimator is
$$\operatorname*{argmax}_\mu \log P(D \mid \mu) = \frac{1}{n}\,(m_1, .., m_K)$$
6:18
Dirichlet
How to express uncertainty over a Multinomial parameter μ
• The Dirichlet distribution is over the K-simplex, that is, over μ₁, .., μ_K ∈ [0, 1] subject to the constraint $\sum_{k=1}^K \mu_k = 1$:
$$\text{Dir}(\mu \mid \alpha) \propto \prod_{k=1}^K \mu_k^{\alpha_k - 1}$$
It is parameterized by α = (α₁, .., α_K), has mean $\langle \mu_i \rangle = \frac{\alpha_i}{\sum_j \alpha_j}$ and mode $\mu_i^* = \frac{\alpha_i - 1}{\sum_j \alpha_j - K}$ for αᵢ > 1.
• The posterior over μ, given data D with counts m = (m₁, .., m_K):
$$P(\mu \mid D) = \frac{P(D \mid \mu)}{P(D)}\; P(\mu) \propto \text{Mult}(D \mid \mu)\; \text{Dir}(\mu \mid \alpha)$$
$$\propto \prod_{k=1}^K \mu_k^{m_k} \prod_{k=1}^K \mu_k^{\alpha_k - 1} = \prod_{k=1}^K \mu_k^{\alpha_k - 1 + m_k} = \text{Dir}(\mu \mid \alpha + m)$$
6:19
Dirichlet
• Conclusions:
– The semantics of α is the counts of [xᵢ = k]
– The Dirichlet distribution is conjugate to the Multinomial
– With the Dirichlet distribution we can represent beliefs (state of knowledge) about an uncertain μ of an integer random variable, and know how to update this belief given data
6:20
Dirichlet
Illustrations for α = (0.1, 0.1, 0.1), α = (1, 1, 1) and α = (10, 10, 10):
from Bishop
6:21
Conjugate priors
A prior P(θ) is conjugate to a likelihood P(D | θ) iff the posterior is in the same distribution family as the prior:
$$P(\theta \mid D) \propto P(D \mid \theta)\; P(\theta)$$
• Having a conjugate prior is very convenient, because then you know how to
update the belief given data
6:23
Conjugate priors
likelihood → conjugate prior
Binomial Bin(D | μ) → Beta Beta(μ | a, b)
Multinomial Mult(D | μ) → Dirichlet Dir(μ | α)
Gauss N(x | μ, Σ) → Gauss N(μ | μ₀, A)
1D Gauss N(x | μ, λ⁻¹) → Gamma Gam(λ | a, b)
nD Gauss N(x | μ, Λ⁻¹) → Wishart Wish(Λ | W, ν)
nD Gauss N(x | μ, Λ⁻¹) → Gauss-Wishart N(μ | μ₀, (βΛ)⁻¹) Wish(Λ | W, ν)
6:24
The (cumulative) probability distribution $F(y) = P(x \le y) = \int_{-\infty}^y p(x)\, dx \in [0,1]$ is the cumulative integral with $\lim_{y\to\infty} F(y) = 1$.
(In discrete domains: probability distribution and probability mass function P(x) ∈ [0, 1] are used synonymously.)
Dirac or δ ("point particle"): δ(x) = 0 except at x = 0, $\int \delta(x)\, dx = 1$
$\delta(x) = \frac{\partial}{\partial x} H(x)$, where H(x) = [x ≥ 0] is the Heaviside step function
6:26
Gaussian distribution
[plot: 1D Gaussian N(x | μ, σ²), centered at μ with width 2σ]
• 1-dim: $N(x \mid \mu, \sigma^2) = \frac{1}{|2\pi\sigma^2|^{1/2}}\; e^{-\frac{1}{2}(x-\mu)^2/\sigma^2}$
• n-dim, in canonical form:
$$N[x \mid a, A] = \frac{\exp\{-\frac{1}{2} a^\top A^{-1} a\}}{|2\pi A^{-1}|^{1/2}}\; \exp\{-\tfrac{1}{2} x^\top A x + x^\top a\}$$
with precision matrix A = Σ⁻¹ and coefficient a = Σ⁻¹μ (and mean μ = A⁻¹a).
Gaussian identities
Symmetry: N(x | a, A) = N(a | x, A) = N(x − a | 0, A)
Product:
$$N(x \mid a, A)\; N(x \mid b, B) = N[x \mid A^{-1}a + B^{-1}b,\; A^{-1} + B^{-1}]\; N(a \mid b, A + B)$$
$$N[x \mid a, A]\; N[x \mid b, B] = N[x \mid a + b,\; A + B]\; N(A^{-1}a \mid B^{-1}b,\; A^{-1} + B^{-1})$$
"Propagation":
$$\int_y N(x \mid a + Fy, A)\; N(y \mid b, B)\, dy = N(x \mid a + Fb,\; A + FBF^\top)$$
Transformation:
$$N(Fx + f \mid a, A) = \frac{1}{|F|}\; N\big(x \mid F^{-1}(a - f),\; F^{-1} A F^{-\top}\big)$$
Marginal & conditional:
$$N\left( \begin{pmatrix} x \\ y \end{pmatrix} \,\middle|\, \begin{pmatrix} a \\ b \end{pmatrix}, \begin{pmatrix} A & C \\ C^\top & B \end{pmatrix} \right) = N(x \mid a, A) \cdot N\big(y \mid b + C^\top A^{-1}(x - a),\; B - C^\top A^{-1} C\big)$$
$$q(x) := \sum_{i=1}^N w_i\; \delta(x - x_i)$$
6:31
7 Mobile Robotics
http://www.darpa.mil/grandchallenge05/
DARPA Grand Challenge 2005
7:1
http://www.darpa.mil/grandchallenge/index.asp
DARPA Grand Urban Challenge 2007
7:2
http://www.slawomir.de/publications/grzonka09icra/grzonka09icra.pdf
Quadcopter Indoor Localization
7:3
http://stair.stanford.edu/multimedia.php
STAIR: STanford Artificial Intelligence Robot
7:4
7:5
For this we always assumed to know the state xt of the robot (e.g., its pos-
ture/position)!
7:6
Outline
• PART I:
A core challenge in mobile robotics is state estimation
→ Bayesian filtering & smoothing
particles, Kalman
• PART II:
Another challenge is to build a map while exploring
→ SLAM (simultaneous localization and mapping)
7:7
• Given the local sensor readings yt , the current state xt (location, position) is
uncertain.
– which hallway?
– which door exactly?
– which heading direction?
7:9
7:10
$$P(X|Y) = \frac{P(Y|X)\; P(X)}{P(Y)}$$
posterior = likelihood · prior / (normalization)
7:11
7:12
Bayes Filter
xt = state (location) at time t
yt = sensor readings at time t
ut-1 = control command (action, steering, velocity) at time t-1
• Given the history y0:t and u0:t-1 , we want to compute the probability distribu-
tion over the state at time t
• Generally:
Filtering: P(x_t | y_{0:t})
Smoothing: P(x_t | y_{0:T})
Prediction: P(x_t | y_{0:s})
[figure: the window of observations y conditioned on in each case]
7:13
Bayes Filter
$$p_t(x_t) \propto \underbrace{P(y_t \mid x_t)}_{\text{new information}}\; \underbrace{\int_{x_{t\text{-}1}} P(x_t \mid u_{t\text{-}1}, x_{t\text{-}1})\; \underbrace{p_{t\text{-}1}(x_{t\text{-}1})}_{\text{old estimate}}\; dx_{t\text{-}1}}_{\text{predictive estimate } \hat p_t(x_t)}$$
(from Robust Monte Carlo localization for mobile robots Sebastian Thrun, Dieter
Fox, Wolfram Burgard, Frank Dellaert)
7:16
(from Robust Monte Carlo localization for mobile robots Sebastian Thrun, Dieter
Fox, Wolfram Burgard, Frank Dellaert)
7:17
7:18
7:19
Bayesian Filters
• How to represent the belief pt (xt ):
• Gaussian
• Particles
7:20
$$p(x) \approx q(x) := \sum_{i=1}^N w_i\; \delta(x - x_i)$$
7:21
• “Particle Filter”
Instead of sampling directly n̂ᵢ ∼ Multi({N wᵢ}), set n̂ᵢ = ⌊N wᵢ⌋ + n̄ᵢ with n̄ᵢ ∼ Multi({N wᵢ − ⌊N wᵢ⌋})
Liu & Chen (1998): Sequential Monte Carlo Methods for Dynamic Systems.
Douc, Cappé & Moulines: Comparison of Resampling Schemes for Particle Filtering.
7:23
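A minimal sketch of one particle-filter iteration (an added illustration; `transition_sample` and `likelihood` are placeholders for concrete models P(x_t | u, x_{t-1}) and P(y_t | x_t)):

```python
import numpy as np

def particle_filter_step(X, w, u, y, transition_sample, likelihood, rng):
    # X: (N, dim) array of particles; w: normalized importance weights
    N = len(X)
    idx = rng.choice(N, size=N, p=w)                              # 1. resample
    X = np.array([transition_sample(x, u, rng) for x in X[idx]])  # 2. predict
    w = np.array([likelihood(y, x) for x in X])                   # 3. re-weight
    return X, w / w.sum()
```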
http://www.slawomir.de/publications/grzonka09icra/grzonka09icra.pdf
Quadcopter Indoor Localization
7:24
• Classical solution: generate particles also with other than purely forward pro-
posal P (xt | ut-1 , xt-1 ):
– Or mix particles sampled directly from P (yt | xt ) and from P (xt | ut-1 , xt-1 ).
(Robust Monte Carlo localization for mobile robots. Sebastian Thrun, Dieter Fox, Wolfram Burgard,
Frank Dellaert)
7:25
Definition:
$$N(x \mid a, A) = \frac{1}{|2\pi A|^{1/2}}\; \exp\{-\tfrac{1}{2}(x - a)^\top A^{-1}(x - a)\}$$
Product:
$$N(x \mid a, A)\; N(x \mid b, B) = N\big(x \mid B(A{+}B)^{-1}a + A(A{+}B)^{-1}b,\; A(A{+}B)^{-1}B\big)\; N(a \mid b, A + B)$$
"Propagation":
$$\int_y N(x \mid a + Fy, A)\; N(y \mid b, B)\, dy = N(x \mid a + Fb,\; A + FBF^\top)$$
Transformation:
$$N(Fx + f \mid a, A) = \frac{1}{|F|}\; N\big(x \mid F^{-1}(a - f),\; F^{-1} A F^{-\top}\big)$$
• Assume the belief and both models are Gaussian:
$$p_t(x_t) = N(x_t \mid s_t, S_t)\,, \quad P(y_t \mid x_t) = N(y_t \mid C x_t + c,\; W)\,, \quad P(x_t \mid u_{t\text{-}1}, x_{t\text{-}1}) = N(x_t \mid A x_{t\text{-}1} + a,\; Q)$$
Then the Bayes filter update
$$p_t(x_t) \propto P(y_t \mid x_t) \int_{x_{t\text{-}1}} P(x_t \mid u_{t\text{-}1}, x_{t\text{-}1})\; p_{t\text{-}1}(x_{t\text{-}1})\; dx_{t\text{-}1}$$
$$= N(y_t \mid C x_t + c, W) \int_{x_{t\text{-}1}} N(x_t \mid A x_{t\text{-}1} + a, Q)\; N(x_{t\text{-}1} \mid s_{t\text{-}1}, S_{t\text{-}1})\; dx_{t\text{-}1}$$
can be evaluated in closed form using the identities above.
7:27
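Carrying out these Gaussian integrals yields the classical Kalman filter; a minimal sketch of one update under the models above (an added illustration; the control u_{t-1} is assumed folded into the offset a):

```python
import numpy as np

def kalman_step(s, S, y, A, a, Q, C, c, W):
    # predict with P(x_t | u, x_t-1) = N(x_t | A x_t-1 + a, Q)
    s_pred = A @ s + a
    S_pred = A @ S @ A.T + Q
    # update with P(y_t | x_t) = N(y_t | C x_t + c, W)
    K = S_pred @ C.T @ np.linalg.inv(C @ S_pred @ C.T + W)   # Kalman gain
    s_new = s_pred + K @ (y - C @ s_pred - c)
    S_new = (np.eye(len(s_new)) - K @ C) @ S_pred
    return s_new, S_new
```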
Bayes smoothing
Filtering: P(x_t | y_{0:t})
Smoothing: P(x_t | y_{0:T})
Prediction: P(x_t | y_{0:s})
[figure: the window of observations y conditioned on in each case]
7:29
Bayes smoothing
• Let P = y_{0:t} denote the past observations and F = y_{t+1:T} the future observations.
Bayesian smoothing fuses a forward filter p_t(x_t) with a backward "filter" β_t(x_t), computed by the backward recursion
$$\beta_t(x_t) = \int_{x_{t+1}} \beta_{t+1}(x_{t+1})\; P(y_{t+1} \mid x_{t+1})\; P(x_{t+1} \mid x_t, u_t)\; dx_{t+1}$$
7:30
Types of maps
Grid map
Landmark map
7:33
• Given the history y0:t and u0:t-1 , we want to compute the belief over the pose
AND THE MAP m at time t
• If we knew the map we could use a Bayes filter to compute the belief over the
state
P (xt | m, y0:t , u0:t-1 )
Map uncertainty
• In the case the map m = (θ1 , .., θN ) is a set of N landmarks, θj ∈ R2
• Drawbacks:
– Scaling (full covariance matrix is O(N 2 ))
– Sometimes non-robust (uni-modal, “data association problem”)
– Lacks advantages of Particle Filter
(multiple hypothesis, more robust to non-linearities)
7:38
• As for the localization problem, use particles to represent the pose belief p_t(x_t)
Note: Each particle actually "has a history x^i_{0:t}" – a whole trajectory!
• For each particle separately, estimate the map belief p^i_t(m) conditioned on the particle history x^i_{0:t}. The conditional beliefs p^i_t(m) may be factorized over grid points or landmarks of the map.
7:39
7:40
• In SLAM the map is uncertain → each particle is weighted with the expected likelihood:
$$w_t^i = \int P(y_t \mid x_t^i, m)\; p_{t-1}(m)\; dm$$
7:41
• For each particle we have a separate map belief p^i_t(m); in the case of landmarks, this factorizes into N separate 2D-Gaussians
• Iterate:
1. Resample particles to get N weight-1 particles: $\{\hat x^i_{t\text{-}1}\}_{i=1}^N$
2. Use the motion model to get new "predictive" particles $\{x^i_t\}_{i=1}^N$
3. Update the map belief $p^i_t(m) \propto P(y_t \mid m, x_t)\; p^i_{t\text{-}1}(m)$ for each particle
4. Compute new importance weights $w_t^i \propto \int P(y_t \mid x_t^i, m)\; p_{t-1}(m)\; dm$
7:42
7:43
Klein & Murray: Parallel Tracking and Mapping for Small AR Workspaces. http://www.robots.ox.ac.uk/~gk/PTAM/
Newcombe, Lovegrove & Davison: DTAM: Dense Tracking and Mapping in Real-Time. ICCV 2011. http://www.doc.ic.ac.uk/~rnewcomb/
Engel, Schöps & Cremers: LSD-SLAM: Large-Scale Direct Monocular SLAM. ECCV 2014. http://vision.in.tum.de/_media/spezial/bib/engel14eccv.pdf
7:44
7:46
Graph-based SLAM
Life-long Map Learning for Graph-based SLAM Approaches in Static Environments Kretzschmar, Grisetti,
Stachniss
7:47
SLAM code
• Graph-based and grid map methods:
http://openslam.org/
• Visual SLAM
see previous links
e.g. http://ewokrampage.wordpress.com/
7:48
8 Control Theory
8:1
Control Theory
• Concerns controlled systems of the form
ẋ = f (x, u) + noise(x, u)
Outline
• We’ll first consider optimal control
Goal: understand Algebraic Riccati equation
significance for local neighborhood control
xt+1 = f (xt , ut )
where x0 and the controller π : (x, t) 7→ u are given, which determines x1:T and
u0:T
8:7
[figure: a weighted graph from Start to Goal with edge costs]
• Bellman equation
$$V_t(x) = \min_u \Big[ c(x, u) + V_{t+1}(f(x, u)) \Big]$$
The argmin gives the optimal control signal: $\pi_t^*(x) = \operatorname*{argmin}_u [\cdots]$
Derivation:
$$V_t(x) = \min_\pi \Big[ \sum_{s=t}^T c(x_s, u_s) + \phi(x_T) \Big]$$
$$= \min_{u_t} \Big[ c(x, u_t) + \min_\pi \big[ \textstyle\sum_{s=t+1}^T c(x_s, u_s) + \phi(x_T) \big] \Big]$$
$$= \min_{u_t} \Big[ c(x, u_t) + V_{t+1}(f(x, u_t)) \Big]$$
8:9
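A minimal sketch of this backward recursion for finite (discretized) state and control sets (an added illustration; `f` must map into the given state set):

```python
def dynamic_programming(states, controls, f, c, phi, T):
    # backward recursion: V_t(x) = min_u [ c(x,u) + V_{t+1}(f(x,u)) ]
    V = {x: phi(x) for x in states}          # V_T = terminal costs
    policy = [dict() for _ in range(T)]
    for t in reversed(range(T)):
        V_new = {}
        for x in states:
            costs = {u: c(x, u) + V[f(x, u)] for u in controls}
            u_star = min(costs, key=costs.get)
            policy[t][x] = u_star
            V_new[x] = costs[u_star]
        V = V_new
    return V, policy                          # V_0 and the optimal policy
```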
where the start state x(0) and the controller π : (x, t) 7→ u are given, which
determine the closed-loop system trajectory x(t), u(t) via ẋ = f (x, π(x, t)) and
u(t) = π(x(t), t)
8:10
• Hamilton-Jacobi-Bellman equation
$$-\frac{\partial}{\partial t} V(x, t) = \min_u \Big[ c(x, u) + \frac{\partial V}{\partial x} f(x, u) \Big]$$
The argmin gives the optimal control signal: $\pi^*(x) = \operatorname*{argmin}_u [\cdots]$
$$J^\pi = \int_0^\infty c(x(t), u(t))\; dt$$
• The HJB and Bellman equations remain "the same", but with a stationary value function independent of t:
$$0 = \min_u \Big[ c(x, u) + \frac{\partial V}{\partial x} f(x, u) \Big] \qquad \text{(continuous time)}$$
$$V(x) = \min_u \Big[ c(x, u) + V(f(x, u)) \Big] \qquad \text{(discrete time)}$$
The argmin gives the optimal control signal: $\pi^*(x) = \operatorname*{argmin}_u [\cdots]$
8:12
• Reference following:
– You always want to stay close to a reference trajectory r(t)
Define x̃(t) = x(t) − r(t) with dynamics $\dot{\tilde x}(t) = f(\tilde x(t) + r(t), u) - \dot r(t)$
Define a cost
$$J^\pi = \int_0^\infty \|\tilde x\|^2 + \rho \|u\|^2$$
Comments
• The Bellman equation is fundamental in optimal control theory, but also Rein-
forcement Learning
• The HJB equation is a differential equation for V(x, t) which is in general hard to solve
• The (time-discretized) Bellman equation can be solved by Dynamic Programming starting backward:
$$V_T(x) = \phi(x)\,, \quad V_{T\text{-}1}(x) = \min_u \big[ c(x, u) + V_T(f(x, u)) \big]\,, \quad \text{etc.}$$
But it might still be hard or infeasible to represent the functions V_t(x) over continuous x!
• Both become significantly simpler under linear dynamics and quadratic costs:
→ Riccati equation
8:14
Linear-quadratic optimal control:
linear dynamics ẋ = Ax + Bu
quadratic costs c(x, u) = xᵀQx + uᵀRu
• Note: the dynamics neglects a constant term; the costs neglect linear and constant terms. This is because
– constant costs are irrelevant
– linear cost terms can be made away by redefining x or u
– constant dynamic term only introduces a constant drift
8:15
The optimal Linear-Quadratic control u* = −R⁻¹BᵀP x is linear in x.
Note: if the state is dynamic (e.g., x = (q, q̇)), this control is linear in the positions and linear in the velocities and is an instance of PD control.
The matrix K = R⁻¹BᵀP is therefore also called the gain matrix.
For instance, if x(t) = (q(t) − r(t), q̇(t) − ṙ(t)) for a reference r(t) and K = [K_p  K_d], then
$$u^* = K_p (r(t) - q(t)) + K_d (\dot r(t) - \dot q(t))$$
8:18
Riccati equations
• Finite horizon continuous time – the Riccati differential equation:
$$-\dot P = A^\top P + P A - P B R^{-1} B^\top P + Q$$
• Infinite horizon continuous time – the Algebraic Riccati Equation (ARE):
$$0 = A^\top P + P A - P B R^{-1} B^\top P + Q$$
• Finite horizon discrete time ($J^\pi = \sum_{t=0}^T \|x_t\|^2_Q + \|u_t\|^2_R + \|x_T\|^2_F$): a Riccati difference equation, iterated backward in time
• Infinite horizon discrete time ($J^\pi = \sum_{t=0}^\infty \|x_t\|^2_Q + \|u_t\|^2_R$): the discrete-time ARE
• Dynamics:
$$x = \begin{pmatrix} q \\ \dot q \end{pmatrix}, \quad \dot x = \begin{pmatrix} \dot q \\ \ddot q \end{pmatrix} = \begin{pmatrix} \dot q \\ u(t)/m \end{pmatrix} = A x + B u\,, \quad A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \;\; B = \begin{pmatrix} 0 \\ 1/m \end{pmatrix}$$
• Costs: quadratic in the state and control, with weight ϱ on the control
• Ansatz $P = \begin{pmatrix} a & c \\ c & b \end{pmatrix}$, giving the optimal control
$$u^* = -R^{-1} B^\top P x = \frac{-1}{\varrho m}\,[c\, q + b\, \dot q]$$
8:20
$$P = \begin{pmatrix} a & c \\ c & b \end{pmatrix}, \qquad u^* = -R^{-1} B^\top P x = \frac{-1}{\varrho m}\,[c\, q + b\, \dot q]$$
• The Algebraic Riccati equation 0 = AᵀP + PA − PBR⁻¹BᵀP + Q becomes
$$0 = \begin{pmatrix} 0 & a \\ a & 2c \end{pmatrix} - \frac{1}{\varrho m^2}\begin{pmatrix} c^2 & bc \\ bc & b^2 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 0 & \epsilon \end{pmatrix}$$
First solve for c, then for $b = m\sqrt{\varrho}\,\sqrt{2c + \epsilon}$. The damping ratio $\xi = \frac{b}{\sqrt{4mc}}$ depends on the choices of ϱ and ε.
• The Algebraic Riccati equation is usually solved numerically. (E.g. are, care or dare in Octave)
8:21
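A minimal numerical sketch for the point mass (an added illustration; it assumes scipy's ARE solver and the illustrative choices Q = I, R = ϱ):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

m, rho = 1.0, 0.1
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0 / m]])
Q = np.eye(2)                    # state costs
R = rho * np.eye(1)              # control costs

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P   # gain matrix; optimal control u* = -K x
```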
• Even if we can solve the HJB equation and have the optimal control, we still don't know how the system really behaves:
– Will it actually reach a "desired state"?
– Will it be stable?
– Is it actually "controllable" at all?
• Inverse Kinematics:
$$\min_{q_{0:T}} f(q_{0:T}) = \sum_{t=0}^T \|\Psi_t(q_{t\text{-}k}, .., q_t)\|^2 + \sum_{t=0}^T \|\Phi_t(q_t)\|^2$$
• Reinforcement Learning:
– Markov Decision Processes ↔ discrete time stochastic controlled system P(x_{t+1} | u_t, x_t)
– Bellman equation → Basic RL methods (Q-learning, etc)
8:23
8.2 Controllability
8:24
Controllability
• As a starting point, consider the claim:
“Intelligence means to gain maximal controllability over all degrees of freedom over the
environment.”
Note:
– controllability (ability to control) ≠ control
– What does controllability mean exactly?
Controllability
• Consider a linear controlled system
ẋ = Ax + Bu
How can we tell from the matrices A and B whether we can control x to eventually
reach any desired state?
Is x “controllable”?
$$\begin{pmatrix} \dot x_1 \\ \dot x_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} u$$
Is x "controllable"?
8:26
Controllability
We consider a linear stationary (=time-invariant) controlled system
ẋ = Ax + Bu
• Complete controllability: All elements of the state can be brought from arbi-
trary initial conditions to zero in finite time
• A system is completely controllable iff the controllability matrix
$$C := \begin{pmatrix} B & AB & A^2 B & \cdots & A^{n\text{-}1} B \end{pmatrix}$$
has full rank dim(x) (that is, all rows are linearly independent)
• Meaning of C:
The ith row describes how the ith element xi can be influenced by u
"B": ẋᵢ is directly influenced via B
"AB": ẍᵢ is "indirectly" influenced via AB (u directly influences some ẋⱼ via B; the dynamics A then influence ẍᵢ depending on ẋⱼ)
"A²B": $\dddot x_i$ is "doubly indirectly" influenced
etc...
Note: $\ddot x = A\dot x + B\dot u = AAx + ABu + B\dot u$
$\dddot x = A^3 x + A^2 B u + A B \dot u + B \ddot u$
8:27
Controllability
• When all rows of the controllability matrix are linearly independent ⇒ (u, u̇, ü, ...)
can influence all elements of x independently
• If a row is zero → this element of x cannot be controlled at all
• If 2 rows are linearly dependent → these two elements of x will remain coupled
forever
8:28
Controllability examples
$$\begin{pmatrix} \dot x_1 \\ \dot x_2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix} u\,, \quad C = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} \quad \text{rows linearly dependent}$$
$$\begin{pmatrix} \dot x_1 \\ \dot x_2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} u\,, \quad C = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{2nd row zero}$$
$$\begin{pmatrix} \dot x_1 \\ \dot x_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} u\,, \quad C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{good!}$$
8:29
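A minimal sketch that builds C and checks its rank for the third example above (an added illustration):

```python
import numpy as np

def controllability_matrix(A, B):
    # C = [ B  AB  A^2 B  ...  A^{n-1} B ]
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = controllability_matrix(A, B)
print(np.linalg.matrix_rank(C))   # 2 = dim(x) -> completely controllable
```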
Controllability
Why is it important/interesting to analyze controllability?
• The Algebraic Riccati Equation will always return an “optimal” controller – but
controllability tells us whether such a controller even has a chance to control x
• “Intelligence means to gain maximal controllability over all degrees of freedom over the
environment.”
– real environments are non-linear
– “to have the ability to gain controllability over the environment’s DoFs”
8:30
8.3 Stability
8:31
Stability
• One of the most central topics in control theory
Stability
• Stability is an analysis of the closed loop system. That is: for this analysis we
don’t need to distinguish between system and controller explicitly. Both to-
gether define the dynamics
• What follows:
– 3 basic definitions of stability
– 2 basic methods for analysis by Lyapunov
8:33
Stability – 3 definitions
ẋ = F(x) with equilibrium point x = 0
• x₀ is an equilibrium point ⇐⇒ F(x₀) = 0
• Lyapunov stable ⇐⇒ for every ε > 0 there exists a δ > 0 such that ‖x(0)‖ ≤ δ implies ‖x(t)‖ ≤ ε for all t ≥ 0
• asymptotically stable ⇐⇒ Lyapunov stable and lim_{t→∞} x(t) = 0
• exponentially stable ⇐⇒ asymptotically stable and ∃ α, a such that ‖x(t)‖ ≤ a e^{−αt} ‖x(0)‖
(→ the "error" time integral $\int_0^\infty \|x(t)\|\, dt \le \frac{a}{\alpha}\|x(0)\|$ is bounded!)
8:35
• Meaning: An eigenvalue describes how the system behaves along one state dimension
(along the eigenvector):
$$\dot x_i = \lambda_i x_i$$
As for the 1D point mass, the solution is $x_i(t) = a e^{\lambda_i t}$ and
– imaginary λᵢ → oscillation
– negative real(λᵢ) → exponential decay ∝ e^{−|λᵢ|t}
– positive real(λᵢ) → exponential explosion ∝ e^{|λᵢ|t}
8:36
• Dynamics:
$$\dot x = \begin{pmatrix} \dot q \\ \ddot q \end{pmatrix} = A x\,, \qquad A = \begin{pmatrix} 0 & 1 \\ -K_p/m & -K_d/m \end{pmatrix}$$
• Eigenvalues: the equation $\lambda \begin{pmatrix} q \\ \dot q \end{pmatrix} = A \begin{pmatrix} q \\ \dot q \end{pmatrix}$ leads to $\lambda \dot q = \lambda^2 q = -K_p/m\; q - K_d/m\; \lambda q$, i.e., $m\lambda^2 + K_d \lambda + K_p = 0$, with solution (compare slide 05:10)
$$\lambda = \frac{-K_d \pm \sqrt{K_d^2 - 4 m K_p}}{2m}$$
xt+1 = Axt
• In standard cases, a good guess for the Lyapunov function is either the energy
or the value function
8:41
(We could have derived this more easily: xᵀQx are the immediate state costs, and xᵀKᵀRKx = uᵀRu are the immediate control costs – and V̇(x) = −c(x, u*)! See slide 11 bottom.)
• That is: V is a Lyapunov function if Q + KᵀRK is positive definite!
8:43
RL = Learning to Act
Applications of RL
• Robotics
– Navigation, walking, juggling, helicopters, grasping, etc...
• Games
– Backgammon, Chess, Othello, Tetris, ...
• Control
– factory processes, resource control in multimedia networks, elevators, ....
• Operations Research
– Warehousing, transportation, scheduling, ...
9:3
[figure: MDP as a graphical model – states s₀, s₁, s₂, actions a₀, a₁, a₂, rewards r₀, r₁, r₂]
$$P(s_{0:T+1}, a_{0:T}, r_{0:T};\; \pi) = P(s_0) \prod_{t=0}^T P(a_t | s_t;\; \pi)\; P(r_t | s_t, a_t)\; P(s_{t+1} | s_t, a_t)$$
• Stationary MDP:
– We assume P(s′ | s, a) and P(r | s, a) independent of time
– We also define $R(s, a) := E\{r \mid s, a\} = \int r\; P(r \mid s, a)\; dr$
9:4
ẋ = f (x, u) + noise(x, u)
π : (x, t) 7→ u
9:5
$$D = \{(x_t, u_t, c_t)\}_{t=0}^T$$
Five approaches to RL:
– Model-based: learn model P(x′ | u, x) and costs c(x, u) → optimal control → policy π_t(x)
– Model-free: learn value fct. V(x) → policy π(x)
– Policy Search: optimize the policy π(x) directly
– Imitation Learning: learn the policy π(x) from demonstrations
– Inverse OC: learn latent costs c(x) → optimal control → policy π(x)
Five approaches to RL:
– Model-based: learn model P(s′ | s, a) and rewards R(s, a) → dynamic prog. → V(s) → policy π(s)
– Model-free: learn value fct. V(s) → policy π(s)
– Policy Search: optimize the policy π(s) directly
– Imitation Learning: learn the policy π(s) from demonstrations
– Inverse RL: learn latent rewards R(s, a) → dynamic prog. → V(s) → policy π(s)
Imitation Learning
$$D = \{(s_{0:T}, a_{0:T})_d\}_{d=1}^n \;\xrightarrow{\text{learn/copy}}\; \pi(s)$$
Literature:
Imitation Learning
• There a many ways to imitate/copy the oberved policy:
Learn a density model P (at | st )P (st ) (e.g., with mixture of Gaussians) from the
observed data and use it as policy (Billard et al.)
Imitation Learning
Inverse RL
$$D = \{(s_{0:T}, a_{0:T})_d\}_{d=1}^n \;\xrightarrow{\text{learn}}\; R(s,a) \;\xrightarrow{\text{DP}}\; V(s) \;\rightarrow\; \pi(s)$$
Literature:
Pieter Abbeel & Andrew Ng: Apprenticeship learning via inverse reinforcement learning (ICML
2004)
Andrew Ng & Stuart Russell: Algorithms for Inverse Reinforcement Learning (ICML 2000)
Nikolay Jetchev & Marc Toussaint: Task Space Retrieval Using Inverse Feedback Control (ICML
2011).
9:12
$$\max_{w,\,\xi}\; \xi$$
9:14
Policy gradients
• In continuous state/action case, represent the policy as linear in arbitrary state
features:
$$\pi(s) = \sum_{j=1}^k \phi_j(s)\; \beta_j = \phi(s)^\top \beta \qquad \text{(deterministic)}$$
with k features φⱼ.
• The quantity to estimate is the policy gradient $\frac{\partial V(\beta)}{\partial \beta}$
9:16
Policy Gradients
• One approach is called REINFORCE:
$$\frac{\partial V(\beta)}{\partial \beta} = \frac{\partial}{\partial \beta} \int P(\xi|\beta)\; R(\xi)\; d\xi = \int P(\xi|\beta)\; \frac{\partial}{\partial \beta}\log P(\xi|\beta)\; R(\xi)\; d\xi$$
$$= E_{\xi|\beta}\Big\{\frac{\partial}{\partial \beta}\log P(\xi|\beta)\; R(\xi)\Big\} = E_{\xi|\beta}\Big\{\sum_{t=0}^H \gamma^t\; \frac{\partial \log \pi(a_t|s_t)}{\partial \beta}\; \underbrace{\sum_{t'=t}^H \gamma^{t'-t}\; r_{t'}}_{Q^\pi(s_t, a_t, t)}\Big\}$$
– Then the natural gradient ($\frac{\partial V(\beta)}{\partial \beta}$ multiplied with the inverse Fisher metric) updates β_new = β + αw
• Another is PoWER: use the stochastic policy π(a | s); let a_t = φ(s_t)ᵀ(β + ε_t), where ε_t ∼ N(0, Σ) is sampled noise. Then ∂V(β)/∂β = 0 implies the update
$$\beta \leftarrow \beta + \frac{E_{\xi|\beta}\{\sum_{t=0}^H \epsilon_t\; Q^\pi(s_t, a_t, t)\}}{E_{\xi|\beta}\{\sum_{t=0}^H Q^\pi(s_t, a_t, t)\}}$$
Kober & Peters: Policy Search for Motor Primitives in Robotics, NIPS 2008.
Learning To Walk
Tedrake, Zhang & Seung: Stochastic policy gradient reinforcement learning on a simple 3D biped. IROS,
2849-2854, 2004. http://groups.csail.mit.edu/robotics-center/public_papers/Tedrake0
pdf
9:19
Kober & Peters: Policy Search for Motor Primitives in Robotics, NIPS 2008.
Vlassis, Toussaint (2009): Learning Model-free Robot Control by a Monte Carlo EM Algorithm. Au-
tonomous Robots 27, 123-130.
“Black-Box Optimization”
• The term is not really well defined
– I use it to express that only f (x) can be evaluated
– ∇f (x) or ∇2 f (x) are not (directly) accessible
More common terms:
• Global optimization
– This usually emphasizes that methods should not get stuck in local optima
– Very very interesting domain – close analogies to (active) Machine Learning, ban-
dits, POMDPs, optimal decision making/planning, optimal experimental design
– Usually mathematically well founded methods
Black-Box Optimization
• Problem: Let x ∈ Rⁿ, f : Rⁿ → R. Find
$$\min_x f(x)$$
A zoo of approaches
• People with many different backgrounds drawn into this
Ranging from heuristics and Evolutionary Algorithms to heavy mathematics
Stochastic Search
9:26
Stochastic Search
• The general recipe:
– The algorithm maintains a probability distribution pθ (x)
– In each iteration it takes n samples {xi }ni=1 ∼ pθ (x)
– Each xi is evaluated → data {(xi , f (xi ))}ni=1
– That data is used to update θ
• Stochastic Search:
Input: initial parameter θ, function f(x), distribution model p_θ(x), update heuristic h(θ, D)
Output: final θ and best point x
1: repeat
2:   Sample {xᵢ}ᵢ₌₁ⁿ ∼ p_θ(x)
3:   Evaluate samples, D = {(xᵢ, f(xᵢ))}ᵢ₌₁ⁿ
4:   Update θ ← h(θ, D)
5: until θ converges
9:27
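A minimal instance of this recipe (an added illustration): a Gaussian search distribution with the "mean of the μ best" update heuristic; the fixed step size σ is a simplification compared to CMA's adapted covariance:

```python
import numpy as np

def stochastic_search(f, theta, sigma=0.5, n=20, mu=5, iters=100, seed=0):
    # p_theta(x) = N(x | theta, sigma^2 I); h(theta, D) = mean of the mu best
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        X = theta + sigma * rng.standard_normal((n, len(theta)))
        D = sorted(zip(X, (f(x) for x in X)), key=lambda p: p[1])
        theta = np.mean([x for x, _ in D[:mu]], axis=0)
    return theta

theta_opt = stochastic_search(lambda x: float(np.sum(x**2)), np.ones(5))
```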
Stochastic Search
• The parameter θ is the only “knowledge/information” that is being propagated
between iterations
θ encodes what has been learned from the history
θ defines where to search in the future
• Update heuristic:
– Given $D = \{(x_i, f(x_i))\}_{i=1}^\lambda$, select the μ best: D′ = bestOf_μ(D)
– Compute the new mean x̂ from D′
9:30
CMA references
Hansen, N. (2006), ”The CMA evolution strategy: a comparing review”
Hansen et al.: Evaluating the CMA Evolution Strategy on Multimodal Test
Functions, PPSN 2004.
• An interesting variant:
Igel et al.: A Computational Efficient Covariance Matrix Update and a (1 + 1)-
CMA for Evolution Strategies, GECCO 2006.
9:32
CMA conclusions
• It is a good starting point for an off-the-shelf black-box algorithm
• It includes components like estimating the local gradient (pσ , pC ), the local
“Hessian” (Var(D0 )), smoothing out local minima (large populations)
9:33
Input: initial parameter θ, function f(x), distribution model p_θ(x), update heuristic h(θ, D)
Output: final θ and best point x
1: repeat
2:   Sample {xᵢ}ᵢ₌₁ⁿ ∼ p_θ(x)
3:   Evaluate samples, D = {(xᵢ, f(xᵢ))}ᵢ₌₁ⁿ
4:   Update θ ← h(θ, D)
5: until θ converges
$$\pi : (y_{t-h,..,t\text{-}1},\; u_{t-h,..,t\text{-}1}) \mapsto u_t$$
9:35
Grasping
• The most elementary type of interaction with (manipulation of) the environ-
ment.
– Basis for intelligent behavior.
• In industrial settings with high precision sensors and actuators: very fast and
precise.
• In general real world with uncertain actuators and perception, still a great re-
search challenge, despite all the theory that has been developed.
10:1
Pick-and-place in industry
(This type of kinematics is called “Delta Robot”, which is a “parallel robot” just
as the Stewart platform.)
10:2
Research
• Using ultra high speed and precise cameras and localization: High speed robot
hand from the Ishikawa Komuro’s “Sensor Fusion” Lab
http://www.k2.t.u-tokyo.ac.jp/fusion/index-e.html
• Asimo’s grasping:
10:3
Outline
• Introduce the basic classical concepts for grasping (force closure)
• References:
Force Closure
• Which of these objects is in “tight grip”?
Defining “tight grip”: Assume fingers (vectors) have no friction – but can exert
arbitrary normal forces. Can we generate (or counter-act) arbitrary forces on the
object?
10:5
• Equilibrium: The contact forces can balance the object’s weight and other ex-
ternal forces.
10:7
• Guaranteed synthesis:
1) put fingers “everywhere”
2) while redundant finger exists delete any redundant finger
(A finger is redundant if it can be deleted without reducing the positive linear
span.)
DLR hand
10:13
Legged Locomotion
• Why legs?
11:1
Legged Locomotion
• Why legs?
Rolling vs walking
11:3
One-legged locomotion
Tedrake: Applied Optimal Control for Dynamically Stable Legged Locomotion. PhD thesis (2004). http://groups.csail.mit.edu/robotics-center/public_papers/Tedrake04b.pdf
11:4
Biped locomotion
• Walking vs Running
• 2 phases of Walking
– double-support phase (in Robotics often statically stable)
– single-support phase (statically unstable)
11:5
Asimo
11:6
• Try it yourself: move as slowly as you can, but make normal-length steps...
11:7
(i) the possibility of rotation of the overall system about one of the foot edges
caused by strong disturbances, which is equivalent to the appearance of an
unpowered (passive) DOF,
Vukobratovic & Borovac: Zero-moment point—Thirty five years of its life. International Journal of Humanoid Robotics 1, 157-173, 2004. http://www.cs.cmu.edu/~cga/legs/vukobratovic.pdf
11:8
11:9
• “ZMP is defined as that point on the ground at which the net moment of the
inertial forces and the gravity forces has no component along the horizontal
axes.” (Vukobratovic & Borovac, 2004)
See also: Popovic, Goswami & Herr: Ground Reference Points in Legged Locomotion: Definitions, Biological Trajectories and Control Implications. International Journal of Robotics Research 24(12), 2005. http://www.cs.cmu.edu/~cga/legs/Popovic_Goswami_Herr_IJRR_Dec_2005.pdf
11:10
• Locomotion is dynamically stable if the ZMP remains within the foot-print poly-
gons.
(↔ the support can always apply some momentum, if necessary)
Kajita et al.: Biped Walking Pattern Generation by using Preview Control of Zero-Moment Point. ICRA 2003. http://eref.uqu.edu.sa/files/eref2/folder1/biped_walking_pattern_generation_by_usin_53925.pdf
11:12
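The containment test itself is simple geometry; the hard part in practice is computing the ZMP. A toy C++ sketch of the criterion, assuming a convex support polygon with counter-clockwise vertices:

#include <cstdio>
#include <vector>

struct P2 { double x, y; };

//true iff the ZMP lies inside the convex support polygon (CCW vertex order)
bool zmpInsideSupport(const P2& zmp, const std::vector<P2>& poly){
  for(size_t i=0; i<poly.size(); i++){
    const P2& a = poly[i];
    const P2& b = poly[(i+1)%poly.size()];
    double cross = (b.x-a.x)*(zmp.y-a.y) - (b.y-a.y)*(zmp.x-a.x);
    if(cross < 0.) return false;   //ZMP lies right of edge a->b: outside
  }
  return true;
}

int main(){
  std::vector<P2> foot = {{0,0},{0.2,0},{0.2,0.1},{0,0.1}};    //foot print
  std::printf("%d\n", zmpInsideSupport({0.1,0.05}, foot));     //1: dynamically stable
  std::printf("%d\n", zmpInsideSupport({0.3,0.05}, foot));     //0: not stable
  return 0;
}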
ZMP Summary
• ZMP is the “rescue” for conventional methods:
– ZMP-stability → the robot can be controlled as if foot is “glued” (virtually) to
the ground!
– The whole body behaves like a “conventional arm”
– Can accelerate any DoF q̈ → conventional dynamic control of the fully actuated system
• Limitations:
– Humans don’t use ZMP stability; we allow our feet to roll
(toe-off, heel-strike: ZMP at edge of support polygon)
– Can’t describe robots with point feet (walking on stilts)
11:13
11:15
2. Pelvic Rotation:
3. Pelvic Tilt:
11:16
Anderson & Pandy: Dynamic Optimization of Human Walking. Journal of Biomechanical Engineering 123:381-390, 2001. http://e.guigon.free.fr/rsc/article/AndersonPandy01.pdf
Anderson & Pandy: Static and dynamic optimization solutions for gait are practically equivalent. Journal of Biomechanics 34 (2001) 153-161. http://www.bme.utexas.edu/faculty/pandy/StaticOptWalking.pdf
11:17
Goswami, Thuilot & Espiau: A study of the passive gait of a compass-like biped robot: symmetry and
chaos. International Journal of Robotics Research 17, 1998.
http://www.ambarish.com/paper/COMPASS_IJRR_Goswami.pdf
11:19
11:20
Westervelt, Grizzle & Koditschek: Hybrid Zero Dynamics of Planar Biped Walkers. IEEE Trans. on
Automatic Control 48(1), 2003.
http://repository.upenn.edu/cgi/viewcontent.cgi?article=1124&context=ese_papers
11:22
Geyer, Seyfarth & Blickhan: Compliant leg behaviour explains basic dynamics of walking and running.
Proc. Roy. Soc. Lond. B, 273(1603): 2861-2867, 2006. http://www.cs.cmu.edu/~cga/legs/GeyerEA06RoySocBiolSci.pdf
11:23
Learning To Walk
Tedrake, Zhang & Seung: Stochastic policy gradient reinforcement learning on a simple 3D biped. IROS, 2849-2854, 2004. http://groups.csail.mit.edu/robotics-center/public_papers/Tedrake04a.pdf
11:24
Summary
• ZMP type walking was successful (ASIMO, HRP-2, etc), but limited
11:25
11:26
http://www.bostondynamics.com/robot_rise.html
11:27
12 Exercises
12.1 Exercise 1
No need to prepare for this first tutorial. We’ll do the exercises together on the fly.
XA + A^T = I
12.1.3 Optimization
b) What 2nd-order optimization methods (querying f(x), ∇f(x), ∇²f(x) in each iteration) do you know?
c) What is backtracking line search?
12.2 Exercise 2
12.2.1 Geometry
Future exercises will require coding some examples in C/C++. Test if you can compile and run the lib that accompanies this lecture. Report any problems with the installation.
On Ubuntu:
12.3 Exercise 3
12.3.1 Task spaces and Jacobians
a) The example solution generates a motion in one step using inverse kinematics δq = J♯ δy with J♯ = (J^T J + σ²W)^{-1} J^T. Record the task error, that is, the deviation of the hand position from y* after each step. Why is it initially large?
b) Try to do 100 smaller steps δq = α J♯ δy with α = .1 (each step starting with the outcome of the previous step). How does the task error evolve over time?
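A minimal sketch of this small-step loop is given below. It follows the course-lib style of the later exercises, but kinematicsPos, jacobian, and setJointAngles are hypothetical routine names here, and ~J stands for the transpose:

Simulator S("man.ors");
arr q, y, y_target, J, W;
uint n = S.getJointDimension();
S.getJointAngles(q);
W.setDiag(1., n);                      //W = identity metric in joint space
y_target = ARR(-0.2, -0.4, 1.1);       //some fixed target (placeholder values)
double alpha = .1, sigma = .01;
for(uint k=0; k<100; k++){
  S.kinematicsPos(y, "handR");         //assumed routine: current hand position y
  S.jacobian(J, "handR");              //assumed routine: task Jacobian J
  arr Jsharp = inverse(~J*J + sigma*sigma*W)*~J;  //J# = (J^T J + sigma^2 W)^-1 J^T
  q += alpha*(Jsharp*(y_target - y));  //small step: delta_q = alpha J# delta_y
  S.setJointAngles(q);                 //assumed routine: apply the new posture
  cout <<"task error: " <<length(y_target - y) <<endl;  //record the task error
}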
c) Generate a nice trajectory composed of T = 100 time steps. Interpolate the target linearly, ŷ ← y_0 + (t/T)(y* − y_0), in each time step.
d) Generate a trajectory that moves the right hand in a circle centered at (−0.2, −0.4, 1.1),
aligned with the xz-plane, with radius 0.2.
12.4 Exercise 4
12.4.1 Verify some things from the lecture
where the approximation ≈ means that we use the local linearization Φ(q) = Φ(q_0) + J(q − q_0). Verify this.
b) On slide 02:34 there is a term (I − J♯J) called the nullspace projection. Verify that for ρ → ∞ (and C = ρI) and any choice of a ∈ ℝ^n,

δq = (I − J♯J) a  ⇒  δy = 0

(assuming linearity of φ, i.e., Jδq = δy). Note: this means that for any choice of a, the motion (I − J♯J)a will not change the “endeffector position” y.
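For intuition, the claim in b) is easy to check numerically. A quick self-contained sketch (using Eigen, with a random 2×4 matrix standing in for J and J♯ = J^T(JJ^T)^{-1}, i.e., the ρ → ∞, W = I case; an illustration only):

#include <cstdlib>
#include <iostream>
#include <Eigen/Dense>

int main(){
  std::srand(0);
  Eigen::MatrixXd J = Eigen::MatrixXd::Random(2,4);                    //a random "Jacobian"
  Eigen::MatrixXd Jsharp = J.transpose()*(J*J.transpose()).inverse();  //right pseudoinverse
  Eigen::MatrixXd N = Eigen::MatrixXd::Identity(4,4) - Jsharp*J;       //nullspace projection
  Eigen::Vector4d a = Eigen::Vector4d::Random();                       //arbitrary a
  std::cout <<"|| J (I - J# J) a || = " <<(J*N*a).norm() <<std::endl;  //numerically ~0
  return 0;
}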
c) On slide 02:27 we derived the basic inverse kinematics law. Verify that (assuming linearity of φ) for C → ∞ the desired task is fulfilled exactly, i.e., φ(q*) = y*. By writing C → ∞ we mean that C is a matrix of the form C = ρC_0, ρ ∈ ℝ, and we take the limit ρ → ∞.
Consider again the last week’s exercise where the robot moved his hand in a circle
(Exercise 3.2d).
a) We’ve seen that the solution does track the circle nicely, but the trajectory “gets lost” in joint space, leading to very strange postures. We can fix this by adding more tasks, esp. a task that permanently tries to (moderately) minimize the distance to the initial posture q_home:
void multiTask(){
  Simulator S("man.ors");
  arr q, q_home, y_target, y, J, W, Phi, PhiJ;
  uint n = S.getJointDimension();
  S.getJointAngles(q);
  W.setDiag(1., n); //we define W as identity matrix
  q_home = q;       //we store the initial posture as q_home
  for(uint i=0; i<10000; i++){
    Phi.clear();
    PhiJ.clear();
    //stack all task errors into Phi and their Jacobians into PhiJ here
    //(e.g., the right-hand position task and a q_home posture task)
    //add another task for the left hand here, if you like
    //... then compute the IK step from the stacked Phi, PhiJ and update q
  }
}
12.5 Exercise 5
In the lecture we discussed PD force control on a 1D point mass, which leads to oscillatory behavior for high K_p and damped behavior for high K_d. Slide 03:15 replaces the parameters K_p, K_d by two other, more intuitive parameters, λ and ξ: λ roughly denotes the time (or time steps) until the goal is reached, and ξ whether it is reached aggressively (ξ > 1, which overshoots a bit) or by exponential decay (ξ ≤ 1). Use this to solve the following exercise.
a) Implement the system equation for a 1D point mass with mass m = 3.456. That is, implement the Euler integration (see below) of the system dynamics, computing the next state as

x_{t+1} = x_t + τ f(x, u)
We have the same point mass as above. But now the goal position q ∗ changes over
time: we have a reference trajectory
q ∗ (t) = cos(t)
How can one use a PD controller to let the point mass robustly and precisely follow
this reference trajectory?
Simulate the problem as above, but at every full second apply a randomized impulse (= an instantaneous change in velocity) on the point mass. That is, simply set q̇ ← q̇ + ξ for ξ ∼ N(0, .1). (In code: double xi = 0.1 * rnd.gauss();) Generate oscillatory, overdamped and critically damped recovery behaviors.
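For reference, a minimal self-contained sketch of parts a) and b) combined; note that the mapping from (λ, ξ) to (K_p, K_d) below uses the standard second-order relations K_p = m/λ², K_d = 2mξ/λ, which is an assumption here — substitute the exact relation from slide 03:15:

#include <cmath>
#include <cstdio>

int main(){
  double m = 3.456, tau = 0.01;        //mass and Euler step size
  double lambda = 1.0, xi = 1.0;       //desired time scale and damping behavior
  double Kp = m/(lambda*lambda);       //assumed relation
  double Kd = 2.*m*xi/lambda;          //assumed relation
  double q = 1.0, qdot = 0.0;          //start displaced from the goal q* = 0
  for(int t=0; t<1000; t++){
    double u = Kp*(0.0 - q) + Kd*(0.0 - qdot);  //PD force toward q* = 0, qdot* = 0
    double qdd = u/m;
    q    += tau*qdot;                  //Euler integration x_{t+1} = x_t + tau f(x,u)
    qdot += tau*qdd;
    std::printf("%f %f\n", t*tau, q);
  }
  return 0;
}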
12.6 Exercise 6
12.6.1 Fun with Euler-Lagrange
You will find an arm with three joints that is swinging freely under gravity (ignor-
ing collisions).
a) Apply direct PD control (without using M and F ) to each joint separately and try
to find parameters Kp and Kd (potentially different for each joint) to hold the arm
steady, i.e., q ∗ = 0 and q̇ ∗ = 0. Bonus: If you are successful, try the same for the arm
in pegArm2.ors.
Play with setDynamicsSimulationNoise and check stability.
b) As above, try to hold the arm steady at q ∗ = 0 and q̇ ∗ = 0. But now use the
knowledge of M and F in each time step. For this, decide on a desired wavelength λ
and damping behavior ξ and compute the respective Kp and Kd (assuming m = 1),
the same for each joint. Use the PD equation to determine desired accelerations q̈ ∗
(slide 03:31) and use inverse dynamics to determine the necessary u.
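A runnable toy version of this part-b) scheme on a single pendulum (unit mass and length), where M and F are analytic, might look as follows; the gains from (λ, ξ) again use the standard second-order relations as an assumption:

#include <cmath>
#include <cstdio>

int main(){
  double m = 1., l = 1., g = 9.8, tau = 0.001;
  double lambda = 0.5, xi = 1.0;                      //desired wavelength and damping
  double Kp = 1./(lambda*lambda), Kd = 2.*xi/lambda;  //per unit inertia (assumed relation)
  double q = 1.0, qdot = 0.0;                         //start displaced from q* = 0
  for(int t=0; t<5000; t++){
    double M = m*l*l, F = m*g*l*sin(q);               //dynamics terms: M qdd + F = u
    double qdd_des = Kp*(0.0 - q) + Kd*(0.0 - qdot);  //PD in acceleration space
    double u = M*qdd_des + F;                         //inverse dynamics control
    double qdd = (u - F)/M;                           //simulate the same model
    q += tau*qdot;  qdot += tau*qdd;                  //Euler integration
    if(t % 500 == 0) std::printf("t=%.2f q=%f\n", t*tau, q);
  }
  return 0;
}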
12.7 Exercise 7
Consider the 1D point mass with mass m = 1 and state x = (q, q̇). The 2D space of (q, q̇), combining position and velocity, is also called phase space. In RRTs (in line 4 on slide 04:39) we need to find the nearest configuration q_near to a q_target. But what does “near” mean in phase space? This exercise explores this question.
Consider a current state x_0 = (0, 1) (at position 0 with velocity 1). Pick any random phase state x_target ∈ ℝ². How would you connect x_0 with x_target in a way that fulfils the differential constraints of the point mass dynamics? Given this trajectory connecting x_0 with x_target, how would you quantify/measure the distance? (If you defined the connecting trajectory appropriately, you should be able to give an analytic expression for this distance.) Given a set (tree) of states x_{1:n} from which you pick the closest to x_target, how would you “grow” the tree from this closest point towards x_target?
12.8 Exercise 8
12.8.1 Bayes Basics
a) Box 1 contains 8 apples and 4 oranges. Box 2 contains 10 apples and 2 oranges.
Boxes are chosen with equal probability. What is the probability of choosing an
apple? If an apple is chosen, what is the probability that it came from box 1?
b) The Monty Hall Problem: I have three boxes. In one I put a prize, and two are
empty. I then mix up the boxes. You want to pick the box with the prize in it. You
choose one box. I then open another one of the two remaining boxes and show that
it is empty. I then give you the chance to change your choice of boxes—should you
do so?
c) Given a joint probability P(X, Y) over 2 binary random variables as the table

      Y=0   Y=1
X=0   .06   .24
X=1   .14   .56
What are P (X) and P (Y )? Are X and Y conditionally independent?
12.8.2 Gaussians
N(x | a, A) N(x | b, B)
12.9 Exercise 9
where y(x_t^i) is the (ideal) observation the car would have if it were in the particle position x_t^i. Since Σ_i w_i = 1 must hold, normalize the weights after this computation.
c) Test the full particle filter including the likelihood weights (step 4) and resam-
pling (step 2). Test using a larger (10σobservation ) and smaller (σobservation /10) variance
in the computation of the likelihood.
We consider the same car example as above, but track the car using a Kalman filter.
a) To apply a Kalman filter (slide 07:27) we need Gaussian models for P(x_t | x_{t-1}, u_{t-1}) as well as P(y_t | x_t). We assume that the dynamics model is given as a local Gaussian of the form

x_{t+1}(x, u) = x + τ ( u_1 cos x_3,  u_1 sin x_3,  (u_1/L) tan u_2 )^T
What is the matrix C(xt ) (the Jacobian of the landmark positions w.r.t. the car state)
in our example?
c) Start with the code in RoboticsCourse/06-kalmanSLAM.
Write a Kalman filter to track the car. You can use the routine getObservationJacobian to access C(x_t) = ∂y/∂x. Note that c(x_t) = ŷ_t − C(x_t) x_t, where ŷ_t is the mean observation in state x_t (there is another routine for this).
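For orientation, a hedged sketch of one extended Kalman filter step in C++/Eigen; the caller computes the predicted mean f(x, u), its Jacobian A = ∂f/∂x, the mean observation ŷ at the predicted state, and C = getObservationJacobian — the glue to the course lib is omitted and the names below are placeholders:

#include <Eigen/Dense>

//one EKF step: state x in R^3 with covariance Sigma; Q, R are noise covariances
void ekfStep(Eigen::Vector3d& x, Eigen::Matrix3d& Sigma,
             const Eigen::Vector3d& fx,    //predicted mean f(x,u)
             const Eigen::Matrix3d& A,     //dynamics Jacobian df/dx
             const Eigen::Matrix3d& Q,
             const Eigen::VectorXd& y,     //actual observation
             const Eigen::VectorXd& yhat,  //mean observation at the predicted state
             const Eigen::MatrixXd& C,     //observation Jacobian C(x)
             const Eigen::MatrixXd& R){
  //predict
  Eigen::Vector3d xbar = fx;
  Eigen::Matrix3d Sbar = A*Sigma*A.transpose() + Q;
  //update: Kalman gain, innovation, covariance reduction
  Eigen::MatrixXd K = Sbar*C.transpose()*(C*Sbar*C.transpose() + R).inverse();
  x = xbar + K*(y - yhat);
  Sigma = (Eigen::Matrix3d::Identity() - K*C)*Sbar;
}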
12.10 Exercise 10
12.10.1 The value function in Markov Decision Processes
where P(a_0 | s_0; π) describes the agent’s policy. We assume a deterministic agent and write a_t = π(s_t). The value of a state s is defined as the expected discounted sum of future rewards,
given that the agent starts in state s and from there executes the policy π.
a) Prove
We distinguish 7 states s_1, .., s_7 in the maze. The first 3 states are the T-junctions; the last 4 states receive rewards (4, 0, 2, 3). At each T-junction we have two possible actions: left, right. Everything is deterministic. Assume a discounting of γ = 0.5.
Compute (by hand) the value function V π over all states when π is the random policy
(50/50 left/right at each junction).
c) Bellman’s principle of optimality says that the optimal policy π* has a value function V^{π*}(s) = V*(s), with

V*(s) = max_a [ E{r_0 | s, a} + γ Σ_{s'} P(s' | s, a) V*(s') ]   (10)
Compute (by hand) the optimal value function V ∗ over all states for the example
above.
d) Now consider continuous state s and action a. Let the policy be stochastic and linear in features φ(s) ∈ ℝ^k, that is,

π(a | s) = N(a | φ(s)^T β, φ(s)^T Σ φ(s))   (11)

The covariance matrix φ(s)^T Σ φ(s) expresses that each action a_t = φ(s_t)^T (β + ε_t) was generated by adding a noise term ε_t ∼ N(0, Σ) to the parameter β. We always start in the same state ŝ, and the value V^π(ŝ) is

V^π(ŝ) = E{ Σ_{t=0}^H r_t | s_0 = ŝ }   (12)
β* = β_old + [ E_{ξ|β}{ Σ_{t=0}^H W(s_t) Q^{π(β)}(s_t, a_t, t) } ]^{-1} E_{ξ|β}{ Σ_{t=0}^H W(s_t) ε_t Q^{π(β)}(s_t, a_t, t) },

with weighting matrix W(s), which follows from ∂V^{π(β*)}/∂β = 0. (In the scalar case, W(s) = 1.) As a first step, derive

∂/∂β log π(a | s)
12.11 Exercise 11
12.11.1 Local linearization and Algebraic Riccati equation
Slide 08:02 describes the cart pole, which is similar to the Segway-type system of
Exercise 6, but a little simpler. We’ll solve the cart pole in this exercise.
The state of the cart-pole is given by x = (p, ṗ, θ, θ̇), with p ∈ ℝ the position of the cart, θ ∈ ℝ the pendulum’s angular deviation from the upright position, and ṗ, θ̇ their respective temporal derivatives. The only control signal u ∈ ℝ is the force applied on the cart. The analytic model of the cart pole is

θ̈ = [ g sin(θ) + cos(θ) (−c_1 u − c_2 θ̇² sin(θ)) ] / [ (4/3) l − c_2 cos²(θ) ]   (13)

p̈ = c_1 u + c_2 [ θ̇² sin(θ) − θ̈ cos(θ) ]   (14)

with g = 9.8 m/s² the gravitational acceleration, l = 1 m the pendulum length, and constants c_1 = (M_p + M_c)^{-1} and c_2 = l M_p (M_p + M_c)^{-1}, where M_p = M_c = 1 kg are the pendulum and cart masses, respectively.
a) Derive the local linearization of these dynamics around x* = (0, 0, 0, 0). The eventual dynamics should be of the form

ẋ = Ax + Bu
Note that

A = [ 0        1        0        0
      ∂p̈/∂p    ∂p̈/∂ṗ    ∂p̈/∂θ    ∂p̈/∂θ̇
      0        0        0        1
      ∂θ̈/∂p    ∂θ̈/∂ṗ    ∂θ̈/∂θ    ∂θ̈/∂θ̇ ],   B = [ 0,  ∂p̈/∂u,  0,  ∂θ̈/∂u ]^T

which evaluates to

A = [ 0   1   0                        0
      0   0   −c_2 g/((4/3)l − c_2)    0
      0   0   0                        1
      0   0   g/((4/3)l − c_2)         0 ],   B = [ 0,  c_1 + c_1 c_2/((4/3)l − c_2),  0,  −c_1/((4/3)l − c_2) ]^T
Choose a step size of 1/1000 and iterate until convergence. Output the gains K = −R^{-1}B^T P. (Why should this iteration converge to the solution of the ARE?)
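A sketch of this in C++/Eigen, with A, B from part a) for M_p = M_c = 1 and l = 1. The update used here, P ← P + ε(A^T P + PA − PBR^{-1}B^T P + Q), is the natural fixed-point iteration for the continuous-time ARE and an assumption about part b)'s exact form; Q = I and R = 1 are placeholder cost weights, not necessarily those of the exercise:

#include <iostream>
#include <Eigen/Dense>

int main(){
  using namespace Eigen;
  double g = 9.8, l = 1., c1 = .5, c2 = .5;    //M_p = M_c = 1 kg
  double d = 4./3.*l - c2;
  Matrix4d A = Matrix4d::Zero();
  A(0,1) = 1.;  A(2,3) = 1.;                   //d(p)/dt = pdot, d(theta)/dt = thetadot
  A(1,2) = -c2*g/d;  A(3,2) = g/d;             //linearized p-ddot and theta-ddot rows
  Vector4d B(0., c1 + c1*c2/d, 0., -c1/d);
  Matrix4d Q = Matrix4d::Identity(), P = Matrix4d::Zero();
  double R = 1., eps = 1e-3;                   //step size 1/1000
  for(int k=0; k<100000; k++){
    Matrix4d dP = A.transpose()*P + P*A - P*B*(1./R)*B.transpose()*P + Q;
    P += eps*dP;
    if(dP.norm() < 1e-10) break;               //converged: dP = 0 is exactly the ARE
  }
  RowVector4d K = -(1./R)*B.transpose()*P;     //LQR gains K = -R^-1 B^T P
  std::cout <<"K = " <<K <<std::endl;
  return 0;
}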
c) Solve the same Algebraic Riccati equation by calling the are routine of the Octave control package (or a similar method in Matlab). For Octave, install the Ubuntu packages octave3.2, octave-control, and qtoctave; perhaps use pkg load control and help are in Octave to ensure everything is installed; then use P=are(A,B*inverse(R)*B',Q) to solve the ARE. Output K = −R^{-1}B^T P and compare to b).
(I found the solution K = (1.0000, 3.8271, 48.3070, 17.615).)
d) Implement the optimal Linear Quadratic Regulator u* = −R^{-1}B^T P x on the cart pole simulator. In the Robotics code base, under the path 07-cartPole, there is some code – but the main.problem.cpp there was outdated; please use the one from the website or from the email.
Simulate the optimal LQR and test it for various noise levels.
12.12 Exercise 13
On Wed. 29th we meet as usual for the exercise.
On Tue. 28th, 14:00, in my office, interested students are invited to try whatever they like on the racer hardware, play around, etc.
We consider again the simulation of the racer as given in 09-racer in your code
repo.
We seek a policy

π : φ(y) ↦ u = φ(y)^T w

that maps some features of the (history of the) direct sensor signals y to the control.
Use black-box optimization to find parameters w that robustly balance the racer.
Some notes:
However, I noticed that a balancing policy can also be found for the direct
sensor signals only, that is:
φ(y) = (1, y) ∈ ℝ⁵
• Episodes and duration costs: To compute the cost for a given w you need to simulate the racer for a couple of time steps. For the optimization it is really bad if an episode is so long that it includes a complete failure and a wrapping-around of the inverted pendulum. Therefore, abort an episode if fabs(R.q(1)) becomes too large and penalize an abortion with an extra cost, e.g., proportional to T − t. Try different episode horizons T up to 500; maybe increase this horizon stage-wise.
• Optimizer: You are free to use any optimizer you like. On the webpage you find a reference implementation of CMA by Niko Hansen (with a wrapper using our arr), which you may use. In that case, add cmaes.o and search_CMA.o to the Makefile. The typical loop for optimization is
SearchCMA cma;
cma.init(5, -1, -1, -0.1, 0.1);
arr samples, values;
uint T=500;
for(uint t=0; t<1000; t++){
  cma.step(samples, values);
  for(uint i=0; i<samples.d0; i++)
    values(i) = evaluateControlParameters(samples[i], T); //argument list reconstructed; the original line was cut off
  uint i=values.minIndex();
  cout <<t <<' ' <<values(i) <<' ' <<samples[i] <<endl;
}
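For completeness, a hedged sketch of what the episode evaluation could look like; Racer, getObservation, and stepDynamics are placeholder names whose exact counterparts in 09-racer may differ, and the per-step cost is only one plausible choice:

double evaluateControlParameters(const arr& w, uint T){
  Racer R;                            //fresh episode (placeholder class name)
  double cost = 0.;
  for(uint t=0; t<T; t++){
    arr y;
    R.getObservation(y);              //assumed: the direct sensor signals y
    arr phi = cat(ARR(1.), y);        //features phi(y) = (1, y)
    double u = scalarProduct(phi, w); //linear policy u = phi(y)^T w
    R.stepDynamics(u);                //assumed: simulate one time step
    cost += sumOfSqr(R.q);            //assumed cost: penalize deviation from upright
    if(fabs(R.q(1)) > 1.) return cost + (T - t);  //abort + extra penalty, as suggested above
  }
  return cost;
}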
13.1 Kinematics
• 3D geometry
– Definition of an object pose, frame, transformations
– Homogeneous transformations (4×4 matrix)
– Composition of transformations, notation x_W = T_{W→A} T_{A→B} T_{B→C} x_C
• Fwd kinematics & Jacobian
– Fwd kinematics as composition of transformations
– Transformations of a rotational joint
– Kinematic maps φ^{pos} : q ↦ y and φ^{vec}
– Definition of a Jacobian
– Derivation of the position and vector Jacobians J^{pos}, J^{vec}
• Inverse kinematics (IK)
– Optimality criterion for IK
– Using the local linearization to find the optimum
– Pseudo code of following a task trajectory in small steps using Inverse Kine-
matics
– Definition & example for a singularity
• Motion profiles & Interpolation
– Motion profiles (esp. sine profile)
– Joint space vs. task space interpolation of a trajectory [e.g. using a motion
profile in one or the other space]
• Multiple Tasks
– How to incorporate multiple tasks
– What are interesting task maps; know at least about pos, vec, align, and
limits
• Further
– Be able to explain the consequences of the local linearization in IK (big steps
→ errors)
13.2 Dynamics
• Euler-Lagrange equation
– Definition
– Roughly: structure of the Euler-Lagrange equation for systems; understand at least T = ½ q̇^T M q̇
– Be able to apply on minimalistic system (e.g. pendulum, point mass)
• Basics
– Path finding vs. trajectory optimization vs. feedback control
– Roughly: BUG algorithms
– Potential functions, and that they’re nothing but IK with special task vari-
ables
– Dijkstra Algorithm
• Non-holonomic Systems
– Definition of non-holonomicity. Be able to give example
– Path finding: control-based sampling
– RRT extension for control-based exploration
– Roughly: Intricacies with metrics for non-holonomic systems
• Probability Basics
– Definitions of random variable, probability distribution, joint, marginal,
conditional distribution, independence
– Bayes’ Theorem
– Continuous distributions, Gaussian, particles
• State Estimation
– Formalization of the state estimation problem
– The Bayes Filter as the general analytic solution
– Gaussians and particles to approximate the Bayes filter and make it compu-
tationally feasible:
– Details of a Particle Filter
– Kalman filter (esp. assumptions made, not eq. or derivation)
– Extended KF (assumptions made)
– Odometry (dead reckoning) as “Bayes filter without observations”
• SLAM
– Definition of the SLAM problem
– In what sense SLAM is a “chicken or egg problem”
– Joint filtering over x and m: Extended Kalman SLAM
– Particle-based SLAM (map belief for each particle)
– Roughly: graph-based SLAM & loop closing
• Controllability
– Definition and understanding/interpretation of the controllability matrix C
– Definition of controllability
– Be able to apply on simple examples
• Stability
– Definitions of stability
– Eigenvalue analysis for linear systems
– Optimize controllers for negativity of eigenvalues
– Definition of a Lyapunov function
– Lyapunov’s theorem: ∃ Lyapunov fct. → stability
– Energy and value function as candidates for a Lyapunov fct.
Index
1D point mass (3:6), Dynamic vs. non-dynamic robots (3:1),