0% found this document useful (0 votes)
2 views

AMathCookbookForRobotManipulation

The document is a comprehensive guide on mathematical techniques for robot manipulation, covering topics such as quadratic cost minimization, kinematics, and optimization methods like iLQR. It includes detailed sections on algorithms, trajectory encoding, and control strategies, providing a structured approach to robotic motion planning and control. The content is aimed at researchers and practitioners in robotics, offering both theoretical foundations and practical applications.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

AMathCookbookForRobotManipulation

The document is a comprehensive guide on mathematical techniques for robot manipulation, covering topics such as quadratic cost minimization, kinematics, and optimization methods like iLQR. It includes detailed sections on algorithms, trajectory encoding, and control strategies, providing a structured approach to robotic motion planning and control. The content is aimed at researchers and practitioners in robotics, offering both theoretical foundations and practical applications.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 59

A Math Cookbook for Robot Manipulation

Sylvain Calinon, Idiap Research Institute

Contents
1 Introduction 3

2 Quadratic costs minimization as a product of Gaussians (PoG) 3

3 Newton’s method for minimization 4


3.1 Gauss–Newton algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2 Least squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.3 Least squares with constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

4 Forward kinematics (FK) for a planar robot manipulator 8

5 Inverse kinematics (IK) for a planar robot manipulator 9


5.1 Numerical estimation of the Jacobian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
5.2 Inverse kinematics (IK) with task prioritization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

6 Encoding with basis functions 10


6.1 Univariate trajectories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
6.2 Multidimensional outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
6.3 Multidimensional inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
6.4 Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
6.5 Concatenated basis functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
6.6 Batch computation of basis functions coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
6.7 Recursive computation of basis functions coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
6.8 Computation of basis functions coefficients using eikonal cost . . . . . . . . . . . . . . . . . . . . . . . 16

7 Linear quadratic tracking (LQT) 17


7.1 LQT with initial state optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
7.2 LQT with smoothness cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
7.3 LQT with control primitives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
7.4 LQR with a recursive formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
7.5 LQT with a recursive formulation and an augmented state space . . . . . . . . . . . . . . . . . . . . . 20
7.6 Least squares formulation of recursive LQR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
7.7 Dynamical movement primitives (DMP) reformulated as LQT with control primitives . . . . . . . . . 22

8 iLQR optimization 24
8.1 Batch formulation of iLQR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
8.2 Recursive formulation of iLQR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
8.3 Least squares formulation of recursive iLQR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
8.4 Updates by considering step sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
8.5 iLQR with quadratic cost on f (xt ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
8.5.1 Robot manipulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
8.5.2 Bounded joint space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
8.5.3 Bounded task space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
8.5.4 Reaching task with robot manipulator and prismatic object boundaries . . . . . . . . . . . . . 30
8.5.5 Initial state optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
8.5.6 Center of mass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
8.5.7 Bimanual robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
8.5.8 Obstacle avoidance with ellipsoid shapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
8.5.9 Distance to a target . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

1
8.5.10 Manipulability tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
8.5.11 Decoupling of control commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
8.5.12 Curvature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
8.6 iLQR with control primitives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
8.7 iLQR for spacetime optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
8.8 iLQR with offdiagonal elements in the precision matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
8.9 Car steering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
8.10 Bicopter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

9 Ergodic control 37
9.1 Spectral-based ergodic control (SMC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
9.1.1 Unidimensional SMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
9.1.2 Multidimensional SMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
9.2 Diffusion-based ergodic control (HEDAC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

10 Torque-controlled robots 46
10.1 Impedance control in joint space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
10.2 Impedance control in task space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
10.3 Forward dynamics for a planar robot manipulator and associated control strategy . . . . . . . . . . . . 47

11 Orientation representations and Riemannian manifolds 47


11.1 Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
11.2 Sphere manifolds S d in robotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
11.3 Gaussian distributions on Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
11.4 Other homogeneous manifolds in robotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
11.5 Ellipsoids and SPD matrices as S++d
manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
11.6 Non-homogeneous manifolds in robotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

A System dynamics at trajectory level 54

B Derivation of motion equation for a planar robot 54

C Linear systems used in the bimanual tennis serve example 55

D Equivalence between LQT and LQR with augmented state space 56

E Forward dynamics (FD) for a planar robot manipulator 56

2
1 Introduction
This cookbook presents several learning and optimal control recipes for robotics (essentially for robot manipulators),
complemented by simple toy problems that can be easily coded. It accompanies Robotics Codes From Scratch
(RCFS), a website containing interactive sandbox examples and exercises, together with a set of standalone source
code examples gathered in a git repository, which can be accessed at:
https://rcfs.ch

Each section in this document lists the corresponding source codes in Python and Matlab (ensuring full compat-
ibility with GNU Octave), as well as in C++ and Julia for some of the principal examples, which can be accessed
at:
https://gitlab.idiap.ch/rli/robotics-codes-from-scratch

2 Quadratic costs minimization as a product of Gaussians (PoG)


The solution of a quadratic cost function can be viewed probabilistically as corresponding to a Gaussian distribution.
Indeed, given a precision matrix W , the quadratic cost

c(x) = (x − µ)⊤ W (x − µ), (1)


= ∥x − µ∥2W , (2)

has an optimal solution x∗ = µ. This solution does not contain much information about the cost function itself.
Alternatively, we can view x as a random variable with a Gaussian distribution, i.e., p(x) = N (µ, Σ), where µ and
Σ = W −1 are the mean vector and covariance matrix of the Gaussian, respectively. The negative log-likelihood of this
Gaussian distribution is equivalent to (2) up to a constant factor. According to p(x), x has the highest probability
at x̄, and Σ gives the directional information on how this probability changes as we move away from µ. The point
having the lowest cost in (2) is therefore associated with the point having the highest probability.
Similarly, the solution of a cost function composed of several quadratic terms

X
K

µ̂ = arg min (x − µk ) Wk (x − µk ) (3)
x
k=1

QK
can be seen as a product of Gaussians k=1 N (µk , Wk−1 ), with centers µk and covariance matrices Wk−1 . The
Gaussian N (µ̂, Ŵ −1 ) resulting from this product has parameters
!−1 !
X
K X
K X
K
µ̂ = Wk Wk µ k , Ŵ = Wk .
k=1 k=1 k=1

µ̂ and Ŵ are the same as the solution of (3) and its Hessian, respectively. Viewing the quadratic cost probabilis-
tically can capture more information about the cost function in the form of the covariance matrix Ŵ −1 .
In case of Gaussians close to singularity, to be numerically more robust when computing the product of two
Gaussians N (µ1 , Σ1 ) and N (µ2 , Σ2 ), the Gaussian N (µ̂, Σ̂) resulting from the product can be alternatively computed
with
−1 −1 −1
µ̂ = Σ2 (Σ1 + Σ2 ) µ1 + Σ1 (Σ1 + Σ2 ) µ2 , Σ̂ = Σ1 (Σ1 + Σ2 ) Σ2 .
Figure 1 shows an illustration for 2 Gaussians in a 2-dimensional space. It also shows that when one of the Gaussians
is singular, the product corresponds to a projection operation, that we for example find in nullspace projections to
solve prioritized tasks in robotics.

3
Figure 1: Quadratic costs minimization as a product of Gaussians (PoG).

3 Newton’s method for minimization


We would like the find the value of a decision variable x that would give us
a cost c(x) that is a small as possible, see Figure 2. Imagine that we start
from an initial guess x1 and that can observe the behavior of this cost function
within a very small region around our initial guess. Now let’s assume that we
can make several consecutive guesses that will each time provide us with simi-
lar local information about the behavior of the cost around the points that we
guessed. From this information, what point would you select as second guess
(see question marks in the figure), based on the information that you obtained
from the first guess? There were two relevant information in the small portion
of curve that we can observe in the figure to make a smart choice. First, the
trend of the curve indicates that the cost seems to decrease if we move on the
left side, and increase if we move on the right side. Namely, the slope c′ (x1 )
of the function c(x) at point x1 is positive. Second, we can observe that the
portion of the curve has some curvature that can be also be informative about
the way the trend of the curve c(x) will change by moving to the left or to the Figure 2: Problem formulation.
right. Namely, how much the slope c′ (x1 ) will change, corresponding to an ac-
celeration c′′ (x1 ) around our first guess x1 . This is informative to estimate how
much we should move to the left of the first guess to wisely select a second guess.

Now that we have this intuition, we can move to a


more formal problem formulation. Newton’s method at-
tempts to solve minx c(x) or maxx c(x) from an initial
guess x1 by using a sequence of second-order Taylor
approximations of c(x) around the iterates, see Fig. 3.
The second-order Taylor expansion around the point
xk can be expressed as
1
c(xk +∆xk ) ≈ c(xk ) + c′ (xk ) ∆xk + c′′ (xk ) ∆x2k , (4)
2
where c′ (xk ) and c′′ (xk ) are the first and second deriva-
tives at point xk .
We are interested in solving minimization problems
with this approximation. If the second derivative c′′ (xk )
is positive, the quadratic approximation is a convex func-
tion of ∆xk , and its minimum can be found by setting
Figure 3: Newton’s method for minimization, starting from an
the derivative to zero. initial estimate x1 and converging to a local minimum (red point)
after 5 iterations.

4
To find the local optima of a function, we can localize
the points whose derivatives are zero, see Figure 4 for an
illustration.
Thus, by differentiating (4) w.r.t. ∆xk and equating to zero, we obtain
c′ (xk ) + c′′ (xk ) ∆xk = 0,
meaning that the minimum is given by
c′ (xk )
∆x̂k = −
c′′ (xk )
which corresponds to the offset to apply to xk to minimize the second-order
Figure 4: Finding local optima by
polynomial approximation of the cost at this point. localizing the points whose derivatives
It is important that c′′ (xk ) is positive if we want to find local minima, see are zero (horizontal slopes).
Figure 5 for an illustration.
By starting from an initial estimate x1 and by recursively refining the current
estimate by computing the offset that would minimize the polynomial approx-
imation of the cost at the current estimate, we obtain at each iteration k the
recursion
c′ (xk )
xk+1 = xk − ′′ . (5)
c (xk )
The geometric interpretation of Newton’s method is that at each iteration,
it amounts to the fitting of a paraboloid to the surface of c(x) at xk , having the
same slopes and curvature as the surface at that point, and then proceeding to
the maximum or minimum of that paraboloid. Note that if c(x) is a quadratic
function, then the exact extremum is found in one step, which corresponds to
the resolution of a least-squares problem.

Multidimensional case
For functions that depend on multiple variables stored as multidimensional vec- Figure 5: Newton update that would
tors x, the cost function c(x) can similarly be approximated by a second-order be achieved when the second derivative
is negative.
Taylor expansion around the point xk with
∂c 1 ∂2c
c(xk +∆xk ) ≈ c(xk ) + ∆x⊤
k + ∆x⊤
k ∆xk , (6)
∂x xk 2 ∂x2 xk

which can also be rewritten in augmented vector form as


 ⊤   
1 1 0 gx⊤ 1
c(xk +∆xk ) ≈ c(xk ) + ,
2 ∆x k g x Hxx ∆xk
with gradient vector
∂c
g(xk ) = , (7)
∂x xk
and Hessian matrix
∂2c
H(xk ) = . (8)
∂x2 xk
We are interested in solving minimization problems
with this approximation. If the Hessian matrix H(xk ) is
positive definite, the quadratic approximation is a con-
vex function of ∆xk , and its minimum can be found by
setting the derivatives to zero, see Figure 6.
By differentiating (6) w.r.t. ∆xk and equation to
zero, we obtain that
−1
∆x̂k = −H(xk ) g(xk ),
is the offset to apply to xk to minimize the second-order
polynomial approximation of the cost at this point.
By starting from an initial estimate x1 and by re-
cursively refining the current estimate by computing the
offset that would minimize the polynomial approxima-
tion of the cost at the current estimate, we obtain at
each iteration k the recursion
Figure 6: Newton’s method for minimization with 2D decision
−1
xk+1 = xk − H(xk ) g(xk ). (9) variables.

5
3.1 Gauss–Newton algorithm
The Gauss–Newton algorithm is a special case of Newton’s method in which the cost is quadratic (sum of squared
PR
function values), with c(x) = i=1 ri2 (x) = r ⊤ r = ∥r∥2 , where r are residual vectors. We neglect in this case the
second-order derivative terms of the Hessian. The gradient and Hessian can in this case be computed with

g = 2Jr⊤ r, H ≈ 2Jr⊤ Jr ,

where Jr ∈ RR×D is the Jacobian matrix of r ∈ RR . This definition of the Hessian matrix makes it positive definite,
which is useful to solve minimization problems as for well conditioned Jacobian matrices, we do not need to verify the
positive definiteness of the Hessian matrix at each iteration.
The update rule in (9) then becomes
−1
xk+1 = xk − Jr⊤ (xk )Jr (xk ) Jr⊤ (xk ) r(xk )
= xk − Jr† (xk ) r(xk ), (10)

where Jr† denotes the pseudoinverse of Jr .


The Gauss–Newton algorithm is the workhorse of many robotics problems, including inverse kinematics and optimal
control, as we will see in the next sections.

3.2 Least squares


When the cost c(x) is a quadratic function w.r.t. x, the optimization problem can be solved directly, without requiring
iterative steps. Indeed, for any matrix A and vector b, we can see that if

c(x) = ∥Ax − b∥2 = (Ax − b)⊤ (Ax − b), (11)

deriving c(x) w.r.t. x and equating to zero yields

Ax − b = 0 ⇐⇒ x = (A⊤ A)−1 A⊤ b = A† b, (12)

which corresponds to the standard analytic least squares estimate. We will see later in the inverse kinematics section
that for the complete solution can also include a nullspace structure.
In contrast to the Gauss–Newton algorithm presented in Section 3.1, the optimization problem in (11), described
by a quadratic cost on the decision variable x, admits the solution (12) that can be computed directly without
relying on an iterative procedure. We can observe that if we follow the Gauss–Newton procedure, the residual vector
corresponding to the cost (11) is r = Ax − b and its Jacobian matrix is Jr = A. By starting from an initial estimate
x0 , the Gauss–Newton update in (10) then takes the form

xk+1 = xk − A† (Axk − b)
= xk − (A⊤ A)−1 A⊤ (Axk − b)
⊤  ⊤ x + A † b
(A
= xk −  A)−1 
(AA) k

= A† b,

which converges in a single iteration, independently of the initial guess x0 . Indeed, a cost that takes a quadratic form
with respect to the decision variable can be soled in batch form (least squares solution), which will be a useful property
that we will exploit later in the context of linear quadratic controllers.
A ridge regression problem can similarly be defined as

x̂ = arg min ∥Ax − b∥2 + α∥x∥2


x
= (A⊤ A + αI)−1 A⊤ b, (13)

which is also called penalized least squares, robust regression, damped least squares or Tikhonov regularization. In
(13), I denotes an identity matrix (diagonal matrix with 1 as elements in the diagonal). α is typically a small scalar
value acting as a regularization term when inverting the matrix A⊤ A.
The cost can also be weighted by a matrix W , providing the weighted least squares solution

x̂ = arg min ∥Ax − b∥W


2
x
= arg min (Ax − b)⊤ W (Ax − b)
x
= (A W A)−1 A⊤ W b.

(14)

6
By combining (13) and (14), a weighted ridge regression problem can be defined as

x̂ = arg min ∥Ax − b∥2W + α∥x∥2


x
= (A⊤ W A + αI)−1 A⊤ W b. (15)

If the matrices and vectors in (15) have the structure


     
W1 0 A1 b
W = , A= , b= 1 ,
0 W2 A2 b2

we can consider an optimization problem using only the first part of the matrices, yielding the estimate

x̂ = arg min ∥A1 x − b1 ∥2W1 + α∥x∥2


x
−1 ⊤
= (A⊤
1 W1 A1 + αI) A 1 W1 b 1 . (16)

which is the same as the result of computing (15) with W2 = 0.

3.3 Least squares with constraints


A constrained minimization problem of the form

min (Ax − b)⊤ (Ax − b), s.t. Cx = h, (17)


x

can also be solved analytically by considering a Lagrange multiplier variable λ allowing us to rewrite the objective as

min (Ax − b)⊤ (Ax − b) + λ⊤ (Cx − h).


x,λ

Differentiating with respect to x and λ and equating to zero yields

A⊤ (Ax − b) + C ⊤ λ = 0, Cx − h = 0,

which can rewritten in matrix form as     ⊤ 


A⊤A C ⊤ x A b
= .
C 0 λ h
With this augmented state representation, we can then see that
   ⊤ −1  ⊤ 
x A A C⊤ A b
=
λ C 0 h

minimizes the constrained cost. The first part of this augmented state then gives us the solution of (17).

7
4 Forward kinematics (FK) for a planar robot manipulator
IK_manipulator.*

The forward kinematics (FK) function of a planar robot manipu-


lator is defined as
 ⊤ 
ℓ cos(Lx)
f =  ℓ⊤ sin(Lx) 
1⊤ x
 
ℓ1 cos(x1 )+ℓ2 cos(x1 +x2 )+ℓ3 cos(x1 +x2 +x3 )+. . .
=  ℓ1 sin(x1 )+ℓ2 sin(x1 +x2 )+ℓ3 sin(x1 +x2 +x3 )+. . . ,
x1 + x2 + x3 + . . .

with x the state of the robot (joint angles), f the position of the robot
endeffector, ℓ a vector of robot links lengths, L a lower triangular
matrix with unit elements, and 1 a vector of unit elements, see Fig. 7.
The position and orientation of all articulations can similarly be
computed with the forward kinematics function
Figure 7: Forward kinematics for a planar robot
h i⊤ with two links.
f˜ = L diag(ℓ) cos(Lx), L diag(ℓ) sin(Lx), Lx
˜ 
f1,1 f˜1,2 f˜1,3 . . .
= f˜2,1 f˜2,2 f˜2,3 . . ., (18)
f˜3,1 f˜3,2 f˜3,3 . . .

f˜1,1 = ℓ1 cos(x1 ), f˜1,2 = ℓ1 cos(x1 )+ℓ2 cos(x1 +x2 ), f˜1,3 = ℓ1 cos(x1 )+ℓ2 cos(x1 +x2 )+ℓ3 cos(x1 +x2 +x3 ),
with f˜2,1 = ℓ1 sin(x1 ), f˜2,2 = ℓ1 sin(x1 )+ℓ2 sin(x1 +x2 ), f˜2,3 = ℓ1 sin(x1 )+ℓ2 sin(x1 +x2 )+ℓ3 sin(x1 +x2 +x3 ), . . .
f˜3,1 = x1 , f˜3,2 = x1 + x2 , f˜3,3 = x1 + x2 + x3 .

In Python, this can be coded for the endeffector position part as


1 D = 3 #State space dimension (joint angles)
2 x = np.ones(D) * np.pi / D #Robot pose
3 l = np.array ([2, 2, 1]) #Links lengths
4 L = np.tril(np.ones ([D,D])) #Transformation matrix
5 f = np.array ([L @ np.diag(l) @ np.cos(L @ x), L @ np.diag(l) @ np.sin(L @ x)]) #Forward kinematics

8
5 Inverse kinematics (IK) for a planar robot manipulator
IK_manipulator.*

By differentiating the forward kinematics function, a least norm inverse kinematics solution can be computed with

ẋ = J †(x) f˙, (19)

where the Jacobian J (x) corresponding to the endeffector forward kinematics function f (x) can be computed as (with
a simplification for the orientation part by ignoring the manifold aspect)
 
− sin(Lx)⊤ diag(ℓ)L
J =  cos(Lx)⊤ diag(ℓ)L 
1⊤
 
J1,1 J1,2 J1,3 . . .
= J2,1 J2,2 J2,3 . . .,
J3,1 J3,2 J3,3 . . .

with
J1,1 = −ℓ1 sin(x1 )−ℓ2 sin(x1 +x2 )−ℓ3 sin(x1 +x2 +x3 )−. . . , J1,2 = −ℓ2 sin(x1 +x2 )−ℓ3 sin(x1 +x2 +x3 )−. . . , J1,3 = −ℓ3 sin(x1 +x2 +x3 )−. . . ,
J2,1 = ℓ1 cos(x1 )+ℓ2 cos(x1 +x2 )+ℓ3 cos(x1 +x2 +x3 )+. . . , J2,2 = ℓ2 cos(x1 +x2 )+ℓ3 cos(x1 +x2 +x3 )+. . . , J2,3 = ℓ3 cos(x1 +x2 +x3 )+. . . , ...
J3,1 = 1, J3,2 = 1, J3,3 = 1,

In Python, this can be coded for the endeffector position part as


1 J = np.array([-np.sin(L @ x).T @ np.diag(l) @ L, np.cos(L @ x).T @ np.diag(l) @ L]) #Jacobian (for endeffector)

This Jacobian can be used to solve the inverse kinematics (IK) problem that consists of finding a robot pose to
reach a target with the robot endeffector. The underlying optimization problem consists of minimizing a quadratic
cost c = ∥f − µ∥2 , where f is the position of the endeffector and µ is a target to reach with this endeffector. An
optimization problem with a quadratic cost can be solved iteratively with a Gauss–Newton method, requiring to
compute Jacobian pseudoinverses J † , see Section 3.1 for details.

5.1 Numerical estimation of the Jacobian


IK_num.*

Section 5 above presented an analytical solution for the Jacobian. A numerical solution can alternatively be
estimated by computing
∂fi (x) fi (x(j) ) − fi (x)
Ji,j = ≈ ∀i, ∀j,
∂xj ∆
with x(j) a vector of the same size as x with elements
(
(j) xk + ∆, if k = j,
xk =
xk , otherwise,

where ∆ is a small value for the approximation of the derivatives.

5.2 Inverse kinematics (IK) with task prioritization


IK_nullspace.*

In (19), the pseudoinverse provides a single least norm solution. This result can be generalized to obtain all
solutions of the linear system with
ẋ = J †(x) f˙ + N(x) g(x), (20)
where g(x) is any desired gradient function and N(x) the nullspace projection operator

N(x) = I − J (x)† J (x).

In the above, the solution is unique if and only if J (x) has full column rank, in which case N (x) is a zero matrix.
An alternative way of computing the nullspace projection matrix, numerically more robust, is to exploit the singular
value decomposition
J † = U ΣV ⊤ ,

9
(a) Moving the first joint as (b) Orientation tracking a as (c) Position tracking as (d) Tracking of a moving object with the right
secondary task secondary task secondary task hand as secondary task

Figure 8: Examples of nullspace controllers. The red paths represent the trajectories of moving objects that need to be tracked by the
robot endeffectors.

to compute
N = Ũ Ũ ⊤ ,
where Ũ is a matrix formed by the columns of U that span for the corresponding zero rows in Σ.
Figure 8 presents examples of nullspace controllers. Figure 8a shows a robot controller keeping its endeffector still
as primary objective, while trying to move the first joint as secondary objective, which is achieved with the controller
 
  1
0
ẋ = J †(x) + N(x)0 .
0
0

6 Encoding with basis functions


MP.*

Basis functions can be used to encode signals in a compact manner through a weighted superposition of basis
functions, acting as a dictionary of simpler signals that are superposed to form more complex signals.
Basis functions can for example be used to encode trajectories, whose input is a 1D time variable and whose
output can be multidimensional. For basis functions ϕ(t) described by a time or phase variable t, the corresponding
continuous signal x(t) is encoded with x(t) = ϕ(t) w, with t a continuous time variable and w a vector containing the
superposition weights. Basis functions can also be employed in a discretized form by providing a specific list of time
steps, which is the form we will employ next.
The term movement primitive is often used in robotics to refer to the use of basis functions to encode trajectories.
It corresponds to an organization of continuous motion signals in the form of a superposition in parallel and in series
of simpler signals, which can be viewed as “building blocks” to create more complex movements, see Fig. 9. This
principle, coined in the context of motor control [12], remains valid for a wide range of continuous time signals (for
both analysis and synthesis).
The simpler form of movement primitives consists of encoding a movement as a weighted superposition of simpler
movements. The compression aims at working in a subspace of reduced dimensionality, while denoising the signal and
capturing the essential aspects of a movement.

6.1 Univariate trajectories


A univariate trajectory x1D ∈ RT of T datapoints can be represented as a weighted sum of K basis functions with

X
K
x1D = ϕk wk1D = ϕ w1D , (21)
k=1

where ϕ can be any set of basis functions, including some common forms that are presented below (see also [2] for
more details).

Piecewise constant basis functions


Piecewise constant basis functions can be computed in matrix form as

ϕ = IK ⊗ 1 T , (22)
K

T
where IK is an identity matrix of size K and 1 T is a vector of length K compose of unit elements, see Fig. 9.
K

10
Figure 9: Examples of basis functions. Left: Representation in timeline form for K = 5 and T = 300. Right: Representation in matrix
form for K = 5 and T = 20, with a grayscale colormap where white pixels are 0 and black pixels are 1.

Radial basis functions (RBFs)


Gaussian radial basis functions (RBFs) can be computed in matrix form as

ϕ = exp(−λ E ⊙ E), with K − 1T µ ,


E = t 1⊤ s⊤
(23)

where λ is bandwidth parameter, ⊙ is the elementwise (Hadamard) product operator, t ∈ RT is a vector with entries
linearly spaced between 0 to 1, µs ∈ RK is a vector containing the RBF centers linearly spaced on the [0, 1] range,
and the exp(·) function is applied to each element of the matrix, see Fig. 9.
RBFs can also be employed in a rescaled form by replacing each row ϕk with ∑Kϕk ϕ .
i=1 i

Bernstein basis functions


Bernstein basis functions (used for Bézier curves) can be computed as

(K − 1)! K−k
ϕk = (1T − t) ⊙ tk−1 , (24)
(k − 1)!(K − k)!

∀k ∈ {1, . . . , K}, where t ∈ RT is a vector with entries linearly spaced between 0 to 1, and (·)d is applied elementwise,
see Fig. 9.

Fourier basis functions


Fourier basis functions can be computed in matrix form as

ϕ = exp(t k̃⊤ 2πi), (25)

where the exp(·) function is applied to each element of the matrix, t ∈ RT is a vector with entries linearly spaced

between 0 to 1, k̃ = [−K +1, −K +2, . . . , K −2, K −1] , and i is the imaginary unit (i2 = −1).
If x is a real and even signal, the above formulation can be simplified to

ϕ = cos(t k⊤ 2πi), (26)



with k = [0, 1, . . . , K −2, K −1] , see Fig. 9.
Indeed, we first note that exp(ai) is composed of a real part and an imaginary part with exp(ai) = cos(a) + i sin(a).

11
We can see that for a given time step t, a real state xt can be constructed with the Fourier series

X
K−1
xt = wk exp(t k 2πi)
k=−K+1

X
K−1
= wk cos(t k 2π)
k=−K+1

X
K−1
= w0 + 2wk cos(t k 2π),
k=1

where we used the properties cos(0) = 1 and cos(−a) = cos(a) of the cosine function. Since we do not need direct
correspondences between the Fourier transform and discrete cosine transform as in the above, we can omit the scaling
factors for wk and directly write the decomposition as in (26).

6.2 Multidimensional outputs


A multivariate trajectory x ∈ RDT of T datapoints of dimension D can similarly be computed as
 
Iϕ1,1 Iϕ2,1 · · · IϕK,1
XK  Iϕ1,2 Iϕ2,2 · · · IϕK,2 
 
x= Ψk wk = Ψ w, with Ψ = ϕ ⊗ I =  . .. .. ..  , (27)
 .. . . . 
k=1
Iϕ1,T Iϕ2,T · · · IϕK,T

where ⊗ the Kronecker product operator and I is an identity matrix of size D × D.


In the above, I can alternatively be replaced by a rectangular matrix S acting as a coordination matrix.

6.3 Multidimensional inputs


spline2d.*

Basis functions can also be


used to encode signals generated
by multivariate inputs ti . For ex-
ample, a Bézier surface uses two
input variables t1 and t2 to cover
a spatial range and generates an
output variable describing the
height of the surface within this
rectangular region. This sur-
face (represented as a colormap Figure 10: Concatenated Bernstein basis functions used with inputs of various dimensions to
in Fig. 10-center) can be con- encode signed distance functions (SDFs). Left: Concatenation of 6 cubic Bézier curves with a
1D input (time variable t). Center: Concatenation of 5×5 cubic Bézier curves with a 2D input.
structed from a 1D input sig-
Right: Concatenation of 5×5×5 cubic Bézier curves with a 3D input. Each cubic Bézier curve
nal by leveraging the Kronecker corresponds to the superposition of 4 Bernstein basis functions, with superposition weights
product operation, namely acting as control points (displayed as black points in the 1D example), with constraints on the
control points ensuring continuity.
Ψ=ϕ ⊗ ϕ (28)

in matrix form, or
Ψ(t) = ϕ(t1 ) ⊗ ϕ(t2 ) (29)
in analytic form.
When using splines of the form ϕ(t) = T (t)BC, (29) can equivalently be computed as
 
Ψ(t) = T (t1 ) ⊗ T (t2 ) BC ⊗ BC , (30)

which means that BC ⊗ BC can be precomputed.
As for the unidimensional version, we still maintain a linear reconstruction x = Ψ w, where x and w are vectorized
versions of surface heights and superposition weights, which can be reorganized as 2D arrays if required.
Successive Kronecker products can be used so that any number of input and output dimensions din and dout can
be considered, with
Ψ = ϕ ⊗ ϕ ⊗ · · · ⊗ ϕ ⊗ Idout . (31)
| {z }
din

12
For example, a vector field in 3D can be encoded with basis functions

Ψ = ϕ ⊗ ϕ ⊗ ϕ ⊗ I3 , (32)

where I3 is a 3×3 identity matrix to encode the 3 elements of the vector.


Figure 10 shows examples with various input dimensions to encode a signed distance field (SDF). In Fig. 10-center,
several equidistant contours are displayed as closed paths, with the one corresponding to the object contour (distance
zero) represented in blue. In Fig. 10-right, several isosurfaces are displayed as 3D shapes, with the one corresponding
to the object surface (distance zero) represented in blue. A marching algorithm has been used here for visualization
purpose to compute closed contours (in 2D) and isosurfaces (in 3D).
The analytic expression provided by the proposed encoding can be used to express the derivatives as analytic
expressions, which is useful for control and planning problem, such as moving closer or away from objects, or orienting
the robot gripper to be locally aligned with the surface of an object.

6.4 Derivatives
By using basis functions as analytic expressions, the derivatives are easy to compute. For example, for 2D inputs as
in (29), we have
∂Ψ(t) ∂ϕ(t1 ) ∂Ψ(t) ∂ϕ(t2 )
= ⊗ ϕ(t2 ), = ϕ(t1 ) ⊗ , (33)
∂t1 ∂t1 ∂t2 ∂t2
providing the derivatives of Ψ(t) with respect to t expressed as
∂Ψ(t) ∂Ψ(t)
∇Ψ(t) = ⊗ , (34)
∂t1 ∂t2
which can be used to compute the 2D gradient of the SDF at location t with

∇x = ∇Ψ(t) w. (35)
∂ϕ(t) ∂T (t)
When using splines of the form ϕ(t) = T (t)BC, the derivatives in (33) are simply computed as ∂t = ∂t BC.
For a cubic splines, it corresponds to T (t) = [1, t, t2 , t3 ] and ∂T∂t(t) = [0, 1, 2t, 3t2 ].

Example: finding the closest point on a contour

By modeling a signed distance function as x = f (t), the derivatives can for example be used to find a point
on the contour (with distance zero). Such problem can be solved with Gauss–Newton optimization with the cost
c(t) = 21 f (t)2 , by starting from an initial guess t0 . In the above example, the Jacobian is then given by
 ∂Ψ(t) 
∂t1 w
 ∂Ψ(t) 
J (t) =  
 ∂t2 w , (36)
..
.
and the update can be computed recursively with

tk+1 ← tk − αJ (tk )† f (tk ), (37)

as in Equation (10), where J (tk )⊤ f (tk ) is a gradient and J (tk )⊤ J (tk ) a Hessian matrix, and α is a line search param-
eter.

Example: finding two contour points with closest distance

For two shapes encoded as SDFs f1 (t1 ) and f2 (t2 ) using basis functions, finding the shortest segment between the
two shapes boils down to the Gauss–Newton optimization
  of a pair of points t1 and t2 are expressed in their respective
t1
shape coordinate frame and are organized as t = , with the cost
t2
1 1 1
c(t) = f1 (t1 )2 + f2 (t2 )2 + ∥A t2 + b − t1 ∥2 , (38)
2 2 2
which is composed of quadratic residual terms, where A and b are the rotation matrix and translation vector offset
between the two objects, respectively. The Jacobian of the residual terms in the above cost is given by
 
J1 (t1 ) 0
J (t) =  0 J2 (t2 ) , (39)
−I A

13
where J1 (t1 ) and J2 (t2 ) are the derivatives of the two SDFs as given by (36).
Similarly to (37), the Gauss–Newton update step is then given by
 
f1 (t1,k )
tk+1 ← tk − αJ (tk )†  f2 (t2,k ) . (40)
A t2,k + b − t1,k

A prioritized optimization scheme can also be used to have the two points on the shape boundaries as primary
objective and to move these points closer as secondary objective, with a corresponding Gauss–Newton update step
given by
 
† f1 (t1,k )
tk+1 ← tk − α1 J12 + α2 N12 J3† (A t2,k + b − t1,k ), (41)
f2 (t2,k )
 
J (t ) 0 †  
where J12 = 1 1 , N12 = I − J12 J12 , J3 = −I A , (42)
0 J2 (t2 )

with α1 and α2 are two line search parameters.

6.5 Concatenated basis functions


spline1d.* spline2d.*

When encoding entire signals, some dictionaries such as Bernstein basis functions require to set polynomials of
high order to encode long or complex signals. Instead of considering a global encoding, it can be useful to split
the problem as a set of local fitting problems which can consider low order polynomials. A typical example is the
encoding of complex curves as a concatenation of simple Bézier curves. When concatenating curves, constraints on
the superposition weights are typically considered. These weights can be represented as control points in the case of
Bézier curves and splines. We typically constrain the last point of a curve and the first point of the next curve to
maintain the continuity of the curve. We also typically constrain the control points before and after this joint to be
symmetric, effectively imposing smoothness.
In practice, this can be achieved efficiently by simply replacing ϕ with ϕC in the above equations, where C is a
tall rectangular matrix. This further reduces the number of superposition weights required in the encoding, as the
constraints also reduce the number of free variables.
For example, for the concatenation of Bézier curves, we can define C as
 
1 0 ··· 0 0 ···
0 1 · · · 0 0 · · ·
 
 .. .. . . .. .. . . 
. . . . . . 
 
0 0 · · · 1 0 · · · 
C=  , (43)
0 0 · · · 0 1 · · · 
 
0 0 · · · 0 1 · · ·
 
0 0 · · · −1 2 · · ·
 
.. .. . . .. .. . .
. . . . . .
 
1 0
 0 1
where the pattern  
 0 1 is repeated for each junction of two consecutive Bézier curves. For two concatenated cubic
−1 2
Bézier curves, each composed of 4 Bernstein basis functions, we can see locally that this operator yields a constraint
of the form    
w3 1 0  
w 4   0 1  a
 =  (44)
w 5   0 1  b ,
w6 −1 2
which ensures that w4 = w5 and w6 = −w3 + 2w5 . These constraints guarantee that the last control point and the
first control point of the next segment are the same, and that the control point before and after are symmetric with
respect to this junction point, see Fig. 10-left.

6.6 Batch computation of basis functions coefficients


Based on observed data x, the superposition weights ŵ can be estimated as a simple least squares estimate
−1
ŵ = Ψ† x = (Ψ⊤ Ψ) Ψ⊤ x, (45)

14
Figure 11: Iterative estimation of a 2D SDF, where the weights are initialized to form a circular object. Points are then sampled one-by-
one (in red) to probe the value of the function as the sampled location. Each point used to refine the estimate and is then discarded.

or as the regularized version (ridge regression)


−1
ŵ = (Ψ⊤ Ψ + λI) Ψ⊤ x. (46)
For example, if we want to fit a reference path, which can also be sparse or composed of a set of viapoints, while
minimizing velocities (or similarly, any other derivatives, such as computing a minimum jerk trajectories), we can
solve
1 λ
ŵ = arg min ∥Ψw − x∥2 + ∥∇Ψw∥2 (47)
w 2 2
−1
= (Ψ⊤ Ψ + λ∇Ψ⊤ ∇Ψ) Ψ⊤ x. (48)

6.7 Recursive computation of basis functions coefficients

Algorithm 1: Recursive computation of weights


B̃ = λ1 I // Initialize B̃, corresponding to (Ψ⊤ Ψ)−1
w = w0 // Initialize superposition weights (e.g., with prior shape w0 )
for n ← 1 to N do
Ψn = Ψ(tn ) // Evaluate basis functions at location tn , where the corresponding datapoint is xn
 −1
K = B̃Ψ⊤ n I + Ψn B̃Ψn

// Compute Kalman gain K

B̃ ← B̃ − KΨ
 n B̃ // Update
 B̃
w ← w + K xn − Ψn w // Update superposition weights w
end

The same result can be obtained by recursive computation, by providing the datapoints one-by-one or by groups of
points. The algorithm starts from an initial estimate of w that is iteratively refined with the arrival of new datapoints,
see Fig. 11.
This recursive least squares algorithm exploits the Sherman-Morrison-Woodbury formulas that relate the
inverse of a matrix after a small-rank perturbation to the inverse of the original matrix, namely
E
z }| {
−1 −1
(B + U V ) = B −1 − B −1 U I + V B −1 U V B −1 (49)
with U ∈ Rn×m and V ∈ Rm×n . When m ≪ n, the correction term E can be computed more efficiently than inverting
B + UV .
By defining B = Ψ⊤ Ψ, the above relation can be exploited to update
 the least squares solution (46) when new
  x
datapoints become available. Indeed, if Ψnew = Ψ, V and xnew = , we can see that
v

Bnew = Ψnew Ψnew (50)
⊤ ⊤
=Ψ Ψ+V V (51)
= B + V ⊤V , (52)
whose inverse can be computed using (49), yielding
−1
−1
Bnew = B −1 − B −1 V ⊤ I + V B −1 V ⊤ V B −1 . (53)
| {z }
K

15
This is exploited to estimate the update as
 
wnew = w + K v − V w , (54)

with Kalman gain


−1
K = B −1 V ⊤ I + V B −1 V ⊤ . (55)
−1
The above equations can be used recursively, with Bnew and wnew the updates that will become B −1 and w for
the next iteration. Note that the above iterative computation only uses B −1 , which is the variable stored in memory.
Namely, we never use B, only the inverse. For recursive ridge regression, the algorithm starts with B −1 = λ1 I. After
all datapoints are used, the estimate of w is exactly the same as the ridge regression result (46) computed in batch
form. Algorithm 1 summarizes the computation steps.

6.8 Computation of basis functions coefficients using eikonal cost


spline2d_eikonal.*

We can also estimate an SDF by privileging a unit


norm on the derivatives. From a dataset of M distances
xm observed at locations tm , the basis functions coeffi-
cients w of an SDF can be computed by evaluating

X
M
2 X
N
2
min Ψ(tm ) w − xm +λ ∇x⊤
n ∇xn − 1 ,
w
m=1 n=1
(56)
where λ is a weighting factor balancing the distance
matching objective and the eikonal objective (regulariza-
tion). The gradient of xn with respect to t is expressed Figure 12: Estimation of an SDF from only contour points (in
as ∇xn = ∇Ψ(tn ) w, computed using (34) for a set of red), without (left) and with (right) an eikonal cost to privilege a
N locations tn , that can typically be defined to cover unit norm on the derivatives. The M = 93 points in red are used for
homogeneously the area/volume encoded in the SDF. the distance matching objective and the N = 196 points in green are
used for the eikonal objective. The estimated SDFs are rendered
The first component of the objective in (56) is to as level sets, with the 0-level set (i.e., the contour) in thick blue
find a SDF that can fit the data. If only this first part line. The colormap in the background is proportional to the norm
would be required, the minimization boils down to the of the gradient, where a homogeneous gray value would mean that
least squares solution of Section 6.6. The second term the eikonal cost can be satisfied everywhere. Regions with darker
or lighter color then correspond to higher eikonal cost.
adds the objective of keeping the norm of the derivatives
close to one, which is related to the eikonal equation
∥∇x∥ = 1, see [4] for the use of the eikonal equation in
the context of SDF modeling. Intuitively, as the basis functions encodes a SDF, it means that we expect that if we
move away from a shape by a small distance ∆d in the direction given by the gradient, the expected distance at this
new point should also increase by ∆d, which corresponds to a slope of 1 (namely, a unit norm for the gradient), as
expressed in the second part of the objective function.
By following the notation in Section 3.1, the above objective can be written as a sum of squared functions that
can be solved by Gauss–Newton optimization, with residuals and Jacobians given by

r1,m = Ψ(tm ) w − xm , J1,m = Ψ(tm ), ∀m ∈ {1, . . . , M },


r2,n = ∇x⊤
n ∇xn − 1, J2,n = 2 ∇xn⊤ ∇Ψ(tn ), ∀n ∈ {1, . . . , N },

which provide at each iteration step k the Gauss–Newton update rule


!−1 
X
M X
N X
M X
N
wk+1 ← wk − α ⊤
J1,m J1,m + λ ⊤
J2,n J2,n ⊤
J1,m r1,m + λ ⊤
J2,n r2,n ,
m=1 n=1 m=1 n=1

as detailed in Section 3.1, where the learning rate α can be determined by line search.
Note that the above computation can be rewritten with concatenated vectors and matrices for more efficient
computation exploiting linear algebra. By concatenating vertically the Jacobian matrices and residuals, we have the
equivalent Gauss–Newton update rule
!−1 
wk+1 ← wk − α J1⊤ J1 + λJ2⊤ J2 J1⊤ r1 + λJ2⊤ r2 .

Figure 12 presents an example in 2D.

16
Figure 13: Tennis serve problem solved by linear quadratic tracking (LQT).

7 Linear quadratic tracking (LQT)


LQT.* LQT_nullspace.*

Linear quadratic tracking (LQT) is a simple form of optimal control that trades off tracking and control costs
expressed as quadratic terms over a time horizon, with the evolution of the state described in a linear form. The LQT
problem is formulated as the minimization of the cost
−1  
⊤  TX ⊤ 
c = µT −xT QT µT −xT + µt −xt Qt µt −xt + u⊤
t R t ut
t=1
⊤ 
= µ − x Q µ − x + u⊤Ru, (57)
 ⊤ ⊤  ⊤   ⊤
with x = x1 , x2 , . . . , x⊤ the evolution of the state variables, u = u⊤ ⊤ ⊤
1 , u2 , . . . , uT −1 the evolution of the control
 ⊤T ⊤ ⊤
 ⊤
commands, and µ = µ1 , µ2 , . . . , µT the evolution of the tracking targets. Q = blockdiag(Q1 , Q2 , . . . , QT ) represents
the evolution of the precision matrices Qt , and R = blockdiag(R1 , R2 , . . . , RT −1 ) represents the evolution of the control
weight matrices Rt .
The evolution of the system is linear, described by
xt+1 = At xt + Bt ut , (58)
yielding at trajectory level
x = Sx x1 + Su u, (59)
see Appendix A for details.
With open loop control commands organized as a vector u, the solution of (57) subject to x = Sx x1 + Su u is
analytic, given by
−1 ⊤ 
û = Su⊤ QSu + R Su Q µ − Sx x1 . (60)
The residuals of this least squares solution provides information about the uncertainty of this estimate, in the form
of a full covariance matrix (at control trajectory level)
−1
Σ̂u = Su⊤ QSu + R , (61)
which provides a probabilistic interpretation of LQT, see Section 2.
The batch computation approach facilitates the creation of bridges between learning and control. For example, in
learning from demonstration, the observed (co)variations in a task can be formulated as an LQT objective function,
which then provides a trajectory distribution in control space that can be converted to a trajectory distribution in
state space. All the operations are analytic and only exploit basic linear algebra.
The approach can also be extended to model predictive control (MPC), iterative LQR (iLQR) and differential
dynamic programming (DDP), whose solution needs this time to be interpreted locally at each iteration step of the
algorithm, as we will see later in the technical report.

Example: Bimanual tennis serve


LQT_tennisServe.*
LQT can be used to solve a ballistic task mimicking a bimanual tennis serve problem. In this problem, a ball is
thrown by one hand and then hit by the other, with the goal of bringing the ball to a desired target, see Fig. 13. The
problem is formulated as in (57), namely

min (µ − x) Q(µ − x) + u⊤Ru, s.t. x = Sx x1 + Su u, (62)
u

17
where x represents the state trajectory of the 3 agents (left hand, right hand and ball), where only the hands can be
controlled by a series of acceleration commands u that can be estimated by LQT.
In the above problem, Q is a precision matrix and µ is a reference vector describing at specific time steps the
targets that the three agents must reach. The linear systems are described according to the different phases of the
task, see Appendix C for details. As shown above, the constrained objective in (62) can be solved by least squares,
providing an analytical solution given by (60), see also Fig. 13 for the visualization of the result for a given target and
for given initial poses of the hands.

7.1 LQT with initial state optimization


LQT can easily be extended to determine the optimal initial state x1 together with the optimal control commands u,
⊤ ⊤
by defining ũ = [x⊤
1 , u ] as a vector concatenating x1 and u which is used as the variable to optimize. The linear
system evolution x = Sx x1 + Su u, as defined in (59), then takes the form x = S ũ with S = [Sx , Su ]. Similarly, the
optimization problem becomes
 
x̂ ⊤ 
ũ = 1 = arg min µ − S ũ Q µ − S ũ + ũ⊤R̃ũ
ˆ
û ũ
−1 ⊤
= S ⊤ QS + R̃ S Qµ, (63)
 
0 0
by using R̃ = to remove any constraint on the initial state. Thus, the above optimization problem estimates
0 R
both the initial state and the sequence of control commands.

7.2 LQT with smoothness cost


LQT.*

Linear quadratic tracking (LQT) can be used with dynam-


ical systems described as simple integrators. With single inte-
grators, the states correspond to positions, with velocity control
commands. With double integrators, the states correspond to
positions and velocities, with acceleration control commands.
With triple integrators, the states correspond to positions, ve-
locities and accelerations, with jerk control commands, etc.
Figure 14 shows an example of LQT for a task that consist
of passing through a set of viapoints. The left and right graphs
show the result for single and double integrators, respectively.
The graph in the center also considers a single integrator, by
redefining the weight matrix R of the control cost to ensure Figure 14: Examples of linear quadratic tracking (LQT)
smoothness. This can be done by replacing the standard diag- applied to the task of reaching a set of viapoints (red dots)
onal R matrix in (57) with by starting from an initial position (black dot). Top-left:
With velocity commands, with a system described as a
  single integrator and a standard control cost. Top-right:
2 −1 0 0 ··· 0 0 With velocity commands, with a system described as a
−1 2 −1 0 · · · 0 0 single integrator and a control cost ensuring smoothness.
 
 0 −1 2 −1 · · · 0 0 Bottom: With acceleration commands, with a system de-
 
R= . .. .. .. .. .. ..  ⊗ ID . scribed as a double integrator and a standard control cost.
 .. . . . . . . 
 
0 0 0 0 ··· 2 −1
0 0 0 0 · · · −1 2

18
7.3 LQT with control primitives
LQT_CP.*

If we assume that the control commands profile u is composed of control primitives (CP) with u = Ψw, the
objective becomes

min (x − µ) Q(x − µ) + w⊤ Ψ⊤ RΨw, s.t. x = Sx x1 + Su Ψw,
w

with a solution given by


−1 
ŵ = Ψ⊤ Su⊤ QSu Ψ + Ψ⊤ RΨ Ψ⊤ Su⊤ Q µ − Sx x1 ,
which is used to compute the trajectory û = Ψŵ in control space, corresponding to the list of control commands
organized in vector form. Similarly to (61), the residuals of the least squares solution provides a probabilistic approach
to LQT with control primitives. Note also that the trajectory in control space can be converted to a trajectory in state
space with x̂ = Sx x1 + Su û. Thus, since all operations are linear, a covariance matrix on w can be converted to a
covariance on u and on x. A similar linear transformation is used in the context of probabilistic movement primitives
(ProMP) [15] to map a covariance matrix on w to a covariance on x. LQT with control primitives provides a more
general approach by considering both control and state spaces. It also has the advantage of reframing the method to
the more general context of optimal control, thus providing various extension opportunities.
Several forms of basis functions can be considered, including stepwise control commands, radial basis functions
(RBFs), Bernstein polynomials, Fourier series, or learned basis functions, which are organized as a dictionary matrix
Ψ. Dictionaries for multivariate control commands can be generated with a Kronecker product operator ⊗ as

Ψ = ϕ ⊗ C, (64)

where the matrix ϕ is a horizontal concatenation of univariate basis functions, and C is a coordination matrix, which
can be set to an identity matrix ID for the generic case of control variables with independent basis functions for each
dimension d ∈ {1, . . . , D}.
Note also that in some situations, R can be set to zero because the regularization role of R in the original problem
formulation can sometimes be redundant with the use of sparse basis functions.

7.4 LQR with a recursive formulation


LQT_recursive.*

Algorithm 2: Recursive formulation of LQR


Define quadratic cost function with Qt , Rt
Define dynamics with At , Bt , and initial state x1
// Backward pass
Set VT = QT
for t ← T − 1 to 1 do
Compute Vt with (73) and (71)
end
// Forward pass
for t ← 1 to T − 1 do
Compute Kt and ût with (72)
Compute xt+1 with (58)
end

The LQR problem is formulated as the minimization of the cost

X
T −1 X
T −1  
c(x1 , u) = cT (xT ) + ct (xt , ut ) = x⊤
T QT xT + x⊤ ⊤
t Qt xt + ut Rt ut , s.t. xt+1 = At xt + Bt ut , (65)
t=1 t=1

where c without index refers to the total cost (cumulative cost). We also define the partial cumulative cost

X
T −1
vt (xt , ut:T −1 ) = cT (xT ) + cs (xt , ut ), (66)
s=t

which depends on the states and control commands, except for vT that only depends on the state.

19
We define the value function as the value taken by vt when applying optimal control commands, namely

v̂t (xt ) = min vt (xt , ut:T −1 ). (67)


ut:T −1

We will use the dynamic programming principle to reduce the minimization in (67) over the entire sequence of
control commands ut:T −1 to a sequence of minimization problems over control commands at a single time step, by
proceeding backwards in time.
By inserting (66) in (67), we observe that

 X
T −1 
v̂t (xt ) = min cT (xT ) + cs (xt , ut )
ut:T −1
s=t
 X
T −1 
= min ct (xt , ut )+ min cT (xT )+ cs (xt , ut )
ut ut+1:T −1
s=t+1

= min ct (xt , ut ) + v̂t+1 (xt+1 ), (68)


ut | {z }
qt (xt ,ut )

where qt is called the q-function.


By starting from the last time step T , and by relabeling QT as VT , we can see that v̂T (xT ) = cT (xT ) is independent
of the control commands, taking the quadratic error form

v̂T = x⊤
T VT xT , (69)

which only involves the final state xT . We will show that v̂T −1 has the same quadratic form as v̂T , enabling the
backward recursive computation of v̂t from t = T − 1 to t = 1.
With (68), the dynamic programming recursion takes the form
 ⊤   
xt Qt 0 xt
v̂t = min + x⊤
t+1 Vt+1 xt+1 . (70)
ut ut 0 R t ut

By substituting xt+1 = At xt + Bt ut into (70), v̂t can be rewritten as



 ⊤      ⊤
 Qxx,t = At Vt+1 At + Qt ,
xt Qxx,t Q⊤ xt
v̂t = min ux,t
, where Quu,t = Bt⊤ Vt+1 Bt + Rt , (71)
ut ut Qux,t Quu,t ut 

Qux,t = Bt⊤ Vt+1 At .

An optimal control command ût can be computed by differentiating (71) with respect to ut and equating to zero,
providing a feedback law
ût = −Kt xt , with Kt = Q−1 uu,t Qux,t . (72)
By introducing (72) back into (71), the resulting value function v̂t has the quadratic form
−1
v̂t = x⊤
t V t xt , with Vt = Qxx,t − Q⊤
ux,t Quu,t Qux,t . (73)

We observe that (73) has the same quadratic form as (69), so that (68) can be solved recursively. We thus obtain
a backward recursion procedure in which Vt is evaluated recursively from t = T − 1 to t = 1, starting from VT = QT ,
which corresponds to a Riccati equation.
After all feedback gain matrices Kt have been computed by backward recursion, a forward recursion can be used
to compute the evolution of the state, starting from x1 , by using the linear system xt+1 = At xt + Bt ut and the
control policy ut = −Kt xt , see Algorithm 2 for the summary of the overall procedure.

7.5 LQT with a recursive formulation and an augmented state space


LQT_recursive.*

In order to extend the above development to linear quadratic tracking (LQT), the problem of tracking a reference
signal {µt }Tt=1 can be recast as a regulation problem by considering a dynamical system with an augmented state
      
xt+1 A 0 xt B
= + ut , (74)
1 0 1 1 0
| {z } | {z } | {z } |{z}
x̃t+1 Ã x̃t B̃

20
and an augmented tracking weight
 −1 −1    
Qt +µt µ⊤
t µt I 0 Qt 0 I −µt
Q̃t = = ,
µ⊤
t 1 −µ⊤t 1 0 1 0 1
which is used to define the cost
−1  
⊤  TX ⊤ 
c = µ T − xT QT µT − xT + µ t − x t Q t µ t − x t + u⊤
t R u
t t
t=1
X
T −1  
= x̃⊤
T Q̃T x̃T + x̃⊤
t Q̃ t x̃ t + u ⊤
t R t ut , (75)
t=1

where the augmented form in (75) has the same form as the standard LQR cost in (65), allowing the tracking problem
to be solved in the same way by using this augmented state representation. Additional verification details can be
found in Appendix D.
For a tracking problem, we can see that the resulting optimal control policy takes the form

ût = −K̃t x̃t = Kt (µt − xt ) + ufft , (76)


 
characterized by a feedback gain matrix Kt extracted from K̃t = Kt , kt , and a feedforward term ufft = −kt − Kt µt
depending on µt .

7.6 Least squares formulation of recursive LQR


LQT_recursive_LS.*

We have seen in Section 7.4 that a standard LQR problem is formulated as

min x⊤ Qx + u⊤Ru, s.t. x = Sx x1 + Su u,


u

whose solution is −1


û = − Su⊤ QSu + R Su⊤ QSx x1 , (77)
corresponding to open loop control commands.
We can see in (77) that the open loop control commands û (in a vector form) is linear with respect to x1 . Thus,
by introducing a matrix F to describe u = −F x1 , we can alternatively define the optimization problem as

min x⊤ Qx + (−F x1 )⊤ R (−F x1 ), s.t. x = (Sx − Su F )x1 ,


F

whose least squares solution is


−1
F = Su⊤ QSu + R Su⊤ QSx . (78)
By decomposing F as block matrices Ft with t ∈ {1, . . . , T − 1}, F can be used to iteratively reconstruct regulation
gains Kt , by starting from K1 = F1 , P1 = I, and by computing recursively
−1
Pt = Pt−1 (At−1 − Bt−1 Kt−1 ) , Kt = F t Pt , (79)

which can then be used in a feedback controller as in (72).


It is straightforward to extend this least squares formulation of recursive LQR to linear quadratic tracking and use
(76) as feedback controller on an augmented state space, since the recursive LQT problem can be transformed to a
recursive LQR problem with an augmented state space representation (see Section 7.5).
This least squares formulation of LQT (LQT-LS) yields the same controller as with the standard recursive com-
putation presented in Section 7.5. However, the linear form in (78) used by LQT-LS has several advantages. First,
it allows the use of full precision matrices Q, which will be demonstrated in the example below. It also allows to
extend LQT-LS to the use of control primitives, which will be further discussed in Section 7.7. Moreover, it provides
a nullspace structure that can be exploited in recursive LQR/LQT problems.

Example with the control of multiple agents


LQT_recursive_LS_multiAgents.*

Figure 15 presents an example with the control of multiple agents, with a precision matrix involving nonzero
offdiagonal elements. The corresponding example code presents two options: it either requests the two agents to meet
in the middle of the motion (e.g., for a handover task), or it requests the two agents to find a location to reach at
different time steps (e.g., to drop and pick-up an object), involving nonzero offdiagonal elements at different time
steps.

21
Figure 15: Least squares formulation of recursive
LQR to control multiple agents (as point mass sys-
tems), where the task of each agent is to reach a
desired target at the end of the motion, and to meet
the other agent in the middle of the motion (e.g., for
a handover task, see main text for the alternative op-
tion of nonzero offdiagonal elements at different time
steps). We then test the adaptation capability of the
agents by simulating a perturbation at 1/4 of the
movement. The original and adapted movements are
depicted in dashed and solid lines, respectively. The
initial positions are represented with black points,
the targets are in red, the optimized meeting points
are in blue, and the perturbation is in green.

Figure 16: Linear quadratic tracking (LQT) with control primitives applied to a trajectory tracking task, with a formulation similar to
dynamical movement primitives (DMP). Left: The observed “S” shape (in blue) is reproduced by starting from a different initial position
(black point), with a perturbation simulated at 1/4 of the movement (green points) to show the capability of the approach to recover
from perturbations. The trajectory in dashed line shows the result without perturbation and trajectory in solid line shows the result with
perturbation. Right: The corresponding LQT problem formulation consists of requesting an end-point to be reached (red points) and an
acceleration profile to be tracked (red lines), where the control commands u are represented as a superposition of radial basis functions
ϕk .

7.7 Dynamical movement primitives (DMP) reformulated as LQT with control primitives
LQT_CP_DMP.*

The dynamical movement primitives (DMP) [5] approach proposes to reproduce an observed movement by crafting
a controller composed of two parts: a closed-loop spring-damper system reaching the final point of the observed
movement, and an open-loop system reproducing the acceleration profile of the observed movement. These two
controllers are weighted so that the spring-damper system part is progressively increased until it becomes the only
active controller. In DMP, the acceleration profile (also called forcing terms) is encoded with radial basis functions
[1], and the spring-damper system parameters are defined heuristically, usually as a critically damped system.
Linear quadratic tracking (LQT) with control primitives can be used in a similar fashion as in DMP, by requesting
a target to be reached at the end of the movement and by requesting the observed acceleration profile to be tracked,
while encoding the control commands as radial basis functions. The controller can be estimated either as the open-loop
control commands (60), or as the closed-loop controller (76).
In the latter case, the matrix F in (78) is estimated by using control primitives and an augmented state space
formulation, namely
−1 ⊤ ⊤
Ŵ = Ψ⊤ Su⊤ Q̃Su Ψ + Ψ⊤ RΨ Ψ Su Q̃Sx , F = ΨŴ ,
which is used to compute feedback gains K̃t on the augmented state with (79).
The resulting controller ût = −K̃t x̃t tracks the acceleration profile while smoothly reaching the desired goal at
the end of the movement, with a smooth transition between the two. The main difference with DMP is that the
smooth transition between the two behaviors is directly optimized by the system, and the parameters of the feedback
controller are automatically optimized (in DMP, stiffness and damping ratio).

22
Figure 16 presents an example of reproducing an “S” trajectory with simulation perturbations.
In addition, the LQT formulation allows the resulting controller to be formalized in the form of a cost function,
which allows the approach to be combined more fluently with other optimal control strategies. Notably, the approach
can be extended to multiple viapoints without any modification. It allows multiple demonstrations of a movement
to be used to estimate a feedback controller that will exploit the (co)variations in the demonstrations to provide a
minimal intervention control strategy that will selectively reject perturbations based on the impact they can have on
the task to achieve. This is effectively attained by automatically regulating the gains in accordance to the variations
in the demonstrations, with low gains in parts of the movement allowing variations, and higher gains for parts of the
movement that are invariant in the demonstrations. It is also important to highlight that the solution of the LQT
problem formulated as in the above is analytical, corresponding to a simple least squares problem.
Moreover, the above problem formulation can be extended to iterative LQR, providing an opportunity to consider
obstacle avoidance and constraints within the DMP formulation, as well as to describe costs in task space with a DMP
acting in joint angle space.

23
8 iLQR optimization
PT
Optimal control problems are defined by a cost function t=1 c(xt , ut ) to minimize and a dynamical system xt+1 =
d(xt , ut ) describing the evolution of a state xt driven by control commands ut during a time window of length T .
Iterative LQR (iLQR) [8] solves such constrained nonlinear models by carrying out Taylor expansions on the cost
and on the dynamical system so that a solution can be found iteratively by solving a LQR problem at each iteration,
similarly to differential dynamic programming approaches [11, 17].
iLQR employs a first order Taylor expansion of the dynamical system xt+1 = d(xt , ut ) around the point (x̂t , ût ),
namely
∂d ∂d
xt+1 ≈ d(x̂t , ût ) + (xt − x̂t ) + (ut − ût )
∂xt ∂ut
⇐⇒ ∆xt+1 ≈ At ∆xt + Bt ∆ut , (80)
with residual vectors
∆xt = xt − x̂t , ∆ut = ut − ût ,
and Jacobian matrices
∂d ∂d
At = , Bt = .
∂xt x̂t ,ût ∂ut x̂t ,ût
The cost function c(xt , ut ) for time step t can similarly be approximated by a second order Taylor expansion
around the point (x̂t , ût ), namely
∂c ∂c 1 ∂2c ∂2c 1 ∂2c
c(xt , ut ) ≈ c(x̂t , ût ) + ∆x⊤t + ∆u⊤ t + ∆x⊤ t 2 ∆xt + ∆x⊤t ∆ut + ∆u⊤
t ∆ut ,
∂xt ∂ut 2 ∂xt ∂xt ut 2 ∂u2t
 ⊤   
1 0 ⊤
gx,t ⊤
gu,t 1 (81)
1
⇐⇒ c(xt , ut ) ≈ c(x̂t , ût ) + ∆xt  gx,t Hxx,t Hux,t ⊤  ∆xt  ,
2
∆ut gu,t Hux,t Huu,t ∆ut
with gradients
∂c ∂c
gx,t = , gu,t = ,
∂xt x̂t ,ût ∂ut x̂t ,ût
and Hessian matrices
∂2c ∂2c ∂2c
Hxx,t = , Huu,t = , Hux,t = .
∂x2t x̂t ,ût ∂u2t x̂t ,ût ∂ut ∂xt x̂t ,ût

8.1 Batch formulation of iLQR


iLQR_manipulator.*
 ⊤
A solution in batch form can be computed by minimizing over u = u⊤ ⊤ ⊤
1 , u2 , . . . , uT −1 , yielding a series of open
loop control commands ut , corresponding
 to a Gauss–Newton  iteration scheme, see Section 3.1.
 ⊤ ⊤ ⊤
⊤ ⊤
At a trajectory level, we denote x = x⊤
1 , x ⊤
2 , . . . , x T the evolution of the state and u = u1 , u2 , . . . , u⊤
T −1 the
evolution of the control commands. The evolution of the state in (80) becomes ∆x = Su ∆u, see Appendix A for
details.1
The minimization problem can then be rewritten in batch form as
 ⊤   
1 0 gx⊤ gu⊤
1
1
min ∆c(∆x, ∆u), s.t. ∆x = Su ∆u, where ∆c(∆x, ∆u) = ∆x gx Hxx Hux ⊤ 
∆x . (82)
∆u 2
∆u gu Hux Huu ∆u
By inserting the constraint into the cost, we obtain the optimization problem
1 1
∆u⊤ Su⊤ gx + ∆u⊤ gu + ∆u⊤ Su⊤ Hxx Su ∆u + ∆u⊤ Su⊤ Hux
min ⊤
∆u + ∆u⊤ Huu ∆u, (83)
∆u 2 2
which can be solved analytically by differentiating with respect to ∆u and equating to zero, namely,
Su⊤ gx + gu + Su⊤ Hxx Su ∆u + 2Su⊤ Hux

∆u + Huu ∆u = 0, (84)
providing the least squares solution
−1
∆û = (Su⊤ Hxx Su + 2Su⊤ Hux

+ Huu ) (−Su⊤ gx − gu ) , (85)
which can be used to update the control commands estimate at each iteration step of the iLQR algorithm.
The complete iLQR procedure is described in Algorithm 4 (including backtracking line search). Figure 17 also
presents an illustrative summary of the iLQR optimization procedure in batch form.
1 Note that Sx ∆x1 = 0 because ∆x1 = 0 (as we want our motion to start from x1 ).

24
Figure 17: Summary of the iLQR optimization procedure in batch form.

8.2 Recursive formulation of iLQR


iLQR_manipulator_recursive.*

A solution can alternatively be computed in a recursive form to provide a controller with feedback gains. Section 7.4
presented the dynamic programming principle in the context of linear quadratic regulation problems, which allowed us
to reduce the minimization over an entire sequence of control commands to a sequence of minimization problems over
control commands at a single time step, by proceeding backwards in time. In this section, the approach is extended
to iLQR.
Similarly to (68), we have here

v̂t (∆xt ) = min ct (∆xt , ∆ut ) + v̂t+1 (∆xt+1 ), (86)


∆ut | {z }
qt (∆xt ,∆ut )

Similarly to LQR, by starting from the last time step T , v̂T (∆xT ) is independent of the control commands. By
relabeling gx,T and Hxx,T as vx,T and Vxx,T , we can show that v̂T −1 has the same quadratic form as v̂T , enabling
the backward recursive computation of v̂t from t = T − 1 to t = 1.
This dynamic programming recursion takes the form
 ⊤  ⊤
 
1 0 vx,t+1 1
v̂t+1 = , (87)
∆xt+1 vx,t+1 Vxx,t+1 ∆xt+1

and we then have with (86) that


 ⊤   
1 0 ⊤
gx,t ⊤
gu,t 1  ⊤  ⊤
 
1 0 vx,t+1 1
v̂t = min ∆xt  gx,t Hxx,t ⊤
Hux,t   
∆xt + . (88)
∆ut ∆xt+1 vx,t+1 Vxx,t+1 ∆xt+1
∆ut gu,t Hux,t Huu,t ∆ut

By substituting ∆xt+1 = At ∆xt + Bt ∆ut into (88), v̂t can be rewritten as




 qx,t = gx,t + A⊤
t vx,t+1 ,
 ⊤   


 ⊤
1 0 ⊤
qx,t ⊤
qu,t 1  qu,t = gu,t + Bt vx,t+1 ,
v̂t = min ∆xt  qx,t Qxx,t Q⊤ ∆xt  , where Qxx,t ≈ Hxx,t + A⊤ t Vxx,t+1 At , (89)
∆ut
ux,t


∆ut qu,t Qux,t Quu,t ∆ut Quu,t ≈ Huu,t + Bt⊤ Vxx,t+1 Bt ,




Qux,t ≈ Hux,t + Bt⊤ Vxx,t+1 At ,

with gradients
∂c ∂c ∂v̂ ∂q ∂q
gx,t = , gu,t = , vx,t = , qx,t = , qu,t = ,
∂xt x̂t ,ût ∂ut x̂t ,ût ∂xt x̂t ,ût ∂xt x̂t ,ût ∂ut x̂t ,ût

Jacobian matrices
∂d ∂d
At = , Bt = ,
∂xt x̂t ,ût ∂ut x̂t ,ût

and Hessian matrices


∂2c ∂2c ∂2c ∂ 2 v̂
Hxx,t = , Huu,t = , Hux,t = , Vxx,t = ,
∂x2t x̂t ,ût ∂u2t x̂t ,ût ∂ut ∂xt x̂t ,ût ∂x2t x̂t ,ût

25
∂2q ∂2q ∂2q
Qxx,t = , Quu,t = , Qux,t = .
∂x2t x̂t ,ût ∂u2t x̂t ,ût ∂ut ∂xt x̂t ,ût

Minimizing (89) w.r.t. ∆ut can be achieved by differentiating the equation and equating to zero, yielding the
controller (
kt = −Q−1uu,t qu,t ,
∆ût = kt + Kt ∆xt , with (90)
Kt = −Q−1 uu,t Qux,t ,

where kt is a feedforward command and Kt is a feedback gain matrix.


By inserting (90) into (89), we get the recursive updates
−1
vx,t = qx,t − Q⊤
ux,t Quu,t qu,t ,
−1
(91)
Vxx,t = Qxx,t − Q⊤
ux,t Quu,t Qux,t .

In this recursive iLQR formulation, at each iteration step of the optimization algorithm, the nominal trajectories
(x̂t , ût ) are refined, together with feedback matrices Kt . Thus, at each iteration step of the optimization algorithm,
a backward and forward recursion is performed to evaluate these vectors and matrices. There is thus two types of
iterations: one for the optimization algorithm, and one for the dynamic programming recursion performed at each
given iteration.
After convergence, by using (90) and the nominal trajectories in the state and control spaces (x̂t , ût ), the resulting
controller at each time step t is
ut = ût + Kt (x̂t − xt ), (92)
where the evolution of the state is described by xt+1 = d(xt , ut ).
The complete iLQR procedure is described in Algorithm 5 (including backtracking line search).

8.3 Least squares formulation of recursive iLQR


First, note that (90) can be rewritten as
  h i
∆xt
∆ût = K̃t , with K̃t = −Q−1
uu,t Qux,t , qu,t . (93)
1

Also, (83) can be rewritten as


 ⊤    (
1 ∆u C c ∆u c = Su⊤ gx + gu ,
min ⊤ , where
∆u 2 1 c 0 1 C = Su⊤ Hxx Su + 2Su⊤ Hux

+ Huu .
| {z }
1 ⊤ ⊤
2 ∆u C∆u+∆u c

Similarly to Section 7.6, we set


   
∆x1 0
∆u = −F = −F = −F ∆x̃1 , (94)
1 1

and redefine the optimization problem as


1
1 F CF ∆x̃1 − ∆x̃1 F c.
∆x̃⊤ ⊤ ⊤ ⊤
min (95)
F 2
By differentiating w.r.t. F and equating to zero, we get

F ∆x̃1 = C −1 c. (96)

Similarly to Section 7.6, we decompose F as block matrices Ft with t ∈ {1, . . . , T − 1}. F can then be used
to iteratively reconstruct regulation gains Kt , by starting from K1 = F1 , P1 = I, and by computing with forward
recursion
−1
Pt = Pt−1 (At−1 − Bt−1 Kt−1 ) , Kt = Ft Pt , (97)
from t = 2 to t = T − 1.

26
Algorithm 3: Backtracking line search method with parameter αmin
α←1
while c(û + α ∆û) > c(û) and α > αmin do
α ← α2
end

Algorithm 4: Batch formulation of iLQR


Define cost c(·), dynamics d(·), x1 , αmin , ∆min
Initialize û
repeat
Compute x̂ using û, d(·), and x1
Compute At , Bt in (80)
Compute Su with (187)
Compute g· and H· in (82), using (81)
Compute ∆û with (85)
Compute α with Algorithm 3
Update û with (98)
until ∥∆û∥ < ∆min ;

Algorithm 5: Recursive formulation of iLQR


Define cost c(·), dynamics d(·), x1 , αmin , ∆min
Initialize all ût
repeat
Compute x̂ using û, d(·), and x1
// Evaluating derivatives
Compute all At , Bt in (80)
Compute all g·,t and H·,t in (81)
// Backward pass
Set vx,T = gx,T and Vxx,T = Hxx,T
for t ← T − 1 to 1 do
Compute q·,t and Q·,t with (89)
Compute kt and Kt with (90)
Compute vx,t and Vxx,t with (91)
end
// Forward pass
α←2
while c(û + α ∆û) > c(û) and α > αmin do
α ← α2
for t ← 1 to T − 1 do
Update ût with (99)
Compute x̂t+1 with d(·)
end
end
until ∥∆û∥ < ∆min ;
Use (92) and d(·) for reproduction

27
8.4 Updates by considering step sizes
iLQR_manipulator.*

To be more efficient, iLQR most often requires at each iteration to estimate a step size α to scale the control
command updates. For the batch formulation in Section 8.1, this can be achieved by setting the update (85) as

û ← û + α ∆û, (98)

where the resulting procedure consists of estimating a descent direction with (85) along which the objective function
will be reduced, and then estimating with (98) a step size that determines how far one can move along this direction.
For the recursive formulation in Section 8.2, this can be achieved by setting the update (90) as

ût ← ût + α kt + Kt ∆xt . (99)

In practice, a simple backtracking line search procedure can be considered with Algorithm 3, by considering a small
value for αmin . For more elaborated methods, see Ch. 3 of [13].
The complete iLQR procedures are described in Algorithms 4 and 5 for the batch and recursive formulations,
respectively.

8.5 iLQR with quadratic cost on f (xt )


iLQR_manipulator.*

We consider a cost defined by


c(xt , ut ) = f (xt )⊤ Qt f (xt ) + u⊤
t R t ut , (100)
where Qt and Rt are weight matrices trading off task and control costs. Such cost is quadratic on f (xt ) but non-
quadratic on xt .
For the batch formulation of iLQR, the cost in (81) then becomes

∆c(∆xt , ∆ut ) ≈ 2∆x⊤ ⊤ ⊤ ⊤ ⊤ ⊤


t J (xt ) Qt f (xt ) + 2∆ut Rt ut + ∆xt J (xt ) Qt J (xt )∆xt + ∆ut Rt ∆ut , (101)

which used gradients


gx,t = 2J (xt )⊤ Qt f (xt ), gu,t = 2Rt ut , (102)
and Hessian matrices
Hxx,t ≈ 2J (xt )⊤ Qt J (xt ), Hux,t = 0, Huu,t = 2Rt . (103)
∂f (xt )
with J (xt ) = ∂xt a Jacobian matrix. The same results can be used in the recursive formulation in (89).
At a trajectory level, the cost can be written as

c(x, u) = f (x)⊤ Qf (x) + u⊤


t R u, (104)

where the tracking and control weights are represented by the diagonally concatenated matrices Q = blockdiag(Q1 , Q2 , . . . , QT )
and R = blockdiag(R1 , R2 , . . . , RT −1 ), respectively. In the above, with a slight abuse of notation, we defined f (x) as a
vector concatenating the vectors f (xt ). Similarly, J (x) will represent a block-diagonal concatenation of the Jacobian
matrices J (xt ).
With this notation, the minimization problem (83) then becomes

min 2∆x⊤ J (x)⊤ Qf (x) + 2∆u⊤ R u + ∆x⊤ J (x)⊤ QJ (x)∆x + ∆u⊤ R∆u, s.t. ∆x = Su ∆u, (105)
∆u

whose least squares solution is given by


 −1  
∆û = Su⊤ J (x)⊤ QJ (x)Su +R − Su⊤ J (x)⊤ Qf (x) − R u , (106)

which can be used to update the control commands estimates at each iteration step of the iLQR algorithm.
In the next sections, we show examples of functions f (x) that can rely on this formulation.

28
Figure 18: Typical transformations involved in a manipulation task involving a robot, a vision system, a visual marker on the object,
and a desired grasping location on the object.

8.5.1 Robot manipulator


iLQR_manipulator.*

We define a manipulation task involving a set of transformations as in Fig. 18. By relying on these transformation
operators, we will next describe all variables in the robot frame of reference (defined by 0, e1 and e2 in the figure).
For a manipulator controlled by joint angle velocity commands u = ẋ, the evolution of the system is described by
∂g ∂g
xt+1 = xt + ut ∆t, with the Taylor expansion (80) simplifying to At = ∂x t
= I and Bt = ∂u t
= I∆t. Similarly, a
double integrator can alternatively be considered, with acceleration commands u = ẍ and states composed of both
positions and velocities.
For a robot manipulator, f (xt ) in (106) typically represents the error between a reference µt and the endeffector
position computed by the forward kinematics function f ee (xt ). We then have

f (xt ) = f ee (xt ) − µt ,
J (xt ) = J ee (xt ).

 vector f (xt ) − µt is replaced by a


For the orientation part of the data (if considered), the Euclidean distance ee

geodesic distance measure computed with the logarithmic map logµt f (xt ) , see [3] for details.
ee

The approach can similarly be extended to target objects/landmarks with positions µt and rotation matrices Ut ,
whose columns are basis vectors forming a coordinate system, see Fig. 20. We can then define an error between the
robot endeffector and an object/landmark expressed in the object/landmark coordinate system as

f (xt ) = Ut⊤ f ee (xt ) − µt ,
(107)
J (xt ) = Ut⊤ J ee (xt ).

8.5.2 Bounded joint space


The iLQR solution in (106) can be used to keep the state within a
boundary (e.g., joint angle limits). We denote f (x) = f cut (x) as the
vertical concatenation of f cut (xt ) and J (x) = J cut (x) as a diagonal
concatenation of diagonal Jacobian matrices J cut (xt ). Each element i
of f cut (xt ) and J cut (xt ) is defined as


xt,i − xi , if xt,i < xi
min min

fi (xt,i ) = xt,i − xi , if xt,i > xmax


cut max
, Figure 19: ReLU-like functions used in optimiza-


i
0, otherwise tion costs. The derivatives of these functions are
 simple, providing Jacobians whose entries are ei-

1, if xt,i < xmin
i
ther 0 or ±1.
cut
Ji,i (xt,i ) = 1, if xt,i > xmax ,


i
0, otherwise

where ficut (xt,i ) describes continuous ReLU-like functions for each dimension. ficut (xt,i ) is 0 inside the bounded domain
and takes the signed distance value outside the boundary, see Fig. 19-left.
We can see with (102) that for Q = 12 I, if x is outside the domain during some time steps t, gx = ∂x ∂c
= 2J ⊤ Qf
generates a vector bringing it back to the boundary of the domain.

29
Figure 20: Example of a viapoints task in which
a planar robot with 3 joints needs to sequentially
reach 2 objects, with object boundaries defining the
allowed reaching points on the objects surfaces. Left:
Reaching task with two viapoints at t = 25 and t =
50. Right: Corresponding values of the cost function
for the endeffector space at t = 25 and t = 50.

8.5.3 Bounded task space


The iLQR solution in (106) can be used to keep the endeffector within a boundary (e.g., endeffector position limits).
Based on the above definitions, f (x) and J (x) are in this case defined as
 
f (xt ) = f cut e(xt ) ,
 
J (xt ) = J cut e(xt ) J ee (xt ),
with e(xt ) = f ee (xt ).

8.5.4 Reaching task with robot manipulator and prismatic object boundaries
iLQR_manipulator.*

The definition of f (xt ) and J (xt ) in (107) can also be extended to objects/landmarks with boundaries by defining
 
f (xt ) = f cut e(xt ) ,
 
J (xt ) = J cut e(xt ) Ut⊤ J ee (xt ),

with e(xt ) = Ut⊤ f ee (xt ) − µt ,
see also Fig. 20.

8.5.5 Initial state optimization


iLQR_manipulator_initStateOptim.*

Similarly as in Section 7.1, iLQR can easily be ex-


tended to determine the optimal initial state x1 to-
gether with the optimal control commands u, by defining
⊤ ⊤
ũ = [x⊤1 , u ] as a vector concatenating x1 and u.
If the system evolution is linear, x = S ũ with S =
 be used, as in Section 7.1.
[Sx , Su ] can additionally
0 0
By defining R̃ = to remove any constraint
0 R
on the initial state, the optimization problem to estimate
both the initial state x1 and the sequence of control com-
mands u then becomes
min f (x)⊤ Qf (x) + ũ⊤R̃ũ, (108)

where f (x) denotes the concatenation of f (xt ) for all


Figure 21: Optimal robot base location for a viapoint task, by
time steps. defining a kinematic chain with both revolute and prismatic joints,
If only some parts of x1 need to be estimated, the which is employed to encode the location of the robot base in the
corresponding diagonal entries in R̃ can be set with ar- plane as two prismatic joint variables. In this example, the pris-
matic joints in green represent the initial states that we estimate,
bitrary high values, by reformulating the minimization and the angular joints in blue represent the state variables that can
problem as be moved during the task.

min f (x)⊤ Qf (x) + (ũ − µ) R̃(ũ − µ), (109)

30
Figure 23: Reaching tasks
with a bimanual robot
(frontal view). Left: with a
target for each hand. Right:
with a target for the hand
on the left, while keeping the
center of mass at the same
location during the whole
movement.

where µ is a vector containing the initial elements in x1 that do not need to be estimated, where the other elements
are zero.
The above approach can for example be used to estimate the optimal placement of a robot manipulator. For a
planar robot, this can for example be implemented by defining the kinematic chain starting with two prismatic joints
along the two axes directions, providing a way to represent the location of a free floating platform as two additional
prismatic joints in the kinematic chain. If we would like the base of the robot to remain static, we can request that
these first two prismatic joints will stay still during the motion. By using the above formulation, this can be done by
setting arbitrary large control weights for these corresponding joints, see Figure 21 and the corresponding source code
example.

8.5.6 Center of mass


iLQR_manipulator_CoM.*

If we assume an equal mass for each link concentrated at the joint (i.e.,
assuming that the motors and gripper are heavier than the link structures), the
forward kinematics function to determine the center of a mass of an articulated
chain can simply be computed with
 
CoM
1 ℓ⊤ L cos(Lx)
f = ,
D ℓ⊤ L sin(Lx)

which corresponds to the row average of f˜ in (18) for the first two columns
(position). The corresponding Jacobian matrix is
 
1 − sin(Lx)⊤ L diag(ℓ⊤ L)
J CoM = .
D cos(Lx)⊤ L diag(ℓ⊤ L)
The forward kinematics function f CoM can be used in tracking tasks similar
to the ones considering the endeffector, as in Section 8.5.4. Fig. 22 shows
an example in which the starting and ending poses are depicted in different
grayscale levels. The corresponding center of mass is depicted by a darker
semi-filled disc. The area allowed for the center of mass is depicted as a
transparent red rectangle. The task consists of reaching the green object with Figure 22: Reaching task with a hu-
the endeffector at the end of the movement, while always keeping the center manoid (side view) by keeping the center
of mass in an area defined by the feet lo-
of mass within the desired bounding box during the movement. cation.

8.5.7 Bimanual robot


iLQR_bimanual.*
We consider a 5-link planar bimanual robot with a torso link shared by the two arms, see Fig. 23. We define the
forward kinematics function as  ⊤ 
ℓ[1,2,3] cos(Lx[1,2,3] )
 ℓ⊤
[1,2,3] sin(Lx[1,2,3] ) 

f =ℓ⊤ ,
[1,4,5] cos(Lx [1,4,5] )

ℓ[1,4,5] sin(Lx[1,4,5] )
where the first two and last two dimensions of f correspond to the position of the left and right endeffectors, respec-
tively. The corresponding Jacobian matrix J ∈ R4×5 has entries
   
− sin(Lx[1,2,3] )⊤ diag(ℓ[1,2,3] )L − sin(Lx[1,4,5] )⊤ diag(ℓ[1,4,5] )L
J[1,2],[1,2,3] = , J[3,4],[1,4,5] = ,
cos(Lx[1,2,3] )⊤ diag(ℓ[1,2,3] )L cos(Lx[1,4,5] )⊤ diag(ℓ[1,4,5] )L

31
where the remaining entries are zeros.
If we assume a unit mass for each arm link concentrated at the joint and a mass of two units at the tip of the first
link (i.e., assuming that the motors and gripper are heavier than the link structures, and that two motors are located
at the tip of the first link to control the left and right arms), the forward kinematics function to determine the center
of a mass of the bimanual articulated chain in Fig. 23 can be computed with
 
1 ℓ⊤ ⊤
[1,2,3] L cos(Lx[1,2,3] ) + ℓ[1,4,5] L cos(Lx[1,4,5] )
f CoM = ,
6 ℓ⊤ ⊤
[1,2,3] L sin(Lx[1,2,3] ) + ℓ[1,4,5] L sin(Lx[1,4,5] )

with the corresponding Jacobian matrix J CoM ∈ R2×5 computed as


h i
CoM-L CoM-R CoM-L CoM-R
J CoM = J[1,2],1 + J[1,2],1 , J[1,2],[2,3] , J[1,2],[2,3] ,
   
1 − sin(Lx[1,2,3] )⊤ L diag(ℓ⊤[1,2,3] L)
1 − sin(Lx[1,4,5] )⊤ L diag(ℓ⊤[1,4,5] L)
with J CoM-L = , J CoM-R = .
6 cos(Lx[1,2,3] )⊤ L diag(ℓ⊤[1,2,3] L) 6 cos(Lx[1,4,5] )⊤ L diag(ℓ⊤[1,4,5] L)

8.5.8 Obstacle avoidance with ellipsoid shapes


iLQR_obstacle.*
By taking as example a point-mass system, iLQR can be used to solve
an ellipsoidal obstacle avoidance problem with the cost (typically used with
other costs, see Fig. 24 for an example). We first define a ReLU-like function
and its gradient as (see also Fig. 19-right)
( (
0, if d > 1 0, if d > 1
f avoid (d) = , g avoid (d) = ,
1 − d, otherwise −1, otherwise

that we exploit to define f (xt ) and J (xt ) in (106) as


   
J (xt ) = 2 g avoid e(xt ) (xt − µ) Σ−1 ,

f (xt ) = f avoid e(xt ) ,
e(xt ) = (xt − µ) Σ−1 (xt − µ).

with
In the above, f (x) defines a continuous function that is 0 outside the
obstacle boundaries and 1 at the center of the obstacle. The ellipsoid is Figure 24: Reaching task with obstacle
defined by a center µ and a shape Σ = V V ⊤ , where V is a horizontal avoidance, by starting from x1 and reach-
concatenation of vectors Vi describing the principal axes of the ellipsoid, ing µT while avoiding the two obstacles in
see Fig. 24. red.
A similar principle can be applied to robot manipulators by composing
forward kinematics and obstacle avoidance functions.

8.5.9 Distance to a target


iLQR_distMaintenance.*
The obstacle example above can easily be extended to the prob-
lem of reaching/maintaining a desired distance to a target, which can
also typically used with other objectives. We first define a function
and a gradient
f dist (d) = 1 − d, g dist (d) = −1,
that we exploit to define f (xt ) and J (xt ) in (106) as
  2
f (xt ) = f dist e(xt ) ,

J (xt ) = − 2 (xt − µ) , Figure 25: Left: Optimal control problem to gen-
r erate a motion by considering a point mass starting
1 ⊤
from an initial point (in black), with a cost asking
with e(xt ) = 2 (xt − µ) (xt − µ). to reach or maintain a desired distance to a target
r point (in red). Right: Similar problem for a contour
In the above, f (x) defines a continuous function that is 0 on a to reach/maintain defined as a covariance matrix.
sphere of radius r centered on the target (defined by a center µ),
and increasing/decreasing when moving away from this surface in
one direction or the other.
Similarly, the error can be weighted by a covariance matrix Σ, by using e(xt ) = (xt − µ) Σ−1 (xt − µ) and

⊤ −1
J (xt ) = −2(xt − µ) Σ , which corresponds to the problem of reaching/maintaining the contour of an ellipse.
Figure 25 presents examples with a point mass.

32
Figure 26: Example of iLQR optimization with a cost on manipulability, where
the goal is to reach, at the end of the motion, a desired manipulability defined
at the center of mass (CoM) of a bimanual planar robot. The initial pose of
the bimanual robot is displayed in light gray and the final pose of the robot
is displayed in dark gray. The desired manipulability is represented by the red
ellipse. The achieved manipulability at the end of the motion is represented by
the gray ellipse. Both manipulability ellipses are displayed at the CoM reached
at the end of the motion, displayed as a semi-filled dark gray disc (the CoM at
the beginning of the motion is displayed in lighter gray color). We can see that
the posture adopted by the robot to approach the desired manipulability is to
extend both arms, which is the pose allowing the robot to swiftly move its center
of mass in case of unexpected perturbations.

8.5.10 Manipulability tracking


iLQR_bimanual_manipulability.*

Skills transfer can exploit manipulability ellipsoids in the form of geometric descriptors representing the skills to be
transferred to the robot. As these ellipsoids lie on symmetric positive definite (SPD) manifolds, Riemannian geometry
can be used to learn and reproduce these descriptors in a probabilistic manner [3].
Manipulability ellipsoids come in different flavors. They can be defined for either positions or forces (including the
orientation/rotational part). Manipulability can be described at either kinematic or dynamic levels, for either open
or closed linkages (e.g., for bimanual manipulation or in-hand manipulation), and for either actuated or non-actuated
joints [16]. This large set of manipulability ellipsoids provide rich descriptors to characterize robot skills for robots
with legs and arms.
Manipulability ellipsoids are helpful descriptors to handle skill transfer problems involving dissimilar kinematic
chains, such as transferring manipulation skills between humans and robots, or between two robots with different
kinematic chains or capabilities. In such transfer problems, imitation at joint angles level is not possible due to
the different structures, and imitation at endeffector(s) level is limited, because it does not encapsulate postural
information, which is often an essential aspect of the skill that we would like to transfer. Manipulability ellipsoids
provide intermediate descriptors that allows postural information to be transferred indirectly, with the advantage that
it allows different embodiments and capabilities to be considered in the skill transfer process.

The manipulability ellipsoid M (x) = J (x)J (x) is a symmetric positive definite matrix representing the manip-
ulability at a given point on the kinematic chain defined by a forward kinematics function f (x), given the joint angle
posture x, where J (x) is the Jacobian of f (x). The determinant of M (x) is often used as a scalar manipulability
index indicating the volume of the ellipsoid [19], with the drawback of ignoring the specific shape of this ellipsoid.
In [7], we showed that a geometric cost on manipulability can alternatively be defined with the geodesic distance
1
A = log S − 2 M (x)S − 2 ,
1
c = ∥A∥2F , with
X X ⊤
⇐⇒ c = trace(AA⊤ ) = trace(Ai A⊤ i )= A⊤ i Ai = vec(A) vec(A), (110)
i i

where S is the desired manipulability matrix to reach, log(·) is the logarithm matrix function, and ∥ · ∥F is a Frobenius
norm. By exploiting the trace properties, we can see that (110) can be expressed in a quadratic form, where the
vectorization operation vec(A) can be computed efficiently by exploiting the symmetry of the matrix A. This is
implemented by keeping only the lower half of the matrix, with the elements below the diagonal multiplied by a factor

2.
The derivatives of (110) with respect to the state x and control commands u can be computed numerically (as in
the provided example), analytically (by following an approach similar to the one presented in [7]), or by automatic
differentiation. The quadratic form of (110) can be exploited to solve the problem with Gauss–Newton optimization,
by using Jacobians (see description in Section 5.1 within the context of inverse kinematics). Marić et al. demonstrated
the advantages of a geometric cost as in (110) for planning problems, by comparing it to alternative widely used
metrics such as the manipulability index and the dexterity index [9].
Note here that manipulability can be defined at several points of interest, including endeffectors and centers of
mass. Figure 26 presents a simple example with a bimanual robot. Manipulability ellipsoids can also be exploited as
descriptors for object affordances, by defining manipulability at the level of specific points on objects or tools held by
the robot. For example, we can consider a manipulability ellipsoid at the level of the head of a hammer. When the
robot grasps the hammer, its endeffector is extended so that the head of the hammer becomes the extremity of the
kinematic chain. In this situation, the different options that the robot has to grasp the hammer will have an impact on
the resulting manipulability at the head of the hammer. Thus, grabbing the hammer at the extremity of the handle will
improve the resulting manipulability. Such descriptors offer promises for learning and optimization in manufacturing
environments, in order to let robots automatically determine the correct ways to employ tools, including grasping
points and body postures.

33
The above iLQR approach, with a geodesic cost on SPD manifolds, has been presented for manipulability ellipsoids,
but it can be extended to other descriptors represented as ellipsoids. In particular, stiffness, feedback gains, inertia,
and centroidal momentum have similar structures. For the latter, the centroidal momentum ellipsoid quantifies the
momentum generation ability of the robot [14], which is another useful descriptor for robotics skills. Similarly to
manipulability ellipsoids, it is constructed from Jacobians, which are in this case the centroidal momentum matrices
mapping the joint angle velocities to the centroidal momentum, which sums over the individual link momenta after
projecting each to the robot’s center of mass.
The approach can also be extended to other forms of symmetric positive definite matrices, such as kernel matrices
used in machine learning algorithms to compute similarities, or graph graph Laplacians (a matrix representation of a
graph that can for example be used to construct a low dimensional embedding of a graph).

8.5.11 Decoupling of control commands


In some situations, we would like to control a robot by avoiding that all degrees of
freedom are controlled at the same time. For example, we might in some cases prefer
that a mobile manipulator stops moving the arm while the platform is moving and vice
versa.
For a 2 DOFs robot controlled at each time step t with two control command u1,t
and u2,t , the corresponding cost to find the full sequence of control commands for a Figure 27: 2D reaching task
duration T (gathered in a concatenated vector u) is problem (initial point in black
and target point in red), with
X
T
2
an additional cost to favor the
min (u1,t u2,t ) , decoupling of the control com-
u mands.
t=1

with corresponding residuals and Jacobian given by


 
rt = u1,t u2,t , Jt = u2,t , u1,t , ∀t ∈ {1, . . . , T }.
The Gauss–Newton update rule in concatenated vector form is given by
−1 ⊤
uk+1 ← uk − α J ⊤ J J r,
where α is a line search parameter, r is a vector concatenating vertically all residuals rt , and J is a Jacobian matrix
with Jt as block diagonal elements. With some linear algebra machinery, this can also be computed in batch form
 
using r = u1 ⊙ u2 , and J = (IT −1 ⊗ 11×2 ) diag IT −1 ⊗ (12×2 − I2 ) u , with ⊙ and ⊗ the elementwise product and
Kronecker product operators, respectively.
Figure 27 presents a simple example within a 2D reaching problem with a point mass agent and velocity commands.

8.5.12 Curvature
The curvature of a 2-dimensional curve is defined as
ẋ1 ẍ2 − ẋ2 ẍ1
κ= 3 . (111)
(ẋ21 + ẋ22 ) 2
We describe the system state in vector form as
 
x
x = ẋ . (112)

By defining a set of selection vectors si,j so that ẋ1 = s⊤
1,1 x, ẋ2 =
s1,2 x, ẍ1 = s⊤

2,1 x, ẍ2 = s ⊤
2,2 x, we can observe that
ẋ1 ẍ2 = (s⊤ ⊤ ⊤ ⊤
1,1 x)(s2,2 x) = x (s1,1 s2,2 )x, (113)
where (s1,1 s⊤
2,2 ) is a selection matrix.
By leveraging this matrix formulation, the curvature in (111) can
be expressed as Figure 28: Optimal control problem to generate a
⊤ − 32 ⊤ motion by considering a point mass starting from an
κ(x) = (x SB x) x SA x, (114)
initial point (in black), with a cost asking to pass
2,2 − s1,2 s2,1 and SB = s1,1 s1,1 +
with selection matrices SA = s1,1 s⊤ ⊤ ⊤ through a set of viapoints (in red), together with a
⊤ cost on curvature (black path).
s1,2 s1,2 .
With this formulation, by using the derivative property (f g)′ = f ′ g + f g ′ , and by observing that SB is a symmetric
matrix and that SA is an asymmetric matrix, the derivatives of (111) w.r.t x form the Jacobian
∂κ(x)
= (x⊤ SB x)− 2 (SA + SA )x − 3 (x⊤ SA x) (x⊤ SB x)− 2 SB x.
3 5

J (x) = (115)
∂x
Figure 28 presents an example with a point mass.

34
Figure 29: Reaching task with control primitives.

8.6 iLQR with control primitives


iLQR_CP.*

The use of control primitives can be extended to iLQR, by considering ∆u = Ψ∆w. For problems with quadratic
cost on f (xt ) (see previous Section), the update of the weights vector is then given by
 −1  
∆ŵ = Ψ⊤ Su⊤ J (x)⊤ QJ (x)Su Ψ + Ψ⊤ R Ψ − Ψ⊤ Su⊤ J (x)⊤ Qf (x) − Ψ⊤ R u . (116)

For a 5-link kinematic chain, by setting the coordination matrix in (64) as


 
−1 0 0
2 0 0
 
C= −1 0 0,
0 1 0
0 0 1

the first three joints are coordinated such that the third link is always oriented in the same direction, while the last
two joints can move independently. Figure 29 shows an example for a reaching task, with the 5-link kinematic chain
depicting a humanoid robot that needs to keep its torso upright.
Note also that the iLQR updates in (116) can usually be computed faster than in the original formulation, since a
matrix of much smaller dimension needs to be inverted.

8.7 iLQR for spacetime optimization


We define a phase variable st starting from s1 = 0
at the beginning of the movement. With an aug-

mented state xt = [x⊤ t , st ] and control command
x⊤ s ⊤
ut = [ut , ut ] , we define the evolution of the sys-
tem xt+1 = d(xt , ut ) as

xt+1 = At xt + Bt uxt ust ,


st+1 = st + ust .

Its first order Taylor expansion around x̂, û pro-


vides the linear system
       
∆xt+1 At 0 ∆xt Bt ust Bt uxt ∆uxt
= + ,
∆st+1 0 1 ∆st 0 1 ∆ust Figure 30: Spacetime optimization with iLQR for a viapoints task,
| {z } | {z } | {z } | {z } | {z }
∆xt+1 At ∆xt Bt ∆ut where the estimated durations between consecutive viapoints are shown
in blue.
∂d ∂d
with Jacobian matrices At = ∂x t
, Bt = ∂u t
. Note that in the above equations, we used here the notation xt and xt to
describe the standard state and augmented state, respectively. We similarly used {At , Bt } and {At , Bt } (with slanted
font) for standard linear system and augmented linear system.
Figure 30 shows a viapoints task example in which both path and duration are optimized (convergence after 10
iterations when starting from zero commands). The example uses a double integrator as linear system defined by
constant At and Bt matrices.

8.8 iLQR with offdiagonal elements in the precision matrix


iLQR_manipulator_object_affordance.*

35
Figure 31: Manipulation with object affordances cost. The
depicted pick-and-place task requires the offset at the pick-
ing location to be the same as the offset at the placing loca-
tion, which can be achieved by setting a precision matrix with
nonzero offdiagonal entries, see main text for details. In this
example, an additional cost is set for choosing the picking and
placing points within the object boundaries, which was also
used for the task depicted in Fig. 20. We can see with the
resulting motions that when picking the object, the robot an-
ticipates what will be done later with this object, which can
be viewed as a basic form of object affordance. In the two sit-
uations depicted in the figure, the robot efficiently chooses a
grasping point close to one of the corners of the object, differ-
ent in the two situations, so that the robot can bring the object
at the desired location with less effort and without reaching
joint angle limits.

Figure 32: iLQR for car and bicopter control/planning problems.

 I −I 
Spatial and/or temporal constraints can be considered by setting −I I in the corresponding entries of the preci-
sion matrix Q. With this formulation, we can constrain two positions to be the same without having
 1 −1  xi to predetermine

the position at which the two should meet. Indeed, we can see that a cost c = [ xxji ] −1 1 [ xj ] = (xi − xj )2 is
minimized when xi = xj .
Such cost can for example be used to define costs on object affordances, see Fig. 31 for an example.

8.9 Car steering


iLQR_car.*
Velocity Control Input
A car motion with position (x1 , x2 ), car orientation θ, and front wheels orientation ϕ (see Fig. 32-left) is characterized
by equations
u1
ẋ1 = u1 cos(θ), ẋ2 = u1 sin(θ), θ̇ = tan(ϕ), ϕ̇ = u2 ,

where ℓ is the distance between the front and back axles, u1 is the back wheels velocity command and u2 is the front
wheels steering velocity command. Its first order Taylor expansion around (θ̂, ϕ̂, û1 ) is
      
∆ẋ1 0 0 −û1 sin(θ̂) 0 ∆x1 cos(θ̂) 0  
∆ẋ2  0     0
 = 0 û1 cos(θ̂) 0  ∆x2  +  sin(θ̂)  ∆u1 , (117)
 ∆θ̇  0 0 0 û1
(tan(ϕ̂)2 +1)  ∆θ   1ℓ tan(ϕ̂) 0 ∆u2

∆ϕ̇ 0 0 0 0 ∆ϕ 0 1
which can then be converted to a discrete form with (191).

Acceleration Control Input


A car motion with position (x1 , x2 ), car orientation θ, and front wheels orientation ϕ (see Fig. 32-left) is characterized
by equations
v
ẋ1 = v cos(θ), ẋ2 = v sin(θ), θ̇ = tan(ϕ), v̇ = u1 , ϕ̇ = u2 ,

where ℓ is the distance between the front and back axles, u1 is the back wheels acceleration command and u2 is the
front wheels steering velocity command. Its first order Taylor expansion around (θ̂, ϕ̂, v̂) is
      
∆ẋ1 0 0 −v̂ sin(θ̂) cos(θ̂) 0 ∆x1 0 0
∆ẋ2  0 0 v̂ cos(θ̂) sin(θ̂) 0   ∆x   0 0  
    2
   ∆u1
 ∆θ̇  = 0 (tan(ϕ̂)2 +1)
 ∆θ  + 0 0 (118)
1 v̂ ,
   0 0 tan(ϕ̂)
 ∆v̇  0 0 0

0

0   ∆v  1 0
∆u2
∆ϕ̇ 0 0 0 0 0 ∆ϕ 0 1

36
which can then be converted to a discrete form with (191).

8.10 Bicopter
iLQR_bicopter.*

A planar bicopter of mass m and inertia I actuated by two propellers at distance ℓ (see Fig. 32-right) is characterized
by equations
1 1 ℓ
ẍ1 = − (u1 + u2 ) sin(θ), ẍ2 = (u1 + u2 ) cos(θ) − g, θ̈ = (u1 − u2 ),
m m I
with acceleration g = 9.81 due to gravity. Its first order Taylor expansion around (θ̂, û2 , û2 ) is
      
∆ẋ1 0 0 0 1 0 0 ∆x1 0 0
∆ẋ2  0 0 0 0 1 0 ∆x2   0 0 
       
 ∆θ̇  0 0 0 0 0 1  ∆θ   0 0  ∆u1
    + 1  (119)
∆ẍ1  = 0 0     ,
 ∆ẋ1  −1m sin(θ̂) −1m sin(θ̂) ∆u2
1
   0 −m
1
(û1 + û2 ) cos(θ̂) 0 0
∆ẍ2  0 0 −m
1
(û1 + û2 ) sin(θ̂) 0 0 0   ∆ ẋ 2   cos(θ̂) m cos(θ̂) 
m
∆θ̈ 0 0 0 0 0 0 ∆θ̇ ℓ
I
− Iℓ

which can then be converted to a discrete form with (191).

9 Ergodic control
Ergodic theory studies the connection between the time-averaged and space-averaged behaviors of a dynamical sys-
tem. By using it in a search and exploration context, it enables optimal exploration of an information distribution.
Exploration problems can be formulated in various ways, see Figures 33 and 34.
Research in ergodic control addresses fundamental challenges linking machine learning, optimal control, signal
processing and information theory. A conventional tracking problem in robotics is characterized by a target to reach,
requiring a controller to be computed to reach this target. In ergodic control, instead of providing a single target point,
a probability distribution is given to the robot, which must cover the distribution in an efficient way. Ergodic control
thus consists of moving within a spatial distribution by spending time in each part of the distribution in proportion
to its density. The term ergodicity corresponds here to the difference between the time-averaged spatial statistics of
the agent’s trajectory and the target distribution to search in. By minimizing this difference, the resulting controller
generates natural exploration behaviors.
In robotics, ergodic control can be exploited in a wide range of problems requiring the automatic exploration of
regions of interest, which can be used in a wide range of tasks, including active sensing, localization, surveillance,
or insertion (see Figure 35 for an example). This is particularly helpful when the available sensing information or
robot actuators are not accurate enough to fulfill the task with a standard controller (limited vision, soft robotic
manipulator, etc.), but where this information can still guide the robot towards promising areas. In a collaborative
task, it can also be used when the operator’s input is not accurate enough to fully reproduce the task, which then
requires the robot to explore around the requested input (e.g., a point of interest selected by the operator). For picking
and insertion problems, ergodic control can be applied to move around the picking/insertion point, thereby facilitating
the prehension/insertion. It can also be employed for active sensing and localization (either detected autonomously,
or with help by the operator). Here, the robot can plan movements based on the current information density, and
can recompute the commands when new measurements are available (i.e., updating the spatial distribution used as
target).
Ergodic control is originally formulated as a spectral multiscale coverage (SMC) objective [10], which we will see
next. Other ergodic control techniques have later been proposed, including the heat equation driven area coverage
(HEDAC) [6], which we will also be introduced next.

9.1 Spectral-based ergodic control (SMC)


The underlying objective of spectral multiscale coverage (SMC) takes a simple form, corresponding to a tracking
problem in the spectral domain (matching of frequency components) [10]. The advantage of such a simple control
formulation is that it can be easily combined with other control objectives and constraints.
It requires the spatial distribution to be decomposed as Fourier series, with a cost function comparing the spectral
decomposition of the robot path with the spectral decomposition of the distribution, in the form of a weighted distance
function with a decreasing weight from low frequency to high frequency components. The resulting controller allows
the robot to explore the given spatial distribution in a natural manner, by starting from a crude exploration
and by refining the search progressively (i.e., matching the Fourier coefficients with an increasing importance
from low to high frequency components).

37
Figure 33: Search and exploration problems can be formulated in various ways. Ergodic control can be used in a search problem in
which we do not know the duration of the exploration (e.g., we would like to find a hidden target as soon as possible). In contrast to
stochastic sampling of a distribution, ergodic control takes into account the cost to move the agent from one point to another. If the search
optimization problem is formulated as a coverage problem with fixed duration, the optimal path typically reveals some regular coverage
patterns (here, illustrated as spirals, but also called lawnmower strategy). Ergodic control instead generates a more natural search behavior
to localize the object to search by starting with a crude coverage of the workspace and by then generating progressively more detailed
paths to cover the distribution. This for example resembles the way we would search for keys in a house by starting with a distribution
about the locations in the rooms in which we believe the keys could be, and by covering these distribution coarsely as a first pass, followed
by progressive refinement about the search with more detailed coverage until the keys are found. Both stochastic samples or patterned
search would be very inefficient here!

Ergodic search Patterned search


Figure 34: Example illustrating the search of a car key in
a living room, where regions of interest are specified in the
form of a distribution over the spatial domain. These regions
of interest correspond here to the different tables in the room
where it is more likely to find the key. The left and right
columns respectively depict an ergodic search and patterned
search processes. For a coverage problem in which the dura-
tion is given in advance, a patterned search can typically be
better suited than an ergodic control search, because the agent
will optimize the path to reduce the number of transitions from
one area to the other. In contrast, an ergodic control search
will typically result in multiple transitions between the differ-
ent regions of interest. When searching for a key, since we are
interested in finding the key as early as possible, we have a
problem in which the total search duration cannot be specified
in advance. In this problem setting, an ergodic search will be
better suited than a patterned search, as it generates a natural
way of exploring the environment by first proceeding with a
crude search in the whole space and by then progressively fine-
tuning the search with additional details until the key is finally
found. The bottom row shows an example in which the key lo-
cation is unknown, with the search process stopping when the
key is found. Here, a patterned search would provide an inef-
ficient and unnatural way of scanning the whole environment
regularly by looking at all details regions by regions, instead of
treating the problem at different scales (i.e., at different spatial
frequencies), which is what ergodic exploration provides.

Figure 35: Insertion tasks can be achieved robustly with ergodic control, by not only relying on accurate sensors, but instead using a
control strategy to cope with limited or inaccurate sensing information, see [18] for details.

38
Table 1: Fourier series properties (1D case).

Symmetry property: 
If g(x) is real and even, ϕk (x) in (121) is also real and even, simplifying to ϕk (x) = L1 cos 2πkx
L , which then, in
practice, only needs an evaluation on the range k ∈ [0, . . . , K − 1], as the basis functions are even. We then have
PK−1 
g(x) = w0 + k=1 wk 2 cos 2πkx L , by exploiting cos(0) = 1.
Shift property:
L )wk are the Fourier coefficients of g(x − µ).
If wk are the Fourier series coefficients of a function g(x), exp(−i 2πkµ
Combination property:
If wk,1 (resp. wk,2 ) are the Fourier series coefficients of a function g1 (x) (resp. g2 (x)), then α1 wk,1 + α2 wk,2 are the
Fourier coefficients of α1 g1 (x) + α2 g2 (x).
Gaussian property:
x2
If g0 (x) = N (x | 0, σ 2 ) = (2πσ 2 )− 2 exp(− 2σ
1
2 ) is mirrored to create a real and even periodic function g(x) of

period L ≫ σ (implementation details will follow), the corresponding Fourier series coefficients are of the form
2 2 2
wk = exp(− 2π Lk2 σ ).

9.1.1 Unidimensional SMC


We will adopt a notation to make links with the superposition of basis functions seen in Section 6. By starting with
the unidimensional case (system moving along a line segment), we will consider a signal g(x) varying along a variable
x, where x will be used as a generic variable that can for example be a time variable or the coordinates of a pixel in
an image. The signal g(x) can be approximated as a weighted superposition of basis functions with

X
K−1
g(x) = wk ϕk (x). (120)
k=−K+1

The use of Fourier basis functions provides useful connections between the spatial domain and the spectral domain,
where wk and ϕk (x) denote the coefficients and basis functions of the Fourier series.
For a spatial signal x on the interval [− L2 , L2 ] of period L, the basis functions of the Fourier series with complex
exponential functions will be defined as
 
1 2πkx
ϕk (x) = exp −i
2L L
   !
1 2πkx 2πkx
= cos − i sin , ∀k ∈ [−K +1, . . . , K −1], (121)
2L L L

with i the imaginary unit of a complex number (i2 = −1).


As detailed in Table 1, when g(x) is real and even (our case of interest), ϕk (x) in (121) is also real and even,
simplifying to ϕk (x) = L1 cos 2πkx
L , which then, in practice, only needs an evaluation on the range k ∈ [0, . . . , K −1],
using  
1 2πkx
ϕk (x) = cos , ∀k ∈ [0, . . . , K − 1]. (122)
L L
The derivatives of (122) with respect to x are easy to compute, namely
 
2πk 2πkx
∇x ϕk (x) = − 2 sin . (123)
L L

Ergodic control aims to build a controller so that the Fourier series coefficients wk along a trajectory x(t) of
duration t, defined as Z
1 t 
wk = ϕk x(τ ) dτ, (124)
t τ =0
will match the Fourier series coefficients ŵk on the spatial domain X , defined as
Z
ŵk = ĝ(x) ϕk (x) dx. (125)
x∈X

39
The discretized version of (124) for a discrete number of time steps T is

1X
T
wk = ϕk (xs ), (126)
T s=1

which can be computed recursively by updating wk at each iteration step.


Similarly, the discretized version of (125) for a given spatial resolution can for example be computed on a domain
X discretized in X bins as
X
X
ŵk = ĝ(xs ) ϕk (xs ), (127)
s=1

where xs splits the domain X into X regular intervals.


To compute these Fourier series coefficients efficiently, several Fourier series properties can be exploited, including
general property such as symmetry, shift, and linear combination, as well as more specific property in the special case
when the spatial distribution takes the form of a mixture of Gaussians. These properties are reported in Table 1 for
the 1D case.
Well-known applications of Fourier basis functions in the context of time series include speech processing and the
analysis of periodic motions such as gaits. In ergodic control, the aim is instead to find a series of control commands
u(t) so that the retrieved trajectory x(t) covers a bounded space X in proportion of a desired spatial distribution ĝ(x).
This can be achieved by defining a metric in the spectral domain, by decomposing in Fourier series coefficients both
the desired spatial distribution ĝ(x) and the (partially) retrieved trajectory x(t), which can exploit the Fourier series
properties presented in Table 1, which can also be extended to more dimensions.
The goal of ergodic control is to minimize

1 X
K−1  2
ϵ= Λk wk − ŵk (128)
2
k=0

where Λk are weights, ŵk are the Fourier series coefficients of ĝ(x), and wk are the Fourier series coefficients along the
trajectory x(t). k ∈ [0, 1, . . . , K −1] is a set of K index vectors. In (128), the weights
−1
Λk = 1 + k 2 (129)

assign more importance on matching low frequency components (related to a metric for Sobolev spaces of negative
order).
Practically, the controller will mainly focus on the lowest frequency components at the beginning as these compo-
nents have a large effect on the cost due to the large weights. When the errors become zero for these components, the
controller can then try to progressively reduce higher frequency components.
Ergodic control is then set as the constrained problem of computing a control command û(t) at each time step t
with  
û(t) = arg min ϵ x(t+∆t) , s.t. ẋ(t) = f x(t), u(t) , ∥u(t)∥ ⩽ umax , (130)
u(t)

where the simple system u(t) = ẋ(t) is considered (control with velocity commands), and where the error term is
approximated with the Taylor series
   1 
ϵ x(t+∆t) ≈ ϵ x(t) + ϵ̇ x(t) ∆t + ϵ̈ x(t) ∆t2 . (131)
2

By using (128),
 (124), (122) and the chain rule ∂f ∂f ∂x
∂t = ∂x ∂t , the Taylor series is composed of the control term u(t)
and ∇x ϕk x(t) ∈ R, the gradient of ϕk x(t) with respect to x(t). Solving the constrained objective in (156) then
results in the analytical solution (see [10] for the complete derivation)

umax  X
K−1
 
u(t) = ũ(t) = sign u(t) umax , with ũ(t) = − Λk wk − ŵk ∇x ϕk x(t)
∥ũ(t)∥
k=0

It is important to emphasize that the implementation of this controller does not rely on any random number
generation: this search behavior is instead deterministic.

Computation in vector form

ergodic_control_SMC_1D.*

40
Figure 36: Unidimensional SMC problem computed with iLQR. A series of control commands is used as initial guess (see timeline plot in
gray on the bottom-left), which is then iteratively refined by iLQR (see timeline plot in black on the bottom-left). The top plots shows the
reference distribution (in red) and the reconstructed distributions (in gray for initial guess and black after iLQR), in both spatial domain
(top-left) and spectral domain (top-right).

The superposition of basis functions in (120) can be rewritten as

g(x) = w⊤ ϕ(x), (132)

where w and ϕ(x) are vectors formed with the K elements of wk and ϕk (x), respectively.
The goal of ergodic control in (128) is to minimize
1 ⊤  
ϵ= w − ŵ Λ w − ŵ , (133)
2
where ŵ is a vector formed with the K elements ŵk , and Λ is a weighting matrix formed with the K diagonal elements
Λk .
By defining ϕ(xs ) as the vector concatenating the K elements ϕk (xs ), (126) can be rewritten in a vector form as

1X
T
w= ϕ(xs ). (134)
T s=1

With some abuse of notation, by defining a vector ṽ = [0, . . . , K −1] , we can conveniently compute
1
ϕ(x) = cos(vx), (135)
L
v 2πṽ
∇x ϕ(x) = − sin(vx), with v= , (136)
L L
where ∇x ϕ(x) is a vector formed by the K elements ∇x ϕk (x), yielding the controller


u(t) = −∇x ϕ(x) Λ w − ŵ . (137)

Planning formulation of unidimensional SMC with iLQR


ergodic_control_SMC_DDP_1D.*

By exploiting the results of Section 8.5, the batch formulation of iLQR consists of minimizing the cost

c(x, u) = f (x)⊤ Qf (x) + u⊤ R u, (138)

with Q = Λ and f (x) = w(x) − ŵ, with


1X
T
w(x) = ϕ(xs ), (139)
T s=1
By starting from an initial guess, the iterative updates of (138) are given by
 −1  
∆û = Su⊤ J (x)⊤ QJ (x)Su +R − Su⊤ J (x)⊤ Qf (x) − R u , (140)

41
Figure 37: Decomposition of the spatial distribution using Fourier basis functions.

∂f (x)
with J (x) = ∂x a Jacobian matrix given by
1 
∇x ϕ(x1 ), ∇x ϕ(x2 ), . . . ∇x ϕ(xT ) .
J (x) = (141)
T
Figure 36 presents the output of the accompanying source codes.

9.1.2 Multidimensional SMC


ergodic_control_SMC_2D.*
ergodic_control_SMC_DDP_2D.*
For the multidimensional case, we will define K as a set of index vectors covering a D-dimensional array k =
r × r × · · · × r, with r = [0, 1, . . . , K −1] and K the resolution of the array. For example, with D = 2 and K = 2, we
will have K = [ 00 ] , [ 01 ] , [ 10 ] , [ 11 ] .
Similarly as the unidimensional case, we will build a controller so that the Fourier series coefficients wk along a
trajectory x(t) of duration t, defined as
Z
1 t 
wk = ϕk x(τ ) dτ, (142)
t τ =0
will match the Fourier series coefficients ŵk on the spatial domain X , defined as
Z
ŵk = ĝ(x) ϕk (x) dx. (143)
x∈X

The discretized version of (142) for a discrete number of time steps T is

1X
T
wk = ϕk (xs ), (144)
T s=1
PT
or equivalently in vector form w = T1 s=1 ϕ(xs ), which can be computed recursively by updating w at each iteration
step.
Similarly, the discretized version of (143) for a given spatial resolution can for example be computed on a domain
[0, 1] discretized in X bins for each dimension as
X
X X
X X
X
ŵk = ··· ĝ(xs ) ϕk (xs ), (145)
s1 =1 s2 =1 sd =0

where each dimension i of x is given by index si splitting each domain dimension into X equal bins.
The goal of ergodic control is to minimize
1X  2
ϵ= Λk wk − ŵk (146)
2
k∈K
1 ⊤  
= w − ŵ Λ w − ŵ , (147)
2

42
Figure 38: 2D ergodic control based on SMC used to generate search behaviors. (a) shows the spatial distribution ĝ(x) that the agent
has to explore, encoded here as a mixture of two Gaussians (gray colormap in left graph). The right graphs show the corresponding Fourier
series coefficients ŵk in the frequency domain (K = 9 coefficients per dimension). (b) shows the evolution of the reconstructed spatial
distribution g(x) (left graph) and the computation of the next control command u (red arrow) after T /10 iterations. The corresponding
Fourier series coefficients wk are shown in the right graph. (c) shows that after T iterations, the agent covers the space in proportion to
the desired spatial distribution, with a good match of coefficients in the frequency domain (we can see that ŵk and wk are nearly the
same). (d) shows how a periodic signal ĝ(x) (with range [−L/2, L/2] for each dimension) can be constructed from an initial mixture of two
Gaussians ĝ0 (x) (red area). The constructed signal ĝ(x) is composed of eight Gaussians in this 2D example, by mirroring the Gaussians
along horizontal and vertical axes to construct an even signal of period L. (e) depicts the first few basis functions of the Fourier series for
the first four coefficients in each dimension, represented as a 2D colormap corresponding to periodic signals of different frequencies along
two axes.

Figure 39: Bidimensional SMC problem computed with iLQR.

43
where Λk are weights, ŵk are the Fourier series coefficients of ĝ(x), and wk are the Fourier series coefficients along the
trajectory x(t). w ∈ RK and ŵ ∈ RK are vectors composed of elements wk and ŵk , respectively. Λ ∈ RK ×K is
D D D D

a diagonal weighting matrix with elements Λk . In (146), the weights


− D+1
Λk = 1 + ∥k∥2 2
(148)

assign more importance on matching low frequency components (related to a metric for Sobolev spaces of negative
order).
Practically, the controller will mainly focus on the lowest frequency components at the beginning as these compo-
nents have a large effect on the cost due to the large weights. When the errors become zero for these components, the
controller can then try to progressively reduce higher frequency components.
For a spatial signal x ∈ RD , where xd is on the interval [− L2 , L2 ] of period L, ∀d ∈ {1, . . . , D}, the basis functions
of the Fourier series with complex exponential functions are defined as (see Fig. 38-(e))
 
1 Y
D
2πkd xd
ϕk (x) = cos , ∀k ∈ K. (149)
LD L
d=1
D
By concatenating the K D index vectors k ∈ K into a matrix Ṽ ∈ RD×K , we can conveniently compute ϕ(x) as
the vector concatenating all K D elements ϕk (x), ∀k ∈ K, namely

1 K
D
ϕ(x) = cos(Vd xd ) (150)
LD
d=1

1 2π Ṽ
= cos(V1 x1 ) ⊙ cos(V2 x2 ) ⊙ · · · ⊙ cos(VD xD ), with V = , (151)
LD L
where ⊙ the elementwise/Hadamard product operator.
The derivatives of the vector ϕ(x) with respect to x are given by the matrix ∇x ϕ(x) of size K D ×D by concatenating
horizontally the D vectors
1 K
∇x ϕd (x) = − D
Vd ⊙ sin(Vd xd ) cos(Vi xi ) (152)
L
i̸=d
1
=− Vd ⊙ cos(V1 x1 ) ⊙ · · · ⊙ cos(Vd−1 xd−1 ) ⊙ sin(Vd xd ) ⊙ cos(Vd+1 xd+1 ) ⊙ · · · ⊙ cos(VD xD ). (153)
LD
For the specific example of a 2D space, (151) and (153) yield
1
ϕ(x) = 2
cos(V1 x1 ) ⊙ cos(V2 x2 ), (154)
L 1 
∇x ϕ(x) = − L V1 ⊙ sin(V1 x1 ) ⊙ cos(V2 x2 ) − L1 V2 ⊙ cos(V1 x1 ) ⊙ sin(V2 x2 ) . (155)

Ergodic control is set as the constrained problem of computing a control command û(t) at each time step t with
 
û(t) = arg min ϵ x(t+∆t) , s.t. ẋ(t) = f x(t), u(t) , ∥u(t)∥ ⩽ umax , (156)
u(t)

where the simple system u(t) = ẋ(t) is considered (control with velocity commands), and where the error term is
approximated with the Taylor series
   1 
ϵ x(t+∆t) ≈ ϵ x(t) + ϵ̇ x(t) ∆t + ϵ̈ x(t) ∆t2 . (157)
2
By using (146),
 (142), (149) and the chain rule∂f ∂f ∂x
∂t = ∂x ∂t , the Taylor series is composed of the control term u(t)
and ∇x ϕk x(t) ∈ R 1×D
, the gradient of ϕk x(t) with respect to x(t). Solving the constrained objective in (156)
then results in the analytical solution (see [10] for the complete derivation)
umax ⊤

u(t) = ũ(t) , with ũ(t) = −∇x ϕ(x) Λ w − ŵ . (158)
∥ũ(t)∥
Figures 37 and 38 show a 2D example of ergodic control to create a motion approximating the distribution given
by a mixture of two Gaussians. A remarkable characteristic of such approach is that the controller produces natural
exploration behaviors without relying on stochastic noise in the formulation, see Fig. 38-(c). In the limit case, if the
distribution g(x) is a single Gaussian with a very small isotropic covariance, the controller results in a conventional
target reaching behavior.
Figure 39 presents the output of the accompanying source codes, where the initial guess is computed by standard
SMC (myopic), whose resulting control command trajectory is then refined with iLQR (SMC as a planning problem).

44
9.2 Diffusion-based ergodic control (HEDAC)
ergodic_control_HEDAC_1D.*
ergodic_control_HEDAC_2D.*

Heat equation driven area coverage (HEDAC) is another area coverage method that relies on diffusion processes
instead of spectral analysis [6].
The coverage density currently covered by the agent(s) is defined as the convolution of an instantaneous action
and the trajectory, which results in a field occupying the space around the path of the agent. A radial basis function
can for example be used as the instantaneous action.
The algorithm designs a potential field based on the heat equation by using a source term constructed as the
difference between the given goal density and the current coverage density. The movements of the agent(s) are
directed by the gradient of this potential field, leading them to the area of interest and producing paths to achieve the
desired coverage density. The diffusion of the heat allows the agent(s) to get access to this gradient, bringing them to
the regions of the target coverage density that have not been explored yet.
HEDAC considers a target distribution p(x) to be covered, and the time-dependent coverage c(x, t) that one or
multiple moving agents have already covered. The coverage residual e(x, t) is defined as the scalar field

e(x, t) = p(x) − c(x, t), (159)

which is used to build a virtual heat source with nonnegative values


2
s̃(x, t) = max (e(x, t), 0) . (160)

For proper scaling, the heat source is further normalized over the domain using
s̃(x, t)
s(x, t) = R . (161)

s̃(x, t)dx

The heat source is then diffused over the domain to propagate the information about the unexplored regions to the
agent(s). For that purpose, we use the heat equation: a second-order partial differential equation (PDE) that models
heat conduction by relating spatial and time derivatives of a scalar field with
∂u(x, t)
= α∆u(x, t) + s(x, t), (162)
∂t
2 2
where u(x, t) corresponds to the temperature field, α is a thermal diffusivity parameter, and ∆u = ∂∂xu2 + ∂∂xu2 + . . . is
1 2
the Laplacian of the function u.
In order to control the agents, the potential field exerts a fictitious force on each of the i-th agent, based on the
gradient of the temperature field
fi = ∇u(xi , t). (163)
Neumann boundary conditions can be imposed (corresponding to thermal insulation), which means that informa-
tion propagation stays within the domain boundary with
∂u
= 0, on ∂Ω, (164)
∂n
where Ω is a n-dimensional domain with Lipschitz continuous boundary.
At each iteration step of the algorithm, we compute the coverage of the agent(s), which cool the temperature field
and mark the regions they cover by updating the coverage distribution ci (x, t). For the i-th agent, the coverage is
computed by the relation Z t
ci (x, t) = ϕ(x − xi (τ ))dτ, (165)
0
where ϕ(xi ) is the footprint or shape of the virtual agent (arbitrary shape). For example, a Gaussian radial basis
functions (RBF) with adjustable shape parameter ϵ can be used, given by

ϕ(x) = e−(ϵx) .
2
(166)

The total coverage containing the exploration effort of all the agents is then computed with

1 X
N
c̃(x, t) = ci (x, t), (167)
N t i=1

that is further normalized over the domain using (161) to provide c(x, t).

45
10 Torque-controlled robots
Robot manipulators can sometimes be controlled by using torque commands, allowing us to consider varying stiffness
and compliance behaviors when the robot is in contact with objects, users or its environment. To design efficient
controllers that can make use of this capability, we will typically need to take into account the various kinds of forces
that will act on the robot, which will involve various notions of physics and mechanics. We will start this section
with the description of a computed torque control approach that we can use either in joint space or in task space,
which will be described generically as impedance controllers in the next two sections. In a nutshell, impedance control
is an approach to dynamic control relating force and position. It is often used for applications in which the robot
manipulator physically interacts with its environment.

10.1 Impedance control in joint space


impedance_control.*

A torque-controlled robot can be expressed by the dynamical model

τ = M (x)ẍ + c(x, ẋ) + g(x) + h(x, ẋ) + τext , (168)

where τ is the actuation torque (the input variable to the robot), M (x) is the inertia matrix (symmetric positive-
definite matrix), c(x, ẋ) = C(x, ẋ) ẋ is the Coriolis and centrifugal torque, g(x) is the gravitational torque, h(x, ẋ)
is the friction torque, τext is the torque resulting from external forces from the environment.
In order to best exploit the compliant control capability of such robots, the joint torque commands τ that we
provide most often need to compensate for the various physical effects that are applied to the robot. Removing these
effects mean that we try to estimate what these effects produce at the level of the torques by simulating what effects
will be produce in the physical world, which include (but is not limited to) gravity, inertial forces and friction. In
many applications in robot manipulation, the gravity will have the strongest effect that we will try to compensate.
The estimation of these effects involve physical parameters such as the center of mass of each robot link, the
mass of each robot link, the distribution of mass as an inertia matrix, etc. Robot manufacturers sometimes provide
these values (which can for example be estimated from CAD models of the robot). There are otherwise approaches
to estimate these parameters during a calibration phase. We will assume that we have access to these values. The
estimated model parameters/functions of our robot will be denoted as M̂ (x), ĉ(x, ẋ), ĝ(x) and ĥ(x, ẋ), which might
be close to the real physical values M (x)ẍ, c(x, ẋ), g(x) and h(x, ẋ), but not necessarily the same.
We will consider a desired reference {xd , ẋd , ẍd } that we generically described as position, velocity and acceleration
components.
By using the estimated model parameters of our robot, we can design a control law to reach the desired reference
as
τ = K JP (xd − x) + K JV (ẋd − ẋ) + M̂ (x) ẍd + ĉ(x, ẋ) + ĝ(x) + ĥ(x, ẋ), (169)
where K JP and K JV are stiffness and damping matrices in joint space.
If we apply this control law to our robot, which will affected by the different physical effects of the physical world
as described in (168), we will obtain a closed-loop system (controlled robot) of the form

K JP e + K JV ė + M ë = τext , (170)

with error term e = xd − x. This closed-loop system is simply obtained by inserting our designed control commands
(169) into (168). Thus, with the proposed control law, we can see with (170) that the controlled robot acts as a
mechanical impedance to the environment, corresponding to a mass-spring-damper system in joint space.

10.2 Impedance control in task space


The same principle can also be applied to task space by expressing all parts composing the dynamical model of (168)
in the endeffector coordinate system.
Since we used the term joint torques in the previous section to refer to force commands at the joint angle level, we
will use the term wrench to refer to the forces at the level of the endeffector in task space. In the most generic case,
the wrench w will be a 6D force vector by considering both translation and rotational parts (3D for each). The wrench
w applied to the endeffector will then produce reaction torques at joint level, with τ = J (x)⊤ w, corresponding to the
principle of virtual work.
By using the relations f˙ = J (x)ẋ and f¨ = J˙(x)ẋ + J (x)ẍ ≈ J (x)ẍ, we can see that τ = M (x) ẍ in the
 −1
joint coordinate system becomes w = Λ(x) f¨, with Λ(x) = J (x) M (x)−1 J (x)⊤ in the endeffector coordinate
system.2
−1
2 We can see this with J ⊤ w = M ẍ ⇐⇒ M −1 J ⊤ w = ẍ ⇐⇒ JM −1 J ⊤ w = J ẍ ⇐⇒ w = (JM −1 J ⊤ ) f¨.

46
Similarly to (169), we can then define a control law as
 
τ = J (x)⊤ K P (f d − f ) + K V (f˙d − f˙) + Λ̂(x) f¨d + ĉ(x, ẋ) + ĝ(x) + ĥ(x, ẋ), (171)

with  −1
Λ̂(x) = J (x) M̂ (x)−1 J (x)⊤ . (172)

The controlled robot then acts as a mechanical impedance corresponding to a mass-spring-damper system in task
space.

10.3 Forward dynamics for a planar robot manipulator and associated control strategy
FD.* iLQR_manipulator_dynamics.*

The dynamic equation of a planar robot with an arbitrary number of links can be derived using the Lagrangian
formulation, by representing the kinetic and potential energies of the robot as functions of the joint angles. The
dynamic equation can be used in the context of iLQR to control a robot with torque commands, see the provided
examples and Appendix E for computation details.

11 Orientation representations and Riemannian manifolds


Representing orientation is crucial in robotics. For example, controlling the 3D position of a gripper with respect
to the 3D position of an object is not enough for a successful grasp with a robot manipulator. Instead, we need to
consider the full 6D poses of the object and the gripper, which require to consider both position and orientation to
describe the location in space of the robot’s end-effector.
There are many ways to represent orientations, including rotation matrices, Euler angles, axis-angle, unit quater-
nions. Each representation has pros and cons that often depend on what we want to do with these data. It sometimes
requires us to consider multiple representations in the same application, by converting one representation into another.

Rotation matrices
Rotations in the three-dimensional space can be represented by 3 rotation matrices, i.e., by means of 9 parameters.
These parameters, however, are not independent, but constrained by orthonormal conditions (its columns are orthog-
onal unit vectors). The main advantage of rotation matrices is that geometric operation can be directly computed
with standard linear algebra, by multiplying matrices with vectors. The main disadvantage is that it involves the use
of 9 parameters that are not 9 independent parameters but that are instead constrained to 3 independent parameters.
For human analysis, the columns of a rotation matrix correspond to a set of 3 unit vectors forming the base of
a coordinate system, where each vector is orthogonal to the other. By plotting these 3 vectors, we can visualize the
corresponding orientation that the rotation matrix encodes.

Euler angles
A minimal representation of rotations in space requires three independent parameters, which is fulfilled by Euler
angles. With Euler angles, a rotation in space is viewed as a sequence of three elementary rotations. The sequence
ZYX of Euler angles correspond for example to yaw-pitch-roll angles. There are 16 different sequences, some being
more used than others. When using Euler angles, in addition to the three values, it is thus important to specify which
sequence has been used, because the use of these different conventions can easily introduce mistakes.
The main advantage of Euler angles is that they are easy to interpret for human analysis in a compact form, where
the rotation can be mentally interpreted by applying the three orientations in sequence. This might be an advantage
over the rotation matrix, that often requires the corresponding coordinate system to be visualized on a screen to be
interpreted.
One disadvantage of Euler angles is that they are not easy to compose, but there is another more important
disadvantage that drastically limits the use of Euler angles for computation in learning and optimization problems.
With Euler angles, for a given orientation, the 3 angles are uniquely determined except for singular cases when some of
the considered axes have the same or opposite directions. These ambiguities are known as gimbal locks. A gimbal is a
ring that is suspended so it can rotate about an axis. Gimbals are nested one within another to accommodate rotation
about multiple axes. Gimbal lock refers to the loss of one degree of freedom in a three-dimensional, three-gimbal
mechanism that occurs when the axes of 2 of the 3 gimbals are driven into a parallel configuration, ”locking” the
system into rotation in a degenerate two-dimensional space. For example, in a pitch/yaw/roll rotational system, if the
pitch is rotated 90 degrees up or down, the yaw and roll will correspond to the same motion and a degree of freedom
is lost.

47
Axis-angle
Axis-angle is a representation of orientation that defines a unit axis u around which a rotation is made with angle
θ. By rescaling the unit axis by the rotation angle, we can obtain a representation with 3 parameters. The rotation
vector is not well defined for θ = 0.

Unit quaternions
The unit quaternion is a nonminimal representation of rotations which shares some similarities with the axis-angle
representation but that overcomes its limitations. A unit quaternion can be stored as a vector of 4 elements, but it
can also be interpreted as a single number, in the same way as a complex number can be stored as a vector of two
elements with the real and imaginary part. Indeed, the quaternion is an extension of complex numbers (with i2 = −1)
to additional abstract symbols i, j, k satisfying the rules

i2 = j 2 = k 2 = ijk = −1. (173)

Unit quaternions can be used to represent orientations. Similarly to the rotation matrix, this representation avoids
the gimbal lock issue, but with a more compact representation (4 instead of 9 parameters).
While the quaternion describes a number, it is convenient in practice to store the parameters of the quaternion
number q1 + q2 i + q3 j + q4 k as a vector  
q1
 q2 
q= 
 q3  . (174)
q4
The conversion from an axis-angle representation is simple and shows the connections between the two represen-
tations. For a unit axis u and a rotation angle θ, we have
 
cos θ2
q= . (175)
u sin θ2

The main motivation of working with quaternions (which can be viewed as an advantage or as a disadvantage) is
that they come with a specific algebra that allows to fully leverage the quaternion parameterization. For example, in
the quaternion algebra, a quaternion rotation can be defined as p′ = qpq −1 , which can be computed efficiently. This
has led unit quaternions to be the most widely used representation in computer graphics, for example.
Conversions between quaternion algebra and linear algebra are possible. For example, the above quaternion rotation
corresponds to the matrix rotation p′ = R p in standard linear algebra, where R is a rotation matrix given by
 
1 − 2q32 − 2q42 2q2 q3 − 2q4 q1 2q2 q4 + 2q3 q1
R = 2q2 q3 + 2q4 q1 1 − 2q22 − 2q42 2q3 q4 − 2q2 q1  . (176)
2q2 q4 − 2q3 q1 2q3 q4 + 2q2 q1 1 − 2q22 − 2q32

A rotation matrix R can reversely be converted to a quaternion q by constructing a symmetric 4 × 4 matrix


 
R11 −R22 −R33 R21 +R12 R31 +R13 R23 −R32
1  R21 +R12 R22 −R11 −R33 R32 +R23 R31 −R13 
K=  
, (177)
3 R31 +R13 R32 +R23 R33 −R11 −R22 R12 −R21 
R23 −R32 R31 −R13 R12 −R21 R11 +R22 +R33

whose largest eigenvalue corresponds to q.


By storing the four elements of a unit quaternion as a vector q, we can interpret q as a point on a (hyper)sphere
of 4 dimensions. Indeed, the norm of 1 induces this geometry. One way to interpret and visualize this easily is to
consider the corresponding problem of a 3D vector constrained to have a unit norm, which means that the points lie
on a 3D sphere of radius 1. Such unit 3D vectors for example correspond to the representation of direction vectors
in robotics (e.g., to represent surface normals for environment modeling). These representations form a manifold that
can leverage the exploitation of Riemannian geometry, which is discussed next.

11.1 Riemannian manifolds


Data in robotics are essentially geometric. Riemannian manifolds can facilitate the processing of these geometric data,
which can be applied to a wide range of learning and optimization problems in robotics. In particular, it provides a
principled and simple way to extend algorithms initially developed for Euclidean data to other manifolds, by efficiently
taking into account prior geometric knowledge about these manifolds.
A smooth d-dimensional manifold M is a topological space that locally behaves like the Euclidean space Rd . A
Riemannian manifold is a smooth and differentiable manifold equipped with a positive definite metric tensor. For each

48
Figure 40: Applications in robotics using Riemannian manifolds rely on two well-known principles of Riemannian geometry: exponen-
tial/logarithmic mapping (left) and parallel transport (right), which are depicted here on a S 2 manifold embedded in R3 . Left: Bidirectional
mappings between tangent space and manifold. Right: Parallel transport of a vector along a geodesic (see main text for details).

point p ∈ M, there exists a tangent space Tp M that locally linearizes the manifold. On a Riemannian manifold, the
metric tensor induces a positive definite inner product on each tangent space Tp M, which allows vector lengths and
angles between vectors to be measured. The affine connection, computed from the metric, is a differential operator
that provides, among other functionalities, a way to compute geodesics and to transport vectors on tangent spaces
along any smooth curves on the manifold. It also fully characterizes the intrinsic curvature and torsion of the manifold.
The Cartesian product of two Riemannian manifolds is also a Riemannian manifold (often called manifold bundles or
manifold composites), which allows joint distributions to be constructed on any combination of Riemannian manifolds.
Riemannian geometry is a rich field of research. For applications in robotics, two main notions of Riemannian
geometry are crucial:
Geodesics: The minimum length curves between two points on a Riemannian manifold are called geodesics. Similarly
to straight lines in Euclidean space, the second derivative is zero everywhere along a geodesic. The exponential map
Expx0 : Tx0 M → M maps a point u in the tangent space of x0 to a point x on the manifold, so that x lies on the
geodesic starting at x0 in the direction u. The norm of u is equal to the geodesic distance between x0 and x. The
inverse map is called the logarithmic map Logx0 : M → Tx0 M. Figure 40-left depicts these mapping functions.
Parallel transport: Parallel transport Γg→h : Tg M → Th M moves vectors between tangent spaces such that the
inner product between two vectors in a tangent space is conserved. It employs the notion of connection, defining
how to associate vectors between infinitesimally close tangent spaces. This connection allows the smooth transport
of a vector from one tangent space to another by sliding it (with infinitesimal moves) along a curve. Figure 40-right
depicts this operation. The flat surfaces show the coordinate systems of several tangent spaces along the geodesic.
The black vectors represent the directions of the geodesic in the tangent spaces. The blue vectors are transported
from g to h. Parallel transport allows a vector u in the tangent space of g to be transported to the tangent space
of h, by ensuring that the angle (i.e., inner product) between u and the direction of the geodesic (represented as
black vectors) are conserved. At point g, this direction is expressed as Logg (h). This operation is crucial to combine
information available at g with information available at h, by taking into account the rotation of the coordinate
systems along the geodesic. In Euclidean space (top-left inset), the parallel transport is simply the identity operator
(a vector operation can be applied to any point without additional transformation). By extension, a covariance matrix
Pd Pd
Σ can be transported with Σ∥ = i=1 Γg→h (vi ) Γ⊤ g →h (vi ), using the eigendecomposition Σ =

i=1 vi vi . For many
manifolds in robotics, this transport operation can equivalently be expressed as a linear mapping Σ∥ = Ag→h Σ A⊤ g →h
The most common manifolds in robotics are homogeneous, with the same curvature at any point on the manifold,
which provides simple analytic expressions for exponential/logarithmic mapping and parallel transport.

11.2 Sphere manifolds S d in robotics


The sphere manifold S d is the most widely used non-Euclidean manifold in robotics, as it can encode direction and
orientation information. Unit quaternions S 3 can be used to represent endeffector (tool tip) orientations. S 2 can be
used to represent unit directional vector perpendicular to surfaces (e.g., for contact planning). Revolute joints can be
represented on the torus S 1 × S 1 × . . . × S 1 .
The exponential and logarithmic maps corresponding to the distance

d(x, y) = arccos(x⊤ y), (178)

49
with x, y ∈ S d computed as
u
y = Expx (u) = x cos(∥u∥) + sin(∥u∥), (179)
∥u∥
y − x⊤ y x
u = Logx (y) = d(x, y) . (180)
∥y − x⊤ y x∥

The parallel transport of v ∈ Tx S d to Ty S d is given by

Logx (y) v  

Γx→y (v) = v − 2
Logx (y) + Logy (x) . (181)
d(x, y)

In some applications, it can be convenient to define the parallel transport with the alternative equivalent form

Γx→y (v) = Ax→y v, with (182)


Ax→y = −x sin(∥u∥)u⊤ + u cos(∥u∥)u⊤ + (I − u u⊤ ),
u
u = Logx (y), and u = ,
∥u∥

highlighting the linear structure of the operation.


Note that in the above representation, u and v are described as vectors with d + 1 elements contained in Tx S d . An
alternative representation consists of expressing u and v as vectors of d elements in the coordinate system attached
to Tx S d .

11.3 Gaussian distributions on Riemannian manifolds


Several approaches can be used to extend Gaussian distributions
to Riemannian manifolds. The most simple approach consists of
estimating the mean of the Gaussian as a centroid on the manifold
(also called Karcher/Fréchet mean), and representing the disper-
sion of the data as a covariance expressed in the tangent space
of the mean [3]. This is an approximation, because distortions
arise when points are too far apart from the mean. However, this
distortion is negligible in most robotics applications. In particu-
lar, this effect is strongly attenuated when the Gaussian covers
a small part of the manifold. This Gaussian representation on a
manifold M is defined as
Figure 41: Gaussian distribution, where the center of
 − 12 ⊤ −1 the data is described by a point on the manifold and the
e− 2 Logµ(x) Σ Logµ(x) ,
1
NM (x|µ, Σ) = (2π)d |Σ| spread of the data is described by a covariance matrix
in the tangent space at this point.
where x ∈ M is a point of the manifold, µ ∈ M is the mean of the
distribution (origin of the tangent space), and Σ ∈ Tµ M is the covariance defined in this tangent space.
For a set of N datapoints, this geometric mean corresponds to the minimization

X
N

min Logµ(xn ) Logµ(xn ),
µ
n=1

which can be solved by a simple and fast Gauss-Newton iterative algorithm. The algorithm starts from an initial
estimate on the manifold and an associated tangent space. The datapoints {xn }N n=1 are projected in this tangent
space to compute a direction vector, which provides an updated estimate of the mean. This process is repeated by
iterating
1 X
N
u= Logµ(xn ), µ ← Expµ(u),
N n=1
until convergence. In practice, for homogeneous manifolds with constant curvatures, this iterative algorithm will
converge very fast, with only a couple of iterations. PN ⊤
After convergence, a covariance is computed in the tangent space as Σ = N 1−1 n=1 Logµ(xn ) Logµ(xn ) , see Fig.
41.
This distribution can for example be used in a control problem to represent a reference to track with an associated
required precision (e.g., learned from a set of demonstrations). Importantly, this geometric mean can be directly
extended to weighted distances, which will be exploited in the next sections for mixture modeling, fusion (product of
Gaussians), regression (Gaussian conditioning) and planning problems.

50
Figure 42: Examples of manifolds relevant for robotics.

11.4 Other homogeneous manifolds in robotics


Figure 42 shows four examples of homogeneous Riemannian manifolds that can be employed in robotics. For these
four manifolds, the bottom graphs depict S 2 , S++ 2
, H2 and G 3,2 , with a clustering problem in which the datapoints
(black dots/planes) are segmented in two classes, each represented by a center (red and blue dots/planes).
The geodesics depicted in Fig. 42 show the specificities of each manifold, see [3] for details. The sphere manifold
S d is characterized by constant positive curvature. The elements of S++ d
can be represented as the interior of a convex
d
cone embedded
 in its tangent space Sym . Here, the three axes correspond to A11 , A12 and A22 in the SPD matrix
A11 A12
A12 A22 . The hyperbolic manifold H d
is characterized by constant negative curvature. Several representations exist:
H can for example be represented as the interior of a unit disk in Euclidean space, with the boundary of the disk
2

representing infinitely remote point (Poincaré disk model, as depicted here). In this model, geodesic paths are arcs of
circles intersecting the boundary perpendicularly. G d,p is the Grassmann manifold of all p-dimensional subspaces of
Rd .

11.5 Ellipsoids and SPD matrices as S++


d
manifolds
Symmetric positive definite (SPD) matrices are widely used in robotics. They can represent stiffness, manipulability
and inertia ellipsoids. They can be used as weight matrices to compute distances. They can also be used to process
sensory data in the form of spatial covariances. For example, to represent the robot’s manipulability capability in
translation and rotation can be encoded as a manipulability ellipsoid S++
6
.
For an affine-invariant distance between X, Y ∈ S++d

log(X − 2 Y X − 2 )
1 1
d(X, Y ) = F
, (183)

the exponential and logarithmic maps on the SPD manifold can be computed as
1
Y = ExpX (U ) = X 2 exp X − 2 U X − 2 X 2 ,
1 1 1
(184)
1
U = LogX (Y ) = X 2 log X − 2 Y X − 2 X 2 .
1 1 1
(185)

The parallel transport of V ∈ TX S++


d
to TY S++
d
is given by

−2 1 1
ΓX →Y (V ) = AX →Y V A⊤
X →Y , with AX →Y = Y X
2 . (186)

11.6 Non-homogeneous manifolds in robotics


Manifolds with nonconstant curvature can also be employed, such as spaces endowed with a metric, character-
ized by a weighting matrix used to compute distances. Many problems in robotics can be formulated with such
a smoothly varying matrix M that measures the distance between two points x1 and x2 as a quadratic error
term c = (x1 − x2 )⊤ M (x1 − x2 ), forming a Riemannian metric that describes the underlying manifold (with non-
homogeneous curvature). This weighting matrix can for example represent levels of kinetic energy or stiffness gains
to model varying impedance behaviors. Computation is often more costly for these manifolds with nonconstant cur-
vature, because it typically requires iterative algorithms instead of the direct analytic expressions typically provided
by homogeneous manifolds.
Figure 43 presents examples exploiting non-homogeneous Riemannian manifolds.

51
Figure 43: Left: A typical example of smoothly varying metric in robotics is when the weight matrix is a mass inertia matrix so that
inertia is taken into account to generate a movement from a starting joint configuration to a target joint configuration while minimizing
kinetic energy (see geodesics in solid lines). In the figure, inverse weight matrices are depicted to visualize locally the equidistant points
(w.r.t the metric) that the system can reach as ellipsoids. The movement in dashed lines show the baseline movements that would be
produced by ignoring inertia (corresponding to linear interpolation between the two poses). The metric can similarly be weighted by other
precision matrices, including stiffness and manipulability ellipsoids. Center: Transformation of a scalar signed distance field (SDF), as in
Figure 10, and its associated derivatives to a smoothly varying coordinate system to facilitate teleoperation. This coordinate system is
shape-centric (both user-centric and object-centric), in the sense that it is oriented toward the surface normal while ensuring smoothness
(by constructing the SDF as piecewise polynomials). In this example, the commands to approach the surface and move along the surface
at a desired distance then corresponds to piecewise constant commands (see control command profiles in the graph inset). Right: This
coordinate system can be used to define a cost function with a quadratic error defined by a covariance matrix. This matrix can be shaped by
the distance, to provide a Riemannian metric that smoothly distorts distances when being close to objects, users and obstacles (represented
as covariance ellipses in the figure). This can be used in control and planning to define geodesic paths based on this metric. This will
generate paths that naturally curve around obstacles when the obstacles are close (see the two point-to-point trajectories in green, either
far from the obstacle or close to the obstacle). The figure inset shows that the metric can also be interpreted as a surface lying in a higher
dimensional space (here, in 3D), where the geodesics then correspond to the shortest Euclidean path on this surface. It is relevant to note
here that we don’t need to know about this corresponding embedding, which is for some problems hard to construct), and that we instead
only need the metric to compute geodesics on the Riemannian manifold.

References
[1] D. S. Broomhead and D. Lowe. Multivariable functional interpolation and adaptive networks. Complex Systems,
2(3):321–355, 1988.

[2] S. Calinon. Mixture models for the analysis, edition, and synthesis of continuous time series. In N. Bouguila and
W. Fan, editors, Mixture Models and Applications, pages 39–57. Springer, 2019.
[3] S. Calinon. Gaussians on Riemannian manifolds: Applications for robot learning and adaptive control. IEEE
Robotics and Automation Magazine (RAM), 27(2):33–45, June 2020.
[4] A. Gropp, L. Yariv, N. Haim, M. Atzmon, and Y. Lipman. Implicit geometric regularization for learning shapes.
In Proc. Intl Conf. on Machine Learning (ICML), pages 3789–3799, 2020.
[5] A. Ijspeert, J. Nakanishi, P. Pastor, H. Hoffmann, and S. Schaal. Dynamical movement primitives: Learning
attractor models for motor behaviors. Neural Computation, 25(2):328–373, 2013.
[6] S. Ivić, B. Crnković, and I. Mezić. Ergodicity-based cooperative multiagent area coverage via a potential field.
IEEE Trans. on Cybernetics, 47(8):1983–1993, 2017.
[7] N. Jaquier, L. Rozo, D. G. Caldwell, and S. Calinon. Geometry-aware manipulability learning, tracking and
transfer. International Journal of Robotics Research (IJRR), 40(2–3):624–650, 2021.
[8] W. Li and E. Todorov. Iterative linear quadratic regulator design for nonlinear biological movement systems. In
Proc. Intl Conf. on Informatics in Control, Automation and Robotics (ICINCO), pages 222–229, 2004.

[9] F. Marić, L. Petrović, M. Guberina, J. Kelly, and I. Petrović. A Riemannian metric for geometry-aware singularity
avoidance by articulated robots. Robotics and Autonomous Systems, 145:103865, 2021.
[10] G. Mathew and I. Mezic. Spectral multiscale coverage: A uniform coverage algorithm for mobile sensor networks.
In Proc. IEEE Conf. on Decision and Control, pages 7872–7877, Dec 2009.

[11] D. Mayne. A second-order gradient method for determining optimal trajectories of non-linear discrete-time
systems. International Journal of Control, 3(1):85–95, 1966.

52
[12] F. A. Mussa-Ivaldi, S. F. Giszter, and E. Bizzi. Linear combinations of primitives in vertebrate motor control.
Proc. National Academy of Sciences, 91:7534–7538, 1994.
[13] J. Nocedal and S. Wright. Numerical optimization. Springer, New York, NY, 2006.
[14] D. E. Orin and A. Goswami. Centroidal momentum matrix of a humanoid robot: Structure and properties. In
Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), pages 653–659, 2008.
[15] A. Paraschos, C. Daniel, J. Peters, and G. Neumann. Probabilistic movement primitives. In C. J. C. Burges,
L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing
Systems (NeurIPS), pages 2616–2624. Curran Associates, Inc., USA, 2013.
[16] F. C. Park and J. W. Kim. Manipulability of closed kinematic chains. Journal of Mechanical Design, 120(4):542–
548, 1998.
[17] H. H. Rosenbrock. Differential dynamic programming. The Mathematical Gazette, 56(395):78–78, 1972.
[18] S. Shetty, J. Silvério, and S. Calinon. Ergodic exploration using tensor train: Applications in insertion tasks.
IEEE Trans. on Robotics, 38(2):906–921, 2022.

[19] T. Yoshikawa. Dynamic manipulability of robot manipulators. Robotic Systems, 2:113–124, 1985.

53
Appendices
A System dynamics at trajectory level
A linear system is described by
xt+1 = At xt + Bt ut , ∀t ∈ {1, . . . , T }
with states xt ∈ RD and control commands ut ∈ Rd .
With the above linearization, we can express all states xt as an explicit function of the initial state x1 . By writing

x2 = A1 x1 + B1 u1 ,
x3 = A2 x2 + B2 u2 = A2 (A1 x1 + B1 u1 ) + B2 u2 ,
..
. ! ! !
TY−1 −2
TY TY −3
xT = AT −t x1 + AT −t B1 u1 + AT −t B2 u2 + · · · + BT −1 uT −1 ,
t=1 t=1 t=1

in a matrix form, we get an expression of the form x = Sx x1 + Su u, with


     
x1 I 0 0 ··· 0  
 x2   A1   B1 0 ··· 0 u1
       u2 
 x 3   A2 A 1   A2 B 1 B2 ··· 0
 =  x1 +   . 
 .   ..   .. .. ..  . 
.. 
, (187)
 ..     . . . 
. .
.   Q  
QT −1 QT −2 T −3 uT−1
xT t=1 A T −t t=1 A T −t B1 t=1 AT −t B2 ··· BT −1 | {z }
| {z } | {z } | {z } u
x Sx Su

dT ×d dT ×d(T −1) d(T −1)


where Sx ∈ R , x1 ∈ R , S u ∈ R
d
and u ∈ R .

Transfer matrices for single integrator


A single integrator is simply defined as xt+1 = xt+ ut ∆t, corresponding
 to At = I and Bt = I∆t, ∀t ∈ {1, . . . , T },
0D,D(T−1)
and transfer matrices Sx = 1T ⊗ ID , and Su = , where L is a lower triangular matrix and ⊗ is
LT−1,T−1 ⊗ ID ∆t
the Kronecker product operator.

B Derivation of motion equation for a planar robot


We derive each element in (193) individually for j ≥ z, then combine them altogether to derive the dynamic equation
of the system. In this regard, for the first element in (193), we can write

∂Tj  ∂ f˙ ∂ f˙j,2 ˙      
j,1 ˙
= mj fj,1 + fj,2 = mj − lz sz f˙j,1 + lz cz f˙j,2 ,
∂ q̇z ∂ q̇z ∂ q̇z
whose time derivative can be expressed as
d ∂Tj         
= mj − lz q̇z cz f˙j,1 − lz sz f¨j,1 − lz q̇z sz f˙j,2 + lz cz f¨j,2 ,
dt ∂ q̇z
where
X
j X
j X
j X
j
f¨j,1 = − lk q̈k sk − lk q̇k2 ck , f¨j,2 = lk q̈k ck − lk q̇k2 sk .
k=1 k=1 k=1 k=1

For the second term in (193), we can write


!
∂Tj  ∂ f˙ ∂ f˙j,2 ˙     
j,1 ˙
= mj fj,1 + fj,2 = mj − lz q̇z cz f˙j,1 − lz q̇z sz f˙j,2 ,
∂qz ∂qz ∂qz

and finally, the potential energy term can be calculated as

∂Uj
= mj glz cz .
∂qz

54
The z-th generalized force can be found by substituting the derived terms to (193) as
N 
X               
uz = mj − lz q̇z cz ṙj,1 − lz sk r̈j,1 − lz q̇z sz ṙj,2 + lz cz r̈j,2 − mj − lz q̇z cz ṙj,1 − lz q̇z sz ṙj,2 + mj glz cz
j=z
N 
X      
= mj − lz sz r̈j,1 + lz cz r̈j,2 + mj glz cz
j=z
N 
X   X
j X
j    X
j X
j  
= mj − lz s z − lk q̈k sk − lk q̇k2 ck + lz cz lk q̈k ck − lk q̇k2 sk + mj glz cz
j=z k=1 k=1 k=1 k=1

X
N X
j X
j  X
N
= mj lz lk cz−k q̈k + lz lk sz−k q̇k2 + mj glz cz ,
j=z k=1 k=1 j=z

where
ch−k = ch ck − sh sk , sh−k = sh ck − ch sk .
By rearranging the order of elements, we can write

X
N X
j  X
N X
j  XN
mj lz lk cz−k q̈k = uz − mj lz lk sz−k q̇k2 − mj glz cz .
j=z k=1 j=z k=1 j=z

C Linear systems used in the bimanual tennis serve example


x represents the state trajectory of 3 agents: the left hand, the right hand and the ball. The nonzero elements
correspond to the targets that the three agents must reach. For Agent 3 (the ball), µ corresponds to a 2D position
target at time T (ball target), with a corresponding 2D precision matrix (identity matrix) in Q. For Agent 1 and
2 (the hands), µ corresponds to 2D position targets active from time 3T 4 to T (coming back to the initial pose and
maintaining this pose), with corresponding 2D precision
 I −I  matrices (identity matrices) in Q. The constraint that Agents
2 and 3 collide at time T2 is ensured by setting −I I in the entries of the precision matrix Q corresponding to the
2D positions of Agents 2 and 3 at T2 . With this formulation, the two positions are constrained to be the same, without
having to predetermine the position at which the two agents should meet.3
The evolution of the system is described by the linear relation ẋt = Act xt + Btc ut , gathering the behavior of the
3 agents (left hand, right hand and ball) as double integrators with motions affected by gravity. For t ≤ T4 (left hand
holding the ball), we have
      
ẋ1,t 0 I 0 0 0 0 0 0 0 x1,t 0 0
 ẍ1,t   0 0 I 0 0 0 0 0 0   ẋ1,t   I 0 
 ˙      
 f1,t   0 0 0 0 0 0 0 0 0   f1,t   0 0 
      
 ẋ2,t   0 0 0 0 I 0 0 0 0   x2,t   0 0   
      
 ẍ2,t  =  0 0 0 0 0 I 0 0 0   ẋ2,t  +  0 I  u1,t . (188)
      
 f˙2,t   0 0 0 0 0 0 0 0 0   f2,t   0 0  | u{z 2,t
}
      
 ẋ3,t   0 I 0 0 0 0 0 0 0   x3,t   0 0  ut
      
 ẍ3,t   0 0 0 0 0 0 0 0 I   ẋ3,t   0 0 
f˙3,t 0 0 0 0 0 0 0 0 0 f3,t 0 0
| {z } | {z } | {z } | {z }
ẋt Act xt Btc

At t = T2 (right hand hitting the ball), we have


      
ẋ1,t 0 I 0 0 0 0 0 0 0 x1,t 0 0
 ẍ1,t   0 0 I 0 0 0 0 0 0    
 ˙     ẋ1,t   I 0 
 f1,t   0 0 0 0 0 0 0 0 0  f1,t   0 0 
      
 ẋ2,t   0 0 0 0 I 0 0 0 0     
     x2,t   0 0  u1,t
 ẍ2,t  =  0 0 0 0 0 I 0 0 0  ẋ2,t  +  0 I  . (189)
      
 f˙2,t   0 0 0 0 0 0 0 0 0  f2,t   0 0  | u{z
2,t
}
      
 ẋ3,t   0 0 0 0 I 0 0 0 0    
     x3,t   0 0  ut
 ẍ3,t   0 0 0 0 0 0 0 0 I   ẋ3,t   0 0 
f˙3,t 0 0 0 0 0 0 0 0 0 f3,t 0 0
| {z } | {z } | {z } | {z }
ẋt Act xt Btc

3 Indeed,
 xi ⊤ h 1 −1
i
xi 
we can see that a cost c = xj −1 1 xj = (xi − xj )2 is minimized when xi = xj .

55
T
For 4 < t < T2 and t > T2 (free motion of the ball), we have
      
ẋ1,t 0 I 0 0 0 0 0 0 0 x1,t 0 0
 ẍ1,t   0 0 I 0 0 0 0 0 0    
 ˙     ẋ1,t   I 0 
 f1,t   0 0 0 0 0 0 0 0 0   f1,t   0 0 
   
   
 ẋ2,t   0 0 0 0 I 0 0 0 0     
     x2,t   0 0  u1,t
 ẍ2,t  =  0 0 0 0 0 I 0 0     
0   ẋ2,t  +  0 I  . (190)
  
 f˙2,t   0 0 0 0 0 0 0 0 0  f2,t   0 0  | u{z
2,t
}
      
 ẋ3,t   0 0 0 0 0 0 0 I 0    
     x3,t   0 0  ut
 ẍ3,t   0 0 0 0 0 0 0 0 I   ẋ3,t   0 0
f˙3,t 0 0 0 0 0 0 0 0 0 f3,t 0 0
| {z } | {z } | {z } | {z }
ẋt Act xt Btc

In
 the above,
 fi = mi g represent the effect of gravity on the three agents, with mass mi and acceleration vector
0
g = −9.81 due to gravity. The parameters {Act , Btc }Tt=1 , describing a continuous system, are first converted to their
discrete forms with
At = I + Act ∆t, Bt = Btc ∆t, ∀t ∈ {1, . . . , T }, (191)
which can then be used to define sparse transfer matrices Sx and Su describing the evolution of the system in a vector
form, starting from an initial state x1 , namely x = Sx x1 + Su u.

D Equivalence between LQT and LQR with augmented state space


The equivalence between the original LQT problem and the corresponding LQR problem with an augmented state
space can be shown by using the block matrices composition rule
h i h ih ih −1 i
M M I 0 S −1 0 I −M12 M22
M = M11 M12 ⇐⇒ M −1 = −M −1 M I 0 M −1
0 I
,
21 22 22 21 22
−1
where M11 ∈ R , M12 ∈ R
d×d d×D
, M21 ∈ R D×d
, M22 ∈ R D×D
, and S = M11 − M12 M22 M21 the Schur complement
of the matrix M .
In our case, we have
 −1    
Q−1 +µµ⊤ µ I 0 Q 0 I −µ
= ,
µ⊤ 1 −µ⊤ 1 0 1 0 1
and it is easy to see that
 ⊤  −1 −1  
⊤ x Q +µµ⊤ µ x
(x − µ) Q(x − µ) = − 1.
1 µ⊤ 1 1

E Forward dynamics (FD) for a planar robot manipulator


The dynamic equation of a planar robot with an arbitrary number of links can be derived using the Lagrangian
formulation, by representing the kinetic and potential energies of the robot as functions of generalized coordinates,
which are joint angles in this case. For simplicity, we will assume that all masses are located at the end of each link,
and we will neglect the terms that contain rotational energies (i.e, zero moments of inertia). To do this, we exploit
the formulation derived in Section 4 without the orientation part, so the position of the j-th mass can be written as
  Pj  X
k X
k
fj,1 l c
fj = = Pjk=1 k k , where ck = cos( xi ), sk = sin( xi ), (192)
fj,2 k=1 lk sk i=1 i=1

whose corresponding velocity can be calculated as


" P Pk #
j
− k=1 lk ( i=1 ẋi )sk
fj = P j
˙ Pk .
k=1 lk ( i=1 ẋi )ck

By knowing the velocity, the j-th point kinetic energy can be expressed as
!
1  1  Xj Xk  2  X
j Xk  2
˙ ˙
Tj = mj fj,1 + fj,2 = mj −
2 2
lk ẋi sk + lk ẋi ck ,
2 2 i=1 i=1
k=1 k=1

and its potential energy as


X
j 
Uj = mj g fj,2 = mj g lk s k .
k=1

56
Pk
The kinetic and potential energies are easier to express with absolute angles i=1 xi instead of relative angles
xi . This is because the energies of the system are coordinate invariant and are dependent on absolute values. For
simplicity, we define the absolute angles as
Xk
qk = xi ,
i=1

so the kinetic energy can be redefined as

1  X 2  X 2 
j j
Tj = mj − lk q̇k Sk + lk q̇k Ck .
2
k=1 k=1

The robot’s total kinetic and potential energies are the sum of their corresponding values in each point mass. So,
the Lagrangian of the whole system can be written as

X
N
Elag = Tj − U j ,
j=1

where N is the number of links. Using the Euler-Lagrange equation, we can write
PN PN N 
d ∂Elag ∂Elag d ∂( j=z Tj ) ∂ j=z (Tj − Uj ) X d ∂Tj ∂Tj ∂Uj 
uz = − = − = − + , (193)
dt ∂ q̇z ∂qz dt ∂ q̇z ∂qz j=z
dt ∂ q̇z ∂qz ∂qz

where uz is the generalized force related to qz . This force can be found from the corresponding generalized work wz ,
which can be described by (
(τz − τz+1 )q̇z z < N
wz = , (194)
τN q̇N z=N
where τz is the torque applied at z-th joint. The fact that we need subtraction in the first line of (194) is that when
τz+1 is applied on link lz+1 , its reaction is applied on link lz , except for the last joint where there is no reaction from
the proceeding links. Please note that if we had used relative angles in our formulation, we did not need to consider
this reaction as it will be cancelled by itself at link z + 1 (as q̇z+1 = q̇z + ẋz+1 ). That said, generalized forces can be
defined as (
∂wz (τz − τz+1 ) z < N
uz = = .
∂ q̇z τN z=N
By substituting the derived equations into (193), the dynamic equation of a general planar robot with an arbitrary
number of links can be expressed as (for more details, please refer to Appendix B)
(
XN X j  XN Xj  XN
ch−k = ch ck − sh sk ,
mj lz lk cz−k q̈k = uz − mj lz lk sz−k q̇k −
2
mj glz cz , where (195)
j=z k=1 j=z k=1 j=z
sh−k = sh ck − ch sk .

In (195), the coefficient of q̈k can be written as

X
N
lz lk cz−k mj 1(j − k),
j=z

where 1(·) is a function that returns 0 if it receives a negative input, and 1 otherwise. The coefficient of q̇k2 can be
found in a similar way as
X
N
−lz lk sz−k mj 1(j − k).
j=z

By concatenation of the results for all z’s, the dynamic equation of the whole system can be expressed as

M̃ q̈ + g̃ + C̃diag(q̇)q̇ = u, (196)

where
   
q1 u1
 
q2  
u2
    X
N X
N
 ..  
..
q= ,. u= ,
. M̃ (z, k) = lz lk Cz−k mj 1(j − k), C̃(z, k) = lz lk Sz−k mj 1(j − k),
   
qN −1  uN −1  j=z j=z

qN uN

57
 PN 
j=1 mj l1 g cos(q1 )
 PN 
 j=2 mj l2 g cos(q2 ) 
 
g̃ = 

..
.
.

PN 
 j=N −1 mj lN −1 g cos(qN −1 )
mN lN g cos(qN )
To use this equation with relative angles and torque commands, one can replace the absolute angles and general
forces by    
x1 τ1
 x2   τ2 
   
   
q = Lx, u = L−⊤ τ , where x =  ..  , τ =  ...  .
.
   
xN −1  τN −1 
xN τN
By substituting these variables into (196), we would have

M ẍ + g + Cdiag(Lẋ)Lẋ = τ , (197)

where
M = L⊤ M̃ L, C = L⊤ C̃, g = L⊤ g̃
Please note that elements in M̃ , C̃ and g̃ are defined with absolute angles, which can be replaced with the relative
angles by
qi = Li x,
where Li is the i-th row of L.

iLQR for a robot manipulator with dynamics equation


The nonlinear dynamic equation of a planar robot presented in the above can be expressed as
  " #
qt+1  qt + q̇t dt 
= . (198)
q̇t+1 q̇t + M̃ −1 u − g̃ − C̃diag(q̇t )q̇t dt

For simplicity we start with (196), and at the end, we will describe how the result can be converted to the other
choice of variables. The iLQR method requires (198) to be linearized around the current states and inputs.The
corresponding At and Bt can be found according to (80) as
   
I Idt 0
At = , B = .
A21 dt I − 2M̃ −1 C̃diag(q̇t )dt t
M̃ −1 dt
∂ q̈t
A21 = ∂q t
can be found by taking the derivation of (196) w.r.t. qt . The p-th column of A21 is the partial derivation
of q̈t w.r.t. p-th joint angle at time t qtp which can be formulated as

∂ q̈t ∂ M̃ −1    ∂ g̃ ∂ C̃ 
p = p u − g̃ − C̃diag(q̇t )q̇t − M̃ −1 p + p diag(q̇t )q̇t .
∂qt ∂qt ∂qt ∂qt
Having this done for all p at time t, we can write
∂ q̈t  ′    
= M̃ −1 u − g̃ − C̃diag(q̇t )q̇t − M̃ −1 g̃ ′ + C̃ ′ diag(q̇t )q̇t ,
∂qt
where ( P
N
′ − j=i mj li gsi if i = j
g̃ (i, j) = ,
0 otherwise
 PN
h i 
lh lk ch−k P j=h mj 1(j − k) if p = h ̸= k
′ ′
C̃ = ∂qt ∂qt · · · ∂qt , C̃ (h, k, p) = −lh lk ch−k j=h mj 1(j − k) if p = k ̸= h ,
∂ C̃ ∂ C̃ ∂ C̃ N
1 2 N 

0 otherwise
h −1
i   ′
∂ M̃ −1 M̃ −1
(M̃ −1 )′ = ∂ M̃ ∂qt1 ∂qt2
· · · ∂∂q N , M̃ −1 = −M̃ −1 M̃ ′ M̃ −1 ,
t
 PN

−lh lk sh−k mj 1(j − k) if p = h ̸= k
P j=k
M̃ ′ (h, k, p) = lh lk sh−k N m j 1(j − k) if p = k ̸= h .


j=h
0 otherwise

58
According to Section 8.1, we can concatenate the results for all time steps as
 
∆q1  
 ∆q̇1  ∆u1
   ∆u2 
 ..   
 .  = Suq  ..  , (199)
   . 
∆qT 
∆uT −1
∆q̇T

where T is the total time horizon. These variables can be converted to relative angles and joint torque commands by
   
∆q1   ∆x1    −⊤  
 ∆q̇1  L 0 . . . 0  ∆ẋ1  ∆u1 L 0 ... 0 ∆τ1
      ∆u2   0 L−⊤ . . . 0   
 ..   0 L . . . 0   ..       ∆τ2 
 .  =  .. .. . . ..   .  ,  ..  =  .. .. . ..   .. 
.   . ,
  . . . .    .   . . .. 
∆qT  ∆xT 
0 0 ... L ∆uT −1 0 0 . . . L−⊤ ∆τT −1
∆q̇T | {z } ∆ẋT | {z }
Lx Lu

so (199) can be rewritten as  


∆x1    
 ∆ẋ1  ∆τ1 ∆τ1
   ∆τ2   ∆τ2 
 ..     
 .  = Sτx  ..  = L−1 S q
L u  ..  .
   .  x u
 . 
∆xT 
∆τT −1 ∆τT −1
∆ẋT

59

You might also like