Optimal Control
Thierry Miquel
Bibliography
[1] Alazard D., Apkarian P. and Cumer C., Robustesse et commande optimale, Cépaduès (2000)
[2] Anderson B. and Moore J., Optimal Control: Linear Quadratic Methods, Prentice Hall (1990)
[3] Friedland B., Control System Design: An Introduction to State-Space Methods, Dover Books on Electrical Engineering (2012)
[4] Hespanha J. P., Linear Systems Theory, Princeton University Press (2009)
[9] Sinha A., Linear Systems: Optimal and Robust Control, CRC Press (2007)
Chapter 1
Overview of Pontryagin's Minimum Principle
1.1 Introduction
Pontryagin's Minimum (or Maximum) Principle was formulated in 1956 by the Russian mathematician Lev Pontryagin (1908 - 1988) and his students¹. Its initial application was dedicated to the maximization of the terminal speed of a rocket. The result was derived using ideas from the classical calculus of variations.
This chapter is devoted to the main results of optimal control theory which lead to conditions for optimality.
1.2 Variation
Optimization can be accomplished by using a generalization of the differential called the variation.
Let's consider a real scalar cost function J(x) of a vector x ∈ Rⁿ. The cost function J(x) has a local minimum at x∗ if and only if, for all δx sufficiently small:

$$J(x^* + \delta x) \ge J(x^*) \quad (1.1)$$
An equivalent statement is that the increment of J(x) is non-negative:

$$\Delta J(x^*, \delta x) := J(x^* + \delta x) - J(x^*) \ge 0 \quad (1.2)$$

The term ΔJ(x∗, δx) is called the increment of J(x). The optimality condition can be found by expanding J(x∗ + δx) in a Taylor series around the extremum point x∗. When J(x) is a scalar function of multiple variables, the expansion of J(x) in a Taylor series involves the gradient and the Hessian of the cost function J(x):
− Assuming that J(x) is a differentiable function, the term dJ(x∗)/dx is the gradient of J(x) at x∗ ∈ Rⁿ, which is the vector of Rⁿ defined by:

$$\frac{dJ(x^*)}{dx} = \nabla J(x^*) = \begin{bmatrix} \frac{\partial J(x)}{\partial x_1} \\ \vdots \\ \frac{\partial J(x)}{\partial x_n} \end{bmatrix}_{x=x^*} \quad (1.3)$$

¹ https://en.wikipedia.org/wiki/Pontryagin's_maximum_principle
− Assuming that J(x) is a twice differentiable function, the term d²J(x∗)/dx² is the Hessian of J(x) at x∗, that is the n × n matrix defined by:

$$\frac{d^2 J(x^*)}{dx^2} = \nabla^2 J(x^*) = \left[\frac{\partial^2 J(x)}{\partial x_i \partial x_j}\right]_{1\le i,j\le n} = \begin{bmatrix} \frac{\partial^2 J(x)}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 J(x)}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 J(x)}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 J(x)}{\partial x_n \partial x_n} \end{bmatrix}_{x=x^*} \quad (1.4)$$
Expanding J(x∗ + δx) in a Taylor series around the point x∗ leads to the following expression, where HOT stands for Higher-Order Terms:

$$J(x^* + \delta x) = J(x^*) + \delta x^T \nabla J(x^*) + \frac{1}{2}\,\delta x^T \nabla^2 J(x^*)\,\delta x + HOT \quad (1.5)$$
Thus, at a critical point x∗ where the gradient ∇J(x∗) vanishes, the nature of the extremum is determined by the Hessian ∇²J(x∗) = [hij]. Denote by Hk its leading principal minor of order k:

− If the Hessian is positive definite, the critical point x∗ is a local minimum. By Sylvester's criterion, positive definiteness is equivalent to the positivity of all leading principal minors:

$$\nabla^2 J(x^*) > 0 \;\Leftrightarrow\; \forall\, 1 \le k \le n,\; H_k > 0: \quad H_1 = h_{11} > 0,\; H_2 = \begin{vmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{vmatrix} > 0,\; H_3 = \begin{vmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{vmatrix} > 0,\; \text{and so on...} \quad (1.9)$$

− If the Hessian is negative definite, the critical point x∗ is a local maximum. This is equivalent to leading principal minors of alternating sign:

$$\forall\, 1 \le k \le n,\; (-1)^k H_k > 0: \quad H_1 = h_{11} < 0,\; H_2 = \begin{vmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{vmatrix} > 0,\; H_3 = \begin{vmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{vmatrix} < 0,\; \text{and so on...} \quad (1.10)$$
− If the Hessian has both positive and negative eigenvalues then the critical
point x∗ is a saddle point for the cost function J(x).
1.3 Example
Find the local maxima/minima for the following cost function:
Now, we compute the Hessian to conclude on the nature of the critical point x∗:

$$\nabla^2 J(x^*) = \begin{bmatrix} -2 & 0 \\ 0 & -4 \end{bmatrix} \quad (1.15)$$

Since the Hessian is negative definite, we conclude that the critical point x∗ is a local maximum.
As an illustration, consider the cost function J(x) = (x1 − 1)² + (x2 − 2)²: for a given value of J(x), this is the equation of a circle of center (1, 2) and radius √J(x). It is clear that J(x) is minimal when (x1, x2) is situated at the center of the circle; in this case J(x∗) = 0. Nevertheless, if we impose on (x1, x2) to belong to the straight line defined by x2 − 2x1 − 6 = 0, then J(x) is minimized as soon as the circle of radius √J(x) is tangent to the straight line, that is, when the gradient of J(x) is normal to the constraint at x∗. Parameter λ is called the Lagrange multiplier and has the dimension of the number of constraints expressed through g(x) = 0. The necessary condition for optimality can be obtained as the solution of the following unconstrained optimization problem, where L(x, λ) is the Lagrange function:

$$L(x, \lambda) = J(x) + \lambda^T g(x) \quad (1.18)$$
Setting to zero the gradient of the Lagrange function with respect to x leads to (1.16), whereas setting to zero the derivative of the Lagrange function with respect to the Lagrange multiplier λ leads to the constraint g(x) = 0. As a consequence, a necessary condition for x∗ to be a local extremum of the cost function J subject to the constraint g(x) = 0 is that the first variation of the Lagrange function (its gradient) at x∗ is zero:

$$\left.\frac{\partial L(x,\lambda)}{\partial x}\right|_{x=x^*} = 0 \;\Leftrightarrow\; \left.\left(\frac{\partial J(x)}{\partial x} + \lambda^T \frac{\partial g(x)}{\partial x}\right)\right|_{x=x^*} = 0 \quad (1.19)$$
− However, if some values of p are zero or of a different sign, then the critical point x∗ is a saddle point.
1.5 Example
Find the local maxima/minima of the cost function J(x) = x1 + 3x2 subject to the constraint x1² + x2² − 10 = 0. The Lagrange function is L(x, λ) = x1 + 3x2 + λ(x1² + x2² − 10), and the necessary condition for optimality reads:

$$\frac{\partial L(x^*)}{\partial x} = \begin{bmatrix} 1 + 2\lambda x_1 \\ 3 + 2\lambda x_2 \end{bmatrix} = 0 \;\text{ s.t. }\; x_1^2 + x_2^2 - 10 = 0 \quad (1.24)$$

− For λ = 1/2 the critical point is (x1, x2) = (−1, −3) and the bordered Hessian is:

$$H_b(p) = \begin{bmatrix} 2\lambda - p & 0 & 2x_1 \\ 0 & 2\lambda - p & 2x_2 \\ 2x_1 & 2x_2 & 0 \end{bmatrix}_{x=x^*} = \begin{bmatrix} 1-p & 0 & -2 \\ 0 & 1-p & -6 \\ -2 & -6 & 0 \end{bmatrix} \quad (1.27)$$
Thus:

$$\det\left(H_b(p)\right) = -40 + 40p \quad (1.28)$$

We conclude that the critical point (−1, −3) is a local minimum because det(Hb(p)) = 0 for p = +1, which is strictly positive.
− For λ = −1/2 the critical point is (+1, +3) and we get:

$$\det\left(H_b(p)\right) = 40 + 40p \quad (1.30)$$

We conclude that the critical point (+1, +3) is a local maximum because det(Hb(p)) = 0 for p = −1, which is strictly negative.
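The two critical points above can be checked numerically. Below is a minimal sketch (assuming sympy is available) that solves the stationarity conditions (1.24) of the Lagrange function:

```python
# Recover the critical points of J(x) = x1 + 3*x2 s.t. x1**2 + x2**2 - 10 = 0
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
J = x1 + 3*x2
g = x1**2 + x2**2 - 10
L = J + lam*g  # Lagrange function (1.18)

# Stationarity of L with respect to x1, x2 and lambda, cf. (1.19) and (1.24)
eqs = [sp.diff(L, v) for v in (x1, x2, lam)]
print(sp.solve(eqs, (x1, x2, lam), dict=True))
# -> [{x1: -1, x2: -3, lam: 1/2}, {x1: 1, x2: 3, lam: -1/2}]
```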
1.6 Euler-Lagrange equation
Consider the functional cost J = ∫₀^{tf} F(x(t), ẋ(t)) dt. Its first variation reads:

$$\delta J = \int_0^{t_f} \delta F(x(t), \dot{x}(t))\, dt = \int_0^{t_f} \left( \frac{\partial F}{\partial x}^T \delta x + \frac{\partial F}{\partial \dot{x}}^T \delta \dot{x} \right) dt \quad (1.34)$$

Integrating $\frac{\partial F}{\partial \dot{x}}^T \delta \dot{x}$ by parts leads to the following expression:

$$\frac{d}{dt}\left(\frac{\partial F}{\partial \dot{x}}^T \delta x\right) = \frac{d}{dt}\frac{\partial F}{\partial \dot{x}}^T \delta x + \frac{\partial F}{\partial \dot{x}}^T \delta \dot{x} \;\Rightarrow\; \delta J = \int_0^{t_f} \left( \frac{\partial F}{\partial x}^T \delta x - \frac{d}{dt}\frac{\partial F}{\partial \dot{x}}^T \delta x \right) dt + \left[ \frac{\partial F}{\partial \dot{x}}^T \delta x \right]_0^{t_f} \quad (1.35)$$

The necessary condition for optimality is that the first variation vanishes for any admissible variation:

$$\delta J = 0 \quad \forall\, \delta x \quad (1.36)$$
² https://en.wikipedia.org/wiki/Euler-Lagrange_equation
Since the initial and final values of x(t) are imposed, no variation is permitted on δx at the boundaries:

$$\begin{cases} x(0) = x_0 \\ x(t_f) = x_f \end{cases} \;\Rightarrow\; \begin{cases} \delta x(0) = 0 \\ \delta x(t_f) = 0 \end{cases} \quad (1.38)$$

On the other hand, it is worth noticing that if the final value were not imposed we would have $\left.\frac{\partial F}{\partial \dot{x}}\right|_{t=t_f} = 0$.
Thus the first variation δJ of the functional cost reads:

$$\delta J = \int_0^{t_f} \left( \frac{\partial F}{\partial x}^T - \frac{d}{dt}\frac{\partial F}{\partial \dot{x}}^T \right) \delta x \; dt \quad (1.39)$$

In order to set the first variation δJ to zero whatever the value of the variation δx, the following second-order partial differential equation has to be solved:

$$\frac{\partial F}{\partial x}^T - \frac{d}{dt}\frac{\partial F}{\partial \dot{x}}^T = 0 \quad (1.40)$$

Or, by taking the transpose:

$$\frac{d}{dt}\frac{\partial F}{\partial \dot{x}} - \frac{\partial F}{\partial x} = 0 \quad (1.41)$$
We retrieve the well-known Euler-Lagrange equation of classical mechanics. The Euler-Lagrange equation is a second-order Ordinary Differential Equation (ODE) that x shall satisfy to minimize ∫₀^{tf} F(x(t), ẋ(t)) dt. The Euler-Lagrange equation is usually quite difficult to solve.
Nevertheless, when F(x(t), ẋ(t)) does not depend explicitly on time t, the Beltrami identity³ provides a first integral of the Euler-Lagrange equation. Denoting by C a constant, the first integral of the Euler-Lagrange equation reads as follows:

$$\frac{d}{dt}\frac{\partial F}{\partial \dot{x}} - \frac{\partial F}{\partial x} = 0 \;\Leftrightarrow\; F - \frac{\partial F}{\partial \dot{x}}^T \dot{x} = C \quad (1.42)$$
Indeed, multiplying both sides of the Euler-Lagrange equation by ẋᵀ we get:

$$\frac{d}{dt}\frac{\partial F}{\partial \dot{x}} - \frac{\partial F}{\partial x} = 0 \;\Rightarrow\; \dot{x}^T \frac{d}{dt}\frac{\partial F}{\partial \dot{x}} - \dot{x}^T \frac{\partial F}{\partial x} = 0 \quad (1.43)$$

Since F(x(t), ẋ(t)) does not depend explicitly on time t, we have:

$$\frac{dF(x(t),\dot{x}(t))}{dt} = \frac{\partial F}{\partial x}^T \frac{\partial x}{\partial t} + \frac{\partial F}{\partial \dot{x}}^T \frac{\partial \dot{x}}{\partial t} = \dot{x}^T \frac{\partial F}{\partial x} + \frac{\partial F}{\partial \dot{x}}^T \frac{\partial \dot{x}}{\partial t} \;\Rightarrow\; \dot{x}^T \frac{\partial F}{\partial x} = \frac{dF}{dt} - \frac{\partial F}{\partial \dot{x}}^T \frac{\partial \dot{x}}{\partial t} \quad (1.44)$$

Substituting this expression into (1.43), we recognize the total time derivative of $F - \frac{\partial F}{\partial \dot{x}}^T \dot{x}$, which must therefore be zero; integrating gives the first integral:

$$F - \frac{\partial F}{\partial \dot{x}}^T \dot{x} = C \quad (1.46)$$
Alternatively, the Euler-Lagrange equation can be transformed into a set of first-order Ordinary Differential Equations, which may be more convenient to manipulate, by introducing a control u(t) defined by ẋ(t) = u(t) and by using the Hamiltonian function H, as will be seen in the next sections.
Example 1.1. Let's find the shortest distance between two points P1 = (x1, y1) and P2 = (x2, y2) in the Euclidean plane.
The length of the path between the two points is defined by:

$$J(y(x)) = \int_{P_1}^{P_2} \sqrt{dx^2 + dy^2} = \int_{x_1}^{x_2} \sqrt{1 + (y'(x))^2}\; dx \quad (1.47)$$

Thus, the shortest distance between two fixed points in the Euclidean plane is a curve with constant slope, that is, a straight line:

$$y(x) = a\,x + b \quad (1.53)$$
With the initial and final values imposed on y(x), we finally get for y(x) the Lagrange polynomial of degree 1:

$$\begin{cases} y(x_1) = y_1 \\ y(x_2) = y_2 \end{cases} \;\Rightarrow\; y(x) = y_1\,\frac{x - x_2}{x_1 - x_2} + y_2\,\frac{x - x_1}{x_2 - x_1} \quad (1.54)$$

■
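As a quick cross-check of Example 1.1, the sketch below (assuming sympy and its euler_equations helper) forms the Euler-Lagrange equation for F = √(1 + y′²) and confirms that it forces y″(x) = 0, hence a straight line:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.symbols('x')
y = sp.Function('y')
F = sp.sqrt(1 + y(x).diff(x)**2)  # integrand of (1.47)

eq = euler_equations(F, y(x), x)[0]
print(sp.simplify(eq))  # the equation is equivalent to y''(x) = 0
```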
Consider now the general optimal control problem: find the input u(t) which minimizes a performance index of the form J(u(t)) = G(x(tf)) + ∫₀^{tf} F(x(t), u(t)) dt, subject to the state equation ẋ(t) = f(x(t), u(t)) with initial condition x(0) = x0.
Note that the state equation serves as a constraint for the optimization of the performance index J(u(t)). In addition, notice that the use of the function G(x(tf)) is optional; indeed, if the final state x(tf) is imposed then there is no need to insert the expression G(x(tf)) in the cost to be minimized.
Let u∗(t) be a candidate for the optimal input vector and let the corresponding state vector be x∗(t).
Assuming that the final time tf is known, the change δJa in the value of the augmented performance index is obtained thanks to the calculus of variations⁴:

$$\begin{aligned} \delta J_a &= \frac{\partial G(x(t_f))}{\partial x(t_f)}^T \delta x(t_f) + \int_0^{t_f} \left( \frac{\partial F}{\partial x}^T \delta x + \frac{\partial F}{\partial u}^T \delta u + \lambda^T(t)\left( \frac{\partial f}{\partial x}\delta x + \frac{\partial f}{\partial u}\delta u - \frac{d\,\delta x}{dt} \right) \right) dt \\ &= \frac{\partial G(x(t_f))}{\partial x(t_f)}^T \delta x(t_f) + \int_0^{t_f} \left( \left( \frac{\partial F}{\partial x}^T + \lambda^T(t)\frac{\partial f}{\partial x} \right)\delta x + \left( \frac{\partial F}{\partial u}^T + \lambda^T(t)\frac{\partial f}{\partial u} \right)\delta u - \lambda^T(t)\frac{d\,\delta x}{dt} \right) dt \end{aligned} \quad (1.62)$$

In the preceding equation:
− (∂G(x(tf))/∂x(tf))ᵀ, (∂F/∂u)ᵀ and (∂F/∂x)ᵀ are row vectors;
− ∂f/∂x and ∂f/∂u are matrices;
− (∂f/∂x)δx, (∂f/∂u)δu and dδx/dt are column vectors.

⁴ Ferguson J., Brief Survey of the History of the Calculus of Variations and its Applications (2004) arXiv:math/0402357
Let us introduce the Hamiltonian function:

$$H(x, u, \lambda) := F(x, u) + \lambda^T(t)\, f(x, u) \quad (1.63)$$

Then:

$$\begin{cases} \frac{\partial H}{\partial x}^T = \frac{\partial F}{\partial x}^T + \lambda^T(t)\frac{\partial f}{\partial x} \\[4pt] \frac{\partial H}{\partial u}^T = \frac{\partial F}{\partial u}^T + \lambda^T(t)\frac{\partial f}{\partial u} \end{cases} \quad (1.64)$$

and the first variation becomes:

$$\delta J_a = \frac{\partial G(x(t_f))}{\partial x(t_f)}^T \delta x(t_f) + \int_0^{t_f} \left( \frac{\partial H}{\partial x}^T \delta x + \frac{\partial H}{\partial u}^T \delta u - \lambda^T(t)\frac{d\,\delta x}{dt} \right) dt \quad (1.65)$$
Let's concentrate on the last term within the integral, which we integrate by parts:

$$\int_0^{t_f} \lambda^T(t)\frac{d\,\delta x}{dt}\, dt = \left[\lambda^T(t)\,\delta x\right]_0^{t_f} - \int_0^{t_f} \dot{\lambda}^T(t)\,\delta x\, dt = \lambda^T(t_f)\delta x(t_f) - \lambda^T(0)\delta x(0) - \int_0^{t_f} \dot{\lambda}^T(t)\,\delta x\, dt \quad (1.66)$$

As far as the initial state is imposed, the variation of the initial condition is null; consequently we have δx(0) = 0 and:

$$\int_0^{t_f} \lambda^T(t)\frac{d\,\delta x}{dt}\, dt = \lambda^T(t_f)\delta x(t_f) - \int_0^{t_f} \dot{\lambda}^T(t)\,\delta x\, dt \quad (1.67)$$
Using (1.67) within (1.65) leads to the following expression for the first variation of the augmented functional cost:

$$\delta J_a = \left( \frac{\partial G(x(t_f))}{\partial x(t_f)}^T - \lambda^T(t_f) \right)\delta x(t_f) + \int_0^{t_f} \left( \frac{\partial H}{\partial u}^T \delta u + \left( \frac{\partial H}{\partial x}^T + \dot{\lambda}^T(t) \right)\delta x \right) dt \quad (1.68)$$

In order to set the first variation of the augmented functional cost δJa to zero, the time-dependent Lagrange multipliers λ(t), which are also called costate functions, are chosen as follows:

$$\dot{\lambda}^T(t) + \frac{\partial H}{\partial x}^T = 0 \;\Leftrightarrow\; \dot{\lambda}(t) = -\frac{\partial H}{\partial x} \quad (1.69)$$
− If the final value x(tf) is specified, then δx(tf) = 0 and the first term of (1.68) vanishes.
− Assuming that the final value x(tf) is not specified, then the variation δx(tf) in (1.68) is not equal to zero and the value of λ(tf) is set by imposing that the difference in front of δx(tf) vanishes at final time tf (the transversality condition):

$$\lambda(t_f) = \frac{\partial G(x(t_f))}{\partial x(t_f)} \quad (1.70)$$
Hence, in both situations, the first variation of the augmented functional cost (1.68) can be written as:

$$\delta J_a = \int_0^{t_f} \frac{\partial H}{\partial u}^T \delta u \; dt \quad (1.71)$$

Moreover, if there is no constraint on the input u(t), then δu is free and the first variation δJa in (1.71) is set to zero through the following necessary condition for optimality:

$$\delta J_a = 0 \;\Rightarrow\; \frac{\partial H}{\partial u}^T = 0 \;\Leftrightarrow\; \frac{\partial H}{\partial u} = 0 \quad (1.72)$$
The kinetic energy T(q, q̇) and potential energy V(q) read as follows (remember that the vertical position is oriented downward):
When taking the time derivative of the square of the velocity v(t) and using relations (1.76), we get the following expression for the time derivative of the velocity v(t):

$$v(t)^2 = \dot{y}(t)^2 + \dot{z}(t)^2 \;\Rightarrow\; v(t)\dot{v}(t) = \dot{y}(t)\ddot{y}(t) + \dot{z}(t)\ddot{z}(t) \;\Rightarrow\; \dot{v}(t) = \frac{v(t)\sin(\gamma(t))\,g}{v(t)} = g\sin(\gamma(t)) \quad (1.78)$$

In order to reduce the size of the system, it is worth noticing that ż(t) and v̇(t) both depend on sin(γ(t)). So we can write:

$$\dot{v}(t) = g\,\frac{\dot{z}(t)}{v(t)} \;\Leftrightarrow\; v(t)\dot{v}(t) = g\,\dot{z}(t) \quad (1.80)$$
The solution of this differential equation is the cycloid curve. The parametric expression of the cycloid curve is the following, where the parameter θ varies from 0 to θf:

$$\begin{cases} y = R(C)\,(\theta - \sin(\theta)) \\ z_r = R(C)\,(1 - \cos(\theta)) \end{cases} \quad \text{where } R(C) := \frac{1}{4g\,C^2} \quad (1.96)$$

The cycloid curve corresponds to the trajectory of a point on a circle of radius R(C) rolling along a straight line.
The values of C and θf shall then be chosen such that the final conditions on y and zr are fulfilled:

$$\begin{cases} y(t_f) = R(C)\,(\theta_f - \sin(\theta_f)) \\ z_r(t_f) = R(C)\,(1 - \cos(\theta_f)) \end{cases} \quad (1.97)$$
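A sketch of this final-condition matching is given below (assuming scipy; the endpoint values yf and zrf are illustrative assumptions): it solves (1.97) for R(C) and θf, then recovers C from R(C) = 1/(4gC²).

```python
import numpy as np
from scipy.optimize import fsolve

g = 9.81
yf, zrf = 2.0, 1.0  # assumed final conditions, for illustration

def residual(p):
    R, theta_f = p
    return [R*(theta_f - np.sin(theta_f)) - yf,
            R*(1.0 - np.cos(theta_f)) - zrf]

R, theta_f = fsolve(residual, x0=[1.0, np.pi])
C = 1.0/np.sqrt(4.0*g*R)   # from R(C) = 1/(4 g C^2)
print(R, theta_f, C)
```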
The problem can equivalently be stated as follows: find u(y) which minimizes

$$J(u) = \int_0^{t_f} dt = \int_0^{y_f} \sqrt{\frac{1 + u^2}{2g\,z(y) + l_0}}\; dy \quad (1.98)$$

under the constraint z′ = u. The corresponding canonical equations read:

$$\begin{cases} z' = u(z, \lambda_z) \\ \lambda_z' = g\sqrt{1 + u(z,\lambda_z)^2}\;(2g\,z + l_0)^{-3/2} \end{cases} \quad (1.101)$$
p
This could be a tricky task, but let's try it! First, from the first equation of (1.100) we get the expression of 1 + u² as a function of z and λz:

$$\frac{u}{\sqrt{1+u^2}} = -\lambda_z\sqrt{2g\,z + l_0} \;\Rightarrow\; 1 + u^2 = \frac{1}{1 - \lambda_z^2\,(2g\,z + l_0)} \quad (1.102)$$

Using this expression in the second equation of (1.100), we get the following expression for λz′:

$$\lambda_z' = g\sqrt{1+u^2}\,(2g\,z+l_0)^{-3/2} = g\sqrt{\frac{1}{1 - \lambda_z^2(2g\,z+l_0)}}\,(2g\,z+l_0)^{-3/2} \quad (1.103)$$

$$z' = u \;\Rightarrow\; z'^2 = u^2 = \frac{1}{1 - \lambda_z^2(2g\,z+l_0)} - 1 = \frac{\lambda_z^2(2g\,z+l_0)}{1 - \lambda_z^2(2g\,z+l_0)} \quad (1.104)$$

$$z' = -\lambda_z\sqrt{\frac{2g\,z+l_0}{1 - \lambda_z^2(2g\,z+l_0)}} = -\lambda_z\sqrt{2g\,z+l_0}\;\sqrt{\frac{1}{1 - \lambda_z^2(2g\,z+l_0)}} \quad (1.105)$$
Then we get the expression of $\sqrt{\frac{1}{1-\lambda_z^2(2g\,z+l_0)}}$, which we insert in (1.103). We get:

$$\sqrt{\frac{1}{1-\lambda_z^2(2g\,z+l_0)}} = -\frac{z'}{\lambda_z\sqrt{2g\,z+l_0}} \;\Rightarrow\; \lambda_z' = -g\,\frac{z'}{\lambda_z\sqrt{2g\,z+l_0}}\,(2g\,z+l_0)^{-3/2} = -\frac{g}{\lambda_z}\,\frac{z'}{(2g\,z+l_0)^2} \quad (1.106)$$

We finally get:

$$\lambda_z'\,\lambda_z = -\frac{g\,z'}{(2g\,z+l_0)^2} \quad (1.107)$$

Thus the first integral of this differential equation is the following, where C1 denotes a constant:

$$\lambda_z^2 = \frac{1}{2g\,z + l_0} + C_1 \quad (1.108)$$
Having in mind that z′ = u, we retrieve the first integral (1.93) which was obtained through the Beltrami identity. The resolution process is then similar to what has been done in the previous section.
1.9 Hamilton-Jacobi-Bellman (HJB) equation
The Hamilton-Jacobi-Bellman (HJB) equation (1.112) can equivalently be written as:

$$-\frac{\partial J^*(x,t)}{\partial t} = H^*\!\left( \frac{\partial J^*(x,t)}{\partial x},\; x(t) \right) \quad (1.113)$$

where

$$H^*(\lambda(t), x(t)) = \min_{u(t)\,\in\,U}\left( F(x,u) + \frac{\partial J^*(x,t)}{\partial x}^T f(x,u) \right) \quad (1.114)$$
For the time-dependent case, the terminal condition on the optimal cost-to-go function solution of (1.112) reads:

$$J^*(x, t_f) = G\left(x(t_f)\right) \quad (1.115)$$

It is worth noticing that the Lagrange multiplier λ(t) represents the partial derivative with respect to the state of the optimal cost-to-go function⁷:

$$\lambda(t) = \frac{\partial J^*(x,t)}{\partial x} \quad (1.116)$$

⁶ da Silva J., de Sousa J., Dynamic Programming Techniques for Feedback Control, Proceedings of the 18th World Congress, Milano (Italy), August 28 - September 2, 2011
⁷ Alazard D., Optimal Control & Guidance: From Dynamic Programming to Pontryagin's Minimum Principle, lecture notes
$$J(x(0)) - J(0) = -\int_0^{\infty} \frac{\partial J(x)}{\partial x}^T f(x,u)\, dt \le \int_0^{\infty} F(x,u)\, dt \quad (1.122)$$

⁸ Bellman R., Dynamic programming, Princeton University Press, 1957
Moreover, the optimal cost J ∗ (x) has a decay rate given by −F (x, u∗ ), which
is negative. Thus J ∗ (x) may serve as a Lyapunov function to prove that the
optimal control law is stabilizing9 .
where:

$$Q = Q^T \ge 0, \quad R = R^T > 0 \quad (1.124)$$

Assuming that the final state at t = tf is set to zero, a candidate solution J∗(x, t) of the Hamilton-Jacobi-Bellman (HJB) partial differential equation is the following quadratic function, where P(t) = Pᵀ(t) > 0:

$$J^*(x,t) = x^T(t)\,P(t)\,x(t) \quad (1.125)$$

Thus:

$$\frac{\partial J^*(x,t)}{\partial t} = x^T \dot{P}(t)\,x, \qquad \frac{\partial J^*(x,t)}{\partial x} = 2P(t)\,x(t) \quad (1.126)$$
Finally, assuming unconstrained control, that is u(t) ∈ Rᵐ, the Hamilton-Jacobi-Bellman (HJB) equation (1.112) reads as follows:

$$-x^T\dot{P}(t)x = \min_{u(t)\,\in\,\mathbb{R}^m}\left( F(x,u) + \frac{\partial J^*(x)}{\partial x}^T f(x,u) \right) = \min_{u(t)\,\in\,\mathbb{R}^m}\left( x^T(t)Qx(t) + u^T(t)Ru(t) + 2x^T(t)P(t)\left( Ax(t) + Bu(t) \right) \right) \quad (1.127)$$

The minimization over u yields u∗(t) = −R⁻¹BᵀP(t)x(t). Because the resulting equation must be true ∀ x, we conclude that P(t) = Pᵀ(t) shall solve the following differential Riccati equation:

$$-\dot{P}(t) = A^T P(t) + P(t)A - P(t)BR^{-1}B^T P(t) + Q \quad (1.133)$$
For time-invariant systems with infinite horizon (tf → ∞), the optimal cost-to-go function J∗(x, t) is independent of time t: J∗(x, t) = J∗(x). Thus the matrix P(t) becomes a constant matrix:

$$t_f \to \infty \;\Rightarrow\; J^*(x,t) = J^*(x) \;\Rightarrow\; P(t) = P \quad (1.134)$$
Then Pontryagin's principle states that the optimal control u∗ must satisfy the following conditions:

$$0 = \frac{\partial H_a}{\partial u} = \frac{\partial H}{\partial u} + \mu\,\frac{\partial c(x,u)}{\partial u} \;\Rightarrow\; \mu = -\left.\frac{\partial H/\partial u}{\partial c(x,u)/\partial u}\right|_{u=u_b(x)} \quad (1.143)$$

¹⁰ Hull D. G., Optimal Control Theory for Applications, Springer (2003)
or, equivalently:

$$u^* = \arg\min_{u(t)\,\in\,U} H(x^*, u, \lambda^*) \quad (1.146)$$

where U denotes the set of admissible values for the control u (here u(t) ∈ U as soon as c(x∗, u) ≤ 0). The last relation is the so-called Pontryagin's principle.
Moreover, if the terminal time tf is free, then along the optimal trajectory
we have:
H(x∗ , u∗ , λ∗ ) = 0 when tf is free (1.148)
1.11 Hamiltonian over time
The total time derivative of the Hamiltonian reads:

$$\frac{dH}{dt} = \frac{\partial H}{\partial x}^T \frac{dx}{dt} + \frac{\partial H}{\partial u}^T \frac{du}{dt} + \frac{\partial H}{\partial \lambda}^T \frac{d\lambda}{dt} \quad (1.149)$$

Along an optimal trajectory, the state and costate equations give ∂H/∂λ = ẋ and ∂H/∂x = −λ̇, so the first and third terms cancel since λ̇ᵀ(t)(dx/dt) = ẋᵀ(t)(dλ/dt):

$$\frac{dH}{dt} = \frac{\partial H}{\partial u}^T \frac{du}{dt} \quad (1.151)$$
Let λ(x) be the Lagrange multiplier, which is here a scalar. The Hamiltonian H reads:

$$H = \sqrt{1 + u^2(x)} + \lambda(x)\,u(x) \quad (1.155)$$

The necessary conditions for optimality are the following:

$$\begin{cases} \frac{\partial H}{\partial y} = -\lambda'(x) \;\Leftrightarrow\; \lambda'(x) = 0 \\[4pt] \frac{\partial H}{\partial u} = 0 \;\Leftrightarrow\; \frac{u(x)}{\sqrt{1 + u^2(x)}} + \lambda(x) = 0 \end{cases} \quad (1.156)$$
The first equation states that λ(x) is a constant, denoted c. Using this relation in the second equation of (1.156) leads to the following expression for u(x), where the constant a is introduced:

$$\frac{u(x)}{\sqrt{1+u^2(x)}} + c = 0 \;\Rightarrow\; u^2(x) = \frac{c^2}{1-c^2} \;\Rightarrow\; u(x) = \sqrt{\frac{c^2}{1-c^2}} := a = \text{constant} \quad (1.158)$$

Thus, the shortest distance between two fixed points in the Euclidean plane is a curve with constant slope, that is, a straight line:

$$y(x) = a\,x + b \quad (1.159)$$
In this case the first variation of the augmented performance index with respect to δtf is zero as soon as:

$$F(t_f) + \frac{\partial G(x(t_f))}{\partial x}^T f(t_f) = 0 \quad (1.162)$$

Equivalently, using the transversality condition:

$$\lambda(t_f) = \frac{\partial G(x(t_f))}{\partial x(t_f)} \;\Rightarrow\; F(t_f) + \lambda^T(t_f)\,f(t_f) = 0 \quad (1.163)$$
$$dt = (t_f - t_0)\, ds \quad (1.165)$$

Then the optimal control problem with respect to time t where the final time tf is free is changed into an optimal control problem with respect to the new variable s and an additional state tf(s), which is constant with respect to s. The optimal control problem reads:
Minimize:

$$J(u(s)) = G(x(1)) + \int_0^1 (t_f(s) - t_0)\,F(x(s), u(s))\, ds \quad (1.166)$$
1.12 Bang-bang control

$$u_{min} \le u(t) \le u_{max} \;\Rightarrow\; u(t) = \begin{cases} u_{max} & \text{if } \sigma(t) = \frac{\partial H}{\partial u} < 0 \\ u_{min} & \text{if } \sigma(t) = \frac{\partial H}{\partial u} > 0 \\ \in [u_{min}, u_{max}] & \text{if } \sigma(t) = \frac{\partial H}{\partial u} = 0 \end{cases} \quad (1.170)$$

$$u = -\alpha\,\frac{b}{\|b\|} \quad \text{where } \alpha \ge 0 \quad (1.173)$$
¹² Bertrand R., Epenoy R., New smoothing techniques for solving bang-bang optimal control problems - Numerical results and statistical interpretation, Optimal Control Applications and Methods 23(4):171-197, July 2002, DOI:10.1002/oca.709
Last but not least, assume that the performance index to be minimized reads as follows, where λ0 > 0:

$$J(u(t)) = \frac{\lambda_0}{2}\int_0^{t_f} \|u(t)\|\; dt \quad (1.175)$$

The perturbed performance index reads:

$$J_\epsilon(u(t)) = \frac{\lambda_0}{2}\int_0^{t_f} \left( \|u(t)\| - \epsilon\, h\left(\|u(t)\|\right) \right) dt \quad (1.176)$$

Parameter ϵ is assumed to be in the interval ]0, 1] and the function h is a continuous function satisfying h(w) ≥ 0 ∀ w ∈ [0, 1]. For example, one could choose h(w) = w − w²; with this choice ∥u(t)∥ − ϵ h(∥u(t)∥) = ∥u(t)∥² for ϵ = 1 and ∥u(t)∥ − ϵ h(∥u(t)∥) = ∥u(t)∥ for ϵ = 0.
If h(w) → ∞ as w approaches 1 or 0, then h is called a barrier function,
otherwise it is a penalty function.
The homotopic (or continuation) approach¹² consists in first solving the perturbed problem with ϵ = 1. Then, after defining a decreasing sequence of ϵ values (ϵ1 = 1 > ϵ2 > · · · > ϵn > 0), the optimal control problem associated with ϵ = ϵk, where k = 2, · · · , n, is solved using the solution of the previous one as a starting point, as sketched below.
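The loop below sketches this continuation scheme; solve_perturbed_problem is a hypothetical placeholder standing for whatever solver (e.g. shooting on the two-point boundary value problem) is used for a given ϵ:

```python
import numpy as np

def solve_perturbed_problem(eps, initial_guess):
    # Hypothetical placeholder: a real implementation would solve the
    # smoothed optimal control problem J_eps for this value of eps,
    # warm-started from initial_guess (e.g. an initial costate).
    return initial_guess

eps_sequence = np.linspace(1.0, 0.05, 20)   # eps_1 = 1 > ... > eps_n > 0
solution = None                             # starting guess for eps = 1
for eps in eps_sequence:
    solution = solve_perturbed_problem(eps, initial_guess=solution)
# 'solution' now approximates the original (eps -> 0) bang-bang problem
```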
1.12.2 Example 1
Consider a simple mass m which moves along the x-axis under a force f(t)¹³. Denoting by y(t) the position, the equation of motion is that of a double integrator, ÿ(t) = u(t), where u(t) := f(t)/m is the (bounded) control.
We will assume that the initial position of the mass is zero and that the movement starts from rest:

$$y(0) = 0, \quad \dot{y}(0) = 0 \quad (1.181)$$

¹³ Sinha A., Linear Systems: Optimal and Robust Control, CRC Press (2007)
First we are looking for the optimal control u(t) which enables the mass to cover the maximum distance in a fixed time tf.
The objective of the problem is to maximize y(tf). This corresponds to minimizing the opposite of y(tf); consequently the cost J(u(t)) reads as follows, where F(x, u) = 0 when compared to (1.56):
Solutions of the adjoint equations are the following, where c and d are constants:

$$\lambda_1(t) = c, \quad \lambda_2(t) = -ct + d \quad (1.186)$$

Consequently:

$$\begin{cases} c = -1 \\ d = -t_f \end{cases} \;\Rightarrow\; \begin{cases} \lambda_1(t) = -1 \\ \lambda_2(t) = t - t_f \end{cases} \quad (1.188)$$
Thus the Hamiltonian reads H = λ1x2 + λ2u. Since λ2(t) = t − tf < 0 for all t < tf, the optimal control is u(t) = umax up to the final time. The optimal state trajectory can be easily obtained by solving the state equations with the given initial conditions:

$$\begin{cases} \dot{x}_1 = x_2 \\ \dot{x}_2 = u_{max} \end{cases} \;\Rightarrow\; \begin{cases} x_1(t) = \frac{1}{2}u_{max}\,t^2 \\ x_2(t) = u_{max}\,t \end{cases} \quad (1.191)$$

The Hamiltonian is constant along the optimal trajectory, and the achieved performance index reads:

$$J(u(t)) = -x_1(t_f) = -\frac{1}{2}u_{max}\,t_f^2 \quad (1.193)$$
Solutions of the adjoint equations are the following, where c and d are constants:

$$\lambda_1(t) = c, \quad \lambda_2(t) = (1 - c)\,t + d \quad (1.197)$$

Consequently:

$$\begin{cases} c = 0 \\ d = -t_f \end{cases} \;\Rightarrow\; \begin{cases} \lambda_1(t) = 0 \\ \lambda_2(t) = t - t_f \end{cases} \quad (1.199)$$

Obviously, we retrieve the same expressions for λ1(t) and λ2(t) as those obtained previously, and we finally get the same bang-bang optimal control.
1.12.3 Example 2
We re-use the preceding example, but now we are looking for the optimal control u(t) which enables the mass to cover the maximum distance in a fixed time tf with the additional constraint that the final velocity is equal to zero:

$$x_2(t_f) = 0 \quad (1.200)$$

The solution of this problem starts as in the previous case and leads to the solution of the adjoint equations, where c and d are constants:

$$\lambda_1(t) = c, \quad \lambda_2(t) = -ct + d \quad (1.201)$$

The difference when compared with the previous case is that now the final velocity is equal to zero, that is x2(tf) = 0. Consequently the transversality condition (1.70) involves only the state x1 and reads as follows:
Thus ∂H/∂u = t + d = λ2(t) ∀ 0 ≤ t ≤ tf, where the value of the constant d is not known: it can be either d < −tf, d ∈ [−tf, 0] or d > 0. Figure 1.1 plots the three possibilities.
Hence d shall be chosen between −tf and 0. According to (1.170) and Figure 1.1 we have:

$$u(t) = \begin{cases} u_{max} & \forall\; 0 \le t \le t_s \\ u_{min} & \forall\; t_s < t \le t_f \end{cases} \quad (1.205)$$

From Figure 1.1 it is clear that at t = ts we have λ2(ts) = 0. Using the fact that λ2(t) = t + d, we finally get the value of the constant d:

$$\lambda_2(t) = t + d,\quad \lambda_2(t_s) = 0 \;\Rightarrow\; d = -t_s \quad (1.208)$$

Furthermore, the Hamiltonian along the optimal trajectory has the following value:

$$\begin{aligned} \forall\; 0 \le t \le t_s:\quad & H(x,u,\lambda) = \lambda_1(t)x_2(t) + \lambda_2(t)u(t) = -u_{max}\,t + (t+d)\,u_{max} = -t_s\,u_{max} \\ \forall\; t_s < t \le t_f:\quad & H(x,u,\lambda) = -u_{max}\,t_s - u_{min}(t-t_s) + (t-t_s)\,u_{min} = -t_s\,u_{max} \end{aligned} \quad (1.209)$$

As expected, the Hamiltonian along the optimal trajectory is constant.
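A short numerical sketch of this example follows (assuming numpy, with the illustrative values umax = −umin = 1 and tf = 2): the switching time comes from x2(tf) = umax ts + umin(tf − ts) = 0.

```python
import numpy as np

umax, umin, tf = 1.0, -1.0, 2.0
ts = -umin*tf/(umax - umin)     # switching time; = tf/2 for symmetric bounds

t = np.linspace(0.0, tf, 1001)
x2 = np.where(t <= ts, umax*t, umax*ts + umin*(t - ts))   # velocity
x1 = np.where(t <= ts, 0.5*umax*t**2,
              0.5*umax*ts**2 + umax*ts*(t - ts) + 0.5*umin*(t - ts)**2)
print(x2[-1])  # ~0: the final-velocity constraint is met
print(x1[-1])  # distance covered under the bang-bang control
```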
$$\sigma(t) = 0 \;\;\forall\; t_1 \le t \le t_2 \;\Rightarrow\; \frac{d^k}{dt^k}\sigma(t) = 0 \;\;\forall\; t_1 \le t \le t_2,\; \forall\, k \in \mathbb{N} \quad (1.211)$$

At some derivative order the control u(t) does appear explicitly and its value is thereby determined. Furthermore, it can be shown that the control u(t) appears at an even derivative order. So the derivative order at which the control u(t) does appear explicitly will be denoted 2q. Thus:

$$k := 2q \;\Rightarrow\; \frac{d^{2q}\sigma(t)}{dt^{2q}} := A(t, x, \lambda) + B(t, x, \lambda)\,u = 0 \quad (1.212)$$

The previous equation gives an explicit equation for the singular control, once the Lagrange multipliers λ have been obtained through the relation λ̇(t) = −∂H/∂x.
The singular arc will be optimal if it satisfies the following generalized Legendre-Clebsch condition, which is also known as the Kelley condition¹⁴, where 2q is the (always even) value of k at which the control u(t) explicitly appears in d^kσ(t)/dt^k for the first time:

$$(-1)^q\,\frac{\partial}{\partial u}\left( \frac{d^{2q}\sigma(t)}{dt^{2q}} \right) \ge 0 \quad (1.213)$$
Note that for the regular arc, the second-order necessary condition for optimality to achieve a minimum cost is the positive semi-definiteness of the Hessian matrix of the Hamiltonian along an optimal trajectory. This condition is obtained by setting q = 0 in the generalized Legendre-Clebsch condition (1.213):

$$q = 0 \;\Rightarrow\; \frac{\partial}{\partial u}\,\sigma(t) = \frac{\partial^2 H}{\partial u^2} = H_{uu} \ge 0 \quad (1.214)$$

This inequality is also termed the regular Legendre-Clebsch condition.

¹⁴ Pargett D. M. & Ardema M. D., Flight Path Optimization at Constant Altitude, Journal of Guidance, Control and Dynamics, July 2007, 30(4):1197-1201, DOI: 10.2514/1.28954
Chapter 2
Finite Horizon Linear Quadratic Regulator
Consider the following linear time-invariant system, where x(t) is the state vector and u(t) the control input:

$$\dot{x}(t) = Ax(t) + Bu(t), \quad x(0) = x_0 \quad (2.1)$$
Then we have to find the control u(t) which minimizes the following quadratic performance index:

$$J(u(t)) = \frac{1}{2}\left(x(t_f) - x_f\right)^T S\left(x(t_f) - x_f\right) + \frac{1}{2}\int_0^{t_f}\left( x^T(t)Qx(t) + u^T(t)Ru(t) \right) dt \quad (2.2)$$

where the final time tf is set and xf is the final state to be reached. The performance index reflects a trade-off between the rate of variation of x(t) and the magnitude of the control input u(t). The weighting matrices satisfy:

$$S = S^T \ge 0, \quad Q = Q^T \ge 0, \quad R = R^T > 0 \quad (2.3)$$
Notice that the use of matrix S is optional; indeed, if the final state xf is imposed then there is no need to insert the expression ½(x(tf) − xf)ᵀ S (x(tf) − xf) in the cost to be minimized.
Matrix M^{0.5} is called the square root of matrix M. By getting the modal decomposition of matrix M, that is M = VDV⁻¹ where V is the matrix whose columns are the eigenvectors of M and D is the diagonal matrix whose diagonal elements are the corresponding positive eigenvalues, the square root M^{0.5} of M is given by M^{0.5} = VD^{0.5}V⁻¹, where D^{0.5} is any diagonal matrix whose elements are the square roots of the diagonal elements of D¹.
Similarly, a positive semi-definite matrix M is denoted M ≥ 0. We remind that an n × n real symmetric matrix M = Mᵀ is called positive semi-definite if and only if any of the following equivalent conditions holds:
− xᵀMx ≥ 0 for all x ≠ 0;
− All eigenvalues of M are non-negative;
− All of the principal (not only leading) minors are non-negative (the principal minor of order k is the minor of order k obtained by deleting n − k rows and the n − k columns with the same positions as the rows. For instance, in a principal minor where you have deleted rows 1 and 3, you should also delete columns 1 and 3);
− Matrix M can be written as (M^{0.5})ᵀ M^{0.5} where matrix M^{0.5} is full row rank.

¹ https://en.wikipedia.org/wiki/Square_root_of_a_matrix
Furthermore, a real symmetric matrix M is called negative (semi-)definite if −M is positive (semi-)definite.

Example 2.1. Check that $M_1 = M_1^T = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$ is not positive definite and that $M_2 = M_2^T = \begin{bmatrix} 1 & -2 \\ -2 & 5 \end{bmatrix}$ is positive definite. ■
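Example 2.1 can be verified numerically; the sketch below (assuming numpy) checks both the eigenvalues and the leading principal minors:

```python
import numpy as np

M1 = np.array([[1.0, 2.0], [2.0, 3.0]])
M2 = np.array([[1.0, -2.0], [-2.0, 5.0]])

for name, M in (("M1", M1), ("M2", M2)):
    eig = np.linalg.eigvalsh(M)                  # eigenvalues of symmetric M
    minors = [np.linalg.det(M[:k, :k]) for k in range(1, M.shape[0] + 1)]
    print(name, eig, minors, bool(np.all(eig > 0)))
# M1: eigenvalues of mixed sign -> not positive definite
# M2: all eigenvalues > 0       -> positive definite
```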
The necessary conditions for optimality lead to the following Hamiltonian system:

$$\begin{bmatrix} \dot{x}(t) \\ \dot{\lambda}(t) \end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix}\begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} = H\begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} \quad (2.12)$$

where H is the Hamiltonian matrix:

$$H = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} \quad (2.13)$$

It satisfies, with $J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}$:

$$(JH)^T = JH \quad (2.14)$$
In the previous equation, the value of λ(0) is not known. On the other hand, x(tf) or λ(tf) is known, depending on whether the final state is imposed or weighted. Thus, by replacing t by t − tf in the previous equation, we obtain:

$$\begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} = e^{Ht}\begin{bmatrix} x(0) \\ \lambda(0) \end{bmatrix} = e^{H(t-t_f)}\begin{bmatrix} x(t_f) \\ \lambda(t_f) \end{bmatrix} \quad (2.17)$$

Define the following partition:

$$e^{H(t-t_f)} := \begin{bmatrix} Y_1(t) & X_1(t) \\ Y_2(t) & X_2(t) \end{bmatrix} \quad (2.18)$$

so that x(t) = Y1(t)x(tf) + X1(t)λ(tf) and λ(t) = Y2(t)x(tf) + X2(t)λ(tf) (2.19).
− When the final state is imposed, x(tf) := xf (2.21). Then (2.19) can be manipulated to get rid of the unknown vector λ(tf):

$$\lambda(t_f) = X_1^{-1}(t)\left(x(t) - Y_1(t)\,x_f\right) = X_2^{-1}(t)\left(\lambda(t) - Y_2(t)\,x_f\right) \quad (2.22)$$

$$\begin{aligned} X_2^{-1}(t)\left(\lambda(t) - Y_2(t)x_f\right) &= X_1^{-1}(t)\left(x(t) - Y_1(t)x_f\right) \\ \Leftrightarrow\; \lambda(t) &= X_2(t)X_1^{-1}(t)\left(x(t) - Y_1(t)x_f\right) + Y_2(t)\,x_f \\ \Leftrightarrow\; \lambda(t) &= X_2(t)X_1^{-1}(t)\,x(t) - \left( X_2(t)X_1^{-1}(t)Y_1(t) - Y_2(t) \right)x_f \end{aligned} \quad (2.23)$$

In order to factor x(t) and xf, let P(t) and F(t) be the following matrices:

$$P(t) := X_2(t)X_1^{-1}(t), \qquad F(t) := X_2(t)X_1^{-1}(t)Y_1(t) - Y_2(t) \quad (2.24)$$

We finally get:

$$\lambda(t) = P(t)\,x(t) - F(t)\,x_f \quad (2.25)$$

− When the final state is weighted through matrix S, the terminal condition reads:

$$\lambda(t_f) = S\left(x(t_f) - x_f\right) \quad (2.26)$$
Then (2.19) can be manipulated to get rid of the unknown vector x(tf). Thus:

$$\lambda(t) = \left(Y_2(t) + X_2(t)S\right)\left(Y_1(t) + X_1(t)S\right)^{-1}\left(x(t) + X_1(t)\,S\,x_f\right) - X_2(t)\,S\,x_f \quad (2.29)$$

In order to factor x(t) and xf, let PS(t) and FS(t) be the following matrices:

$$P_S(t) := \left(Y_2(t) + X_2(t)S\right)\left(Y_1(t) + X_1(t)S\right)^{-1}, \qquad F_S(t) := X_2(t)\,S - P_S(t)\,X_1(t)\,S \quad (2.30)$$

We finally get:

$$\lambda(t) = P_S(t)\,x(t) - F_S(t)\,x_f \quad (2.31)$$

In both cases, the optimal control u(t) = −R⁻¹Bᵀλ(t) can be written as a linear state feedback plus a feedforward term:

$$u(t) := -K(t)\,x(t) + F(t)\,x_f \quad (2.37)$$

where:

$$K(t) = R^{-1}B^T P(t) \quad (2.38)$$
The preceding expression leads to the closed-loop block diagram shown in Figure 2.1.
It is worth noticing that P(tf) = X2(tf)X1⁻¹(tf) → ∞ because X1(tf) = 0 when the final value xf of x(tf) is imposed, as indicated by (2.20). This is in line with the final value of P(t) as indicated by (2.10) when the final state is close to zero:

$$x_f = 0 \;\Rightarrow\; \lambda(t_f) = P(t_f)x(t_f) = Sx(t_f) \;\Rightarrow\; P(t_f) = S \quad (2.39)$$

Consequently, when it is desired that the final value x(tf) tends towards xf, then S → ∞. Thus S = P(tf) is singular when the final value x(tf) is set to xf. In that case, and to avoid the numerical difficulty at t = tf, we shall set u(tf) = 0. Thus the optimal control reads:

$$u(t) = \begin{cases} -K(t)\,x(t) + F(t)\,x_f & \forall\; 0 \le t < t_f \\ 0 & \text{for } t = t_f \end{cases} \quad (2.40)$$
Using (2.9) and (2.10), the time derivative of the Lagrange multipliers λ(t) = P(t)x(t) can be computed; this again leads to the differential Riccati equation.
2.6 Examples
2.6.1 Example 1
Given the following scalar plant:

$$\dot{x}(t) = a\,x(t) + b\,u(t), \quad x(0) = x_0 \quad (2.47)$$

Find the control u(t) which minimizes the following performance index, where xf = 0, S ≥ 0 and ρ > 0:

$$J(u(t)) = \frac{1}{2}x^T(t_f)\,S\,x(t_f) + \frac{1}{2}\int_0^{t_f} \rho\,u^2(t)\, dt \quad (2.48)$$
The Hamiltonian matrix H defined in (2.13) reads:

$$H = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} = \begin{bmatrix} a & \frac{-b^2}{\rho} \\ 0 & -a \end{bmatrix} \quad (2.49)$$

The weighted-final-state solution PS(t) then reads:

$$P_S(t) = \frac{S\,e^{-a(t-t_f)}}{e^{a(t-t_f)} - \frac{S\,b^2}{2\rho a}\left(e^{a(t-t_f)} - e^{-a(t-t_f)}\right)} = \frac{S}{e^{2a(t-t_f)} + \frac{S\,b^2}{2\rho a}\left(1 - e^{2a(t-t_f)}\right)} \quad (2.52)$$
$$u(t) = -K(t)\,x(t) = -R^{-1}B^T P_S(t)\,x(t) = -\frac{b}{\rho}P_S(t)\,x(t) = \frac{-b\,S}{\rho\,e^{2a(t-t_f)} + \frac{S\,b^2}{2a}\left(1 - e^{2a(t-t_f)}\right)}\; x(t) \quad (2.53)$$

When S → ∞, PS(t) tends towards P(t) = X2(t)X1⁻¹(t):

$$P_S(t) \xrightarrow[S\to\infty]{} P(t) = X_2(t)\,X_1^{-1}(t) = \frac{2\rho a}{b^2\left(1 - e^{2a(t-t_f)}\right)} \quad (2.54)$$

and:

$$u(t) = -R^{-1}B^T P(t)\,x(t) = \frac{-2a}{b\left(1 - e^{2a(t-t_f)}\right)}\; x(t) \quad (2.55)$$
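The closed-form expressions (2.54)-(2.55) can be exercised numerically. The sketch below (assuming scipy, with the illustrative values a = b = ρ = 1, tf = 2) simulates the closed loop; since P(t) blows up as t → tf, the integration stops slightly before the final time, in line with the convention u(tf) = 0 of (2.40):

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b, rho, tf, x0 = 1.0, 1.0, 1.0, 2.0, 1.0

def P(t):
    return 2.0*rho*a/(b**2*(1.0 - np.exp(2.0*a*(t - tf))))  # eq. (2.54)

def xdot(t, x):
    u = -(b/rho)*P(t)*x          # u = -R^{-1} B^T P(t) x, eq. (2.55)
    return a*x + b*u

sol = solve_ivp(xdot, [0.0, tf - 1e-3], [x0], rtol=1e-8)
print(sol.y[0, -1])  # close to 0: the state is driven to the origin
```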
2.6.2 Example 2
Given the following plant, which actually represents a double integrator:

$$\begin{bmatrix} \dot{x}_1(t) \\ \dot{x}_2(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t) \quad (2.56)$$

Find the control u(t) which minimizes the following performance index, where xf = 0 and S = Sᵀ ≥ 0:

$$J(u(t)) = \frac{1}{2}x^T(t_f)\,S\,x(t_f) + \frac{1}{2}\int_0^{t_f} u^2(t)\, dt \quad (2.57)$$

In order to compute e^{Ht} we use the following relation, where L⁻¹ stands for the inverse Laplace transform:

$$e^{Ht} = \mathcal{L}^{-1}\left[ (sI - H)^{-1} \right] \quad (2.60)$$
We get:

$$sI - H = \begin{bmatrix} s & -1 & 0 & 0 \\ 0 & s & 0 & 1 \\ 0 & 0 & s & 0 \\ 0 & 0 & 1 & s \end{bmatrix} \;\Rightarrow\; (sI - H)^{-1} = \begin{bmatrix} \frac{1}{s} & \frac{1}{s^2} & \frac{1}{s^4} & -\frac{1}{s^3} \\ 0 & \frac{1}{s} & \frac{1}{s^3} & -\frac{1}{s^2} \\ 0 & 0 & \frac{1}{s} & 0 \\ 0 & 0 & -\frac{1}{s^2} & \frac{1}{s} \end{bmatrix} \quad (2.61)$$

$$\Rightarrow\; e^{Ht} = \mathcal{L}^{-1}\left[ (sI - H)^{-1} \right] = \begin{bmatrix} 1 & t & \frac{t^3}{6} & -\frac{t^2}{2} \\ 0 & 1 & \frac{t^2}{2} & -t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -t & 1 \end{bmatrix}$$
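The matrix exponential (2.61) is easy to cross-check numerically (a sketch, assuming scipy):

```python
import numpy as np
from scipy.linalg import expm

# Hamiltonian matrix for the double integrator with R = 1, Q = 0
H = np.array([[0., 1., 0., 0.],
              [0., 0., 0., -1.],
              [0., 0., 0., 0.],
              [0., 0., -1., 0.]])
t = 0.7
expected = np.array([[1., t, t**3/6., -t**2/2.],
                     [0., 1., t**2/2., -t],
                     [0., 0., 1., 0.],
                     [0., 0., -t, 1.]])
print(np.allclose(expm(H*t), expected))  # True
```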
2.7 Second order necessary condition for optimality
Taking into account the fact that P = Pᵀ > 0, R = Rᵀ > 0 as well as (2.1), (2.37) with xf = 0 and (2.38), it can be shown that:

$$\frac{d}{dt}\left( x^T(t)P(t)x(t) \right) + x^T(t)Qx(t) + u^{*T}(t)Ru^*(t) = 0 \quad (2.69)$$

And the performance index (2.2) to be minimized can be re-written as:

$$\begin{aligned} J(u^*(t)) &= \frac{1}{2}x^T(t_f)Sx(t_f) + \frac{1}{2}\int_0^{t_f}\left( x^T(t)Qx(t) + u^{*T}(t)Ru^*(t) \right) dt \\ &= \frac{1}{2}x^T(t_f)Sx(t_f) - \frac{1}{2}\int_0^{t_f} \frac{d}{dt}\left( x^T(t)P(t)x(t) \right) dt \\ &= \frac{1}{2}x^T(t_f)Sx(t_f) - \frac{1}{2}\left( x^T(t_f)P(t_f)x(t_f) - x^T(0)P(0)x(0) \right) \end{aligned} \quad (2.70)$$

Then, taking into account the boundary condition P(tf) = S, we finally get (2.66).
We are looking for the control u(t) which moves the system from the initial state x(0) = x0 to a final state which should be close to a given value x(tf) = xf at final time t = tf. We will assume that the performance index to be minimized penalizes the control energy. The necessary condition for optimality reads:

$$\frac{\partial H}{\partial u} = R\,u(t) + B^T\lambda(t) = 0 \quad (2.74)$$
We get:

$$u(t) = -R^{-1}B^T\lambda(t) \quad (2.75)$$

Eliminating u(t) in equation (2.72) yields:

$$\dot{\lambda}(t) = -\frac{\partial H}{\partial x} = -A^T\lambda(t) \quad (2.77)$$

We get from the preceding equation:

$$\lambda(t) = e^{-A^T t}\lambda(0) \quad (2.78)$$

The value of λ(0) will influence the final value of the state vector x(t). Indeed, let's integrate the linear differential equation:

$$\dot{x}(t) = Ax(t) - BR^{-1}B^T\lambda(t) = Ax(t) - BR^{-1}B^T e^{-A^T t}\lambda(0) \quad (2.79)$$

Or:

$$x(t) = e^{At}x_0 - e^{At}\,W_c(t)\,\lambda(0) \quad (2.81)$$

where the matrix Wc(t) is defined as follows:

$$W_c(t) = \int_0^t e^{-A\tau} B R^{-1} B^T e^{-A^T\tau}\, d\tau \quad (2.82)$$

When the final state is weighted, the terminal condition reads:

$$\lambda(t_f) = S\left(x(t_f) - x_f\right) \quad (2.83)$$
Solving the preceding linear equation in λ(0) gives the following expression:

$$\left( e^{-A^T t_f} + S\,e^{At_f}\,W_c(t_f) \right)\lambda(0) = S\left( e^{At_f}x_0 - x_f \right) \;\Leftrightarrow\; \lambda(0) = \left( e^{-A^T t_f} + S\,e^{At_f}\,W_c(t_f) \right)^{-1} S\left( e^{At_f}x_0 - x_f \right) \quad (2.86)$$

Using the expression of λ(0) in (2.78) leads to the expression of the Lagrange multiplier λ(t):

$$\lambda(t) = e^{-A^T t}\left( e^{-A^T t_f} + S\,e^{At_f}\,W_c(t_f) \right)^{-1} S\left( e^{At_f}x_0 - x_f \right) \quad (2.87)$$

It is clear from the expression of λ(t) that the control u(t) explicitly depends on the initial state x0.
When the final state is imposed, x(tf) = xf, the initial costate is λ(0) = Wc⁻¹(tf)(x0 − e^{−Atf}xf) (2.95). Using (2.95) in (2.78) leads to the expression of the Lagrange multiplier λ(t):

$$\lambda(t) = e^{-A^T t}\lambda(0) = e^{-A^T t}\,W_c^{-1}(t_f)\left( x_0 - e^{-At_f}x_f \right) \quad (2.96)$$

Finally, the control u(t) which moves the system with minimum energy from the initial state x(0) = x0 to a given final state x(tf) = xf at final time t = tf has the following expression:

$$u(t) = -R^{-1}B^T\lambda(t) = -R^{-1}B^T e^{-A^T t}\lambda(0) = R^{-1}B^T e^{-A^T t}\,W_c^{-1}(t_f)\left( e^{-At_f}x_f - x_0 \right) \quad (2.97)$$
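A numerical sketch of the minimum-energy transfer follows (assuming numpy/scipy, with an illustrative double integrator). The initial costate is taken as λ(0) = Wc⁻¹(tf)(x0 − e^{−Atf}xf), matching (2.95)-(2.97) above; the final check confirms that the endpoint is reached:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
Rinv = np.array([[1.0]])              # R = 1
x0 = np.array([1., 0.])
xf = np.array([0., 1.])
tf = 2.0

# Gramian (2.82) by trapezoidal quadrature on a fine grid
taus = np.linspace(0.0, tf, 2001)
Ms = np.array([expm(-A*t) @ B @ Rinv @ B.T @ expm(-A.T*t) for t in taus])
dtau = taus[1] - taus[0]
Wc = Ms.sum(axis=0)*dtau - 0.5*dtau*(Ms[0] + Ms[-1])

lam0 = np.linalg.solve(Wc, x0 - expm(-A*tf) @ xf)   # cf. (2.95)

def xdot(t, x):
    u = -Rinv @ B.T @ expm(-A.T*t) @ lam0           # u = -R^{-1} B^T lambda(t)
    return A @ x + B @ u

sol = solve_ivp(xdot, [0.0, tf], x0, rtol=1e-9, atol=1e-12)
print(sol.y[:, -1])   # close to xf: minimum-energy transfer achieved
```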
It is clear from the preceding expression that the control u(t) explicitly depends on the initial state x0. When comparing the initial value λ(0) of the Lagrange multiplier obtained in (2.95), where the final state is imposed to be x(tf) = xf, with the expression (2.86), where the final state x(tf) is only close to a given final state xf, we can see that the expression in (2.95) corresponds to the limit of the initial value (2.86) when matrix S moves towards infinity (note that $(e^{At_f})^{-1} = e^{-At_f}$):

$$\begin{aligned} \lim_{S\to\infty}\left( e^{-A^T t_f} + S e^{At_f} W_c(t_f) \right)^{-1} S\left( e^{At_f}x_0 - x_f \right) &= \lim_{S\to\infty}\left( S e^{At_f} W_c(t_f) \right)^{-1} S\left( e^{At_f}x_0 - x_f \right) \\ &= \lim_{S\to\infty} W_c^{-1}(t_f)\,e^{-At_f}\,S^{-1}S\left( e^{At_f}x_0 - x_f \right) \\ &= W_c^{-1}(t_f)\left( x_0 - e^{-At_f}x_f \right) \end{aligned} \quad (2.98)$$

2.9.3 Example
Given the following scalar plant:

$$\dot{x}(t) = a\,x(t) + b\,u(t), \quad x(0) = x_0 \quad (2.99)$$

Find the optimal control for the following cost functionals and final state constraints. We wish to compute a finite-horizon Linear Quadratic Regulator with either a fixed or a weighted final state xf.
− When the final state x(tf) is set to a fixed value xf, the cost functional is set to:

$$J = \frac{1}{2}\int_0^{t_f} \rho\,u^2(t)\, dt \quad (2.100)$$

− When the final state x(tf) shall only be close to a fixed value xf, the cost functional is modified as follows, where S is a positive scalar (S > 0):

$$J = \frac{1}{2}\left(x(t_f) - x_f\right)^T S\left(x(t_f) - x_f\right) + \frac{1}{2}\int_0^{t_f} \rho\,u^2(t)\, dt \quad (2.101)$$
In both cases, the two-point boundary value problem which shall be solved depends on the solution of the differential equation in which the Hamiltonian matrix H appears.
− If the final state x(tf) is set to the value xf, then the value of λ(0) is obtained by solving the first equation of (2.105):

$$x(t_f) = x_f = e^{at_f}x(0) - \frac{b^2\left(e^{at_f} - e^{-at_f}\right)}{2\rho a}\,\lambda(0) \;\Rightarrow\; \lambda(0) = \frac{2\rho a}{b^2\left(e^{at_f} - e^{-at_f}\right)}\left( e^{at_f}x(0) - x_f \right) \quad (2.106)$$
And:

$$x(t) = e^{at}x(0) + \frac{e^{at} - e^{-at}}{e^{at_f} - e^{-at_f}}\left( x_f - e^{at_f}x(0) \right), \qquad \lambda(t) = e^{-at}\lambda(0) = \frac{-2\rho a\,e^{-at}}{b^2\left(e^{at_f} - e^{-at_f}\right)}\left( x_f - e^{at_f}x(0) \right) \quad (2.107)$$

$$u(t) = -R^{-1}B^T\lambda(t) = -\frac{b}{\rho}\,\lambda(t) = \frac{2a\,e^{-at}}{b\left(e^{at_f} - e^{-at_f}\right)}\left( x_f - e^{at_f}x(0) \right) \quad (2.108)$$

Interestingly enough, the open-loop control is independent of the control weighting ρ.
− If the final state x(tf) is only expected to be close to the final value xf, then we have to mix the two equations of (2.105) and the constraint λ(tf) = S(x(tf) − xf) to compute the value of λ(0):

$$\lambda(t_f) = S\left(x(t_f) - x_f\right) \;\Rightarrow\; e^{-at_f}\lambda(0) = S\left( e^{at_f}x(0) - \frac{b^2\left(e^{at_f} - e^{-at_f}\right)}{2\rho a}\,\lambda(0) - x_f \right) \;\Leftrightarrow\; \lambda(0) = \frac{S\left( e^{at_f}x(0) - x_f \right)}{e^{-at_f} + \frac{S\,b^2\left(e^{at_f} - e^{-at_f}\right)}{2\rho a}} \quad (2.109)$$
2.10 Finite horizon LQ regulator with cross-term in the performance index
Consider the linear time-invariant plant ẋ(t) = Ax(t) + Bu(t) with initial state x(0) = x0 (2.110). We will assume that the pair (A, B) is controllable. The purpose of this section is to make explicit the control u(t) which minimizes the following quadratic performance index with cross-terms:

$$J(u(t)) = \frac{1}{2}\int_0^{t_f}\left( x^T(t)Qx(t) + u^T(t)Ru(t) + 2x^T(t)Su(t) \right) dt \quad (2.111)$$

with the constraint on the terminal state:

$$x(t_f) = 0 \quad (2.112)$$

where:

$$S = S^T \ge 0, \quad Q = Q^T \ge 0, \quad R = R^T > 0 \quad (2.113)$$
Introduce:

$$Q_m = Q - SR^{-1}S^T, \qquad v(t) = u(t) + R^{-1}S^T x(t) \quad (2.115)$$

Hence the cost (2.111) to be minimized can be rewritten as:

$$J(u(t)) = \frac{1}{2}\int_0^{t_f}\left( x^T(t)\,Q_m\,x(t) + v^T(t)\,R\,v(t) \right) dt \quad (2.116)$$
$$Q_m = Q - SR^{-1}S^T \ge 0 \quad (2.118)$$

The problem can be solved through the following Hamiltonian system, whose state is obtained by extending the state x(t) of system (2.110) with the costate λ(t):

$$\begin{bmatrix} \dot{x}(t) \\ \dot{\lambda}(t) \end{bmatrix} = \begin{bmatrix} A - BR^{-1}S^T & -BR^{-1}B^T \\ -Q + SR^{-1}S^T & -A^T + SR^{-1}B^T \end{bmatrix}\begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} := H\begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} \quad (2.120)$$
Notice that the pair (A, B) has been replaced by (−A, −B) in the second equation. We will denote by K1 and K2 the following infinite horizon gain matrices:

$$K_1 = R^{-1}\left( S^T + B^T P_1 \right), \qquad K_2 = R^{-1}\left( S^T - B^T P_2 \right) \quad (2.122)$$

Then the optimal control reads:

$$u(t) = \begin{cases} -K(t)\,x(t) & \forall\; 0 \le t < t_f \\ 0 & \text{for } t = t_f \end{cases} \quad (2.123)$$
where:

$$K(t) = R^{-1}\left( S^T + B^T P(t) \right), \qquad P(t) = X_2(t)\,X_1^{-1}(t) \quad (2.124)$$

and:

$$\begin{cases} X_1(t) = e^{(A-BK_1)t} - e^{(A-BK_2)(t-t_f)}\,e^{(A-BK_1)t_f} \\ X_2(t) = P_1\,e^{(A-BK_1)t} + P_2\,e^{(A-BK_2)(t-t_f)}\,e^{(A-BK_1)t_f} \end{cases} \quad (2.125)$$

Furthermore, the optimal state x(t) and costate λ(t) have the following expressions:

$$x(t) = X_1(t)\,X_1^{-1}(0)\,x_0, \qquad \lambda(t) = X_2(t)\,X_1^{-1}(0)\,x_0 \quad (2.127)$$
2.11 Extension to nonlinear system affine in control
Consider now the performance index:

$$J(u(t)) = G\left(x(t_f)\right) + \frac{1}{2}\int_0^{t_f}\left( q(x) + u^T(t)Ru(t) \right) dt \quad (2.128)$$

under the constraint that the system is nonlinear but affine in control:

$$\dot{x}(t) = f(x) + g(x)\,u(t), \quad x(0) = x_0 \quad (2.129)$$

² Lorenzo Ntogramatzidis, A simple solution to the finite-horizon LQ problem with zero terminal state, Kybernetika, 39(4):483-492, January 2003
³ Todorov E. and Tassa Y., Iterative Local Dynamic Programming, IEEE ADPRL, 2009
Chapter 3
Infinite Horizon Linear Quadratic Regulator (LQR)
Consider the linear time-invariant system ẋ(t) = Ax(t) + Bu(t) with initial state x(0) = x0 (3.1) and the performance index:

$$J(u(t)) = \frac{1}{2}\int_0^{t_f}\left( x^T(t)Qx(t) + u^T(t)Ru(t) \right) dt \quad (3.2)$$

where Q = NᵀN ≥ 0 (thus Q is symmetric and positive semi-definite) and R = Rᵀ > 0 is a symmetric positive definite matrix.
In this chapter we will focus on the case where the final time tf tends toward infinity (tf → ∞). The performance index to be minimized becomes:

$$J(u(t)) = \frac{1}{2}\int_0^{\infty}\left( x^T(t)Qx(t) + u^T(t)Ru(t) \right) dt \quad (3.3)$$

The results presented in this chapter can be envisioned as the results of the previous chapter as ∥S∥ → ∞ (xf := 0 here) and tf → ∞. When the final time tf is set to infinity, the Kalman gain K(t) which has been computed in the previous chapter becomes constant. As a consequence, the control is easier to implement, since it is no longer necessary to integrate the differential Riccati equation and to store the gain K(t) before applying the control. In practice, infinity means that the final time tf is large when compared to the time constants of the plant.
The Kalman test states that the pair (A, B) is controllable when rank[B AB ⋯ A^{n−1}B] = n. Equivalently, the Popov-Belevitch-Hautus (PBH) test shall be applied to all eigenvalues λi of A to check controllability, or only to the eigenvalues which are not contained in the open left half plane to check stabilizability:

$$\text{rank}\begin{bmatrix} A - \lambda_i I & B \end{bmatrix} = n \quad \begin{cases} \forall\, \lambda_i & \text{for controllability} \\ \forall\, \lambda_i \text{ s.t. } Re(\lambda_i) \ge 0 & \text{for stabilizability} \end{cases} \quad (3.5)$$

Similarly, we may use the Kalman test to check the observability of the system:

$$\text{rank}\begin{bmatrix} N \\ NA \\ \vdots \\ NA^{n-1} \end{bmatrix} = n \quad \text{where } n = \text{size of the state vector } x \quad (3.6)$$

Or, equivalently, the Popov-Belevitch-Hautus (PBH) test, which shall be applied to all eigenvalues λi of A to check the observability of the system, or only to the eigenvalues which are not contained in the open left half plane to check the detectability of the system:

$$\text{rank}\begin{bmatrix} A - \lambda_i I \\ N \end{bmatrix} = n \quad \begin{cases} \forall\, \lambda_i & \text{for observability} \\ \forall\, \lambda_i \text{ s.t. } Re(\lambda_i) \ge 0 & \text{for detectability} \end{cases} \quad (3.7)$$
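The Kalman and PBH tests translate directly into rank computations; below is a minimal sketch (assuming numpy):

```python
import numpy as np

def controllability_matrix(A, B):
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def is_controllable(A, B):
    return np.linalg.matrix_rank(controllability_matrix(A, B)) == A.shape[0]

def is_observable(A, N):
    return is_controllable(A.T, N.T)   # duality

def pbh_stabilizable(A, B, tol=1e-9):
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if lam.real >= -tol:           # only marginal/unstable modes matter
            M = np.hstack([A - lam*np.eye(n), B])
            if np.linalg.matrix_rank(M) < n:
                return False
    return True

A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
N = np.array([[1., 0.]])
print(is_controllable(A, B), is_observable(A, N), pbh_stabilizable(A, B))
```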
3.3 Algebraic Riccati equation
It is worth noticing that the algebraic Riccati equation (3.8) may have several solutions. The solution of the optimal control problem only retains the positive semi-definite solution of the algebraic Riccati equation.
The need for the detectability assumption is to ensure that the optimal control computed using lim_{tf→∞} P(t) generates a feedback gain K = R⁻¹BᵀP that stabilizes the plant, i.e. all the eigenvalues of A − BK lie in the open left half plane. In addition, it can be shown that the minimum cost achieved is given by:

$$J^* = \frac{1}{2}x^T(0)\,P\,x(0) \quad (3.10)$$
To get this result, first we notice that the Hamiltonian (1.63) reads:

$$H(x,u,\lambda) = \frac{1}{2}\left( x^T(t)Qx(t) + u^T(t)Ru(t) \right) + \lambda^T(t)\left( Ax(t) + Bu(t) \right) \quad (3.11)$$

The necessary condition for optimality (1.72) yields:

$$\frac{\partial H}{\partial u} = R\,u(t) + B^T\lambda(t) = 0 \quad (3.12)$$

so that u(t) = −R⁻¹Bᵀλ(t) (3.13). The costate equation reads:

$$\dot{\lambda}(t) = -\frac{\partial H}{\partial x} = -Qx(t) - A^T\lambda(t) \quad (3.15)$$
The key point in the LQR design is that the Lagrange multipliers λ(t) are now assumed to depend linearly on the state vector x(t) through a constant symmetric positive definite matrix denoted P:

$$\lambda(t) = P\,x(t) \quad (3.16)$$

By taking the time derivative of the Lagrange multipliers λ(t) and using again equation (3.1), we get λ̇(t) = Pẋ(t) = PAx(t) + PBu(t). Then, using the expression of the control u(t) provided in (3.13) as well as (3.16), we get:

$$\dot{\lambda}(t) = PAx(t) - PBR^{-1}B^T\lambda(t) = PAx(t) - PBR^{-1}B^T P\,x(t) \quad (3.18)$$

Finally, using (3.18) within (3.15) and using λ(t) = Px(t) (see (3.16)), we get:

$$-PAx(t) + PBR^{-1}B^T Px(t) = Qx(t) + A^T Px(t) \;\Leftrightarrow\; \left( A^T P + PA - PBR^{-1}B^T P + Q \right)x(t) = 0 \quad (3.19)$$

Since this equality holds for every value of the state vector x(t), we retrieve the algebraic Riccati equation (3.8):

$$A^T P + PA - PBR^{-1}B^T P + Q = 0 \quad (3.20)$$
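In practice, the positive semi-definite (stabilizing) solution of (3.20) is computed numerically. A sketch assuming scipy's solve_continuous_are follows, together with the gain K = R⁻¹BᵀP (the plant matrices are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
Q = np.eye(2)
R = np.array([[1.]])

P = solve_continuous_are(A, B, Q, R)   # stabilizing solution of the ARE
K = np.linalg.solve(R, B.T @ P)        # K = R^{-1} B^T P
print(np.linalg.eigvals(A - B @ K))    # closed-loop poles, all in the LHP
# residual of (3.20), should be ~0:
print(np.max(np.abs(A.T @ P + P @ A - P @ B @ K + Q)))
```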
3.4 Extension to nonlinear system affine in control
For a nonlinear system affine in control, the Hamilton-Jacobi-Bellman equation reads:

$$0 = \min_{u(t)\,\in\,U}\left( \frac{1}{2}\left( q(x) + u^T u \right) + \frac{\partial J^*(x)}{\partial x}^T\left( f(x) + g(x)\,u \right) \right) \quad (3.23)$$

The minimization over u yields u∗(x) = −gᵀ(x) ∂J∗(x)/∂x. Substituting this control back into the HJB equation, we finally get:

$$\frac{1}{2}q(x) + \frac{\partial J^*(x)}{\partial x}^T f(x) - \frac{1}{2}\frac{\partial J^*(x)}{\partial x}^T g(x)\,g^T(x)\,\frac{\partial J^*(x)}{\partial x} = 0 \quad (3.26)$$
In the linearized case, the solution of the optimal control problem is a linear static state feedback of the form u = −BᵀP̄x, where P̄ is the symmetric positive definite solution of the algebraic Riccati equation AᵀP̄ + P̄A − P̄BBᵀP̄ + Q = 0, where:

$$A = \left.\frac{\partial f(x)}{\partial x}\right|_{x=0}, \qquad B = g(0), \qquad Q = \frac{1}{2}\left.\frac{\partial^2 q(x)}{\partial x^2}\right|_{x=0} \quad (3.28)$$
...there exist a neighbourhood of the origin Ω ⊆ R^{2n} and k̄ ≥ 0 such that for all k ≥ k̄ the function V(x, ξ) is positive definite and satisfies the following partial differential inequality:

$$\frac{1}{2}q(x) + V_x(x,\xi)\,f(x) + V_\xi(x,\xi)\,\dot{\xi} - \frac{1}{2}V_x(x,\xi)\,g(x)g^T(x)\,V_x^T(x,\xi) \le 0 \quad (3.29)$$

where:

$$V(x,\xi) = P(\xi)\,x + \frac{1}{2}(x-\xi)^T R\,(x-\xi), \qquad \dot{\xi} = -k\,V_\xi^T(x,\xi) \quad \forall\,(x,\xi) \in \Omega \quad (3.30)$$

$$\left.\frac{\partial P(x)^T}{\partial x}\right|_{x=0} = \bar{P} \quad (3.32)$$

where:

$$\dot{\xi} = -k\,V_\xi^T(x,\xi) = -k\left( \Psi(\xi)^T x - R\,(x - \xi) \right) \quad (3.35)$$
Such control has been applied to internal combustion engine test benches2 .
The infinite-horizon problem again involves the Hamiltonian system:

$$\begin{bmatrix} \dot{x}(t) \\ \dot{\lambda}(t) \end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix}\begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} := H\begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} \quad (3.36)$$

If x is an eigenvector of H associated with the eigenvalue λ, then HᵀJᵀx = −λJᵀx, so the eigenvalues of H are symmetric with respect to the imaginary axis. For an eigenvector [x1; x2]:

$$H\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \lambda\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad (3.44)$$
$$AX + XB + C + XDX = 0 \quad (3.52)$$

Matrices A, B, C and D are known, whereas matrix X has to be determined. The general algebraic Lyapunov equation is obtained as a special case of the algebraic Riccati equation by setting D = 0.
The general algebraic Riccati equation can be solved³ by considering the following 2n × 2n matrix H:

$$H = \begin{bmatrix} B & D \\ -C & -A \end{bmatrix} \quad (3.53)$$

Let Λ1 and Λ2 collect the eigenvalues of H and M1, M2 the corresponding eigenvectors; complex eigenvectors can be replaced by the real and imaginary parts of such eigenvectors. Note that there are many ways to form matrix M. Then we can write the following relation:

$$HM = M\Lambda = \begin{bmatrix} M_1 & M_2 \end{bmatrix}\begin{bmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{bmatrix} \quad (3.54)$$
We will focus our attention on the first equation and split the matrix M1 as follows:

$$M_1 = \begin{bmatrix} M_{11} \\ M_{12} \end{bmatrix} \quad (3.56)$$

Assuming that matrix M11 is not singular, we can check that a solution X of the general algebraic Riccati equation (3.52) reads:

$$X = M_{12}\,M_{11}^{-1} \quad (3.58)$$

Indeed, HM1 = M1Λ1 gives BM11 + DM12 = M11Λ1 and CM11 + AM12 = −M12Λ1. Hence:

$$\begin{aligned} AX + XB + C + XDX &= AM_{12}M_{11}^{-1} + M_{12}M_{11}^{-1}B + C + M_{12}M_{11}^{-1}DM_{12}M_{11}^{-1} \\ &= \left( AM_{12} + CM_{11} \right)M_{11}^{-1} + M_{12}M_{11}^{-1}\left( BM_{11} + DM_{12} \right)M_{11}^{-1} \\ &= -M_{12}\Lambda_1 M_{11}^{-1} + M_{12}M_{11}^{-1}M_{11}\Lambda_1 M_{11}^{-1} = 0 \end{aligned}$$
From the first equation we can choose, for example, the following components for the eigenvector v1:

$$v_1 = \begin{bmatrix} v_{11} \\ v_{12} \end{bmatrix} = \begin{bmatrix} 1 \\ \frac{(a - \lambda_1)\rho}{b^2} \end{bmatrix} \quad \text{where } \lambda_1 = +\sqrt{a^2 + \frac{(c_1 b)^2}{\rho}} \quad (3.67)$$

We can check that this choice for v11 and v12 is compatible with the second equation. Indeed:

$$-c_1^2\,v_{11} = v_{12}\,(a + \lambda_1) \;\Rightarrow\; -c_1^2 = \frac{\rho\,(a - \lambda_1)}{b^2}(a + \lambda_1) \;\Rightarrow\; a^2 - \lambda_1^2 = -\frac{(c_1 b)^2}{\rho} \quad (3.68)$$

Then the solution of the algebraic Riccati equation which leads to the computation of the optimal control reads:

$$P = X_2\,X_1^{-1} = \frac{\rho}{b^2}\left( a - \lambda_2 \right) \quad \text{where } \lambda_2 = -\sqrt{a^2 + \frac{(c_1 b)^2}{\rho}} \quad (3.71)$$

Thus:

$$P = \frac{\rho}{b^2}\left( a + \sqrt{a^2 + \frac{(c_1 b)^2}{\rho}} \right) \quad (3.72)$$
We will check those results by using the algebraic Riccati equation, which reads:

$$A^T P + PA - PBR^{-1}B^T P + Q = 0 \;\Leftrightarrow\; 2aP - \frac{b^2}{\rho}P^2 + c_1^2 = 0 \;\Leftrightarrow\; \frac{b^2}{\rho}P^2 - 2aP - c_1^2 = 0 \quad (3.73)$$

The roots of this quadratic equation are:

$$P_1 = \frac{2a + \sqrt{4a^2 + 4\frac{(c_1 b)^2}{\rho}}}{2\frac{b^2}{\rho}} = \frac{\rho}{b^2}\left( a + \sqrt{a^2 + \frac{(c_1 b)^2}{\rho}} \right) > 0, \qquad P_2 = \frac{2a - \sqrt{4a^2 + 4\frac{(c_1 b)^2}{\rho}}}{2\frac{b^2}{\rho}} = \frac{\rho}{b^2}\left( a - \sqrt{a^2 + \frac{(c_1 b)^2}{\rho}} \right) < 0 \quad (3.74)$$
where T ∈ R^{n×n} is an upper triangular matrix (we say that T has a real Schur form):

$$T = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ 0 & t_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & t_{nn} \end{bmatrix} \quad (3.83)$$

Moreover, given the Hamiltonian matrix H defined in (3.78), there is always a corresponding algebraic Riccati equation (ARE)⁴:

$$H = \begin{bmatrix} A & -G \\ -Q & -A^T \end{bmatrix} \;\text{ where } G = G^T,\; Q = Q^T \quad \Rightarrow \quad \text{corresponding ARE: } A^T P + PA - PGP + Q = 0 \quad (3.84)$$
T
−BR−1 BT
A In 0
−NT N −AT P In
−BR−1 BT
In 0 A − BK
= (3.87)
P In 0 − (A − BK)T
In 0
From the preceding relation, and using the fact that det = 1,
P In
we get:
det (sI − H) = (−1)n β(s) β(−s) where β(s) := det (sI − A + BK) (3.88)
A BR−0.5
F(s) = := N (sI − A)−1 BR−0.5 (3.89)
N 0
76 Chapter 3. Innite Horizon Linear Quadratic Regulator (LQR)
To get this result, consider Figure 3.1. The relation between e(s) and r(s) is obtained by reading Figure 3.1 against the arrows:

$$e(s) = r(s) - F(s)F^T(-s)\,e(s) \;\Rightarrow\; e(s) = \left( I + F(s)F^T(-s) \right)^{-1} r(s) \quad (3.91)$$

On the other hand, the realization of Fᵀ(−s) is obtained from the realization of F(s) as follows:

$$F^T(-s) = \left( N(-sI - A)^{-1}BR^{-0.5} \right)^T = -R^{-0.5}B^T\left( sI - (-A^T) \right)^{-1}N^T = \left[\begin{array}{c|c} -A^T & N^T \\ \hline -R^{-0.5}B^T & 0 \end{array}\right] \quad (3.92)$$

Thus, in the time domain we have:

$$F(s) = \left[\begin{array}{c|c} A & BR^{-0.5} \\ \hline N & 0 \end{array}\right] \Rightarrow \begin{cases} \dot{x}_1 = Ax_1 + BR^{-0.5}u \\ y = Nx_1 \end{cases} \qquad F^T(-s) = \left[\begin{array}{c|c} -A^T & N^T \\ \hline -R^{-0.5}B^T & 0 \end{array}\right] \Rightarrow \begin{cases} \dot{x}_2 = -A^T x_2 + N^T e \\ u = -R^{-0.5}B^T x_2 \end{cases} \quad (3.93)$$

From Figure 3.1 we see that e = r − y. Thus the realization of Figure 3.1 reads as follows:

$$\begin{cases} \begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B^T \\ -N^TN & -A^T \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} 0 \\ N^T \end{bmatrix} r \\ e = \begin{bmatrix} -N & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + I\,r \end{cases} \quad (3.94)$$

In the frequency domain we get:

$$e(s) = \left( \begin{bmatrix} -N & 0 \end{bmatrix}(sI - H)^{-1}\begin{bmatrix} 0 \\ N^T \end{bmatrix} + I \right) r(s) \quad (3.95)$$

$$\left( I + F^T(-s)F(s) \right)^{-1} = \begin{bmatrix} 0 & -R^{-0.5}B^T \end{bmatrix}(sI - H)^{-1}\begin{bmatrix} BR^{-0.5} \\ 0 \end{bmatrix} + I \quad (3.96)$$
3.8 Discrete time LQ regulator
where:

$$K(k) = \left( R + B^T P(k+1) B \right)^{-1} B^T P(k+1) A \quad (3.103)$$

And P(k) is given by the solution of the discrete-time Riccati equation:

$$P(k) = A^T P(k+1) A + Q - A^T P(k+1) B\left( R + B^T P(k+1) B \right)^{-1} B^T P(k+1) A, \qquad P(N) = S \quad (3.104)$$
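The pair (3.103)-(3.104) is a backward recursion that is straightforward to implement; below is a sketch (assuming numpy; the discretized double integrator is an illustrative assumption):

```python
import numpy as np

def dlqr_finite_horizon(A, B, Q, R, S, N):
    P = S.copy()
    gains = [None]*N
    for k in range(N - 1, -1, -1):
        M = R + B.T @ P @ B
        K = np.linalg.solve(M, B.T @ P @ A)        # (3.103)
        P = A.T @ P @ A + Q - A.T @ P @ B @ K      # (3.104)
        gains[k] = K
    return gains, P

A = np.array([[1., 0.1], [0., 1.]])   # discretized double integrator
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.]])
S = 10*np.eye(2)
gains, P0 = dlqr_finite_horizon(A, B, Q, R, S, N=50)
print(gains[0])   # time-varying gain applied at k = 0
```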
x(N ) = 0 (3.106)
where:

$$\begin{cases} A_b = A^{-1} \\ B_b = -A^{-1}B \\ Q_b = A^{-T}QA^{-1} \\ R_b = R - S^TA^{-1}B - B^TA^{-T}S + B^TA^{-T}QA^{-1}B \\ S_b = A^{-T}S - A^{-T}QA^{-1}B \end{cases} \quad (3.108)$$

We will denote by K1 and K2 the following infinite horizon gain matrices:

$$\begin{cases} K_1 = \left( R + B^T P_1 B \right)^{-1}\left( B^T P_1 A + S^T \right) \\ K_2 = \left( R_b + B_b^T P_2 B_b \right)^{-1}\left( B_b^T P_2 A_b + S_b^T \right) \end{cases} \quad (3.109)$$
And:

$$\begin{cases} X_1(k) = (A - BK_1)^k - (A_b - B_bK_2)^{(k-N)}\,(A - BK_1)^N \\ X_2(k) = P_1\,(A - BK_1)^k + P_2\,(A_b - B_bK_2)^{(k-N)}\,(A - BK_1)^N \end{cases} \quad (3.112)$$

Furthermore, the optimal state x(k) and costate λ(k) have the following expressions:

$$\begin{cases} x(k+1) = (A - BK_1)\,e_1(k) - (A_b - B_bK_2)\,e_2(k) \\ \lambda(k+1) = P_1\,(A - BK_1)\,e_1(k) + P_2\,(A_b - B_bK_2)\,e_2(k) \end{cases} \quad (3.114)$$

where:

$$\begin{cases} e_1(k) = (A - BK_1)^k\,X_1^{-1}(0)\,x_0 \\ e_2(k) = (A_b - B_bK_2)^{(k-N)}\,(A - BK_1)^N\,X_1^{-1}(0)\,x_0 \end{cases} \quad (3.115)$$
where:

$$K = \left( R + B^T P B \right)^{-1} B^T P A \quad (3.119)$$

If (A, B) is stabilizable, then the closed-loop system is stable, meaning that all the eigenvalues of (A − BK), with K given by (3.119), will lie within the unit disk (i.e. have magnitudes less than 1). Let's define the following symplectic matrix⁵:

$$H = \begin{bmatrix} A^{-1} & A^{-1}G \\ QA^{-1} & A^T + QA^{-1}G \end{bmatrix} \quad (3.120)$$

⁵ Alan J. Laub, A Schur Method for Solving Algebraic Riccati Equations, IEEE Transactions on Automatic Control, Vol. AC-24, No. 6, December 1979
where:

$$G = BR^{-1}B^T \quad (3.121)$$

A symplectic matrix is a matrix which satisfies:

$$H^T J H = J \quad \text{where } J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix} \text{ and } J^{-1} = -J \quad (3.122)$$

This implies that the eigenvalues of H come in reciprocal pairs (λ, 1/λ). Denoting by [X1; X2] a basis of the invariant subspace associated with the stable eigenvalues, the solution reads:

$$P = X_2\,X_1^{-1} \quad (3.124)$$

Thus the matrix P for the optimal steady-state feedback can be computed thanks to the unstable (eigenvalues outside the unit circle) eigenvectors of H or the stable (eigenvalues inside the unit circle) eigenvectors of H⁻¹.
In order to compute the closed-loop transfer matrix between X(s) and R(s), we take the Laplace transform of (3.125), assuming no initial condition. Let Φ(s) be the resolvent of the state (transition) matrix A:

$$\Phi(s) := (sI - A)^{-1}$$

The block diagram of the full-state feedback control u(t) = −Kx(t) + r(t) is shown in Figure 3.2. We get:

$$X(s) = \Phi(s)B\left( R(s) - KX(s) \right) \;\Rightarrow\; X(s) = \left( I + \Phi(s)BK \right)^{-1}\Phi(s)B\,R(s) \quad (3.129)$$

Using the fact that (AB)⁻¹ = B⁻¹A⁻¹, we get:
⁶ https://en.wikipedia.org/wiki/Determinant

Thus, if $M = \begin{bmatrix} I_m & -M_1 \\ M_2 & I_n \end{bmatrix}$, we get:

$$\det(M) = \det\begin{bmatrix} I_m & -M_1 \\ M_2 & I_n \end{bmatrix} = \det\left( I_m + M_1M_2 \right) = \det\left( I_n + M_2M_1 \right) \quad (3.135)$$
Then we get:

$$\begin{aligned} \det(sI - A + BK) &= \det\left( (sI - A)\left( I + (sI - A)^{-1}BK \right) \right) \\ &= \det(sI - A)\,\det\left( I + \Phi(s)BK \right) \\ &= \det(sI - A)\,\det\left( I + K\Phi(s)B \right) \end{aligned} \quad (3.137)$$

The roots of det(sI − A + BK) are the eigenvalues of the closed-loop system. Consequently they are related to the stability of the closed-loop system.
Moreover, the roots of det(I + KΦ(s)B) are exactly the roots of det(sI − A + BK). Indeed, as far as Φ(s) = (sI − A)⁻¹, the inverse of (sI − A) is computed as the adjugate of the matrix (sI − A) divided by det(sI − A), which finally becomes the denominator of det(I + KΦ(s)B):

$$\det\left( I + K\Phi(s)B \right) = \det\left( I + K\frac{\text{adj}(sI-A)}{\det(sI-A)}B \right) = \frac{\det\left( \det(sI-A)\,I + K\,\text{adj}(sI-A)\,B \right)}{\det(sI-A)} \;\Rightarrow\; \det(sI - A + BK) = \det\left( \det(sI-A)\,I + K\,\text{adj}(sI-A)\,B \right) \quad (3.139)$$

Thus:

$$\det(sI - A + BK) = 0 \;\Leftrightarrow\; \det\left( I + K\Phi(s)B \right) = 0 \quad (3.140)$$
This equation involves the transfer function CΦ(s)B between the output Y(s) and the control U(s) of the plant without any feedback, and it is used in the Nyquist stability criterion for Single-Input Single-Output (SISO) systems.
It is also worth noticing that (I + KCΦ(s)B)⁻¹ is attached to the so-called sensitivity function of the closed loop, whereas CΦ(s)B is attached to the open-loop transfer function from the process input U(s) to the plant output Y(s).
where L(s) = KΦ(s)B is the loop gain and K the optimal feedback gain (obtained through the algebraic Riccati equation). Substituting s = jω in Kalman's equality yields:

$$\|1 + L(j\omega)\|^2 = 1 + \frac{1}{R}\,\|N\Phi(j\omega)B\|^2 \quad (3.159)$$

Therefore:

$$\|1 + L(j\omega)\| \ge 1 \quad \forall\,\omega \in \mathbb{R} \quad (3.160)$$

For SISO plants, the sensitivity function S(s) and the complementary sensitivity function T(s) are defined as follows:

$$S(s) = \frac{1}{1 + L(s)}, \qquad T(s) = 1 - S(s) = \frac{L(s)}{1 + L(s)} \quad (3.161)$$
Furthermore, let's introduce the real part X(ω) and the imaginary part Y(ω) of L(jω):

$$L(j\omega) = X(\omega) + jY(\omega) \quad (3.163)$$

Then ∥1 + L(jω)∥ ≥ 1 reads as follows:

$$\|1 + L(j\omega)\| \ge 1 \;\Leftrightarrow\; \|1 + L(j\omega)\|^2 \ge 1 \;\Leftrightarrow\; \left(1 + X(\omega)\right)^2 + Y(\omega)^2 \ge 1 \quad (3.165)$$
As a consequence, the Nyquist plot of L(jω) stays outside the circle of unit radius centered at (−1, 0). Thus, applying the generalized (MIMO) Nyquist stability criterion and knowing that the LQR design always leads to a stable closed-loop plant, the implications of the Kalman inequality are the following (see the numerical sketch after the figures below):
− If the open-loop system has no unstable pole, then the Nyquist plot of L(jω) does not encircle the critical point (−1, 0). This corresponds to an infinite positive gain margin, as depicted in Figure 3.5.
− On the other hand, if Φ(s) has unstable poles, the Nyquist plot of L(jω) encircles the critical point (−1, 0) a number of times which corresponds to the number of unstable open-loop poles. This corresponds to a negative gain margin which is always lower than or equal to 20 log10(0.5) = −6 dB, as depicted in Figure 3.6.
Figure 3.5: Nyquist plot of L(s): example where the open-loop system has no unstable pole
Figure 3.6: Nyquist plot of L(s): example where the open-loop system has unstable poles
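The Kalman inequality is easy to check numerically for a given design; the sketch below (assuming numpy/scipy, with an illustrative double-integrator plant) evaluates ∥1 + L(jω)∥ over a frequency grid:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
Q = np.eye(2)
R = np.array([[1.]])
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

omegas = np.logspace(-2, 2, 400)
margins = []
for w in omegas:
    Phi = np.linalg.inv(1j*w*np.eye(2) - A)   # resolvent (sI - A)^{-1}
    L = (K @ Phi @ B)[0, 0]                   # SISO loop gain L(jw)
    margins.append(abs(1 + L))
print(min(margins) >= 1.0 - 1e-9)             # True: |1 + L(jw)| >= 1
```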
Unfortunately, those nice properties are lost as soon as the performance index J(u(t)) contains state / control cross-terms⁸:

$$J(u(t)) = \frac{1}{2}\int_0^{t_f}\left( x^T(t)Qx(t) + u^T(t)Ru(t) + 2x^T(t)Su(t) \right) dt \quad (3.166)$$
This is especially the case for the LQG (Linear Quadratic Gaussian) regulator, where the plant dynamics as well as the output measurement are subject to stochastic disturbances and where a state estimator has to be used.

⁸ Doyle J.C., Guaranteed margins for LQG regulators, IEEE Transactions on Automatic Control, Volume 23, Issue 4, Aug 1978
Chapter 4
Design methods
− kp is a scaling factor
Then the state matrix of the closed-loop system reads A − kpBKoN, and the polynomial det(sI − A + kpBKoN) is the closed-loop characteristic polynomial.
It is worth noticing that the roots of D(s) + kpN(s) are also the roots of 1 + kp N(s)/D(s):

$$D(s) + k_p N(s) = 0 \;\Leftrightarrow\; 1 + k_p\frac{N(s)}{D(s)} = 0 \;\Leftrightarrow\; L(s) := k_p F(s) = -1 \quad (4.8)$$

The transfer function L(s) = kpF(s) is called the loop transfer function. In the SISO case the numerator of the loop transfer function L(s) is scalar, as well as its denominator.
The equation L(s) = −1 can be equivalently split into two equations:

$$\begin{cases} |L(s)| = 1 \\ \arg\left(L(s)\right) = (2k+1)\pi, \quad k = 0, \pm 1, \cdots \end{cases} \quad (4.10)$$
− The root locus is symmetrical with respect to the horizontal real axis
(because roots are either real or complex conjugate);
− The number of branches is equal to the number of poles of the loop transfer
function. Thus the root locus has n branches;
− The root locus starts at the n poles of the loop transfer function;
− The root locus ends at the zeros of the loop transfer function. Thus m
branches of the root locus end on the m zeros of F (s) and there are (n−m)
asymptotic branches;
− Assuming that the coefficient a in F(s) is positive, a point s∗ on the real axis belongs to the root locus as soon as there is an odd number of poles and zeros on its right. Conversely, assuming that the coefficient a in F(s) is negative, a point s∗ on the real axis belongs to the root locus as soon as there is an even number of poles and zeros on its right. Be careful to take into account the multiplicity of poles and zeros in the counting process;
− The (n − m) branches of the root locus which diverge to ∞ have asymptotes. The angle δk of each asymptote with the real axis is defined by:

$$\delta_k = \frac{\pi + \arg(a) + 2k\pi}{n - m} \quad \forall\, k = 0, \ldots, n - m - 1 \quad (4.11)$$

Denoting by pi the n poles of the loop transfer function (that are the roots of D(s)) and by zj its m zeros (that are the roots of N(s)), the asymptotes intersect the real axis at a point (called pivot or centroid) given by:

$$\sigma = \frac{\sum_{i=1}^{n} p_i - \sum_{j=1}^{m} z_j}{n - m} \quad (4.12)$$
− The breakaway / break-in points are located on the real axis and always have a vertical tangent. They are located at the roots sb of the following equation, as soon as there is an odd (if the coefficient a in F(s) is positive) or even (if the coefficient a in F(s) is negative) number of poles and zeros on their right (be careful to take into account the multiplicity of poles and zeros in the counting process):

$$\left.\frac{d}{ds}\frac{1}{F(s)}\right|_{s=s_b} = \left.\frac{d}{ds}\frac{D(s)}{N(s)}\right|_{s=s_b} = 0 \;\Leftrightarrow\; D'(s_b)N(s_b) - D(s_b)N'(s_b) = 0 \quad (4.13)$$

Indeed, from the fact that breakaway / break-in points always have a vertical tangent, we can write:

$$1 + k_p F(s) = 1 + k_p\frac{N(s)}{D(s)} = 0 \;\Rightarrow\; \frac{dk_p}{ds} = -\frac{D'(s)N(s) - D(s)N'(s)}{N^2(s)} = 0 \quad (4.14)$$

From this relation we get (4.13).
The previous equation is then split into its real and imaginary part and
provides a system of 2 equations which lead to the value of the critical
gain and the oscillation frequency at the critical gain. It is worth noticing
that the Routh criterion can be used for the same purpose.
$$J(u(t)) = \frac{1}{2}\int_0^{\infty}\left( x^T(t)Qx(t) + u^T(t)Ru(t) \right) dt \quad (4.17)$$

Let z(t) := N x(t) be the controlled output: this is a fictitious output which represents the output of interest for the design. The controlled output z(t) is expressed as a linear function of the state vector x(t).
In the single-control case which is under consideration, it can be shown (see section 4.1.4) that the characteristic polynomial of the closed-loop system is linked with the numerator and the denominator of the loop transfer function as follows:

$$\beta(s)\,\beta(-s) = D(s)D(-s) + \frac{1}{R}\,N(s)N(-s) \quad (4.25)$$

This relation can be associated with the root locus of G(s)G(−s) = N(s)N(−s)/(D(s)D(−s)), where the fictitious gain kp = 1/R varies from 0 to ∞. This leads to the so-called Chang-Letov design procedure, which enables one to find the closed-loop poles based on the open-loop poles and zeros of G(s)G(−s). The difference with the root locus of G(s) is that both the open-loop poles and zeros and their reflections about the imaginary axis have to be taken into account (this is due to the multiplication by G(−s)). The actual closed-loop poles are those located in the left half plane with negative real part; indeed, optimal control always leads to a stabilizing gain. It is worth noticing that matrix N is actually a design parameter which is used to shape the root locus.
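A minimal Chang-Letov sketch follows (assuming numpy; the plant G(s) = 1/s², i.e. N(s) = 1 and D(s) = s², is an illustrative assumption): the closed-loop poles are the stable roots of D(s)D(−s) + N(s)N(−s)/R.

```python
import numpy as np

R = 1.0
# D(s)D(-s) = s^4 and N(s)N(-s) = 1  ->  s^4 + 1/R = 0
coeffs = [1.0, 0.0, 0.0, 0.0, 1.0/R]
roots = np.roots(coeffs)
closed_loop_poles = roots[roots.real < 0]   # keep the stable half
print(np.sort_complex(closed_loop_poles))
# Butterworth pair (1/R)**0.25 * exp(+/- j 3*pi/4), matching the LQR
# design with Q = N^T N and the same scalar weight R.
```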
Furthermore, it has been seen in (3.138) that, thanks to the Hsu-Chen theorem, we have:

$$\det\left( I + K\Phi(s)B \right) = \frac{\det(sI - A + BK)}{\det(sI - A)} \quad (4.28)$$

Let D(s) be the open-loop characteristic polynomial and β(s) be the closed-loop characteristic polynomial:

$$D(s) = \det(sI - A), \qquad \beta(s) = \det(sI - A + BK) \quad (4.29)$$
β(s) β(−s) ≈ N (s)N (−s) as R → 0 (4.36)
R
Equation (4.36) shows that any roots of β(s) β(−s) that remains nite as
R → 0 must tend toward the roots of N (s)N (−s). But from (4.3) we know
that the degree of N (s)N (−s), say 2m, is less than the degree of β(s) β(−s),
which is 2n. Therefore m roots of β(s) are the roots of N (s)N (−s) in the open
left half plane (stable roots). The remaining n − m roots of β(s) asymptotically
approach innity in the left half plane. For very large s we can ignore all but
the highest power of s in (4.34) so that the magnitude (or modulus) of the roots
that tend toward innity shall satisfy the following approximate relation:
b2m
(−1)n s2(n−m) ≈ (−1)m (4.37)
R
where we denote:
The roots of β(−s) are the reection across the imaginary of the roots of
β(s). Now express s in the exponential form:
s = r ejθ (4.39)
b2m b2
(−1)n r2n ej2nθ ≈ (−1)m r2m ej2mθ ⇒ r2(n−m) ≈ m (4.40)
R R
Therefore, the remaining n−m zeros of β(s) lie on a circle of radius r dened
by:
1
b n−m
r≈ √m (4.41)
R
The particular pattern to which the 2(n − m) solutions of (4.41) lie is known
as the Butterworth conguration. The angle of the 2(n − m) branches which
diverge to ∞ are obtained by adapting relation (4.11) to the case where transfer
function reads G(s)G(−s) = N (s)N (−s)
D(s)D(−s) .
(1 + L(−s))^T (1 + L(s)) = 1 + (1/R) (Φ(−s)B)^T N^T N (Φ(s)B)
                         = 1 + (1/R) (N Φ(−s)B)^T (N Φ(s)B)    (4.43)
If after simulation |z_i(t)| is too large then increase q_ii; similarly, if after simulation |u_j(t)| is too large then increase r_jj.
{ A v_i = λ_i v_i
  K = [ ⋯ 0_{m×1} ⋯ ] V^{-1}  (zero m×1 block in the i-th column)
⇒ (A − BK) v_i = λ_i v_i    (4.53)

V = [ v_1 ⋯ v_n ]    (4.54)

Note that if λ_i and λ_j := λ̄_i are complex conjugate eigenvalues, then the corresponding eigenvectors v_i and v_j are also complex conjugate:

λ_j = λ̄_i ⇔ v_j = v̄_i    (4.55)
In order to get a real valued matrix V, v i and v j shall be changed into the real
part and imaginary part of v i , that is Re(v i ) and Im(v i ), respectively.
Let λ1 , · · · , λr be the r ≤ n eigenvalues that are desired to be changed by
state feedback gain K and v 1 , · · · , v r the corresponding eigenvectors of the state
matrix A. Similarly let λr+1 , · · · , λn be the n − r eigenvalues that are desired to
be kept invariant by state feedback gain K and v r+1 , · · · , v n the corresponding
eigenvectors of the state matrix A. Assuming that matrix V is invertible, matrix
M is dened and split as follows where Mr is an r × n matrix and Mn−r is an
(n − r) × n matrix:
M = V^{-1} = [ v_1 ⋯ v_r v_{r+1} ⋯ v_n ]^{-1} = [ M_r ; M_{n−r} ]    (4.56)
Shieh & al.7 have shown that, once weighting matrix R = R^T > 0 is set, the characteristic polynomial β(s) of the closed-loop transfer function is linked with the numerator and the denominator of the loop transfer function Φ(s)B = (sI − A)^{-1}B through relation (4.57), where:

N_ol(s) = adj(sI − A) B
N_rl(s) = q_0^T M_r N_ol(s) (R^{0.5})^{-1},  q_0 ∈ R^{r×1}    (4.58)
q ∈ Rr×1
0
T
Matrix R^{0.5} = (R^{0.5})^T is the square root of matrix R. By getting the modal decomposition of matrix R, that is R = VDV^{-1} where V is the matrix whose columns are the eigenvectors of R and D is the diagonal matrix whose diagonal elements are the corresponding positive eigenvalues, the square root R^{0.5} of R is given by R^{0.5} = VD^{0.5}V^{-1}, where D^{0.5} is any diagonal matrix whose elements are the square roots of the diagonal elements of D.
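A minimal sketch of this computation, assuming a symmetric positive definite R so that numpy's eigh applies:

```python
import numpy as np

def sqrtm_spd(R):
    """Square root of R through its modal decomposition R = V D V^-1:
    R^0.5 = V D^0.5 V^-1 (V is orthogonal since R is symmetric)."""
    d, V = np.linalg.eigh(R)
    return V @ np.diag(np.sqrt(d)) @ V.T

R = np.array([[2.0, 1.0], [1.0, 2.0]])
R_half = sqrtm_spd(R)
print(np.allclose(R_half @ R_half, R))   # True
```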
Relation (4.57) can be associated with the root locus of the fictitious transfer function G(s)G(−s) = N_rl^T(s) N_rl(−s) / (D(s)D(−s)) where the fictitious gain k_p varies from 0 to ∞.
Q [ v_{r+1} ⋯ v_n ] = 0    (4.60)

H = [ A  −BR^{-1}B^T ; −Q  −A^T ]    (4.61)
which corresponds to the following algebraic Riccati equation:
AT P + PA − PBR−1 BT P + Q = 0 (4.62)
H = [ A + αI  −BR^{-1}B^T ; 0  −(A + αI)^T ]    (4.66)
Finally let λ_{Ki} be the closed-loop eigenvalues, that is the eigenvalues of matrix A − BK. The eigenvalues of A − BK are obtained from the eigenvalues λ_{αi} of A + αI − BK by subtracting α from λ_{αi}. Thus from (4.67), and given a controllable pair (A, B), a positive definite symmetric matrix R and a positive real constant α, the algebraic Riccati equation (4.65) where Q = Q^T = 2αP has a unique positive definite solution P = P^T > 0 such that the λ_{Ki} have the following property:

Re(λ_i) ≤ −α ⇒ λ_{Ki} = λ_i
Re(λ_i) > −α ⇒ Re(λ_{Ki}) = −Re(λ_i) − 2α and Im(λ_{Ki}) = Im(λ_i)
∀ i = 1, ⋯, n    (4.68)
Once the algebraic Riccati equation (4.65) is solved in P, the classical LQR design is applied:

u(t) = −Kx(t)
K = R^{-1}B^T P    (4.70)
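A numerical sketch of this pole-mirroring design: the ARE (4.65) with Q = 2αP is solved through the stable invariant subspace of the Hamiltonian matrix (4.66), with P = X2 X1^{-1}; the example matrices are illustrative:

```python
import numpy as np

def mirror_poles(A, B, R, alpha):
    """Gain K mirroring the eigenvalues of A with Re > -alpha about the
    line Re(s) = -alpha, per (4.66)-(4.68)."""
    n = A.shape[0]
    As = A + alpha * np.eye(n)
    H = np.block([[As, -B @ np.linalg.solve(R, B.T)],
                  [np.zeros((n, n)), -As.T]])
    lam, X = np.linalg.eig(H)
    Xs = X[:, lam.real < 0]                      # stable eigenvectors
    P = np.real(Xs[n:, :] @ np.linalg.inv(Xs[:n, :]))
    K = np.linalg.solve(R, B.T @ P)
    return K, P

A = np.array([[0.0, 1.0], [3.0, -2.0]])          # eigenvalues {1, -3}
B = np.array([[0.0], [1.0]])
K, P = mirror_poles(A, B, np.array([[1.0]]), alpha=0.5)
print(np.linalg.eigvals(A - B @ K))              # 1 -> -2 (mirrored), -3 kept
```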
It is worth noticing that the algebraic Riccati equation (4.65) can be changed into a Lyapunov equation by pre- and post-multiplying (4.65) by P^{-1} and setting X := P^{-1}:

X (A + αI)^T + (A + αI) X = BR^{-1}B^T where X := P^{-1}
Matrix R remains the degree of freedom for the design, and it seems that it may be used to set, for example, the damping ratio of the complex conjugate dominant poles. Unfortunately (4.66) indicates that the eigenvalues of the Hamiltonian matrix H, which are closely related to the eigenvalues of the closed-loop system, are independent of matrix R. Thus matrix R has no influence on the location of the closed-loop poles in that situation.
Furthermore it is worth remembering that the larger the displacement of the closed-loop eigenvalues with respect to the open-loop eigenvalues, the larger the control effort. Thus specifying very fast dominant poles may lead to unacceptable control effort.
3 Y. Ochi and K. Kanai, Pole placement in optimal regulator by continuous pole-shifting, Journal of Guidance, Control and Dynamics, Vol. 18, No. 6 (1995), pp. 1253-1258
4 M. H. Amin, Optimal pole shifting for continuous multivariable linear systems, Int. Journal of Control, Vol. 41, No. 3 (1985), pp. 701-707
ż_i = v^T A x + v^T B u
    = λ_i v^T x + v^T B u
    = λ_i z_i + G u where G := v^T B = C^T B    (4.75)
Thus, matrix P̃ reads:

P̃ = (λ_i − λ_{Ki}) / (G R^{-1} G^T)    (4.78)
The state weighting matrix Q_i that will shift the open-loop eigenvalue λ_i to the closed-loop eigenvalue λ_{Ki} is obtained through the identification z_i^T Q̃_i z_i = x^T Q_i x. We finally get:

z_i = v^T x := C^T x ⇒ Q_i = C Q̃_i C^T    (4.79)
0 = P̃ λ_i + λ_i P̃ − P̃ G R^{-1} G^T P̃ + Q̃_i
⇔ Q̃_i = −2λ_i P̃ + P̃ G R^{-1} G^T P̃    (4.80)
In order to manipulate real values, we will use the real part and the imaginary
part of the preceding equation. Denoting λi := a + j b, that is a := Re(λi ) and
b := Im(λi ), the preceding relation is equivalently replaced by the following
one:
A^T [ v v̄ ] = [ v v̄ ] [ λ_i 0 ; 0 λ̄_i ]
⇔ A^T [ Re(v) Im(v) ] = [ Re(v) Im(v) ] [ a −b ; b a ]    (4.82)

z_i := C^T x where C = [ Re(v) Im(v) ]    (4.83)
ż_i = A_i z_i + G u    (4.84)

where:

G = C^T B
A_i = [ a −b ; b a ]    (4.85)
Thus the closed-loop eigenvalues are the eigenvalues of matrix Λ_i. Here the design process becomes a little more involved because the parameters p̃_1, p̃_2 and p̃_3 of matrix P̃ shall be chosen to meet the desired complex conjugate closed-loop eigenvalues λ_{Ki} and λ̄_{Ki} while minimizing the trace of P̃ (indeed it can be shown that min(J_i) = min(tr(P̃))). The design process has been described by Arar & Sawan5.
Alternatively, we can choose the three coefficients q̃_1, q̃_2 and q̃_3 of matrix Q̃_i = Q̃_i^T ≥ 0 such that the eigenvalues with negative real part of the following Hamiltonian matrix H̃_i correspond to the desired eigenvalues λ_{Ki} and λ̄_{Ki}, as proposed by Fujinaka & Omatu6.
Thus the problem consists in finding the matrix Q̃_i:

Q̃_i = [ q̃_1 q̃_2 ; q̃_2 q̃_3 ] = Q̃_i^T ≥ 0    (4.87)
such that:

det(sI − H̃_i) = (s − λ_{Ki})(s − λ̄_{Ki})(s + λ_{Ki})(s + λ̄_{Ki}) = s^4 + c_2 s^2 + c_0    (4.88)

where:

H̃_i = [ A_i  −GR^{-1}G^T ; −Q̃_i  −A_i^T ]    (4.89)
Once matrix Q̃_i has been computed, matrix Q is obtained as follows:

Q = C Q̃_i C^T    (4.90)
Moreover the feedback control law u = −K_i x shifts the pair of complex conjugate eigenvalues (λ_i, λ̄_i) of matrix A to a pair of complex conjugate eigenvalues (λ_{Ki}, λ̄_{Ki}) as follows, assuming α + Re(λ_i) ≥ 0:

P_i = C P̃ C^T
Q_i = 2α P_i ⇒ Re(λ_{Ki}) = −(2α + Re(λ_i)) and Im(λ_{Ki}) = Im(λ_i)
K_i = R^{-1}B^T P_i = R^{-1}G^T P̃ C^T    (4.92)
The design process proposed by Amin4 to shift several eigenvalues recursively is the following:

1. Set i = 1 and A_1 = A.
This means that the shifted poles shall have the same imaginary parts as the original ones. Then compute the (right) eigenvectors (v_1, v_2) of A_i^T corresponding to λ_i and λ̄_i. In other words, (v_1, v_2) are the left eigenvectors of A_i:

[ v_1^T ; v_2^T ] A_i = [ λ_i 0 ; 0 λ̄_i ] [ v_1^T ; v_2^T ]

Then compute C, G, α and Λ_i defined by:

v_1 = v̄_2 ⇒ C = [ Re(v_1) Im(v_1) ]
G = C^T B
α = −Re(λ_{Ki} + λ_i)/2 ≥ 0
λ_i = a + jb ∈ C ⇒ Λ_i = [ a −b ; b a ]    (4.95)
Alternatively, P̃ can be defined as follows:

P̃ = X^{-1}    (4.97)
This relation does not directly lead to the value of gain K since N_ol(λ_{Ki}) ω_i is a vector, which is not invertible. Nevertheless, assuming that n denotes the order of state matrix A, we can apply this relation for the n desired closed-loop eigenvalues. We get:

K [ v_{K1} ⋯ v_{Kn} ] = − [ p_1 ⋯ p_n ]    (4.106)
Using the fact that det(XY) = det(X) det(Y) leads to the following result:

det(I + KΦ(−s)B)^T det(R) det(I + KΦ(s)B) = det( R + (Φ(−s)B)^T Q (Φ(s)B) )
⇔ det(I + KΦ(−s)B)^T det(I + KΦ(s)B) = det( I + R^{-1} (Φ(−s)B)^T Q (Φ(s)B) )    (4.110)
On the other hand, let D(s) be the open-loop characteristic polynomial and
β(s) be the closed-loop characteristic polynomial:
D(s) = det(sI − A)
β(s) = det(sI − A + BK)    (4.111)
As in the previous section, let N_ol(s) be the following polynomial matrix:

N_ol(s) := adj(sI − A) B    (4.112)

Then we get:

Φ(s) := (sI − A)^{-1} ⇒ Φ(s) B = (sI − A)^{-1} B := N_ol(s)/D(s)    (4.113)
Furthermore the Hsu-Chen equality (3.138) reads as follows with those notations:

det(I + KΦ(s)B) = β(s)/D(s)    (4.114)

Finally, using the fact that det(X^T) = det(X), relation (4.110) becomes:

(β(−s)/D(−s)) (β(s)/D(s)) = det( I + R^{-1} (Φ(−s)B)^T Q (Φ(s)B) )    (4.115)
We finally get the following result, where det( I + R^{-1} (Φ(−s)B)^T Q (Φ(s)B) ) is a rational fraction whose denominator is D(s)D(−s).
Thus the closed-loop eigenvalues are the roots λ_{Ki} with negative real part such that:

det( D(s)D(−s) I + R^{-1} N_ol(−s)^T Q N_ol(s) )|_{s=λ_{Ki}} = 0    (4.117)
where vectors v_{Ki} and p_i are given as in the non-optimal pole assignment problem:

v_{Ki} = N_ol(λ_{Ki}) ω_i
p_i = D(λ_{Ki}) ω_i    (4.120)
7 L.S. Shieh, H.M. Dib, R.E. Yates, Sequential design of linear quadratic state regulators via the optimal root-locus techniques, IEE Proceedings D - Control Theory and Applications, Vol. 135, No. 4, July 1988, DOI: 10.1049/ip-d.1988.0040
( D(−λ_{Ki})D(λ_{Ki}) I + R^{-1} N_ol(−λ_{Ki})^T Q N_ol(λ_{Ki}) ) ω_i = 0    (4.118)

D(λ_i) = 0 and λ_i = λ_{Ki} ⇒
p_i = D(λ_{Ki}) ω_i = D(λ_i) ω_i = 0
Q N_ol(λ_i) ω_i = Q v_{Ki} = 0    (4.121)
On the other hand, if λ_{Ki} and ω_i are set, then Q and R shall be chosen such that (4.118) holds ∀ i. Once matrix R = R^T > 0 has been set, matrix Q can be assumed to be a real diagonal matrix whose coefficients q_i shall be computed to comply with (4.118):

Q = Q^T = diag(q_1, ⋯, q_n) ∈ R^{n×n}    (4.122)

ω_i ≠ 0 ⇒ D(−λ_{Ki})D(λ_{Ki}) + N_ol(−λ_{Ki})^T (Q/R) N_ol(λ_{Ki}) = 0 ∀ i    (4.123)
We wish to design an optimal state feedback controller such that the closed-
loop poles are located at {λK1 = −10, λK2 = −2}.
To solve this problem, we rst observe that the eigenvalues of A are {λ1 =
−10, λ2 = 1}. Thus the problem consists in preserving λ1 = −10 in the state
feedback loop while shifting λ2 = 1 towards λK2 = −2. Because we are looking
for an optimal state feedback controller, we have to select matrix Q = QT ≥ 0
and R > 0 to achieve those specications.
The characteristic polynomial D(s) of state matrix A reads:

D(s) = det(sI − A) = s^2 + 9s − 10 ⇒
D(λ_{K1}) = D(−10) = 0
D(λ_{K2}) = D(−2) = −24    (4.125)
We finally get:

[ q_{r1} ; q_{r2} ] = [ 1 −100 ; 1 −4 ]^{-1} [ 0 ; 288 ] = [ 300 ; 3 ]

⇒ Q/R := [ q_{r1} 0 ; 0 q_{r2} ] = [ 300 0 ; 0 3 ]    (4.129)
■
while minimizing the quadratic performance index J(u(t)) for some Q > 0
and R > 0.
J(u(t)) = 1/2 ∫_0^∞ ( x^T(t) Q x(t) + u^T(t) R u(t) ) dt    (4.131)
We provide in this section the material written by He, Cai and Han8. Assume that (A, B) is controllable. Then the pole assignment problem is solvable if and only if there exist two matrices X_1 ∈ R^{n×n} and X_2 ∈ R^{n×n} such that the following matrix inequalities are satisfied:

where F is any matrix such that λ(F) = Λ_cl and (X_1, X_2) satisfies the following generalized Sylvester matrix equation9:

P = X_2 X_1^{-1}    (4.135)
The starting point to get this result is the fact that there must exist an eigenvector matrix X such that the following formula involving the Hamiltonian matrix H holds:

HX = XF    (4.136)

X = [ X_1 ; X_2 ] ⇒ [ A  −BR^{-1}B^T ; −Q  −A^T ] [ X_1 ; X_2 ] = [ X_1 ; X_2 ] F    (4.137)
8 Hua-Feng He, Guang-Bin Cai and Xiao-Jun Han, Optimal Pole Assignment of Linear Systems by the Sylvester Matrix Equations, Hindawi Publishing Corporation, Abstract and Applied Analysis, Volume 2014, Article ID 301375, http://dx.doi.org/10.1155/2014/301375
9 Bin Zhou, Guangren Duan, An explicit solution to right factorization with application in eigenstructure assignment, Journal of Control Theory and Applications 08/2005; 3(3):275-279. DOI: 10.1007/s11768-005-0049-7
Then we get a more general form of the quadratic performance index. Indeed the quadratic performance index can be rewritten as:

J(u(t)) = 1/2 ∫_0^∞ [ x ; u ]^T [ Q S ; S^T R ] [ x ; u ] dt
        = 1/2 ∫_0^∞ ( x^T(t)Qx(t) + u^T(t)Ru(t) + 2x^T(t)Su(t) ) dt    (4.143)

where:
Q = Q^T := N^T N ≥ 0
R = R^T := D^T D + R_1 > 0
S := N^T D    (4.144)
where:
Q_m = Q − SR^{-1}S^T
v(t) = u(t) + R^{-1}S^T x(t)    (4.146)
ẋ(t) = Ax(t) + Bu(t)
     = Ax(t) + B( v(t) − R^{-1}S^T x(t) )
     = A_m x(t) + Bv(t) where A_m = A − BR^{-1}S^T    (4.148)
where matrix P is the positive definite matrix which solves the following algebraic Riccati equation (see (3.8)):

PA_m + A_m^T P − PBR^{-1}B^T P + Q_m = 0    (4.150)
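A minimal sketch of this change of variable with scipy (recent scipy versions can also take the cross term directly through the s argument of solve_continuous_are):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_cross_term(A, B, Q, R, S):
    """LQR with cross term S, per (4.143)-(4.150): solve the standard
    problem with Qm = Q - S R^-1 S', Am = A - B R^-1 S'; the optimal v
    is -R^-1 B'P x, hence u = -R^-1 (B'P + S') x."""
    Qm = Q - S @ np.linalg.solve(R, S.T)
    Am = A - B @ np.linalg.solve(R, S.T)
    P = solve_continuous_are(Am, B, Qm, R)
    K = np.linalg.solve(R, B.T @ P + S.T)
    return K, P
```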
In this section we consider the problem of finding the control u(t) which minimizes the following performance index:

J(u(t)) = 1/2 ∫_0^∞ e^T(t)e(t) dt = 1/2 ∫_0^∞ ( ẋ(t) − A_r x(t) )^T ( ẋ(t) − A_r x(t) ) dt    (4.152)

Substituting ẋ(t) = Ax(t) + Bu(t), the error reads e(t) = (A − A_r)x(t) + Bu(t), and expanding e^T(t)e(t) leads to:

J(u(t)) = 1/2 ∫_0^∞ ( x^T(t)Qx(t) + u^T(t)Ru(t) + 2x^T(t)Nu(t) ) dt    (4.154)
where:
Q = (A − A_r)^T (A − A_r)
R = B^T B
N = (A − A_r)^T B    (4.155)
Then we can re-use the results of section 4.6.1. Let P be the positive definite matrix which solves the following algebraic Riccati equation:

PA_m + A_m^T P − PBR^{-1}B^T P + Q_m = 0    (4.156)

where:
Q_m = Q − NR^{-1}N^T
A_m = A − BR^{-1}N^T
v(t) = u(t) + R^{-1}N^T x(t)    (4.157)
The stabilizing control u(t) is then defined in a similar fashion to (4.149).
Now let Λ_r be the Jordan form of the desired state matrix A_r, where V is the corresponding eigenvector matrix:

Λ_r = V^{-1} A_r V    (4.159)

Let A_cl be the state matrix of the closed-loop, which is written using matrix V as follows:

ẋ(t) = A_cl x(t) = V Λ_cl V^{-1} x(t)    (4.160)

Assuming that the desired Jordan form Λ_r is a diagonal matrix and using the fact that V^{-1} = V^T, the product e^T(t)e(t) in (4.152) reads as follows:

From the preceding equation it is clear that minimizing the cost J(u(t)) = 1/2 ∫_0^∞ e^T(t)e(t) dt consists in finding the control u(t) which minimizes the gap between the desired eigenvalues (which are set in Λ_r) and the actual eigenvalues of the closed-loop.
Thus, after integration, and denoting by x(0) the initial value of x(t), we get:

x(t) = e^{(A−BK)t} x(0)    (4.167)

Consequently, the performance index J reads:

J = ∫_0^∞ x^T(t) ( Q + K^T RK ) x(t) dt
  = ∫_0^∞ x^T(0) e^{(A−BK)^T t} ( Q + K^T RK ) e^{(A−BK)t} x(0) dt    (4.168)
Because the initial value x(0) of the state vector is usually unknown, the product x(0)x(0)^T is removed from the performance index J. We get:

J = tr(P)    (4.170)
Alternatively, and following Lewis & al.10, assume that there exists a positive definite matrix P = P^T > 0 such that the following equality holds:

d/dt ( x^T(t)Px(t) ) = −x^T(t) ( Q + K^T RK ) x(t)    (4.175)

Then the performance index J defined in (4.165) reads:

J = ∫_0^∞ x^T(t) ( Q + K^T RK ) x(t) dt
  = −∫_0^∞ d/dt ( x^T(t)Px(t) ) dt
  = −[ x^T(t)Px(t) ]_{t=0}^{t→∞}
  = x^T(0)Px(0) − lim_{t→∞} x^T(t)Px(t)    (4.176)

Assuming that the closed-loop is stable so that x(t) vanishes with time, we get the following relation where tr(X) denotes the trace of matrix X:

lim_{t→∞} x(t) = 0 ⇒ J = x^T(0)Px(0) = tr( P x(0) x^T(0) )    (4.177)
Since this relation shall hold for all values of x(t), we shall have:

−( Q + K^T RK ) = (A − BK)^T P + P (A − BK)    (4.179)

where A_cl := A − BK
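This gives a direct numerical recipe: for any stabilizing gain K, optimal or not, the cost follows from one Lyapunov solve. A sketch:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lqr_cost_of_gain(A, B, Q, R, K, x0):
    """Cost J = x0' P x0 of a stabilizing gain K, from (4.179):
    (A-BK)'P + P(A-BK) + Q + K'RK = 0."""
    Acl = A - B @ K
    QK = Q + K.T @ R @ K
    # scipy solves a X + X a^H = q, so pass a = Acl' and q = -QK
    P = solve_continuous_lyapunov(Acl.T, -QK)
    return float(x0.T @ P @ x0), P
```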
10
Lewis F., Vrabie D., Syrmos V., Optimal Control, John Wiley & Sons, 3rd Edition, 2012
A^T P + PA − PBR^{-1}B^T P + Q = 0
⇔ (A − BK)^T P + P(A − BK) + K^T B^T P + PBK − PBR^{-1}B^T P + Q = 0
K = R^{-1}B^T P ⇒ PBK = PBR^{-1}B^T P ⇒ (A − BK)^T P + P(A − BK) + K^T B^T P + Q = 0
B^T P = RK ⇒ (A − BK)^T P + P(A − BK) + Q + K^T RK = 0    (4.181)
Following Lewis & al.10, assume that there exists a positive definite matrix P = P^T > 0 such that the following equality holds:

d/dt ( x^T(t)Px(t) ) = −x^T(t) Q_K x(t)    (4.186)

The performance index J now reads:

J = −∫_0^∞ d/dt ( x^T(t)Px(t) ) dt
  = −[ x^T(t)Px(t) ]_{t=0}^{t→∞}
  = x^T(0)Px(0) − lim_{t→∞} x^T(t)Px(t)    (4.187)
Assuming that the closed-loop is stable so that x(t) vanishes with time, we get the following relation where tr(X) denotes the trace of matrix X:

J = x^T(0)Px(0) = tr( P x(0) x^T(0) )
Example 4.2. In order to illustrate the first relation, we consider the following matrices:

X = [ 2 3 ; 4 5 ]
Y = [ y_11 y_12 ; y_21 y_22 ]    (4.193)

Then:

XY = [ 2y_11 + 3y_21  2y_12 + 3y_22 ; 4y_11 + 5y_21  4y_12 + 5y_22 ]
⇒ tr(XY) = 2y_11 + 3y_21 + 4y_12 + 5y_22    (4.194)

Thus:

∂tr(XY)/∂Y = [ ∂tr(XY)/∂y_11  ∂tr(XY)/∂y_12 ; ∂tr(XY)/∂y_21  ∂tr(XY)/∂y_22 ] = [ 2 4 ; 3 5 ] = X^T    (4.195)
■
Now we are in a position to solve the output feedback optimal control problem thanks to the Lagrange multiplier approach. We define the (scalar) Hamiltonian H as follows, where Λ = Λ^T is an n × n diagonal matrix of Lagrange multipliers to be determined:

H = tr(P) + tr( Λ ( (A − BKC)^T P + P(A − BKC) + Q_K ) )    (4.196)
From the first equation, and assuming that CΛC^T is nonsingular, the static output feedback gain K can be computed as a function of the Lagrange multipliers Λ:

∂H/∂K = 0 ⇒ K = R^{-1}B^T PΛC^T ( CΛC^T )^{-1}    (4.198)
It is worth noticing that for the static state feedback case where C = I, the static state feedback gain K no longer depends on the Lagrange multipliers Λ:

C = I ⇒ K = R^{-1}B^T P    (4.199)
Moreover, in the state feedback case the Lyapunov equation specifying the constraint turns out to be the algebraic Riccati equation:

C = I ⇒ K = R^{-1}B^T P ⇒ Q_K = Q + K^T RK = Q + PBR^{-1}B^T P
⇒ 0 = (A − BKC)^T P + P(A − BKC) + Q_K
    = (A − BK)^T P + P(A − BK) + Q + PBR^{-1}B^T P
    = A^T P + PA − PBR^{-1}B^T P + Q    (4.200)
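The stationarity conditions (4.196)-(4.198) suggest a classical alternating scheme: for the current gain, solve the two Lyapunov equations obtained from the gradients with respect to Λ and P, then update K with (4.198). A sketch; convergence is not guaranteed and depends on an initial stabilizing gain K0:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def sof_lqr(A, B, C, Q, R, K0, n_iter=200, tol=1e-9):
    """Static output feedback u = -K y through the Lagrangian conditions."""
    K = K0
    n = A.shape[0]
    for _ in range(n_iter):
        Acl = A - B @ K @ C
        QK = Q + C.T @ K.T @ R @ K @ C
        P = solve_continuous_lyapunov(Acl.T, -QK)         # constraint equation
        Lam = solve_continuous_lyapunov(Acl, -np.eye(n))  # multiplier equation
        K_new = np.linalg.solve(R, B.T @ P @ Lam @ C.T) @ np.linalg.inv(C @ Lam @ C.T)
        if np.max(np.abs(K_new - K)) < tol:
            break
        K = K_new
    return K, P
```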
We have seen in the previous section that if there exists a positive definite matrix P = P^T > 0 such that the following equality holds:

Then the minimum cost J* reads as follows, where tr(X) denotes the trace of matrix X:

J* = x^T(0)Px(0) = tr( P x(0) x^T(0) )    (4.203)
where α ≤ 0 and:

A_α = A − αI
Ã = [ sin(θ)A_α  cos(θ)A_α ; −cos(θ)A_α  sin(θ)A_α ] := [ sin(θ) cos(θ) ; −cos(θ) sin(θ) ] ⊗ A_α    (4.204)

Moreover let:

[ B̃_1 B̃_2 ] := [ sin(θ) cos(θ) ; −cos(θ) sin(θ) ] ⊗ B    (4.206)

Ã_cl = Ã − [ B̃_1 B̃_2 ] ( I_2 ⊗ KC )
11 Yuan L., Achenie L., Jiang W., Linear Quadratic Optimal Output Feedback Control For Systems With Poles In A Specified Region, International Journal of Control, Vol. 64(6), pages 1151-1164, 1996
Then the optimal control problem for pole placement in a specified region reads as follows11:

Find P = P^T > 0 and K which minimize tr(P) under the constraints:
{ (A − BKC)^T P + P(A − BKC) + Q_K = 0
  Ã_cl^T P + P Ã_cl < 0    (4.207)

where:

Ã_cl = [ sin(θ) cos(θ) ; −cos(θ) sin(θ) ] ⊗ ( A − αI − BKC )    (4.208)
Let the state space model of the first equation of (4.211) be the following, where z(t) is the output and x(t) the input of the MIMO system:

χ̇_q(t) = A_q χ_q(t) + B_q x(t)
z(t) = N_q χ_q(t) + D_q x(t)
⇒ z(s) = ( N_q (sI − A_q)^{-1} B_q + D_q ) x(s) = W_q(s) x(s)    (4.215)

Similarly, let the state space model of the second equation of (4.211) be the following, where v(t) is the output and u(t) the input of the MIMO system:

χ̇_r(t) = A_r χ_r(t) + B_r u(t)
v(t) = N_r χ_r(t) + D_r u(t)
⇒ v(s) = ( N_r (sI − A_r)^{-1} B_r + D_r ) u(s) = W_r(s) u(s)    (4.216)
That is:

z^T(t)z(t) + v^T(t)v(t) = [ x^T(t) χ_q^T(t) χ_r^T(t) ] Q_f [ x(t) ; χ_q(t) ; χ_r(t) ]
+ 2 [ x^T(t) χ_q^T(t) χ_r^T(t) ] N_f u(t) + u^T(t) R_f u(t)    (4.218)

where:

Q_f = [ D_q^T D_q  D_q^T N_q  0 ; N_q^T D_q  N_q^T N_q  0 ; 0  0  N_r^T N_r ]
N_f = [ 0 ; 0 ; N_r^T D_r ]
R_f = D_r^T D_r    (4.219)
where:

A_a = [ A 0 0 ; B_q A_q 0 ; 0 0 A_r ]
B_a = [ B ; 0 ; B_r ]    (4.222)
Using (4.218) the performance index J defined in (4.214) is written as follows:

J = ∫_0^∞ ( x_a^T(t) Q_f x_a(t) + 2 x_a^T(t) N_f u(t) + u^T(t) R_f u(t) ) dt    (4.223)
Denoting by P the positive definite matrix which solves the algebraic Riccati equation (4.224), the stabilizing control u(t) is then defined in a similar fashion to (4.149):

u(t) = −K x_a(t)
K = R_f^{-1} ( PB_a + N_f )^T    (4.226)
This linear equation in the coefficients of p(s) and q(s) has a solution for arbitrary β(s) if and only if m ≥ n. The solution is unique if and only if m = n. Now consider the following performance measure J(ρ, µ), where ρ and µ are positive numbers giving relative weights to the outputs y_1(t) and y_2(t) and to the inputs w_1(t) and w_2(t) respectively, and δ(t) is the Dirac delta function:

J(ρ, µ) = 1/2 ∫_0^∞ ( y_1^2(t) + ρ y_2^2(t) ) dt |_{w_1(t)=µδ(t), w_2(t)=0}
        + 1/2 ∫_0^∞ ( y_1^2(t) + ρ y_2^2(t) ) dt |_{w_1(t)=0, w_2(t)=δ(t)}    (4.231)
− Find the polynomial d_µ(s) (also called spectral factor) which is formed with the n roots with negative real parts of D(s)D(−s) + µ^2 N(s)N(−s) (a numerical sketch of this spectral factorization is given after this list):
− Find the polynomial d_ρ(s) (also called spectral factor) which is formed with the n roots with negative real parts of D(s)D(−s) + ρ N(s)N(−s):
− Then the optimal controller K(s) = q(s)/p(s) is the unique nth order strictly proper transfer function such that:
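As announced above, a sketch of the spectral factorization, reusing the reflect helper from the Chang-Letov sketch of section 4.1; the plant is illustrative:

```python
import numpy as np

def spectral_factor(den, num, weight):
    """Monic stable spectral factor of D(s)D(-s) + weight*N(s)N(-s):
    keep the n roots with negative real part (reflect as defined earlier)."""
    pol = np.convolve(den, reflect(den)) + weight * np.convolve(num, reflect(num))
    r = np.roots(pol)
    return np.real(np.poly(r[r.real < 0]))

# d_mu(s) and d_rho(s) for an illustrative G(s) = 1/(s^2 + s + 1):
num, den = [1.0], [1.0, 1.0, 1.0]
mu, rho = 2.0, 3.0
d_mu = spectral_factor(den, num, mu**2)
d_rho = spectral_factor(den, num, rho)
```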
5.1 Introduction
The regulator problem that has been tackled in the previous chapters is in fact a special case of a wider class of problems where the outputs of the system are required to follow a desired trajectory in some optimal sense. As underlined in the book of Anderson and Moore, trajectory following problems can be conveniently separated into three different problems which depend on the nature of the desired output trajectory:
− If the plant outputs are to follow a class of desired trajectories, for example
all polynomials up to certain order, the problem is referred to as a servo
(servomechanism) problem;
− When the plant outputs are to follow the response of another plant (or model) the problem is referred to as a model following problem;
This chapter is devoted to the presentation of some results common to all three of these problems, with particular attention given to the tracking problem.
The optimal control problem is then split into two separate problems which
are solved individually to form the suboptimal control:
− First the commanded value r(t) is set to zero and the gain K is computed
to solve the Linear Quadratic Regulator (LQR) problem;
− Then the feedforward gain F is computed such that the steady-state value
of output y(t) is equal to the commanded value r(t) := y c .
r(t) := y c (5.3)
Using the expression (5.2) of the control u(t) within the state space realization (5.1) of the linear system leads to:

ẋ(t) = Ax(t) + Bu(t) = (A − BK) x(t) + BF y_c
y(t) = C x(t)    (5.4)
Then matrix F is computed such that the steady-state value of output y(t) is y_c. Assuming that ẋ = 0, which corresponds to the steady-state, the preceding equations become:

That is:

y = −C(A − BK)^{-1}BF y_c    (5.6)

Setting y to y_c and assuming that the size of the output vector y(t) is the same as the size of the control vector u(t) (square plant) leads to the following expression of the feedforward gain F:

y = y_c ⇒ F = −( C (A − BK)^{-1} B )^{-1}    (5.7)

For a square plant the feedforward gain F is nothing but the inverse of the closed-loop static gain (the closed-loop static gain is obtained by setting the Laplace variable s to 0 in the expression of the closed-loop transfer function).
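A minimal sketch of this two-step design (LQR gain, then static feedforward per (5.7)):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_with_feedforward(A, B, C, Q, R):
    """K from the LQR problem, then F = -(C (A - BK)^-1 B)^-1 so that
    the closed-loop static gain from yc to y is the identity."""
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)
    F = -np.linalg.inv(C @ np.linalg.solve(A - B @ K, B))
    return K, F
```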
It is now desired to find an optimal control law in such a way that the controlled output y(t) tracks or follows a reference output r(t). Hence the performance index is defined as:

J(u(t)) = 1/2 e^T(t_f) S e(t_f) + 1/2 ∫_0^{t_f} ( e^T(t)Qe(t) + u^T(t)Ru(t) ) dt    (5.9)
λ̇(t) = −∂H/∂x
     = −( (∂e(t)/∂x)^T Q e(t) + A^T λ(t) )
     = −( C^T Q (C x(t) − r(t)) + A^T λ(t) )    (5.13)

P(t_f) = C^T S C
g(t_f) = C^T S r(t_f)    (5.17)
From the preceding equation it is clear that the optimal control is the sum
of two components:
− a state-feedback component: −K(t) x(t) where K(t) = R−1 BT P(t);
Using (5.8), (5.15) and (5.18) to express u(t) as a function of x(t) and g(t) we finally get:

( Ṗ(t) + A^T P(t) + P(t)A − P(t)BR^{-1}B^T P(t) + C^T QC ) x(t)
− ġ(t) − ( A^T − P(t)BR^{-1}B^T ) g(t) − C^T Q r(t) = 0    (5.21)

and:

−ġ(t) = ( A^T − P(t)BR^{-1}B^T ) g(t) + C^T Q r(t) := (A − BK(t))^T g(t) + C^T Q r(t)
g(t_f) = C^T S r(t_f)    (5.23)
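Equations (5.21) and (5.23) are integrated backward from the terminal conditions (5.17). A fixed-step sketch (r is any callable of time; the optimal control then reads u(t) = −R^{-1}B^T(P(t)x(t) − g(t)), combining the two components noted above):

```python
import numpy as np

def lqt_gains(A, B, C, Q, R, S, r, tf, n_steps):
    """Backward Euler integration of the Riccati equation in (5.21) and
    of the auxiliary equation (5.23), from P(tf) = C'SC, g(tf) = C'S r(tf)."""
    dt = tf / n_steps
    Rinv = np.linalg.inv(R)
    P = C.T @ S @ C
    g = C.T @ S @ r(tf)
    P_hist, g_hist = [P], [g]
    for k in range(n_steps):
        t = tf - k * dt
        Pdot = -(A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + C.T @ Q @ C)
        gdot = -((A - B @ Rinv @ B.T @ P).T @ g + C.T @ Q @ r(t))
        P, g = P - dt * Pdot, g - dt * gdot      # step backward in time
        P_hist.append(P); g_hist.append(g)
    return P_hist[::-1], g_hist[::-1]            # ordered from t = 0 to t = tf
```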
Then let x̃(t) be the error between the actual state vector x(t) and its steady-state value x_ss, and ũ(t) the error between the actual control u(t) and its steady-state value u_ss:

x̃(t) := x(t) − x_ss
ũ(t) := u(t) − u_ss    (5.35)

Then using (5.35) the dynamics of x̃(t) reads:

x̃̇(t) = ẋ(t) = Ax(t) + Bu(t) = A( x̃(t) + x_ss ) + B( ũ(t) + u_ss )    (5.36)

A x_ss + B u_ss = 0 ⇒ x̃̇(t) = A x̃(t) + B ũ(t)    (5.37)
2 Hamidreza Modares, Frank L. Lewis, Online Solution to the Linear Quadratic Tracking Problem of Continuous-time Systems using Reinforcement Learning, 52nd IEEE Conference on Decision and Control, December 10-13, 2013, Florence, Italy
J(ũ(t)) = 1/2 ∫_0^∞ ( x̃^T(t)Qx̃(t) + ũ^T(t)Rũ(t) ) dt    (5.39)

u(t) = ũ(t) + u_ss = −K x̃(t) + u_ss = −K( x(t) − x_ss ) + u_ss    (5.42)
ẋ(t) = Ax(t) + Bu(t)
y(t) = C x(t)
ẋ_i(t) = T e(t) = T( r(t) − y(t) ) = T r(t) − TC x(t)

⇔ d/dt [ x(t) ; x_i(t) ] = [ A 0 ; −TC 0 ] [ x(t) ; x_i(t) ] + [ B ; 0 ] u(t) + [ 0 ; T ] r(t)
   y(t) = [ C 0 ] [ x(t) ; x_i(t) ]    (5.44)
Q_a = N_a^T N_a    (5.47)
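A sketch of the corresponding augmented design (T square, of the size of the output):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def integral_lqr(A, B, C, T, Qa, Ra):
    """LQR on the integrator-augmented plant (5.44); the augmented state
    is [x; xi] with xi the integral of T(r - y). Returns (Kp, Ki)."""
    n, m, p = A.shape[0], B.shape[1], C.shape[0]
    Aa = np.block([[A, np.zeros((n, p))],
                   [-T @ C, np.zeros((p, p))]])
    Ba = np.vstack([B, np.zeros((p, m))])
    P = solve_continuous_are(Aa, Ba, Qa, Ra)
    Ka = np.linalg.solve(Ra, Ba.T @ P)
    return Ka[:, :n], Ka[:, n:]
```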
Using the Laplace transform, and denoting by I the identity matrix, we get:

X_a(s) = ( sI − A_a + B_a K_a )^{-1} [ 0 ; T ] R(s)
E(s) = T R(s) − [ C 0 ] X_a(s)    (5.53)
Thus:

( −A_a + B_a K_a )^{-1} = [ −A + BK_p  BK_i ; TC  0 ]^{-1}
                        = [ 0  BK_i (TCBK_i)^{-1} ; ( (BK_i)^T BK_i )^{-1} (BK_i)^T  W ]    (5.58)

And:

( −A_a + B_a K_a )^{-1} [ 0 ; T ] = [ 0  BK_i (TCBK_i)^{-1} ; ∗  W ] [ 0 ; T ] = [ BK_i (TCBK_i)^{-1} T ; WT ]
⇒ [ C 0 ] ( −A_a + B_a K_a )^{-1} [ 0 ; T ] = [ C 0 ] [ BK_i (TCBK_i)^{-1} T ; WT ] = CBK_i (TCBK_i)^{-1} T    (5.59)
Figure 5.2: Linear Quadratic Tracker with constant reference signal rss
Consequently, using (5.59) in (5.55), the final value of the error e(t) becomes:

lim_{t→∞} e(t) = T ( I − [ C 0 ] ( −A_a + B_a K_a )^{-1} [ 0 ; T ] )
             = T ( I − CBK_i (TCBK_i)^{-1} T )
             = T − TCBK_i (TCBK_i)^{-1} T
             = T − T
             = 0    (5.60)
0 = A_1 x_ss + B_1 u_ss + B_2 r_ss
0 = A_r x_rss + B_r r_ss
y_ss = y_rss ⇔ C_1 x_ss = C_r x_rss

⇔ [ −B_2 ; −B_r ; 0 ] r_ss = [ A_1 0 B_1 ; 0 A_r 0 ; C_1 −C_r 0 ] [ x_ss ; x_rss ; u_ss ]    (5.64)

Let:

M^{-1} [ −B_2 ; −B_r ; 0 ] := [ M_1 ; M_2 ; M_3 ]    (5.67)

Thus:

[ x_ss ; x_rss ; u_ss ] = [ M_1 ; M_2 ; M_3 ] r_ss    (5.68)
x̃̇(t) = ẋ(t) = A_1 x(t) + B_1 u(t) + B_2 r_ss
     = A_1 ( x̃(t) + x_ss ) + B_1 ( ũ(t) + u_ss ) + B_2 r_ss
     = A_1 x̃(t) + B_1 ũ(t) + A_1 x_ss + B_1 u_ss + B_2 r_ss    (5.70)

Similarly:

x̃̇_r(t) = ẋ_r(t) = A_r x_r(t) + B_r r_ss
       = A_r ( x̃_r(t) + x_rss ) + B_r r_ss
       = A_r x̃_r(t) + A_r x_rss + B_r r_ss    (5.71)
It is clear from (5.64) that A_1 x_ss + B_1 u_ss + B_2 r_ss = 0 and A_r x_rss + B_r r_ss = 0. We finally get the following state space equation:

[ x̃̇(t) ; x̃̇_r(t) ] = [ A_1 0 ; 0 A_r ] [ x̃(t) ; x̃_r(t) ] + [ B_1 ; 0 ] ũ(t)
                  := A_a [ x̃(t) ; x̃_r(t) ] + B_a ũ(t)    (5.72)
Using the last equation of (5.64), we finally get the following expression of the control:

u(t) = −K_a [ x(t) ; x_r(t) ] + ( K_a [ M_1 ; M_2 ] + M_3 ) r_ss
     := − [ K_x K_r ] [ x(t) ; x_r(t) ] + D_pf r_ss
     := −K_x x(t) + y_pf(t)    (5.79)

where:

y_pf(t) = −K_r x_r(t) + D_pf r_ss
D_pf := K_a [ M_1 ; M_2 ] + M_3    (5.80)

Consequently the actual optimal control u(t) is the sum of two components:
− a state-feedback component: −K_x x(t);
− and a feedforward component y_pf(t) which is obtained as the output of a prefilter C_pf(s) with the following realization:

C_pf(s): { ẋ_r(t) = A_r x_r(t) + B_r r_ss
           y_pf(t) = −K_r x_r(t) + D_pf r_ss }    (5.81)
as follows:

C_a := [ C I −C_r ]  (the 0 block becomes I)    (5.84)
6.1 Introduction
The design of the Linear Quadratic Regulator (LQR) assumes that the whole state is available for control and that there is no noise. Those assumptions may appear unrealistic in practical applications. We will assume in this chapter that the process to be controlled is described by the following linear time invariant model, where w(t) and v(t) are random processes which represent the process noise and the measurement noise, respectively:

ẋ(t) = Ax(t) + Bu(t) + w(t)
y(t) = Cx(t) + v(t)    (6.1)
Figure 6.1: Open-loop linear system with process and measurement noises
Since only the output y(t) is now available for control (not the full state x(t)), the separation principle will be used to design the LQG regulator. Indeed, the solution of the LQG problem can be split into two steps:
− First an estimator will be used to estimate the full state using the available output y(t)
We assume that x(t) cannot be measured and the goal of the observer is to estimate x(t) based on y(t). The Luenberger observer (1964) provides an estimate of the state vector through the following differential equation, where matrices F, J and L have to be determined:

d/dt x̂(t) = F x̂(t) + J u(t) + L y(t)    (6.4)

The estimation error e(t) is defined as follows:

e(t) = x(t) − x̂(t)    (6.5)
Using (6.5) and the output equation y(t) = Cx(t), the preceding relation can be rewritten as follows:

ė(t) = Ax(t) + Bu(t) − F( x(t) − e(t) ) − Ju(t) − LCx(t)
     = Fe(t) + ( A − F − LC ) x(t) + ( B − J ) u(t)    (6.7)

Since the purpose of the observer is to move the estimation error e(t) towards zero independently of the control u(t) and the true state vector x(t), we choose matrices F and J as follows:

J = B
F = A − LC    (6.8)
x̂̇(t) = (A − LC) x̂(t) + Bu(t) + Ly(t)
     = A x̂(t) + Bu(t) + L( y(t) − C x̂(t) )    (6.10)

Figure 6.2 shows the structure of the Luenberger observer.
We will assume that w(t) is a white noise (which is a special case of a wide-sense stationary (WSS) random process) with zero mean Gaussian probability density function (pdf) p(w). The covariance matrix of the Gaussian probability density function p(w) will be denoted P_w and the Dirac delta function will be denoted δ(τ):

p(w) = 1/( (2π)^{n/2} √det(P_w) ) e^{−(1/2) w^T P_w^{-1} w}
E[w(t)] = m_w(t) = 0
R_w(τ) = E[ w(t)w^T(t + τ) ] = P_w δ(τ) where P_w = P_w^T > 0    (6.13)
Let mx (0) = E [x0 ] be the mean of the initial value x0 of the state vector
x(t) and Px (0) the covariance matrix of the initial value x0 of the state vector.
Then it can be shown that x(t) is a Gaussian random process with:
Assuming that m_x(0) = 0, we get zero for the mean value of x(t):
Finally, assuming that m_x(0) = 0 and that the input random process w(t) is a zero mean white noise with autocorrelation function R_w(τ) = P_w δ(τ) which is uncorrelated with the initial value x_0 of the state vector, that is E[ x_0 w^T(τ) ] = 0, then matrix P_x(t) reads as follows:

(6.20)

Using the fact that ∫_0^t g(τ_1) δ(τ_1 − τ) dτ_1 = g(τ), we finally obtain:
P_x(t) = e^{At} P_x(0) e^{A^T t} + ∫_0^t e^{A(t−τ)} B P_w B^T e^{A^T(t−τ)} dτ    (6.21)
Assuming that the system is stable (i.e. all the eigenvalues of the state matrix A have negative real part), the random process x(t) will become stationary after a certain amount of time: its mean m_x(t) will be zero whereas the value of its covariance matrix P_x(t) turns out to be a constant matrix P_x = P_x^T ≥ 0 ∀ t which solves the following matrix algebraic Lyapunov equation:

A P_x + P_x A^T + B P_w B^T = 0
Thus after a certain amount of time the state vector x(t) as well as the
output vector y(t) are wide-sense stationary (WSS) random processes.
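A numerical sketch with illustrative matrices:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Steady-state covariance of a stable system driven by white noise:
# solves A Px + Px A' + B Pw B' = 0 (illustrative A, B, Pw).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Pw = np.array([[1.0]])
Px = solve_continuous_lyapunov(A, -B @ Pw @ B.T)
print(Px)
```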
Then we will see in Section 6.3.4 that the following result holds:

S_y(f) = F(−s) P_w F^T(s)|_{s=j2πf}    (6.28)

where F(s) is the transfer function of the linear system, which is assumed to be stable:

F(s) = C (sI − A)^{-1} B    (6.29)

Relation (6.28) indicates that the power spectral density (psd) S_y(f) of y(t) can be obtained thanks to the transfer function F(s) of the stable linear system and the spectral density matrix P_w of the exciting white noise w(t).
Let Sy (s) be the (one-sided) Laplace transform of the autocorrelation
function Ry (τ ):
Z +∞
Sy (s) = L [Ry (τ )] = Ry (τ )e−sτ dτ (6.30)
0
It can be seen that the power spectral density (psd) Sy (f ) of y(t) can be
obtained thanks to the (one-sided) Laplace transform Sy (s) of Ry (τ ) as:
R_y(−τ) = R_y(τ)
⇒ S_y(f) = ( ∫_0^{+∞} R_y(τ) e^{sτ} dτ + ∫_0^{+∞} R_y(τ) e^{−sτ} dτ )|_{s=j2πf}    (6.33)
When identifying the stable transfer function S_y(s) in the preceding relation, we get the autocorrelation function R_y(τ) ∀ τ ≥ 0 thanks to the inverse (one-sided) Laplace transform of S_y(s):
Example 6.1. Let F(s) be a first order system with time constant a and let w(t) be a white noise with covariance P_w:

F(s) = 1/(1 + as)
R_w(τ) = E[ w(t)w^T(t + τ) ] = P_w δ(τ) where P_w = P_w^T > 0    (6.38)
Since a > 0, the system is stable. The covariance matrix P_x(t) is defined as follows:

P_x(t) = E[ ( x(t) − m_x(t) )( x(t) − m_x(t) )^T ]    (6.42)
We get:

F(−s) P_w F^T(s) = P_w / ( (1 + as)(1 − as) ) = P_w / ( 1 − (as)^2 )
⇒ S_y(f) = F(−s) P_w F^T(s)|_{s=j2πf} = P_w / ( 1 + (2πfa)^2 )    (6.48)

P_w / ( 1 − (as)^2 ) = (P_w/2) (1/(1 − as)) + (P_w/2) (1/(1 + as)) = S_y(−s) + S_y(s)    (6.49)
Finally we use the initial value theorem on the (one-sided) Laplace transform S_y(s) to get the following result:

P_y = R_y(τ)|_{τ=0} = lim_{s→∞} s S_y(s) = P_w / (2a)    (6.54)
■
= E[ C x(t) x^T(t + τ) C^T ]
= C E[ x(t) x^T(t + τ) ] C^T
= C E[ ( ∫_0^t e^{A(t−τ_1)} B w(τ_1) dτ_1 ) ( ∫_0^{t+τ} e^{A(t+τ−τ_2)} B w(τ_2) dτ_2 )^T ] C^T
= C ∫_0^t ∫_0^{t+τ} e^{A(t−τ_1)} B E[ w(τ_1) w^T(τ_2) ] B^T e^{A^T(t+τ−τ_2)} dτ_1 dτ_2 C^T    (6.55)

Using the facts that w(t) is a white noise, that is E[ w(τ_1) w^T(τ_2) ] = P_w δ(τ_2 − τ_1), and that ∫_0^t g(τ_1) δ(τ_2 − τ_1) dτ_1 = g(τ_2), we get:

R_y(t, t + τ) = C ∫_0^t ∫_0^{t+τ} e^{A(t−τ_1)} B P_w δ(τ_2 − τ_1) B^T e^{A^T(t+τ−τ_2)} dτ_1 dτ_2 C^T
             = C ∫_0^t e^{A(t−τ_2)} B P_w B^T e^{A^T(t+τ−τ_2)} dτ_2 C^T    (6.56)
Let ξ := t − τ_2 ⇒ dξ = −dτ_2. Then:

R_y(t, t + τ) = −C ∫_t^0 e^{Aξ} B P_w B^T e^{A^T(ξ+τ)} dξ C^T
             = C ∫_0^t e^{Aξ} B P_w B^T e^{A^T(ξ+τ)} dξ C^T    (6.57)

We finally get1:

R_y(t, t + τ) = ∫_0^t ( C e^{Aξ} B ) P_w ( C e^{A(ξ+τ)} B )^T dξ    (6.58)
Of course (6.59) is valid only if the process has a steady state response, that
is if the process is stable.
Then the power spectral density (psd) S_y(f) of y(t) is defined as the Fourier transform of the autocorrelation function R_y(τ). We get from (6.59):

S_y(f) = ∫_{−∞}^{+∞} R_y(τ) e^{−j2πfτ} dτ
       = ∫_{−∞}^{+∞} ∫_0^∞ ( C e^{Aξ} B ) P_w ( C e^{A(ξ+τ)} B )^T dξ e^{−j2πfτ} dτ
       = ∫_0^∞ ( C e^{Aξ} B ) P_w ∫_{−∞}^{+∞} ( C e^{A(ξ+τ)} B )^T e^{−j2πfτ} dτ dξ    (6.60)
1
Friedland B., Control System Design: An Introduction to State-Space Methods, Dover
Books on Electrical Engineering (2012)
It is worth noticing that the time response of a causal system is zero ∀ t < 0. So we recognize in F(j2πf) the transfer function of the linear system when s = j2πf. Indeed for a linear and causal system we have:

F(j2πf) = ∫_{−∞}^{+∞} C e^{At} B e^{−j2πft} dt
        = ∫_0^{+∞} C e^{At} B e^{−j2πft} dt
        = ∫_0^{+∞} C e^{At} B e^{−st} dt|_{s=j2πf}
        = ∫_0^{+∞} C e^{−(sI−A)t} B dt|_{s=j2πf}
        = C (sI − A)^{-1} B|_{s=j2πf}
        = F(s)|_{s=j2πf}    (6.63)
− Random vectors w(t) and v(t) are zero mean Gaussian noises. Let p(w) and p(v) be the probability density functions (pdf) of the random processes w(t) and v(t). Then:

p(w) = 1/( (2π)^{n/2} √det(P_w) ) e^{−(1/2) w^T P_w^{-1} w}
p(v) = 1/( (2π)^{p/2} √det(P_v) ) e^{−(1/2) v^T P_v^{-1} v}    (6.67)

− Random vectors w(t) and v(t) are white noises and mutually uncorrelated. The covariance matrices of w(t) and v(t) will be denoted P_w and P_v respectively:

E[ w(t)v^T(t + τ) ] = 0
E[ v(t)w^T(t + τ) ] = 0    (6.69)
x̂̇(t) = A x̂(t) + Bu(t) + L(t)( y(t) − C x̂(t) )    (6.70)

where the time dependent observer gain L(t), also called the Kalman gain, is given by:

L(t) = Y(t) C^T P_v^{-1}    (6.71)

where matrix Y(t) is the solution of the following differential Riccati equation:

L = Y C^T P_v^{-1}
0 = AY + YA^T − Y C^T P_v^{-1} C Y + P_w    (6.73)
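A sketch of the steady-state computation, exploiting the fact that (6.73) is the LQR Riccati equation written for the pair (A^T, C^T) (the duality of Table 6.1 below):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def kalman_bucy_gain(A, C, Pw, Pv):
    """Solve A Y + Y A' - Y C' Pv^-1 C Y + Pw = 0 and return
    L = Y C' Pv^-1, the steady-state Kalman gain of (6.73)."""
    Y = solve_continuous_are(A.T, C.T, Pw, Pv)
    L = Y @ C.T @ np.linalg.inv(Pv)
    return L, Y
```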
For discrete time systems, the following discrete time algebraic Riccati equation has to be solved to get the suboptimal observer gain, as shown in section 3.8:

Y + AY C^T ( P_v + CYC^T )^{-1} CYA^T − AYA^T − P_w = 0    (6.74)
The Kalman gain shall be tuned when the covariance matrices P_w and P_v are not known:
− When the measurements y(t) are very noisy, the coefficients of covariance matrix P_v are high and the Kalman gain will be quite small;
− On the other hand, when we do not trust very much the linear time invariant model of the process, the coefficients of covariance matrix P_w are high and the Kalman gain will be quite high.
From a practical point of view matrices P_w and P_v are design parameters which are tuned to achieve the desired properties of the closed-loop.
Moreover, when the Riccati equation (6.72) related to the Kalman-Bucy filter is identified with the Riccati equation related to the Linear-Quadratic-Regulator (LQR) we get:

e(t) = x(t) − x̂(t)    (6.76)
ė(t) = ẋ(t) − x̂̇(t)
     = Ax(t) + Bu(t) + w(t) − ( A x̂(t) + Bu(t) + L(t)( y(t) − C x̂(t) ) )
     = Ae(t) + w(t) − L(t)( Cx(t) + v(t) − C x̂(t) )
     = ( A − L(t)C ) e(t) + w(t) − L(t)v(t)    (6.77)
Since v(t) and w(t) are zero mean white noises, their weighted sum n(t) = w(t) − L(t)v(t) is also a zero mean white noise. We get:

P_n = E[ n(t)n^T(t) ]
    = E[ ( w(t) − L(t)v(t) )( w(t) − L(t)v(t) )^T ]
    = P_w + L(t) P_v L^T(t)    (6.79)
Let's complete the square of −L(t)CY(t) − Y(t)C^T L^T(t) + L(t)P_v L^T(t). First we will focus on the scalar case where we try to minimize the following quadratic function f(L) where P_v > 0:

f(L) = P_v^{-1} ( LP_v − YC )^2 − Y^2 C^2 P_v^{-1}    (6.83)

Then it is clear that f(L) is minimal when LP_v − YC = 0 and that the minimal value of f(L) is −Y^2 C^2 P_v^{-1}. This approach can be extended to the matrix case.
When we complete the square of −L(t)CY(t) − Y(t)C^T L^T(t) + L(t)P_v L^T(t) we get:

In order to find the optimum observer gain L(t) which minimizes the covariance matrix Y(t), we choose L(t) such that Y(t) decreases by the maximum amount possible at each instant in time. This is accomplished by setting L(t) as follows:

Once L(t) is set such that L(t)P_v − Y(t)C^T = 0, the matrix differential equation (6.85) reads as follows:
J(u(t)) = 1/2 ∫_0^∞ ( x^T(t)Qx(t) + u^T(t)Ru(t) ) dt    (6.88)
0 = A^T P + PA − PBR^{-1}B^T P + Q    (6.90)

Then let's compare the preceding relations with the following relations which are actually those which have been seen in (6.73):

L = Y C^T P_v^{-1}
0 = YA^T + AY − Y C^T P_v^{-1} C Y + P_w    (6.92)
Then it is clear that the duality principle of Table 6.1 between observer and controller gains applies.

Controller        Observer
A                 A^T
B                 C^T
C                 B^T
K                 L^T
P = P^T ≥ 0       Y = Y^T ≥ 0
Q = Q^T ≥ 0       P_w = P_w^T ≥ 0
R = R^T > 0       P_v = P_v^T > 0
A − BK            A^T − C^T L^T

Table 6.1: Duality principle
e(t) = x(t) − x̂(t)    (6.93)

Using (6.109) we get the following expression for the dynamics of the state vector x(t):

ẋ(t) = Ax(t) + Bu(t) + w(t)
     = Ax(t) − BK x̂(t) + w(t)
     = Ax(t) − BK( x(t) − e(t) ) + w(t)
     = ( A − BK ) x(t) + BKe(t) + w(t)    (6.94)
In addition, using (6.108) and y(t) = Cx(t) + v(t), we get the following expression for the dynamics of the estimation error e(t):

J(u(t)) = 1/2 ∫_0^∞ ( x^T(t)Qx(t) + u^T(t)Ru(t) ) dt    (6.97)
0 = A^T P + PA − PBR^{-1}B^T P + Q    (6.99)

d/dt x̂(t) = A x̂(t) + Bu(t) + L( y(t) − C x̂(t) )    (6.101)

u(t) = −K x̂(t)    (6.102)

L = Y C^T P_v^{-1}    (6.103)
If the full state vector x(t) is assumed not to be available, the control u(t) = −Kx(t) cannot be computed. Then an observer has to be added. We recall the dynamics of the observer (see (6.10)):

x̂̇(t) = A x̂(t) + Bu(t) + L( y(t) − C x̂(t) )    (6.108)

u(t) = −K x̂(t)    (6.109)
Gathering (6.108) and (6.109) leads to the state space representation of the controller:

[ x̂̇(t) ; u(t) ] = [ A_K B_K ; C_K D_K ] [ x̂(t) ; y(t) ]    (6.110)

where:

[ A_K B_K ; C_K D_K ] = [ A − BK − LC  L ; −K  0 ]    (6.111)
The controller transfer function K(s) is the relation between the Laplace transform of its output, U(s), and the Laplace transform of its input, Y(s).

Figure 6.3: Block diagram of the controller in the time domain and the frequency domain

By taking the Laplace transform of equations (6.108) and (6.109) (and assuming no initial condition) we get:

s X̂(s) = A X̂(s) + B U(s) + L( Y(s) − C X̂(s) )
U(s) = −K X̂(s)    (6.112)

We finally get:

U(s) = −K(s) Y(s)    (6.113)
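A sketch building the controller realization (6.111) with scipy; the returned system maps y to u, i.e. it realizes −K(s):

```python
import numpy as np
from scipy import signal

def lqg_controller(A, B, C, K, L):
    """State-space controller (6.111): AK = A - BK - LC, BK = L, CK = -K."""
    AK = A - B @ K - L @ C
    D = np.zeros((K.shape[0], L.shape[1]))
    return signal.StateSpace(AK, L, -K, D)
```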
where w(t) and v(t) are Gaussian white noises with covariance matrices P_w and P_v, respectively:

P_w = σ [ 1 1 ; 1 1 ], σ > 0
P_v = 1    (6.117)

J(u(t)) = 1/2 ∫_0^∞ ( x^T(t)Qx(t) + u^T(t)Ru(t) ) dt    (6.118)

where:

Q = q [ 1 1 ; 1 1 ], q > 0
R = 1    (6.119)
0 = A^T P + PA − PBR^{-1}B^T P + Q    (6.122)

We get:

P = [ ∗ ∗ ; α α ]    (6.123)
And:

K = R^{-1}B^T P = α [ 1 1 ] where α = 2 + √(4 + q) > 0

u(t) = −K x̂(t)    (6.127)
The observer gain L = Y C^T P_v^{-1} is obtained thanks to the positive semi-definite solution Y of the following algebraic Riccati equation:

We get:

Y = [ β ∗ ; β ∗ ]    (6.129)

And:

L = Y C^T P_v^{-1} = β [ 1 ; 1 ] where β = 2 + √(4 + σ) > 0    (6.130)
Now assume that the input matrix of the plant is multiplied by a scalar gain ∆ (nominally unity):

ẋ(t) = Ax(t) + ∆Bu(t) + w(t)    (6.131)

p_0 < 0 ⇔ ∆ > 1 + 1/(αβ)    (6.136)

With large values of α and β, even a slight increase in the value of ∆ from its nominal value will render the closed-loop system unstable. Thus the gain margin of the LQG control-loop can be almost 0. This example clearly shows that the robustness of the LQG control-loop to modeling uncertainty is not guaranteed.
A^T P + PA − PBR^{-1}B^T P + Q = 0    (6.139)

( λI − A + BR^{-1}B^T P ) v = 0 ⇒ ( Q + ( λI + A^T ) P ) v = 0
⇔ Q v + ( λI + A^T ) P v = 0    (6.141)

[ λI − A  BR^{-1}B^T ; Q  λI + A^T ] [ v ; Pv ] = 0
⇔ ( λI − [ A  −BR^{-1}B^T ; −Q  −A^T ] ) [ v ; Pv ] = 0    (6.142)

The preceding relation indicates the equivalence between any closed-loop eigenvalue λ with corresponding eigenvector v of the LQ state-feedback and an eigenvalue of the Hamiltonian matrix H := [ A  −BR^{-1}B^T ; −Q  −A^T ] with corresponding eigenvector [ v ; Pv ], where matrix P is the positive semi-definite solution of the algebraic Riccati equation.
Q = C^T C = C^T W^T WC = (WC)^T (WC)
⇒ 0 = A^T P + PA − PB ( Mϵ^2 )^{-1} B^T P + (WC)^T (WC)
    = A^T P + PA − ( (M^{−0.5}/ϵ) B^T P )^T ( (M^{−0.5}/ϵ) B^T P ) + (WC)^T (WC)    (6.144)
K = R^{-1}B^T P = (1/ϵ) M^{−0.5} ( (M^{−0.5}/ϵ) B^T P ) ≈_{ϵ→0} (1/ϵ) M^{−0.5} WC = R^{−0.5} WC    (6.146)
− Input recovery: let ω_c be the cut-off frequency (i.e. 0 dB crossing) of the targeted dynamics. The objective is to tune ρ such that:
The objective of the input recovery design is shown in Figure 6.4. The corresponding objective in the state space domain is the following:

ẋ(t) = Ax(t) + Bu(t)
u(t) = −Kx(t) + r(t)    (6.149)
− Output recovery: let ω_c be the cut-off frequency (i.e. 0 dB crossing) of the targeted dynamics. The objective is to tune ρ such that:
The objective of the output recovery design is shown in Figure 6.5³. The corresponding objective in the state space domain is the following:

x̂̇(t) = A x̂(t) + L( r(t) − ŷ(t) )
ŷ(t) = C x̂(t)    (6.151)
We recall that the initial design matrices Q_0 and R_0 are set to meet control requirements whereas the initial design matrices P_w0 and P_v0 are set to meet observer requirements. Let ρ be a design parameter of either design matrix P_w or matrix Q. Weighting parameter ρ is tuned to make a trade-off between initial performances and stability margins, and is set according to the type of Loop Transfer Recovery:

− Input recovery: a new observer design is done with the following design matrices:

P_w = P_w0 + ρ^2 BB^T
P_v = P_v0    (6.152)
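A sketch of the input-recovery iteration, reusing kalman_bucy_gain from section 6.5; the plant matrices and the initial design matrices Pw0, Pv0 are illustrative:

```python
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Pw0, Pv0 = np.eye(2), np.eye(1)
for rho in (1.0, 10.0, 100.0):
    Pw = Pw0 + rho**2 * B @ B.T              # inflated process noise, (6.152)
    L, _ = kalman_bucy_gain(A, C, Pw, Pv0)   # from the sketch in section 6.5
    print(rho, L.ravel())                    # L/rho tends to B W0, cf. (6.186)
```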
3 Ronaldo Waschburger and Karl Heinz Kienitz, A root locus approach to loop transfer recovery based controller design, 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014
Let:

K = [ k_1 k_2 ]
L = [ l_1 ; l_2 ]    (6.155)

Then the controller transfer function is given by (6.114):

K(s) = K ( sI − A + BK + LC )^{-1} L
     = [ k_1 k_2 ] [ s + l_1  −1 ; k_1 + l_2  s + k_2 ]^{-1} [ l_1 ; l_2 ]
     = ( (k_1 l_1 + k_2 l_2) s + k_1 l_2 ) / ( s^2 + (k_2 + l_1) s + k_2 l_1 + k_1 + l_2 )    (6.156)
where:

Φ(s) = (sI − A)^{-1}    (6.165)

The preceding relations can be represented by the block diagram in Figure 6.6. Let K be a full state-feedback gain matrix such that the closed-loop system is asymptotically stable, i.e. the eigenvalues of A − BK lie in the left half s-plane, and such that the open-loop transfer function when the loop is broken at the input point of the given system meets some given frequency dependent specifications. The state feedback control u_f with full state available is:

u_o(t) = −K x̂(t)    (6.169)
or, equivalently:

X̂(s) = (sI − A)^{-1} ( B U(s) + L( Y(s) − C X̂(s) ) )
U_o(s) = −K X̂(s)    (6.171)
Usually equation (6.171) is not the same as (6.166). Relation (6.171) can be represented by the block diagram in Figure 6.8. The loop transfer function evaluated when the loop is broken at the input point of the closed-loop system becomes:

X̂(s) = ( Φ(s)^{-1} + LC )^{-1} ( B U(s) + L Y(s) )
Y(s) = CΦ(s)B U(s)
⇒ X̂(s) = ( Φ(s)^{-1} + LC )^{-1} ( B U(s) + LCΦ(s)B U(s) )
       = ( Φ(s)^{-1} + LC )^{-1} ( B + LCΦ(s)B ) U(s)
       = ( Φ(s)^{-1} + LC )^{-1} B U(s) + ( Φ(s)^{-1} + LC )^{-1} LCΦ(s)B U(s)    (6.172)
Thus:

( I + CΦ(s)L )^{-1} CΦ(s)L = I − ( I + CΦ(s)L )^{-1}    (6.179)

Applying this result to equation (6.177) leads to:

( Φ(s)^{-1} + LC )^{-1} LCΦ(s)B = Φ(s)L ( I + CΦ(s)L )^{-1} CΦ(s)B    (6.180)

X̂(s) = Φ(s)B U(s)    (6.183)
5
D. J. Tylavsky, G. R. L. Sohie, Generalization of the matrix inversion lemma, Proceedings
of the IEEE, Year: 1986, Volume: 74, Issue: 7, Pages: 1050 - 1052
We finally get:

U_o(s) = −K X̂(s) = −KΦ(s)B U(s)    (6.184)

That is, we get for U_o(s) the same expression as the one obtained through the full state-feedback given in (6.166).
As a conclusion, the Loop Transfer Recovery (LTR) is achieved when the loop transfer functions with state-feedback and with state-based observer are equal, that is when L_a(s) = L_t(s). This property is achieved as soon as U_o(s) has the same expression as the full state-feedback U_f(s), that is when the following relation holds:

lim_{ρ→∞} L/ρ = B W_0    (6.186)
L ( I + CΦ(s)L )^{-1} = (L/ρ) ( (1/ρ) I + CΦ(s) (L/ρ) )^{-1}    (6.187)

Thus, as ρ → ∞:

lim_{ρ→∞} L ( I + CΦ(s)L )^{-1} = lim_{ρ→∞} (L/ρ) ( (1/ρ) I + CΦ(s) (L/ρ) )^{-1}
                               = lim_{ρ→∞} (L/ρ) ( CΦ(s) (L/ρ) )^{-1}
                               = B W_0 ( CΦ(s)B W_0 )^{-1}
                               = B ( CΦ(s)B )^{-1}    (6.188)
Now let's examine how (6.186) can be achieved. First we have seen in (6.143) that if the transfer function CΦ(s)B is right invertible with no unstable zeros, then for some unitary matrix W (W^T W = I) and some symmetric positive definite matrix M (M = M^T > 0), the asymptotic value of the feedback gain K reads as follows, where ϵ has been replaced by 1/ρ:

R = M/ρ^2
Q = C^T C   ⇒ lim_{ρ→∞} K = ρ M^{−0.5} WC = R^{−0.5} WC
K = R^{-1}B^T P    (6.189)
Applying the duality principle, we have the same result for the asymptotic value of the observer gain L:

P_v = M/ρ^2
P_w = BB^T   ⇒ lim_{ρ→∞} L = ρ BW M^{−0.5} = BW P_v^{−0.5}
L = Y C^T P_v^{-1}    (6.190)
where d(t) and n(t) are white noises with the intensities of their autocorrelation functions equal to W_d and W_n respectively. Denoting by E() the mathematical expectation, we have:

E( [ d(t) ; n(t) ] [ d^T(τ) n^T(τ) ] ) = [ W_d 0 ; 0 W_n ] δ(t − τ)    (6.196)
The LQG problem consists in finding a controller u(s) = K(s)y(s) such that the following performance index is minimized:

J_LQG = E( lim_{T→∞} ∫_0^T ( x^T(t)Qx(t) + u^T(t)Ru(t) ) dt )    (6.197)
And represent the stochastic inputs d(t) and n(t) as functions of the vector w(t) of exogenous disturbances:

[ d(t) ; n(t) ] = [ W_d^{0.5} 0 ; 0 W_n^{0.5} ] w(t)    (6.200)

where w(t) is a white noise process of unit intensity. Then the LQG cost function reads as follows:

J_LQG = E( lim_{T→∞} ∫_0^T z^T(t)z(t) dt ) = ∥T_zw(s)∥_2^2    (6.201)
where:

B_1 = [ W_d^{0.5} 0 ]
C_1 = [ Q^{0.5} ; 0 ]
D_12 = [ 0 ; R^{0.5} ]
D_21 = [ 0 W_n^{0.5} ]    (6.203)

It follows that:

And:
of interest. An example of two sensors that complement each other is a baro-altimeter and a vertical accelerometer.
Let y_1(t) and y_2(t) be noisy measurements of some signal y(t), coming for example from a baro-altimeter and a vertical accelerometer, respectively. Denoting by v(t) some low frequency zero mean noise process, by w(t) some high frequency zero mean noise process and by s the Laplace variable, we will assume that:

y_1(t) = y(t) + v(t) ⇔ Y_1(s) = Y(s) + V(s)
y_2(t) = ÿ(t) + w(t) ⇔ Y_2(s) = s^2 Y(s) + W(s)    (6.206)
The complementary filter that implements the fusion between the two measurements is shown in Figure 6.10⁶.
From Figure 6.10, the expression of Ŷ(s) reads:

Ŷ(s) = (1/s) ( k_1 ( Y_1(s) − Ŷ(s) ) + (1/s) ( Y_2(s) + k_0 ( Y_1(s) − Ŷ(s) ) ) )
⇔ ( 1 + k_1/s + k_0/s^2 ) Ŷ(s) = ( k_1/s + k_0/s^2 ) Y_1(s) + (1/s^2) Y_2(s)
⇔ Ŷ(s) = ( (k_1 s + k_0) / (s^2 + k_1 s + k_0) ) Y_1(s) + ( 1 / (s^2 + k_1 s + k_0) ) Y_2(s)    (6.207)

Using the fact that Y_1(s) = Y(s) + V(s) and Y_2(s) = s^2 Y(s) + W(s), we finally get:

Ŷ(s) = Y(s) + ( (k_1 s + k_0) / (s^2 + k_1 s + k_0) ) V(s) + ( 1 / (s^2 + k_1 s + k_0) ) W(s)
     := Y(s) + F(s) V(s) + ( (1 − F(s))/s^2 ) W(s)    (6.208)

where:

F(s) := (k_1 s + k_0) / (s^2 + k_1 s + k_0) ⇔ 1 − F(s) := s^2 / (s^2 + k_1 s + k_0)    (6.209)
Transfer function F(s) is a low-pass filter with unity static gain whereas 1 − F(s) is a high-pass filter.
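A simulation sketch of this second-order fusion; the gains k_0, k_1 and the signals are illustrative:

```python
import numpy as np
from scipy import signal

k1, k0 = 2.0, 1.0
den = [1.0, k1, k0]                                  # s^2 + k1 s + k0
F_alt = signal.TransferFunction([k1, k0], den)       # filters y1, eq. (6.207)
F_acc = signal.TransferFunction([1.0], den)          # filters y2

t = np.linspace(0.0, 20.0, 2001)
y = np.sin(0.5 * t)                                  # true signal
y1 = y + 0.3 * np.sin(0.05 * t)                      # altimeter: low-freq drift
y2 = -0.25 * np.sin(0.5 * t) + 0.5 * np.random.randn(t.size)  # y'' + noise
_, yh1, _ = signal.lsim(F_alt, y1, t)
_, yh2, _ = signal.lsim(F_acc, y2, t)
y_hat = yh1 + yh2                                    # fused estimate of y
```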
6 W. T. Higgins, A Comparison of Complementary and Kalman Filtering, IEEE Transactions on Aerospace and Electronic Systems, vol. AES-11, no. 3, pp. 321-325, May 1975, doi: 10.1109/TAES.1975.308081
Take care that in the measurement equation, it is the actual state vector x(t) which is used, not the noisy state vector x̃(t) of the process equation.
Furthermore, denoting by w(t) the random process which represents the process noise, and y_2(t) := u(t) + w(t), we finally get:

y_2(t) := u(t) + w(t) ⇒ { x̃̇(t) = A x̃(t) + B( u(t) + w(t) )
                          y_1(t) = C x(t) + v(t) }    (6.212)

Assuming no noise, v(t) = w(t) = 0, then x̃(t) changes into its noiseless value x(t) and the actual measurement y_1(t) changes into its noiseless value y(t). Then we get the following noiseless state equation:

v(t) = w(t) = 0 ⇒ { ẋ(t) = A x(t) + B u(t)
                    y(t) = C x(t) }    (6.213)
The error equations read as follows, where δx(t) is the error state vector and δy(t) the error output vector. Note that we define δy(t) as δy(t) := y_1(t) − C x̃(t) to be compliant with Figure 6.11:

δx(t) := x̃(t) − x(t)
δy(t) := y_1(t) − C x̃(t)
⇒ { δẋ(t) = A δx(t) + B w(t)
    δy(t) = −C δx(t) + v(t) }    (6.214)
δx̂̇(t) = A δx̂(t) + L(t)( δy(t) − (−C) δx̂(t) )
      = A δx̂(t) + L(t)( δy(t) + C δx̂(t) )    (6.215)

The time dependent observer gain L(t), also called the Kalman gain, is similar to (6.71):

L(t) = Y(t) (−C)^T P_v^{-1} = −Y(t) C^T P_v^{-1}    (6.216)

where matrix Y(t) is the solution of the following differential Riccati equation (see (6.72) where P_w has been replaced by B P_w B^T, that is the covariance of the noise B w(t)):
δx̂(t) = x̃(t) − x̂(t) ⇔ x̂(t) = x̃(t) − δx̂(t)    (6.219)

By taking the time derivative of the preceding equation, and using (6.211) and (6.214), we get the state equation for the estimate x̂(t) of the actual state vector:

x̂̇(t) = x̃̇(t) − δx̂̇(t)
     = A x̃(t) + B y_2(t) − ( A δx̂(t) + L(t)( δy(t) + C δx̂(t) ) )
     = A ( x̃(t) − δx̂(t) ) + B y_2(t) + L(t)( y_1(t) − C x̃(t) + C ( x̃(t) − x̂(t) ) )    (6.220)

We finally get:

x̂̇(t) = A x̂(t) + B y_2(t) + L(t)( y_1(t) − C x̂(t) )    (6.221)
C X̂(s) = Ŷ(s) = C ( sI − (A − LC) )^{-1} ( L Y_1(s) + B Y_2(s) )    (6.223)
Example 6.3. In the specific case of sensor fusion between a baro-altimeter and a vertical accelerometer presented in (6.206), the state vector can be chosen as follows, assuming no noise:

x_1(t) := y(t)
x_2(t) := ẏ(t)    (6.225)
Let L be the steady state observer gain, also called the steady state Kalman gain, which is obtained as follows:

L = −Y C^T P_v^{-1}    (6.227)

where matrix Y is the constant positive solution of the following algebraic Riccati equation:

0 = AY + YA^T − Y C^T P_v^{-1} C Y + B P_w B^T    (6.228)
Then, according to (6.224), the transfer function F(s) of the low-pass filter reads as follows:

L := [ k_1 ; k_0 ] ⇒ F(s) = C ( sI − (A − LC) )^{-1} L
                         = [ 1 0 ] [ s + k_1  −1 ; k_0  s ]^{-1} [ k_1 ; k_0 ]
                         = (k_1 s + k_0) / (s^2 + k_1 s + k_0)    (6.229)
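A sketch computing (k_1, k_0) from (6.228) for illustrative noise intensities:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])      # x = [y, ydot], double integrator
B = np.array([[0.0], [1.0]])                # accelerometer drives ydot
C = np.array([[1.0, 0.0]])                  # altimeter measures y
Pw = np.array([[1.0]])                      # illustrative intensities
Pv = np.array([[0.01]])
Y = solve_continuous_are(A.T, C.T, B @ Pw @ B.T, Pv)
k1, k0 = (Y @ C.T @ np.linalg.inv(Pv)).ravel()
print(k1, k0)   # F(s) = (k1 s + k0)/(s^2 + k1 s + k0), cf. (6.229)
```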
Accelerometer measurement
First, we compute the position, velocity and acceleration in the inertial frame:
− Inertial position:
x = L sin(θ)
z = −L cos(θ)    (6.230)
− Inertial velocity:
ẋ = L θ̇ cos(θ)
ż = L θ̇ sin(θ)    (6.231)
− Inertial acceleration:

where:

R_bi = [ cos(θ) sin(θ) ; −sin(θ) cos(θ) ]    (6.234)

Thus:

[ a_x ; a_z ] = [ L θ̈ + g sin(θ) ; L θ̇^2 + g cos(θ) ]    (6.235)

a_x / a_z = ( L θ̈ + g sin(θ) ) / ( L θ̇^2 + g cos(θ) ) ≈ g sin(θ) / ( g cos(θ) ) = tan(θ)    (6.236)
For the example of section 6.12.1, we have y1 (t) := θa (t) and y2 (t) := q(t).
The complementary filter that implements the fusion between the two measurements is shown in Figure 6.14⁷.
From Figure 6.14, the expression of Ŷ(s) reads:

Ŷ(s) = (1/s) ( Y_2(s) + k_0 ( Y_1(s) − Ŷ(s) ) )
⇔ ( 1 + k_0/s ) Ŷ(s) = (k_0/s) Y_1(s) + (1/s) Y_2(s)
⇔ Ŷ(s) = ( k_0/(s + k_0) ) Y_1(s) + ( 1/(s + k_0) ) Y_2(s)    (6.239)

Using the fact that Y_1(s) = Y(s) + V(s) and Y_2(s) = s Y(s) + W(s), we finally get:

Ŷ(s) = Y(s) + ( k_0/(s + k_0) ) V(s) + ( 1/(s + k_0) ) W(s)
     := Y(s) + F(s) V(s) + ( (1 − F(s))/s ) W(s)    (6.240)

where:

F(s) := k_0/(s + k_0) ⇔ 1 − F(s) := s/(s + k_0)    (6.241)

Transfer function F(s) is a low-pass filter with unity static gain whereas 1 − F(s) is a high-pass filter.
In the continuous time domain, (6.239) reads:

Ŷ(s) = ( k_0/(s + k_0) ) Y_1(s) + ( 1/(s + k_0) ) Y_2(s)
⇔ k_0 Ŷ(s) + s Ŷ(s) = k_0 Y_1(s) + Y_2(s)
⇒ k_0 ŷ(t) + d/dt ŷ(t) = k_0 y_1(t) + y_2(t)    (6.242)
7 W. T. Higgins, A Comparison of Complementary and Kalman Filtering, IEEE Transactions on Aerospace and Electronic Systems, vol. AES-11, no. 3, pp. 321-325, May 1975, doi: 10.1109/TAES.1975.308081
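A discrete-time sketch of (6.242) using a forward Euler step (the gain k_0 and the step dt are left as tuning parameters):

```python
import numpy as np

def complementary_filter(y1, y2, k0, dt):
    """Forward Euler discretization of (6.242):
    yhat' = k0*(y1 - yhat) + y2, with y1 the angle measurement
    (accelerometers/magnetometer) and y2 the angular rate (gyroscope)."""
    yhat = np.empty_like(y1, dtype=float)
    yhat[0] = y1[0]
    for k in range(1, len(y1)):
        yhat[k] = yhat[k - 1] + dt * (k0 * (y1[k - 1] - yhat[k - 1]) + y2[k - 1])
    return yhat
```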
The relation between the angular velocities (p, q, r) in the body frame and the time derivatives of the Euler angles (ϕ, θ, ψ) is the following:

ν := [ p ; q ; r ] = [ ϕ̇ ; 0 ; 0 ] + R_ϕ [ 0 ; θ̇ ; 0 ] + R_ϕ R_θ [ 0 ; 0 ; ψ̇ ]    (6.251)

We finally get:

[ p ; q ; r ] = [ 1 0 −sin(θ) ; 0 cos(ϕ) sin(ϕ)cos(θ) ; 0 −sin(ϕ) cos(ϕ)cos(θ) ] [ ϕ̇ ; θ̇ ; ψ̇ ]    (6.252)

That is:

ν = W(η) η̇    (6.253)

where:

η := [ ϕ ; θ ; ψ ]    (6.254)

and:

W(η) = [ 1 0 −sin(θ) ; 0 cos(ϕ) sin(ϕ)cos(θ) ; 0 −sin(ϕ) cos(ϕ)cos(θ) ]    (6.255)
It is worth noticing that the preceding relation can be obtained from the following equality, which simply states that the time derivative of matrix R_ib(η) can be expressed through the skew-symmetric matrix Ω(ν) of the angular velocities in the body frame:

d/dt R_ib(η) = R_ib(η) Ω(ν) where Ω(ν) = −Ω(ν)^T = [ 0 −r q ; r 0 −p ; −q p 0 ]    (6.256)
Conversely we have:

η̇ = W(η)^{-1} ν    (6.257)

where:

W(η)^{-1} = [ 1 sin(ϕ)tan(θ) cos(ϕ)tan(θ) ; 0 cos(ϕ) −sin(ϕ) ; 0 sin(ϕ)/cos(θ) cos(ϕ)/cos(θ) ]    (6.258)
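A sketch of (6.257)-(6.258), valid away from θ = ±π/2 where W(η) is singular:

```python
import numpy as np

def euler_rates(phi, theta, pqr):
    """Euler angle rates from the body angular velocities: W(eta)^-1 nu."""
    sphi, cphi = np.sin(phi), np.cos(phi)
    tth, cth = np.tan(theta), np.cos(theta)
    Winv = np.array([[1.0, sphi * tth, cphi * tth],
                     [0.0, cphi, -sphi],
                     [0.0, sphi / cth, cphi / cth]])
    return Winv @ np.asarray(pqr, dtype=float)
```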
a_m = R_bi a_i − g    (6.259)

Denoting by v_i the velocity in the inertial frame and by v_b the velocity in the body frame, we have the following relation, where R_ib is the rotation matrix from the body frame to the inertial frame:

v_i = R_ib v_b    (6.260)

a_i := d/dt v_i = R_ib v̇_b + Ṙ_ib v_b    (6.261)
Once the computation is achieved, we get the following expression for the measurements provided by a 3-axis accelerometer8:

a_m = [ a_x ; a_y ; a_z ] = [ u̇ ; v̇ ; ẇ ] + [ 0 w −v ; −w 0 u ; v −u 0 ] [ p ; q ; r ] − g [ −sin(θ) ; cos(θ)sin(ϕ) ; cos(θ)cos(ϕ) ]    (6.263)

The last term of equation (6.263) can be used to approximate the roll angle ϕ and the pitch angle θ as follows:

ϕ ≈ arctan( a_y / a_z )
θ ≈ arctan( a_x / √(a_y^2 + a_z^2) )    (6.264)
Note that if the Inertial Measurement Unit (IMU) is not situated at the center of mass, then the accelerometer coordinates (l_x, l_y, l_z) along each axis in the body frame with its origin at the center of gravity shall be taken into account, and the measurements (6.263) provided by a 3-axis accelerometer become8:

a_m = [ a_x ; a_y ; a_z ] = [ u̇ ; v̇ ; ẇ ] + [ 0 w −v ; −w 0 u ; v −u 0 ] [ p ; q ; r ] − g [ −sin(θ) ; cos(θ)sin(ϕ) ; cos(θ)cos(ϕ) ]
+ [ −r^2 − q^2  pq − ṙ  pr + q̇ ; pq + ṙ  −p^2 − r^2  rq − ṗ ; pr − q̇  rq + ṗ  −q^2 − p^2 ] [ l_x ; l_y ; l_z ]    (6.265)
8 Marian J. Blachuta, Rafal T. Grygiel, Roman Czyba and Grzegorz Szafranski, Attitude and heading reference system based on 3D complementary filter, 2014 19th International Conference on Methods and Models in Automation and Robotics (MMAR)
= [ m_iz cos(θ)sin(ϕ) − m_ix ( cos(ϕ)sin(ψ) − cos(ψ)sin(ϕ)sin(θ) ) ;
    m_ix ( sin(ϕ)sin(ψ) + cos(ϕ)cos(ψ)sin(θ) ) + m_iz cos(ϕ)cos(θ) ]    (6.266)
Then it is worth noticing that the following relations hold:

d/dt [ ϕ ; θ ; ψ ] ≈ W(η̂)^{-1} ν = [ 1 sin(ϕ̂)tan(θ̂) cos(ϕ̂)tan(θ̂) ; 0 cos(ϕ̂) −sin(ϕ̂) ; 0 sin(ϕ̂)/cos(θ̂) cos(ϕ̂)/cos(θ̂) ] [ p ; q ; r ]    (6.269)
estimate, or (6.247) for discrete time estimate. In those equations, y1 stands for
the measurements provided by the accelerometers or the magnetometer, and y2
stands for the measurements provided by the gyroscopes.