Optimal Control and The Linear Quadratic Regulator
H.P. Gavin
Duke University, Fall 2017

1 Derivation of The Euler-Lagrange Equations
The cost function to be minimized is

J = ∫_{t_o}^{t_f} L(x, u; t) dt + φ(x(t_f), t_f) ,

where the first term is called the integral cost and the second term is called the terminal
cost. The scalar-valued function L(x, u; t) is called the Lagrangian of the cost function.
The cost function J is to be minimized subject to the constraint that the dynamics of
the system are enforced, i.e., such that
ẋ = f(x, u; t) . (1)
This is done by augmenting the cost function with the constraint through a Lagrange multiplier.
The augmented cost function is
J_A = J + ∫_{t_o}^{t_f} λ^T(t) { f(x, u; t) − ẋ(t) } dt ,
where λ(t) ∈ Rn is the vector of Lagrange multipliers, and is also called the adjoint vector or
the co-state vector.
Now, it is helpful to define a new term called the Hamiltonian,

H(x, u, λ; t) = L(x, u; t) + λ^T(t) f(x, u; t) .

Integrating the λ^T ẋ term by parts, ∫_{t_o}^{t_f} λ^T(t) ẋ(t) dt = λ^T(t_f) x(t_f) − λ^T(t_o) x(t_o) − ∫_{t_o}^{t_f} λ̇^T(t) x(t) dt, so

J_A = φ(x(t_f), t_f) + λ^T(t_o) x(t_o) − λ^T(t_f) x(t_f) + ∫_{t_o}^{t_f} { H(x, u, λ; t) + λ̇^T(t) x(t) } dt .
Now, we will minimize J by taking the first variation of J with respect to u(t), setting this
variation equal to zero, and solving for u(t). We presume that the states, x, depend upon
the controls, u(t), in a causal manner: a variation δx is a function of a variation δu, and u(t) on
t_1 ≤ t ≤ t_f cannot affect x(t) on t_o ≤ t ≤ t_1. Therefore
δJ = [∂J/∂u] δu + [∂J/∂x] δx(δu) .
Assuming that tf is a constant, and that x(to ) is also a constant,
δJ = [∂φ(x(t_f))/∂u] δu + [∂φ/∂x] δx(δu)
   + [∂(λ^T(t_o) x(t_o))/∂u] δu + [∂(λ^T(t_o) x(t_o))/∂x] δx(δu)
   − [∂(λ^T(t_f) x(t_f))/∂u] δu − [∂(λ^T(t_f) x(t_f))/∂x] δx(δu)
   + ∫_{t_o}^{t_f} { [∂H/∂u] δu + [∂H/∂x] δx(δu) } dt
   + ∫_{t_o}^{t_f} { [∂(λ̇^T(t) x(t))/∂u] δu + [∂(λ̇^T(t) x(t))/∂x] δx(δu) } dt (2)
This expression for δJ is a summation of ten terms, which we will treat individually.
δJ = A + B + C + D + E + F + ∫_{t_o}^{t_f} {G + H} dt + ∫_{t_o}^{t_f} {I + J} dt
Each of these three remaining terms must be zero at J = Jmin . Setting each of these three
terms equal to zero,
λ(t_f) = [ ∂φ(x(t_f), t_f) / ∂x ]^T (4)

λ̇(t) = − [ ∂H/∂x ]^T
      = − [ ∂f/∂x ]^T λ(t) − [ ∂L/∂x ]^T (5)

0 = ∂H/∂u
  = λ^T(t) [ ∂f/∂u ] + [ ∂L/∂u ] (6)
Equation (1), equation (6), and equation (5) with the terminal condition of equation (4) are
called the Euler-Lagrange equations, and provide necessary (but not sufficient) conditions
for the optimality of u(t). They are a two-point, vector-valued boundary value problem. The
equations
ẋ = f(x, u; t) ;  x(t_o) = x_o
are called the state equations and the equations
λ̇(t) = − [ ∂f/∂x ]^T λ(t) − [ ∂L/∂x ]^T ;  λ(t_f) = [ ∂φ(x(t_f), t_f) / ∂x ]^T
are called the co-state equations. Note that the state equations have an initial condition
prescribed whereas the co-state equations have a terminal condition.
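The two-point boundary value problem can be solved numerically, for instance by collocation. The sketch below uses a scalar example chosen purely for illustration (not from the text): ẋ = −x + u, L = ½(x² + u²), φ = 0, so that ∂H/∂u = 0 gives u∗ = −λ, and the Euler-Lagrange equations become ẋ = −x − λ with x(0) = 1, and λ̇ = λ − x with λ(t_f) = 0.

```python
import numpy as np
from scipy.integrate import solve_bvp

def odes(t, y):
    # y[0] = x (state), y[1] = lam (co-state); u* = -lam from dH/du = 0
    x, lam = y
    return np.vstack([-x - lam,    # x'   = f(x, u*) = -x + u* = -x - lam
                      lam - x])    # lam' = -dH/dx   = lam - x

def bc(ya, yb):
    # initial condition on the state, terminal condition on the co-state
    return np.array([ya[0] - 1.0,  # x(0) = 1
                     yb[1]])       # lam(tf) = dphi/dx = 0

t = np.linspace(0.0, 5.0, 50)
y_guess = np.zeros((2, t.size))
sol = solve_bvp(odes, bc, t, y_guess)
u_star = -sol.sol(t)[1]            # recovered optimal control u*(t) = -lam(t)
```

Because this example is linear, the collocation solver converges from a zero initial guess; for a long horizon, λ(0) should be close to (√2 − 1)·x(0), the steady-state Riccati solution of this scalar problem.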
2 Meaning of the co-state equations
The co-state λ(t) “adjoins” the state equation constraint ẋ = f (x, u; t) to the cost
function, J. It gives the sensitivity of J to the dynamic constraints ẋ = f (x, u; t). In other
words,
∂J/∂x = −λ^T(t) .
Also, note that on the optimal control trajectory, u(t) = u∗(t),

dH/dt = d/dt { L + λ^T f }
      = ∂L/∂t + λ^T [∂f/∂t] + [dλ/dt]^T f
        + ( ∂L/∂x + λ^T [∂f/∂x] ) dx/dt
        + ( ∂L/∂u + λ^T [∂f/∂u] ) du/dt
      = ∂L/∂t + λ^T [∂f/∂t] + λ̇^T f − λ̇^T f + [∂H/∂u] du/dt
      = ∂L/∂t + λ^T [∂f/∂t] , (7)

since the co-state equation (5) gives ∂L/∂x + λ^T [∂f/∂x] = −λ̇^T, the state equation gives dx/dt = f, and ∂H/∂u = 0 on the optimal trajectory.
Since L = L(x(t), u(t)), the Lagrangian depends upon time only implicitly, through the
state and the control, and does not explicitly depend upon time. For autonomous systems,
f = f(x(t), u(t)) also does not depend explicitly upon time. So, for autonomous systems, on the
optimal control trajectory u(t) = u∗(t),
dH/dt = 0 .
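This constancy can be checked numerically. The sketch below uses a hypothetical second-order LTI plant with quadratic cost (all matrices invented for illustration), anticipating the LQR result derived later in these notes, λ(t) = P x(t) with u∗ = −R2⁻¹BᵀPx: evaluated along the optimal closed-loop trajectory, the Hamiltonian stays constant (in fact identically zero for the infinite-horizon problem).

```python
import numpy as np
from scipy.linalg import solve_continuous_are, expm

# Hypothetical LTI system and quadratic weights, chosen for illustration only.
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
R1 = np.eye(2)            # state weight
R2 = np.array([[1.0]])    # control weight

P = solve_continuous_are(A, B, R1, R2)     # steady-state Riccati solution
K = -np.linalg.solve(R2, B.T @ P)          # optimal feedback: u* = K x
Acl = A + B @ K                            # closed-loop dynamics

def hamiltonian(x):
    # H = L + lam^T f, with lam = P x and u = K x on the optimal trajectory
    u = K @ x
    lam = P @ x
    L = 0.5 * x @ R1 @ x + 0.5 * u @ R2 @ u
    return L + lam @ (A @ x + B @ u)

x0 = np.array([1.0, -1.0])
# Sample the optimal trajectory x(t) = expm(Acl * t) x0 and evaluate H along it.
H_vals = [hamiltonian(expm(Acl * t) @ x0) for t in np.linspace(0.0, 5.0, 11)]
```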
The Euler-Lagrange equations provide a necessary (but not sufficient) condition for
optimality. These equations are necessary because at a minimum δJ must be equal to zero.
However, if δJ = 0, the cost function J may be at a maximum, at an inflection point, or at a
minimum. The sufficient condition for optimality is that
∂²H/∂u² = H_uu > 0 .
This is called Pontryagin’s Minimum Principle. If ∂H/∂u does not depend directly upon u,
Pontryagin’s Minimum Principle must be invoked to find u∗(t).
3 The Linear Quadratic Regulator from the Euler-Lagrange Equations
Consider the linear time invariant (LTI) control system

ẋ(t) = A x(t) + B u(t) ;  x(t_o) = x_o ,

with the quadratic Lagrangian L = (1/2) x^T R1 x + (1/2) u^T R2 u, so that the Hamiltonian is

H = (1/2) x^T R1 x + (1/2) u^T R2 u + λ^T (A x + B u) ,

and the stationarity condition ∂H/∂u = 0 gives u^T R2 + λ^T B = 0. Solving this last equation for u(t) gives an expression for the optimal control rule:

u∗(t) = −R2^{−1} B^T λ(t) .

The only thing left to determine is the solution of the co-state equations, λ(t).
To find the co-states we need to solve the co-state equation

λ̇(t) = −A^T λ(t) − R1 x(t)

for λ(t). As with any differential equation, we may guess a trial solution and determine if it
satisfies the co-state equation and the terminal condition. Here we will guess

λ(t) = P x(t) ,

where P is a constant n × n matrix.
Substituting the trial solution and the control rule into the co-state equation,

P ẋ(t) = −A^T P x(t) − R1 x(t)
P ( A − B R2^{−1} B^T P ) x(t) = −( A^T P + R1 ) x(t) ,

which must hold for all x(t), so

0 = A^T P + P A − P B R2^{−1} B^T P + R1 .

This is an algebraic Riccati equation for the constant matrix P. The control rule is then
u∗(t) = −R2^{−1} B^T P x(t), or, u∗(t) = Kx(t), where the feedback gain matrix K is −R2^{−1} B^T P . This feedback gain matrix
minimizes the quadratic cost function
J = ∫_0^∞ { (1/2) x^T R1 x + (1/2) u^T R2 u } dt ,
of the linear time-invariant dynamic system ẋ = A x + B u.
This control rule is called the Linear Quadratic Regulator (LQR). The Riccati Equation and
the Linear Quadratic Regulator are cornerstones of multivariable control.
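A minimal numerical sketch of this result, assuming a hypothetical double-integrator plant (the matrices below are illustrative, not from the text): the Riccati solution P gives the gain K = −R2⁻¹BᵀP, and the closed-loop system A + BK is stable.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical double-integrator plant and weights, for illustration only.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
R1 = np.diag([1.0, 0.1])   # state weight
R2 = np.array([[0.5]])     # control weight

# Solve the algebraic Riccati equation 0 = A^T P + P A - P B R2^{-1} B^T P + R1
P = solve_continuous_are(A, B, R1, R2)
K = -np.linalg.solve(R2, B.T @ P)   # LQR gain: u*(t) = K x(t) = -R2^{-1} B^T P x(t)

# The closed-loop system A + B K should be asymptotically stable.
closed_loop_eigs = np.linalg.eigvals(A + B @ K)
```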
4 Development of the LQR Controller from the H2 norm
Consider now a dynamic system with external disturbance w,

ẋ = A x + B u + D1 w ,

under full state feedback

u = K x .

The closed-loop dynamics are then

ẋ = (A + B K) x + D1 w = Ã x + D1 w ;  Ã = A + B K .
The matrix à describes the dynamics of the closed-loop system. Now consider the perfor-
mance variable, z(t),
z = E1 x + E2 u = (E1 + E2 K)x = Ẽx
In the Laplace domain, the transfer function from the external disturbance, w(s) to the
performance, z(s) is described by z(s) = G̃(s)w(s), where
G̃(s) ∼ [ Ã  D1 ]
       [ Ẽ  0  ] .
We aim to find the matrix K ∈ R^{m×n} to minimize the H2 norm of the transfer function,
||G̃(s)||₂ (proportional to the area under the squared magnitude of G̃(jω)).
The matrix Q is called the “disturbability” gramian; it satisfies the right Lyapunov equation

0 = Ã Q + Q Ã^T + D1 D1^T .

The matrix P is called the “performance-ability” gramian; it satisfies the left Lyapunov equation

0 = Ã^T P + P Ã + Ẽ^T Ẽ . (15)
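Both gramians are computable with standard Lyapunov solvers, and either one yields the H2 norm via the standard trace identities ||G̃||₂² = tr(Ẽ Q Ẽᵀ) = tr(D1ᵀ P D1). A sketch with hypothetical closed-loop matrices (illustrative values only):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical stable closed-loop data, for illustration only.
Acl = np.array([[0.0, 1.0], [-2.0, -3.0]])   # A-tilde = A + B K (stable)
D1  = np.array([[0.0], [1.0]])               # disturbance input matrix
Et  = np.array([[1.0, 0.0], [0.0, 0.2]])     # E-tilde = E1 + E2 K

# Right Lyapunov equation:  0 = A Q + Q A^T + D1 D1^T  ("disturbability" gramian)
Q = solve_continuous_lyapunov(Acl, -D1 @ D1.T)
# Left Lyapunov equation:   0 = A^T P + P A + E^T E    ("performance-ability" gramian)
P = solve_continuous_lyapunov(Acl.T, -Et.T @ Et)

# The two gramians give the same H2 norm of G(s):
h2_from_Q = np.sqrt(np.trace(Et @ Q @ Et.T))
h2_from_P = np.sqrt(np.trace(D1.T @ P @ D1))
```

Note the sign convention: `solve_continuous_lyapunov(a, q)` solves a X + X aᴴ = q, so the constant term is passed with a minus sign.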
A connection between the cost function in this formulation and that of the previous
formulation may be made as follows. Consider the performance equation
z = E1 x + E2 u = [ E1  E2 ] [ x ; u ]
such that the matrices Q and P satisfy the Lyapunov equations that guarantee closed-loop
stability. This is a constrained minimization problem. As before, we can adjoin the constraint
to the cost function through a Lagrange multiplier. We will now show that the proper choice
of the Lagrange multiplier for this optimization problem is the “disturbability” gramian. The
augmented cost function is
and
Ẽ^T Ẽ = (E1 + E2 K)^T (E1 + E2 K)
      = R1 + R12 K + K^T R12^T + K^T R2 K ,

where R1 = E1^T E1 , R12 = E1^T E2 , and R2 = E2^T E2 . Substituting the gain K = −R2^{−1} (B^T P + R12^T) of equation (18),

Ẽ^T Ẽ = R1 − R12 R2^{−1} (B^T P + R12^T) − (B^T P + R12^T)^T R2^{−1} R12^T + (B^T P + R12^T)^T R2^{−1} (B^T P + R12^T)
      = R1 − R12 R2^{−1} B^T P − R12 R2^{−1} R12^T − P B R2^{−1} R12^T − R12 R2^{−1} R12^T
        + P B R2^{−1} B^T P + P B R2^{−1} R12^T + R12 R2^{−1} B^T P + R12 R2^{−1} R12^T
      = R1 + P B R2^{−1} B^T P − R12 R2^{−1} R12^T .
Defining

Â = A − B R2^{−1} R12^T

and

Σ = B R2^{−1} B^T ,

and substituting all of this into the left Lyapunov equation for the “performance-ability”
gramian, equation (15), we obtain

0 = Â^T P + P Â − P Σ P + R1 − R12 R2^{−1} R12^T , (19)

which is a matrix quadratic equation in P and is called an algebraic Riccati equation. The
solution of this equation for the “performance-ability” gramian, P , depends on the definition
of “performance” (E1 and E2), how the controls affect the state dynamics (B), and the open-loop
system dynamics matrix (A).
The state-feedback gain matrix K of equation (18), using the “performance-ability”
gramian P computed from the Riccati equation (19), minimizes the objective metric (16)
such that the closed-loop system is stable. This state-feedback gain matrix is called the
linear quadratic regulator (LQR).
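As a numerical check of this development, the sketch below builds hypothetical weighting matrices from illustrative E1 and E2 (invented for this example), solves the Riccati equation (19) with SciPy, which accepts the cross term R12 directly, and verifies both the Riccati residual and closed-loop stability. The gain formula K = −R2⁻¹(BᵀP + R12ᵀ) is equation (18).

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical plant and performance matrices, for illustration only.
A  = np.array([[0.0, 1.0], [-1.0, -0.2]])
B  = np.array([[0.0], [1.0]])
E1 = np.array([[1.0, 0.0], [0.0, 1.0]])
E2 = np.array([[0.2], [0.5]])

# Weights as in the text: R1 = E1^T E1, R12 = E1^T E2, R2 = E2^T E2
R1, R12, R2 = E1.T @ E1, E1.T @ E2, E2.T @ E2

# SciPy's ARE solver takes the cross-weighting term via its `s` argument.
P = solve_continuous_are(A, B, R1, R2, s=R12)
K = -np.linalg.solve(R2, B.T @ P + R12.T)   # equation (18)

# Residual of equation (19): 0 = Ahat^T P + P Ahat - P Sigma P + R1 - R12 R2^{-1} R12^T
Ahat  = A - B @ np.linalg.solve(R2, R12.T)
Sigma = B @ np.linalg.solve(R2, B.T)
resid = Ahat.T @ P + P @ Ahat - P @ Sigma @ P + R1 - R12 @ np.linalg.solve(R2, R12.T)
```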