
Optimal Control and the Linear Quadratic Regulator

H.P. Gavin
Duke University, Fall 2017

1 Derivation of the Euler-Lagrange equations


Consider a general, possibly nonlinear, possibly non-autonomous dynamic control system

ẋ = f(x, u; t) ;   x(to) = xo ;   x ∈ Rⁿ, u ∈ Rᵐ   (1)
where x(t) is a state vector and u(t) is a control vector. Consider a scalar-valued cost function,
J, to be minimized by the control actions, u(t).
J = ∫_{to}^{tf} L(x, u; t) dt + φ(x(tf), tf)

where the first term is called the integral cost and the second term is called the terminal
cost. The scalar-valued function L(x, u; t) is called the Lagrangian of the cost function.
The cost function J is to be minimized subject to the constraint that the dynamics of
the system are enforced, i.e., such that

ẋ = f (x, u; t).

This is done by augmenting the cost function with the constraint through a Lagrange multiplier.
The augmented cost function is
JA = J + ∫_{to}^{tf} λᵀ(t) {f(x, u; t) − ẋ(t)} dt ,

where λ(t) ∈ Rⁿ is the vector of Lagrange multipliers, and is also called the adjoint vector or
the co-state vector.
Now, it is helpful to define a new term called the Hamiltonian,

H(x, u, λ; t) = L(x, u; t) + λᵀ(t) f(x, u; t) ,

so that JA may be written


JA = φ(x(tf), tf) + ∫_{to}^{tf} { H(x, u, λ; t) − λᵀ(t)ẋ(t) } dt
   = φ(x(tf), tf) + ∫_{to}^{tf} H(x, u, λ; t) dt − ∫_{to}^{tf} λᵀ(t)ẋ(t) dt

The third term of the RHS may be integrated by parts,


∫_{to}^{tf} λᵀ(t)ẋ(t) dt = [λᵀ(t)x(t)]_{to}^{tf} − ∫_{to}^{tf} λ̇ᵀ(t)x(t) dt,

so,
JA = φ(x(tf), tf) + [λᵀ(to)x(to) − λᵀ(tf)x(tf)] + ∫_{to}^{tf} { H(x, u, λ; t) + λ̇ᵀ(t)x(t) } dt

Now, we will minimize J by taking the first variation of J with respect to u(t), setting this
variation equal to zero, and solving for u(t). We presume that the states, x, depend upon
the controls, u(t), in a causal manner: a variation δx is a function of the variation δu, and
u(t) on t1 ≤ t ≤ tf can not affect x(t) on to ≤ t ≤ t1. Therefore

δJ = [∂J/∂u] δu + [∂J/∂x] δx(δu)
Assuming that tf is a constant, and that x(to ) is also a constant,

δJ = [∂φ/∂u] δu + [∂φ/∂x] δx(δu)
   + ∂/∂u [λᵀ(to)x(to)] δu + ∂/∂x [λᵀ(to)x(to)] δx(δu)
   − ∂/∂u [λᵀ(tf)x(tf)] δu − ∂/∂x [λᵀ(tf)x(tf)] δx(δu)
   + ∫_{to}^{tf} { [∂H/∂u] δu + [∂H/∂x] δx(δu) } dt
   + ∫_{to}^{tf} { ∂/∂u [λ̇ᵀ(t)x(t)] δu + ∂/∂x [λ̇ᵀ(t)x(t)] δx(δu) } dt   (2)

This expression for δJ is a summation of ten terms, which we will treat individually.
δJ = A + B + C + D + E + F + ∫ {G + H} dt + ∫ {I + J} dt

Term A: φ(x(tf ), tf ) does not depend on u, therefore A = 0.


Terms C and D: u(t) defined in the interval to ≤ t ≤ tf can not change x(to ) because
the controls u(t) have a causal relationship with the states x(t). Therefore C = 0 and D = 0.
Term E: u(t) is not part of this term, therefore E = 0.
Term F: ∂/∂x [λᵀ(tf)x(tf)] δx(δu) = λᵀ(tf) δx(δu)
Term I: u(t) is not part of this term, therefore I = 0.
Term J: ∂/∂x [λ̇ᵀ(t)x(t)] δx(δu) = λ̇ᵀ(t) δx(δu)
Re-writing equation (2) with the remaining terms (B, F, G, H, and J),

δJ = [ (∂φ/∂x − λᵀ) δx(δu) ]_{t=tf} + ∫_{to}^{tf} { [∂H/∂u] δu + [∂H/∂x + λ̇ᵀ] δx(δu) } dt.   (3)

Each of these three remaining terms must be zero at J = Jmin . Setting each of these three
terms equal to zero,
λ(tf) = [∂φ(x(tf), tf)/∂x]ᵀ   (4)

λ̇(t) = −[∂H/∂x]ᵀ
      = −[∂f/∂x]ᵀ λ(t) − [∂L/∂x]ᵀ   (5)

∂H/∂u = 0
      = λᵀ(t) [∂f/∂u] + [∂L/∂u]   (6)

Equation (1), equation (6), and equation (5) with the terminal condition of equation (4) are
called the Euler-Lagrange equations, and provide necessary (but not sufficient) conditions
for the optimality of u(t). Together they form a two-point, vector-valued boundary value problem.
The equations

ẋ = f(x, u; t) ;   x(to) = xo

are called the state equations and the equations

λ̇(t) = −[∂f/∂x]ᵀ λ(t) − [∂L/∂x]ᵀ ;   λ(tf) = [∂φ(x(tf), tf)/∂x]ᵀ

are called the co-state equations. Note that the state equations have an initial condition
prescribed whereas the co-state equations have a terminal condition.

2 Meaning of the co-state equations
The co-state λ(t) “adjoins” the state equation constraint ẋ = f (x, u; t) to the cost
function, J. It gives the sensitivity of J to the dynamic constraints ẋ = f (x, u; t). In other
words,
∂J/∂x = −λᵀ(t).
Also, note that at the optimal control trajectory, u(t) = u∗ (t),

dH/dt = d/dt { L + λᵀ f }
      = ∂L/∂t + λᵀ (∂f/∂t) + (dλ/dt)ᵀ f
        + (∂L/∂x + λᵀ ∂f/∂x) (dx/dt)
        + (∂L/∂u + λᵀ ∂f/∂u) (du/dt)
      = ∂L/∂t + λᵀ (∂f/∂t) + λ̇ᵀf − λ̇ᵀf + [∂H/∂u] (du/dt)
      = ∂L/∂t + λᵀ (∂f/∂t) ,   (7)

where the λ̇ᵀf − λ̇ᵀf cancellation arises because (∂L/∂x + λᵀ ∂f/∂x)(dx/dt) = [∂H/∂x] f = −λ̇ᵀf
by the co-state equation (5), and [∂H/∂u](du/dt) = 0 by equation (6).
Since L = L(x(t), u(t)), the Lagrangian depends upon time only implicitly through the
state and the control, but does not explicitly depend upon time. For autonomous systems
f = f (x(t), u(t)) does not depend explicitly upon time. So, (for autonomous systems) on the
optimal control trajectory u(t) = u∗ (t),

dH/dt = 0

The Euler-Lagrange equations provide a necessary (but not sufficient) condition for
optimality. These equations are necessary because at a minimum δJ must be equal to zero.
However, if δJ = 0, the cost function J may be at a maximum, a minimum, or an inflection
point. The sufficient condition for a minimum is that

∂²H/∂u² = Huu > 0.
This second-order condition is closely related to Pontryagin’s Minimum Principle, which
states that the optimal control minimizes the Hamiltonian. If ∂H/∂u does not depend
directly upon u, Pontryagin’s Minimum Principle must be invoked to find u∗(t).
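As a concrete check of the second-order condition: for the quadratic Lagrangian used in Section 3 below, H = ½xᵀR1x + xᵀR12u + ½uᵀR2u + λᵀ(Ax + Bu), so Huu = ∂²H/∂u² = R2, and the sufficient condition reduces to the requirement R2 > 0 on the control cost weighting matrix.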

3 The Linear Quadratic Regulator from the Euler-Lagrange Equations
Consider the linear time-invariant (LTI) control system

ẋ = f (x, u) = Ax + Bu ; x(0) = x0 , (8)

and the quadratic integral cost function


J = ∫_0^∞ [ ½ xᵀR1x + xᵀR12u + ½ uᵀR2u ] dt ,   (9)

for which the Lagrangian of the cost function is

L(x, u; t) = ½ xᵀR1x + xᵀR12u + ½ uᵀR2u ,   (10)
and where the weighting matrices have the following definitions and properties:
R1 is the state cost weighting matrix, R1 > 0, R1 = R1ᵀ;
R2 is the control cost weighting matrix, R2 > 0, R2 = R2ᵀ; and
R12 is the cross-weighting matrix.
Applying the Euler-Lagrange equations to this linear control synthesis problem, the
co-state equations become

λ̇ = −[∂f/∂x]ᵀ λ − [∂L/∂x]ᵀ ;   λ(∞) = [∂φ/∂x]ᵀ
  = −Aᵀλ(t) − R1x − R12u ;    λ(∞) = 0

and the gradient of the Hamiltonian becomes

[∂H/∂u]ᵀ = 0 = [∂f/∂u]ᵀ λ(t) + [∂L/∂u]ᵀ
             = Bᵀλ(t) + R12ᵀx + R2u

Solving this last equation for u(t) gives an expression for the optimal control rule:

u∗(t) = −R2⁻¹ (Bᵀλ(t) + R12ᵀ x(t))   (11)

The only thing left to determine is the solution of the co-state equations, λ(t).
To find the co-states, we need to solve the co-state equation

λ̇ = −Aᵀλ − R1x − R12u ;   λ(∞) = 0

for λ(t). As with any differential equation, we may guess a trial solution and determine if it
satisfies the co-state equation and the terminal condition. Here we will guess

λ(t) = P (t) x(t).

Substituting the trial solution and the control rule into the co-state equation,

λ̇ = −AᵀPx − R1x − R12 (−R2⁻¹(BᵀPx + R12ᵀx))
λ̇ = Pẋ + Ṗx
  = PAx + PBu + Ṗx
  = PAx + PB (−R2⁻¹(BᵀPx + R12ᵀx)) + Ṗx
  = PAx − PBR2⁻¹BᵀPx − PBR2⁻¹R12ᵀx + Ṗx
Ṗx = −AᵀPx − PAx − R1x
     + PBR2⁻¹BᵀPx + PBR2⁻¹R12ᵀx + R12R2⁻¹BᵀPx + R12R2⁻¹R12ᵀx ,

or, eliminating x from the right-hand side of each term,

−Ṗ = ÂᵀP + PÂ + R1 − PBR2⁻¹BᵀP − R12R2⁻¹R12ᵀ

where Â = A − BR2⁻¹R12ᵀ. For a steady-state solution, Ṗ = 0, and, if also R12 = 0, we obtain
the Riccati equation,

0 = AᵀP + PA + R1 − PBR2⁻¹BᵀP.   (12)
The solution of the Riccati equation gives the matrix P and

u∗(t) = −R2⁻¹BᵀP x(t) ,   (13)

or, u∗(t) = Kx(t), where the feedback gain matrix is K = −R2⁻¹BᵀP. This feedback gain matrix
minimizes the quadratic cost function
J = ∫_0^∞ [ ½ xᵀR1x + ½ uᵀR2u ] dt ,
of the linear time-invariant dynamic system

ẋ(t) = A x(t) + B u(t) ; x(0) = x0 .

This control rule is called the Linear Quadratic Regulator (LQR). The Riccati Equation and
the Linear Quadratic Regulator are cornerstones of multivariable control.
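In practice, the Riccati equation (12) is solved numerically. A minimal sketch (assuming SciPy's solve_continuous_are; the two-state system and weights below are illustrative placeholders, not from these notes):

# Sketch: solve the algebraic Riccati equation (12) for P and form the
# LQR gain K = -R2^{-1} B' P of equation (13), so that u*(t) = K x(t).
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-2.0, -0.5]])    # open-loop dynamics (assumed)
B = np.array([[0.0], [1.0]])
R1 = np.eye(2)                              # state cost weighting
R2 = np.array([[0.1]])                      # control cost weighting

P = solve_continuous_are(A, B, R1, R2)      # 0 = A'P + PA + R1 - P B R2^{-1} B' P
K = -np.linalg.solve(R2, B.T @ P)           # feedback gain, u* = K x

# The closed-loop dynamics matrix A + BK should be asymptotically stable:
assert np.all(np.linalg.eigvals(A + B @ K).real < 0)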

4 Development of the LQR Controller from the H2 norm
Consider now a dynamic system with external disturbance w,

ẋ = Ax + Bu + D1 w ,

in which all the states are measured,


y=x,
and which is controlled by a static compensator,

u = Kx.

Substituting the compensator into the dynamics,

ẋ = (A + BK)x + D1 w
= Ãx + D1 w ; Ã = A + BK.

The matrix à describes the dynamics of the closed-loop system. Now consider the perfor-
mance variable, z(t),
z = E1 x + E2 u = (E1 + E2 K)x = Ẽx
In the Laplace domain, the transfer function from the external disturbance, w(s), to the
performance, z(s), is described by z(s) = G̃(s)w(s), where G̃(s) has the state-space realization

G̃(s) ∼ [ Ã  D1 ; Ẽ  0 ] .

We aim to find the matrix K ∈ Rᵐˣⁿ to minimize the H2 norm of the transfer function,
||G̃(s)||₂, i.e., the area under the squared magnitude of its frequency response:

||G̃(s)||₂² = tr ẼQẼᵀ = tr D1ᵀPD1

The matrix Q is called the “disturbability” gramian; it satisfies the right Lyapunov equation

0 = ÃQ + QÃᵀ + D1D1ᵀ   (14)

The matrix P is called the “performance-ability” gramian; it satisfies the left Lyapunov
equation
0 = ÃᵀP + PÃ + ẼᵀẼ.   (15)
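As a numerical sketch of these two computations (assuming SciPy's solve_continuous_lyapunov; the closed-loop matrices below are illustrative placeholders, not from these notes), the two traces of ||G̃(s)||₂² agree:

# Sketch: evaluate ||G(s)||_2^2 from the Lyapunov equations (14) and (15).
# A_cl, D1, E_cl stand in for the closed-loop A~, D1, and E~ (assumed values).
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A_cl = np.array([[0.0, 1.0], [-2.0, -1.0]])    # an assumed stable A~
D1 = np.array([[0.0], [1.0]])
E_cl = np.array([[1.0, 0.0]])                  # an assumed E~

Q = solve_continuous_lyapunov(A_cl, -D1 @ D1.T)        # (14): A~Q + QA~' = -D1 D1'
P = solve_continuous_lyapunov(A_cl.T, -E_cl.T @ E_cl)  # (15): A~'P + PA~ = -E~'E~

h2_sq_from_Q = np.trace(E_cl @ Q @ E_cl.T)    # tr(E~ Q E~')
h2_sq_from_P = np.trace(D1.T @ P @ D1)        # tr(D1' P D1); equals the above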

Asymptotic stability of the closed-loop system ẋ = Ax + Bu, u = Kx, is determined by
the properties of the dynamics matrix of the closed-loop system, Ã = A + BK. Specifically,
if there exists a positive definite matrix P which satisfies the left Lyapunov equation (15),
then the autonomous dynamic system ẋ = Ãx is asymptotically stable.
Furthermore, if (A, B) is controllable, then a feedback matrix K may be chosen to
arbitrarily place the eigenvalues of A + BK.

A connection between the cost function in this formulation and that of the previous
formulation may be made as follows. Consider the performance equation

z = E1x + E2u = [E1 E2] [x ; u]

(writing [x ; u] for the stacked state-and-control vector) and the cost function

J = ∫_0^∞ zᵀ(t) z(t) dt
  = ∫_0^∞ ([E1 E2][x ; u])ᵀ ([E1 E2][x ; u]) dt
  = ∫_0^∞ [x ; u]ᵀ [E1 E2]ᵀ [E1 E2] [x ; u] dt
  = ∫_0^∞ [x ; u]ᵀ [ E1ᵀE1  E1ᵀE2 ; E2ᵀE1  E2ᵀE2 ] [x ; u] dt
  = ∫_0^∞ [x ; u]ᵀ [ R1  R12 ; R12ᵀ  R2 ] [x ; u] dt
  = ∫_0^∞ [ xᵀR1x + 2xᵀR12u + uᵀR2u ] dt
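A quick numerical check of this identification (a sketch; the dimensions and random draws are illustrative only):

# Sketch: verify z'z = x'R1 x + 2 x'R12 u + u'R2 u with R1 = E1'E1,
# R12 = E1'E2, and R2 = E2'E2, at a random x and u.
import numpy as np

rng = np.random.default_rng(0)
E1 = rng.standard_normal((3, 2))     # assumed performance weights
E2 = rng.standard_normal((3, 1))
R1, R12, R2 = E1.T @ E1, E1.T @ E2, E2.T @ E2

x = rng.standard_normal(2)
u = rng.standard_normal(1)
z = E1 @ x + E2 @ u
assert np.isclose(z @ z, x @ R1 @ x + 2 * x @ R12 @ u + u @ R2 @ u)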

The LQR problem statement may now be formally written as follows:


Find K to minimize the scalar performance metric

J(K) = ||G̃(s)||₂² = tr D1ᵀPD1   (16)

such that the matrices Q and P satisfy the Lyapunov equations that guarantee closed-loop
stability. This is a constrained minimization problem. As before, we can adjoin the constraint
to the cost function through a Lagrange multiplier. We will now show that the proper choice
of the Lagrange multiplier for this optimization problem is the “disturbability” gramian. The
augmented cost function is

JA(K, Q) = tr D1ᵀPD1 + tr Q[ÃᵀP + PÃ + ẼᵀẼ]
         = tr D1ᵀPD1 + tr QÃᵀP + tr QPÃ + tr QẼᵀẼ
         = tr D1ᵀPD1 + tr QÃᵀP + tr ÃQP + tr QẼᵀẼ   (17)

where Q is selected to be the Lagrange multiplier. If K∗ solves the constrained optimization
problem, then there exists a Lagrange multiplier Q∗ such that

∂/∂K JA(K, Q∗)|_{K=K∗} = 0

and

∂/∂Q JA(K∗, Q)|_{Q=Q∗} = ÃᵀP + PÃ + ẼᵀẼ = 0.

Also, note that

∂/∂P JA = D1D1ᵀ + QÃᵀ + ÃQ = 0 ,
which shows that Q, the closed-loop “disturbability” gramian, is the proper Lagrange
multiplier.
Now, to evaluate the partial derivative with respect to K, first substitute Ã = A + BK,
Ẽ = E1 + E2K, E1ᵀE1 = R1, E1ᵀE2 = R12, and E2ᵀE2 = R2 into JA(K, Q), equation (17):

JA = tr D1ᵀPD1 + tr Q[(Aᵀ + KᵀBᵀ)P + P(A + BK) + R1 + R12K + KᵀR12ᵀ + KᵀR2K]

Distribute Q,

JA = tr D1ᵀPD1 + tr [Q(Aᵀ + KᵀBᵀ)P + QP(A + BK) + Q(R1 + R12K + KᵀR12ᵀ + KᵀR2K)]

Re-arrange the matrices according to trace rules (tr M = tr Mᵀ and tr MN = tr NM),

JA = tr D1ᵀPD1 + tr [(A + BK)QP + (A + BK)QP + Q(R1 + R12K + R12K + KᵀR2K)]

Collect powers of K,

JA = tr(D1ᵀPD1 + QR1 + 2AQP) + tr(2QR12K + 2BKQP) + tr(QKᵀR2K) .

Re-arrange the fifth and sixth terms according to trace rules,

JA = tr(D1ᵀPD1 + QR1 + 2AQP) + 2 tr(QR12K + QPBK) + tr(KQKᵀR2) .

Finally, apply matrix calculus rules and solve for the optimal feedback gain matrix K:

∂/∂K JA(K, Q) = 0
2(R12ᵀQᵀ + BᵀPᵀQᵀ) + R2ᵀKQᵀ + R2KQ = 0
2(R12ᵀQ + BᵀPQ) + R2KQ + R2KQ = 0   (using P = Pᵀ, Q = Qᵀ, R2 = R2ᵀ)
KQ = −R2⁻¹(BᵀP + R12ᵀ)Q
K = −R2⁻¹(BᵀP + R12ᵀ)   (18)
Recall that the “performance-ability” gramian, P, satisfies the left Lyapunov equation

ÃᵀP + PÃ + ẼᵀẼ = 0.

But Ã = A + BK, and K = −R2⁻¹(BᵀP + R12ᵀ), so substituting,

Ã = A + BK
  = A − BR2⁻¹BᵀP − BR2⁻¹R12ᵀ

and

ẼᵀẼ = (E1 + E2K)ᵀ(E1 + E2K)
    = R1 + R12K + KᵀR12ᵀ + KᵀR2K
    = R1 − R12R2⁻¹(BᵀP + R12ᵀ) − (PB + R12)R2⁻¹R12ᵀ + (PB + R12)R2⁻¹(BᵀP + R12ᵀ)
    = R1 − R12R2⁻¹BᵀP − R12R2⁻¹R12ᵀ − PBR2⁻¹R12ᵀ − R12R2⁻¹R12ᵀ
      + PBR2⁻¹BᵀP + PBR2⁻¹R12ᵀ + R12R2⁻¹BᵀP + R12R2⁻¹R12ᵀ
    = R1 + PBR2⁻¹BᵀP − R12R2⁻¹R12ᵀ
Defining

Â = A − BR2⁻¹R12ᵀ

and

Σ = BR2⁻¹Bᵀ ,

and substituting all of this into the left Lyapunov equation for the “performance-ability”
gramian, equation (15), we obtain

0 = (Â − ΣP)ᵀP + P(Â − ΣP) + R1 − R12R2⁻¹R12ᵀ + PΣP
  = ÂᵀP − PΣP + PÂ − PΣP + R1 − R12R2⁻¹R12ᵀ + PΣP
  = ÂᵀP + PÂ − PΣP + R1 − R12R2⁻¹R12ᵀ ,   (19)

which is a matrix quadratic equation in P and is called an algebraic Riccati equation. The
solution of this equation for the “performance-ability” gramian, P, depends on the definition
of “performance” (E1 and E2), on how the controls affect the state dynamics (B), and on the
open-loop system dynamics matrix (A).

The state-feedback gain matrix K of equation (18), using the “performance-ability”
gramian P computed from the Riccati equation (19), minimizes the objective metric (16)
such that the closed-loop system is stable. This state-feedback gain matrix is called the
linear quadratic regulator (LQR).
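To tie Sections 3 and 4 together numerically, the sketch below (assuming SciPy; all matrices are illustrative placeholders, not from these notes) solves the Riccati equation (19) including the cross-weighting term, forms K from equation (18), and checks that tr(D1ᵀPD1) matches the closed-loop H2 norm computed from the “disturbability” gramian:

# Sketch: cross-weighted LQR, equations (18) and (19), with an H2-norm
# cross-check through the gramians of Section 4 (assumed example matrices).
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
D1 = np.array([[0.0], [1.0]])
E1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
E2 = np.array([[0.2], [0.0], [1.0]])           # gives a nonzero cross term E1'E2
R1, R12, R2 = E1.T @ E1, E1.T @ E2, E2.T @ E2

# SciPy's ARE solver accepts the cross-weighting term directly (argument s)
P = solve_continuous_are(A, B, R1, R2, s=R12)
K = -np.linalg.solve(R2, B.T @ P + R12.T)      # equation (18)

A_cl, E_cl = A + B @ K, E1 + E2 @ K
Q = solve_continuous_lyapunov(A_cl, -D1 @ D1.T)   # disturbability gramian (14)
assert np.isclose(np.trace(D1.T @ P @ D1), np.trace(E_cl @ Q @ E_cl.T))

Within numerical tolerance the two traces agree, confirming that the gain of equation (18) attains the H2-norm objective (16).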

