
ME 547: Linear Systems

Linear Quadratic Optimal Control

Xu Chen

University of Washington

UW Linear Systems (X. Chen, ME547) LQ 1 / 32


Motivation

state feedback control:

▶ allows arbitrary assignment of the closed-loop eigenvalues for a controllable system
▶ the eigenvalue assignment has been manual thus far
▶ performance is implicit: we assign eigenvalues to induce proper error convergence

linear quadratic (LQ) optimal regulation control, a.k.a. the LQ regulator (LQR):

▶ no need to specify closed-loop poles
▶ performance is explicit: a performance index is defined ahead of time



1. Problem formulation

2. Solution to the finite-horizon LQ problem

3. From finite-horizon LQ to stationary LQ



Goal
Consider an n-dimensional state-space system

ẋ(t) = Ax(t) + Bu(t),  x(t₀) = x₀
y(t) = Cx(t)                                          (1)

where x ∈ ℝⁿ, u ∈ ℝʳ, and y ∈ ℝᵐ.
LQ optimal control aims at minimizing the performance index

J = ½ xᵀ(t_f) S x(t_f) + ½ ∫_{t₀}^{t_f} [xᵀ(t) Q x(t) + uᵀ(t) R u(t)] dt

▶ S ⪰ 0, Q ⪰ 0, R ≻ 0: for a nonnegative cost and a well-posed problem
▶ ½ xᵀ(t_f) S x(t_f) penalizes the deviation of x from the origin at t_f
▶ xᵀ(t) Q x(t), t ∈ (t₀, t_f), penalizes the transient
▶ often Q = CᵀC ⇒ xᵀ(t) Q x(t) = yᵀ(t) y(t)
▶ uᵀ(t) R u(t) penalizes large control effort
Observations
J = ½ xᵀ(t_f) S x(t_f) + ½ ∫_{t₀}^{t_f} [xᵀ(t) Q x(t) + uᵀ(t) R u(t)] dt

▶ when the control horizon is made infinitely long, i.e., t_f → ∞, the problem reduces to the infinite-horizon LQ problem

J = ½ ∫_{t₀}^{∞} [xᵀ(t) Q x(t) + uᵀ(t) R u(t)] dt

▶ the terminal cost is not needed: as it will turn out, the control must drive x to the origin, for otherwise J goes unbounded
▶ often, we take t₀ = 0, so that

J = ½ ∫_{0}^{∞} [xᵀ(t) Q x(t) + uᵀ(t) R u(t)] dt



1. Problem formulation

2. Solution to the finite-horizon LQ problem

3. From finite-horizon LQ to stationary LQ



Solution to the finite-horizon LQ
Consider the performance index

J = ½ xᵀ(t_f) S x(t_f) + ½ ∫_{t₀}^{t_f} [xᵀ(t) Q x(t) + uᵀ(t) R u(t)] dt

with ẋ = Ax + Bu, x(t₀) = x₀, S ⪰ 0, R ≻ 0, and Q = CᵀC.

▶ do a Lyapunov-like construction: V(t) ≜ ½ xᵀ(t) P(t) x(t)
▶ then

d/dt V(t) = ½ ẋᵀ(t) P(t) x(t) + ½ xᵀ(t) Ṗ(t) x(t) + ½ xᵀ(t) P(t) ẋ(t)
          = ½ (Ax + Bu)ᵀ P x + ½ xᵀ (dP/dt) x + ½ xᵀ P (Ax + Bu)
          = ½ xᵀ(t) [AᵀP + dP/dt + PA] x(t) + ½ (uᵀBᵀPx + xᵀPBu)


Solution to the finite-horizon LQ
with (d/dt)V(t) from the last slide, we have

V(t_f) − V(t₀) = ∫_{t₀}^{t_f} V̇ dt
              = ½ ∫_{t₀}^{t_f} [ xᵀ(AᵀP + PA + dP/dt)x + uᵀBᵀPx + xᵀPBu ] dt

▶ adding

J = ½ xᵀ(t_f) S x(t_f) + ½ ∫_{t₀}^{t_f} [xᵀ(t) Q x(t) + uᵀ(t) R u(t)] dt

yields

J + V(t_f) − V(t₀) = ½ xᵀ(t_f) S x(t_f)
  + ½ ∫_{t₀}^{t_f} [ xᵀ(AᵀP + PA + Q + dP/dt)x + (uᵀBᵀPx + xᵀPBu) + uᵀRu ] dt

where (uᵀBᵀPx + xᵀPBu) contains products of x and u, and uᵀRu is quadratic in u


Solution to the finite-horizon LQ
▶ “complete the squares” in (uᵀBᵀPx + xᵀPBu) + uᵀRu. In the scalar case:

uᵀBᵀPx + xᵀPBu + uᵀRu = Ru² + 2xPBu
  = (R^{1/2}u + R^{−1/2}BPx)² − (R^{−1/2}BPx)²

▶ extending the concept to the general vector case:

uᵀBᵀPx + xᵀPBu + uᵀRu = ‖R^{1/2}u + R^{−1/2}BᵀPx‖₂² − xᵀPBR⁻¹BᵀPx

(recall ‖a‖₂² = aᵀa)


Solution to the finite-horizon LQ

J + V(t_f) − V(t₀) = ½ xᵀ(t_f) S x(t_f)
  + ½ ∫_{t₀}^{t_f} [ xᵀ(AᵀP + PA + Q + dP/dt)x + uᵀBᵀPx + xᵀPBu + uᵀRu ] dt

where the last three terms equal ‖R^{1/2}u + R^{−1/2}BᵀPx‖₂² − xᵀPBR⁻¹BᵀPx

⇓ “completing the squares”

J + ½ xᵀ(t_f) P(t_f) x(t_f) − ½ xᵀ(t₀) P(t₀) x(t₀) = ½ xᵀ(t_f) S x(t_f)
  + ½ ∫_{t₀}^{t_f} [ xᵀ(dP/dt + AᵀP + PA + Q − PBR⁻¹BᵀP)x + ‖R^{1/2}u + R^{−1/2}BᵀPx‖₂² ] dt



Solution to the finite-horizon LQ
J + V(t_f) − V(t₀) = ½ xᵀ(t_f) S x(t_f)
  + ½ ∫_{t₀}^{t_f} [ xᵀ(AᵀP + PA + Q + dP/dt)x + uᵀBᵀPx + xᵀPBu + uᵀRu ] dt

⇓ “completing the squares”

J + ½ xᵀ(t_f) P(t_f) x(t_f) − ½ xᵀ(t₀) P(t₀) x(t₀) = ½ xᵀ(t_f) S x(t_f)
  + ½ ∫_{t₀}^{t_f} [ xᵀ(dP/dt + AᵀP + PA + Q − PBR⁻¹BᵀP)x + ‖R^{1/2}u + R^{−1/2}BᵀPx‖₂² ] dt

▶ the best the control can do in minimizing the cost is to have

u(t) = −K(t)x(t) = −R⁻¹BᵀP(t)x(t)
−dP/dt = AᵀP + PA − PBR⁻¹BᵀP + Q,  P(t_f) = S

to yield the optimal cost J⁰ = ½ x₀ᵀ P(t₀) x₀
Observation 1

u(t) = −K(t)x(t) = −R⁻¹BᵀP(t)x(t)   (optimal control law)
−dP/dt = AᵀP + PA − PBR⁻¹BᵀP + Q,  P(t_f) = S   (the Riccati differential equation)

▶ the control u(t) = −R⁻¹BᵀP(t)x(t) is a state feedback law (the power of state feedback!)
▶ the state feedback law is time-varying because of P(t)
▶ the closed-loop dynamics becomes

ẋ(t) = Ax(t) + Bu(t) = [A − BR⁻¹BᵀP(t)] x(t)   (time-varying closed-loop dynamics)


Observation 2
u(t) = −K(t)x(t) = −R⁻¹BᵀP(t)x(t)   (optimal state feedback control)
−dP/dt = AᵀP + PA − PBR⁻¹BᵀP + Q,  P(t_f) = S   (the Riccati differential equation)

▶ the boundary condition of the Riccati equation is given at the final time t_f ⇒ the equation must be integrated backward in time
▶ backward integration of

−dP/dt = AᵀP + PA + Q − PBR⁻¹BᵀP,  P(t_f) = S

is equivalent to forward integration of

dP*/dt = AᵀP* + P*A + Q − P*BR⁻¹BᵀP*,  P*(0) = S   (2)

by letting P(t) = P*(t_f − t)
▶ Eq. (2) can be solved by numerical integration, e.g., ODE45 in MATLAB
Observation 3

J = ½ xᵀ(t_f) S x(t_f) + ½ ∫_{t₀}^{t_f} [xᵀ(t) Q x(t) + uᵀ(t) R u(t)] dt

J⁰ = ½ x₀ᵀ P(t₀) x₀

▶ the minimum value J⁰ is a function of the initial state x(t₀)
▶ J (and hence J⁰) is nonnegative ⇒ P(t₀) is at least positive semidefinite
▶ t₀ can be taken anywhere in (0, t_f) ⇒ P(t) is at least positive semidefinite for any t



Example: LQR of a pure inertia system
Consider

ẋ = [0 1; 0 0] x + [0; 1] u,  J = ½ xᵀ(t_f) S x(t_f) + ½ ∫₀^{t_f} (xᵀQx + Ru²) dt

where S = [1 0; 0 1], Q = [1 0; 0 0], R > 0
▶ we let P(t) = P*(t_f − t) and solve

dP*/dt = AᵀP* + P*A + Q − P*BR⁻¹BᵀP*,  P*(0) = [1 0; 0 1]

⇔ dP*/dt = [0 0; 1 0]P* + P*[0 1; 0 0] + [1 0; 0 0] − P*[0; 1](1/R)[0 1]P*

▶ letting P* = [p₁₁* p₁₂*; p₁₂* p₂₂*], this becomes

d p₁₁*/dt = 1 − (1/R)(p₁₂*)²
d p₁₂*/dt = p₁₁* − (1/R) p₁₂* p₂₂*
d p₂₂*/dt = 2p₁₂* − (1/R)(p₂₂*)²

with p₁₁*(0) = 1, p₁₂*(0) = 0, p₂₂*(0) = 1
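The three coupled scalar ODEs above can be integrated forward numerically without ODE45; the following is a minimal Python sketch (the function names are illustrative) using a fixed-step RK4 integrator, checked against the stationary values p₁₁* = √2 R^{1/4}, p₁₂* = √R, p₂₂* = √2 R^{3/4} derived later for the same system:

```python
# Forward integration of dP*/dt = A^T P* + P* A + Q - P* B R^-1 B^T P*
# for the pure inertia example, written out as the three coupled scalar
# ODEs derived above, with P*(0) = S = I. Fixed-step RK4.
import math

def riccati_rhs(p, R):
    """Right-hand side of the three scalar Riccati ODEs."""
    p11, p12, p22 = p
    return (1.0 - p12**2 / R,
            p11 - p12 * p22 / R,
            2.0 * p12 - p22**2 / R)

def integrate_riccati(R, tf, dt=1e-3, p0=(1.0, 0.0, 1.0)):
    """RK4 integration of (p11*, p12*, p22*) from t = 0 to t = tf."""
    p, t = p0, 0.0
    while t < tf:
        k1 = riccati_rhs(p, R)
        k2 = riccati_rhs(tuple(x + 0.5 * dt * k for x, k in zip(p, k1)), R)
        k3 = riccati_rhs(tuple(x + 0.5 * dt * k for x, k in zip(p, k2)), R)
        k4 = riccati_rhs(tuple(x + dt * k for x, k in zip(p, k3)), R)
        p = tuple(x + dt * (a + 2*b + 2*c + d) / 6.0
                  for x, a, b, c, d in zip(p, k1, k2, k3, k4))
        t += dt
    return p

p11, p12, p22 = integrate_riccati(R=1.0, tf=20.0)
# for R = 1 the stationary solution is p11* = p22* = sqrt(2), p12* = 1
```

For a sufficiently long horizon the integration settles at the stationary solution, matching the convergence seen in the figures that follow.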
Example: LQR of a pure inertia system: analysis
[Figure: LQ example, trajectories of p₁₁*, p₁₂*, p₂₂* with R = 0.0001; P*(0) = [1 0; 0 1], P(t) = P*(t_f − t)]

▶ if the final time t_f is large, P*(t) converges forward in time to a stationary value
▶ i.e., P(t) converges backward in time to a stationary value at P(0)
Example: LQR of a pure inertia system: analysis
[Figure: LQ example with different penalties on control, trajectories of p₁₁*, p₁₂*, p₂₂* for R = 1 and R = 100; P*(0) = [1 0; 0 1]]

▶ a larger R results in a longer transient
▶ i.e., a larger penalty on the control input yields a longer settling time
Example: LQR of a pure inertia system: analysis

[Figure: LQ with different boundary values in the Riccati differential equation, R = 100: (a) P*(0) = [1 0; 0 1]; (b) P*(0) = [20 0; 0 2]]

▶ for the same R, the initial value P(t_f) = S becomes irrelevant as t_f → ∞
1. Problem formulation

2. Solution to the finite-horizon LQ problem

3. From finite-horizon LQ to stationary LQ



From LQ to stationary LQ

[Figure repeated: P* trajectories with R = 100 for two boundary values P*(0)]

▶ in the example, we saw that P in the Riccati differential equation converges to a stationary value given sufficient time
▶ when t_f → ∞, LQ becomes the stationary LQ problem, under two additional conditions that we now discuss in detail:
  ▶ (A, B) is controllable/stabilizable
  ▶ (A, C) is observable/detectable



Need for controllability/stabilizability
J = ½ xᵀ(t_f) S x(t_f) + ½ ∫_{t₀}^{t_f} [xᵀ(t) Q x(t) + uᵀ(t) R u(t)] dt

−dP/dt = AᵀP + PA − PBR⁻¹BᵀP + Q,  P(t_f) = S   (the Riccati differential equation)

J⁰ = ½ x₀ᵀ P(t₀) x₀

if (A, B) is controllable or stabilizable, then P(t) is guaranteed to converge to a bounded and stationary value

▶ for uncontrollable or unstabilizable systems, there can be unstable uncontrollable modes that cause J to be unbounded
▶ and if J⁰ = ½ x₀ᵀ P(0) x₀ is unbounded, we will have ‖P(0)‖ = ∞



Need for controllability/stabilizability
if (A, B) is controllable or stabilizable, then P(t) is guaranteed to converge to a bounded and stationary value

▶ e.g.: ẋ = x + 0·u, x(0) = 1, Q = 1, and R any positive value
▶ the system is uncontrollable and the uncontrollable mode is unstable
▶ x(t) will keep increasing to infinity
▶ ⇒ J = ½ ∫₀^∞ (xᵀQx + uᵀRu) dt is unbounded regardless of u(t)
▶ in this case, the Riccati equation is

−dP/dt = P + P + 1 = 2P + 1  ⇔  dP*/dt = 2P* + 1

and forward integration of P* (backward integration of P) drives P*(∞) and P(0) to infinity
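A quick numerical sketch of this divergence (Python; the function name is illustrative): forward Euler applied to dP*/dt = 2P* + 1 blows up exponentially, mirroring the unbounded cost.

```python
# Forward Euler integration of dP*/dt = 2 P* + 1, the time-reversed Riccati
# equation for xdot = x + 0*u with Q = 1. Because the uncontrollable mode
# is unstable, P* grows without bound.
def integrate_scalar_riccati(p0=0.0, tf=10.0, dt=1e-3):
    p = p0
    for _ in range(int(tf / dt)):
        p += dt * (2.0 * p + 1.0)
    return p

p_final = integrate_scalar_riccati()
# exact solution: P*(t) = (P*(0) + 1/2) e^{2t} - 1/2, already ~1e8 by t = 10
```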



Need for observability/detectability


J = ½ ∫_{t₀}^{∞} [xᵀ(t) Q x(t) + uᵀ(t) R u(t)] dt

with ẋ = Ax + Bu, x(t₀) = x₀, R ≻ 0, and Q = CᵀC.


if (A, C ) is observable or detectable, the optimal state
feedback control system will be asymptotically stable
▶ intuition: if the system is observable, y = Cx will relate to all
states ⇒ regulating x T Qx = x T C T Cx will regulate all states
▶ formally: if (A, C ) is observable (detectable), the solution of the
Riccati equation will converge to a positive (semi)definite value
P+ (proof in course notes)



From LQ to stationary LQ

LQ → stationary LQ

Cost:
  LQ:            J = ½ xᵀ(t_f)Sx(t_f) + ½ ∫_{t₀}^{t_f} (xᵀQx + uᵀRu) dt
  stationary LQ: J = ½ ∫_{t₀}^{∞} (xᵀQx + uᵀRu) dt

Syst.:
  LQ:            ẋ = Ax + Bu
  stationary LQ: ẋ = Ax + Bu, (A, B) controllable/stabilizable, (A, C) observable/detectable

Key Eq.:
  LQ:            Riccati Eq. (RE): −dP/dt = AᵀP + PA − PBR⁻¹BᵀP + Q, P(t_f) = S
  stationary LQ: Algebraic RE (ARE): AᵀP + PA − PBR⁻¹BᵀP + Q = 0

Opt. control & cost:
  LQ:            u(t) = −R⁻¹BᵀP(t)x(t),  J⁰ = ½ x₀ᵀP(t₀)x₀
  stationary LQ: u(t) = −R⁻¹BᵀP₊x(t),  J⁰ = ½ x₀ᵀP₊x₀



More formally: Solution of the infinite-horizon LQ
For

J = ½ ∫_{t₀}^{∞} [x(t)ᵀQx(t) + u(t)ᵀRu(t)] dt,  Q = CᵀC

with ẋ(t) = Ax(t) + Bu(t), x(t₀) = x₀, and R ≻ 0:
▶ if (A, B) is controllable (stabilizable) and (A, C) is observable (detectable)
▶ then the optimal control input is given by

u(t) = −R⁻¹BᵀP₊x(t)

▶ where P₊ = P₊ᵀ is the positive (semi)definite solution of the algebraic Riccati equation (ARE)

AᵀP + PA − PBR⁻¹BᵀP + Q = 0

▶ and the closed-loop system is asymptotically stable, with

J_min = J⁰ = ½ x(t₀)ᵀP₊x(t₀)
Observations
▶ the control u(t) = −R⁻¹BᵀP₊x(t) is a constant state feedback law
▶ under the optimal control, the closed loop is given by

ẋ = Ax − BR⁻¹BᵀP₊x = (A − BR⁻¹BᵀP₊)x ≜ A_c x

and J = ½ ∫_{t₀}^{∞} (xᵀQx + uᵀRu) dt = ½ ∫_{t₀}^{∞} xᵀ(Q + P₊BR⁻¹BᵀP₊)x dt, with Q_c ≜ Q + P₊BR⁻¹BᵀP₊
▶ for the above closed-loop system, the Lyapunov Eq. is

A_cᵀP + PA_c = −Q_c
⇔ (A − BR⁻¹BᵀP)ᵀP + P(A − BR⁻¹BᵀP) = −Q − PBR⁻¹BᵀP
⇔ AᵀP + PA − PBR⁻¹BᵀP = −Q   (the ARE!)

▶ when the ARE solution P₊ is positive definite, ½ xᵀP₊x is a Lyapunov function for the closed-loop system
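These identities can be checked numerically for the pure inertia example with R = 1 (whose closed-form P₊ appears on a later slide). The sketch below (Python; helper names are illustrative) verifies that A_cᵀP₊ + P₊A_c + Q_c = 0 and that V = ½ xᵀP₊x decreases along a simulated closed-loop trajectory:

```python
# Verify the Lyapunov/ARE link and the Lyapunov function property for the
# pure inertia example with R = 1, where P+ = [[sqrt(2), 1], [1, sqrt(2)]].
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

A = [[0.0, 1.0], [0.0, 0.0]]
Q = [[1.0, 0.0], [0.0, 0.0]]
s2 = math.sqrt(2.0)
P = [[s2, 1.0], [1.0, s2]]          # stationary ARE solution P+ for R = 1
BBt = [[0.0, 0.0], [0.0, 1.0]]      # B R^-1 B^T for B = [0; 1], R = 1

BBtP = matmul(BBt, P)
Ac = [[A[i][j] - BBtP[i][j] for j in range(2)] for i in range(2)]
Qc = add(Q, matmul(P, BBtP))        # Qc = Q + P B R^-1 B^T P

# Lyapunov residual Ac^T P + P Ac + Qc should vanish (the ARE in disguise)
residual = add(add(matmul(transpose(Ac), P), matmul(P, Ac)), Qc)

def V(x):
    """Lyapunov function candidate V(x) = 1/2 x^T P+ x."""
    return 0.5 * (x[0] * (P[0][0]*x[0] + P[0][1]*x[1])
                  + x[1] * (P[1][0]*x[0] + P[1][1]*x[1]))

# simulate xdot = Ac x with forward Euler and track V along the trajectory
x, dt = [1.0, -0.5], 1e-3
v_prev, decreasing = V(x), True
for _ in range(5000):
    x = [x[0] + dt * (Ac[0][0]*x[0] + Ac[0][1]*x[1]),
         x[1] + dt * (Ac[1][0]*x[0] + Ac[1][1]*x[1])]
    v = V(x)
    decreasing = decreasing and v <= v_prev + 1e-12
    v_prev = v
```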
Observations

▶ Lyapunov Eq. and the ARE:

Cost:
  Lyapunov Eq.: J = ½ ∫₀^{∞} xᵀQ_c x dt
  ARE:          J = ½ ∫_{t₀}^{∞} (xᵀQx + uᵀRu) dt

Syst. dynamics:
  Lyapunov Eq.: ẋ = A_c x
  ARE:          ẋ = Ax + Bu, (A, B) controllable/stabilizable, (A, C) observable/detectable

Key Eq.:
  Lyapunov Eq.: A_cᵀP + PA_c + Q_c = 0
  ARE:          AᵀP + PA − PBR⁻¹BᵀP + Q = 0

Optimal control:
  Lyapunov Eq.: N/A
  ARE:          u(t) = −R⁻¹BᵀP₊x(t)

Opt. cost:
  Lyapunov Eq.: J⁰ = ½ xᵀ(0)P₊x(0)
  ARE:          J⁰ = ½ x(t₀)ᵀP₊x(t₀)

▶ the guaranteed closed-loop stability is an attractive feature

▶ additional useful properties will show up later



Example: Stationary LQR of a pure inertia system
▶ Consider

ẋ = [0 1; 0 0] x + [0; 1] u,  J = ½ ∫₀^∞ (xᵀ[1 0; 0 0]x + Ru²) dt,  R > 0

▶ the ARE is

0 = [0 0; 1 0]P + P[0 1; 0 0] + [1 0; 0 0] − P[0; 1](1/R)[0 1]P  ⇒  P₊ = [√2 R^{1/4}, R^{1/2}; R^{1/2}, √2 R^{3/4}]

▶ the closed-loop A matrix can be computed to be

A_c = A − BR⁻¹BᵀP₊ = [0, 1; −R^{−1/2}, −√2 R^{−1/4}]

▶ ⇒ closed-loop eigenvalues:

λ₁,₂ = −1/(√2 R^{1/4}) ± j/(√2 R^{1/4})
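The closed-form P₊ and eigenvalues above can be sanity-checked numerically across several values of R (Python sketch; function names are illustrative):

```python
# Check that P+ = [sqrt(2) R^(1/4), sqrt(R); sqrt(R), sqrt(2) R^(3/4)]
# solves the ARE for the pure inertia system, and that the closed-loop
# eigenvalues are -1/(sqrt(2) R^(1/4)) +/- j/(sqrt(2) R^(1/4)).
import cmath
import math

def are_residual(R):
    """Max |entry| of A^T P + P A - P B R^-1 B^T P + Q at the closed-form P+."""
    p11 = math.sqrt(2.0) * R**0.25
    p12 = math.sqrt(R)
    p22 = math.sqrt(2.0) * R**0.75
    # entries worked out for A = [0 1; 0 0], B = [0; 1], Q = [1 0; 0 0]
    e11 = 1.0 - p12**2 / R
    e12 = p11 - p12 * p22 / R
    e22 = 2.0 * p12 - p22**2 / R
    return max(abs(e11), abs(e12), abs(e22))

def closed_loop_eigs(R):
    """Roots of s^2 + sqrt(2) R^(-1/4) s + R^(-1/2), i.e., eig(Ac)."""
    b = math.sqrt(2.0) / R**0.25
    c = 1.0 / math.sqrt(R)
    disc = cmath.sqrt(b * b - 4.0 * c)
    return (-b + disc) / 2.0, (-b - disc) / 2.0

res = max(are_residual(R) for R in (0.01, 1.0, 100.0))
lam_plus, lam_minus = closed_loop_eigs(1.0)
# for R = 1: lam = (-1 +/- j) / sqrt(2), i.e., about -0.707 +/- 0.707j
```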
ẋ = [0 1; 0 0] x + [0; 1] u,  J = ½ ∫₀^∞ (xᵀ[1 0; 0 0]x + Ru²) dt

[Figure: root locus of the closed-loop eigenvalues λ₁,₂ = −1/(√2 R^{1/4}) ± j/(√2 R^{1/4}) as R varies: as R → 0 the eigenvalues move deep into the left-half plane; as R → ∞ they approach the origin]

▶ R ↑ (more penalty on the control input) ⇒ λ₁,₂ move closer to the origin ⇒ slower state convergence to zero
▶ R ↓ (allowing large control effort) ⇒ λ₁,₂ move further into the left half of the complex plane ⇒ faster closed-loop dynamics
MATLAB commands
▶ care: solves the ARE for a continuous-time system:

[P, Λ, K] = care(A, B, CᵀC, R)

where K = R⁻¹BᵀP and Λ contains the closed-loop eigenvalues, i.e., the eigenvalues of A − BK.
▶ lqr and lqry: provide the LQ regulator with

[K, P, Λ] = lqr(A, B, CᵀC, R)
[K, P, Λ] = lqry(sys, Q_y, R)

where sys is defined by ẋ = Ax + Bu, y = Cx + Du, and

J = ½ ∫₀^∞ (yᵀQ_y y + uᵀRu) dt
Additional excellent properties of stationary LQ

▶ we know the stationary LQR yields guaranteed closed-loop stability for controllable (stabilizable) and observable (detectable) systems

It turns out that LQ regulators with full state feedback have excellent additional properties:
▶ at least a 60 degree phase margin
▶ infinite gain margin
▶ stability is guaranteed up to a 50% reduction in the gain



Applications and practice

choosing R and Q:
▶ if there is no good initial idea for the structure of Q and R, start with diagonal matrices
▶ gain an idea of the magnitude of each state variable and input variable
▶ call them x_{i,max} (i = 1, …, n) and u_{i,max} (i = 1, …, r)
▶ make the diagonal elements of Q and R inversely proportional to ‖x_{i,max}‖² and ‖u_{i,max}‖², respectively
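This inverse-square scaling heuristic is often called Bryson's rule; a minimal Python sketch (the magnitudes passed in below are made-up placeholders):

```python
# Bryson's-rule-style initial guess: diagonal Q and R whose entries are
# 1 / (max expected magnitude)^2, so each normalized state and input
# contributes comparably to the cost. Magnitudes here are illustrative.
def bryson_weights(x_max, u_max):
    Q = [[0.0] * len(x_max) for _ in x_max]
    R = [[0.0] * len(u_max) for _ in u_max]
    for i, xm in enumerate(x_max):
        Q[i][i] = 1.0 / xm**2
    for i, um in enumerate(u_max):
        R[i][i] = 1.0 / um**2
    return Q, R

Q, R = bryson_weights(x_max=[0.5, 2.0], u_max=[10.0])
# Q = diag(4, 0.25), R = [[0.01]]: tighter bounds get larger weights
```

These are only starting values; Q and R are then tuned iteratively based on the resulting closed-loop response.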

