

22 Optimal control

Design approaches :

• Classical techniques :

  – Frequency response and root locus are effective but largely trial and error, with experience very useful.

• Modern techniques :

  – State space approaches, such as in pole placement problems.

• Optimal control :

  – Let the plant be

    x(k + 1) = f^k(x(k), u(k))    (22.1)

    where the superscript k on f indicates that it can be time-varying. Suppose that we associate with this plant the performance index

    J_i(x(i)) = \phi(N, x(N)) + \sum_{k=i}^{N-1} L^k(x(k), u(k))    (22.2)

    where [i, N] is the time interval of interest. We have shown the dependence of J on the initial time and state.

The optimal control problem is to find the control u*(k) on the interval [i, N] that drives the system (22.1) along a trajectory x*(k) such that the performance index (22.2) is minimised.

22.1 Dynamic Programming

Dynamic programming was developed by R. E. Bellman in the late 1950s. It can be used to solve optimal control problems for nonlinear, time-varying systems. The optimal control is expressed in state-variable feedback form.

22.1.1 Bellman’s principle of optimality

Dynamic programming is based on Bellman's principle of optimality, which states :

An optimal policy has the property that no matter what the previous decisions (i.e., controls) have been, the remaining decisions must constitute an optimal policy with regard to the state resulting from those previous decisions.

Example 22.1. An aircraft routing example : An aircraft can fly from left to right along the paths shown in Figure 12. Intersections a, b, c, . . . represent cities, and the numbers represent the fuel required to complete each path. We shall use the principle of optimality to solve the minimum fuel problem.

Figure 12: Aircraft routing network



To construct a state-variable feedback that shows the optimal cost and the optimal control from any node to i, we first define what is meant by state in this example. At each stage k = 0, 1, . . . , N − 1, a decision is required, and N = 4 is the final stage. The current state is the node where we are making the current decision. Thus, the initial state is x_0 = a. At stage 1, the state can be x_1 = b or x_1 = d. Similarly, x_2 = c, e or g; x_3 = f or h; and the final state is constrained to be x_N = x_4 = i.

The control u_k at stage k can be considered to be u_k = ±1, where u_k = 1 results in a move up, and u_k = −1 results in a move down to stage k + 1. On hand is a minimum fuel problem with fixed final state and constrained control and state values. To find the minimum fuel feedback control law using the principle of optimality, start at k = N = 4. No decision is required here, so decrement k to 3. If x_3 = f, the optimal (only) control is u_3 = −1, and the cost is then 4. This is indicated by placing (4) above node f, and placing an arrowhead on path f → i. If x_3 = h, the optimal control is u_3 = 1, with a cost of 2, which is now indicated on the figure.

Now decrement k to 2. If x_2 = c, then u_2 = −1, with a cost to go of 4 + 3 = 7. This information is added to the figure. If x_2 = e, then we must make a decision. If we apply u_2 = 1 to go to f, and then go via the optimal path to i, the cost is 4 + 3 = 7. On the other hand, if we apply u_2 = −1 at e and go to h, the cost is 2 + 2 = 4. Hence, at e the optimal decision is u_2 = −1, with a cost to go of 4. If x_2 = g, there is only one choice : u_2 = 1, with a cost to go of 6.

By successively decrementing k and continuing to compare the control possibilities allowed by the principle of optimality, we can fill in the remainder of the control decisions (arrowheads) and optimal costs to go shown in the figure. It should be clearly realised that the only control sequences we are allowed to consider are those whose last portions are optimal sequences. Note that when k = 0, a control of either u_0 = 1 or u_0 = −1 yields the same cost to go of 8.

To examine what we have just constructed, suppose we are now told to find the minimum fuel path from node d to the destination i. All we need to do is begin at d and follow the arrows! The optimal control u*_k and the cost to go at each stage k are determined if we know the value of x_k. Our feedback law tells us how to get from any state to the fixed final state x_4 = x_N = i. If we change the final state, however (e.g., to x_3 = x_N = f), then the entire grid must be redone.

Note : Suppose we had attempted, in ignorance of the optimality principle, to determine an optimal route from a to i by working forward. Then a near-sighted decision maker at a would compare the costs of travelling to b and d, and decide to go to d. The next myopic decision would take him to g. From there on there is no choice : he must go via h to i. The net cost of this strategy is 1 + 2 + 4 + 2 = 9, which is non-optimal.
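The backward sweep above is mechanical enough to automate. The Python sketch below runs the same stage-by-stage recursion over the network of Figure 12. The costs of paths f → i, h → i, c → f, e → f, e → h, g → h, a → d and d → g can be read off the worked numbers above (and d → e follows by subtraction); the remaining costs (a → b, b → c, b → e) are not stated in the text and are placeholders chosen only to reproduce the tie at node a.

```python
# Backward dynamic-programming sweep for the aircraft routing example.
# Edge costs marked "assumed" are NOT given in the notes; they are
# placeholders chosen so that both decisions at node a tie at cost 8.
succ = {                      # node -> {successor node: fuel cost}
    "a": {"b": 2, "d": 1},    # a->b assumed
    "b": {"c": 1, "e": 2},    # both assumed
    "d": {"e": 3, "g": 2},    # d->e deduced from the cost to go of 7 at d
    "c": {"f": 3},
    "e": {"f": 3, "h": 2},
    "g": {"h": 4},
    "f": {"i": 4},
    "h": {"i": 2},
}

cost_to_go = {"i": 0.0}       # no decision is required at the destination
policy = {}
# Stages from the text: x3 in {f,h}, x2 in {c,e,g}, x1 in {b,d}, x0 = a.
for stage in (["f", "h"], ["c", "e", "g"], ["b", "d"], ["a"]):
    for node in stage:
        # Principle of optimality: append only already-optimal tails.
        best = min(succ[node], key=lambda n: succ[node][n] + cost_to_go[n])
        policy[node] = best
        cost_to_go[node] = succ[node][best] + cost_to_go[best]

print(cost_to_go["a"])   # 8.0, matching the notes (the tie at a is broken arbitrarily)
print(policy)            # the arrowheads: the optimal successor of each node
```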

22.2 Principle of optimality for discrete time systems

We now apply Bellman's principle of optimality to the control of dynamical systems. Let the plant be

x(k + 1) = f^k(x(k), u(k))    (22.3)

Suppose that the performance index is

J_i(x(i)) = \phi(N, x(N)) + \sum_{k=i}^{N-1} L^k(x(k), u(k))    (22.4)

We want to use the principle of optimality to select the control sequence u(k) to minimise (22.4).

Suppose that we have computed the optimal cost J*_{k+1}(x(k + 1)) from time k + 1 to the terminal time N for all possible states x(k + 1), and that we have also found the optimal control sequences from time k + 1 to N for all x(k + 1). The optimal cost results when the optimal control sequence u*(k + 1), u*(k + 2), . . . , u*(N − 1) is applied to the plant with a state of x(k + 1). If we apply any arbitrary control u(k) at time k and then use the known optimal control sequence from k + 1 on, the resulting cost will be

L^k(x(k), u(k)) + J*_{k+1}(x(k + 1))    (22.5)

where x(k) is the state at time k, and x(k + 1) is given by (22.3).

According to Bellman, the optimal cost from time k on is equal to

J*_k(x(k)) = \min_{u(k)} [ L^k(x(k), u(k)) + J*_{k+1}(x(k + 1)) ]    (22.6)

and the optimal control u*(k) at time k is the u(k) that achieves this minimum. (22.6) is the principle of optimality for discrete time systems. Its importance lies in the fact that it allows us to optimise over only one control vector at a time by working backwards from N.

22.3 Discrete time linear quadratic regulator via dynamic programming

Let the plant

x(k + 1) = Ax(k) + Bu(k)    (22.7)

have an associated performance index

J_i = \frac{1}{2} x^T(N) S(N) x(N) + \frac{1}{2} \sum_{k=i}^{N-1} [ x^T(k) Q x(k) + u^T(k) R u(k) ]    (22.8)

with

S(N) ≥ 0,   Q ≥ 0,   R > 0

(A square symmetric matrix is positive definite if and only if all its eigenvalues are greater than 0. Eigenvalues of a symmetric matrix are all real. If some of the eigenvalues are 0 and the rest are positive, the matrix is said to be positive semidefinite.)

(If the plant and weighting matrices are time-varying, the development to follow still holds.) It is desired to find the optimal control u*(k) on the fixed time interval [i, N] that minimises J_i. The initial state x(i) is given and the final state x(N) is free.

We shall now determine the optimal control u*(k) by the principle of optimality (22.6). Let k = N and write

J*_N = \frac{1}{2} x^T(N) S(N) x(N)    (22.9)

which is the penalty for being in state x(N) at time N.

Now decrement k to N − 1 and write

J_{N-1} = \frac{1}{2} x^T(N-1) Q x(N-1) + \frac{1}{2} u^T(N-1) R u(N-1) + \frac{1}{2} x^T(N) S(N) x(N)    (22.10)

According to (22.6), we need to find u*(N − 1) by minimising (22.10).

To do this, use (22.7) to write

J_{N-1} = \frac{1}{2} x^T(N-1) Q x(N-1) + \frac{1}{2} u^T(N-1) R u(N-1) + \frac{1}{2} [Ax(N-1) + Bu(N-1)]^T S(N) [Ax(N-1) + Bu(N-1)]    (22.11)

The minimum of J_{N-1} is found by setting

0 = \frac{\partial J_{N-1}}{\partial u(N-1)} = R u(N-1) + B^T S(N) [Ax(N-1) + Bu(N-1)]    (22.12)

Solving for the optimal control yields

u*(N-1) = -[B^T S(N) B + R]^{-1} B^T S(N) A x(N-1)    (22.13)

Defining

K(N-1) = [B^T S(N) B + R]^{-1} B^T S(N) A    (22.14)

we can write

u*(N-1) = -K(N-1) x(N-1)    (22.15)

The optimal cost to go from k = N − 1 is found by substituting (22.15) into (22.11), which yields

J*_{N-1} = \frac{1}{2} x^T(N-1) [ (A - BK(N-1))^T S(N) (A - BK(N-1)) + K^T(N-1) R K(N-1) + Q ] x(N-1)    (22.16)

If we define

S(N-1) = (A - BK(N-1))^T S(N) (A - BK(N-1)) + K^T(N-1) R K(N-1) + Q    (22.17)

then (22.16) becomes

J*_{N-1} = \frac{1}{2} x^T(N-1) S(N-1) x(N-1)    (22.18)

Now decrement to k = N − 2. Then

J_{N-2} = \frac{1}{2} x^T(N-2) Q x(N-2) + \frac{1}{2} u^T(N-2) R u(N-2) + \frac{1}{2} x^T(N-1) S(N-1) x(N-1)    (22.19)

are the admissible costs for N − 2, since these are the costs that are optimal from N − 1 on. To determine u*(N − 2), according to (22.6), we must minimise (22.19) in a similar way as before. If we continue to decrement k and apply the optimality principle, the result for each k = N − 1, . . . , 1, 0 is

K(k) = [B^T S(k+1) B + R]^{-1} B^T S(k+1) A    (22.20)

u*(k) = -K(k) x(k)    (22.21)

S(k) = (A - BK(k))^T S(k+1) (A - BK(k)) + K^T(k) R K(k) + Q    (22.23)

J*_k = \frac{1}{2} x^T(k) S(k) x(k)    (22.24)

where the final condition S(N) for (22.23) is given in (22.8).

Example 22.2. Consider the discrete time system

x(k + 1) = 2x(k) + u(k)    (22.25)

J = \sum_{k=0}^{2} [ x^2(k) + u^2(k) ]    (22.26)

Here

A = 2,  B = 1,  Q = 2,  R = 2,  S(3) = 0

K(2) = (B^T S(3) B + R)^{-1} B^T S(3) A = 0
S(2) = (A - BK(2))^T S(3) (A - BK(2)) + K^T(2) R K(2) + Q = 2
K(1) = (B^T S(2) B + R)^{-1} B^T S(2) A = 1
S(1) = 6
K(0) = 1.5
S(0) = 8
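The backward recursion (22.20)–(22.24) is straightforward to mechanise. The sketch below (in Python with numpy; the function name and layout are mine) runs it for Example 22.2 and reproduces the gains and costs just computed.

```python
import numpy as np

def lqr_backward(A, B, Q, R, S_N, N):
    """Backward LQR recursion: gains K(k) per (22.20), costs S(k) per (22.23)."""
    S = [None] * (N + 1)
    K = [None] * N
    S[N] = S_N
    for k in range(N - 1, -1, -1):
        # K(k) = (B' S(k+1) B + R)^{-1} B' S(k+1) A            (22.20)
        K[k] = np.linalg.solve(B.T @ S[k + 1] @ B + R, B.T @ S[k + 1] @ A)
        Acl = A - B @ K[k]
        # S(k) = Acl' S(k+1) Acl + K'(k) R K(k) + Q            (22.23)
        S[k] = Acl.T @ S[k + 1] @ Acl + K[k].T @ R @ K[k] + Q
    return K, S

# Example 22.2: scalar plant x(k+1) = 2x(k) + u(k), horizon N = 3.
A = np.array([[2.0]]); B = np.array([[1.0]])
Q = np.array([[2.0]]); R = np.array([[2.0]])
K, S = lqr_backward(A, B, Q, R, S_N=np.zeros((1, 1)), N=3)
print([k.item() for k in K])   # [1.5, 1.0, 0.0]
print([s.item() for s in S])   # [8.0, 6.0, 2.0, 0.0]
```

The optimal feedback is then u*(k) = −K(k)x(k) as in (22.21), and the minimum cost from any x(0) is ½ x^T(0) S(0) x(0).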

Hence the optimal gain schedule is

{K(0), K(1)} = {1.5, 1}

and from (22.24), the minimum cost is

J* = \frac{1}{2} x^T(0) S(0) x(0) = 4x^2(0)

23 Solution of the discrete Riccati difference equation

The difference equations employed in the design of optimal control systems are :

K(k) = [B^T S(k+1) B + R]^{-1} B^T S(k+1) A    (23.1a)

S(k) = (A - BK(k))^T S(k+1) (A - BK(k)) + K^T(k) R K(k) + Q    (23.1b)

Substituting (23.1a) into (23.1b) and rearranging,

S(k) = A^T S(k+1) A + Q - A^T S(k+1) B [B^T S(k+1) B + R]^{-1} B^T S(k+1) A    (23.2)

which is the discrete Riccati equation.

For the infinite time problem, as N → ∞, the following conditions are required for the asymptotic stability of the closed loop system :

• The pairs (A, B) and (A, C) must be completely controllable and observable, respectively, for any n × n matrix C such that CC^T = Q.

Note that through some lengthy manipulations involving Hamiltonians, generalised eigenvalues, etc., for the steady state solution as N → ∞ such that the gains K(k) have become constant values, it must then be true in (23.2) that

S(k) = S(k + 1) = constant matrix    (23.3)

Denoting this matrix as Ŝ, (23.2) becomes

Ŝ = A^T Ŝ A + Q - A^T Ŝ B [B^T Ŝ B + R]^{-1} B^T Ŝ A    (23.4)

• Equation (23.4) is referred to as the discrete algebraic Riccati equation.

• The solution of this equation can be found by recursion or by the eigenvalue–eigenvector method. We shall discuss the recursive method here :

  – Set N to a large value and calculate the values of the S matrix (by computer) until the matrix elements become constant values.

  – The computer solution requires setting a tolerance level ε, so that the difference between every element of S(k) and the corresponding element of S(k − 1) is less than ε.

  – Then we have the solution to (23.4). (A numerical sketch of this recursion is given after the table in Example 23.1 below.)

Example 23.1. A second-order digital process is described by

x(k + 1) = \begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix} x(k) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(k)    (23.5)

Given that x(0) = [1  1]^T, find the optimal control u(k), k = 0, 1, 2, . . . , 7, such that the performance index

J_8 = \sum_{k=0}^{7} [ x_1^2(k) + u^2(k) ]    (23.6)

is minimised. We have

N = 8,  R = 2,  Q = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix},  S(8) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}

(23.1a) and (23.1b) are solved recursively to give :

 
S(7) = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix},      K(7) = [0  0]
S(6) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix},      K(6) = [0  0]
S(5) = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix},      K(5) = [-0.5  0.5]
S(4) = \begin{bmatrix} 3.2 & -0.8 \\ -0.8 & 3.2 \end{bmatrix},      K(4) = [-0.6  0.4]
S(3) = \begin{bmatrix} 3.23 & -0.922 \\ -0.922 & 3.69 \end{bmatrix},      K(3) = [-0.615  0.462]
S(2) = \begin{bmatrix} 3.297 & -0.973 \\ -0.973 & 3.729 \end{bmatrix},      K(2) = [-0.651  0.481]
S(1) = \begin{bmatrix} 3.301 & -0.962 \\ -0.962 & 3.75 \end{bmatrix},      K(1) = [-0.652  0.485]
S(0) = \begin{bmatrix} 3.305 & -0.97 \\ -0.97 & 3.777 \end{bmatrix},      K(0) = [-0.6538  0.486]

For large values of N, we can show that the Riccati equation approaches the steady state solution

S = \begin{bmatrix} 3.308 & -0.972 \\ -0.972 & 3.78 \end{bmatrix}

and the constant optimal control is

K = [-0.654  0.486]

For N = 8, the finite time problem already has solutions rapidly approaching these values. Note that in this example, since the pair (A, B) is completely controllable and we can find a 2 × 2 matrix C such that CC^T = Q (e.g. C = \begin{bmatrix} \sqrt{2} & 0 \\ 0 & 0 \end{bmatrix}) and (A, C) is completely observable, the closed loop system will be asymptotically stable for N = ∞.
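A minimal numerical sketch of the recursive method (assuming numpy; the function name is mine) reproduces the table above and the steady-state pair (S, K). Iterating (23.2) until successive iterates agree elementwise to within a tolerance ε implements exactly the stopping rule described earlier.

```python
import numpy as np

def dare_by_recursion(A, B, Q, R, eps=1e-9, max_iter=10000):
    """Iterate the Riccati difference equation (23.2) backwards from S(N) = 0
    until S(k) and S(k+1) agree to within eps, giving the solution of the
    discrete algebraic Riccati equation (23.4) and the constant gain K."""
    S = np.zeros_like(Q)
    for _ in range(max_iter):
        S_next = (A.T @ S @ A + Q
                  - A.T @ S @ B @ np.linalg.solve(B.T @ S @ B + R, B.T @ S @ A))
        diff = np.max(np.abs(S_next - S))
        S = S_next
        if diff < eps:          # every element has stopped changing
            break
    K = np.linalg.solve(B.T @ S @ B + R, B.T @ S @ A)   # (23.1a) at steady state
    return S, K

# Example 23.1
A = np.array([[0.0, 1.0], [-1.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[2.0, 0.0], [0.0, 0.0]])
R = np.array([[2.0]])
S, K = dare_by_recursion(A, B, Q, R)
print(np.round(S, 3))   # approx [[ 3.308 -0.972], [-0.972  3.78 ]]
print(np.round(K, 3))   # approx [[-0.654  0.486]]
```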

Example on infinite time problem.

Consider the discrete-time SS model :

x(k + 1) = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} x(k) + \begin{bmatrix} 1 \\ 0 \end{bmatrix} u(k)    (23.7)

and the performance index

J = \frac{1}{2} \sum_{k=0}^{\infty} ( x^T(k) Q x(k) + u^T(k) R u(k) )

where Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}; R = 1. Determine the optimal controller and the optimal trajectories x(k), for k = 1, 2, 3, if x(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}. Comment on the stability of the closed loop system.

Given

A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix};  B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}

the optimal control law is

u(k) = -Kx(k)

where

K = (B^T S B + R)^{-1} B^T S A

and S solves

S = A^T S A + Q - A^T S B (R + B^T S B)^{-1} B^T S A    (23.8)

 
Let S = \begin{bmatrix} S_{11} & S_{12} \\ S_{12} & S_{22} \end{bmatrix}. Then (23.8) gives

\begin{bmatrix} S_{11} & S_{12} \\ S_{12} & S_{22} \end{bmatrix} = \begin{bmatrix} S_{11} + 2S_{12} + S_{22} + 1 & S_{11} + S_{12} \\ S_{11} + S_{12} & S_{11} + 1 \end{bmatrix} - \frac{1}{S_{11} + 1} \begin{bmatrix} (S_{11} + S_{12})^2 & S_{11}(S_{11} + S_{12}) \\ S_{11}(S_{11} + S_{12}) & S_{11}^2 \end{bmatrix}

which gives

S_{11} = S_{11} + 2S_{12} + S_{22} + 1 - \frac{(S_{11} + S_{12})^2}{S_{11} + 1}    (23.9)

S_{12} = S_{11} + S_{12} - \frac{S_{11}(S_{11} + S_{12})}{S_{11} + 1}    (23.10)

S_{22} = S_{11} + 1 - \frac{S_{11}^2}{S_{11} + 1}    (23.11)

(23.10) gives S_{12} = 1. Then from (23.9), we have

S_{22} = S_{11} - 2    (23.12)

Substituting (23.12) into (23.11), we have

S_{11}^2 - 3S_{11} - 3 = 0

which gives

S_{11} = 3.7913;  -0.7913

Choose S_{11} = 3.7913, so that

S = \begin{bmatrix} S_{11} & S_{12} \\ S_{12} & S_{22} \end{bmatrix} = \begin{bmatrix} 3.7913 & 1 \\ 1 & 1.7913 \end{bmatrix} > 0

The control law is u(k) = -Kx(k), where

K = (B^T S B + R)^{-1} B^T S A = [1  0.7913]

i.e.,

u(k) = -[1  0.7913] x(k)    (23.13)

 
The closed-loop state equation, with the optimal control law u(k), is

x(k + 1) = (A - BK) x(k)

The closed-loop poles are

\lambda_i(A - BK) = \lambda_i \begin{bmatrix} 0 & 0.2087 \\ 1 & 0 \end{bmatrix} = \pm 0.4568

i.e. the closed-loop system is stable. Also, the optimal trajectories are given by

x(N) = (A - BK)^N x(0)

which gives

x(1) = \begin{bmatrix} 0 \\ 1 \end{bmatrix},  x(2) = \begin{bmatrix} 0.2087 \\ 0 \end{bmatrix},  x(3) = \begin{bmatrix} 0 \\ 0.2087 \end{bmatrix}

Note that x(30) = \begin{bmatrix} 0.6206 \times 10^{-10} \\ 0 \end{bmatrix} and x(N) → 0 as N → ∞.

To compute the optimal trajectories x(k), one can also use (23.7) and (23.13) iteratively, given that x(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.
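A short numpy sketch confirms the closed-loop poles and the trajectories above by iterating x(k + 1) = (A − BK)x(k) from x(0):

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
K = np.array([[1.0, 0.7913]])       # optimal gain from (23.13)

Acl = A - B @ K                     # closed-loop matrix [[0, 0.2087], [1, 0]]
print(np.linalg.eigvals(Acl))       # approx [0.4568, -0.4568]: inside the unit circle

x = np.array([[1.0], [0.0]])        # x(0)
for k in range(1, 4):
    x = Acl @ x                     # x(k) = (A - BK) x(k-1)
    print(k, x.ravel())             # x(1)=[0,1], x(2)=[0.2087,0], x(3)=[0,0.2087]
```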
