Optimal Control: Dynamic Programming
Dynamic Programming
Bellman’s Principle of Optimality
[Figure: a path from start to end split into a first segment a followed by a segment b; an alternative final segment b' is also shown.]
Assertion
– If a + b is the best path, then there is no b' better than b.
Principle of optimality:
– An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
[Figure: from the start node 0, first decisions a, b, c lead to intermediate states s1, s2, s3; d*, e*, f* denote the optimal remaining paths from those states to the end node 1. Whichever first decision is taken, the continuation must be the optimal path from the resulting state.]
[Figure: routing example on a rectangular grid of nodes with a travel cost on each segment. Working backward from the end point, the minimum cost-to-go is recorded at every node, and the optimal route from the start is read off without enumerating all routes.]
There are 10 possible routes if only moving to the right or moving downward is allowed.
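A minimal sketch of this backward sweep in Python, with hypothetical segment costs (the figure's original values are not recoverable); with 3 right moves and 2 down moves there are $\binom{5}{2} = 10$ routes, matching the count above:

```python
# Backward dynamic programming on a grid where only moves to the right
# or downward are allowed. Segment costs below are hypothetical.

# right_cost[i][j]: cost of moving right from node (i, j)
# down_cost[i][j]:  cost of moving down from node (i, j)
right_cost = [[7, 5, 2],
              [3, 2, 3],
              [4, 8, 4]]
down_cost = [[3, 3, 6, 4],
             [7, 5, 3, 2]]

rows, cols = len(right_cost), len(down_cost[0])   # 3 x 4 grid of nodes

# J[i][j]: minimum cost-to-go from node (i, j) to the end point
J = [[float("inf")] * cols for _ in range(rows)]
J[rows - 1][cols - 1] = 0.0                       # end point

# Sweep backward; each node uses the already-computed costs of its
# right and down neighbors: J*(s) = min over u of [g(s, u) + J*(next s)].
for i in range(rows - 1, -1, -1):
    for j in range(cols - 1, -1, -1):
        if j + 1 < cols:   # option: move right
            J[i][j] = min(J[i][j], right_cost[i][j] + J[i][j + 1])
        if i + 1 < rows:   # option: move down
            J[i][j] = min(J[i][j], down_cost[i][j] + J[i + 1][j])

print(J[0][0])  # minimum total cost from the start node
```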
• In our control problem, the cost is a functional of the state and the control, $x$ and $u$:
$$J = \int_0^{t_f} g\big(x(t), u(t)\big)\, dt$$
[Figure: $g\big(x(t), u(t)\big)$ plotted against $t$; the area under the curve from $0$ to $t_f$ is $J$.]
[Figure: discrete time line $0, 1, \dots, k, k+1, k+2, \dots, N-1, N$, with the current state $x(k)$ marked at stage $k$.]
$J_{k,N}\big(x(k)\big)$: cost from the current state $x(k)$ to the final state, while the state $x$ is determined by the state equation
$$x(k+1) = f\big(x(k), u(k)\big)$$
How do we find the optimal cost $J^*_{k,N}\big(x(k)\big)$?
[Figure: from $x(k)$ at stage $k$, the control $u(k)$ takes the state to $x(k+1) = f\big(x(k), u(k)\big)$, from which the optimal cost-to-go to stage $N$ is $J^*_{k+1,N}$.]
$$J^*_{k,N}\big(x(k)\big) = \min_{u(k)} \Big[ J_{k,k+1}\big(x(k), u(k)\big) + J^*_{k+1,N}\big(x(k+1)\big) \Big]$$
Since $J^*_{k+1,N}$ is tabulated only at discrete grid values of the state, and $x(k+1) = f\big(x(k), u(k)\big)$ generally falls between grid points, interpolation is needed for calculating $J^*_{k,N}\big(x(k)\big)$.
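A minimal sketch of one backward step with linear interpolation, for a scalar state on a uniform grid; the dynamics $f$, one-step cost, and grid ranges here are assumptions for illustration:

```python
import numpy as np

def f(x, u):
    return x + 0.1 * u            # assumed dynamics x(k+1) = f(x(k), u(k))

def g(x, u):
    return 0.1 * (x**2 + u**2)    # assumed one-step cost J_{k,k+1}(x, u)

x_grid = np.linspace(-2.0, 2.0, 41)   # discretized state values
u_grid = np.linspace(-1.0, 1.0, 21)   # candidate controls

# J_next[i]: optimal cost-to-go J*_{k+1,N} tabulated on x_grid
J_next = np.zeros_like(x_grid)

# One backward step: compute J*_{k,N} on the same grid.
J_curr = np.empty_like(x_grid)
for i, x in enumerate(x_grid):
    best = float("inf")
    for u in u_grid:
        x_next = f(x, u)
        # x_next rarely lands exactly on a grid point, so J*_{k+1,N}
        # must be interpolated between its tabulated values
        best = min(best, g(x, u) + np.interp(x_next, x_grid, J_next))
    J_curr[i] = best
```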
– State equation
$$\dot{x}(t) = a\big(x(t), u(t)\big)$$
– Cost functional
$$J = h\big(x(t_f)\big) + \int_0^{t_f} g\big(x(t), u(t)\big)\, dt$$
– We will denote the discretized state equation by
$$x(k+1) \triangleq a_D\big(x(k), u(k)\big)$$
– and the discretized cost by
$$J = h\big(x(N)\big) + \sum_{k=0}^{N-1} g_D\big(x(k), u(k)\big)$$
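For instance, a forward-Euler discretization with step $\Delta t = t_f/N$ gives one concrete choice of $a_D$ and $g_D$; a minimal sketch, where the particular $a$, $g$, and step size are assumptions:

```python
dt = 0.01  # assumed step size dt = t_f / N

def a(x, u):
    return -x + u             # assumed continuous dynamics xdot = a(x, u)

def g(x, u):
    return x**2 + u**2        # assumed running cost g(x, u)

def a_D(x, u):
    # forward-Euler step: x(k+1) = x(k) + dt * a(x(k), u(k))
    return x + dt * a(x, u)

def g_D(x, u):
    # rectangle-rule approximation of the cost integral over one step
    return dt * g(x, u)
```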
$$J_{N-1,N}\big(x(N-1), u(N-1)\big) \triangleq g_D\big(x(N-1), u(N-1)\big) + h\big(x(N)\big) = g_D\big(x(N-1), u(N-1)\big) + J_{N,N}\big(x(N)\big)$$
Substituting the state equation for $x(N)$,
$$J_{N-1,N}\big(x(N-1), u(N-1)\big) = g_D\big(x(N-1), u(N-1)\big) + J_{N,N}\Big(a_D\big(x(N-1), u(N-1)\big)\Big)$$
Stepping backward, for example to stage $N-3$:
$$J^*_{N-3,N}\big(x(N-3)\big) = \min_{u(N-3)} \Big[ g_D\big(x(N-3), u(N-3)\big) + J^*_{N-2,N}\Big(a_D\big(x(N-3), u(N-3)\big)\Big) \Big]$$
$$J^*_{N-K,N}\big(x(N-K)\big) = \min_{u(N-K), u(N-K+1), \dots, u(N-1)} \Big[ h\big(x(N)\big) + \sum_{k=N-K}^{N-1} g_D\big(x(k), u(k)\big) \Big]$$
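The recurrence below follows by splitting off the first decision: only the first stage cost depends on $u(N-K)$ alone, and the inner minimization is, by definition, the optimal cost over the remaining $K-1$ stages:
$$\begin{aligned}
J^*_{N-K,N}\big(x(N-K)\big)
&= \min_{u(N-K)} \Big[ g_D\big(x(N-K), u(N-K)\big) \\
&\qquad + \min_{u(N-K+1), \dots, u(N-1)} \Big( h\big(x(N)\big) + \sum_{k=N-K+1}^{N-1} g_D\big(x(k), u(k)\big) \Big) \Big] \\
&= \min_{u(N-K)} \Big[ g_D\big(x(N-K), u(N-K)\big) + J^*_{N-(K-1),N}\big(x(N-K+1)\big) \Big]
\end{aligned}$$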
– Recurrence relation
$$J^*_{N-K,N}\big(x(N-K)\big) = \min_{u(N-K)} \Big[ g_D\big(x(N-K), u(N-K)\big) + J^*_{N-(K-1),N}\Big(a_D\big(x(N-K), u(N-K)\big)\Big) \Big]$$
[Figure: time line $0, \dots, N-K, N-K+1, \dots, N$. From the current state $x(N-K)$, the control $u(N-K)$ takes the state to $x(N-K+1) = a_D\big(x(N-K), u(N-K)\big)$, whose optimal cost-to-go $J^*_{N-(K-1),N}$ is already known; minimizing over $u(N-K)$ yields $J^*_{N-K,N}\big(x(N-K)\big)$.]
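Putting the pieces together, a minimal sketch of the full backward sweep for $K = 1, \dots, N$ over a state grid, reusing the hypothetical $a_D$, $g_D$, and interpolation idea from the sketches above:

```python
import numpy as np

dt, N = 0.01, 100                        # assumed step size and horizon

def a_D(x, u):                           # assumed discretized dynamics
    return x + dt * (-x + u)

def g_D(x, u):                           # assumed discretized running cost
    return dt * (x**2 + u**2)

def h(x):                                # assumed terminal cost
    return x**2

x_grid = np.linspace(-2.0, 2.0, 41)      # discretized state values
u_grid = np.linspace(-1.0, 1.0, 21)      # candidate controls

# J[k, i] approximates J*_{k,N}(x_grid[i]); u_opt[k, i] is the minimizer.
J = np.zeros((N + 1, x_grid.size))
u_opt = np.zeros((N, x_grid.size))
J[N] = h(x_grid)                         # boundary condition: J*_{N,N} = h

for k in range(N - 1, -1, -1):           # backward sweep: K = 1, ..., N
    for i, x in enumerate(x_grid):
        # stage cost plus interpolated cost-to-go for each candidate u
        costs = [g_D(x, u) + np.interp(a_D(x, u), x_grid, J[k + 1])
                 for u in u_grid]
        best = int(np.argmin(costs))
        J[k, i] = costs[best]
        u_opt[k, i] = u_grid[best]

# J[0] tabulates the optimal cost J*_{0,N} over initial states, and
# u_opt is an optimal feedback law on the grid.
```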