Optimal Control: Dynamic Programming

The document discusses Bellman's principle of optimality in dynamic programming. The principle states that an optimal policy has the property that whatever the initial state and decision are, the remaining decisions must constitute an optimal policy for the new state resulting from the first decision. The document provides examples of applying the principle to decision-making processes and routing problems, and discusses concepts in dynamic programming such as functionals, cost-to-go, and using recurrence relations to approximate continuous systems as discrete systems.
Bellman’s Principle of Optimality

[Figure: a path from the initial point to the final point consists of segment a followed by segment b; b′ is an alternative to the second segment b.]

Assertion
– If a + b is the best path, then no b′ is better than b:

f(b) ≤ f(b′), which always holds if f(a + b) is the minimum.

Principle of optimality:
– An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

Chapter 3: Dynamic Programming - Principle of Optimality 2


Bellman’s Principle of Optimality
• Decision Making Process

[Figure: from the initial point 0, segments a, b, and c lead to intermediate states s1, s2, and s3; from each of these, an optimal trajectory d*, e*, or f* leads to the final point 1.]

– * means an optimal trajectory between the two points.


Bellman’s Principle of Optimality
• Decision Making Process

[Figure: the same network; the candidate routes from 0 to 1 are a + d*, b + e*, and c + f*.]

– The best solution from 0 to 1 can be obtained by comparing the following three cases:
• Case 1: a + d*
• Case 2: b + e*
• Case 3: c + f*
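The three-case comparison can be written out directly. A minimal sketch — the segment costs and optimal tail costs below are illustrative numbers, not values from the lecture:

```python
# Best route from 0 to 1 = min over the three cases
# a + d*, b + e*, c + f*. All numeric costs are illustrative.

a, b, c = 4.0, 2.0, 5.0                    # first-segment costs from node 0
d_star, e_star, f_star = 3.0, 6.0, 1.0     # optimal tail costs to node 1

cases = {"a+d*": a + d_star, "b+e*": b + e_star, "c+f*": c + f_star}
best_route, best_cost = min(cases.items(), key=lambda kv: kv[1])

print(best_route, best_cost)  # c+f* 6.0
```

Only the tail costs need to be optimal: the principle guarantees the overall optimum is among these three candidates.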
Bellman’s Principle of Optimality
• Dynamic Programming on a routing problem
– Hybrid electric vehicles
– Modeling
– Optimal Control


Bellman’s Principle of Optimality
• The optimal road map

[Figure: a 3 × 4 grid of intersections from the start point to the end point. The rightward segment costs are, row by row, (8, 4, 5), (3, 3, 6), (7, 5, 3); the downward segment costs are, row by row, (7, 5, 2, 5), (4, 4, 8, 4).]


Bellman’s Principle of Optimality
• The optimal road map

[Figure: the same road map labeled with the optimal cost-to-go from each intersection to the end point, computed backward: (22, 16, 12, 9) on the top row, (15, 12, 10, 4) on the middle row, and (15, 8, 3, 0) on the bottom row.]
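The cost-to-go labels can be reproduced by a backward sweep over the grid. The sketch below assumes a 3 × 4 grid of intersections with the segment costs as read from the road-map figure:

```python
# Backward DP on the road map: moves are right or down only.
# right[i][j]: cost of the segment going right from intersection (i, j)
# down[i][j]:  cost of the segment going down from intersection (i, j)
right = [[8, 4, 5],
         [3, 3, 6],
         [7, 5, 3]]
down = [[7, 5, 2, 5],
        [4, 4, 8, 4]]

n_rows, n_cols = 3, 4
INF = float("inf")

# J[i][j] = optimal cost-to-go from intersection (i, j) to the end point
J = [[INF] * n_cols for _ in range(n_rows)]
J[n_rows - 1][n_cols - 1] = 0              # end point

# Sweep backward; each intersection needs only its right and down neighbors.
for i in reversed(range(n_rows)):
    for j in reversed(range(n_cols)):
        if (i, j) == (n_rows - 1, n_cols - 1):
            continue
        best = INF
        if j + 1 < n_cols:
            best = min(best, right[i][j] + J[i][j + 1])
        if i + 1 < n_rows:
            best = min(best, down[i][j] + J[i + 1][j])
        J[i][j] = best

print(J[0][0])  # optimal cost from the start point: 22
```

The sweep visits each intersection once, which is exactly what makes the method cheaper than enumerating every route.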


Bellman’s Principle of Optimality
• Benefits of Bellman’s principle of optimality
– Computational load comparison
• There are 10 possible routes if only moves to the right or moves downward are allowed.

– Direct enumeration:
• 10 cases × (4 ‘+’ calculations) = 40 ‘+’ calculations
• Comparison of the 10 cases: 9 comparisons

– Using Bellman’s principle:
• Comparisons of two cases at 6 points, with 2 ‘+’ calculations on each case: 12 ‘+’ calculations + 3 ‘+’ calculations
• 6 comparisons
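The direct-enumeration counts can be checked by brute force on the same road map: each of the 10 routes sums 5 segment costs (4 additions), and finding the minimum of 10 candidates takes 9 comparisons. A sketch:

```python
# Direct enumeration on the road map: every route is 5 moves,
# 2 of them downward. Counts the '+' operations and comparisons.
from itertools import combinations

right = [[8, 4, 5], [3, 3, 6], [7, 5, 3]]   # rightward segment costs
down = [[7, 5, 2, 5], [4, 4, 8, 4]]         # downward segment costs

additions = comparisons = 0
route_costs = []

for down_steps in combinations(range(5), 2):  # which 2 of 5 moves go down
    i = j = 0
    cost = None
    for step in range(5):
        if step in down_steps:
            seg = down[i][j]
            i += 1
        else:
            seg = right[i][j]
            j += 1
        if cost is None:
            cost = seg                        # first segment: no '+' yet
        else:
            cost += seg
            additions += 1
    route_costs.append(cost)

best = route_costs[0]
for c in route_costs[1:]:                     # chain of pairwise comparisons
    comparisons += 1
    if c < best:
        best = c

print(additions, comparisons, best)  # 40 additions, 9 comparisons, best cost 22
```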


Concept of Dynamic Programming
• Functional and Cost-to-go
– Definition of the functional
• It generally refers to a mapping from a space X into the real numbers. (The term dates from the 18th century, as part of the calculus of variations.)
• In our control problem, the cost value is a functional of the state and the control, x and u:

J = ∫_0^{t_f} g(x(t), u(t)) dt

[Figure: J is the area under the curve g(x(t), u(t)) over 0 ≤ t ≤ t_f.]


Concept of Dynamic Programming
• Functional and Cost-to-go

J = ∫_0^{t_f} g(x(t), u(t)) dt

[Figure: a discrete state trajectory x(k) over the stages k = 0, 1, …, N; the cost accumulated from stage k onward is J_{k,N}(x(k)).]

J_{k,N}(x(k)): cost from the current state x(k) to the final state, while the state x is determined by the state equation
x(k + 1) = f(x(k), u(k))


Concept of Dynamic Programming
• Process to Find the Optimal Trajectory

[Figure: from x(k), a one-stage transition leads to x(k + 1), whose optimal cost-to-go J*_{k+1,N}(x(k + 1)) is already known; the unknown is J_{k,N}(x(k)).]

x(k + 1) = f(x(k), u(k))

J*_{k,N}(x(k)) = min_{u(k)} { J_{k,k+1}(x(k), u(k)) + J*_{k+1,N}(x(k + 1)) }
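One backward step of this recurrence can be sketched on discrete grids. The state equation f, the one-stage cost g (playing the role of J_{k,k+1}), the grids, and the stage-(k+1) cost-to-go values below are all illustrative assumptions:

```python
# One backward step of the recurrence
#   J*_{k,N}(x) = min_u [ J_{k,k+1}(x, u) + J*_{k+1,N}(x(k+1)) ]
# on discrete state/control grids. All functions and values are
# illustrative, not the lecture's specific system.

states = [0.0, 0.5, 1.0]
controls = [-1.0, 0.0, 1.0]

def f(x, u):
    # toy state equation, clamped to the grid's range
    return min(1.0, max(0.0, x + 0.5 * u))

def g(x, u):
    return x**2 + u**2              # toy one-stage cost J_{k,k+1}

J_next = {0.0: 0.3, 0.5: 0.1, 1.0: 0.0}   # known J*_{k+1,N} on the grid

def nearest(x):
    # snap f(x, u) to the nearest grid state (interpolation comes later)
    return min(states, key=lambda s: abs(s - x))

J_k = {x: min(g(x, u) + J_next[nearest(f(x, u))] for u in controls)
       for x in states}

print(J_k)
```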


Concept of Dynamic Programming
• Numerical Interpolation
– In real-world applications, it is nearly impossible to obtain a control u(k) whose resulting state x(k + 1) lands exactly on a grid point.

[Figure: x(k + 1) = f(x(k), u(k)) falls between the grid points at stage k + 1, so J*_{k+1,N}(x(k + 1)) is not directly available at a grid value.]

Interpolation is needed for calculating J_{k,N}(x(k)).

x(k + 1) = f(x(k), u(k))
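A piecewise-linear interpolation of the cost-to-go can be sketched as follows; the grid and the cost values at the grid points are illustrative:

```python
# Linear interpolation of the cost-to-go between grid points.
# The grid and the J values at the grid points are illustrative.

grid = [0.0, 0.5, 1.0]            # state grid at stage k+1
J_grid = [0.3, 0.1, 0.0]          # J*_{k+1,N} at those grid states

def interp_J(x):
    """Piecewise-linear J*_{k+1,N}(x), clamped outside the grid range."""
    if x <= grid[0]:
        return J_grid[0]
    if x >= grid[-1]:
        return J_grid[-1]
    for a, b, Ja, Jb in zip(grid, grid[1:], J_grid, J_grid[1:]):
        if a <= x <= b:
            w = (x - a) / (b - a)              # fractional position in [a, b]
            return (1 - w) * Ja + w * Jb

print(interp_J(0.25))  # halfway between J = 0.3 and J = 0.1
```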


Dynamic Programming
• A recurrence relation of Dynamic programming

– State equation

ẋ(t) = a(x(t), u(t))

– Performance measure, or cost

J = h(x(t_f)) + ∫_0^{t_f} g(x(t), u(t)) dt


Dynamic Programming
• A recurrence relation of Dynamic programming
– State equation

ẋ(t) = a(x(t), u(t))

– Approximate the continuously operating system by a discrete system:

(x(t + Δt) − x(t)) / Δt ≈ a(x(t), u(t)),  or  x(t + Δt) = x(t) + Δt · a(x(t), u(t))

– Using the shorthand notation
t = kΔt,  x(kΔt) ≜ x(k),  k = 0, 1, …, N − 1:

x(k + 1) = x(k) + Δt · a(x(k), u(k))

– We will denote this by
x(k + 1) ≜ a_D(x(k), u(k))
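The Euler approximation above can be sketched directly; the scalar dynamics a(x, u) = −x + u are an illustrative choice, not the lecture's system:

```python
# Forward-Euler discretization: x(k+1) = x(k) + dt * a(x(k), u(k)).
# The scalar dynamics a(x, u) = -x + u are an illustrative choice.

def a(x, u):
    return -x + u

def a_D(x, u, dt):
    """One step of the Euler-approximated discrete system."""
    return x + dt * a(x, u)

dt, N = 0.1, 5
x = 1.0
for k in range(N):
    x = a_D(x, u=0.0, dt=dt)        # zero control: x decays geometrically

print(x)  # close to (1 - dt)**N = 0.9**5
```

Smaller Δt makes the discrete trajectory track the continuous one more closely, at the price of more stages N for the same final time.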


Dynamic Programming
• A recurrence relation of Dynamic programming
– Performance measure

J = h(x(t_f)) + ∫_0^{t_f} g(x(t), u(t)) dt

– Operating on the performance measure in a similar manner:

J = h(x(NΔt)) + ∫_0^{Δt} g dt + ∫_{Δt}^{2Δt} g dt + ∫_{2Δt}^{3Δt} g dt + ⋯ + ∫_{(N−1)Δt}^{NΔt} g dt

J ≈ h(x(N)) + Δt · Σ_{k=0}^{N−1} g(x(k), u(k))   for small enough Δt

– We will denote this by

J = h(x(N)) + Σ_{k=0}^{N−1} g_D(x(k), u(k))
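Evaluating the discretized cost for a given control sequence can be sketched as follows; the dynamics, costs, horizon, and control sequence are illustrative assumptions:

```python
# Evaluate J = h(x(N)) + sum_{k=0}^{N-1} g_D(x(k), u(k)), with
# g_D = dt * g, for a candidate control sequence. The dynamics,
# costs, horizon, and control sequence are all illustrative.

def a(x, u):
    return -x + u                   # toy continuous dynamics

def g(x, u):
    return x**2 + u**2              # toy running cost

def h(x):
    return 10 * x**2                # toy terminal cost

dt, N = 0.1, 5
u_seq = [0.0] * N                   # candidate control sequence

x, J = 1.0, 0.0
for k in range(N):
    J += dt * g(x, u_seq[k])        # g_D(x(k), u(k))
    x += dt * a(x, u_seq[k])        # x(k+1) = a_D(x(k), u(k))
J += h(x)                           # terminal cost h(x(N))

print(J)
```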


Bellman’s principle of optimality
• A recurrence relation of Dynamic programming
– Derivation of the recurrence equation
– Cost of a one-stage process from state x(N − 1) to x(N):

J_{N−1,N}(x(N − 1), u(N − 1)) ≜ g_D(x(N − 1), u(N − 1)) + h(x(N))
= g_D(x(N − 1), u(N − 1)) + J_{N,N}(x(N))

– x(N) is related to x(N − 1) and u(N − 1) through the state equation:

J_{N−1,N}(x(N − 1), u(N − 1)) = g_D(x(N − 1), u(N − 1)) + J_{N,N}(a_D(x(N − 1), u(N − 1)))

– The optimal cost can be obtained by

J*_{N−1,N}(x(N − 1)) = min_{u(N−1)} { g_D(x(N − 1), u(N − 1)) + J_{N,N}(a_D(x(N − 1), u(N − 1))) }


Dynamic Programming
• A recurrence relation of Dynamic programming
– Cost of operation over the last two intervals:

J_{N−2,N}(x(N − 2), u(N − 2), u(N − 1)) = g_D(x(N − 2), u(N − 2)) + g_D(x(N − 1), u(N − 1)) + h(x(N))
= g_D(x(N − 2), u(N − 2)) + J_{N−1,N}(x(N − 1), u(N − 1))

– The optimal cost is then

J*_{N−2,N}(x(N − 2)) ≜ min_{u(N−2), u(N−1)} { g_D(x(N − 2), u(N − 2)) + J_{N−1,N}(x(N − 1), u(N − 1)) }
= min_{u(N−2)} { g_D(x(N − 2), u(N − 2)) + J*_{N−1,N}(x(N − 1)) }
= min_{u(N−2)} { g_D(x(N − 2), u(N − 2)) + J*_{N−1,N}(a_D(x(N − 2), u(N − 2))) }

– Cost of operation over the last three intervals:

J*_{N−3,N}(x(N − 3)) = min_{u(N−3)} { g_D(x(N − 3), u(N − 3)) + J*_{N−2,N}(a_D(x(N − 3), u(N − 3))) }


Dynamic Programming
• A recurrence relation of Dynamic programming
– Continuing backward in this manner, we obtain the optimal cost of the K-stage process:

J*_{N−K,N}(x(N − K)) = min_{u(N−K), u(N−K+1), …, u(N−1)} { h(x(N)) + Σ_{k=N−K}^{N−1} g_D(x(k), u(k)) }

– Recurrence relation:

J*_{N−K,N}(x(N − K)) = min_{u(N−K)} { g_D(x(N − K), u(N − K)) + J*_{N−(K−1),N}(a_D(x(N − K), u(N − K))) }
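Putting the pieces together — Euler-discretized dynamics, discretized stage cost, interpolation between grid states, and the backward recurrence — gives a minimal sketch of a DP solver. The system, costs, grids, and horizon are illustrative assumptions, not the lecture's specific problem:

```python
# Backward DP sweep implementing
#   J*_{N-K,N}(x) = min_u [ g_D(x, u) + J*_{N-(K-1),N}(a_D(x, u)) ]
# with linear interpolation of the cost-to-go between grid states.
# Dynamics, costs, grids, and horizon are illustrative assumptions.

dt, N = 0.1, 10
states = [i / 10 for i in range(11)]        # state grid on [0, 1]
controls = [-1.0, -0.5, 0.0, 0.5, 1.0]      # control grid

def a_D(x, u):
    return x + dt * (-x + u)                # Euler-discretized dynamics

def g_D(x, u):
    return dt * (x**2 + u**2)               # discretized running cost

def h(x):
    return 10 * x**2                        # terminal cost

def interp(J_grid, x):
    # Linear interpolation on the state grid; out-of-range states
    # are clamped to the boundary values (an assumption).
    if x <= states[0]:
        return J_grid[0]
    if x >= states[-1]:
        return J_grid[-1]
    for i in range(len(states) - 1):
        if states[i] <= x <= states[i + 1]:
            w = (x - states[i]) / (states[i + 1] - states[i])
            return (1 - w) * J_grid[i] + w * J_grid[i + 1]

J = [h(x) for x in states]                  # J*_{N,N}(x) = h(x)
for K in range(1, N + 1):                   # sweep backward: K = 1, ..., N
    J = [min(g_D(x, u) + interp(J, a_D(x, u)) for u in controls)
         for x in states]

print(J[-1])  # optimal cost-to-go from x(0) = 1.0
```

Each sweep replaces the stage-(N−K+1) cost-to-go array with the stage-(N−K) one, so memory is proportional to the state grid, not to the horizon.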


Dynamic Programming
• A recurrence relation of Dynamic programming

[Figure: one backward step; from x(N − K), the one-stage cost g_D(x(N − K), u(N − K)) leads to x(N − K + 1) = a_D(x(N − K), u(N − K)), where the optimal cost-to-go J*_{N−(K−1),N}(x(N − K + 1)) is already known.]

x(N − K + 1) = a_D(x(N − K), u(N − K))

J*_{N−K,N}(x(N − K)) = min_{u(N−K)} { g_D(x(N − K), u(N − K)) + J*_{N−(K−1),N}(a_D(x(N − K), u(N − K))) }
