Tutorial 8
Programming
IEDA 3010 Tutorial 8
Chen Yang
Correction
There is a mistake in the tutorial notes on the usage of K-out-of-N constraints.
(I have already corrected the related tutorial notes (6 & 7); please update your copies.)
$\sum_{i=1}^{N} y_i = N - K$
At most K constraints hold: $\sum_{i=1}^{N} y_i \ge N - K$ (This is WRONG)
Equivalently, at least K constraints hold: $\sum_{i=1}^{N} y_i \le N - K$
represents 𝑅3 + 𝑅2 + 𝑅4 . x
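As a quick sanity check of the corrected condition, the sketch below enumerates every binary vector y for a small hypothetical case (N = 3, K = 2). It assumes the usual big-M convention, which this note does not restate: y_i = 1 relaxes constraint i and y_i = 0 enforces it. The function name `enforced_counts` is illustrative only.

```python
from itertools import product

def enforced_counts(n, k):
    """Collect how many constraints are enforced for each feasible y.

    Assumed convention (not stated in this note): y_i = 1 relaxes
    constraint i through a big-M term, y_i = 0 enforces it.
    """
    counts = set()
    for y in product((0, 1), repeat=n):
        if sum(y) <= n - k:            # corrected "at least K hold" condition
            counts.add(n - sum(y))     # constraints with y_i = 0 are enforced
    return counts

print(sorted(enforced_counts(3, 2)))   # [2, 3]
```

Every feasible y enforces at least K of the N constraints, which is what the corrected inequality is meant to guarantee.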
Dynamic Programming
• Stage: $n$, the index of the time stage of the dynamic process.
• Total stages: $N$, the total number of stages.
• Decision: $x_n$, the decision to make at stage $n$.
• State: $s_n$, the current state at stage $n$. The decision $x_n$ changes the state from $s_n$ to $s_{n+1}$.
• Immediate cost: $c_{s_n, x_n}$, the direct cost of decision $x_n$ in the current state $s_n$. It is the cost incurred in stage $n$.
DP Formulation
• Cost function: $f_n(s_n, x_n)$, the total cost of the best overall policy for the remaining stages, i.e., the sum of the costs from stage $n$ to stage $N$. The arguments $(s_n, x_n)$ mean that we consider the cost of the best policy starting from stage $n$ in state $s_n$ with decision $x_n$.
Let $f_n^*(s_n) = \min_{x_n} f_n(s_n, x_n)$, the cost under the best decision in state $s_n$. Then
$f_n(s_n, x_n) = c_{s_n, x_n} + f_{n+1}^*(s_{n+1})$
The state $s_{n+1}$ is determined by $s_n$ and $x_n$, i.e., $s_{n+1} = s_{n+1}(s_n, x_n)$.
The recursive relationship is
$f_n^*(s_n) = \min_{x_n} \{ c_{s_n, x_n} + f_{n+1}^*(s_{n+1}) \}$
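The recursion needs a terminating condition to be well defined. A common convention, assumed here rather than stated in the slides, is that no cost is incurred beyond the last stage, so the last stage reduces to a plain minimization of the immediate cost:

```latex
f_{N+1}^*(s_{N+1}) = 0
\quad\Longrightarrow\quad
f_N^*(s_N) = \min_{x_N} c_{s_N, x_N}
```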
Backward Induction
Consider a three-stage problem.
• Let $\{s_3^1, \dots, s_3^{k_3}\}$ denote the possible states of stage 3, and calculate
$f_3^*(s_3^k) = \min_{x} c_{s_3^k, x}, \quad k = 1, \dots, k_3$
• Let $\{s_2^1, \dots, s_2^{k_2}\}$ denote the possible states of stage 2, and calculate
$f_2^*(s_2^k) = \min_{x} \{ c_{s_2^k, x} + f_3^*(s_3(s_2^k, x)) \}, \quad k = 1, \dots, k_2$
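The backward pass can be sketched as a small table-filling routine. This is a minimal illustration, not a reference implementation: the function `backward_induction`, its argument names, and the toy cost and transition functions are all hypothetical, and costs beyond the last stage are assumed to be zero.

```python
def backward_induction(states, decisions, cost, transition, n_stages):
    """Tabulate f*_n(s) and one optimal decision for every stage and state.

    states[n]           -> list of possible states at stage n (1-based)
    decisions(n, s)     -> feasible decisions in state s at stage n
    cost(n, s, x)       -> immediate cost c_{s,x}
    transition(n, s, x) -> next state s_{n+1}(s, x)
    """
    f_star, policy = {}, {}
    for n in range(n_stages, 0, -1):          # last stage first
        f_star[n], policy[n] = {}, {}
        for s in states[n]:
            best_x, best_val = None, float("inf")
            for x in decisions(n, s):
                val = cost(n, s, x)
                if n < n_stages:              # assumed: no cost after stage N
                    val += f_star[n + 1][transition(n, s, x)]
                if val < best_val:
                    best_x, best_val = x, val
            f_star[n][s], policy[n][s] = best_val, best_x
    return f_star, policy

# Hypothetical three-stage toy problem: the decision picks the next state.
toy_states = {1: [0, 1], 2: [0, 1], 3: [0, 1]}
toy_cost = lambda n, s, x: 3 * abs(s - x) + (n if x == 1 else 2)
f_star, policy = backward_induction(
    toy_states,
    decisions=lambda n, s: [0, 1],
    cost=toy_cost,
    transition=lambda n, s, x: x,
    n_stages=3,
)
print(f_star[1], policy[1])
```

Each stage only reads the table of the stage after it, which is why the stages are processed from the last one back to the first.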
(a) What are the stages and states for the dynamic
programming formulation of this problem?
(b) Use dynamic programming to solve this problem.
Solution
In total, there are 5 stages (5 steps from START to FINISH). The possible
states of a stage are the nodes in the same column.
Let $s_n$ denote the starting point of stage $n$ and $x_n$ denote the ending
point of stage $n$. Hence, $x_n = s_{n+1}$.
• Stage 5
$s_5$   $x_5^*$   $f_5^*(s_5)$
J       FINISH    5
K       FINISH    4
L       FINISH    7
• Stage 4
$f_4^*(s_4) = \max_{x_4} \{ c_{s_4, x_4} + f_5^*(s_5) \}$
$s_4$   $x_4$   $f_4(s_4, x_4)$   $x_4^*$   $f_4^*(s_4)$
F       J       6                 J         6
G       K       8                 K         8
H       K       10                K         10
I       L       9                 L         9
• Stage 3
$f_3^*(s_3) = \max_{x_3} \{ c_{s_3, x_3} + f_4^*(s_4) \}$
$s_3$   $x_3$   $f_3(s_3, x_3)$   $x_3^*$   $f_3^*(s_3)$
C       G, F    12, 10            G         12
D       H, I    12, 11            H         12
E       H, I    13, 12            H         13
• Stage 2
$f_2^*(s_2) = \max_{x_2} \{ c_{s_2, x_2} + f_3^*(s_3) \}$
$s_2$   $x_2$   $f_2(s_2, x_2)$   $x_2^*$   $f_2^*(s_2)$
A       C, D    17, 17            C, D      17
B       E       16                E         16
• Stage 1
$f_1^*(s_1) = \max_{x_1} \{ c_{s_1, x_1} + f_2^*(s_2) \}$
$s_1$   $x_1$   $f_1(s_1, x_1)$   $x_1^*$   $f_1^*(s_1)$
START   A, B    17, 16            A         17
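Retrieve the optimal solution by following the optimal decisions $x_n^*$ from START. Because stage 2 has a tie at A, there are two optimal paths, each with total value 17:
START → A → C → G → K → FINISH
START → A → D → H → K → FINISH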
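The stage tables and the path retrieval can be reproduced with a short recursive sketch. The network figure is not reproduced in this note, so the arc values in `ARCS` are an assumption: they were back-computed from the tables as $c_{s,x} = f_n(s, x) - f_{n+1}^*(x)$. The names `ARCS`, `f_star`, and `optimal_paths` are illustrative.

```python
from functools import lru_cache

# Assumed arc values, back-computed from the stage tables (not from the figure).
ARCS = {
    "START": {"A": 0, "B": 0},
    "A": {"C": 5, "D": 5},
    "B": {"E": 3},
    "C": {"F": 4, "G": 4},
    "D": {"H": 2, "I": 2},
    "E": {"H": 3, "I": 3},
    "F": {"J": 1},
    "G": {"K": 4},
    "H": {"K": 6},
    "I": {"L": 2},
    "J": {"FINISH": 5},
    "K": {"FINISH": 4},
    "L": {"FINISH": 7},
}

@lru_cache(maxsize=None)
def f_star(node):
    """Best (maximum) total value from `node` to FINISH."""
    if node == "FINISH":
        return 0
    return max(c + f_star(nxt) for nxt, c in ARCS[node].items())

def optimal_paths(node="START"):
    """List every path that attains f_star(node), keeping tied decisions."""
    if node == "FINISH":
        return [["FINISH"]]
    paths = []
    for nxt, c in ARCS[node].items():
        if c + f_star(nxt) == f_star(node):
            paths += [[node] + tail for tail in optimal_paths(nxt)]
    return paths

print(f_star("START"))            # 17
for path in optimal_paths():
    print(" -> ".join(path))
```

Keeping every decision that ties for the maximum, rather than only the first one, is what recovers both optimal paths.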
Stochastic DP
The next state is not completely determined by the current state and decision:
$s_{n+1}(s_n, x_n) \rightarrow p(s_{n+1} \mid s_n, x_n)$
However, its probability distribution is determined by them.
[Figure: decision tree with Win (0.5) and Lose (0.5) branches. Recoverable labels: $x_1^* = 0$, $s_2 = 75$, $x_2^* = 25$, $s_3 = 50$, $x_3^* = 50$, $s_4 = 0$.]
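In the stochastic case the recursion takes an expectation over the next state. The sketch below uses a hypothetical betting game whose parameters are assumptions, chosen only to be compatible with the labels in the tree: start with 75, place three bets in multiples of 25, win or lose the stake with probability 0.5 each, and maximize the probability of finishing with at least 100. The recursion is $f_n^*(s) = \max_x \{ p\, f_{n+1}^*(s + x) + (1 - p)\, f_{n+1}^*(s - x) \}$.

```python
from functools import lru_cache

# Assumed game parameters (not given in the text of this note).
P_WIN, TARGET, N_BETS, STEP = 0.5, 100, 3, 25

@lru_cache(maxsize=None)
def best(n, s):
    """Return (max probability of ending with >= TARGET, a best bet).

    n is the stage (1-based) and s the money held at the start of stage n.
    Ties are broken in favour of the smallest bet.
    """
    if n > N_BETS:                                # after the last bet
        return (1.0 if s >= TARGET else 0.0), None
    best_val, best_bet = -1.0, None
    for x in range(0, s + 1, STEP):               # feasible stakes
        val = (P_WIN * best(n + 1, s + x)[0]
               + (1 - P_WIN) * best(n + 1, s - x)[0])
        if val > best_val:
            best_val, best_bet = val, x
    return best_val, best_bet

print(best(1, 75))   # (0.75, 0)
print(best(2, 75))   # (0.75, 25)
print(best(3, 50))   # (0.5, 50)
```

Under these assumptions the computed decisions along the losing branch (bet 0, then 25, then 50) agree with the $x_n^*$ labels in the tree.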