
Tutorial 8

This document discusses dynamic programming and its application to solving optimization problems. It begins by explaining the key components of a dynamic programming problem, including stages, decisions, states, and costs. It then provides the recursive formulation to solve such problems using backward induction. The document also provides an example problem applying these concepts to find the longest path through a network. It discusses how to model the problem as a dynamic programming problem by defining the stages and states. It then walks through applying backward induction to solve the problem. Finally, it briefly discusses how to extend the approach to stochastic dynamic programming problems where the next state depends on a probability distribution rather than being fully determined.

Uploaded by karly yu
Copyright © All Rights Reserved

Binary Usage & Dynamic Programming
IEDA 3010 Tutorial 8
Chen Yang
Correction
There is a mistake in the tutorial note on the usage of K out of N constraints.
(I have already corrected the related tutorial notes (6 & 7); please update your copies.)

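▪ K out of N Constraints Must Hold (At least K constraints hold)

$$\text{lhs}_i \le \text{rhs}_i + M y_i, \quad i = 1, \dots, N$$
$$\sum_{i=1}^{N} y_i = N - K, \quad y_i \in \{0, 1\}$$

where $M$ is a sufficiently large constant, so $y_i = 1$ relaxes constraint $i$.

At most K constraints hold: $\sum_{i=1}^{N} y_i \ge N - K$ (This is WRONG)
Equivalent formulation of "at least K constraints hold": $\sum_{i=1}^{N} y_i \le N - K$ (at most $N - K$ constraints are relaxed)

The case of at most K constraints holding is more complicated and, in fact, cannot be represented exactly.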
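As a sanity check of the correction, here is a minimal Python sketch (not part of the original tutorial). It abstracts away the actual constraints. Assuming $M$ is large enough that $y_i = 1$ switches constraint $i$ off, a point is feasible exactly when some binary $y$ has $y_i = 1$ for every violated constraint and also satisfies the cardinality row. The function names are hypothetical, and the brute force over $y$ is only meant for tiny $N$.

```python
from itertools import product


def feasible_with_big_m(holds, k, mode):
    """Return True if some binary vector y makes the big-M system feasible.

    holds[i] is True when constraint i is satisfied at the point being tested.
    Assumption: M is large enough that y_i = 1 relaxes constraint i entirely,
    so y_i = 0 is only allowed for constraints that actually hold.
    mode picks the cardinality row: "eq" -> sum(y) == N - K,
    "le" -> sum(y) <= N - K, "ge" -> sum(y) >= N - K (the wrong version).
    """
    n = len(holds)
    for y in product((0, 1), repeat=n):
        if any(yi == 0 and not h for yi, h in zip(y, holds)):
            continue  # a constraint enforced by y_i = 0 is violated
        total = sum(y)
        if mode == "eq" and total == n - k:
            return True
        if mode == "le" and total <= n - k:
            return True
        if mode == "ge" and total >= n - k:
            return True
    return False


def formulations_agree(n, k):
    """Check "eq" and "le" against "at least k of n constraints hold"."""
    for holds in product((False, True), repeat=n):
        at_least_k = sum(holds) >= k
        if feasible_with_big_m(holds, k, "eq") != at_least_k:
            return False
        if feasible_with_big_m(holds, k, "le") != at_least_k:
            return False
    return True


if __name__ == "__main__":
    print(formulations_agree(4, 2))  # True
    # The ">= N - K" row does not mean "at most K hold": with all 4
    # constraints satisfied, y = (1, 1, 1, 1) is still feasible.
    print(feasible_with_big_m((True, True, True, True), 2, "ge"))  # True
```

The enumeration makes the key point visible: relaxing a constraint never forces it to be violated, so no choice of the cardinality row can cap how many constraints hold.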
At Most K Constraints Hold

We start with the case of two constraints:

cons 1: $x + y \le 2$
cons 2: $x - y \le 0$

These two constraints separate the plane into four regions, denoted $R_1, R_2, R_3, R_4$.

[Figure: the lines $x + y = 2$ and $x - y = 0$ divide the $(x, y)$-plane into $R_1$ (both constraints hold), $R_2$ (only cons 1 holds), $R_3$ (neither holds) and $R_4$ (only cons 2 holds).]

Then,
- At least one constraint holds: $R_1 + R_2 + R_4$
- At most one constraint holds: $R_3 + R_2 + R_4$ − boundary points (the points of $R_1$ on the two lines satisfy both constraints, so they must be excluded)

Closed set: a set is closed if and only if it contains all of its boundary points.

Conclusion: the region where at most one constraint holds cannot be represented by constraints, because it is not closed, whereas a feasible region defined by non-strict inequality (or equality) constraints is always closed.
Representation of $R_3 + R_2 + R_4$

Consider the either-or usage, with a sufficiently large constant $M$ and $z \in \{0, 1\}$:
$$x + y \le 2 + Mz$$
$$x - y \le 0 + M(1 - z)$$
• For $(x, y) \in R_1$, both constraints are satisfied whether $z = 0$ or $z = 1$.
• For $(x, y) \in R_2$, let $z = 0$; then $(x, y)$ is feasible.
• For $(x, y) \in R_4$, let $z = 1$; then $(x, y)$ is feasible.
• For $(x, y) \in R_3$, $(x, y)$ can never be feasible.

Hence, the either-or formulation is equivalent to requiring that at least one constraint holds ($R_1 + R_2 + R_4$), which is not the region we want.
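The four bullet cases can be checked numerically. This is a minimal sketch, not from the tutorial: it assumes $M = 1000$ is "large enough" for the sample grid (coordinates between −5 and 5 in steps of 0.5) and tries both values of $z$ at every grid point. The function names are hypothetical.

```python
M = 1000.0  # assumed "sufficiently large" for the sample points used here


def either_or_feasible(x, y):
    """True if z = 0 or z = 1 satisfies both big-M constraints."""
    for z in (0, 1):
        if x + y <= 2 + M * z and x - y <= 0 + M * (1 - z):
            return True
    return False


def at_least_one_holds(x, y):
    """cons 1 or cons 2 holds at (x, y)."""
    return x + y <= 2 or x - y <= 0


def agree_on_grid():
    """Compare the two on the grid -5.0, -4.5, ..., 5.0 in each coordinate."""
    grid = [i / 2 for i in range(-10, 11)]
    return all(either_or_feasible(x, y) == at_least_one_holds(x, y)
               for x in grid for y in grid)


if __name__ == "__main__":
    print(agree_on_grid())            # True
    print(either_or_feasible(3, 0))   # False: (3, 0) is in R3
    print(either_or_feasible(0, 3))   # True: (0, 3) is in R4
```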
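Representation of $R_3 + R_2 + R_4$

Instead, negate both sides of each constraint but keep the $\le$ sign (this reverses each constraint):
cons 1': $-x - y \le -2$
cons 2': $-x + y \le 0$

The region where at least one of cons 1' and cons 2' holds is $R_2 + R_3 + R_4$ including the boundary lines, i.e. the closure of the at-most-one region. Since that region itself is not closed, this is the region we want up to its boundary. Hence,
$$-x - y \le -2 + Mz$$
$$-x + y \le 0 + M(1 - z)$$
represents $R_3 + R_2 + R_4$ (with the boundary points included).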
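A quick numerical check of the flipped system, as a minimal sketch under the same assumptions as before: $M = 1000$ is taken to be large enough for a grid with coordinates between −5 and 5 in steps of 0.5, and the helper names are hypothetical. It confirms that the system accepts exactly the at-most-one region plus the boundary points of $R_1$. For example, it accepts the intersection point $(1, 1)$, where both original constraints hold with equality.

```python
M = 1000.0  # assumed "sufficiently large" for the sample points used here


def flipped_feasible(x, y):
    """Big-M either-or system built from cons 1' and cons 2'."""
    for z in (0, 1):
        if -x - y <= -2 + M * z and -x + y <= 0 + M * (1 - z):
            return True
    return False


def at_most_one_holds(x, y):
    """Not both of cons 1 and cons 2 hold (an open region)."""
    both = x + y <= 2 and x - y <= 0
    return not both


def on_boundary_of_r1(x, y):
    """Points where both constraints hold and at least one is tight."""
    both = x + y <= 2 and x - y <= 0
    return both and (x + y == 2 or x - y == 0)


def matches_closure_on_grid():
    grid = [i / 2 for i in range(-10, 11)]
    return all(
        flipped_feasible(x, y) == (at_most_one_holds(x, y) or on_boundary_of_r1(x, y))
        for x in grid for y in grid
    )


if __name__ == "__main__":
    print(matches_closure_on_grid())   # True
    print(flipped_feasible(1, 1), at_most_one_holds(1, 1))  # True False
    print(flipped_feasible(-1, 0))     # False: interior point of R1
```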
Dynamic Programming
• Stage: $n$, the index of the stage of the dynamic process.
• Total stages: $N$, the total number of stages.
• Decision: $x_n$, the decision made at stage $n$.
• State: $s_n$, the current state at stage $n$. The decision $x_n$, together with $s_n$, determines the next state.
• Immediate cost: $c_{s_n, x_n}$, the direct cost of decision $x_n$ in the current state $s_n$, i.e. the cost incurred in stage $n$.
DP Formulation
• Cost function: $f_n(s_n, x_n)$, the total cost of the best overall policy for the remaining stages, i.e. the sum of costs from stage $n$ to stage $N$, when stage $n$ starts in state $s_n$ and decision $x_n$ is taken.

Let $f_n^*(s_n) = \min_{x_n} f_n(s_n, x_n)$, the cost under the best decision in state $s_n$. Then
$$f_n(s_n, x_n) = c_{s_n, x_n} + f_{n+1}^*(s_{n+1}),$$
where the state $s_{n+1}$ is determined by $s_n$ and $x_n$, i.e. $s_{n+1} = s_{n+1}(s_n, x_n)$. The recursive relationship is
$$f_n^*(s_n) = \min_{x_n} \left\{ c_{s_n, x_n} + f_{n+1}^*(s_{n+1}) \right\},$$
with $f_{N+1}^* \equiv 0$ after the last stage.
Backward Induction
Consider a three-stage problem.
• Let $\{s_3^1, \dots, s_3^{k_3}\}$ denote the possible states of stage 3, and calculate
$$f_3^*(s_3^k) = \min_x c_{s_3^k, x}, \quad k = 1, \dots, k_3.$$
• Let $\{s_2^1, \dots, s_2^{k_2}\}$ denote the possible states of stage 2, and calculate
$$f_2^*(s_2^k) = \min_x \left\{ c_{s_2^k, x} + f_3^*\big(s_3(s_2^k, x)\big) \right\}, \quad k = 1, \dots, k_2.$$
• Let $s_1$ denote the starting state of stage 1, and calculate
$$f_1^*(s_1) = \min_x \left\{ c_{s_1, x} + f_2^*\big(s_2(s_1, x)\big) \right\}.$$
Then, retrieve the optimal solution by following the optimal decisions forward:
$$s_1 \to x_1^* \to s_2 \to x_2^* \to s_3 \to x_3^*$$
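The three-stage scheme above generalizes directly to $N$ stages. This is a minimal Python sketch, not the tutorial's own code. States and decisions are arbitrary hashable values, and the caller supplies hypothetical callbacks `decisions`, `transition` and `cost`. Stages are 0-indexed in the code. The small shortest-path instance at the bottom is made up purely for illustration.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def backward_induction(
    stages: List[List[Hashable]],
    decisions: Callable[[int, Hashable], List[Hashable]],
    transition: Callable[[int, Hashable, Hashable], Hashable],
    cost: Callable[[int, Hashable, Hashable], float],
) -> Tuple[List[Dict[Hashable, float]], List[Dict[Hashable, Hashable]]]:
    """Minimise total cost by backward induction.

    Stages are 0-indexed here: stages[0] holds the starting state s_1 and
    stages[N - 1] the states of the last stage. Returns f_star[n][s]
    (the optimal cost-to-go) and best[n][s] (one minimising decision).
    """
    N = len(stages)
    f_star: List[Dict[Hashable, float]] = [{} for _ in range(N)]
    best: List[Dict[Hashable, Hashable]] = [{} for _ in range(N)]
    for n in reversed(range(N)):
        for s in stages[n]:
            candidates = []
            for x in decisions(n, s):
                # f*_{N+1} = 0: nothing is left after the last stage.
                future = 0.0 if n == N - 1 else f_star[n + 1][transition(n, s, x)]
                candidates.append((cost(n, s, x) + future, x))
            value, x_best = min(candidates, key=lambda c: c[0])
            f_star[n][s] = value
            best[n][s] = x_best
    return f_star, best


def retrieve(start, best, transition):
    """Follow s_1 -> x_1* -> s_2 -> x_2* -> ... forward through the stages."""
    plan, s = [], start
    for n in range(len(best)):
        x = best[n][s]
        plan.append((s, x))
        s = transition(n, s, x)
    return plan


# Hypothetical three-stage shortest-path instance: the decision is the next node.
EDGES = {("S", "A"): 2, ("S", "B"): 4, ("A", "C"): 7, ("A", "D"): 3,
         ("B", "C"): 1, ("B", "D"): 5, ("C", "T"): 2, ("D", "T"): 4}
STAGES = [["S"], ["A", "B"], ["C", "D"]]


def next_nodes(n, s):
    return [v for (u, v) in EDGES if u == s]


def move(n, s, x):
    return x


def edge_cost(n, s, x):
    return EDGES[(s, x)]


if __name__ == "__main__":
    f_star, best = backward_induction(STAGES, next_nodes, move, edge_cost)
    print(f_star[0]["S"])             # 7.0
    print(retrieve("S", best, move))  # [('S', 'B'), ('B', 'C'), ('C', 'T')]
```

Passing the transition as a function keeps the sketch close to the notation $s_{n+1} = s_{n+1}(s_n, x_n)$. Ties are broken by taking the first minimiser found.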
Example 1
Consider the following project network, where the number over each
node is the time required for the corresponding activity. Consider the
problem of finding the longest path (the largest total time) through
this network from start to finish, since the longest path is the critical
path.

(a) What are the stages and states for the dynamic
programming formulation of this problem?
(b) Use dynamic programming to solve this problem.
Solution
In total, there are 5 stages (5 steps from START to FINISH). The possible states of a stage are the nodes in the same column of the network: stage 1: START; stage 2: A, B; stage 3: C, D, E; stage 4: F, G, H, I; stage 5: J, K, L.
Let $s_n$ denote the node where stage $n$ starts and $x_n$ the node where it ends. Hence, $x_n = s_{n+1}$.

• Stage 5

| $s_5$ | $x_5^*$ | $f_5^*(s_5)$ |
|---|---|---|
| J | FINISH | 5 |
| K | FINISH | 4 |
| L | FINISH | 7 |
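• Stage 4, with $f_4^*(s_4) = \max_{x_4} \{ c_{s_4, x_4} + f_5^*(s_5) \}$ and $s_5 = x_4$

| $s_4$ | $x_4$ | $f_4(s_4, x_4)$ | $x_4^*$ | $f_4^*(s_4)$ |
|---|---|---|---|---|
| F | J | 6 | J | 6 |
| G | K | 8 | K | 8 |
| H | K | 10 | K | 10 |
| I | L | 9 | L | 9 |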

• Stage 3, with $f_3^*(s_3) = \max_{x_3} \{ c_{s_3, x_3} + f_4^*(s_4) \}$ and $s_4 = x_3$

| $s_3$ | $x_3$ | $f_3(s_3, x_3)$ | $x_3^*$ | $f_3^*(s_3)$ |
|---|---|---|---|---|
| C | G, F | 12, 10 | G | 12 |
| D | H, I | 12, 11 | H | 12 |
| E | H, I | 13, 12 | H | 13 |
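• Stage 2, with $f_2^*(s_2) = \max_{x_2} \{ c_{s_2, x_2} + f_3^*(s_3) \}$ and $s_3 = x_2$

| $s_2$ | $x_2$ | $f_2(s_2, x_2)$ | $x_2^*$ | $f_2^*(s_2)$ |
|---|---|---|---|---|
| A | C, D | 17, 17 | C, D | 17 |
| B | E | 16 | E | 16 |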

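• Stage 1, with $f_1^*(s_1) = \max_{x_1} \{ c_{s_1, x_1} + f_2^*(s_2) \}$ and $s_2 = x_1$

| $s_1$ | $x_1$ | $f_1(s_1, x_1)$ | $x_1^*$ | $f_1^*(s_1)$ |
|---|---|---|---|---|
| START | A, B | 17, 16 | A | 17 |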
Retrieve the optimal solution (the tie at A gives two longest paths, each with total time 17):
START → A → C → G → K → FINISH
START → A → D → H → K → FINISH
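The network figure is not reproduced in this text, but its data can be recovered from the stage tables. This is a minimal Python sketch under that assumption. The arcs come from the $x_n$ columns, and each activity time is inferred by subtracting $f_{n+1}^*$ from $f_n$. This works if the stage cost $c_{s_n, x_n}$ is the time of the node being left (START takes 0), which matches every table entry. The code reruns backward induction with max and keeps all tied decisions.

```python
# Activity times inferred from the stage tables (assumption: the cost of
# leaving node s is the activity time of s; START takes 0).
TIME = {"START": 0, "A": 5, "B": 3, "C": 4, "D": 2, "E": 3,
        "F": 1, "G": 4, "H": 6, "I": 2, "J": 5, "K": 4, "L": 7}
# Arcs inferred from the x_n columns of the tables.
SUCC = {"START": ["A", "B"], "A": ["C", "D"], "B": ["E"],
        "C": ["G", "F"], "D": ["H", "I"], "E": ["H", "I"],
        "F": ["J"], "G": ["K"], "H": ["K"], "I": ["L"],
        "J": ["FINISH"], "K": ["FINISH"], "L": ["FINISH"]}
STAGES = [["START"], ["A", "B"], ["C", "D", "E"], ["F", "G", "H", "I"], ["J", "K", "L"]]


def longest_paths():
    """Backward induction with max instead of min; keeps every tied x_n*."""
    f = {"FINISH": 0}  # nothing remains after FINISH
    best = {}
    for states in reversed(STAGES):
        for s in states:
            values = {x: TIME[s] + f[x] for x in SUCC[s]}
            f[s] = max(values.values())
            best[s] = [x for x, v in values.items() if v == f[s]]
    return f, best


def optimal_paths(best, node="START"):
    """Expand all maximising decisions from node to FINISH."""
    if node == "FINISH":
        return [["FINISH"]]
    return [[node] + rest for x in best[node] for rest in optimal_paths(best, x)]


if __name__ == "__main__":
    f, best = longest_paths()
    print(f["START"])  # 17
    for path in optimal_paths(best):
        print(" -> ".join(path))
    # START -> A -> C -> G -> K -> FINISH
    # START -> A -> D -> H -> K -> FINISH
```

Keeping a list of maximisers per node, rather than a single one, is what allows both critical paths to be recovered from the tie at A.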
Stochastic DP
The next state is not completely determined by the current state and decision:
$$s_{n+1}(s_n, x_n) \;\to\; p(s_{n+1} \mid s_n, x_n)$$
but its probability distribution is determined by them.

Calculate the expected value (cost or profit) of the future stages:

Deterministic: $$f_n^*(s_n) = \min_{x_n} \left\{ c_{s_n, x_n} + f_{n+1}^*(s_{n+1}) \right\}$$

Stochastic: $$f_n^*(s_n) = \min_{x_n} \mathbb{E}_{P_{s_n, x_n}} \left[ c_{s_n, x_n} + f_{n+1}^*(s_{n+1}) \right]$$

Note that this formula does not apply in general; the exact formulation depends on the problem setting.
Example 2
A backgammon player will be playing three consecutive matches with
friends tonight. For each match, he will have the opportunity to place an
even bet that he will win; the amount to bet can be any quantity of his
choice between zero and the amount of money he still has left after the
bets on the preceding matches. For each match, the probability is 0.5
that he will win the match and thus gain the amount to bet, whereas the
probability is 0.5 that he will lose the match and thus lose the amount to
bet. He will begin with $75, and his goal is to have $100 at the end.
(Because these are friendly matches, he does not want to end up with
more than $100.) Therefore, he wants to find the optimal betting policy
(including all ties) that maximizes the probability that he will have exactly
$100 after the three matches.

Use dynamic programming to solve this problem.


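Solution
Let $s_n$ denote the money before the $n$-th match and $x_n$ the amount to bet. Let $f_n(s_n, x_n)$ be the probability of having exactly 100 dollars at the end of the game, when he bets $x_n$ holding $s_n$ and bets optimally afterwards:
$$f_n(s_n, x_n) = P(s_4 = 100)$$
The feasible range of $x_n$ is always $0 \le x_n \le s_n$.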
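• Stage 3
After the last match, $s_4 = s_3 + x_3$ with probability 0.5 (win) and $s_4 = s_3 - x_3$ with probability 0.5 (loss), so
$$f_3(s_3, x_3) = P(s_4 = 100) = 0.5 \cdot \mathbf{1}[s_3 + x_3 = 100] + 0.5 \cdot \mathbf{1}[s_3 - x_3 = 100]$$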
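The stage-3 table is not reproduced in this text. Working the formula above by hand gives the following. This is a sketch assuming money can be any nonnegative amount. A win can reach 100 only if $s_3 \ge 50$, and a loss can reach 100 only if $s_3 \ge 100$.

$$
f_3^*(s_3) =
\begin{cases}
0, & 0 \le s_3 < 50 \quad (\text{no bet can reach } 100), \\
0.5, & 50 \le s_3 < 100 \quad (x_3^* = 100 - s_3), \\
1, & s_3 = 100 \quad (x_3^* = 0), \\
0.5, & s_3 > 100 \quad (x_3^* = s_3 - 100).
\end{cases}
$$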
• Stage 2
$$f_2(s_2, x_2) = P(s_4 = 100) = 0.5 f_3^*(s_2 + x_2) + 0.5 f_3^*(s_2 - x_2)$$
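Substituting the piecewise values of $f_3^*$ (worked by hand, under the same continuous-money assumption) gives the stage-2 values. A value of 0.75 requires one outcome to land exactly on 100 and the other to land at 50 or more.

$$
f_2^*(s_2) =
\begin{cases}
0, & 0 \le s_2 < 25, \\
0.25, & 25 \le s_2 < 50 \quad (\text{any } x_2 \ge 50 - s_2), \\
0.5, & 50 \le s_2 < 75 \quad (\text{e.g. } x_2^* = 0 \text{ or } 100 - s_2), \\
0.75, & 75 \le s_2 < 100 \quad (x_2^* = 100 - s_2), \\
1, & s_2 = 100 \quad (x_2^* = 0), \\
0.75, & s_2 > 100 \quad (x_2^* = s_2 - 100).
\end{cases}
$$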
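• Stage 1
We have $s_1 = 75$ and
$$f_1^*(s_1) = \max_{0 \le x \le 75} \left\{ 0.5 f_2^*(75 + x) + 0.5 f_2^*(75 - x) \right\}$$
Evaluating this with $f_2^*$ gives $f_1^*(75) = 0.75$, attained at $x = 0$ and at $x = 25$; every other bet gives at most 0.625.

Hence, $x_1^* = 0$ or $25$.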


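Retrieve the optimal policy (each match is won or lost with probability 0.5):
• Case 1: $x_1^* = 0$, so $s_2 = 75$ after either outcome, and $x_2^* = 25$.
  - Win: $s_3 = 100$, $x_3^* = 0$, so $s_4 = 100$.
  - Lose: $s_3 = 50$, $x_3^* = 50$; a win gives $s_4 = 100$, a loss gives $s_4 = 0$.
• Case 2: $x_1^* = 25$.
  - Win: $s_2 = 100$, $x_2^* = 0$, $s_3 = 100$, $x_3^* = 0$, so $s_4 = 100$.
  - Lose: $s_2 = 50$, $x_2^* = 50$; a win gives $s_3 = 100$ and then $x_3^* = 0$; a loss gives $s_3 = 0$, and he gives up.

In both cases, $P(s_4 = 100) = 0.5 + 0.5 \times 0.5 = 0.75$.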
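The whole recursion can be checked by brute force. This is a minimal Python sketch with one added assumption: bets are restricted to whole dollars. The problem allows any amount, but every optimal bet found above is a whole number. The function names are hypothetical, and `lru_cache` memoizes $f_n^*(s)$ so each state is solved once.

```python
from functools import lru_cache

TARGET = 100  # he wants exactly $100 after the last match
N_MATCHES = 3
P_WIN = 0.5


@lru_cache(maxsize=None)
def f_star(n, s):
    """Best probability of finishing with TARGET, holding s dollars before match n."""
    if n > N_MATCHES:  # s is s_4, the final amount
        return 1.0 if s == TARGET else 0.0
    return max(f_value(n, s, x) for x in range(s + 1))


def f_value(n, s, x):
    """f_n(s_n, x_n): bet x on match n, then bet optimally afterwards."""
    return P_WIN * f_star(n + 1, s + x) + (1 - P_WIN) * f_star(n + 1, s - x)


def optimal_bets(n, s):
    """All whole-dollar bets that attain f_n*(s)."""
    best = f_star(n, s)
    return [x for x in range(s + 1) if f_value(n, s, x) == best]


if __name__ == "__main__":
    print(f_star(1, 75))        # 0.75
    print(optimal_bets(1, 75))  # [0, 25]
    print(optimal_bets(2, 75))  # [25]
    print(optimal_bets(2, 50))  # [0, 50]
```

The output reproduces both cases of the policy. At $s_2 = 50$ the sketch also reports the tie between betting 0 and betting 50, which the problem's "including all ties" asks for.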
