Part4 2 Stochastic Full

The document describes a dynamic programming problem involving a student who believes he can win at blackjack with a probability of 2/3. He has made a bet with his classmates that over 3 plays of blackjack, starting with 3 chips, he will have at least 5 chips. The goal is to determine an optimal betting policy that maximizes his probability of winning the bet. A dynamic programming formulation and solution procedure is presented, involving defining the state as the number of chips at each play, decisions as chip bets, and the objective as the probability of having at least 5 chips after 3 plays. The solution works backwards from the last play to determine the optimal bets maximizing the probability of winning the bet.

Uploaded by

karly yu

IEDA 3010

Prescriptive Analytics
Dynamic Programming 2

IEDA 3010
Dynamic Programming

Dr. Jin QI
Department of Industrial Engineering and Decision Analytics
Hong Kong University of Science and Technology

Introduction
• The state at the next stage is not completely
determined by the state and policy decision at the
current stage.
– What the next state will be follows a probability distribution
– This probability distribution is completely determined by the
state and policy decision at the current stage
• The recursive relationship involves the expected profit
or cost from the future stages
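
As a sketch of the recursive relationship mentioned above, in notation that is assumed here rather than defined on this slide: let $p_i$ be the probability of moving to state $i$ at stage $n+1$ (for $i = 1, \dots, S$) given the current state $s_n$ and decision $x_n$, and let $C_i$ be the contribution from stage $n$ in that case. When the objective is an expected sum of stage contributions, the recursion would take roughly this form:

```latex
f_n(s_n, x_n) = \sum_{i=1}^{S} p_i \left[ C_i + f_{n+1}^*(i) \right],
\qquad
f_{n+1}^*(i) = \min_{x_{n+1}} f_{n+1}(i, x_{n+1})
\quad \text{(or } \max \text{ for a profit objective)}
```

Other objectives, such as maximizing a probability, lead to a different but analogous form.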

[Figure 10.11: Basic structure for stochastic DP models. From state s_n at stage n, decision x_n leads with probability p_i to state i at stage n+1 (i = 1, 2, ..., S). C_i is the contribution from stage n in that case, f*_{n+1}(i) is the optimal value from state i onward, and f_n(s_n, x_n) is the value of the decision.]

Example 1: Winning in Macau


• A UST student believes he has developed a system for winning
Blackjack, a popular game in Macau casinos.
• He believes his system will give him a probability of ⅔ of
winning a given play of the game.
• But his classmates do not believe him so they have made a
large bet with him that if he starts with three chips, he will not
have at least five chips after three plays of the game.
• Each play of the game involves betting any desired number of
available chips and then either winning or losing this number of
chips.
• Goal: determine a betting policy that maximizes his probability
of winning the bet with his classmates.


DP formulation
• DP formulation
– Stage n: nth play of the game, n = 1,2,3
– Decision variable xn: number of chips to bet at stage n
– State sn: number of chips in hand at beginning of stage n
– Profit function fn(sn, xn): probability of finishing three plays
with at least five chips, given that he starts stage n in state
sn, makes immediate decision xn, and makes optimal
decisions thereafter


DP formulation
• DP recursion
– Given sn and n, let $x_n^*$ denote any value of xn that maximizes fn(sn, xn), and let $f_n^*(s_n)$ be the corresponding maximum value of fn(sn, xn)
  $f_n^*(s_n) = \max_{x_n = 0, 1, \dots, s_n} f_n(s_n, x_n) = f_n(s_n, x_n^*)$
– $f_n^*(s_n)$ is the maximum probability of finishing three plays with at least five chips, given that he starts stage n in state sn

[Figure: structure for the blackjack problem. From state s_n at stage n with decision x_n, the next state is s_n − x_n with probability 1/3 and s_n + x_n with probability 2/3. The contribution from stage n is 0 in both cases, so the value of the decision is f_n(s_n, x_n) = (1/3) f*_{n+1}(s_n − x_n) + (2/3) f*_{n+1}(s_n + x_n).]

– So the recursive relationship is
  $f_n^*(s_n) = \max_{x_n = 0, 1, 2, \dots, s_n} \left\{ \frac{1}{3} f_{n+1}^*(s_n - x_n) + \frac{2}{3} f_{n+1}^*(s_n + x_n) \right\}$
  with terminal value
  $f_4^*(s_4) = \begin{cases} 1, & \text{if } s_4 \ge 5 \\ 0, & \text{if } s_4 < 5 \end{cases}$
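
The recursion and terminal value for this example can be checked with a short script. The following is a minimal Python sketch, not part of the original slides; the names `f_star`, `P_WIN`, `TARGET`, and `PLAYS` are illustrative assumptions. It uses exact fractions so the results can be compared directly with hand calculations.

```python
from fractions import Fraction
from functools import lru_cache

P_WIN = Fraction(2, 3)  # probability of winning one play (from the example)
TARGET = 5              # chips needed after the last play to win the bet
PLAYS = 3               # number of plays


@lru_cache(maxsize=None)
def f_star(n, s):
    """Return (f_n*(s), tuple of all optimal bets) for stage n with s chips."""
    if n > PLAYS:
        # Terminal value f_4*(s): 1 if the bet is won, 0 otherwise.
        return Fraction(int(s >= TARGET)), ()
    values = {
        x: (1 - P_WIN) * f_star(n + 1, s - x)[0] + P_WIN * f_star(n + 1, s + x)[0]
        for x in range(s + 1)  # feasible bets x_n = 0, 1, ..., s_n
    }
    best = max(values.values())
    return best, tuple(x for x, v in values.items() if v == best)


if __name__ == "__main__":
    prob, bets = f_star(1, 3)
    print(f"f_1*(3) = {prob}, optimal first bet(s): {bets}")
```

Memoizing on the pair (n, s) means each state is evaluated once, which mirrors filling in one table per stage from the last play backward.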

Solution Procedure
• n=3
– Suppose the current state is 3. The possible decisions are to bet x3 = 0, 1, 2, or 3 chips.
  $x_3 = 0:\ f_3(3, 0) = \frac{1}{3} f_4^*(3) + \frac{2}{3} f_4^*(3) = \frac{1}{3} \times 0 + \frac{2}{3} \times 0 = 0$
  $x_3 = 1:\ f_3(3, 1) = \frac{1}{3} f_4^*(2) + \frac{2}{3} f_4^*(4) = \frac{1}{3} \times 0 + \frac{2}{3} \times 0 = 0$
  $x_3 = 2:\ f_3(3, 2) = \frac{1}{3} f_4^*(1) + \frac{2}{3} f_4^*(5) = \frac{1}{3} \times 0 + \frac{2}{3} \times 1 = \frac{2}{3}$
  $x_3 = 3:\ f_3(3, 3) = \frac{1}{3} f_4^*(0) + \frac{2}{3} f_4^*(6) = \frac{1}{3} \times 0 + \frac{2}{3} \times 1 = \frac{2}{3}$
– So the optimal decision given the current state 3 at stage 3 is $x_3^* = 2$ or $x_3^* = 3$ with $f_3^*(3) = \frac{2}{3}$

Solution Procedure
This recursive relationship leads to the following computational results.

n = 3:
  s3     f3*(s3)    x3*
  0      0          -
  1      0          -
  2      0          -
  3      2/3        2 (or more)
  4      2/3        1 (or more)
  ≥5     1          0 (or ≤ s3 − 5)

Solution Procedure
• n=2
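– Suppose the current state is 4. The possible decisions are to bet x2 = 0, 1, 2, 3, or 4 chips.
  $x_2 = 0:\ f_2(4, 0) = \frac{1}{3} f_3^*(4) + \frac{2}{3} f_3^*(4) = \frac{1}{3} \times \frac{2}{3} + \frac{2}{3} \times \frac{2}{3} = \frac{2}{3}$
  $x_2 = 1:\ f_2(4, 1) = \frac{1}{3} f_3^*(3) + \frac{2}{3} f_3^*(5) = \frac{1}{3} \times \frac{2}{3} + \frac{2}{3} \times 1 = \frac{8}{9}$
  $x_2 = 2:\ f_2(4, 2) = \frac{1}{3} f_3^*(2) + \frac{2}{3} f_3^*(6) = \frac{1}{3} \times 0 + \frac{2}{3} \times 1 = \frac{2}{3}$
  $x_2 = 3:\ f_2(4, 3) = \frac{1}{3} f_3^*(1) + \frac{2}{3} f_3^*(7) = \frac{1}{3} \times 0 + \frac{2}{3} \times 1 = \frac{2}{3}$
  $x_2 = 4:\ f_2(4, 4) = \frac{1}{3} f_3^*(0) + \frac{2}{3} f_3^*(8) = \frac{1}{3} \times 0 + \frac{2}{3} \times 1 = \frac{2}{3}$
– So the optimal decision given the current state 4 at stage 2 is $x_2^* = 1$ with $f_2^*(4) = \frac{8}{9}$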
Solution Procedure

$f_2(s_2, x_2) = \frac{1}{3} f_3^*(s_2 - x_2) + \frac{2}{3} f_3^*(s_2 + x_2)$

n = 2:
  s2     x2=0    x2=1    x2=2    x2=3    x2=4    f2*(s2)    x2*
  0      0                                       0          -
  1      0       0                               0          -
  2      0       4/9     4/9                     4/9        1 or 2
  3      2/3     4/9     2/3     2/3             2/3        0, 2, or 3
  4      2/3     8/9     2/3     2/3     2/3     8/9        1
  ≥5     1                                       1          0 (or ≤ s2 − 5)

Solution Procedure
• n=1
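– The current state is 3. The possible decisions are to bet x1 = 0, 1, 2, or 3 chips.
  $x_1 = 0:\ f_1(3, 0) = \frac{1}{3} f_2^*(3) + \frac{2}{3} f_2^*(3) = \frac{1}{3} \times \frac{2}{3} + \frac{2}{3} \times \frac{2}{3} = \frac{2}{3}$
  $x_1 = 1:\ f_1(3, 1) = \frac{1}{3} f_2^*(2) + \frac{2}{3} f_2^*(4) = \frac{1}{3} \times \frac{4}{9} + \frac{2}{3} \times \frac{8}{9} = \frac{20}{27}$
  $x_1 = 2:\ f_1(3, 2) = \frac{1}{3} f_2^*(1) + \frac{2}{3} f_2^*(5) = \frac{1}{3} \times 0 + \frac{2}{3} \times 1 = \frac{2}{3}$
  $x_1 = 3:\ f_1(3, 3) = \frac{1}{3} f_2^*(0) + \frac{2}{3} f_2^*(6) = \frac{1}{3} \times 0 + \frac{2}{3} \times 1 = \frac{2}{3}$
– So the optimal decision given the current state 3 at stage 1 is $x_1^* = 1$ with $f_1^*(3) = \frac{20}{27}$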

Solution Procedure

$f_1(s_1, x_1) = \frac{1}{3} f_2^*(s_1 - x_1) + \frac{2}{3} f_2^*(s_1 + x_1)$

n = 1:
  s1     x1=0    x1=1     x1=2    x1=3    f1*(s1)    x1*
  3      2/3     20/27    2/3     2/3     20/27      1

• Therefore, the optimal policy is
  x1* = 1
    – if win, x2* = 1
        – if win, x3* = 0
        – if lose, x3* = 2 or 3
    – if lose, x2* = 1 or 2
        – if win, x3* = 2 or 3 (for x2* = 1), or x3* = 1, 2, 3, or 4 (for x2* = 2)
        – if lose, the bet is lost
– This policy gives the student a probability of 20/27 of winning his bet with his classmates.
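
The 20/27 figure for Example 1 can be cross-checked without the recursion by enumerating all eight win/lose sequences under one fixed optimal policy. The sketch below is illustrative only: the `POLICY` table picks one option wherever the solution allows ties (bet 1 at stage 2, bet 2 with three chips at stage 3), and the names `POLICY` and `win_probability` are assumptions, not part of the slides.

```python
from fractions import Fraction
from itertools import product

P_WIN = Fraction(2, 3)  # probability of winning one play (from the example)

# (play, chips in hand) -> chips to bet; one choice among the tied optima.
POLICY = {
    (1, 3): 1,
    (2, 4): 1, (2, 2): 1,
    (3, 5): 0, (3, 3): 2, (3, 1): 0,  # with 1 chip the bet is already lost
}


def win_probability(policy, start=3, plays=3, target=5):
    """Sum the probabilities of all win/lose sequences that end at >= target chips."""
    total = Fraction(0)
    for outcomes in product((True, False), repeat=plays):
        chips, prob = start, Fraction(1)
        for play, won in enumerate(outcomes, start=1):
            bet = policy[(play, chips)]
            chips += bet if won else -bet
            prob *= P_WIN if won else 1 - P_WIN
        if chips >= target:
            total += prob
    return total


if __name__ == "__main__":
    print(win_probability(POLICY))
```

The winning sequences are win-win (any third result), win-lose-win, and lose-win-win, contributing 12/27, 4/27, and 4/27 respectively.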

Example 2: Determining Lot Size


• A company has received an order to supply one item of a
particular type.
• Due to the customer’s stringent requirement, the company
may have to produce more than one item to obtain an
acceptable item.
• The defect rate of the production is ½
– So the probability of producing no acceptable items in a lot of size x is (½)^x


Example 2: Determining Lot Size


• Producing one item costs $1K (even if defective)
• A setup cost of $3K is incurred for each production run
• The company has time to make at most three production
runs before the customer’s deadline
• If an acceptable item has not been obtained by the end of
the third production run, the cost to the company in lost
sales income and penalty costs will be $16K
• Goal: determine the policy regarding the lot size for the
required production run(s) that minimizes total expected
cost for the company


DP formulation
• DP formulation
– Stage n: production run n, n = 1,2,3
– Decision variable xn: lot size for stage n
– State sn: number of acceptable items still needed (1 or 0) at
beginning of stage n
– Cost function fn(sn, xn): total expected cost for stages n, …, 3 if the system starts in state sn at stage n, the immediate decision is xn, and optimal decisions are made thereafter
– Setup cost K(xn), in thousands of dollars:
  $K(x_n) = \begin{cases} 3, & \text{if } x_n > 0 \\ 0, & \text{if } x_n = 0 \end{cases}$
  so the immediate cost at stage n is [K(xn) + xn]


DP formulation
• DP recursion

– Given sn and n, let $x_n^*$ denote any value of xn that minimizes fn(sn, xn), and let $f_n^*(s_n)$ be the corresponding minimum value of fn(sn, xn)
  $f_n^*(s_n) = \min_{x_n = 0, 1, \dots} f_n(s_n, x_n) = f_n(s_n, x_n^*)$
– $f_n^*(s_n)$ is the minimum expected cost for stages n, …, 3 if the system starts in state sn at stage n
– Obviously, $f_n(0, 0) = 0$ and thus $f_n^*(0) = 0$ for any n, because if no acceptable item is needed then no production is needed


DP formulation
[Figure: structure for the lot-size problem. From state 1 at stage n with decision x_n, the next state is 0 with probability 1 − (1/2)^{x_n}, where f*_{n+1}(0) = 0, and 1 with probability (1/2)^{x_n}, where the value is f*_{n+1}(1). The contribution from stage n is K(x_n) + x_n in both cases.]

  $f_n(1, x_n) = K(x_n) + x_n + \left(\frac{1}{2}\right)^{x_n} f_{n+1}^*(1) + \left[1 - \left(\frac{1}{2}\right)^{x_n}\right] f_{n+1}^*(0)$
  $\phantom{f_n(1, x_n)} = K(x_n) + x_n + \left(\frac{1}{2}\right)^{x_n} f_{n+1}^*(1)$

DP formulation
– So the recursive relationship is
  $f_n^*(1) = \min_{x_n = 0, 1, 2, \dots} \left\{ K(x_n) + x_n + \left(\frac{1}{2}\right)^{x_n} f_{n+1}^*(1) \right\}$
  with terminal value $f_4^*(1) = 16$, the cost from the lost sale and penalty for failing to deliver an acceptable product
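
The lot-size recursion can be evaluated with a few lines of code. This is a minimal Python sketch under stated assumptions: the constant names and the function `lot_size_dp` are illustrative, and the search over lot sizes is capped at `MAX_LOT = 10`, which is an assumption made here (it is safe for these numbers because the cost of a lot of size x is at least 3 + x).

```python
from fractions import Fraction

SETUP = 3     # setup cost per production run, in $1,000s
UNIT = 1      # cost per item produced, in $1,000s
PENALTY = 16  # terminal cost f_4*(1) if no acceptable item is obtained
RUNS = 3      # maximum number of production runs
MAX_LOT = 10  # assumed cap on the lot size searched (not from the slides)


def lot_size_dp(max_lot=MAX_LOT):
    """Return {n: (f_n*(1), list of optimal lot sizes)} by backward induction."""
    f_next = Fraction(PENALTY)
    plan = {}
    for n in range(RUNS, 0, -1):
        # f_n(1, x) = K(x) + x + (1/2)^x * f_{n+1}*(1)
        costs = {
            x: (SETUP if x > 0 else 0) + UNIT * x + Fraction(1, 2) ** x * f_next
            for x in range(max_lot + 1)
        }
        f_n = min(costs.values())
        plan[n] = (f_n, [x for x, c in costs.items() if c == f_n])
        f_next = f_n
    return plan


if __name__ == "__main__":
    for n, (cost, lots) in sorted(lot_size_dp().items()):
        print(f"stage {n}: f*(1) = {cost}, optimal lot size(s) = {lots}")
```

Only state 1 needs to be tracked, since the cost from state 0 is always 0.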

Solution procedure
• n=3
– Suppose the current state is 1. The possible decisions are to make the lot size x3 = 0, 1, 2, 3, … (an infinite sequence).
  $x_3 = 0:\ f_3(1, 0) = K(0) + 0 + (0.5)^0 f_4^*(1) = 16$
  $x_3 = 1:\ f_3(1, 1) = K(1) + 1 + (0.5)^1 f_4^*(1) = 12$
  $x_3 = 2:\ f_3(1, 2) = K(2) + 2 + (0.5)^2 f_4^*(1) = 9$
  $x_3 = 3:\ f_3(1, 3) = K(3) + 3 + (0.5)^3 f_4^*(1) = 8$
  $x_3 = 4:\ f_3(1, 4) = K(4) + 4 + (0.5)^4 f_4^*(1) = 8$
  $x_3 = 5:\ f_3(1, 5) = K(5) + 5 + (0.5)^5 f_4^*(1) = 8.5$
  ···
– So the optimal decision given the current state 1 at stage 3 is $x_3^* = 3$ or $x_3^* = 4$ with $f_3^*(1) = 8$
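
Although the set of possible lot sizes at stage 3 is infinite, only a few values need to be checked. A short argument, added here as a sketch and using only the cost structure of the example (setup cost 3, unit cost 1, terminal cost 16), is:

```latex
% For any lot size x_3 >= 1 the expected future cost is strictly positive, so
f_3(1, x_3) = 3 + x_3 + \left(\tfrac{1}{2}\right)^{x_3} \cdot 16 \;>\; 3 + x_3 .
% Since f_3(1, 3) = 8, every x_3 >= 5 satisfies
f_3(1, x_3) \;>\; 3 + x_3 \;\ge\; 8 = f_3(1, 3),
% so no lot size of 5 or more can be optimal at stage 3.
```

The same bound applies at stages 2 and 1 with the terminal cost replaced by the optimal cost of the following stage.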

Solution procedure
The calculations using this recursive relationship are summarized as follows.

$f_3(1, x_3) = K(x_3) + x_3 + 16 \left(\frac{1}{2}\right)^{x_3}$

n = 3:
  s3     x3=0    x3=1    x3=2    x3=3    x3=4    x3=5     f3*(s3)    x3*
  0      0                                                0          0
  1      16      12      9       8       8       8 1/2    8          3 or 4

Solution procedure
• n=2
– Suppose the current state is 1. The possible decisions are to make the lot size x2 = 0, 1, 2, 3, … (an infinite sequence).
  $x_2 = 0:\ f_2(1, 0) = K(0) + 0 + (0.5)^0 f_3^*(1) = 8$
  $x_2 = 1:\ f_2(1, 1) = K(1) + 1 + (0.5)^1 f_3^*(1) = 8$
  $x_2 = 2:\ f_2(1, 2) = K(2) + 2 + (0.5)^2 f_3^*(1) = 7$
  $x_2 = 3:\ f_2(1, 3) = K(3) + 3 + (0.5)^3 f_3^*(1) = 7$
  $x_2 = 4:\ f_2(1, 4) = K(4) + 4 + (0.5)^4 f_3^*(1) = 7.5$
  ···
– So the optimal decision given the current state 1 at stage 2 is $x_2^* = 2$ or $x_2^* = 3$ with $f_2^*(1) = 7$

Solution procedure

$f_2(1, x_2) = K(x_2) + x_2 + \left(\frac{1}{2}\right)^{x_2} f_3^*(1)$

n = 2:
  s2     x2=0    x2=1    x2=2    x2=3    x2=4     f2*(s2)    x2*
  0      0                                        0          0
  1      8       8       7       7       7 1/2    7          2 or 3

$f_1(1, x_1) = K(x_1) + x_1 + \left(\frac{1}{2}\right)^{x_1} f_2^*(1)$

n = 1:
  s1     x1=0    x1=1     x1=2     x1=3     x1=4      f1*(s1)    x1*
  1      7       7 1/2    6 3/4    6 7/8    7 7/16    6 3/4      2

Solution procedure
• n=1
– The current state is 1. The possible decisions are to make the lot size x1 = 0, 1, 2, 3, … (an infinite sequence).
  $x_1 = 0:\ f_1(1, 0) = K(0) + 0 + (0.5)^0 f_2^*(1) = 7$
  $x_1 = 1:\ f_1(1, 1) = K(1) + 1 + (0.5)^1 f_2^*(1) = 7\tfrac{1}{2}$
  $x_1 = 2:\ f_1(1, 2) = K(2) + 2 + (0.5)^2 f_2^*(1) = 6\tfrac{3}{4}$
  $x_1 = 3:\ f_1(1, 3) = K(3) + 3 + (0.5)^3 f_2^*(1) = 6\tfrac{7}{8}$
  $x_1 = 4:\ f_1(1, 4) = K(4) + 4 + (0.5)^4 f_2^*(1) = 7\tfrac{7}{16}$
  ···
– So the optimal decision given the current state 1 at stage 1 is $x_1^* = 2$ with $f_1^*(1) = 6\tfrac{3}{4}$

Solution procedure
• Optimal policy
– Produce two items on the first production run; if none is acceptable, then produce either two or three items on the second production run; if none is acceptable, then produce either three or four items on the third production run.
– The total expected cost for this policy is $6,750.
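
The expected cost of the lot-size policy can also be verified directly, without the tables. The calculation below is a sketch for one of the tied policies (lot sizes 2, 2, and 3 on runs 1, 2, and 3), with all costs in thousands of dollars; each later run is reached only if every item so far was defective.

```latex
\begin{aligned}
\text{Expected cost}
  &= (3 + 2) + \left(\tfrac{1}{2}\right)^{2}
     \left[ (3 + 2) + \left(\tfrac{1}{2}\right)^{2}
       \left( (3 + 3) + \left(\tfrac{1}{2}\right)^{3} \cdot 16 \right) \right] \\
  &= 5 + \tfrac{1}{4} \left[ 5 + \tfrac{1}{4} \cdot 8 \right]
   = 5 + \tfrac{1}{4} \cdot 7
   = 6.75
\end{aligned}
```

This matches the value of 6 3/4 obtained at stage 1, that is, $6,750.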

Summary
• DP is a very useful technique for making a sequence
of interrelated decisions.
• It requires formulating an appropriate recursive
relationship for each individual problem.
• For stochastic DP models, the state at the next stage is
not completely determined by the current state and
the immediate decision, but follows a probability
distribution.
– The cost or profit generated from future stages is random so
the expected cost or profit is optimized.
