LGT6204 Inventory and Supply Chain Management

Dynamic Programming

Miao Song

Department of Logistics and Maritime Studies

The Hong Kong Polytechnic University

[email protected]

January 18, 2024

Miao Song (PolyU, HK) Dynamic Programming January 18, 2024 1 / 44

Overview

1 Introduction

2 The Dynamic Programming Algorithm


Principal Features

An underlying discrete-time dynamic system over a finite number of stages (a finite horizon)
A cost function that is additive over time


Discrete-Time Dynamic System

xk+1 = fk (xk , uk , wk ), k = 0, 1, ..., N − 1

k indexes discrete time and N is the horizon or number of times control is applied.
xk ∈ Sk is the state of the system. It summarizes past information
that is relevant for future optimization.
uk is the control or decision variable to be selected at time k.
uk ∈ Uk (xk ) for all xk ∈ Sk and k.
wk is a random parameter (also called disturbance or noise depending
on the context). Its distribution P(·|xk , uk ) may depend explicitly on
xk and uk but not on values of prior disturbances wk−1 , ..., w0 .
▶ The system is deterministic if each wk can take only one value.
fk is a function that describes the system and in particular the
mechanism by which the state is updated.
Any of xk , uk , wk can be either a scalar or vector.
Additive Cost Function

\[ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) \]

gN (xN ) is a terminal cost incurred at the end of the process.

gk (xk , uk , wk ) is the cost incurred at time k.
The problem is formulated as an optimization of the expected cost
\[ E\Big\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) \Big\} \]

over the controls u0 , u1 , ..., uN−1 , where the expectation is with respect to
the joint distribution of the random variables involved.


Example 1 (Inventory Control)
Consider a problem of ordering a quantity of a certain item at each of N
periods so as to minimize the incurred expected cost. Let us denote
xk stock available at the beginning of the kth period,
uk stock ordered (and immediately delivered) at the beginning of the
kth period,
wk demand during the kth period with given probability distribution.
We assume that w0 , w1 , ..., wN−1 are independent random variables, and
that excess demand is backlogged and filled as soon as additional
inventory becomes available. Then stock evolves according to the
discrete-time equation

xk+1 = xk + uk − wk ,

where negative stock corresponds to backlogged demand.


Example 1 (Contd)
The cost incurred in period k consists of two components.
A cost h(xk+1 ) representing a penalty for either positive stock xk+1
(holding cost for excess inventory) or negative stock xk+1 (shortage
cost for unfilled demand).
The purchasing cost c(uk ).
There is also a terminal cost gN (xN ) for being left with inventory xN at
the end of N periods. Thus, the total cost over N periods is
\[ E\Big\{ g_N(x_N) + \sum_{k=0}^{N-1} \big( h(x_{k+1}) + c(u_k) \big) \Big\}. \]

We want to minimize this cost by proper choice of the orders u0 , ..., uN−1 ,
subject to the natural constraint uk ≥ 0 for all k.
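
The stock equation and the per-period cost can be traced along a single demand path. The following is a minimal sketch, not part of the original slides: the cost parameters (`hold`, `shortage`, `unit_cost`), the zero terminal cost, and the function names are all assumptions chosen for illustration.

```python
def holding_shortage_cost(x, hold=1.0, shortage=4.0):
    """h(x): holding cost for positive stock, shortage cost for backlog (assumed linear)."""
    return hold * max(x, 0) + shortage * max(-x, 0)


def purchase_cost(u, unit_cost=1.0):
    """c(u): purchasing cost (assumed linear in the order quantity)."""
    return unit_cost * u


def simulate(x0, orders, demands, terminal_cost=lambda x: 0.0):
    """Roll the stock equation forward along one demand path and sum the costs."""
    if any(u < 0 for u in orders):
        raise ValueError("orders must satisfy u_k >= 0")
    states, total = [x0], 0.0
    for u, w in zip(orders, demands):
        x_next = states[-1] + u - w  # x_{k+1} = x_k + u_k - w_k
        total += holding_shortage_cost(x_next) + purchase_cost(u)
        states.append(x_next)
    return states, total + terminal_cost(states[-1])


states, cost = simulate(x0=0, orders=[1, 1], demands=[0, 1])
print(states, cost)  # [0, 1, 1] 4.0
```

Negative entries in `states` correspond to backlogged demand, as in the stock equation above. The expected cost in the example would average this path cost over all demand paths.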


Open-Loop and Closed-Loop Optimization
Open-loop optimization
▶ Select all decisions u0 , ..., uN−1 at once at time 0, without waiting to
see the subsequent disturbances wk .
▶ Find optimal numerical values of uk .
Closed-loop optimization: dynamic programming (DP)
▶ Postpone the decision uk until the last possible moment (time k) when
the current state xk will be known.
▶ Find an optimal rule for selecting at each period k a decision uk for
each possible value of state xk that can conceivably occur.
▶ Mathematically, it is to find a sequence of functions µk ,
k = 0, ..., N − 1, mapping state xk into decision uk so as to minimize
the expected cost.
⋆ For each k and each possible value of xk , µk (xk ) represents the action
to be taken at time k if the state is xk .
⋆ The sequence π = {µ0 , ..., µN−1 } is referred to as a policy or control
law.
⋆ A policy such that µk (xk ) ∈ Uk (xk ) for all xk ∈ Sk is called admissible.
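
The difference between the two approaches can be checked on a tiny instance of the inventory example. This is a sketch under assumed data (two periods, demand 0 or 1 with equal probability, orders of 0 or 1 unit, unit purchase cost 1, holding cost 1, shortage cost 4, no terminal cost); the feedback rule used for the closed-loop case is an illustrative choice and is not claimed here to be optimal.

```python
from itertools import product

N = 2
DEMAND = {0: 0.5, 1: 0.5}  # assumed demand distribution, identical in each period
ORDERS = (0, 1)            # assumed feasible order quantities


def h(x):
    """Assumed holding/shortage cost on the end-of-period stock."""
    return max(x, 0) + 4 * max(-x, 0)


def expected_cost(x0, decide):
    """Exact expected cost by enumerating demand paths; decide(k, x) returns u_k."""
    total = 0.0
    for path in product(DEMAND, repeat=N):
        prob, x, cost = 1.0, x0, 0.0
        for k, w in enumerate(path):
            u = decide(k, x)
            x = x + u - w        # x_{k+1} = x_k + u_k - w_k
            cost += u + h(x)     # c(u_k) + h(x_{k+1})
            prob *= DEMAND[w]
        total += prob * cost
    return total


# Open loop: commit to (u_0, u_1) at time 0, ignoring the observed state.
open_loop = {
    plan: expected_cost(0, lambda k, x, plan=plan: plan[k])
    for plan in product(ORDERS, repeat=N)
}
best_plan = min(open_loop, key=open_loop.get)

# Closed loop: a feedback rule mu_k(x_k) that orders one unit only when stock is not positive.
closed_loop_cost = expected_cost(0, lambda k, x: 1 if x <= 0 else 0)

print(best_plan, open_loop[best_plan])  # (1, 0) 2.75
print(closed_loop_cost)                 # 2.5
```

In this instance the feedback rule does strictly better than the best fixed order plan because it can react to the first-period demand.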


Open-Loop and Closed-Loop Optimization: Same for
Deterministic Problems
Suppose that wk is deterministic for all k.
Admissible policy {µ0 , ..., µN−1 } vs. control vector {u0 , ..., uN−1 }
For period 0, we can observe the initial state x0 .
▶ As x0 is the given initial state, we can just consider u0 = µ0 (x0 ).
▶ Given x0 and µ0 (equivalently, u0 ), as w0 is deterministic,
x1 = f0 (x0 , µ0 (x0 ), w0 ) = f0 (x0 , u0 , w0 )
is perfectly predictable.
For any period k, suppose that xk is perfectly predictable.
▶ As xk is perfectly predictable, uk = µk (xk ) is also a perfectly
predictable variable, instead of a function.
▶ Given xk and µk (equivalently, uk ), as wk is deterministic,
xk+1 = fk (xk , µk (xk ), wk ) = fk (xk , uk , wk )
is perfectly predictable.
The cost achieved by an admissible policy {µ0 , ..., µN−1 } is also
achieved by the control sequence {u0 , ..., uN−1 }.
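
The argument above can be replayed numerically: rolling a policy forward under known disturbances yields a fixed control sequence with the same cost. A minimal sketch, assuming the inventory dynamics of Example 1, a known demand sequence, and an illustrative feedback rule (all names and numbers are hypothetical):

```python
DEMANDS = [1, 2, 0]  # assumed: each w_k takes a single known value


def h(x):
    """Assumed holding/shortage cost."""
    return max(x, 0) + 4 * max(-x, 0)


def policy(k, x):
    """Illustrative feedback rule mu_k(x_k): order just enough to cover the known demand."""
    return max(DEMANDS[k] - x, 0)


def rollout_policy(x0):
    """Apply the policy and record the controls it generates."""
    x, cost, controls = x0, 0, []
    for k, w in enumerate(DEMANDS):
        u = policy(k, x)
        controls.append(u)
        x = x + u - w
        cost += u + h(x)
    return controls, cost


def rollout_controls(x0, controls):
    """Apply a fixed control sequence chosen at time 0."""
    x, cost = x0, 0
    for u, w in zip(controls, DEMANDS):
        x = x + u - w
        cost += u + h(x)
    return cost


controls, policy_cost = rollout_policy(x0=1)
print(controls, policy_cost, rollout_controls(1, controls))  # [0, 2, 0] 2 2
```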
Dynamic Programming (DP)

Given an initial state x0 and an admissible policy π = {µ0 , ..., µN−1 }, the
states xk are random variables defined through the system equation

xk+1 = fk (xk , µk (xk ), wk ), k = 0, 1, ..., N − 1.

Thus, for given functions $g_k$, $k = 0, 1, \dots, N$, the expected cost of $\pi$ starting at $x_0$ is
\[ J_\pi(x_0) = E\Big\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, \mu_k(x_k), w_k) \Big\}, \]

where the expectation is taken over the random variables wk and xk .


Dynamic Programming (Contd)
The optimal cost depends on x0 and is denoted by J ∗ (x0 ), i.e.,

\[ J^*(x_0) = \min_{\pi \in \Pi} J_\pi(x_0), \]

where Π is the set of all admissible policies.

J ∗ can be viewed as a function that assigns to each initial state x0 the
optimal cost J ∗ (x0 ); it is called the optimal cost function or optimal
value function.
An optimal policy π ∗ is one that minimizes this cost, i.e.,

\[ J_{\pi^*}(x_0) = J^*(x_0) = \min_{\pi \in \Pi} J_\pi(x_0). \]

▶ By this definition, the optimal policy $\pi^*$ is associated with a fixed initial state $x_0$. Nevertheless, we are typically interested in a policy $\pi^*$ that is simultaneously optimal for all initial states, i.e.,

\[ J_{\pi^*}(x_0) = J^*(x_0) = \min_{\pi \in \Pi} J_\pi(x_0) \quad \forall x_0 \in S_0. \]


Dynamic Programming (Contd)
To formulate a dynamic program, we need to determine
state xk ,
disturbance wk ,
control uk and feasible set Uk (xk ),
state transition function fk ,
additive cost including the one-period cost function gk and the
terminal cost gN .
A policy π = {µ0 , ..., µN−1 }, where µk is a function of the state xk , is
admissible if µk (xk ) ∈ Uk (xk ) for all k, xk .
\[ J_\pi(x_0) = E\Big\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, \mu_k(x_k), w_k) \Big\}, \]

where xk+1 = fk (xk , µk (xk ), wk ) for all k.

\[ J^*(x_0) = J_{\pi^*}(x_0) = \min_{\pi \in \Pi} J_\pi(x_0), \]

where Π is the set of all admissible policies.


Principle of Optimality
If a policy {µ∗0 , µ∗1 , ..., µ∗N−1 } is optimal for the problem from time 0 to
time N, then the truncated policy {µ∗k , µ∗k+1 , ..., µ∗N−1 } is optimal for the
subproblem minimizing the cost from time k to time N.
The tail portion of an optimal policy is optimal for the tail
subproblem.

A Travel Analogy
The fastest route from Beijing to Hong Kong is
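\[ \text{Beijing} \xrightarrow{\mu_0^*} \text{Shanghai} \xrightarrow{\mu_1^*} \text{Guangzhou} \xrightarrow{\mu_2^*} \text{Shenzhen} \xrightarrow{\mu_3^*} \text{Hong Kong}. \]

The fastest route from Shanghai to Hong Kong is

\[ \text{Shanghai} \xrightarrow{\mu_1^*} \text{Guangzhou} \xrightarrow{\mu_2^*} \text{Shenzhen} \xrightarrow{\mu_3^*} \text{Hong Kong}. \]

The fastest route from Guangzhou to Hong Kong is

\[ \text{Guangzhou} \xrightarrow{\mu_2^*} \text{Shenzhen} \xrightarrow{\mu_3^*} \text{Hong Kong}. \]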
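
The travel analogy can be reproduced on a small route network. This is a sketch only: the travel times and the extra direct links below are invented for illustration, and `fastest` is a hypothetical helper that builds each route from the fastest route of the next city.

```python
from functools import lru_cache

# Hypothetical travel times in hours; the numbers are made up for illustration.
TIMES = {
    "Beijing": {"Shanghai": 5, "Guangzhou": 9},
    "Shanghai": {"Guangzhou": 3, "Shenzhen": 5},
    "Guangzhou": {"Shenzhen": 1, "Hong Kong": 3},
    "Shenzhen": {"Hong Kong": 1},
    "Hong Kong": {},
}


@lru_cache(maxsize=None)
def fastest(city, goal="Hong Kong"):
    """Return (time, route) for the fastest route from city to goal."""
    if city == goal:
        return 0, (goal,)
    options = []
    for nxt, t in TIMES[city].items():
        tail_time, tail_route = fastest(nxt, goal)
        options.append((t + tail_time, (city,) + tail_route))
    return min(options)


time_from_beijing, route = fastest("Beijing")
print(time_from_beijing, route)
# The tail of the Beijing route, starting at Shanghai, is itself the fastest Shanghai route.
print(route[1:] == fastest("Shanghai")[1])  # True
```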


Principle of Optimality (Contd)

An optimal policy can be constructed in piecemeal fashion.

Construct an optimal policy for the “tail subproblem” involving the
last stage
Extend the optimal policy to the “tail subproblem” involving the last two stages
Continue in this manner until an optimal policy for the entire problem
is constructed


Theorem 1
For every initial state x0 , the optimal cost J ∗ (x0 ) of the basic problem is
equal to J0 (x0 ), given by the last step of the following algorithm, which
proceeds backward in time from period N − 1 to period 0:

\[ J_N(x_N) = g_N(x_N), \]
\[ J_k(x_k) = \min_{u_k \in U_k(x_k)} E_{w_k}\Big\{ g_k(x_k, u_k, w_k) + J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\}, \tag{1} \]

for any $k = 0, 1, \dots, N-1$, where the expectation is taken with respect to the probability distribution of $w_k$, which may depend on $x_k$ and $u_k$. Furthermore, if $u_k^* = \mu_k^*(x_k)$ minimizes the right side of (1) for each $x_k$ and $k$, the policy $\pi^* = \{\mu_0^*, \dots, \mu_{N-1}^*\}$ is optimal.

The theorem holds as long as $w_k$ is independent of $w_0, \dots, w_{k-1}$ when conditioning on $x_k, u_k$. For simplicity, we suppose that $w_0, \dots, w_{N-1}$ are all independent.
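
Recursion (1) can be written out for a small instance of the inventory example. The following is a sketch under assumed data (two periods, demand 0 or 1 with equal probability, orders of 0 or 1 unit, unit purchase cost 1, holding cost 1, shortage cost 4, zero terminal cost); the function names are hypothetical. It also evaluates the resulting policy by enumeration to compare its expected cost with $J_0(x_0)$.

```python
from itertools import product

N = 2
DEMAND = {0: 0.5, 1: 0.5}  # assumed distribution of w_k, independent across periods
ORDERS = (0, 1)            # assumed control set U_k(x_k), the same for every state
W_MAX, U_MAX = max(DEMAND), max(ORDERS)


def f(x, u, w):
    """State transition x_{k+1} = x_k + u_k - w_k."""
    return x + u - w


def g(x, u, w):
    """Stage cost c(u_k) + h(x_{k+1}) with assumed linear cost rates."""
    x_next = f(x, u, w)
    return u + max(x_next, 0) + 4 * max(-x_next, 0)


def backward_dp(x0_values):
    """Run recursion (1) backward from k = N - 1 to k = 0."""
    lo, hi = min(x0_values), max(x0_values)
    # States reachable at stage k lie in [lo - k * W_MAX, hi + k * U_MAX].
    J = {x: 0.0 for x in range(lo - N * W_MAX, hi + N * U_MAX + 1)}  # J_N = g_N = 0
    policy = [None] * N
    for k in range(N - 1, -1, -1):
        J_k, mu_k = {}, {}
        for x in range(lo - k * W_MAX, hi + k * U_MAX + 1):
            q = {u: sum(p * (g(x, u, w) + J[f(x, u, w)]) for w, p in DEMAND.items())
                 for u in ORDERS}
            mu_k[x] = min(q, key=q.get)  # minimizer of the right side of (1)
            J_k[x] = q[mu_k[x]]
        J, policy[k] = J_k, mu_k
    return J, policy


def policy_cost(x0, policy):
    """Expected cost of a policy, computed by enumerating all demand paths."""
    total = 0.0
    for path in product(DEMAND, repeat=N):
        prob, x, cost = 1.0, x0, 0.0
        for k, w in enumerate(path):
            u = policy[k][x]
            cost += g(x, u, w)
            x = f(x, u, w)
            prob *= DEMAND[w]
        total += prob * cost
    return total


J0, policy = backward_dp([0])
print(J0[0], policy[0], policy[1])  # 2.5 {0: 1} {-1: 1, 0: 1, 1: 0}
print(policy_cost(0, policy))       # 2.5
```

The table for each stage is indexed by state, so the minimization is repeated for every value of $x_k$ that can occur at that stage.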


Proof of Theorem 1
For any admissible policy π = {µ0 , µ1 , ..., µN−1 } and each
k = 0, 1, ..., N − 1, denote

π k = {µk , µk+1 , ..., µN−1 }.

For k = 0, 1, ..., N − 1, let Jk∗ (xk ) be the optimal cost for the (N − k)-stage
problem that starts at state xk and time k, and ends at time N, i.e.,
\[ J_k^*(x_k) = \min_{\pi^k} E_{w_k,\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\}. \]

For k = N, we define JN∗ (xN ) = gN (xN ).

By definition, we know that

J0∗ (x0 ) = J ∗ (x0 ).

We want to show J ∗ (x0 ) = J0 (x0 ). As long as Jk∗ (xk ) = Jk (xk ) for all k
and xk , we obtain J ∗ (x0 ) = J0∗ (x0 ) = J0 (x0 ).
Proof of Theorem 1 (Contd)

We will show Jk∗ (xk ) = Jk (xk ) for all k and xk by induction.

Period N. By definition,

JN∗ (xN ) = gN (xN ) and JN (xN ) = gN (xN ).

Thus, JN∗ (xN ) = JN (xN ) for any xN .

Period k, k = 0, 1, ..., N − 1.
Induction Hypothesis
$J_{k+1}^*(x_{k+1}) = J_{k+1}(x_{k+1})$ for all $x_{k+1}$.

Proof of Theorem 1 (Contd)

Since π k = {µk , µk+1 , ..., µN−1 } and π k+1 = {µk+1 , ..., µN−1 },

π k = (µk , π k+1 ).

\[
\begin{aligned}
J_k^*(x_k) &= \min_{\pi^k} E_{w_k,\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \\
&= \min_{(\mu_k, \pi^{k+1})} E_{w_k,\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \\
&= \min_{(\mu_k, \pi^{k+1})} E_{w_k,\dots,w_{N-1}}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\}
\end{aligned}
\]

Proof of Theorem 1 (Contd)

Next, we would like to show

\[
\begin{aligned}
& E_{w_k,\dots,w_{N-1}}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \\
&= E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\}.
\end{aligned}
\]

Proof of Theorem 1 (Contd)

\[
\begin{aligned}
& E_{w_k,\dots,w_{N-1}}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \\
&= E_{w_k}\Big[ E_{w_k, w_{k+1},\dots,w_{N-1}}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \,\Big|\, w_k \Big\} \Big] \quad \text{(tower rule)} \\
&= E_{w_k}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \,\Big|\, w_k \Big\} \Big] \\
&= E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \,\Big|\, w_k \Big\} \Big\}
\end{aligned}
\]

Proof of Theorem 1 (Contd)

\[
\begin{aligned}
& E_{w_k,\dots,w_{N-1}}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \\
&= E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \,\Big|\, w_k \Big\} \Big\} \\
&= E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\}
\end{aligned}
\]

The last equality holds because $w_k, w_{k+1}, \dots, w_{N-1}$ are independent.

Proof of Theorem 1 (Contd)

\[
\begin{aligned}
J_k^*(x_k) &= \min_{(\mu_k, \pi^{k+1})} E_{w_k,\dots,w_{N-1}}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \\
&= \min_{(\mu_k, \pi^{k+1})} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\} \\
&= \min_{\mu_k} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + \min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\}
\end{aligned}
\]

Move the minimization over $\pi^{k+1}$ inside the expectation

\[
\begin{aligned}
& \min_{(\mu_k, \pi^{k+1})} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\} \\
&= \min_{\mu_k} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + \min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\}
\end{aligned}
\]

Let $\pi^{(k+1)*} = (\mu_{k+1}^*, \mu_{k+2}^*, \dots, \mu_{N-1}^*)$ be an optimal policy to the tail subproblem

\[
\min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\}
= E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i^*(x_i), w_i) \Big\}
\]

Principle of optimality: the tail portion of an optimal policy is optimal for the tail subproblem

\[
\begin{aligned}
& \min_{(\mu_k, \pi^{k+1})} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\} \\
&= \min_{\mu_k} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i^*(x_i), w_i) \Big\} \Big\} \\
&= \min_{\mu_k} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + \min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\}
\end{aligned}
\]

\[
\begin{aligned}
& \min_{(\mu_k, \pi^{k+1})} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\} \\
&= \min_{(\mu_k, \pi^{k+1})} \Big\{ E_{w_k}\big[ g_k(x_k, \mu_k(x_k), w_k) \big] + E_{w_k}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \Big\} \\
&= \min_{\mu_k} \Big\{ E_{w_k}\big[ g_k(x_k, \mu_k(x_k), w_k) \big] + \min_{\pi^{k+1}} E_{w_k}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \Big\}
\end{aligned}
\]

$\pi^{k+1}$ does not appear in $g_k(x_k, \mu_k(x_k), w_k)$.

Holds for both closed-loop optimization and open-loop optimization (i.e., replace $\mu_k$ with $u_k$ and $\pi^{k+1}$ with $(u_{k+1}, \dots, u_{N-1})$).

Next, we would like to show that

\[
\min_{\pi^{k+1}} E_{w_k}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big]
= E_{w_k}\Big[ \min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big]
\]

for any given $x_k$ and $\mu_k$. With this identity,

\[
\begin{aligned}
& \min_{(\mu_k, \pi^{k+1})} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\} \\
&= \min_{\mu_k} \Big\{ E_{w_k}\big[ g_k(x_k, \mu_k(x_k), w_k) \big] + \min_{\pi^{k+1}} E_{w_k}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \Big\} \\
&= \min_{\mu_k} \Big\{ E_{w_k}\big[ g_k(x_k, \mu_k(x_k), w_k) \big] + E_{w_k}\Big[ \min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \Big\} \\
&= \min_{\mu_k} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + \min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\}
\end{aligned}
\]

Given $x_k$ and $\mu_k$,

\[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \]

depends on $w_k$ only because $x_{k+1} = f_k(x_k, \mu_k(x_k), w_k)$ depends on $w_k$.

\[
\begin{aligned}
& E_{w_k}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \\
&= E_{w_k}\Big[ E_{x_{k+1}}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \,\Big|\, w_k \Big] \Big] \\
&= E_{x_{k+1}}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big]
\end{aligned}
\]

The first equality is obtained since $x_{k+1}$ is known when $w_k$ is given. The second equality follows from the tower rule.

π k+1 = {µk+1 , µk+2 , ..., µN−1 }
= {µk+1 (xk+1 )∀xk+1 , µk+2 (xk+2 )∀xk+2 , ..., µN−1 (xN−1 )∀xN−1 }

As xi+1 = fi (xi , µi (xi ), wi ) for all i ∈ {k + 1, ..., N − 1}, xk+1 , xk+2 , ...,
xN−1 all depend on xk+1 . With a little abuse of notation,

π k+1 = {π k+1 (xk+1 )∀xk+1 }.

This simply represents that the actions we will take from period k + 1 to
period N − 1 depend on the state xk+1 in period k + 1, which is in
accordance with the closed-loop optimization.


Given $x_k$ and $\mu_k$,

\[
\begin{aligned}
& \min_{\pi^{k+1}} E_{w_k}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \\
&= \min_{\pi^{k+1}} E_{x_{k+1}}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \\
&= \min_{\pi^{k+1}(x_{k+1})\,\forall x_{k+1}} E_{x_{k+1}}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \\
&= E_{x_{k+1}}\Big[ \min_{\pi^{k+1}(x_{k+1})\,\forall x_{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big]
\end{aligned}
\]

The last equality is analogous to $\min_{f(Z)} E[g(Z, f(Z))] = E[\min_{f(Z)} g(Z, f(Z))]$ for some random variable $Z$, since $x_{k+1} \leftrightarrow Z$, $\pi^{k+1} \leftrightarrow f$, and

\[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \]

is a function of $x_{k+1}$ and $\pi^{k+1}$.

To see why $\min_{f(Z)} E[g(Z, f(Z))] = E[\min_{f(Z)} g(Z, f(Z))]$, suppose that $P(Z = z_i) = p_i$ for all $i = 1, \dots, n$ and $\sum_i p_i = 1$.

\[
\begin{aligned}
\min_{f(Z)} E[g(Z, f(Z))] &= \min_{f(z_1),\dots,f(z_n)} \sum_{i=1}^{n} p_i\, g(z_i, f(z_i)) \\
&= \min_{y_1,\dots,y_n} \sum_{i=1}^{n} p_i\, g(z_i, y_i) \\
&= \sum_{i=1}^{n} p_i \min_{y_i} g(z_i, y_i) \\
&= \sum_{i=1}^{n} p_i \min_{f(z_i)} g(z_i, f(z_i)) \\
&= E\Big[ \min_{f(Z)} g(Z, f(Z)) \Big]
\end{aligned}
\]

The third equality holds because each $y_i$ appears in only one term of the sum and $p_i \ge 0$, so the sum is minimized by minimizing each term separately.
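
The interchange of minimization and expectation can be checked by brute force on a small discrete example. This is only a numerical sanity check under assumed data: the values of `Z`, the candidate set `Y`, and the function `g` below are arbitrary choices for illustration.

```python
import math
from itertools import product

Z = {1: 0.2, 2: 0.5, 3: 0.3}  # assumed values z_i with probabilities p_i
Y = (0, 1, 2, 3)              # assumed candidate values for y_i = f(z_i)


def g(z, y):
    """Arbitrary illustrative cost of choosing y when Z = z."""
    return (z - y) ** 2 + 0.5 * y


# Left side: minimize the expectation over all functions f, i.e. all tuples (y_1, ..., y_n).
lhs = min(
    sum(p * g(z, y) for (z, p), y in zip(Z.items(), ys))
    for ys in product(Y, repeat=len(Z))
)

# Right side: minimize separately for each z_i, then take the expectation.
rhs = sum(p * min(g(z, y) for y in Y) for z, p in Z.items())

print(lhs, rhs, math.isclose(lhs, rhs))
```

The left side searches over $4^3 = 64$ functions while the right side performs three separate minimizations over four values each, which is the computational saving the DP recursion exploits.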

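Given $x_k$ and $\mu_k$,

\[
\begin{aligned}
& \min_{\pi^{k+1}} E_{w_k}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \\
&= E_{x_{k+1}}\Big[ \min_{\pi^{k+1}(x_{k+1})\,\forall x_{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \\
&= E_{x_{k+1}}\Big[ \min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \\
&= E_{w_k}\Big[ E_{x_{k+1}}\Big[ \min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \,\Big|\, w_k \Big] \Big] \\
&= E_{w_k}\Big[ \min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big]
\end{aligned}
\]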

\[
\begin{aligned}
& \min_{(\mu_k, \pi^{k+1})} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\} \\
&= \min_{\mu_k} \Big\{ E_{w_k}\big[ g_k(x_k, \mu_k(x_k), w_k) \big] + \min_{\pi^{k+1}} E_{w_k}\Big[ E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \Big\} \\
&= \min_{\mu_k} \Big\{ E_{w_k}\big[ g_k(x_k, \mu_k(x_k), w_k) \big] + E_{w_k}\Big[ \min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big] \Big\} \\
&= \min_{\mu_k} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + \min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\}
\end{aligned}
\]

Proof of Theorem 1 (Contd)

\[
\begin{aligned}
J_k^*(x_k) &= \min_{\mu_k} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + \min_{\pi^{k+1}} E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \Big\} \\
&= \min_{\mu_k} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + J_{k+1}^*\big(f_k(x_k, \mu_k(x_k), w_k)\big) \Big\},
\end{aligned}
\]

where the second equality follows from $x_{k+1} = f_k(x_k, \mu_k(x_k), w_k)$ and the definition of $J_{k+1}^*(x_{k+1})$. Recall the definition

\[ J_k^*(x_k) = \min_{\pi^k} E_{w_k,\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \Big\} \]

Proof of Theorem 1 (Contd)
\[
\begin{aligned}
J_k^*(x_k) &= \min_{\mu_k} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + J_{k+1}^*\big(f_k(x_k, \mu_k(x_k), w_k)\big) \Big\} \\
&= \min_{\mu_k} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + J_{k+1}\big(f_k(x_k, \mu_k(x_k), w_k)\big) \Big\} \\
&= \min_{u_k \in U_k(x_k)} E_{w_k}\Big\{ g_k(x_k, u_k, w_k) + J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\}
\end{aligned}
\]

The second equality follows from the induction hypothesis, i.e., $J_{k+1}^*(x_{k+1}) = J_{k+1}(x_{k+1})$ for all $x_{k+1}$.
The third equality follows from the fact that for any function $F$ of $x$ and $u$, we have
\[ \min_{\mu \in M} F(x, \mu(x)) = \min_{u \in U(x)} F(x, u), \]
where $M$ is the set of all functions $\mu(x)$ such that $\mu(x) \in U(x)$ for all $x$.

Proof of Theorem 1 (Contd)
\[
\begin{aligned}
J_k^*(x_k) &= \min_{\mu_k} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + J_{k+1}^*\big(f_k(x_k, \mu_k(x_k), w_k)\big) \Big\} \\
&= \min_{\mu_k} E_{w_k}\Big\{ g_k(x_k, \mu_k(x_k), w_k) + J_{k+1}\big(f_k(x_k, \mu_k(x_k), w_k)\big) \Big\} \\
&= \min_{u_k \in U_k(x_k)} E_{w_k}\Big\{ g_k(x_k, u_k, w_k) + J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\} \\
&= J_k(x_k)
\end{aligned}
\]

The last equality follows from the definition of $J_k(x_k)$, i.e.,

\[ J_k(x_k) = \min_{u_k \in U_k(x_k)} E_{w_k}\Big\{ g_k(x_k, u_k, w_k) + J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\}. \]

This completes the induction proof that shows $J_k^*(x_k) = J_k(x_k)$ for all $k$ and $x_k$. Recall that $J^*(x_0) = J_0^*(x_0)$. We obtain $J^*(x_0) = J_0^*(x_0) = J_0(x_0)$.

Proof of Theorem 1 (Contd)

To show the optimality of π ∗ = {µ∗0 , ..., µ∗N−1 }, where uk∗ = µ∗k (xk )
minimizes the right side of
\[ J_k(x_k) = \min_{u_k \in U_k(x_k)} E_{w_k}\Big\{ g_k(x_k, u_k, w_k) + J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\} \]

for each xk and k, we can make use of the principle of optimality, i.e., “the
tail portion of an optimal policy is optimal for the tail subproblem.”
An optimal policy for the problem JN−1 is {µ∗N−1 }.
An optimal policy for the problem JN−2 is µ∗N−2 plus an optimal
policy for the problem JN−1 , i.e., {µ∗N−2 , µ∗N−1 }.
Continuing in this manner, an optimal policy for the problem J0 , i.e.,
J ∗ , is π ∗ = {µ∗0 , ..., µ∗N−1 }.


Proof of Theorem 1 (Contd)

Period $N-1$. For any given $x_{N-1}$,

\[
\begin{aligned}
J_{N-1}(x_{N-1}) &= \min_{u_{N-1} \in U_{N-1}(x_{N-1})} E_{w_{N-1}}\Big\{ g_{N-1}(x_{N-1}, u_{N-1}, w_{N-1}) + J_N\big(f_{N-1}(x_{N-1}, u_{N-1}, w_{N-1})\big) \Big\} \\
&= E_{w_{N-1}}\Big\{ g_{N-1}(x_{N-1}, \mu_{N-1}^*(x_{N-1}), w_{N-1}) + J_N\big(f_{N-1}(x_{N-1}, \mu_{N-1}^*(x_{N-1}), w_{N-1})\big) \Big\},
\end{aligned}
\]

since $u_{N-1}^* = \mu_{N-1}^*(x_{N-1})$ is the optimal solution to the optimization problem on the right-hand side of the first equality. Let $x_N = f_{N-1}(x_{N-1}, \mu_{N-1}^*(x_{N-1}), w_{N-1})$. We have

\[ J_{N-1}(x_{N-1}) = E_{w_{N-1}}\Big\{ g_N(x_N) + g_{N-1}(x_{N-1}, \mu_{N-1}^*(x_{N-1}), w_{N-1}) \Big\}. \]

Proof of Theorem 1 (Contd)
Period $k$, for any $k = 0, 1, \dots, N-2$.
Induction Hypothesis
\[ J_{k+1}(x_{k+1}) = E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i^*(x_i), w_i) \Big\}, \]
where $x_{i+1} = f_i(x_i, \mu_i^*(x_i), w_i)$ for $i = k+1, \dots, N-1$.

For any given $x_k$,

\[
\begin{aligned}
J_k(x_k) &= \min_{u_k \in U_k(x_k)} E_{w_k}\Big\{ g_k(x_k, u_k, w_k) + J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\} \\
&= E_{w_k}\Big\{ g_k(x_k, \mu_k^*(x_k), w_k) + J_{k+1}\big(f_k(x_k, \mu_k^*(x_k), w_k)\big) \Big\},
\end{aligned}
\]

since $u_k^* = \mu_k^*(x_k)$ is the optimal solution to the optimization problem on the right-hand side of the first equality. Let $x_{k+1} = f_k(x_k, \mu_k^*(x_k), w_k)$. We have

\[ J_k(x_k) = E_{w_k}\Big\{ g_k(x_k, \mu_k^*(x_k), w_k) + J_{k+1}(x_{k+1}) \Big\}. \]

Proof of Theorem 1 (Contd)
Period $k$, for any $k = 0, 1, \dots, N-2$. Substituting the induction hypothesis for $J_{k+1}(x_{k+1})$,

\[
\begin{aligned}
J_k(x_k) &= E_{w_k}\Big\{ g_k(x_k, \mu_k^*(x_k), w_k) + J_{k+1}(x_{k+1}) \Big\} \\
&= E_{w_k}\Big\{ g_k(x_k, \mu_k^*(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i^*(x_i), w_i) \Big\} \Big\},
\end{aligned}
\]

where $x_{k+1} = f_k(x_k, \mu_k^*(x_k), w_k)$ and $x_{i+1} = f_i(x_i, \mu_i^*(x_i), w_i)$ for $i = k+1, \dots, N-1$.

Proof of Theorem 1 (Contd)

\[
\begin{aligned}
J_k(x_k) &= E_{w_k}\Big\{ g_k(x_k, \mu_k^*(x_k), w_k) + E_{w_{k+1},\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i^*(x_i), w_i) \Big\} \Big\} \\
&= E_{w_k,\dots,w_{N-1}}\Big\{ g_N(x_N) + g_k(x_k, \mu_k^*(x_k), w_k) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i^*(x_i), w_i) \Big\} \\
&= E_{w_k,\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=k}^{N-1} g_i(x_i, \mu_i^*(x_i), w_i) \Big\},
\end{aligned}
\]

where $x_{i+1} = f_i(x_i, \mu_i^*(x_i), w_i)$ for $i = k, \dots, N-1$.

Proof of Theorem 1 (Contd)

Thus, for any x0 ,

\[ J_0(x_0) = E_{w_0,\dots,w_{N-1}}\Big\{ g_N(x_N) + \sum_{i=0}^{N-1} g_i(x_i, \mu_i^*(x_i), w_i) \Big\}, \]

where xk+1 = fk (xk , µ∗k (xk ), wk ) for k = 0, ..., N − 1, i.e.,

J ∗ (x0 ) = J0 (x0 ) = Jπ∗ (x0 ),

where π ∗ = {µ∗0 , ..., µ∗N−1 }. This implies that π ∗ is an optimal policy.


The Bellman Equation

\[ \underbrace{J_k(x_k)}_{\text{cost-to-go at time } k} = \min_{u_k \in U_k(x_k)} E_{w_k}\Big\{ \underbrace{g_k(x_k, u_k, w_k)}_{\text{cost at time } k} + \underbrace{J_{k+1}\big(f_k(x_k, u_k, w_k)\big)}_{\text{cost-to-go at time } k+1} \Big\} \]

$J_k(x_k)$ is the optimal cost for an $(N-k)$-stage problem starting at state $x_k$ and time $k$, and ending at time $N$.
▶ Jk (xk ) is called the cost-to-go at state xk and time k.
▶ Jk is called the cost-to-go function at time k.


Numerical Execution of the DP Algorithm

The minimization in the DP recursion (1), i.e.,

\[ J_k(x_k) = \min_{u_k \in U_k(x_k)} E_{w_k}\Big\{ g_k(x_k, u_k, w_k) + J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\}, \]

must be carried out for each value of xk .

Curse of dimensionality
▶ Suppose that any state in a period always leads to two possible states
in the next period. Starting from $x_0$, the number of possible states in
period $k$, i.e., the number of possible values of $x_k$, can be as large as $2^k$.
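
The growth in the number of states can be counted directly. The sketch below compares two assumed successor rules, neither of which comes from the slides: one where the state records the whole disturbance history, so branches never merge and the count reaches $2^k$, and one where different histories can lead to the same state, so the count grows far more slowly.

```python
def reachable_counts(x0, successors, horizon):
    """Number of distinct states reachable in periods 0, 1, ..., horizon."""
    frontier, counts = {x0}, [1]
    for _ in range(horizon):
        frontier = {y for x in frontier for y in successors(x)}
        counts.append(len(frontier))
    return counts


def history_successors(x):
    """State is the full disturbance history, so the two branches never merge."""
    return (x + (0,), x + (1,))


def stock_successors(x):
    """State is a single integer level that moves down or up by one unit."""
    return (x - 1, x + 1)


print(reachable_counts((), history_successors, 5))  # [1, 2, 4, 8, 16, 32]
print(reachable_counts(0, stock_successors, 5))     # [1, 2, 3, 4, 5, 6]
```

Since recursion (1) must be solved once per state and period, the first rule makes the work double every period, while the second keeps it linear in the horizon.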
