
Stochastic Dynamic Programming

V. Leclère

December 8th 2023



Presentation Outline

1 Stochastic Dynamic Programming
  Stochastic optimal control problem
  Dynamic Programming principle
  Example

2 Extending the usage of dynamic programming
  More flexibility in the framework
  Continuous state space

3 Structured problems
  Linear Quadratic case
  Linear convex case





Stochastic Controlled Dynamic System

A discrete-time controlled stochastic dynamic system is defined by its dynamics
$$x_{t+1} = f_t(x_t, u_t, \xi_{t+1})$$
and initial state
$$x_0 = \xi_0.$$

The variables:
$x_t$ is the state of the system,
$u_t$ is the control applied to the system at time $t$,
$\xi_t$ is an exogenous noise.

Usually, $x_t \in \mathcal{X}_t$, and $u_t$ belongs to a set depending upon the state:
$$u_t \in \mathcal{U}_t(x_t).$$



Examples

Stock of water in a dam:
$x_t$ is the amount of water in the dam at time $t$,
$u_t$ is the amount of water turbined at time $t$,
$\xi_{t+1}$ is the inflow of water in $[t, t+1[$ (see the sketch after this list).

Boat in the ocean:
$x_t$ is the position of the boat at time $t$,
$u_t$ is the direction and speed chosen for $[t, t+1[$,
$\xi_{t+1}$ is the wind and current for $[t, t+1[$.

Subway network:
$x_t$ is the position and speed of each train at time $t$,
$u_t$ is the acceleration chosen at time $t$,
$\xi_{t+1}$ is the delay due to passengers and incidents on the network for $[t, t+1[$.
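To make the dynamics $f_t$ concrete, here is a minimal Python sketch of the dam example; the capacity bound and the clipping rules are illustrative assumptions, not data from the slide.

```python
# Minimal sketch of the dam dynamics x_{t+1} = f_t(x_t, u_t, xi_{t+1}).
# CAPACITY and the clipping rules are illustrative assumptions.
CAPACITY = 100.0

def dam_dynamics(x_t: float, u_t: float, xi_next: float) -> float:
    """One time step of the dam: turbine u_t, then receive the inflow xi_next."""
    u_t = min(u_t, x_t)                        # cannot turbine more than the stock
    return min(x_t - u_t + xi_next, CAPACITY)  # inflow above capacity is spilled
```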


More considerations about the state

Physical state: the physical value of the controlled system, e.g. the amount of water in your dam, the position of your boat...
Information state: the physical state plus the information you have over the noises, e.g. amount of water and weather forecast...
Knowledge state: your current belief over the actual information state (in case of noisy observations), represented as a probability distribution over information states.

The state, in the Dynamic Programming sense, is the information required to define an optimal solution.


Optimization Problem

$$\begin{aligned}
\min_{u}\quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} L_t(x_t, u_t, \xi_{t+1}) + K(x_T)\Big] \\
\text{s.t.}\quad & x_{t+1} = f_t(x_t, u_t, \xi_{t+1}), \qquad x_0 = \xi_0 \\
& u_t \in \mathcal{U}_t(x_t), \qquad x_t \in \mathcal{X}_t \\
& \sigma(u_t) \subset \sigma(\xi_0, \cdots, \xi_t)
\end{aligned}$$


1 We want to minimize the expectation of the sum of costs.
2 The system follows dynamics given by the functions $f_t$.
3 There are stagewise constraints on the controls and states.
4 The controls are functions of the past noises (= non-anticipativity).

Equivalently, the non-anticipativity constraint can be made explicit by optimizing over measurable functions $\Phi$:

$$\begin{aligned}
\min_{\Phi}\quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} L_t(x_t, u_t, \xi_{t+1}) + K(x_T)\Big] \\
\text{s.t.}\quad & x_{t+1} = f_t(x_t, u_t, \xi_{t+1}), \qquad x_0 = \xi_0 \\
& u_t \in \mathcal{U}_t(x_t), \qquad x_t \in \mathcal{X}_t \\
& u_t = \Phi(\xi_0, \cdots, \xi_t)
\end{aligned}$$

Optimization Problem with independence of noises

Assuming stagewise independence of the noises, we can compress the information in the following way: instead of functions $\Phi_t(\xi_0, \cdots, \xi_t)$ of all past noises, it is enough to consider state feedback policies $u_t = \pi_t(x_t)$:

$$\begin{aligned}
\min_{\pi}\quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} L_t(x_t, u_t, \xi_{t+1}) + K(x_T)\Big] \\
\text{s.t.}\quad & x_{t+1} = f_t(x_t, u_t, \xi_{t+1}), \qquad x_0 = \xi_0 \\
& u_t \in \mathcal{U}_t(x_t), \qquad x_t \in \mathcal{X}_t \\
& u_t = \pi_t(x_t)
\end{aligned}$$



Keeping only the state

For notational ease, we want to formulate the problem above only with states.
Let $\mathcal{X}_t(x_t, \xi_{t+1})$ be the set of reachable states, i.e.,
$$\mathcal{X}_t(x_t, \xi_{t+1}) := \big\{x_{t+1} \in \mathcal{X}_{t+1} \mid \exists u_t \in \mathcal{U}_t(x_t, \xi_{t+1}),\ x_{t+1} = f_t(x_t, u_t, \xi_{t+1})\big\},$$
and $c_t(x_t, x_{t+1}, \xi_{t+1})$ the transition cost from $x_t$ to $x_{t+1}$, i.e.,
$$c_t(x_t, x_{t+1}, \xi_{t+1}) := \min_{u_t \in \mathcal{U}_t(x_t, \xi_{t+1})} \big\{L_t(x_t, u_t, \xi_{t+1}) \mid x_{t+1} = f_t(x_t, u_t, \xi_{t+1})\big\}.$$

Then, under independence of the noises, the optimization problem reads

$$\begin{aligned}
\min_{\psi}\quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} c_t(x_t, x_{t+1}, \xi_{t+1}) + K(x_T)\Big] \\
\text{s.t.}\quad & x_{t+1} \in \mathcal{X}_t(x_t, \xi_{t+1}), \qquad x_0 = \xi_0 \\
& x_{t+1} = \psi_t(x_t, \xi_{t+1})
\end{aligned}$$
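A small Python sketch of these two objects for finite control sets (the function names and calling conventions are ours, introduced for illustration):

```python
def reachable_and_cost(t, x, xi, controls, f, L):
    """Enumerate the reachable states X_t(x, xi) and, for each reachable y,
    the transition cost c_t(x, y, xi) = min over controls u reaching y (sketch)."""
    cost = {}
    for u in controls(t, x, xi):
        y = f(t, x, u, xi)
        c = L(t, x, u, xi)
        if y not in cost or c < cost[y]:
            cost[y] = c
    return cost  # keys: reachable states; values: transition costs
```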



Bellman's Principle of Optimality

"An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." (Richard Bellman)

[Photo: Richard Ernest Bellman (August 26, 1920 – March 19, 1984)]


The shortest path on a graph illustrates Bellman's Principle of Optimality

"For an auto travel analogy, suppose that the fastest route from Los Angeles to Boston passes through Chicago. The principle of optimality translates to the obvious fact that the Chicago to Boston portion of the route is also the fastest route for a trip that starts from Chicago and ends in Boston." (Dimitri P. Bertsekas)


Idea behind dynamic programming

If noises are time-independent, then
1 the cost-to-go at time t depends only upon the current state;
2 we can compute recursively the cost-to-go for each position, starting from the terminal state and computing optimal trajectories backward.

The optimal cost-to-go of being in state $x$ at time $t$ is denoted $V_t(x)$: at time $t$, $V_{t+1}$ gives the cost of the future.
Dynamic Programming is a time decomposition method.


Idea Behind Dynamic Programming

$$\begin{aligned}
\min_{u_0 \in \mathcal{U}_0(x_0)}\ & \mathbb{E}\Big[ L_0(x_0, u_0, \xi_1) + \min_{u_1, \dots, u_{T-1}} \mathbb{E}\Big[ \sum_{t=1}^{T-1} L_t(x_t, u_t, \xi_{t+1}) + K(x_T) \Big] \Big] \\
\text{s.t.}\ & x_1 = f_0(x_0, u_0, \xi_1) \\
& x_{t+1} = f_t(x_t, u_t, \xi_{t+1}) \in \mathcal{X}_{t+1} \\
& u_t \in \mathcal{U}_t(x_t) \\
& \sigma(u_t) \subset \sigma(\xi_0, \cdots, \xi_t)
\end{aligned}$$

Under independence of the noises, the non-anticipativity constraint reduces to $\sigma(u_t) \subset \sigma(x_t)$, and the inner minimization defines the value function $V_1(x_1)$.


Definition of Bellman Value Function

Bellman's value function $V_{t_0}(x)$ is defined as the value of the problem starting at time $t_0$ from the state $x$. More precisely, we have
$$\begin{aligned}
V_{t_0}(x) = \min\ & \mathbb{E}\Big[\sum_{t=t_0}^{T-1} L_t(x_t, u_t, \xi_{t+1}) + K(x_T)\Big] \\
\text{s.t.}\ & x_{t+1} = f_t(x_t, u_t, \xi_{t+1}), \qquad x_{t_0} = x \\
& u_t \in \mathcal{U}_t(x_t), \qquad x_t \in \mathcal{X}_t \\
& \sigma(u_t) \subset \sigma(\xi_0, \cdots, \xi_t)
\end{aligned}$$


Bellman's recursion

The core idea of Bellman's recursion is to see the total (expected) cost as the sum of the current cost and the future cost:
$$\begin{aligned}
V_t(x_t) = \min_{u_t}\ & \mathbb{E}\big[L_t(x_t, u_t, \xi_{t+1}) + V_{t+1}(x_{t+1})\big] \\
\text{s.t.}\ & x_{t+1} = f_t(x_t, u_t, \xi_{t+1}) \\
& u_t \in \mathcal{U}_t(x_t) \\
& x_{t+1} \in \mathcal{X}_{t+1}
\end{aligned}$$
And we know the final cost function:
$$V_T(x_T) = K(x_T).$$
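For instance, the first step of the recursion, at $t = T-1$, is obtained by substituting $V_T = K$:
$$V_{T-1}(x) = \min_{u \in \mathcal{U}_{T-1}(x)} \mathbb{E}\big[L_{T-1}(x, u, \xi_T) + K\big(f_{T-1}(x, u, \xi_T)\big)\big].$$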


Dynamic Programming Algorithm - Discrete Case

Data: problem parameters
Result: optimal strategy and value
V_T ≡ K; V_t ≡ 0 for t < T
for t : T − 1 → 0 do
    for x ∈ X_t do
        V_t(x) = min_{u ∈ U_t(x)} E[ L_t(x, u, ξ_{t+1}) + V_{t+1}( f_t(x, u, ξ_{t+1}) ) ]

Expanding the minimization and the expectation gives the fully explicit version:

Data: problem parameters
Result: optimal strategy and value
V_T ≡ K
for t : T − 1 → 0 do
    for x ∈ X_t do
        V_t(x) = +∞
        for u ∈ U_t(x) do
            for ξ ∈ Ξ_{t+1} do
                x^ξ_{t+1} = f_t(x, u, ξ)
                if x^ξ_{t+1} ∈ X_{t+1} then
                    Q̇_t(x, u, ξ) = L_t(x, u, ξ) + V_{t+1}(x^ξ_{t+1})
                else
                    Q̇_t(x, u, ξ) = +∞
            Q_t(x, u) = Σ_{ξ ∈ Ξ_{t+1}} P(ξ_{t+1} = ξ) Q̇_t(x, u, ξ)
            if Q_t(x, u) < V_t(x) then
                V_t(x) = Q_t(x, u); π_t(x) = u

Algorithm 1: Classical stochastic DP algorithm
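A direct Python transcription of the explicit version above, for finite state, control and noise sets; the data structures (dictionaries and callables) are assumptions of this sketch, not part of the slides.

```python
def stochastic_dp(T, states, controls, noises, proba, f, L, K):
    """Classical stochastic DP over finite sets (a sketch of Algorithm 1).

    states[t]: iterable of states; controls(t, x): admissible controls;
    noises[t+1]: support of xi_{t+1}; proba[t+1][xi]: P(xi_{t+1} = xi);
    f(t, x, u, xi): dynamics; L(t, x, u, xi): stage cost; K(x): final cost."""
    V = [dict() for _ in range(T + 1)]
    pi = [dict() for _ in range(T)]
    V[T] = {x: K(x) for x in states[T]}
    for t in range(T - 1, -1, -1):
        for x in states[t]:
            V[t][x] = float("inf")
            for u in controls(t, x):
                q = 0.0
                for xi in noises[t + 1]:
                    x_next = f(t, x, u, xi)
                    if x_next not in V[t + 1]:   # transition leaves X_{t+1}
                        q = float("inf")
                        break
                    q += proba[t + 1][xi] * (L(t, x, u, xi) + V[t + 1][x_next])
                if q < V[t][x]:
                    V[t][x], pi[t][x] = q, u
    return V, pi
```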

3 curses of dimensionality

Complexity = O(T × |X_t| × |U_t| × |Ξ_t|)

Linear in the number of time steps, but we have 3 curses of dimensionality:
1 State. Complexity is exponential in the dimension of X_t, e.g. 3 independent states each taking 10 values lead to a loop over 1000 points.
2 Decision. Complexity is exponential in the dimension of U_t ⇝ due to the exhaustive minimization of the inner problem. Can be accelerated using a faster method (e.g. a MILP solver).
3 Expectation. Complexity is exponential in the dimension of Ξ_t ⇝ due to the expectation computation. Can be accelerated through Monte Carlo approximation (still at least 1000 points).

In practice, DP is not used for a state of dimension more than 5.

Illustrating dynamic programming with the damsvalley example

[Figure: the damsvalley — five dams: Gnioure, Izourt, Soulcem, Auzat, Sabart.]


Illustrating the curse of dimensionality

We are in dimension 5 (not so high in the world of big data!) with 52 timesteps (common in energy management), plus 5 controls and 5 independent noises.
1 We discretize each state dimension into 100 values: |X_t| = 100^5 = 10^10.
2 We discretize each control dimension into 100 values: |U_t| = 100^5 = 10^10.
3 We use optimal quantization to discretize the noise space into 10 values: |Ξ_t| = 10.

Number of flops: O(52 × 10^10 × 10^10 × 10) ≈ O(10^23).
In the TOP500, the best computer computes 10^17 flops/s.
Even with the most powerful computer, it takes at least 12 days to solve this problem.


A storage management example

A producer needs to satisfy a weekly demand over 12 weeks.
Storage capacity is 100 units, starting with 50 units.
The producer can produce 0 (cost 0), 10 (cost 20), 20 (cost 30) or 25 (cost 45) units per week.
Demand is random and follows a stagewise independent uniform distribution on {0, 10, 20, 30, 40}.
Storage costs 0.1 per unit per week.
Unmet demand is lost and costs 5 per unit.
Products remaining at the end are sold at 1 per unit.

During a given week:
the producer decides how much to produce during the week;
demand is revealed and should be met with current stock and production;
remaining stock is stored (at a cost), and stock above capacity is lost.

Exercise

1 Formulate the problem as a stochastic dynamic program, identifying the underlying state, decision and noise.
2 Write the dynamic programming (Bellman's) equation.
3 Solve the problem with your favorite programming language (see the sketch below).
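A minimal Python sketch for point 3. The timing convention follows the previous slide (produce, observe the demand, pay for shortage, pay storage on what remains, lose stock above capacity); treating the final sale as a negative terminal cost is our modelling choice, not stated on the slide.

```python
T = 12
CAP = 100
PROD = {0: 0, 10: 20, 20: 30, 25: 45}      # production level -> production cost
DEMANDS = [0, 10, 20, 30, 40]              # uniform, stagewise independent
STATES = range(0, CAP + 1, 5)              # stocks stay multiples of 5 here

V = {x: -1.0 * x for x in STATES}          # terminal cost: leftovers sold at 1/unit
policy = []
for t in reversed(range(T)):
    Vt, pit = {}, {}
    for x in STATES:
        best, best_u = float("inf"), None
        for u, cu in PROD.items():
            q = cu
            for d in DEMANDS:                         # expectation over the demand
                shortage = max(d - x - u, 0)          # unmet demand is lost
                stock = min(max(x + u - d, 0), CAP)   # stock above capacity is lost
                q += (5.0 * shortage + 0.1 * stock + V[stock]) / len(DEMANDS)
            if q < best:
                best, best_u = q, u
        Vt[x], pit[x] = best, best_u
    V = Vt
    policy.append(pit)
policy.reverse()
print("Expected optimal cost starting from 50 units:", V[50])
```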


Requirements of stochastic DP

$$\begin{aligned}
\min_{\pi}\quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} L_t(x_t, u_t, \xi_{t+1}) + K(x_T)\Big] \\
\text{s.t.}\quad & x_{t+1} = f_t(x_t, u_t, \xi_{t+1}), \qquad x_0 \ \text{given} \\
& u_t \in \mathcal{U}_t(x_t), \qquad x_t \in \mathcal{X}_t \\
& u_t = \pi_t(x_t)
\end{aligned}$$

Assumptions:
The noises are stagewise-independent.
The only constraint linking stages is the dynamic equation: no coupling between stages.
The cost function is additive over stages.
We consider the expectation of the costs.


Markovian noise

Assume that $(\xi_t)_t$ is a Markovian noise, i.e. $\xi_{t+1}$ only depends on $\xi_t$. We can recover the previous setting by defining an extended state
$$\tilde{x}_t = (x_t, \xi_t).$$

The Bellman equation then becomes:
$$V_t(x_t, \xi_t) := \min_{u_t \in \mathcal{U}_t(x_t)} \mathbb{E}\big[L_t(x_t, u_t, \xi_{t+1}) + V_{t+1}(\tilde{x}_{t+1}) \mid \boldsymbol{\xi}_t = \xi_t\big]$$

More precisely, it means that:
1 The value function $V_t$ (and the optimal policy $\pi_t$) depends on both the current physical state $x_t$ and the current noise $\xi_t$.
2 The probability used to average the cost-to-go in the algorithm is the conditional probability given $\xi_t$.
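A sketch of one Bellman update under Markovian noise, where the conditional distribution of $\xi_{t+1}$ given $\xi_t$ is represented by an assumed dictionary `trans`:

```python
def bellman_markov(t, x, xi, controls, trans, f, L, V_next):
    """One Bellman update V_t(x, xi) on the extended state (sketch).

    trans[xi][xi_next] = P(xi_{t+1} = xi_next | xi_t = xi);
    V_next[(x, xi)]: value function at t+1 on the extended state."""
    best = float("inf")
    for u in controls(t, x):
        q = sum(p * (L(t, x, u, xi_next) + V_next[(f(t, x, u, xi_next), xi_next)])
                for xi_next, p in trans[xi].items())
        best = min(best, q)
    return best
```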

Coupling control

Consider the following problem, with stagewise independent noise:

$$\begin{aligned}
\min_{\pi}\quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} L_t(x_t, u_t, \xi_{t+1}) + K(x_T)\Big] \\
\text{s.t.}\quad & x_{t+1} = f_t(x_t, u_t, \xi_{t+1}), \qquad x_0 \ \text{given} \\
& u_t \in \mathcal{U}_t(x_t), \qquad x_t \in \mathcal{X}_t \\
& u_t = \pi_t(x_t) \\
& \|u_t - u_{t-1}\| \le \delta
\end{aligned}$$

How can we solve this problem using Dynamic Programming?
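One natural approach (a hint, not spelled out on the slide): augment the state with the previous control, so that the coupling constraint becomes stagewise:
$$\tilde{x}_t := (x_t, u_{t-1}), \qquad \tilde{f}_t(\tilde{x}_t, u_t, \xi_{t+1}) := \big(f_t(x_t, u_t, \xi_{t+1}),\, u_t\big), \qquad \tilde{\mathcal{U}}_t(\tilde{x}_t) := \{u \in \mathcal{U}_t(x_t) \mid \|u - u_{t-1}\| \le \delta\}.$$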


Delayed control

Consider the following problem, with stagewise independent noise:

$$\begin{aligned}
\min_{\pi}\quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} L_t(x_t, u_t, \xi_{t+1}) + K(x_T)\Big] \\
\text{s.t.}\quad & x_{t+1} = f_t(x_t, u_{t-2}, \xi_{t+1}), \qquad x_0 \ \text{given} \\
& u_t \in \mathcal{U}_t(x_t), \qquad x_t \in \mathcal{X}_t \\
& u_t = \pi_t(x_t)
\end{aligned}$$

How can we solve this problem using Dynamic Programming?
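Again as a hint: the same state augmentation trick applies, now keeping the two previous controls in the state:
$$\tilde{x}_t := (x_t, u_{t-1}, u_{t-2}), \qquad \tilde{f}_t(\tilde{x}_t, u_t, \xi_{t+1}) := \big(f_t(x_t, u_{t-2}, \xi_{t+1}),\, u_t,\, u_{t-1}\big).$$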


Bankruptcy

Consider the following problem, with stagewise independent noise:

$$\begin{aligned}
\min_{\pi}\quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} L_t(x_t, u_t, \xi_{t+1}) + K(x_T)\Big] \\
\text{s.t.}\quad & x_{t+1} = f_t(x_t, u_t, \xi_{t+1}), \qquad x_0 \ \text{given} \\
& u_t \in \mathcal{U}_t(x_t), \qquad x_t \in \mathcal{X}_t \\
& u_t = \pi_t(x_t)
\end{aligned}$$

In addition, we assume that we start with a capital $C_0$, and that we must never, under any circumstance, have a negative capital.
How can we solve this problem using Dynamic Programming?
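As a hint (with $g_t$ denoting the capital dynamics, a notation we introduce for illustration): add the capital to the state, and restrict the admissible controls so that the capital stays nonnegative for every possible realization of the next noise:
$$\tilde{x}_t := (x_t, c_t), \qquad \tilde{\mathcal{U}}_t(x_t, c_t) := \{u \in \mathcal{U}_t(x_t) \mid g_t(c_t, u, \xi) \ge 0 \ \ \forall \xi \in \Xi_{t+1}\}.$$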


Maximizing probability

Consider the following problem, with stagewise independent noise:

$$\begin{aligned}
\min_{\pi}\quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} L_t(x_t, u_t, \xi_{t+1}) + K(x_T)\Big] \\
\text{s.t.}\quad & x_{t+1} = f_t(x_t, u_t, \xi_{t+1}), \qquad x_0 \ \text{given} \\
& u_t \in \mathcal{U}_t(x_t), \qquad x_t \in \mathcal{X}_t \\
& u_t = \pi_t(x_t)
\end{aligned}$$

We now reconsider our objective function, and want to replace the expectation by the probability that the accumulated cost, at the end of the period, is negative.
How can we solve this problem by Dynamic Programming?
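As a hint: a probability is the expectation of an indicator, so we can add the accumulated cost to the state,
$$a_{t+1} = a_t + L_t(x_t, u_t, \xi_{t+1}), \qquad a_0 = 0,$$
and apply DP on the extended state $(x_t, a_t)$, with zero stage costs and terminal cost $\mathbf{1}_{\{a_T + K(x_T) < 0\}}$.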


Dynamic Programming Algorithm - Discrete Case - HD

Data: problem parameters
Result: optimal trajectory and value
V_T ≡ K; V_t ≡ 0 for t < T
for t : T − 1 → 0 do
    for x ∈ X_t do
        V_t(x) = E[ min_{y ∈ X_t(x, ξ_{t+1})} c_t(x, y, ξ_{t+1}) + V_{t+1}(y) ]

Expanding the expectation and the minimization gives the fully explicit version:

Data: problem parameters
Result: optimal trajectory and value
V_T ≡ K; V_t ≡ 0 for t < T
for t : T − 1 → 0 do
    for x ∈ X_t do
        for ξ ∈ Ξ_{t+1} do
            V̂_t(x, ξ) = +∞
            for y ∈ X_t(x, ξ) do
                v_y = c_t(x, y, ξ) + V_{t+1}(y)
                if v_y < V̂_t(x, ξ) then
                    V̂_t(x, ξ) = v_y; ψ_t(x, ξ) = y
            V_t(x) = V_t(x) + P(ξ_{t+1} = ξ) V̂_t(x, ξ)

Algorithm 2: Classical stochastic DP algorithm (exchange formulation, over states only)



Discretized Stochastic Dynamic Programming

The simplest DP algorithm is obtained by discretizing the state set, and then doing a single backward pass over the grid.

Ṽ_t ≡ 0
for t : T − 1 → 1 do
    for x_in ∈ X^D_{t−1} do
        for ξ ∈ Ξ_t do
            v̇_ξ = min_{x_out ∈ X_t(x_in, ξ)} ℓ_t(x_in, x_out, ξ) + Ṽ_{t+1}(x_out)    (the min is =: Ḃ_t(Ṽ_{t+1})(x_in, ξ))
            Ṽ_t(x_in) += π_ξ v̇_ξ    (with π_ξ := P(ξ_t = ξ))
    Extend the definition of Ṽ_t to X_t by interpolation

Algorithm 3: Discretized SDP

[Figure: grid of discretized states in (x_1, x_2), swept backward in time.]
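A sketch of the backward pass with interpolation on a one-dimensional grid, using numpy's linear interpolation to extend $\tilde{V}_t$ between grid points; the indexing convention is simplified and the model callables are assumptions of this sketch:

```python
import numpy as np

def discretized_sdp(T, grid, noises, proba, reachable, cost, K):
    """Backward pass over a discretized 1-D state grid (sketch of Algorithm 3).

    grid: sorted 1-D array of grid states; noises[t], proba[t]: support and
    probabilities of the noise; reachable(t, x, xi): candidate next states;
    cost(t, x, y, xi): transition cost; K: terminal cost function."""
    V = np.array([K(x) for x in grid], dtype=float)
    for t in range(T - 1, -1, -1):
        Vt = np.zeros_like(V)
        for i, x in enumerate(grid):
            for xi, p in zip(noises[t], proba[t]):
                # interpolate V_{t+1} off the grid, then optimize over next states
                v = min(cost(t, x, y, xi) + np.interp(y, grid, V)
                        for y in reachable(t, x, xi))
                Vt[i] += p * v
        V = Vt  # V_t on the grid; np.interp extends it between grid points
    return V
```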
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Vincent Leclère Dynamic Programming 08/12/2023 28 / 36
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems

Discretized Stochastic Dynamic Programming


The simplest DP algorithm is obtained by discretizing the state
set, and then doing a single backward pass over the grid.

x2
Ṽt ≡ 0
for t : T − 1 → 1 do
D
for xin ∈ Xt−1 do
for ξ ∈ Ξt do
v̇ξ =
min ℓt (xin , xout , ξ) + Ṽt+1 (xout )
xout ∈Xt (xin ,ξ)
| {z }
:=Ḃt (Ṽt+1 )(xin ,ξ)
Ṽt (xin ) += πξ v̇ξ
x1
|{z}
:=P(ξt =ξ)
Extend definition of Ṽt to Xt by
interpolation time
Algorithm 1: Discretized SDP
Cost-to-go induced policy and Forward Bellman operator

The point of most DP methods is to produce approximations Ṽ_t of the true value functions¹ V_t.
From any approximation Ṽ_t of V_t, we can define a cost-to-go induced policy ψ_t by solving the stage problem:

min_{(x_out, u_t) ∈ X_t(x_in, ξ_t)}   ℓ_t(x_in, x_out, u_t, ξ_t)   +   Ṽ_{t+1}(x_out)
                                       (transition cost)                (cost-to-go)

Thus a (sequence of) value function approximations yields a policy, which can be simulated to obtain trajectories and costs.
➥ Often used to pass information from long-term to short-term problems.

¹ Sometimes approximations of V̇_t instead.
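A minimal Python sketch of such an induced policy, assuming `candidates(x_in, xi)` enumerates the admissible pairs (x_out, u) and `V_next` evaluates the approximation Ṽ_{t+1} (illustrative names, not from the slides):

```python
def induced_policy(x_in, xi, V_next, candidates, step_cost):
    """Pick the transition minimizing transition cost + approximate cost-to-go."""
    return min(
        candidates(x_in, xi),                       # admissible (x_out, u) pairs
        key=lambda xu: step_cost(x_in, xu[0], xu[1], xi) + V_next(xu[0]),
    )
```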
Presentation Outline

1 Stochastic Dynamic Programming


Stochastic optimal control problem
Dynamic Programming principle
Example

2 Extending the usage of dynamic programming


More flexibility in the framework
Continuous state space

3 Structured problems
Linear Quadratic case
Linear convex case

Linear Quadratic case

min_π  E[ Σ_{t=0}^{T−1} (x_t⊤ Q_t x_t + u_t⊤ R_t u_t) + x_T⊤ Q_T x_T ]
s.t.  x_{t+1} = A_t x_t + B_t u_t + ξ_t,   x_0 = x0
      u_t = π_t(x_t)

Under stagewise independence of the (centered) noise we can show that:
1. The value function is quadratic: V_t(x_t) = x_t⊤ K_t x_t + k_t.
2. The optimal policy is linear: π_t(x_t) = L_t x_t.
3. With explicit (Riccati) formulas for K_t and L_t:

K_T = Q_T,   k_T = 0
K_t = Q_t + A_t⊤ K_{t+1} A_t − A_t⊤ K_{t+1} B_t (R_t + B_t⊤ K_{t+1} B_t)^{−1} B_t⊤ K_{t+1} A_t
L_t = −(R_t + B_t⊤ K_{t+1} B_t)^{−1} B_t⊤ K_{t+1} A_t

➥ Can be solved for large dimensions (say n ∼ 10⁴).
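A minimal numpy sketch of this Riccati backward recursion, assuming the time-varying matrices are given as lists A, B, Q, R (illustrative names):

```python
import numpy as np

def riccati_backward(A, B, Q, R, QT):
    """Backward Riccati recursion for the finite-horizon LQ problem."""
    T = len(A)
    K = [None] * (T + 1)
    L = [None] * T
    K[T] = QT
    for t in reversed(range(T)):
        S = R[t] + B[t].T @ K[t + 1] @ B[t]
        L[t] = -np.linalg.solve(S, B[t].T @ K[t + 1] @ A[t])    # feedback gain
        K[t] = Q[t] + A[t].T @ K[t + 1] @ (A[t] + B[t] @ L[t])  # cost matrix
    return K, L   # optimal policy: u_t = L[t] @ x_t
```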
From Dynamic Programming to SDDP

DP is a flexible tool, hampered by the curses of dimensionality.

Numerical illustration (7 dams):
T = 52 weeks
|S| = 100⁷ possible states
|U| = 10⁷ possible controls
|Ξ_t| = 10 (10⁵² scenarios)

➥ ≈ 2 days on today's fastest super-computer
(3·10⁶ years for 10 dams)

➥ Can be solved² in ≈ 10 minutes

² Approximately, depending on the problem and precision required...
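A back-of-envelope count consistent with the figures above, assuming one elementary operation per (state, control, noise) triple per stage (purely indicative):

```python
# Brute-force DP operation count for the 7-dam illustration.
T, states, controls, noises = 52, 100**7, 10**7, 10
ops = T * states * controls * noises     # ~5e23 elementary operations
print(f"{ops:.1e} operations")           # days of work even on an exaflop machine
```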
How can we be so much faster?

[Side figure: ingredients of the approach — independence, finitely supported noise, convexity, state discretization, LP solvers, ...]

Structural assumptions:
convexity
continuous state
➥ duality tools

Sampling instead of exhaustive computation.
Iteratively refining the value function estimation at "the right places" only.

➥ Stochastic Dual Dynamic Programming (SDDP), which
has been around for 30 years
is widely used in the energy community
has lots of extensions and variants
has some convergence results, mainly asymptotic

The setting

1. We are in a finite-time, stagewise independent framework.
2. The state and control variables are continuous and bounded.
3. The costs are convex (jointly in state and control).
4. The dynamics are linear.
5. The constraints on the controls are convex.
6. We are in a relatively complete recourse framework.

Then we can show that the value functions are convex, and we can approximate them by polyhedral functions.
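As a small illustration, a convex value function can be under-approximated by a maximum of affine cuts. A minimal sketch (illustrative class, not from the slides; x is a state vector):

```python
import numpy as np

class PolyhedralVF:
    """Lower approximation of a convex value function as a max of cuts:
    V(x) >= max_k (alpha_k + beta_k . x)."""
    def __init__(self):
        self.alpha, self.beta = [], []

    def add_cut(self, alpha, beta):
        self.alpha.append(alpha)
        self.beta.append(np.asarray(beta))

    def __call__(self, x):
        if not self.alpha:                 # no cut yet: vacuous lower bound
            return -np.inf
        x = np.asarray(x)
        return max(a + b @ x for a, b in zip(self.alpha, self.beta))
```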

Stochastic Dual Dynamic Programming: principle

The main idea is to update approximations of the value functions by adding cuts, in order to refine the approximations. We iterate the following steps:

Forward pass: given approximations of the value functions, we simulate the policy induced by these approximations, and obtain a trajectory.

Backward pass: we refine the approximations by adding cuts, in order to make the approximations more precise around the trajectory.
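A schematic sketch of one SDDP iteration, assuming a one-dimensional state for brevity and a stage solver `solve_stage(t, x_in, xi, V_next) -> (value, x_out, subgradient)`, obtained e.g. from an LP solver with duals (illustrative names); `V[t]` is any cut container such as the PolyhedralVF sketch above:

```python
import random

def sddp_iteration(x0, T, noises, probs, V, solve_stage):
    # Forward pass: simulate the policy induced by the current cuts
    xs = [x0]
    for t in range(T):
        xi = random.choices(noises[t], weights=probs[t])[0]   # sample a noise
        _, x_out, _ = solve_stage(t, xs[t], xi, V[t + 1])
        xs.append(x_out)
    # Backward pass: add one averaged cut per stage around the trajectory
    for t in range(T - 1, -1, -1):
        v_bar, g_bar = 0.0, 0.0
        for xi, p in zip(noises[t], probs[t]):
            v, _, g = solve_stage(t, xs[t], xi, V[t + 1])
            v_bar += p * v
            g_bar += p * g
        # cut: V_t(x) >= v_bar + g_bar * (x - xs[t])
        V[t].add_cut(v_bar - g_bar * xs[t], [g_bar])
    return xs
```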

Stochastic Dual Dynamic Programming

[Figure: sampled trajectories in the (x1, x2) state space over time, one animation frame per step]

First forward pass: computing a trajectory.
First backward pass: refining the approximation (adding cuts).
Second forward pass: computing a trajectory.
Second backward pass: refining the approximation (adding cuts).
Third forward pass: computing a trajectory.
Third backward pass: refining the approximation (adding cuts).
And so on...


SDDP

[Figure: value functions V0, V1, V2 and their lower polyhedral approximations plotted at t = 0, 1, 2; superscripts index the SDDP iteration]

1. The final cost function is known: V2.
2. The real Bellman functions follow backward: V1 = B1(V2), then V0 = B0(V1).
3. Assume that we have lower polyhedral approximations V̲t of the Vt: V̲2 of V2, V̲1 = B1(V̲2), V̲0 = B0(V̲1).
4. Evaluating V̲0^(2)(x0) ≤ V0(x0) yields a lower bound on the value of our problem.
5. Forward pass: apply F0(V̲1^(2))(x0) and obtain X1^(2); draw a random realisation x1^(2) of X1^(2); apply F1(V̲2^(2))(x1^(2)) and obtain X2^(2); draw a random realisation x2^(2) of X2^(2).
6. Backward pass: compute a cut for V2 at x2^(2); adding it to V̲2^(2) gives V̲2^(3), so a new lower approximation of V1 is B1(V̲2^(3)); compute the face active at x1^(2) and add it as a cut to V̲1^(2), which gives V̲1^(3), so a new lower approximation of V0 is B0(V̲1^(3)); compute the face active at x0.
7. Obtain a new lower bound V̲0^(3)(x0).
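To make the cut mechanics of this walkthrough concrete, here is a toy check (reusing the PolyhedralVF sketch from "The setting"; f stands in for a true convex value function, and the visited points mimic forward-pass states):

```python
# Tangent cuts collected at visited points always under-approximate f.
f = lambda x: x * x            # stand-in for a convex value function
grad = lambda x: 2 * x         # its (sub)gradient

V = PolyhedralVF()
for x_k in [-1.0, 0.5, 2.0]:   # points visited by forward passes
    # cut: f(x_k) + grad(x_k) * (x - x_k)
    V.add_cut(f(x_k) - grad(x_k) * x_k, [grad(x_k)])

print(V([1.0]), "<=", f(1.0))  # prints 0.75 <= 1.0
```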