Mpro 4
V. Leclère
Presentation Outline
3 Structured problems
Linear Quadratic case
Linear convex case
The variables
$\mathbf{x}_t$ is the state of the system at time $t$,
$\mathbf{u}_t$ is the control applied to the system at time $t$,
$\boldsymbol{\xi}_t$ is an exogenous noise.
Examples
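Optimization Problem
$$
\begin{aligned}
\min_{\mathbf{u}} \quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} L_t(\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\xi}_{t+1}) + K(\mathbf{x}_T)\Big] \\
\text{s.t.} \quad & \mathbf{x}_{t+1} = f_t(\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\xi}_{t+1}), \quad \mathbf{x}_0 = \boldsymbol{\xi}_0 \\
& \mathbf{u}_t \in U_t(\mathbf{x}_t), \quad \mathbf{x}_t \in X_t \\
& \sigma(\mathbf{u}_t) \subset \sigma(\boldsymbol{\xi}_0, \dots, \boldsymbol{\xi}_t)
\end{aligned}
$$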
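Optimization Problem
$$
\begin{aligned}
\min_{\Phi} \quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} L_t(\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\xi}_{t+1}) + K(\mathbf{x}_T)\Big] \\
\text{s.t.} \quad & \mathbf{x}_{t+1} = f_t(\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\xi}_{t+1}), \quad \mathbf{x}_0 = \boldsymbol{\xi}_0 \\
& \mathbf{u}_t \in U_t(\mathbf{x}_t), \quad \mathbf{x}_t \in X_t \\
& \mathbf{u}_t = \Phi(\boldsymbol{\xi}_0, \dots, \boldsymbol{\xi}_t)
\end{aligned}
$$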
And $c_t(x_t, x_{t+1}, \xi_{t+1})$ the transition cost from $x_t$ to $x_{t+1}$, i.e.,
$$
c_t(x_t, x_{t+1}, \xi_{t+1}) := \min_{u_t \in U_t(x_t, \xi_{t+1})} \big\{ L_t(x_t, u_t, \xi_{t+1}) \;\big|\; x_{t+1} = f_t(x_t, u_t, \xi_{t+1}) \big\}.
$$
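$$
\begin{aligned}
\min_{u_0 \in U_0(x_0)} \quad & \mathbb{E}\bigg[ L_0(x_0, u_0, \boldsymbol{\xi}_1) + \min_{\mathbf{u}_1, \dots, \mathbf{u}_{T-1}} \mathbb{E}\Big[\sum_{t=1}^{T-1} L_t(\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\xi}_{t+1}) + K(\mathbf{x}_T)\Big] \bigg] \\
\text{s.t.} \quad & \mathbf{x}_1 = f_0(x_0, u_0, \boldsymbol{\xi}_1) \\
& \mathbf{x}_{t+1} = f_t(\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\xi}_{t+1}) \in X_{t+1}, \\
& \mathbf{u}_t \in U_t(\mathbf{x}_t) \\
& \sigma(\mathbf{u}_t) \subset \sigma(\boldsymbol{\xi}_0, \dots, \boldsymbol{\xi}_t)
\end{aligned}
$$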
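By independence of noises, the information constraint can be restricted to the current state, $\sigma(\mathbf{u}_t) \subset \sigma(\mathbf{x}_t)$, and the inner problem defines $V_1(\mathbf{x}_1)$:
$$
\begin{aligned}
\min_{u_0 \in U_0(x_0)} \quad & \mathbb{E}\bigg[ L_0(x_0, u_0, \boldsymbol{\xi}_1) + \underbrace{\min_{\mathbf{u}_1, \dots, \mathbf{u}_{T-1}} \mathbb{E}\Big[\sum_{t=1}^{T-1} L_t(\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\xi}_{t+1}) + K(\mathbf{x}_T)\Big]}_{=: V_1(\mathbf{x}_1)} \bigg] \\
\text{s.t.} \quad & \mathbf{x}_1 = f_0(x_0, u_0, \boldsymbol{\xi}_1) \\
& \mathbf{x}_{t+1} = f_t(\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\xi}_{t+1}) \in X_{t+1}, \\
& \mathbf{u}_t \in U_t(\mathbf{x}_t) \\
& \sigma(\mathbf{u}_t) \subset \sigma(\mathbf{x}_t)
\end{aligned}
$$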
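Bellman’s recursion
$$
\begin{aligned}
V_T(x_T) &= K(x_T), \\
V_t(x_t) &= \min_{u_t \in U_t(x_t)} \mathbb{E}\Big[ L_t(x_t, u_t, \boldsymbol{\xi}_{t+1}) + V_{t+1}\big(f_t(x_t, u_t, \boldsymbol{\xi}_{t+1})\big) \Big], \quad t = T-1, \dots, 0.
\end{aligned}
$$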
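The backward recursion can be sketched on a finite problem as follows. This is a minimal illustration, not the course's implementation: the function name `backward_induction` and the toy inventory data (stock levels 0 to 5, order sizes, demand distribution, costs) are assumptions made up for the example.

```python
def backward_induction(T, states, controls, noises, probs, dynamics, cost, final_cost):
    """Return value tables V[t][x] (t = 0..T) and policy tables pi[t][x] (t = 0..T-1).

    states, controls and noises are finite lists; probs[i] = P(xi = noises[i]).
    A control that can lead outside `states` is treated as infeasible.
    """
    state_set = set(states)
    V = [dict() for _ in range(T + 1)]
    pi = [dict() for _ in range(T)]
    V[T] = {x: final_cost(x) for x in states}          # V_T = K
    for t in range(T - 1, -1, -1):                     # backward in time
        for x in states:
            best_u, best_val = None, float("inf")
            for u in controls:                         # exhaustive minimization
                val = 0.0
                for xi, p in zip(noises, probs):       # exact expectation
                    y = dynamics(t, x, u, xi)
                    if y not in state_set:
                        val = float("inf")
                        break
                    val += p * (cost(t, x, u, xi) + V[t + 1][y])
                if val < best_val:
                    best_u, best_val = u, val
            V[t][x], pi[t][x] = best_val, best_u
    return V, pi


# Hypothetical toy inventory problem (all numbers are illustrative assumptions).
T = 3
states = list(range(6))                  # stock level
controls = [0, 1, 2]                     # quantity ordered
noises, probs = [0, 1, 2], [0.25, 0.5, 0.25]   # demand and its law

def dynamics(t, x, u, d):
    return min(max(x + u - d, 0), 5)     # stock stays in [0, 5]

def cost(t, x, u, d):
    return u + 2 * max(d - x - u, 0)     # ordering cost plus shortage penalty

V, pi = backward_induction(T, states, controls, noises, probs,
                           dynamics, cost, lambda x: 0.0)
print(V[0][0], pi[0][0])
```

The three nested loops over states, controls and noises are exactly the three factors that appear in the complexity of the method.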
3 curses of dimensionality
Complexity = $O(T \times |X_t| \times |U_t| \times |\Xi_t|)$
Linear in the number of time steps, but we have 3 curses of dimensionality:
1. State. Complexity is exponential in the dimension of $X_t$:
   e.g. 3 independent states each taking 10 values lead to a loop over 1000 points.
2. Decision. Complexity is exponential in the dimension of $U_t$.
   ⇝ due to exhaustive minimization of the inner problem. Can be accelerated using a faster method (e.g. a MILP solver).
3. Expectation. Complexity is exponential in the dimension of $\Xi_t$.
   ⇝ due to expectation computation. Can be accelerated through Monte-Carlo approximation (still at least 1000 points).
In practice, DP is not used for a state of dimension more than 5.
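To make the growth concrete, here is a small sketch that counts elementary operations of exhaustive DP. The helper `dp_operation_count` and the choice of 10 grid points per dimension are assumptions for illustration only.

```python
def dp_operation_count(T, n_state_dims, n_control_dims, n_noise_dims, points_per_dim=10):
    """Number of (t, x, u, xi) combinations visited by exhaustive DP on a regular grid."""
    n_states = points_per_dim ** n_state_dims      # |X_t|
    n_controls = points_per_dim ** n_control_dims  # |U_t|
    n_noises = points_per_dim ** n_noise_dims      # |Xi_t|
    return T * n_states * n_controls * n_noises


for d in (1, 3, 5, 8):
    # One control dimension and one noise dimension, 12 time steps.
    print(d, dp_operation_count(T=12, n_state_dims=d, n_control_dims=1, n_noise_dims=1))
```

Each additional state dimension multiplies the count by the number of grid points per dimension, which is why the practical limit is only a handful of dimensions.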
Vincent Leclère Dynamic Programming 08/12/2023 15 / 36
Stochastic Dynamic Programming Stochastic optimal control problem
Extending the usage of dynamic programming Dynamic Programming principle
Structured problems Example
Auzat
Sabart
Exercise
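Requirements of stochastic DP
$$
\begin{aligned}
\min_{\pi} \quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} L_t(\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\xi}_{t+1}) + K(\mathbf{x}_T)\Big] \\
\text{s.t.} \quad & \mathbf{x}_{t+1} = f_t(\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\xi}_{t+1}), \quad \mathbf{x}_0 = x_0 \\
& \mathbf{u}_t \in U_t(\mathbf{x}_t), \quad \mathbf{x}_t \in X_t \\
& \mathbf{u}_t = \pi_t(\mathbf{x}_t)
\end{aligned}
$$
Assumptions:
The noises are stagewise independent.
The only constraint linking stages is the dynamic equation: no other coupling between stages.
The cost function is additive over stages.
We consider the expectation of costs.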
Stochastic Dynamic Programming
More flexibility in the framework
Extending the usage of dynamic programming
Continuous state space
Structured problems
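Markovian noise
Assume that $(\boldsymbol{\xi}_t)_t$ is a Markovian noise, i.e. the law of $\boldsymbol{\xi}_{t+1}$ depends on the past only through $\boldsymbol{\xi}_t$.
We can recover the previous setting by defining an extended state
$$\tilde{\mathbf{x}}_t = (\mathbf{x}_t, \boldsymbol{\xi}_t)$$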
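A minimal sketch of the extended-state idea, assuming the noise is a finite Markov chain with a known transition matrix `P` (where `P[i][j]` is the probability of moving from `noises[i]` to `noises[j]`). The function name, the toy inventory data and the matrix are illustrative assumptions.

```python
def backward_induction_markov(T, states, controls, noises, P, dynamics, cost, final_cost):
    """Value tables on the extended state: V[t][(x, i)] with current noise noises[i]."""
    state_set = set(states)
    n = len(noises)
    V = [dict() for _ in range(T + 1)]
    V[T] = {(x, i): final_cost(x) for x in states for i in range(n)}
    for t in range(T - 1, -1, -1):
        for x in states:
            for i in range(n):                       # current noise is part of the state
                best = float("inf")
                for u in controls:
                    val = 0.0
                    for j in range(n):               # conditional expectation given noise i
                        if P[i][j] == 0.0:
                            continue
                        y = dynamics(t, x, u, noises[j])
                        if y not in state_set:
                            val = float("inf")
                            break
                        val += P[i][j] * (cost(t, x, u, noises[j]) + V[t + 1][(y, j)])
                    best = min(best, val)
                V[t][(x, i)] = best
    return V


# Hypothetical toy data: demand is either 0 or 2 and tends to persist.
m_states = list(range(6))
m_controls = [0, 1, 2]
m_noises = [0, 2]
P_persistent = [[0.9, 0.1], [0.1, 0.9]]
P_iid = [[0.5, 0.5], [0.5, 0.5]]

def m_dynamics(t, x, u, d):
    return min(max(x + u - d, 0), 5)

def m_cost(t, x, u, d):
    return u + 2 * max(d - x - u, 0)

V_markov = backward_induction_markov(3, m_states, m_controls, m_noises, P_persistent,
                                     m_dynamics, m_cost, lambda x: 0.0)
V_iid = backward_induction_markov(3, m_states, m_controls, m_noises, P_iid,
                                  m_dynamics, m_cost, lambda x: 0.0)
print(V_markov[2][(0, 0)], V_markov[2][(0, 1)])
```

The price of the extension is a larger state space: the table has one entry per pair (state, noise value). When the rows of `P` are identical the noise is stagewise independent and the value no longer depends on the noise component.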
Coupling control
Delayed control
Bankruptcy
Maximizing probability
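[Figure: discretization grid, axes x1, x2 and time]
  $\tilde{V}_t \equiv 0$
  for $t : T-1 \to 1$ do
    for $x_{in} \in X_{t-1}^{D}$ do
      for $\xi \in \Xi_t$ do
        $\dot{v}_\xi = \min_{x_{out} \in X_t(x_{in}, \xi)} \ell_t(x_{in}, x_{out}, \xi) + \tilde{V}_{t+1}(x_{out}) =: \dot{B}_t(\tilde{V}_{t+1})(x_{in}, \xi)$
        $\tilde{V}_t(x_{in}) \mathrel{+}= \pi_\xi \dot{v}_\xi$, where $\pi_\xi := \mathbb{P}(\boldsymbol{\xi}_t = \xi)$
    Extend definition of $\tilde{V}_t$ to $X_t$ by interpolation
Algorithm 1: Discretized SDP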
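A minimal sketch of the discretized scheme for a scalar state, under several assumptions that are not in the slides: the state grid is one-dimensional, the inner minimization is done by sampling candidate points of the reachable interval, the value function is extended by piecewise-linear interpolation (`numpy.interp`), and time runs from 0 to T. The reservoir data are made up for the example.

```python
import numpy as np


def discretized_sdp(T, grid, noises, probs, stage_cost, reachable, final_cost,
                    n_candidates=51):
    """Return V with V[t, k] approximating V_t(grid[k]) for t = 0..T.

    reachable(t, x_in, xi) returns the interval (lo, hi) of admissible x_out;
    stage_cost(t, x_in, x_out, xi) must accept a NumPy array for x_out.
    """
    V = np.zeros((T + 1, len(grid)))
    V[T] = [final_cost(x) for x in grid]
    for t in range(T - 1, -1, -1):
        for k, x_in in enumerate(grid):
            for xi, p in zip(noises, probs):
                lo, hi = reachable(t, x_in, xi)
                x_out = np.linspace(lo, hi, n_candidates)
                # V_{t+1} is only known on the grid: interpolate elsewhere.
                v_next = np.interp(x_out, grid, V[t + 1])
                v_xi = np.min(stage_cost(t, x_in, x_out, xi) + v_next)
                V[t, k] += p * v_xi                  # accumulate the expectation
    return V


# Hypothetical reservoir: stock in [0, 10], inflow 0 or 2, at most 3 units
# leave the stock per stage, and each unit leaving earns one unit of revenue.
grid = np.linspace(0.0, 10.0, 11)
inflows, inflow_probs = [0.0, 2.0], [0.5, 0.5]

def reachable(t, x_in, xi):
    return max(x_in + xi - 3.0, 0.0), min(x_in + xi, 10.0)

def stage_cost(t, x_in, x_out, xi):
    return -(x_in + xi - x_out)          # negative revenue

V_sdp = discretized_sdp(3, grid, inflows, inflow_probs, stage_cost, reachable,
                        lambda x: 0.0)
print(V_sdp[0])
```

Sampling the reachable interval is the simplest choice for the inner problem; a dedicated solver could replace it when the stage problem has more structure.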
1
Sometimes it can be of V̇t instead
Stochastic Dynamic Programming
Linear Quadratic case
Extending the usage of dynamic programming
Linear convex case
Structured problems
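$$
\begin{aligned}
\min_{\pi} \quad & \mathbb{E}\Big[\sum_{t=0}^{T-1} \big(\mathbf{x}_t^\top Q_t \mathbf{x}_t + \mathbf{u}_t^\top R_t \mathbf{u}_t\big) + \mathbf{x}_T^\top Q_T \mathbf{x}_T\Big] \\
\text{s.t.} \quad & \mathbf{x}_{t+1} = A_t \mathbf{x}_t + B_t \mathbf{u}_t + \boldsymbol{\xi}_t, \quad \mathbf{x}_0 = x_0 \\
& \mathbf{u}_t = \pi_t(\mathbf{x}_t)
\end{aligned}
$$
Under stagewise independence of the (centered) noise we can show that:
1. The value function is quadratic: $V_t(x_t) = x_t^\top K_t x_t + k_t$.
2. The optimal policy is linear: $\pi_t(x_t) = L_t x_t$.
3. With explicit (Riccati) formulas for $K_t$ and $L_t$:
$$
\begin{aligned}
K_T &= Q_T, \quad k_T = 0 \\
K_t &= Q_t + A_t^\top K_{t+1} A_t - A_t^\top K_{t+1} B_t \big(R_t + B_t^\top K_{t+1} B_t\big)^{-1} B_t^\top K_{t+1} A_t \\
L_t &= -\big(R_t + B_t^\top K_{t+1} B_t\big)^{-1} B_t^\top K_{t+1} A_t
\end{aligned}
$$
➥ Can be solved for large dimension (say $n \sim 10^4$).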
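The Riccati recursion above translates almost line by line into NumPy. The sketch below assumes time-invariant matrices for brevity (the formulas allow them to depend on $t$), and the double-integrator data are a made-up example.

```python
import numpy as np


def riccati_recursion(A, B, Q, R, Q_T, T):
    """Backward Riccati recursion; returns the lists K[0..T] and L[0..T-1]."""
    K = [None] * (T + 1)
    L = [None] * T
    K[T] = Q_T
    for t in range(T - 1, -1, -1):
        BtK = B.T @ K[t + 1]
        G = R + BtK @ B                               # R + B^T K_{t+1} B
        L[t] = -np.linalg.solve(G, BtK @ A)           # optimal gain: u_t = L_t x_t
        # K_t = Q + A^T K A - A^T K B G^{-1} B^T K A, using L_t = -G^{-1} B^T K A
        K[t] = Q + A.T @ K[t + 1] @ A + A.T @ K[t + 1] @ B @ L[t]
    return K, L


# Hypothetical double integrator (position, velocity) with a scalar control.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K, L = riccati_recursion(A, B, Q, R, Q_T=np.eye(2), T=20)
print(L[0])
```

Each step costs a few matrix products and one linear solve, so the work grows polynomially with the state dimension instead of exponentially as in grid-based DP.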
Structural assumptions:
  convexity
  continuous state
➥ duality tools
Sampling instead of exhaustive computation
Iteratively refining value function estimation at "the right places" only
[Side panel, truncated in the source: Independ…, Finitely suppo…, Convex, Discrete c…, State discre…, Progres…, Maximum, LP solvers]
The setting
[Figure sequence: axes x1, x2 and time]
And so on...
SDDP
[Figure sequence: three panels with the state $x$ on the horizontal axis, showing $V_0$, $V_1$, $V_2$ and their lower approximations; the first panel marks the value $V_0(x_0)$ and the lower bound $\underline{V}_0^{2}(x_0)$ at $x_0$. Successive frames carry the captions below.]
Final Cost V2 = V2
Apply $F_0(\underline{V}_1^{(2)})(x_0)$ and obtain $\mathbf{X}_1^{(2)}$
Draw a random realisation $x_1^{(2)}$ of $\mathbf{X}_1^{(2)}$
Apply $F_1(\underline{V}_2^{(2)})(x_1^{(2)})$ and obtain $\mathbf{X}_2^{(2)}$
Draw a random realisation $x_2^{(2)}$ of $\mathbf{X}_2^{(2)}$
Compute a cut for $V_2$ at $x_2^{(2)}$
Add the cut to $\underline{V}_2^{(2)}$, which gives $\underline{V}_2^{(3)}$
A new lower approximation of $V_1$ is $B_1(\underline{V}_2^{(3)})$
Compute the face active at $x_1^{(2)}$
Add the cut to $\underline{V}_1^{(2)}$, which gives $\underline{V}_1^{(3)}$
A new lower approximation of $V_0$ is $B_0(\underline{V}_1^{(3)})$
[Final frame: the lower bound at $x_0$ becomes $\underline{V}_0^{3}(x_0)$.]
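The cut mechanism alone can be sketched as follows. This is not a full SDDP implementation: the class `CutApproximation` is hypothetical, the state is scalar, and the cut values and slopes come from a known convex stand-in function, whereas in SDDP they would come from the optimal value and a subgradient of the stage problem at the trial point.

```python
class CutApproximation:
    """Polyhedral lower approximation V(x) = max_k (a_k + b_k * x) of a convex function."""

    def __init__(self, lower_bound=0.0):
        # Start from a constant cut that is assumed to be a valid lower bound.
        self.cuts = [(lower_bound, 0.0)]

    def add_cut(self, x_k, value, slope):
        # Tangent at x_k: value + slope * (x - x_k) = (value - slope * x_k) + slope * x
        self.cuts.append((value - slope * x_k, slope))

    def __call__(self, x):
        return max(a + b * x for a, b in self.cuts)


# Stand-in for an unknown convex value function (assumption for the example).
def true_value(x):
    return (x - 1.0) ** 2

def true_slope(x):
    return 2.0 * (x - 1.0)


V_low = CutApproximation(lower_bound=0.0)
for trial_point in (3.0, -1.0, 0.5):          # points visited by forward passes
    V_low.add_cut(trial_point, true_value(trial_point), true_slope(trial_point))

print(V_low(3.0), true_value(3.0))            # the cut is tight at its trial point
print(V_low(2.0), true_value(2.0))            # and a lower bound elsewhere
```

Because every cut lies below the convex function, their maximum is again a lower approximation, and it only improves as cuts are added at the states actually visited.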