
6.231 DYNAMIC PROGRAMMING

LECTURE 5

LECTURE OUTLINE

- Examples of stochastic DP problems
- Linear-quadratic problems
- Inventory control
LINEAR-QUADRATIC PROBLEMS

System: $x_{k+1} = A_k x_k + B_k u_k + w_k$

Quadratic cost

$$E_{\substack{w_k \\ k=0,1,\ldots,N-1}} \left\{ x_N' Q_N x_N + \sum_{k=0}^{N-1} \left( x_k' Q_k x_k + u_k' R_k u_k \right) \right\}$$

where $Q_k \geq 0$ and $R_k > 0$ (in the positive (semi)definite sense).

The $w_k$ are independent and zero mean.
DP algorithm:

$$J_N(x_N) = x_N' Q_N x_N,$$

$$J_k(x_k) = \min_{u_k} E \left\{ x_k' Q_k x_k + u_k' R_k u_k + J_{k+1}(A_k x_k + B_k u_k + w_k) \right\}$$
Key facts:

- $J_k(x_k)$ is quadratic
- Optimal policy $\{\mu_0^*, \ldots, \mu_{N-1}^*\}$ is linear: $\mu_k^*(x_k) = L_k x_k$
- Similar treatment of a number of variants
2
DERIVATION
By induction verify that

k
(x
k
) = L
k
x
k
, J
k
(x
k
) = x

k
K
k
x
k
+constant,
where L
k
are matrices given by
L
k
= (B

k
K
k+1
B
k
+R
k
)
1
B

k
K
k+1
A
k
,
and where K
k
are symmetric positive semidenite
matrices given by
K
N
= Q
N
,
K = A

k
k
_
K
k+1
K
k+1
B
k
(B
k
K
k+1
B
k
+R )
1
k
B

k
K
k+1
A
k
+Q
k
.
This is called the discrete-time Ric
_
cati equation.
Just like DP, it starts at the terminal time N
and proceeds backwards.
Certainty equivalence holds (optimal policy is
the same as when w
k
is replaced by its expected
value E{w
k
} = 0).
3
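As an illustrative numerical sketch (not part of the original lecture), the backward Riccati recursion takes only a few lines; the matrices $A$, $B$, $Q$, $R$ and the horizon $N$ below are made-up example values:

```python
# Sketch of the backward discrete-time Riccati recursion.
# A, B, Q, R and the horizon N are illustrative values only.
import numpy as np

N = 20
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)          # Q_k >= 0
R = np.array([[1.0]])  # R_k > 0
K = np.eye(2)          # K_N = Q_N

gains = [None] * N
for k in reversed(range(N)):
    M = B.T @ K @ B + R
    # L_k = -(B' K_{k+1} B + R)^{-1} B' K_{k+1} A
    gains[k] = -np.linalg.solve(M, B.T @ K @ A)
    # K_k = A' (K_{k+1} - K_{k+1} B M^{-1} B' K_{k+1}) A + Q
    K = A.T @ (K - K @ B @ np.linalg.solve(M, B.T @ K)) @ A + Q

# Optimal control at stage k and state x: u = gains[k] @ x
print("K_0 =\n", K, "\nL_0 =\n", gains[0])
```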
ASYMPTOTIC BEHAVIOR OF THE RICCATI EQUATION

Assume a time-independent system and cost per stage, and some technical assumptions: controllability of $(A, B)$ and observability of $(A, C)$, where $Q = C'C$.

The Riccati equation converges, $\lim_{k \to \infty} K_k = K$, where $K$ is positive definite and is the unique solution (within the class of positive semidefinite matrices) of the algebraic Riccati equation

$$K = A' \left( K - K B (B' K B + R)^{-1} B' K \right) A + Q.$$

The corresponding steady-state controller $\mu^*(x) = Lx$, where

$$L = -(B' K B + R)^{-1} B' K A,$$

is stable in the sense that the matrix $A + BL$ of the closed-loop system

$$x_{k+1} = (A + BL) x_k + w_k$$

satisfies $\lim_{k \to \infty} (A + BL)^k = 0$.
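Numerically, the convergence claim can be checked by iterating the same recursion with fixed $A$, $B$, $Q$, $R$ until $K$ stops changing, and then verifying that the spectral radius of $A + BL$ is below 1. A minimal sketch, reusing the illustrative matrices above with an arbitrary tolerance (production code would use a dedicated solver such as scipy.linalg.solve_discrete_are):

```python
# Sketch: iterate the Riccati recursion to a fixed point of the algebraic
# Riccati equation, then check closed-loop stability. Matrices and the
# tolerance are illustrative.
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = Q.copy()
for _ in range(1000):
    M = B.T @ K @ B + R
    K_next = A.T @ (K - K @ B @ np.linalg.solve(M, B.T @ K)) @ A + Q
    if np.max(np.abs(K_next - K)) < 1e-10:
        K = K_next
        break
    K = K_next

L = -np.linalg.solve(B.T @ K @ B + R, B.T @ K @ A)
rho = max(abs(np.linalg.eigvals(A + B @ L)))
print("spectral radius of A + BL:", rho)   # < 1, so (A + BL)^k -> 0
```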
GRAPHICAL PROOF FOR SCALAR SYSTEMS

[Figure: plot of $F(P)$ versus $P$ together with the 45-degree line; $F(0) = Q$, the curve has a horizontal asymptote at $A^2 R / B^2 + Q$ and a pole at $P = -R/B^2$, and the iterates $P_k, P_{k+1}$ converge to the positive fixed point $P^*$.]

Riccati equation (with $P_k = K_{N-k}$):

$$P_{k+1} = A^2 \left( P_k - \frac{B^2 P_k^2}{B^2 P_k + R} \right) + Q,$$

or $P_{k+1} = F(P_k)$, where

$$F(P) = \frac{A^2 R P}{B^2 P + R} + Q.$$

Note the two steady-state solutions, satisfying $P = F(P)$, of which only one is positive.
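The graphical argument is easy to replay numerically: starting from any $P_0 \geq 0$, the iterates $P_{k+1} = F(P_k)$ climb to the positive fixed point $P^*$. A small sketch with made-up scalar values:

```python
# Sketch: scalar Riccati iteration P_{k+1} = F(P_k).
# A, B, Q, R are illustrative values.
import math

A, B, Q, R = 2.0, 1.0, 1.0, 1.0

def F(P):
    return A**2 * R * P / (B**2 * P + R) + Q

P = 0.0
for _ in range(50):
    P = F(P)
print("iterated P* ~", P)

# Cross-check: P = F(P) reduces to B^2 P^2 + (R - A^2 R - Q B^2) P - Q R = 0,
# whose unique positive root is the limit above.
a, b, c = B**2, R - A**2 * R - Q * B**2, -Q * R
print("positive root  ", (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a))
```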
RANDOM SYSTEM MATRICES

Suppose that $\{A_0, B_0\}, \ldots, \{A_{N-1}, B_{N-1}\}$ are not known but rather are independent random matrices that are also independent of the $w_k$.

DP algorithm is

$$J_N(x_N) = x_N' Q_N x_N,$$

$$J_k(x_k) = \min_{u_k} E_{w_k, A_k, B_k} \left\{ x_k' Q_k x_k + u_k' R_k u_k + J_{k+1}(A_k x_k + B_k u_k + w_k) \right\}$$

Optimal policy $\mu_k^*(x_k) = L_k x_k$, where

$$L_k = -\left( R_k + E\{B_k' K_{k+1} B_k\} \right)^{-1} E\{B_k' K_{k+1} A_k\},$$

and where the matrices $K_k$ are given by

$$K_N = Q_N,$$

$$K_k = E\{A_k' K_{k+1} A_k\} - E\{A_k' K_{k+1} B_k\} \left( R_k + E\{B_k' K_{k+1} B_k\} \right)^{-1} E\{B_k' K_{k+1} A_k\} + Q_k$$
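For a scalar system with $A_k$, $B_k$ independent of each other and identically distributed across stages, the expectations reduce to first and second moments, and the recursion above becomes a one-line update. A sketch with invented moment values:

```python
# Sketch: scalar version of the recursion with random coefficients A_k, B_k,
# assumed independent of each other and i.i.d. across stages, so only their
# first two moments enter. All numerical values are invented.
EA, EA2 = 1.0, 1.2       # E{A}, E{A^2}
EB, EB2 = 1.0, 1.5       # E{B}, E{B^2}
Q, R = 1.0, 1.0
N = 30

K = 1.0                  # K_N = Q_N
for k in reversed(range(N)):
    denom = R + EB2 * K                  # R + E{B' K B}
    L = -(EA * EB * K) / denom           # L_k, using E{B' K A} = K E{A}E{B}
    K = EA2 * K - (EA * EB * K) ** 2 / denom + Q   # K_k
print("K_0 =", K, " L_0 =", L)
```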
PROPERTIES

- Certainty equivalence may not hold
- Riccati equation may not converge to a steady-state

[Figure: plot of $\tilde F(P)$ versus $P$ together with the 45-degree line; $\tilde F(0) = Q$, with $-R/E\{B^2\}$ marked on the $P$ axis.]

We have $P_{k+1} = \tilde F(P_k)$, where

$$\tilde F(P) = \frac{E\{A^2\} R P}{E\{B^2\} P + R} + Q + \frac{T P^2}{E\{B^2\} P + R},$$

$$T = E\{A^2\} E\{B^2\} - \big( E\{A\} \big)^2 \big( E\{B\} \big)^2$$
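The possible non-convergence is easy to demonstrate: when $T$ is large relative to $E\{B^2\}$, $\tilde F(P)$ stays above the 45-degree line and the iterates blow up. A sketch comparing two invented parameter sets (in both, $E\{A\} = E\{B\} = E\{B^2\} = 1$, so only $E\{A^2\}$ differs):

```python
# Sketch: iterating P_{k+1} = F~(P_k) for two invented parameter sets,
# one converging (T = 0) and one diverging (T large).
def F_tilde(P, EA2, EB2, Q, R, T):
    return EA2 * R * P / (EB2 * P + R) + Q + T * P**2 / (EB2 * P + R)

for EA2 in (1.0, 5.0):
    T = EA2 * 1.0 - (1.0 ** 2) * (1.0 ** 2)  # E{A^2}E{B^2} - (E{A})^2 (E{B})^2
    P = 0.0
    for _ in range(60):
        P = F_tilde(P, EA2, EB2=1.0, Q=1.0, R=1.0, T=T)
    print(f"E{{A^2}} = {EA2}: T = {T}, P_60 = {P:.3g}")
```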
INVENTORY CONTROL

$x_k$: stock, $u_k$: inventory purchased, $w_k$: demand

$$x_{k+1} = x_k + u_k - w_k, \qquad k = 0, 1, \ldots, N-1$$

Minimize

$$E \left\{ \sum_{k=0}^{N-1} \big( c u_k + r(x_k + u_k - w_k) \big) \right\}$$

where, for some $p > 0$ and $h > 0$,

$$r(x) = p \max(0, -x) + h \max(0, x)$$

DP algorithm:

$$J_N(x_N) = 0,$$

$$J_k(x_k) = \min_{u_k \geq 0} \Big[ c u_k + H(x_k + u_k) + E \big\{ J_{k+1}(x_k + u_k - w_k) \big\} \Big],$$

where $H(x + u) = E\{r(x + u - w)\}$.
OPTIMAL POLICY

DP algorithm can be written as

$$J_N(x_N) = 0,$$

$$J_k(x_k) = \min_{u_k \geq 0} G_k(x_k + u_k) - c x_k,$$

where

$$G_k(y) = c y + H(y) + E \big\{ J_{k+1}(y - w) \big\}.$$

If $G_k$ is convex and $\lim_{|x| \to \infty} G_k(x) \to \infty$, we have

$$\mu_k^*(x_k) = \begin{cases} S_k - x_k & \text{if } x_k < S_k, \\ 0 & \text{if } x_k \geq S_k, \end{cases}$$

where $S_k$ minimizes $G_k(y)$; this is a base-stock policy.

This is shown, assuming that $c < p$, by showing that $J_k$ is convex for all $k$, and

$$\lim_{|x| \to \infty} J_k(x) \to \infty$$
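To make the base-stock result concrete, here is a small sketch that runs the DP recursion on an integer grid and reads off $S_k$ as the minimizer of $G_k$; the costs, demand distribution, and grid bounds are invented example values (note $c < p$, as required):

```python
# Sketch: numerical DP for the inventory problem on an integer grid,
# recovering the base-stock levels S_k as minimizers of G_k.
# Costs, demand distribution, and grid bounds are illustrative.
import numpy as np

N = 5
c, p, h = 1.0, 3.0, 1.0                  # purchase, shortage, holding costs
w_vals = np.array([0, 1, 2, 3])          # demand support
w_probs = np.array([0.1, 0.4, 0.3, 0.2])
xs = np.arange(-10, 21)                  # grid of stock levels

def r(x):
    # r(x) = p max(0, -x) + h max(0, x)
    return p * np.maximum(0, -x) + h * np.maximum(0, x)

def expect_next(J, ys):
    # E{ J(y - w) } for each y on the grid, clipping y - w to the grid
    idx = np.clip(np.searchsorted(xs, np.subtract.outer(ys, w_vals)),
                  0, len(xs) - 1)
    return (J[idx] * w_probs).sum(axis=1)

J = np.zeros(len(xs))                    # J_N = 0
for k in reversed(range(N)):
    # G_k(y) = c y + H(y) + E{J_{k+1}(y - w)}, with H(y) = E{r(y - w)}
    H = (r(np.subtract.outer(xs, w_vals)) * w_probs).sum(axis=1)
    G = c * xs + H + expect_next(J, xs)
    print(f"k={k}: S_k = {xs[np.argmin(G)]}")
    # J_k(x) = min_{u >= 0} G_k(x + u) - c x, i.e. minimize over y >= x
    J = np.array([G[i:].min() for i in range(len(xs))]) - c * xs
```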
JUSTIFICATION

Graphical inductive proof that $J_k$ is convex.

[Figure: plots of $H(y)$ and $c y + H(y)$ against $y$, with lines of slope $-c$; $S_{N-1}$ marks the minimizer of $c y + H(y)$, $c S_{N-1}$ is indicated, and the resulting convex $J_{N-1}(x_{N-1})$ is plotted against $x_{N-1}$.]
MIT OpenCourseWare
http://ocw.mit.edu
6.231 Dynamic Programming and Stochastic Control
Fall 2011
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.