Handbooks in OR & MS, Vol. 10
© 2003 Elsevier Science B.V. All rights reserved.

Chapter 1

Stochastic Programming Models
Andrzej Ruszczyński
Department of Management Science and Information Systems, Rutgers University,
94 Rockefeller Rd, Piscataway, NJ 08854, USA
Alexander Shapiro
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta,
GA 30332, USA
Abstract
1 Introduction
1.1 Motivation
F(x, D) = (s − c)x,              if x ≤ D,
          (r − c)x + (s − r)D,   if x > D.                  (1.1)
E[F(x, D)] = ∫₀^∞ F(x, w) dG(w).
Therefore, from the statistical point of view it makes sense to optimize the
objective function on average, i.e., to maximize the expected profit E[F(x, D)].
This leads to the following stochastic programming problem¹

Max_{x ≥ 0}  f(x) := E[F(x, D)].                            (1.2)
Note that we treat here x as a continuous rather than integer variable. This
makes sense if the quantity of newspapers x is reasonably large.
In the present case it is not difficult to solve the above optimization problem in closed form. Let us observe that for any D ≥ 0, the function F(·, D) is concave (and piecewise linear). Therefore, the expected value function f(·) is also concave. Suppose for a moment that G(·) is continuous at a point x > 0. Then
f(x) = ∫₀^x [(r − c)x + (s − r)w] dG(w) + ∫ₓ^∞ (s − c)x dG(w).
¹ The notation ":=" means "equal by definition".
² Recall that G⁻¹(α) is called the α-quantile of the cdf G.
hence if this probability is positive and (s − c)/(s − r) ≤ G(0), then the optimal solution is x* = 0.
Clearly, the above approach explicitly depends on knowledge of the probability distribution of the demand D. In practice the corresponding cdf G(·) is never known exactly and at best can be approximated (estimated). In the present case the optimal solution is given in closed form, and therefore its dependence on G(·) can be easily evaluated. It is well known that α-quantiles are robust (stable) with respect to small perturbations of the corresponding cdf G(·), provided that α is not too close to 0 or 1. In general, it is important to investigate the sensitivity of the considered stochastic programming problem with respect to the assumed probability distributions.
The following deterministic optimization approach is also often used for
decision making under uncertainty. The random variable D is replaced by its
mean ¼ E[D], and then the following deterministic optimization problem is
solved:
Hence (1.6) is "too optimistic" in the sense that it does not take into account possible
variability of the demand D.
Another point which is worth mentioning is that by solving (1.2) the
newsvendor tries to optimize the profit on average. However, for a particular
realization of the demand D, on a particular day, the profit F(x* , D) could be
very different from the corresponding expected value f (x* ). This may happen
if F(x* , D), considered as a random variable, has a large variability which
could be measured by its variance Var [F(x* , D)]. Therefore, if the newsvendor
wants to hedge against such variability he may consider the following
optimization problem
Max_{x ≥ 0}  f(x) := E[F(x, D)] − λ Var[F(x, D)],           (1.7)

with a weight coefficient λ ≥ 0.
Min  x                                                      (1.8)
s.t.  P{F(x, D) ≥ b} ≥ 1 − α.                               (1.9)
The newsvendor can solve this problem, too (remember that he is really
smart). It is clear that the following inequality should be satisfied
(s − c)x ≥ b,                                               (1.10)
Therefore

P{F(x, D) ≥ b} = P{D ≥ d(x, b)},

where

d(x, b) = [b + (c − r)x] / (s − r).
It is clear that the solution can exist iff the constraints (1.10)–(1.11) are consistent. From (1.10),

x̂ = b / (s − c).                                            (1.13)
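The reformulation P{F(x, D) ≥ b} = P{D ≥ d(x, b)} can be checked by simulation. All parameter values below are illustrative assumptions, not from the text.

```python
import random

# Hypothetical newsvendor data and profit target b.
c, s, r, b = 1.0, 3.0, 0.5, 100.0

def profit(x, d):
    # F(x, D) from (1.1)
    return (s - c) * x if x <= d else (r - c) * x + (s - r) * d

def d_threshold(x):
    # d(x, b) = [b + (c - r)x] / (s - r)
    return (b + (c - r) * x) / (s - r)

random.seed(1)
x = 120.0                       # note (s - c)x = 240 >= b, so (1.10) holds
n = hits = thresh = 0
for _ in range(100_000):
    dem = random.expovariate(1 / 100.0)   # demand with mean 100
    assert profit(x, dem) <= (s - c) * x  # F(x, D) never exceeds (s - c)x
    hits += profit(x, dem) >= b
    thresh += dem >= d_threshold(x)
    n += 1
print(hits / n, thresh / n)
```

The two empirical frequencies coincide event by event, since the profit reaches b exactly when the demand reaches d(x, b).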
than the others. A way of dealing with that is to optimize the objective function
on average. This leads to the following mathematical programming problem
Min_{x ∈ X}  f(x) := E[F(x, ω)].                            (1.14)
G_i(x, ω) ≤ 0,  i = 1, …, m,                                (1.17)
Example 2 (Reservoir Capacity). Consider the system of two reservoirs (Fig. 1), whose objective is to retain the flood in the protected area. The flood is produced by two random inflows, ξ₁ and ξ₂. Flood danger occurs once a year, say, and ξ₁, ξ₂ appear simultaneously. The damage from a flood of size y ≥ 0 is modeled as a convex nondecreasing function L(y), where L(0) = 0. Our objective is to determine the reservoir capacities, x₁ and x₂, so that the expected damage from the flood is below some specified limit b, and the cost of the reservoirs, f(x₁, x₂), is minimized.
The size of the flood is random and is given by the expression

y = max{0, ξ₁ + ξ₂ − x₁ − x₂, ξ₂ − x₂}.

Our problem takes on the form

Min  f(x₁, x₂)
s.t.  E[ L(max{0, ξ₁ + ξ₂ − x₁ − x₂, ξ₂ − x₂}) ] ≤ b,
      x₁ ≥ 0,  x₂ ≥ 0.                                      (1.19)
Note that

L(max{0, E[ξ₁] + E[ξ₂] − x₁ − x₂, E[ξ₂] − x₂}) ≤ E[ L(max{0, ξ₁ + ξ₂ − x₁ − x₂, ξ₂ − x₂}) ],

and the difference may be large, even for a linear function L(·). As a result, the expected losses from a flood may be much higher than foreseen by a naive deterministic model.
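The gap between the deterministic and the stochastic model can be illustrated by a small Monte Carlo experiment (all numbers are hypothetical; the inflows are assumed exponential for concreteness):

```python
import random

def damage(y):
    # a convex nondecreasing damage function L with L(0) = 0 (illustrative)
    return y * y

x1, x2 = 5.0, 3.0                  # trial reservoir capacities

def flood(z1, z2):
    # flood size, as in the text
    return max(0.0, z1 + z2 - x1 - x2, z2 - x2)

random.seed(2)
samples = [(random.expovariate(1 / 4.0), random.expovariate(1 / 3.0))
           for _ in range(100_000)]
expected_damage = sum(damage(flood(z1, z2)) for z1, z2 in samples) / len(samples)
naive_damage = damage(flood(4.0, 3.0))   # mean inflows plugged in
print(naive_damage, expected_damage)
```

With these capacities the deterministic model, fed the mean inflows, predicts zero damage, while the expected damage is strictly positive.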
Another way to define the feasible set is to use constraints on the
probability of satisfying (1.17):
P{ G_i(x, ω) ≤ 0 } ≥ 1 − α,  i = 1, …, m,                   (1.20)

with some fixed α ∈ (0, 1) (as in our newsvendor example). Such constraints are
called probabilistic or chance constraints.³
For a set A we denote by 1_A(·) its characteristic function,

1_A(t) := 1,  if t ∈ A,
          0,  if t ∉ A.                                     (1.21)
Max  Σ_{i=1}^n μᵢ xᵢ
s.t.  P{ Σ_{i=1}^n Rᵢ xᵢ ≥ −b } ≥ 1 − α,
      x ≥ 0,                                                (1.23)

where μᵢ := E[Rᵢ]. Note that for the sake of simplicity we do not impose here
the constraint x₁ + ⋯ + xₙ = W₀, where W₀ is the total invested amount, as
compared with the example of financial planning (Example 7) discussed later.
³ In the extreme case when α = 0, conditions (1.20) mean that the constraints G_i(x, ω) ≤ 0, i = 1, …, m, should hold for a.e. ω ∈ Ω.
If the returns have a joint normal distribution with covariance matrix Σ, the distribution of the profit (or loss) is normal, too, with expected value μᵀx and variance xᵀΣx. Consequently, (G(x, R) − μᵀx)/√(xᵀΣx) has the standard normal distribution (i.e., normal distribution with mean zero and variance one). Our probabilistic constraint is therefore equivalent to the inequality

(b + μᵀx) / √(xᵀΣx) ≥ z_α,
and the problem (1.23) can be written as

Max  μᵀx
s.t.  z_α √(xᵀΣx) − μᵀx ≤ b,
      x ≥ 0.                                                (1.24)
The first order optimality conditions of (1.24), with a Lagrange multiplier λ ≥ 0, take the form

(1 + λ)μ − λ z_α Σx / √(xᵀΣx) = 0.
From here we deduce that there must exist a scalar t such that x = tΣ⁻¹μ. We assume that the matrix Σ is nonsingular and μ ≠ 0. Substitution into the constraint yields (after simple calculations) t = b/(z_α√η − η) and λ = (z_α/√η − 1)⁻¹, with η := μᵀΣ⁻¹μ (note that Σ⁻¹ is positive definite and hence μᵀΣ⁻¹μ is positive). If √η ≥ z_α, then the problem is unbounded, i.e., its optimal value is +∞. If √η < z_α, then the vector

x̂ := [ b / (z_α√η − η) ] Σ⁻¹μ

is the optimal solution.
In many practical situations, though, the returns are not jointly normally
distributed, and even the single Value at Risk constraint, like the one analyzed
here, may create significant difficulties.
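The closed-form solution above can be verified numerically in a two-asset sketch; μ, Σ, b and z_α below are assumed illustrative values (z_α ≈ Φ⁻¹(0.95)), and the nonnegativity constraint is ignored in the derivation but happens to hold here.

```python
import math

mu = (0.10, 0.05)
Sigma = ((0.04, 0.01),
         (0.01, 0.02))
b, z_alpha = 1.0, 1.6449

# Explicit 2x2 inverse of Sigma.
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
Sinv_mu = ((Sigma[1][1] * mu[0] - Sigma[0][1] * mu[1]) / det,
           (-Sigma[1][0] * mu[0] + Sigma[0][0] * mu[1]) / det)
eta = mu[0] * Sinv_mu[0] + mu[1] * Sinv_mu[1]   # eta = mu^T Sigma^{-1} mu

assert math.sqrt(eta) < z_alpha                  # otherwise (1.24) is unbounded
t = b / (z_alpha * math.sqrt(eta) - eta)
x_hat = (t * Sinv_mu[0], t * Sinv_mu[1])         # x_hat = t * Sigma^{-1} mu

# The chance constraint of (1.24) should be active at x_hat:
xSx = sum(x_hat[i] * Sigma[i][j] * x_hat[j] for i in range(2) for j in range(2))
slack = z_alpha * math.sqrt(xSx) - (mu[0] * x_hat[0] + mu[1] * x_hat[1]) - b
print(x_hat, slack)
```

The slack of the constraint at x̂ is zero up to rounding, confirming that the constraint is active at the optimum.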
Let us now assume that our planning horizon is T years, and let
R1(t), . . . , Rn(t) be the random investment returns in years t ¼ 1, . . . , T. We
want to maximize the expected value of our investment after T years, under
the condition that with probability at least 1 − α the value of our investment
will never drop by more than b from the initial amount invested. We do not
want to re-allocate our investment, we just want to invest once and then watch
our wealth grow (hopefully).
Let x1, . . . , xn be the amounts invested in the n opportunities. The net
change in the value of our investment in year t is
G(x, R, t) = Σ_{i=1}^n Sᵢ(t) xᵢ,
and the problem takes the form

Max_{x ≥ 0}  Σ_{i=1}^n μᵢ xᵢ
s.t.  P{ G(x, R, t) ≥ −b,  t = 1, …, T } ≥ 1 − α.           (1.25)
2 Two-stage models
We can view the decision problem which the newsvendor faces in Example 1
as two stage. In the morning, before a realization of the demand D is known,
Max_{y, z}  sy + rz
subject to
y ≤ D,  y + z ≤ x,  y ≥ 0,  z ≥ 0.
Min_y  qᵀy
s.t.  Tx + Wy = h,  y ≥ 0.                                  (2.3)
Here x and y are vectors of first and second stage decision variables,
respectively. The second stage problem depends on the data ξ := (q, h, T, W), some (or all) elements of which can be random. Therefore we view ξ = ξ(ω) as a random vector. The expectation in (2.2) is taken with respect to the probability distribution of ξ(ω), which is supposed to be known. The matrices T and W are called the technology and recourse matrices, respectively. If the matrix W is fixed (not random), the above two-stage problem is called a problem with fixed recourse. In a sense the second stage problem (2.3) can be viewed as a penalty term for violation of the constraint Tx = h, hence the name "with recourse". For any x and ξ the function Q(x, ξ), although not given explicitly, is a well defined extended real valued function: it takes the value +∞ if the feasible set of the second stage problem (2.3) is empty, and the value −∞ if the second stage problem is unbounded from below.
That is, the two-stage problem can be formulated as one large linear
programming problem.
y_a^{mn} ≥ 0,  a ∈ A,  m, n ∈ N.                            (2.8)
This problem depends on the random demand vector D and on the arc
capacities, x. Its optimal value will be denoted Q(x, D). The first stage problem
has the form
Min_{x ≥ 0}  Σ_{a∈A} c_a x_a + E[Q(x, D)].
In this example only some right hand side entries in the second stage
constraints are random. All the matrices and cost vectors are deterministic.
Nevertheless, the size of this problem, even for discrete distributions of the demands, may be enormous. If the number of nodes is ν, the demand vector has ν(ν − 1) components. If they are independent, and each of them has r possible realizations, we have to deal with K = r^{ν(ν−1)} scenarios. For each of them the second stage problem has ν(ν − 1)|A| variables and 2ν(ν − 1) + |A| constraints (excluding nonnegativity constraints). As a result, the large scale linear programming formulation has |A| + ν(ν − 1)|A| r^{ν(ν−1)} variables and (2ν(ν − 1) + |A|) r^{ν(ν−1)} constraints. These are large numbers, even for moderately sized networks and distributions with only a few possibilities.
A more complex situation occurs when the arcs are subject to failures and they may lose random fractions θ_a of their capacities. Then the capacity constraint in the second stage problem has a slightly different form:

Σ_{m,n∈N} y_a^{mn} ≤ (1 − θ_a) x_a,  a ∈ A,
Let us relax problem (2.7) by replacing the first stage decision vector x by K
possibly different vectors xk. We obtain the problem
Min_{x₁,…,x_K, y₁,…,y_K}  Σ_{k=1}^K p_k (cᵀx_k + q_kᵀy_k)
s.t.  A x_k = b,
      T_k x_k + W_k y_k = h_k,
      x_k ≥ 0,  y_k ≥ 0,  k = 1, …, K.                      (2.9)
Problem (2.9) is separable in the sense that it can be split into K smaller
problems, one for each scenario, and therefore it is much easier for a
numerical solution. However, (2.9) is not suitable for modeling a two stage
process. This is because the first stage decision variables xk in (2.9) are now
allowed to depend on a realization of the random data at the second stage.
This can be fixed by introducing the additional constraints
x_k = Σ_{i=1}^K p_i x_i,  k = 1, …, K.                      (2.11)
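Constraints of the form (2.11) can be pictured as a projection: replacing every scenario copy by the probability-weighted average makes all copies identical. A minimal sketch, with toy numbers that are not from the text:

```python
# K = 3 scenario copies of a first stage decision x in R^2, with probabilities p.
p = [0.2, 0.5, 0.3]
x = [[1.0, 4.0], [2.0, 6.0], [3.0, 2.0]]

# The common value required by (2.11) is the expectation of x over scenarios.
x_bar = [sum(p[k] * x[k][i] for k in range(len(p))) for i in range(2)]
x_projected = [x_bar[:] for _ in p]       # every scenario copy equals E[x]

for k in range(len(p)):
    assert x_projected[k] == x_bar        # (2.11) holds after the projection
print(x_bar)
```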
As discussed above, the essence of two stage modeling is that there are two distinct parts of the decision vector. The value of the first part, x ∈ X, with X ⊂ ℝⁿ, has to be chosen before any realization of the unknown quantities, summarized in the data vector ξ = ξ(ω), is observed. The value of the second part, y, can be chosen after the realization of ξ becomes known and generally depends on the realization of ξ and on the choice of x. Consequently,
at the first stage one has to solve the expectation optimization problem
with Q(x, ξ) being the optimal value of the second stage optimization problem (2.3) (viewed as an extended real valued function). In such a formulation an
explicit dependence on the second stage decision variables y is suppressed. It
will be convenient to discuss that formulation first.
As in the example of problem (2.9), we may relax the expectation problem
(2.12) by allowing the first stage decision variables to depend on the random
data and then to correct that by enforcing nonanticipativity constraints.
Denote by M = M(Ω, F, X) the space of measurable mappings⁴ x(·): Ω → X
such that the expectation E[F(x(!), !)] is well defined. Then the relaxed
problem can be formulated in the form
⁴ We write here x(·), instead of x, in order to emphasize that x(·) is not a vector, but rather a vector valued function of ω.
In the case of finitely many scenarios ω₁, …, ω_K with respective probabilities p₁, …, p_K, the relaxed problem takes the form

Min_{x₁,…,x_K}  Σ_{k=1}^K p_k F(x_k, ω_k).                  (2.14)
inf_{x(·)∈M} E[F(x(ω), ω)] = E[ inf_{x∈X} F(x, ω) ].        (2.15)
Since ε is an arbitrary positive number, this implies that the left hand side of (2.15) is less than or equal to the right hand side of (2.15). Finally, if the event "ϑ(ω) = −∞" happens with positive probability, then both sides of (2.15) are equal to −∞. □
⁵ See Section 5.3 of the Appendix for the definition and discussion of random lower semicontinuous functions.
The above inequality also follows directly from the obvious inequality F(x, ω) ≥ ϑ(ω) for all x ∈ X and ω ∈ Ω.
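The interchangeability property (2.15) can be checked on a tiny finite-scenario instance (toy data, assumed for illustration): letting the minimizer depend on the scenario, as on the right hand side, can only do as well as or better than committing to one x.

```python
# Three scenarios with probabilities p; F(x, omega_k) = (x - k)^2 over a
# finite feasible set X (all numbers hypothetical).
p = [0.5, 0.3, 0.2]
X = [0.0, 1.0, 2.0, 3.0]
F = lambda x, k: (x - k) ** 2

# E[inf_x F(x, omega)]: minimize scenario by scenario, then average.
lhs = sum(p[k] * min(F(x, k) for x in X) for k in range(3))
# inf_x E[F(x, omega)]: one x for all scenarios.
rhs = min(sum(p[k] * F(x, k) for k in range(3)) for x in X)
print(lhs, rhs)
assert lhs <= rhs
```

Here the scenario-wise minima are all zero, while no single x achieves expected value zero; the gap is exactly what nonanticipativity constraints later remove.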
Let us give now a formulation where the second stage decision variables
appear explicitly:
F : ℝ^{n₁} × ℝ^{n₂} × ℝ^d → ℝ,
G_i : ℝ^{n₁} × ℝ^{n₂} × ℝ^d → ℝ,  i = 1, …, m,
x ∈ X,                                                      (2.23)
⁶ Since the expected values of two random variables which may differ on a set of measure zero are the same, it actually suffices to verify the constraints (2.17) for P-almost every ω ∈ Ω.
y(ω) ∈ Y.                                                   (2.24)

Define

F̄(x, y, ξ) := F(x, y, ξ),  if x ∈ X, y ∈ Y, G_i(x, y, ξ) ≤ 0, i = 1, …, m,
              +∞,          otherwise.
⁷ Written: "a.e. ω ∈ Ω".
Min_{x(·), y(·)}  E[ F(x(ω), y(ω), ξ(ω)) ]                  (2.25)
s.t.  x(ω) ∈ X,  y(ω) ∈ Y.
All constraints here are assumed to hold P-almost surely, i.e., for a.e.
! 2 . The above problem is an analogue of (2.13) with optimization
performed over mappings (x(·), y(·)) in an appropriate functional space, and as
in the finite scenario case, is a relaxation of the problem (2.21)–(2.24). To
make it equivalent to the original formulation we must add the nonanti-
cipativity constraint which can be written, for example, in the form (2.17).
For example, consider the two-stage linear program (2.2)–(2.3). We can
write it in the form
Min_{x, y(·)}  E[ cᵀx + q(ω)ᵀ y(ω) ]
with y(·) being a mapping from Ω into ℝ^{n₂}. In order for the above problem to make sense the mapping y(ω) should be measurable and the corresponding expected value should be well defined. Suppose for a moment that the vector q is not random, i.e., it does not depend on ω. Then we can assume that y(ω) is an element of the space L₁^{n₂}(Ω, F, P) of F-measurable mappings⁸ y : Ω → ℝ^{n₂}
⁸ In fact an element of L₁^{n₂}(Ω, F, P) is a class of mappings which may differ from each other on sets of P-measure zero.
such that ∫_Ω ‖y(ω)‖ dP(ω) < +∞. If q(ω) is random we can consider a space of measurable mappings y(·) such that ∫_Ω |q(ω)ᵀ y(ω)| dP(ω) < +∞.
Of course, the optimal solution x̄(ξ) (if it exists) and the optimal value ϑ(ξ) of problem (2.26) depend on the realization of the data ξ. The average of ϑ(ξ) over all possible realizations of the random data ξ = ξ(ω), i.e., the expected value

E[ϑ(ξ)] = E[ inf_{x∈X} V(x, ξ(ω)) ],                        (2.28)
That is, the optimal value of the stochastic programming problem (2.19) is always greater than or equal to E[ϑ(ξ)]. Suppose, further, that problem (2.19) has an optimal solution x̂. We have that V(x̂, ξ) − ϑ(ξ) is nonnegative for all ξ, and hence its expected value is zero iff V(x̂, ξ) − ϑ(ξ) = 0 w.p.1. That is, equality in (2.30) holds iff

V(x̂, ξ) = ϑ(ξ)  w.p.1.                                     (2.31)

The difference between the optimal value of (2.19) and E[ϑ(ξ)] represents the expected value of perfect information. It follows from (2.30) that EVPI is always nonnegative, and EVPI = 0 iff condition (2.31) holds.
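The EVPI can be computed explicitly for a tiny newsvendor instance with discrete demand (illustrative numbers; since this is a maximization problem, the sign convention of (2.30) flips and EVPI equals the wait-and-see value minus the here-and-now value):

```python
# Hypothetical newsvendor data with three demand scenarios.
c, s, r = 1.0, 3.0, 0.5
demands = [50.0, 100.0, 150.0]
probs = [0.3, 0.4, 0.3]

def profit(x, d):
    # F(x, D) from (1.1)
    return (s - c) * x if x <= d else (r - c) * x + (s - r) * d

candidates = [0.0, 50.0, 100.0, 150.0]     # small grid of order quantities
here_and_now = max(sum(p * profit(x, d) for p, d in zip(probs, demands))
                   for x in candidates)
wait_and_see = sum(p * max(profit(x, d) for x in candidates)
                   for p, d in zip(probs, demands))
evpi = wait_and_see - here_and_now
print(here_and_now, wait_and_see, evpi)
```

Knowing the demand in advance would allow matching the order to it exactly, and the EVPI quantifies the expected worth of that information.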
3 Multistage models
The two-stage model is a special case of a more general structure, called the
multi-stage stochastic programming model, in which the decision variables
and constraints are divided into groups corresponding to stages t = 1, …, T.
The fundamental issue in such a model is the information structure: what is
known at stage t when decisions associated with this period are made? We first
give a general description of such multistage models and then discuss
examples in Section 3.4.
Let x1 , . . . , xT be decision vectors corresponding to time periods (stages)
1, . . . , T. Consider the following linear programming problem
Suppose that the cost vectors c₂, …, c_T, the matrices A_{t,t−1} and A_{tt}, t = 2, …, T, and the right hand side vectors b₂, …, b_T are random. At each stage some of these quantities become known, and we have the following sequence of actions:
decision (x₁),
observation ξ₂ := (c₂, A₂₁, A₂₂, b₂),
decision (x₂),
…
observation ξ_T := (c_T, A_{T,T−1}, A_{TT}, b_T),
decision (x_T).
Our objective is to design the decision process in such a way that the
expected value of the total cost is minimized while optimal decisions are
allowed to be made at every time period t ¼ 1, . . . , T:
Let us denote by ξ_t the data which become known at time period t. In the setting of the multiperiod problem (3.1), ξ_t is assembled from the components of c_t, A_{t,t−1}, A_{tt}, b_t, some (or all) of which can be random, and the data ξ₁ = (c₁, A₁₁, b₁) at the first stage of problem (3.1) is assumed to be known. For 1 ≤ t₁ ≤ t₂ ≤ T, denote by

ξ_{[t₁,t₂]} := (ξ_{t₁}, …, ξ_{t₂})

the history of the process from time t₁ to time t₂. In particular, ξ_{[1,t]} represents
the information available up to time t. The important condition in the above multistage process is that every decision vector x_t may depend on the information available at time t (that is, ξ_{[1,t]}), but not on the results of observations to be made at later stages. This distinguishes multistage stochastic programs from deterministic multiperiod problems, in which all the information is assumed to be available at the beginning.
There are several ways in which multistage stochastic programs can be formulated in a precise mathematical form. In one such formulation
x_t = x_t(ξ_{[1,t]}), t = 2, …, T, is viewed as a function of ξ_{[1,t]} = (ξ₁, …, ξ_t), and
the minimization in (3.1) is performed over appropriate functional spaces (as
it was discussed in Section 2.4 in the case of two-stage programming). If the
number of scenarios is finite, this leads to a formulation of the linear
multistage stochastic program as one large (deterministic) linear programming
problem. We discuss that further in Section 3.2. It is also useful to describe the dynamics of the multistage process starting from the end, as follows.
Let us look at our problem from the perspective of the last stage T. At that
time the values of all the problem data, ξ_{[1,T]}, are already known, and the values
of the earlier decision vectors, x₁, …, x_{T−1}, have been chosen. Our problem is, therefore, the simple linear programming problem

Min_{x_T}  c_Tᵀ x_T
s.t.  A_{T,T−1} x_{T−1} + A_{TT} x_T = b_T,
      x_T ≥ 0.
The optimal value of this problem depends on the earlier decision vector x_{T−1} and the data ξ_T = (c_T, A_{T,T−1}, A_{TT}, b_T), and is denoted by Q_T(x_{T−1}, ξ_T).
At stage T − 1 we know x_{T−2} and ξ_{[1,T−1]}. We face, therefore, the following two-stage stochastic programming problem

Min_{x_{T−1}}  c_{T−1}ᵀ x_{T−1} + E[ Q_T(x_{T−1}, ξ_T) | ξ_{[1,T−1]} ]
s.t.  A_{T−1,T−2} x_{T−2} + A_{T−1,T−1} x_{T−1} = b_{T−1},
      x_{T−1} ≥ 0.                                          (3.3)

The optimal value of the above problem depends on x_{T−2} and the data ξ_{[1,T−1]}, and is denoted Q_{T−1}(x_{T−2}, ξ_{[1,T−1]}).
Generally, at stage t = 2, …, T − 1, we have the problem

Min_{x_t}  c_tᵀ x_t + E[ Q_{t+1}(x_t, ξ_{[1,t+1]}) | ξ_{[1,t]} ]
s.t.  A_{t,t−1} x_{t−1} + A_{tt} x_t = b_t,
      x_t ≥ 0.                                              (3.4)

Its optimal value is denoted Q_t(x_{t−1}, ξ_{[1,t]}) and is called the cost-to-go function.
Note that, since ξ₁ is not random, the conditional distribution of ξ_{t+1} given ξ_{[1,t]} is the same as the conditional distribution of ξ_{t+1} given ξ_{[2,t]}, t = 2, …, T − 1. Therefore, it suffices to take the conditional expectation in (3.4) (in (3.3)) with respect to ξ_{[2,t]} (with respect to ξ_{[2,T−1]}) only.
On top of all these problems is the problem of finding the first stage decisions, x₁:

Min_{x₁}  c₁ᵀ x₁ + E[ Q₂(x₁, ξ₂) ]
s.t.  A₁₁ x₁ = b₁,
      x₁ ≥ 0.                                               (3.5)

Here the expectation is taken with respect to the (unconditional) distribution of ξ₂. Note also that since ξ₁ is not random, the optimal value Q₂(x₁, ξ₂) does not depend on ξ₁. In particular, if T = 2, then (3.5) coincides with the formulation (2.2) of a two-stage linear problem.
We arrived in this way at the following nested formulation:

Min_{A₁₁x₁=b₁, x₁≥0}  c₁ᵀx₁ + E[ min_{A₂₁x₁+A₂₂x₂=b₂, x₂≥0}  c₂ᵀx₂ + E[ ⋯ + E[ min_{A_{T,T−1}x_{T−1}+A_{TT}x_T=b_T, x_T≥0}  c_Tᵀx_T ] ⋯ ] ].    (3.6)
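The nested min/expectation structure of (3.6) can be evaluated by backward recursion on a small scenario tree. The sketch below uses an inventory-style linkage between stages (carry-over of the unused part of the decision) and entirely hypothetical costs, demands and probabilities:

```python
# Toy nested problem: at each node choose x from a small grid, pay c*x,
# must cover the node's demand with the carry-over plus x, then face the
# expected cost-to-go of the children (probabilities p on the arcs).
def value(node, carry=0.0):
    children = node.get("children", [])
    best = float("inf")
    for x in (0.0, 1.0, 2.0, 3.0):
        if carry + x < node["demand"]:
            continue                        # infeasible decision at this node
        new_carry = carry + x - node["demand"]
        cost = node["c"] * x + sum(p * value(ch, new_carry)
                                   for p, ch in children)
        best = min(best, cost)
    return best

root = {"c": 1.0, "demand": 1.0,
        "children": [(0.5, {"c": 3.0, "demand": 2.0}),
                     (0.5, {"c": 3.0, "demand": 0.0})]}
print(value(root))
```

Here the second-stage unit cost (3.0) exceeds the first-stage one (1.0), so the recursion buys ahead at the root; the evaluation order mirrors the cost-to-go functions Q_t of (3.4).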
Then, of course, each subproblem (3.4) depends on the entire history of our decisions, x_{[1,t−1]} := (x₁, …, x_{t−1}). It takes on the form

Min_{x_t}  c_tᵀ x_t + E[ Q_{t+1}(x_{[1,t]}, ξ_{[1,t+1]}) | ξ_{[1,t]} ]
s.t.  A_{t1} x₁ + ⋯ + A_{tt} x_t = b_t,
      x_t ≥ 0.                                              (3.7)
in which all blocks A_{it}, i = 2, …, T, are identical and observed at time t. Then we can define the state variables r_t, t = 1, …, T, recursively by the state equations.
Suppose that in our basic problem (3.1) there are only finitely many, say K,
different values the problem data can take. We shall call them scenarios.
With each scenario k are associated a probability p_k and the corresponding sequence of decisions⁹ x^k = (x₁^k, x₂^k, …, x_T^k). Of course, it would not be
appropriate to try to find the optimal values of these decisions by solving the
relaxed version of (3.1):
Min  Σ_{k=1}^K p_k [ c₁ᵀx₁^k + (c₂^k)ᵀx₂^k + (c₃^k)ᵀx₃^k + ⋯ + (c_T^k)ᵀx_T^k ]
s.t.  A₁₁ x₁^k = b₁,
      A₂₁^k x₁^k + A₂₂^k x₂^k = b₂^k,
      A₃₂^k x₂^k + A₃₃^k x₃^k = b₃^k,
      …
      A_{T,T−1}^k x_{T−1}^k + A_{TT}^k x_T^k = b_T^k,
      x_t^k ≥ 0,  t = 1, …, T,  k = 1, …, K.                (3.8)
⁹ To avoid ugly collisions of subscripts we change our notation a little and put the index of the scenario, k, as a superscript.
The reason is the same as in the two-stage case: in the problem above all
parts of the decision vector are allowed to depend on all parts of the random
data, while in reality each part xt is allowed to depend only on the data known
up to stage t. In particular, problem (3.8) may suggest different values of x1 for
each scenario k, but we need only one value.
It is clear that we need the nonanticipativity constraints

x₁^k = x₁^j  for all k, j,                                  (3.9)

similarly to (2.10). But this is not sufficient, in general. Consider the second part of the decision vector, x₂. It is allowed to depend only on ξ_{[1,2]} = (ξ₁, ξ₂), so it has to have the same value for all scenarios k for which ξ^k_{[1,2]} is identical. We must, therefore, satisfy the equations

x₂^k = x₂^j  for all k, j for which ξ^k_{[1,2]} = ξ^j_{[1,2]}.    (3.10)

In general, the nonanticipativity constraints take the form

x_t^k = x_t^j  for all k, j for which ξ^k_{[1,t]} = ξ^j_{[1,t]},  t = 1, …, T.    (3.11)
problem (3.1) and in that sense can be useful. Note, however, that this model does not make much sense, since it assumes that at the end of the process, when all realizations of the random data become known, one can go back in time and make all decisions x₂, …, x_T.
It is useful to depict the possible sequences of data ξ_{[1,t]} in the form of a scenario tree. It has nodes organized in levels which correspond to stages 1, 2, …, T. At level 1 we have only one root node, and we associate with it the value of ξ₁ (which is known at stage 1). At level 2 we have at least as many nodes as different realizations of ξ₂ may occur. Each of them is connected with the root node by an arc. For each node i of level 2 (which corresponds to a particular realization ξ₂^i of ξ₂) we create at least as many nodes at level 3 as different values of ξ₃ may follow ξ₂^i, and we connect them with the node i, etc. Generally, nodes at level t correspond to possible values of ξ_t that may occur. Each of them is connected to a unique node at level t − 1, called the ancestor node, which corresponds to the identical first t − 1 parts of the process ξ_{[1,t]}, and is also connected to nodes at level t + 1, which correspond to possible continuations of the history ξ_{[1,t]}.
Note that, in general, realizations ξ_t^i are vectors and it may happen that some of the values ξ_t^i, associated with nodes at a given level t, are equal to each other. Nevertheless, such equal values may be represented by different nodes, since they may correspond to different histories of the process. Note also that if for every t = 1, …, T all realizations ξ_t^i are different from each other, then the random process ξ₁, …, ξ_T is Markovian because of the tree structure of the process. Indeed, in that case the conditional probability of ξ_t being at state ξ_t^i depends on the previous history of the process only through the ancestor node at level t − 1.
In order to illustrate the above ideas let us discuss the following simple
example.
Fig. 2. Scenario tree. Nodes represent information states. Paths from the root to leaves
represent scenarios. Numbers along the arcs represent conditional probabilities of moving
to the next node. Bold numbers represent numerical values of the process.
At level t = 4, the numerical values associated with the eight nodes are defined, from left to right, as 10, 10, 30, 12, 10, 20, 40, 70. The respective probabilities can be calculated by using the corresponding conditional probabilities. For example,

P{b₄ = 10} = 0.4 × 0.1 × 1.0 + 0.4 × 0.4 × 0.5 + 0.6 × 0.4 × 0.4,
while
Fig. 3. Sequences of decisions for scenarios from Fig. 2. Horizontal dotted lines represent
the equations of nonanticipativity.
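The probability calculation quoted above can be reproduced directly from the conditional probabilities along the three root-to-leaf paths that end at the value 10:

```python
# Products of conditional probabilities along the three paths ending at b4 = 10,
# as quoted in the text.
paths_to_10 = [(0.4, 0.1, 1.0), (0.4, 0.4, 0.5), (0.6, 0.4, 0.4)]
p = sum(a * b * c for a, b, c in paths_to_10)
print(p)
```

The unconditional probability of any node is obtained in the same way: multiply the conditional probabilities along its path from the root, and sum over all paths leading to the same value.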
and hence

Q_T(x_{T−1}, ξ_T) = b_T − x_{T−1},  if x_{T−1} ≤ b_T,
                    +∞,             otherwise.
¹⁰ Recall that a random process Z_t, t = 1, …, is called a martingale if the equalities E[Z_{t+1} | Z_{[1,t]}] = Z_t, t = 1, …, hold with probability one.
The history b_{[1,3]} of the process b_t, and hence the history ξ_{[1,3]} of the process ξ_t, is in one-to-one correspondence with the nodes of the tree at level t = 3. It has 5 possible realizations ξ^i_{[1,3]}, i = 1, …, 5, numbered from left to right, i.e., for i = 1 it corresponds to the realization b₁ = 36, b₂ = 15, b₃ = 10 of b_{[1,3]}.
We have that

E[ Q_T(x_{T−1}, ξ₄) | ξ_{[1,3]} = ξ¹_{[1,3]} ] = Q_T(x_{T−1}, ξ₄¹) = 10 − x_{T−1},  if x_{T−1} ≤ 10,
                                                                     +∞,            otherwise,

where ξ₄¹ = (1, 1, 1, b₄¹) and b₄¹ = 10. Consequently,

Q_{T−1}(x_{T−2}, ξ¹_{[1,3]}) = 10,   if 0 ≤ x_{T−2} ≤ 10,
                               +∞,   otherwise.
Similarly,

E[ Q_T(x_{T−1}, ξ₄) | ξ_{[1,3]} = ξ²_{[1,3]} ] = ½ Q_T(x_{T−1}, ξ₄²) + ½ Q_T(x_{T−1}, ξ₄³),

and hence

Q_{T−1}(x_{T−2}, ξ²_{[1,3]}) = 20,   if 10 ≤ x_{T−2} ≤ 20,
                               +∞,   otherwise,
¹¹ Since ξ₁ is not random, for t = 2 the distribution of ξ₂ is independent of ξ₁.
for all states z^{i_{t−1}}, …, z^{i₁}, z^i, z^j and all t = 1, … . In some instances it is natural to model the data process as a Markov chain with the corresponding state space¹² {η₁, …, η_m} and probabilities p_{ij} of moving from state η_i to state η_j,
i, j = 1, …, m. We can model such a process by a scenario tree. At stage t = 1 there is one root node to which is assigned one of the values from the state space, say η_i. At stage t = 2 there are m nodes to which are assigned the values
¹² In our modeling, the values η₁, …, η_m can be numbers or vectors.
x̄_t^k = [ Σ_{j∈A_t(k)} p_j x_t^j ] / [ Σ_{j∈A_t(k)} p_j ],  k = 1, …, K,  t = 1, …, T,    (3.13)

where A_t(k) := { j : ξ^j_{[1,t]} = ξ^k_{[1,t]} } is the set of scenarios that share with scenario k the history up to stage t. The expression on the right hand side of the above relation is the conditional expectation of x_t under the condition that ξ_{[1,t]} = ξ^k_{[1,t]}, where x_t is viewed as a random variable which can take the values x_t^j with probabilities p_j, j = 1, …, K. We can therefore rewrite (3.13) as

x_t = E[ x_t | ξ_{[1,t]} ],  t = 1, …, T.                   (3.14)
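The bundle averaging in (3.13) is easy to sketch in code; the scenario histories and decisions below are toy data assumed for illustration:

```python
# Three scenarios; at the fixed stage t, scenarios 0 and 1 share the same
# history "a" (so they form one bundle A_t(k)), scenario 2 has history "b".
p = [0.25, 0.25, 0.5]
history = ["a", "a", "b"]          # xi^k_[1,t] for the fixed stage t
x_t = [10.0, 20.0, 7.0]            # scenario copies of the stage-t decision

def bundle(k):
    # A_t(k): scenarios sharing scenario k's history up to stage t
    return [j for j in range(len(p)) if history[j] == history[k]]

x_bar = [sum(p[j] * x_t[j] for j in bundle(k)) / sum(p[j] for j in bundle(k))
         for k in range(len(p))]
print(x_bar)
```

After the replacement, scenarios within one bundle carry identical decisions, which is exactly the nonanticipativity condition (3.14).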
The values of the decision vector x_t, chosen at stage t, may depend on the information ξ_{[1,t]} available up to time t, but not on the results of future observations. We can formulate this requirement using nonanticipativity constraints. That is, we view each x_t = x_t(·) as an element of the space of measurable mappings from Ω to ℝ^{n_t}, and hence consider x_t(ω) as a random (vector valued) process in time t. It has to satisfy the following additional condition, called the nonanticipativity constraint,
¹³ F_t is the minimal subalgebra of the sigma algebra F such that ξ₁(ω), …, ξ_t(ω) are F_t-measurable. Since ξ₁ is not random, F₁ contains only two sets, ∅ and Ω. We can assume that F_T = F.
F : ℝ^{n₁+⋯+n_T} × ℝ^{d₁+⋯+d_T} → ℝ,
Σ_{i=1}^n x_{i0} = W₀.                                      (3.19)
We can put an equation sign here, because one of our assets is cash.
After the first period, our wealth may change, due to random returns from
the investments, and at time t ¼ 1 it will be equal to
W₁ = Σ_{i=1}^n (1 + R_{i1}) x_{i0}.                         (3.20)
U(W) := (1 + q)(W − a),  if W ≥ a,
        (1 + r)(W − a),  if W < a,                          (3.22)
with r > q > 0 and a > 0. We can view the involved parameters as follows: a is
the amount that we have to pay at time t ¼ 1, q is the interest at which we can
invest the additional wealth W a, provided that W > a, and r is the interest
at which we will have to borrow if W is less than a. For the above utility
function, problem (3.21) can be formulated as the following two-stage
stochastic linear program
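The utility (3.22) is straightforward to code, and a small Monte Carlo evaluation shows how it trades mean return against downside risk. The parameter values and the normal return distributions below are illustrative assumptions, not from the text:

```python
import random

q, r, a = 0.05, 0.20, 100.0         # hypothetical rates (q < r) and target a

def U(W):
    # piecewise linear concave utility (3.22)
    return (1 + q) * (W - a) if W >= a else (1 + r) * (W - a)

random.seed(3)
W0 = 100.0

def expected_utility(mean_ret, vol, n=50_000):
    # E[U(W0 * (1 + R))] for a single asset with normal return R
    total = 0.0
    for _ in range(n):
        R = random.gauss(mean_ret, vol)
        total += U(W0 * (1 + R))
    return total / n

safe = expected_utility(0.02, 0.0)     # riskless return of 2%
risky = expected_utility(0.03, 0.15)   # higher mean, substantial volatility
print(safe, risky)
```

Because r > q, shortfalls below a are penalized more heavily than surpluses are rewarded, which is what makes the expected-utility objective sensitive to the variability of the returns.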
Suppose now that T > 1. In that case we can rebalance the portfolio at time
t ¼ 1, by specifying the amounts x11 , . . . , xn1 invested in the assets in the
second period. Note that we already know the actual returns in the first
period, so it is reasonable to use this information in the rebalancing decisions.
Thus, our second stage decisions are actually functions of R1 ¼ ðR11 , . . . , Rn1 Þ,
and they can be written as x₁₁(R₁), …, x_{n1}(R₁). We must also keep in mind the balance of wealth:
Σ_{i=1}^n x_{i1}(R₁) = W₁.                                  (3.25)
Its optimal value is denoted Q_{T−1}(x_{T−2}, R_{[1,T−1]}). At stage t = T − 2 realizations of the random process R₁, …, R_{T−2} are known and x_{T−3} has been chosen. We have then to solve the following two-stage stochastic program

Max  E[ Q_{T−1}(x_{T−2}, R_{[1,T−1]}) | R_{[1,T−2]} ]
s.t.  Σ_{i=1}^n x_{i,T−2} = Σ_{i=1}^n (1 + R_{i,T−2}) x_{i,T−3},
      x_{i,T−2} ≥ 0,  i = 1, …, n.                          (3.31)
Its optimal value is denoted Q_{T−2}(x_{T−3}, R_{[1,T−2]}), etc. At stage t = 0 we have to solve the following program
Note that in the present case the cost-to-go function Q_{T−1}(x_{T−2}, R_{[1,T−1]}) depends on x_{T−2} = (x_{1,T−2}, …, x_{n,T−2}) only through W_{T−1} = Σ_{i=1}^n (1 + R_{i,T−1}) x_{i,T−2}. That is, if Q̃_{T−1}(W_{T−1}, R_{[1,T−1]}) is defined as the optimal value of the problem

Max  E[ U( Σ_{i=1}^n (1 + R_{iT}) x_{i,T−1} ) | R_{[1,T−1]} ]
s.t.  Σ_{i=1}^n x_{i,T−1} = W_{T−1},  x_{i,T−1} ≥ 0,  i = 1, …, n,    (3.33)
then

Q_{T−1}(x_{T−2}, R_{[1,T−1]}) = Q̃_{T−1}( Σ_{i=1}^n (1 + R_{i,T−1}) x_{i,T−2}, R_{[1,T−1]} ).

Similarly, Q_{T−2}(x_{T−3}, R_{[1,T−2]}) depends on x_{T−3} only through W_{T−2}, and so on.
We may also note that the need for multistage modeling occurs here mainly because of the nonlinearity of the utility function U(·). Indeed, if U(W) ≡ W, and the returns in different stages are independent random vectors, it is
defined by the recursive equation¹⁴ x_{it} = (1 + R_{it}) x_{i,t−1}, which implies that x_{it} = [ Π_{s=1}^t (1 + R_{is}) ] x_{i0}. Consequently, x_{it} is completely determined by the initial value x_{i0} = x_i and a realization of the random process R_{i1}, …, R_{it}. On the other hand, in the multistage model the values x_{it} are rebalanced at every period of time subject to the constraints (3.26)–(3.27). Therefore the multistage problem (3.29) can be viewed as a relaxation of the two-stage problem (3.35), and hence has a larger optimal value.
We discuss further the above example in section ‘‘An Example of Financial
Planning’’ of chapter ‘‘Monte Carlo Sampling Methods’’.
The following example also demonstrates that in some cases the same
practical problem can be modeled as a multistage or two-stage multiperiod
program.
I_t = [ I_{t−1} + x_t − D_t ]₊,                             (3.36)

F(x, D) := Σ_{t=1}^T ( h_t I_t − c_t min[ I_{t−1} + x_t, D_t ] ).
¹⁴ This defines an implementable and feasible policy for the multistage problem (3.29); see section "Multistage Models" of chapter "Optimality and Quality in Stochastic Programming" for the definition of implementable and feasible policies.
F(x, D) = Σ_{t=1}^T q_t I_t − Σ_{t=1}^T c_t x_t − c₁ I₀,
Min_{x ≥ 0}  −cᵀx + E[ Q(x, D) ],                           (3.37)
where Q(x, D) is the optimal value of the problem

Min_{y ≥ 0}  Σ_{t=1}^T q_t y_t
s.t.  y_{t−1} + x_t − D_t ≤ y_t,  t = 1, …, T,
      y₀ = I₀.                                              (3.38)
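The recursion (3.36) and the cost form quoted after it can be simulated forward for a fixed production plan and a fixed demand path (all data below are hypothetical):

```python
# Forward simulation of the inventory recursion (3.36) and the cost
# F(x, D) = sum q_t I_t - sum c_t x_t (the constant c_1 I_0 term omitted,
# as in the text). q_t is the combined per-unit inventory cost coefficient.
q = [1.0, 1.0, 1.0, 1.0]
c = [2.0, 2.0, 2.0, 2.0]
x = [5.0, 5.0, 5.0, 5.0]        # production decisions
D = [3.0, 8.0, 2.0, 6.0]        # a demand realization
I0 = 1.0

I, level = [], I0
for t in range(4):
    level = max(0.0, level + x[t] - D[t])   # I_t = [I_{t-1} + x_t - D_t]_+
    I.append(level)

cost = sum(q[t] * I[t] for t in range(4)) - sum(c[t] * x[t] for t in range(4))
print(I, cost)
```

For given x and D, the inventory levels I_t produced by this forward pass are exactly the minimal feasible y_t in (3.38) when all q_t are positive, which is why the optimal y_t^k in (3.39) represent I_t(x, D^k).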
¹⁵ Since I₀ does not depend on x, the term c₁I₀ is omitted.
number of scenarios K = m^T. We can then write the two stage problem (3.37)–(3.38) as the linear problem (compare with (2.7)):

Min  −Σ_{t=1}^T c_t x_t + Σ_{k=1}^K p_k ( Σ_{t=1}^T q_t y_t^k )
s.t.  y^k_{t−1} + x_t − D_t^k ≤ y_t^k,
      x_t ≥ 0,  y₀^k = I₀,  y_t^k ≥ 0,  t = 1, …, T,  k = 1, …, K.    (3.39)
Note that the optimal values of y_t^k in (3.39) represent I_t(x, D^k). Since I_t(x, D^k) depends only on the realization D^k up to time t, the nonanticipativity constraints with respect to y_t^k hold in (3.39) automatically.
On the other hand, depending on the flexibility of the production process,
one can update production quantities at every time period t ¼ 1, . . . , T using
known realizations of the demand up to time t. This can be formulated as a
multistage stochastic program where an optimal decision is made at every
period of time based on available realizations of the random data. Consider
the following relaxation of (3.39):
" #
X
K X
T
Min pk qt ykt
ct xkt
k¼1 t¼1
k
s:t: yt1 þ xtk Dkt ytk , t ¼ 1, . . . , T,
xtk 0, y0k ¼ I0 , ytk 0, t ¼ 1, . . . , T, k ¼ 1, . . . , K: ð3:40Þ
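The relaxation can be illustrated numerically: fixing one order plan $x$ shared by all scenarios, as in (3.39), can never give a smaller expected cost than choosing a plan $x^k$ separately for each scenario, with the coupling dropped entirely as in (3.40). A brute-force sketch over a small grid (all data hypothetical):

```python
from itertools import product

def scenario_cost(x, D, q, c, I0=0.0):
    """Cost of order plan x under demand path D: sum of q_t*I_t - c_t*x_t,
    with inventory I_t = [I_{t-1} + x_t - D_t]_+."""
    I, total = I0, 0.0
    for xt, dt, qt, ct in zip(x, D, q, c):
        I = max(0.0, I + xt - dt)
        total += qt * I - ct * xt
    return total

q, c = [3.0, 3.0], [1.0, 1.0]
scenarios = [([2, 1], 0.5), ([1, 3], 0.5)]   # (demand path, probability)
grid = list(product(range(4), repeat=2))     # candidate order plans

# Analogue of (3.39): one plan x shared by all scenarios.
common = min(sum(p * scenario_cost(x, D, q, c) for D, p in scenarios)
             for x in grid)
# Analogue of (3.40): an independent plan per scenario (coupling dropped).
split = sum(p * min(scenario_cost(x, D, q, c) for x in grid)
            for D, p in scenarios)
assert split <= common + 1e-12
```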
$$r \ge 0, \quad y \ge 0, \quad z \ge 0. \qquad (3.41)$$
$$s_{t+1} = A_t s_t + B_t u_t + C_t e_t, \quad t = 1,\ldots,T,$$
in which $s_t$ denotes the state of the system at time $t$, $u_t$ is the control vector, and $e_t$ is a random ‘disturbance’ at time $t$. The matrices $A_t$, $B_t$ and $C_t$ are known. The random vectors $e_t$, $t = 1,\ldots,T$, are assumed to be independent. At time $t$ we observe the current state value $s_t$, but not the disturbance $e_t$.
Our objective is to find a control law, $\hat{u}_t(\cdot)$, $t = 1,\ldots,T$, so that the actual values of the control variables can be determined through the feedback rule:
$$u_t = \hat{u}_t(s_t), \quad t = 1,\ldots,T-1.$$
" #
X
T 1
E Ft ðst , ut Þ þ FT ðsT Þ
t¼1
gti ðst , ut Þ 0, i ¼ 1, . . . , mt , t ¼ 1, . . . , T 1:
For the sake of simplicity we assume that they are all incorporated into the definitions of the partial objectives, that is, $F_t(s_t, u_t) = +\infty$ if these constraints are not satisfied.
The crucial characteristic of the optimal control model is that we look for a solution in the form of a function of the state vector. We are allowed to focus on such a special form of the control rule due to the independence of the disturbances at different stages. If the disturbances are dependent in certain ways, augmentation of the state space may reduce the model to the case of independent $e_t$'s.
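For instance, if the disturbance follows an AR(1) recursion $e_t = a\, e_{t-1} + w_t$ with independent $w_t$, adjoining $e_{t-1}$ to the state restores the independence assumption. A sketch with scalar dynamics $s_{t+1} = s_t + u_t + e_t$ (all names and numbers are hypothetical):

```python
def run_original(s0, e0, us, ws, a):
    """s_{t+1} = s_t + u_t + e_t with AR(1) noise e_t = a*e_{t-1} + w_t."""
    s, e_prev, traj = s0, e0, []
    for u, w in zip(us, ws):
        e = a * e_prev + w
        s = s + u + e
        e_prev = e
        traj.append(s)
    return traj

def run_augmented(s0, e0, us, ws, a):
    """Augmented state (s_t, e_{t-1}); the driving noise w_t is independent,
    so the standard feedback framework applies to the augmented system."""
    state, traj = (s0, e0), []
    for u, w in zip(us, ws):
        s, e_prev = state
        e = a * e_prev + w          # the AR(1) step is now part of the dynamics
        state = (s + u + e, e)
        traj.append(state[0])
    return traj

us, ws = [1.0, -0.5, 0.0], [0.2, -0.1, 0.3]
assert run_original(0.0, 0.1, us, ws, 0.8) == run_augmented(0.0, 0.1, us, ws, 0.8)
```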
The key role in optimal control theory is played by the cost-to-go function
$$V_t(s_t) := \inf E\Bigl[ \sum_{\tau=t}^{T-1} F_\tau(s_\tau, u_\tau) + F_T(s_T) \Bigr],$$
where the minimization is carried out among all possible feedback laws applied at stages $t,\ldots,T-1$. The functions $V_t(\cdot)$ satisfy the dynamic programming equation
$$V_t(s_t) = \min_{u_t} \Bigl\{ F_t(s_t, u_t) + E\bigl[ V_{t+1}(A_t s_t + B_t u_t + C_t e_t) \bigr] \Bigr\}.$$
The control problem can also be cast as a multistage stochastic program by identifying the decision and random variables as
$$x_t = (u_t, s_t), \quad t = 1,\ldots,T-1, \qquad x_T = s_T, \qquad \xi_t = C_{t-1} e_{t-1}, \quad t = 2,\ldots,T.$$
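A minimal finite-state sketch of the backward recursion behind the dynamic programming equation, with quadratic stage costs, an integer state grid, and clipped scalar dynamics (all data hypothetical):

```python
# Backward dynamic programming: V_t(s) = min_u { F_t(s,u) + E[V_{t+1}(s')] }
# for the scalar system s' = clip(s + u + e) with a three-point disturbance.

STATES = range(-3, 4)
CONTROLS = (-1, 0, 1)
NOISE = ((-1, 0.25), (0, 0.5), (1, 0.25))   # (disturbance value, probability)
T = 4

def clip(s):
    """Keep the next state on the finite grid."""
    return max(-3, min(3, s))

def stage_cost(s, u):
    return s * s + u * u

V = {T: {s: float(s * s) for s in STATES}}  # terminal cost F_T(s) = s^2
policy = {}
for t in range(T - 1, 0, -1):               # backward in time
    V[t], policy[t] = {}, {}
    for s in STATES:
        best_u, best_q = None, float("inf")
        for u in CONTROLS:
            q = stage_cost(s, u) + sum(p * V[t + 1][clip(s + u + e)]
                                       for e, p in NOISE)
            if q < best_q:
                best_u, best_q = u, q
        V[t][s], policy[t][s] = best_q, best_u
```

The optimal feedback rule $\hat{u}_t(s)$ is read off from `policy[t][s]`; e.g., at states far from the origin the computed policy steers back toward zero.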
$$r_{it} = r_{i,t-1} + \sum_{k=1}^{n} y_{kit} - \sum_{j=1}^{n} y_{ijt},$$
$$D_{t+1} = e_t,$$
$$z_{ijt} \le D_{ijt}, \quad i, j = 1,\ldots,n,$$
$$z_{ijt} \le y_{ijt}, \quad i, j = 1,\ldots,n.$$
$$F_t(s_t, u_t) = -\sum_{i, j = 1}^{n} \bigl( q_{ij} z_{ijt} - c_{ij} y_{ijt} \bigr),$$
and depends on the controls alone. So, if the demands on different days are independent, the optimal solution has the form of a feedback rule:
$$y_t = \hat{y}_t(r_{t-1}, D_t),$$
$$z_t = \hat{z}_t(r_{t-1}, D_t).$$
where
$$\rho(z) := \sum_{k=1}^{K} p_k z_k + \lambda \Bigl[ \sum_{k=1}^{K} p_k z_k^2 - \Bigl( \sum_{k=1}^{K} p_k z_k \Bigr)^{2} \Bigr].$$
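The bracketed term is the variance of the scenario values $z_k$ under the probabilities $p_k$, which can be checked against the centered form of the variance. A sketch with hypothetical data:

```python
def rho(z, p, lam):
    """Mean plus lam times variance, with variance written as
    E[z^2] - (E[z])^2, matching the bracketed term above."""
    mean = sum(pk * zk for pk, zk in zip(p, z))
    second = sum(pk * zk * zk for pk, zk in zip(p, z))
    return mean + lam * (second - mean ** 2)

z, p, lam = [1.0, 4.0, 0.0], [0.5, 0.25, 0.25], 2.0
mean = sum(pk * zk for pk, zk in zip(p, z))
var_centered = sum(pk * (zk - mean) ** 2 for pk, zk in zip(p, z))
assert abs(rho(z, p, lam) - (mean + lam * var_centered)) < 1e-12
```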
Now for $\lambda > 0$ the objective function of the above problem is not necessarily convex even though the functions $Q(\cdot, \xi_i)$, $i = 1,\ldots,K$, are all convex, and the second stage optimality does not hold in the sense that problem (4.1) is not equivalent to the problem
$$\min_{x,\, y_1,\ldots,y_K} \; c^T x + \rho\bigl( q_1^T y_1, \ldots, q_K^T y_K \bigr)$$
$$\text{s.t.} \quad Ax = b,$$
$$T_k x + W_k y_k = h_k, \quad k = 1,\ldots,K,$$
$$x \ge 0, \quad y_k \ge 0, \quad k = 1,\ldots,K. \qquad (4.2)$$
Proposition 11. Suppose that problem (4.2) is feasible and the function $\rho(z)$ is componentwise nondecreasing. Then problems (4.1) and (4.2) have the same optimal value, and if, moreover, problem (4.2) has an optimal solution, then problems (4.1) and (4.2) have the same set of first stage optimal solutions.
Proof. Since (4.2) is feasible, there exists a feasible $x$ such that all $Q(x, \xi_k)$, $k = 1,\ldots,K$, are less than $+\infty$, and hence the optimal value of problem (4.1) is also less than $+\infty$. By (2.6), $Q(x, \xi_k)$ is given by the optimal value of a linear programming problem. Therefore, if $Q(x, \xi_k)$ is finite, then the corresponding linear programming problem has an optimal solution. It follows that if all $Q(x, \xi_k)$ are finite, then $(Q(x, \xi_1), \ldots, Q(x, \xi_K))$ is equal to $(q_1^T y_1, \ldots, q_K^T y_K)$ for some $y_k$, $k = 1,\ldots,K$, satisfying the constraints of problem (4.2), and hence the optimal value of (4.1) is greater than or equal to the optimal value of (4.2). Conversely, for a given $x$, $Q(x, \xi_k)$ is less than or equal to $q_k^T y_k$, $k = 1,\ldots,K$, for any $y_1, \ldots, y_K$ feasible for (4.2). Since $\rho(z)$ is componentwise nondecreasing, it follows that the optimal value of (4.2) is greater than or equal to the optimal value of (4.1), and hence these two optimal values are equal to each other. Moreover, it follows that if $x^*, y_1^*, \ldots, y_K^*$ is an optimal solution of problem (4.2), then $x^*$ is an optimal solution of problem (4.1), and vice versa. □
for some $\lambda \ge 0$ and $\tau \in \mathbb{R}$. Note that for both of the above choices, the corresponding function $\rho(z)$ is componentwise nondecreasing and convex.
If the parameter $\tau$ in (4.4) is equal to $E[Q(x, \xi)]$ and the distribution of $Q(x, \xi)$ is symmetrical around its mean, then
$$\delta_1(Z; \tau) := E[(Z - \tau)_+] \quad \text{and} \quad \delta_2(Z; \tau) := E\bigl[ ((Z - \tau)_+)^2 \bigr],$$
respectively, which represent the expected excess (or square excess) over the target level $\tau$. More sophisticated are semideviation measures, which use, instead of the fixed target level $\tau$, the expected value of the random outcome. The simplest and most convenient in applications is the absolute semideviation:
The presence of the expected value of the outcome in the definition of the measure makes the resulting risk term
$$\sigma_1(F(x, \cdot)) = E\bigl[ (F(x, \omega) - E[F(x, \omega)])_+ \bigr]$$
not necessarily convex in $x$. Nevertheless, the mean–risk problem
$$\min \; E[F(x, \omega)] + \kappa\, E\bigl[ (F(x, \omega) - E[F(x, \omega)])_+ \bigr]$$
remains a convex problem, provided that the coefficient $\kappa$ in front of the risk term is confined to $[0,1]$. This can be seen from the representation:
$$E[F(x, \omega)] + \kappa E\bigl[ (F(x, \omega) - E[F(x, \omega)])_+ \bigr] = (1-\kappa) E[F(x, \omega)] + \kappa E\bigl[ \max\{ E[F(x, \omega)],\, F(x, \omega) \} \bigr],$$
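The representation is the identity $\max\{a, z\} = a + (z - a)_+$ applied with $a = E[Z]$, and it can be verified on any discrete distribution. A sketch with hypothetical data ($\kappa$ denotes the risk coefficient):

```python
def mean(z, p):
    return sum(pk * zk for pk, zk in zip(p, z))

def mean_risk(z, p, kappa):
    """E[Z] + kappa * E[(Z - E[Z])_+]."""
    m = mean(z, p)
    return m + kappa * sum(pk * max(zk - m, 0.0) for pk, zk in zip(p, z))

def max_form(z, p, kappa):
    """(1 - kappa)*E[Z] + kappa*E[max(E[Z], Z)]: a convex combination of
    convex functions of the outcomes when kappa is in [0, 1]."""
    m = mean(z, p)
    return (1 - kappa) * m + kappa * sum(pk * max(m, zk) for pk, zk in zip(p, z))

z, p = [2.0, -1.0, 5.0], [0.2, 0.5, 0.3]
for kappa in (0.0, 0.3, 1.0):
    assert abs(mean_risk(z, p, kappa) - max_form(z, p, kappa)) < 1e-12
```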
in which case (4.6) represents the expected shortfall below some target profit level $\tau$. If $\tau = -b < 0$, our measure represents the expected loss in excess of $b$.
Supposing that our initial capital (wealth) is $W$, we may formulate the following mean–risk optimization problem
$$\max_{x \ge 0} \; \sum_{i=1}^{n} \mu_i x_i - \kappa\, E\Bigl[ \Bigl( \sum_{i=1}^{n} \mu_i x_i - \sum_{i=1}^{n} R_i x_i \Bigr)_+ \Bigr]$$
$$\text{s.t.} \quad \sum_{i=1}^{n} x_i \le W, \qquad (4.8)$$
where $\mu_i := E[R_i]$.
$$\max_{x \ge 0,\, \mu,\, r} \; (1-\kappa)\mu + \kappa \sum_{k=1}^{K} p_k r_k$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \mu_i x_i = \mu,$$
$$r_k \le \mu, \quad k = 1,\ldots,K,$$
$$r_k \le \sum_{i=1}^{n} R_{ik} x_i, \quad k = 1,\ldots,K,$$
$$\sum_{i=1}^{n} x_i \le W.$$
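The linear program linearizes the semideviation: at optimality each $r_k$ equals $\min\{\mu, \sum_i R_{ik} x_i\}$, so the LP objective reproduces the mean–semideviation objective via the identity $\min\{\mu, v\} = \mu - (\mu - v)_+$. A numerical sketch for one fixed portfolio $x$ (the data are hypothetical; the LP itself is not solved here, only the identity behind the reformulation is checked):

```python
# Two assets, three equally likely scenarios of returns R_{ik}.
R = [[0.10, -0.05, 0.02],    # returns of asset 1 per scenario
     [0.03, 0.04, -0.01]]    # returns of asset 2 per scenario
p = [1 / 3, 1 / 3, 1 / 3]
x = [60.0, 40.0]
kappa = 0.5

mu_i = [sum(pk * Rik for pk, Rik in zip(p, Ri)) for Ri in R]
mu = sum(mi * xi for mi, xi in zip(mu_i, x))                     # expected profit
ret = [sum(R[i][k] * x[i] for i in range(2)) for k in range(3)]  # scenario profits

# Mean-semideviation objective: mu - kappa * E[(mu - profit)_+].
direct = mu - kappa * sum(pk * max(mu - rk, 0.0) for pk, rk in zip(p, ret))
# LP objective with each r_k at its optimal value min(mu, profit_k).
linear = (1 - kappa) * mu + kappa * sum(pk * min(mu, rk) for pk, rk in zip(p, ret))
assert abs(direct - linear) < 1e-12
```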
For the above problem to make sense it is assumed, of course, that for every $P \in \mathcal{S}$ the expectation $E_P[F(x, \omega)]$ is well defined for all $x \in X$.
In order to see a relation between these two approaches, let us assume for the sake of simplicity that the set $\mathcal{S} = \{P_1, \ldots, P_l\}$ is finite. Then problem (4.9) can be written in the following equivalent way
$$\min_{(x,z) \in X \times \mathbb{R}} \; z \quad \text{s.t.} \; f_i(x) \le z, \; i = 1,\ldots,l, \qquad (4.10)$$
where $f_i(x) := E_{P_i}[F(x, \omega)]$. Suppose further that problem (4.10), and hence problem (4.9), is feasible and that for every $\omega \in \Omega$ the function $F(\cdot, \omega)$ is convex. It follows from convexity of $F(\cdot, \omega)$ that the functions $f_i(\cdot)$ are also convex, and hence problem (4.10) is a convex programming problem. Then, by the duality theory of convex programming, there exist Lagrange multipliers $\lambda_i \ge 0$, $i = 1,\ldots,l$, such that $\sum_{i=1}^{l} \lambda_i = 1$ and problem (4.10) has the same optimal value as the problem
$$\min_{x \in X} \; \Bigl\{ \bar{f}(x) := \sum_{i=1}^{l} \lambda_i f_i(x) \Bigr\},$$
and the set of optimal solutions of (4.10) is included in the set of optimal solutions of the above problem. Since $\bar{f}(x) = E_{P^*}[F(x, \omega)]$, where $P^* := \sum_{i=1}^{l} \lambda_i P_i$, we obtain that problem (4.9) is equivalent to the stochastic programming problem
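One direction of this equivalence, weak duality, always holds and is easy to check by brute force: for any weights $\lambda_i$, $\min_x \sum_i \lambda_i f_i(x) \le \min_x \max_i f_i(x)$, since a convex combination never exceeds the maximum. A sketch with two candidate distributions on a small decision grid (all data hypothetical):

```python
# f_i(x) = E_{P_i}[F(x, omega)] evaluated on a finite grid of decisions x.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
omegas = [1.0, 2.0, 3.0]
P1 = [0.6, 0.3, 0.1]
P2 = [0.1, 0.3, 0.6]

def F(x, w):
    return (x - w) ** 2

def f(P, x):
    """Expectation of F(x, .) under the discrete distribution P."""
    return sum(pw * F(x, w) for pw, w in zip(P, omegas))

minimax = min(max(f(P1, x), f(P2, x)) for x in xs)

# Weak duality: no mixture of P1 and P2 gives a larger minimal value
# than the minimax (worst-case) value.
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    mixed = min(lam * f(P1, x) + (1 - lam) * f(P2, x) for x in xs)
    assert mixed <= minimax + 1e-12
```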
5 Appendix
In this section we briefly discuss some basic concepts and definitions from
probability and optimization theories, needed for the development of
stochastic programming models. Of course, a careful derivation of the
required results goes far beyond the scope of this book. The interested reader
may look into standard textbooks for a thorough development of these topics.
16 In fact it suffices to verify $\mathcal{F}$-measurability of $V^{-1}(A)$ for any family of sets generating the Borel sigma algebra of $\mathbb{R}^m$.
17 Recall that $Z_+ := \max\{0, Z\}$.
value of $Z(\omega)$ is well defined one has to check that $Z(\omega)$ is measurable and either $E[Z_+] < +\infty$ or $E[(-Z)_+] < +\infty$. Note that if $Z(\omega)$ and $Z'(\omega)$ are two (extended) random variables such that their expectations are well defined and $Z(\omega) = Z'(\omega)$ for all $\omega \in \Omega$ except possibly on a set of measure zero, then $E[Z] = E[Z']$. It is said that $Z(\omega)$ is $P$-integrable if the expected value $E[Z]$ is well defined and finite. The expected value of a random vector is defined componentwise.
If the random variable $Z(\omega)$ can take only a countable (finite) number of different values, say $z_1, z_2, \ldots$, then it is said that $Z(\omega)$ has a discrete distribution (a discrete distribution with a finite support). In such cases all relevant probabilistic information is contained in the probabilities $p_i := P\{Z = z_i\}$. In that case $E[Z] = \sum_i p_i z_i$.
is well defined, i.e., for every$^{18}$ $x \in \mathbb{R}^n$ the function $F(x, \cdot)$ is measurable, and either $E[(F(x, \omega))_+] < +\infty$ or $E[(-F(x, \omega))_+] < +\infty$. The (effective) feasible set of problem (1.4) is given by $X \cap (\operatorname{dom} f)$, where
$$\operatorname{dom} f := \{ x \in \mathbb{R}^n : f(x) < +\infty \}$$
denotes the domain of $f$. It is said that $f$ is proper if $f(x) > -\infty$ for all $x \in \mathbb{R}^n$ and $\operatorname{dom} f \ne \emptyset$.
From the theoretical point of view it is convenient to incorporate the constraints ‘‘$x \in X$’’ into the objective function. That is, for any $\omega \in \Omega$ define
$$\bar{F}(x, \omega) := F(x, \omega) \;\text{ if } x \in X, \qquad \bar{F}(x, \omega) := +\infty \;\text{ if } x \notin X.$$
18 Since we are interested here in $x$ belonging to the feasible set $X$, we can assume that $f(x)$ is well defined for $x \in X$.
If the problem is infeasible (that is, $f(x) = +\infty$ for every $x \in X$), then any $x^* \in X$ is $\varepsilon$-optimal. If the problem is feasible, and hence $\inf_{x \in X} f(x) < +\infty$, then $\varepsilon$-optimality of $x^*$ implies that $f(x^*) < +\infty$, i.e., that $x^* \in \operatorname{dom} f$. Note that by the nature of the minimization process, if $\inf_{x \in X} f(x) > -\infty$, then for any $\varepsilon > 0$ there always exists an $\varepsilon$-optimal solution.
An extended real valued function $f : \mathbb{R}^n \to \bar{\mathbb{R}}$ is called lower semicontinuous (lsc) at a point $x_0$ if
$$f(x_0) \le \liminf_{x \to x_0} f(x).$$
It is said to be lower semicontinuous if it is lsc at every point of $\mathbb{R}^n$, which is equivalent to saying that its epigraph
$$\operatorname{epi} f := \{ (x, \alpha) \in \mathbb{R}^n \times \mathbb{R} : f(x) \le \alpha \}$$
is a closed subset of $\mathbb{R}^n \times \mathbb{R}$.
Proposition 14. Suppose that: (i) for $P$-almost every $\omega \in \Omega$ the function $F(\cdot, \omega)$ is lsc at $x_0$, (ii) $F(x, \cdot)$ is measurable for every $x$ in a neighborhood of $x_0$, (iii) there exists a $P$-integrable function $Z(\omega)$ such that $F(x, \omega) \ge Z(\omega)$ for $P$-almost all $\omega \in \Omega$ and all $x$ in a neighborhood of $x_0$. Then for all $x$ in a neighborhood of $x_0$ the expected value function $f(x) := E[F(x, \omega)]$ is well defined and lsc at $x_0$.
Proof. It follows from assumptions (ii) and (iii) that $f(\cdot)$ is well defined in a neighborhood of $x_0$. Under assumption (iii), it follows by Fatou's lemma that
$$\liminf_{x \to x_0} \int_\Omega F(x, \omega)\, dP(\omega) \ge \int_\Omega \liminf_{x \to x_0} F(x, \omega)\, dP(\omega). \qquad (5.3)$$
Suppose further that for $P$-almost every $\omega \in \Omega$ the functions $G_i(\cdot, \omega)$ are lsc, and for all $x$ the functions $G_i(x, \cdot)$ are measurable. Then the functions $1_{(0, +\infty)}(G_i(\cdot, \omega))$ are also lsc for $P$-almost every $\omega \in \Omega$, and clearly are bounded. Consequently we obtain by Proposition 14 that the corresponding expected value functions on the left hand side of (5.4) are lsc. It follows that constraints (5.4), and hence the probabilistic constraints (1.20), define a closed subset of $\mathbb{R}^n$.
We often have to deal with optimal value functions of min or max type. That is, consider an extended real valued function $h : \mathbb{R}^n \times \mathbb{R}^m \to \bar{\mathbb{R}}$ and the associated functions
$$\phi(x) := \sup_{y \in \mathbb{R}^m} h(x, y) \quad \text{and} \quad \vartheta(x) := \inf_{y \in \mathbb{R}^m} h(x, y).$$
Proposition 15. The following holds. (i) Suppose that for every $y \in \mathbb{R}^m$ the function $h(\cdot, y)$ is lsc. Then the max-function $\phi(x)$ is lsc. (ii) Suppose that the function $h(\cdot, \cdot)$ is lsc and there exists a bounded set $S \subset \mathbb{R}^m$ such that $\operatorname{dom} h(x, \cdot) \subset S$ for all $x \in \mathbb{R}^n$. Then the min-function $\vartheta(x)$ is lsc.
Proof. (i) The epigraph of the max-function $\phi(\cdot)$ is given by the intersection of the epigraphs of $h(\cdot, y)$, $y \in \mathbb{R}^m$. By lower semicontinuity of $h(\cdot, y)$, these epigraphs are closed, and hence their intersection is closed. It follows that $\phi(\cdot)$ is lsc.
(ii) Consider a point $x_0 \in \mathbb{R}^n$ and let $\{x_k\}$ be a sequence converging to $x_0$ along which $\liminf_{x \to x_0} \vartheta(x)$ is attained. If $\lim_{k \to \infty} \vartheta(x_k) = +\infty$, then clearly $\lim_{k \to \infty} \vartheta(x_k) \ge \vartheta(x_0)$, and hence $\vartheta$ is lsc at $x_0$. Therefore, we can assume that $\vartheta(x_k) < +\infty$ for all $k$. Let $\varepsilon$ be a given positive number and $y_k \in \mathbb{R}^m$ be such that $h(x_k, y_k) \le \vartheta(x_k) + \varepsilon$. Since $y_k \in \operatorname{dom} h(x_k, \cdot) \subset S$ and $S$ is bounded, by passing to a subsequence if necessary we can assume that $y_k$ converges to a point $y_0$. By lower semicontinuity of $h(\cdot, \cdot)$ we then have $\lim_{k \to \infty} \vartheta(x_k) \ge h(x_0, y_0) - \varepsilon \ge \vartheta(x_0) - \varepsilon$. Since $\varepsilon$ was arbitrary, it follows that $\lim_{k \to \infty} \vartheta(x_k) \ge \vartheta(x_0)$, and hence $\vartheta(\cdot)$ is lsc at $x_0$. This completes the proof. □
of $G$ is an $\mathcal{F}$-measurable subset of $\Omega$. It is said that a mapping $g : \operatorname{dom} G \to \mathbb{R}^n$ is a selection of $G$ if $g(\omega) \in G(\omega)$ for all $\omega \in \operatorname{dom} G$. If, in addition, the mapping $g$ is measurable, it is said that $g$ is a measurable selection of $G$.
Definition 17. It is said that the function $(x, \omega) \mapsto F(x, \omega)$ is random lower semicontinuous if the associated epigraphical multifunction $\omega \mapsto \operatorname{epi} F(\cdot, \omega)$ is closed valued and measurable.
Note that closed valuedness of the epigraphical multifunction means that for every $\omega \in \Omega$, the epigraph $\operatorname{epi} F(\cdot, \omega)$ is a closed subset of $\mathbb{R}^n \times \mathbb{R}$, i.e., that $F(\cdot, \omega)$ is lsc.
Theorem 18. Suppose that the sigma algebra $\mathcal{F}$ is $P$-complete. Then an extended real valued function $F : \mathbb{R}^n \times \Omega \to \bar{\mathbb{R}}$ is random lsc iff the following two properties hold: (i) for every $\omega \in \Omega$, the function $F(\cdot, \omega)$ is lsc, (ii) the function $F(\cdot, \cdot)$ is measurable with respect to the sigma algebra of $\mathbb{R}^n \times \Omega$ given by the product of the sigma algebras $\mathcal{B}$ and $\mathcal{F}$.
A large class of random lower semicontinuous functions is given by the so-called Carathéodory functions, i.e., real valued functions $F : \mathbb{R}^n \times \Omega \to \mathbb{R}$ such that $F(x, \cdot)$ is $\mathcal{F}$-measurable for every $x \in \mathbb{R}^n$ and $F(\cdot, \omega)$ is continuous for a.e. $\omega \in \Omega$.
be the associated optimal value function. Suppose that there exists a bounded set $S \subset \mathbb{R}^m$ such that $\operatorname{dom} F(x, \cdot, \omega) \subset S$ for all $(x, \omega) \in \mathbb{R}^n \times \Omega$. Then the optimal value function $\vartheta(x, \omega)$ is random lsc.
Let us finally observe that the above framework of random lsc functions is aimed at minimization problems. Of course, the problem of maximization of $E[F(x, \omega)]$ is equivalent to minimization of $E[-F(x, \omega)]$. Therefore, for maximization problems one would need the comparable concept of random upper semicontinuous functions.
6 Bibliographic notes
There are many good textbooks on probability and measure theory, e.g.,
Billingsley (1995), to which we refer for a thorough discussion of such basic
concepts as random variables, probability space, etc. Also a proof of Fatou’s
lemma, used in the proof of Proposition 14, can be found there. For an
additional discussion of the expected value function see section ‘‘Expectation
Functions’’ of Chapter 2. Continuity and differentiability properties of the
optimal value functions, of the form defined in equation (5.5), were studied
extensively in the optimization literature (see, e.g., Bonnans and Shapiro
(2000) and the references therein).
The measurable selection theorem (Theorem 16) is due to Castaing. A thorough discussion of measurable mappings and selections can be found in Castaing and Valadier (1977), Ioffe and Tihomirov (1979) and Rockafellar and Wets (1998). Random lower semicontinuous functions are called normal integrands by some authors (see Definition 14.27 in Rockafellar and Wets (1998)). Proofs of Theorems 18, 19 and 20 can be found in the section on normal integrands of Rockafellar and Wets (1998).
References
Beale, E.M.L. (1955). On minimizing a convex function subject to linear inequalities. Journal of the
Royal Statistical Society Series B 17, 173–184.
Beale, E.M.L., J.J.H. Forrest, C.J. Taylor (1980). Multi-time-period stochastic programming, in:
M.A.H. Dempster (ed.), Stochastic Programming, Academic Press, New York, pp. 387–402.
Billingsley, P. (1995). Probability and Measure, John Wiley & Sons, New York.
Birge, J.R. (1982). The value of the stochastic solution in stochastic linear programs with fixed
recourse. Mathematical Programming 33, 314–325.
Birge, J.R. (1985). Decomposition and partitioning methods for multistage stochastic linear programs.
Operations Research, 989–1007.
Birge, J.R., F.V. Louveaux (1997). Introduction to Stochastic Programming, Springer-Verlag,
New York.
Bonnans, J.F., A. Shapiro (2000). Perturbation Analysis of Optimization Problems, Springer-Verlag,
New York, NY.
Castaing, C., M. Valadier (1977). Convex Analysis and Measurable Multifunctions, Lecture Notes
in Mathematics, Vol. 580, Springer-Verlag, Berlin.
Charnes, A., W.W. Cooper, G.H. Symonds (1958). Cost horizons and certainty equivalents: an
approach to stochastic programming of heating oil. Management Science 4, 235–263.
Dantzig, G.B. (1955). Linear programming under uncertainty. Management Science 1, 197–206.
Dempster, M.A.H. (1981). The expected value of perfect information in the optimal evolution of
stochastic systems, in: M. Arato, D. Vermes, A.V. Balakrishnan (eds.), Stochastic Differential
Systems, Lecture Notes in Control and Information Systems, Vol. 36. Springer-Verlag, Berlin,
pp. 25–41.
Dowd, K. (1997). Beyond Value at Risk. The Science of Risk Management, Wiley, New York.
Dupačová, J. (1980). On minimax decision rule in stochastic linear programming, in: A. Prékopa (ed.), Studies on Mathematical Programming, Akadémiai Kiadó, Budapest, pp. 47–60.
Dupačová, J. (1987). The minimax approach to stochastic programming and an illustrative
application. Stochastics 20, 73–88.
Ioffe, A.D., V.M. Tihomirov (1979). Theory of Extremal Problems, North-Holland Publishing
Company, Amsterdam.