
A. Ruszczyński and A. Shapiro, Eds., Handbooks in OR & MS, Vol. 10
© 2003 Elsevier Science B.V. All rights reserved.

Chapter 1

Stochastic Programming Models

Andrzej Ruszczyński
Department of Management Science and Information Systems, Rutgers University,
94 Rockefeller Rd, Piscataway, NJ 08854, USA

Alexander Shapiro
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta,
GA 30332, USA

Abstract

In this introductory chapter we discuss some basic approaches to the modeling of stochastic optimization problems. We start with motivating examples and then proceed to the formulation of linear, and later nonlinear, two-stage stochastic programming problems. We give a functional description of two-stage programs. After that we proceed to a discussion of multistage stochastic programming and its connections with dynamic programming. We end this chapter by introducing robust and min–max approaches to stochastic programming. Finally, in the appendix, we introduce and briefly discuss some relevant concepts from probability and optimization theories.

Key words: Two-stage stochastic programming, expected value solution, stochastic programming with recourse, nonanticipativity constraints, multistage stochastic programming, dynamic programming, chance constraints, value at risk, scenario tree, robust stochastic programming, mean–risk models.

1 Introduction

1.1 Motivation

Uncertainty is the key ingredient in many decision problems. Financial planning, airline scheduling, and unit commitment in power systems are just a few examples of areas in which ignoring uncertainty may lead to inferior or simply wrong decisions. Often there is a variety of ways in which the uncertainty can be formalized, and over the years various approaches to optimization under uncertainty have been developed. We discuss a particular approach based on probabilistic models of uncertainty. By averaging possible outcomes or considering probabilities of events of interest we can define the objectives and the constraints of the corresponding mathematical programming model.

To formulate a problem in a consistent way, a number of fundamental assumptions need to be made about the nature of the uncertainty, our knowledge of it, and the relation of decisions to the observations made. In order to motivate the main concepts, let us start by discussing the following classical example.

Example 1 (Newsvendor Problem). A newsvendor has to decide about the quantity $x$ of newspapers which he purchases from a distributor at the beginning of a day at the cost of $c$ per unit. He can sell a newspaper at the price $s$ per unit, and unsold newspapers can be returned to the vendor at the price of $r$ per unit. It is assumed that $0 \le r < c < s$. If the demand $D$, i.e., the quantity of newspapers which he is able to sell on a particular day, turns out to be greater than or equal to the order quantity $x$, then he makes the profit $sx - cx = (s-c)x$, while if $D$ is less than $x$, his profit is $sD + r(x-D) - cx = (r-c)x + (s-r)D$. Thus the profit is a function of $x$ and $D$ and is given by

$$F(x,D) = \begin{cases} (s-c)x, & \text{if } x \le D,\\ (r-c)x + (s-r)D, & \text{if } x > D. \end{cases} \tag{1.1}$$

The objective of the newsvendor is to maximize his profit. We assume that the newsvendor is very intelligent (he has a Ph.D. degree in mathematics from a prestigious university and sells newspapers now), so he knows what he is doing. The function $F(\cdot, D)$ is a continuous piecewise linear function with positive slope $s-c$ for $x < D$ and negative slope $r-c$ for $x > D$. Therefore, if the demand $D$ is known, then the best decision is to choose the order quantity $x^* = D$. However, in reality $D$ is not known at the time the order decision has to be made, and consequently the problem becomes more involved. Since the newsvendor has had this job for a while, he has collected data and has quite a good idea about the probability distribution of the demand $D$. That is, the demand $D$ is viewed now as a random variable with a known, or at least well estimated, probability distribution described by the corresponding cumulative distribution function (cdf) $G(w) := P(D \le w)$. Note that since the demand cannot be negative, it follows that $G(w) = 0$ for any $w < 0$. By the Law of Large Numbers, the average profit over a long period of time tends to the expected value

$$E[F(x,D)] = \int_0^\infty F(x,w)\, dG(w).$$

Therefore, from the statistical point of view it makes sense to optimize the objective function on average, i.e., to maximize the expected profit $E[F(x,D)]$. This leads to the following stochastic programming problem¹

$$\max_{x \ge 0} \; \{ f(x) := E[F(x,D)] \}. \tag{1.2}$$

Note that we treat here $x$ as a continuous rather than integer variable. This makes sense if the quantity of newspapers $x$ is reasonably large.
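To make the objective in (1.2) concrete, the following minimal sketch estimates $f(x) = E[F(x,D)]$ by Monte Carlo and maximizes it over a grid. The price data and the exponential demand distribution are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
s, c, r = 1.5, 1.0, 0.2                 # selling price, cost, return price: 0 <= r < c < s
D = rng.exponential(scale=100.0, size=100_000)   # sampled demand

def profit(x, D):
    # F(x, D) from (1.1): (s-c)x if x <= D, else (r-c)x + (s-r)D
    return np.where(x <= D, (s - c) * x, (r - c) * x + (s - r) * D)

# Estimate f(x) = E[F(x, D)] on a grid of order quantities and pick the best.
grid = np.linspace(0.0, 400.0, 401)
f_hat = np.array([profit(x, D).mean() for x in grid])
print(grid[f_hat.argmax()], f_hat.max())
```

For data like this, the grid maximizer agrees closely with the closed-form solution (1.5) derived below.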
In the present case it is not difficult to solve the above optimization problem in closed form. Let us observe that for any $D \ge 0$, the function $F(\cdot, D)$ is concave (and piecewise linear). Therefore, the expected value function $f(\cdot)$ is also concave. Suppose for a moment that $G(\cdot)$ is continuous at a point $x > 0$. Then

$$f(x) = \int_0^x [(r-c)x + (s-r)w]\, dG(w) + \int_x^\infty (s-c)x\, dG(w).$$

Using integration by parts it is possible to calculate then that

$$f(x) = (s-c)x - (s-r)\int_0^x G(w)\, dw. \tag{1.3}$$

The function $f(\cdot)$ is concave, and hence continuous, and therefore formula (1.3) holds even if $G(\cdot)$ is discontinuous at $x$. It follows that $f(\cdot)$ is differentiable at $x$ iff (that is, if and only if) $G(\cdot)$ is continuous at $x$, in which case

$$f'(x) = s - c - (s-r)G(x). \tag{1.4}$$

Consider the inverse function² $G^{-1}(\alpha) := \min\{x : G(x) \ge \alpha\}$ of the cdf $G$, which is defined for $\alpha \in (0,1)$. Since $f(\cdot)$ is concave, a necessary and sufficient condition for $x^* > 0$ to be an optimal solution of problem (1.2) is that $f'(x^*) = 0$, provided that $f(\cdot)$ is differentiable at $x^*$. Note that because $r < c < s$, it follows that $0 < (s-c)/(s-r) < 1$. Consequently, an optimal solution of (1.2) is given by

$$x^* = G^{-1}\left(\frac{s-c}{s-r}\right). \tag{1.5}$$

This holds even if $G(\cdot)$ is discontinuous at $x^*$. It is interesting to note that $G(0)$ is equal to the probability that the demand $D$ is zero, and hence if this probability is positive and $(s-c)/(s-r) \le G(0)$, then the optimal solution is $x^* = 0$.

¹ The notation "$:=$" means equal by definition.
² Recall that $G^{-1}(\alpha)$ is called the $\alpha$-quantile of the cdf $G$.
Clearly, the above approach explicitly depends on knowledge of the probability distribution of the demand $D$. In practice the corresponding cdf $G(\cdot)$ is never known exactly and can be approximated (estimated) at best. In the present case the optimal solution is given in closed form, and therefore its dependence on $G(\cdot)$ can be easily evaluated. It is well known that $\alpha$-quantiles are robust (stable) with respect to small perturbations of the corresponding cdf $G(\cdot)$, provided that $\alpha$ is not too close to 0 or 1. In general, it is important to investigate the sensitivity of the considered stochastic programming problem with respect to the assumed probability distributions.
The following deterministic optimization approach is also often used for decision making under uncertainty. The random variable $D$ is replaced by its mean $\mu = E[D]$, and then the following deterministic optimization problem is solved:

$$\max_{x \ge 0} F(x, \mu). \tag{1.6}$$

A resulting optimal solution $\bar{x}$ is sometimes called the expected value solution. In the present example, the optimal solution of this deterministic optimization problem is $\bar{x} = \mu$. Note that the mean solution $\bar{x}$ can be very different from the solution $x^*$ given in (1.5). It is well known that quantiles are much more stable to variations of the cdf $G$ than the corresponding mean value. Therefore, the optimal solution $x^*$ of the stochastic optimization problem is more robust with respect to variations of the probability distributions than an optimal solution $\bar{x}$ of the corresponding deterministic optimization problem. This should not be surprising, since the deterministic problem (1.6) can be formulated in the framework of the stochastic programming problem (1.2) by considering the trivial distribution of $D$ being identically equal to $\mu$.
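The robustness comparison can be illustrated numerically. The sketch below, under illustrative distributional assumptions, recomputes the quantile solution (1.5) and the expected value solution $\bar{x}$ after a perturbation of the demand distribution; the quantile moves much less than the mean when the perturbation sits in the tail.

```python
import numpy as np

rng = np.random.default_rng(1)
s, c, r = 1.5, 1.0, 0.2
ratio = (s - c) / (s - r)                # critical ratio in (1.5)

base = rng.exponential(100.0, size=200_000)
# Perturbation: inflate the largest 1% of demands tenfold (a heavy-tail error).
perturbed = base.copy()
tail = perturbed > np.quantile(perturbed, 0.99)
perturbed[tail] *= 10.0

for name, sample in [("base", base), ("perturbed", perturbed)]:
    x_star = np.quantile(sample, ratio)   # empirical G^{-1}((s-c)/(s-r))
    x_bar = sample.mean()                 # expected value solution (1.6)
    print(f"{name}: x* = {x_star:.1f}, mean solution = {x_bar:.1f}")
```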
For any $x$, $F(x,D)$ is concave in $D$. Therefore the following Jensen's inequality holds:

$$F(x,\mu) \ge E[F(x,D)].$$

Hence

$$\max_{x \ge 0} F(x,\mu) \ge \max_{x \ge 0} E[F(x,D)].$$

Thus the optimal value of the deterministic optimization problem is biased upward relative to the optimal value of the stochastic optimization problem. This should also not be surprising, since the optimization problem (1.6) is "too optimistic" in the sense that it does not take into account the possible variability of the demand $D$.
Another point worth mentioning is that by solving (1.2) the newsvendor tries to optimize the profit on average. However, for a particular realization of the demand $D$, on a particular day, the profit $F(x^*, D)$ could be very different from the corresponding expected value $f(x^*)$. This may happen if $F(x^*, D)$, considered as a random variable, has a large variability, which can be measured by its variance $\mathrm{Var}[F(x^*, D)]$. Therefore, if the newsvendor wants to hedge against such variability he may consider the following optimization problem

$$\max_{x \ge 0} \; \{ f_\lambda(x) := E[F(x,D)] - \lambda \mathrm{Var}[F(x,D)] \}. \tag{1.7}$$

The coefficient $\lambda \ge 0$ represents the weight given to the conservative part of the decision. If $\lambda$ is "large", then the above optimization problem tries to find a solution with minimal profit variance, while if $\lambda = 0$, then problem (1.7) coincides with problem (1.2). Since

$$\mathrm{Var}[F(x,D)] = E[F(x,D)^2] - (E[F(x,D)])^2,$$

from a mathematical point of view problem (1.7) is similar to the expected value problem (1.2). Note, however, that the additional (variance) term in (1.7) destroys the convexity of the optimization problem (see Section 4 for a further discussion).
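A sample average version of (1.7) can be explored numerically; the weight $\lambda$ and the demand distribution below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
s, c, r, lam = 1.5, 1.0, 0.2, 0.01
D = rng.exponential(100.0, size=100_000)

def f_lambda(x):
    F = np.where(x <= D, (s - c) * x, (r - c) * x + (s - r) * D)
    return F.mean() - lam * F.var()      # E[F] - lambda * Var[F], as in (1.7)

grid = np.linspace(0.0, 400.0, 401)
vals = np.array([f_lambda(x) for x in grid])
print(grid[vals.argmax()])               # typically smaller than the risk-neutral x*
```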
The newsvendor may also be interested in making at least a specified amount of money, $b$, on a particular day. Then it would be reasonable to consider the problem of purchasing the minimum number of newspapers, $x$, under the condition that the probability of making at least $b$ is not less than $1-\alpha$, where $\alpha \in (0,1)$ is fixed. Such a problem can be formulated in the form

$$\min\; x \tag{1.8}$$
$$\text{s.t.}\;\; P\{F(x,D) \ge b\} \ge 1 - \alpha. \tag{1.9}$$

The newsvendor can solve this problem, too (remember that he is really smart). It is clear that the following inequality should be satisfied:

$$(s-c)x \ge b, \tag{1.10}$$

since otherwise there is no way of making $b$. For a fixed $x$ satisfying this condition, the profit $F(x,D)$ is a nondecreasing function of the demand $D$.

Therefore

$$P\{F(x,D) \ge b\} = P\{D \ge d(x,b)\},$$

where (after straightforward calculations)

$$d(x,b) = \frac{b + (c-r)x}{s-r}.$$

It follows from (1.9) that $d(x,b) \le G^{-1}(\alpha)$, which can be written as

$$b + (c-r)x \le (s-r)G^{-1}(\alpha). \tag{1.11}$$

It is clear that a solution can exist iff the constraints (1.10)–(1.11) are consistent, that is, if

$$b \le (s-c)G^{-1}(\alpha). \tag{1.12}$$

Therefore, we obtain that problem (1.8)–(1.9) is feasible iff (1.12) holds, in which case it has the optimal solution

$$\hat{x} = \frac{b}{s-c}. \tag{1.13}$$
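A short sketch can check the feasibility condition (1.12) and report the solution (1.13); the cost data, profit target and empirical demand distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
s, c, r, b, alpha = 1.5, 1.0, 0.2, 20.0, 0.1
D = rng.exponential(100.0, size=200_000)

q_alpha = np.quantile(D, alpha)          # empirical G^{-1}(alpha)
if b <= (s - c) * q_alpha:               # feasibility condition (1.12)
    print("feasible, x_hat =", b / (s - c))   # optimal solution (1.13)
else:
    print("infeasible: the profit target b is too ambitious for this alpha")
```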

1.2 The basic model

Let us formalize optimization problems of the type discussed in the newsvendor example. To this end we use the following notation and terminology. By $\mathcal{X}$ we denote the space of decision variables. In most applications considered in this book $\mathcal{X}$ can be identified with a finite dimensional vector space $\mathbb{R}^n$. It is assumed that there is a given set $X \subset \mathcal{X}$ of feasible (or permissible) decisions and an (objective) function $F(x,\omega)$ of the decision vector $x \in X$ and a random element $\omega$. In an abstract setting we consider $\omega$ as an element of a sample space $\Omega$ equipped with a sigma algebra $\mathcal{F}$. In typical applications considered in this book, the involved random data are formed by a finite number of parameters. Consequently, the objective function is given in the form $F(x,\omega) := V(x, \xi(\omega))$, where $\xi(\omega)$ is a finite dimensional random vector and $V(x,\xi)$ is a function of two vector variables $x$ and $\xi$.
Of course, the mathematical programming problem of minimization (or maximization) of $F(x,\omega)$ subject to $x \in X$ depends on $\omega$ and does not make much sense. For different realizations of the random parameters one would obtain different optimal solutions, without any insight into which one is "better" than the others. A way of dealing with that is to optimize the objective function on average. This leads to the following mathematical programming problem

$$\min_{x \in X} \; \{ f(x) := E[F(x,\omega)] \}. \tag{1.14}$$

The above formulation of a stochastic programming problem assumes implicitly that the expected value is taken with respect to a known probability distribution (measure) $P$ on $(\Omega, \mathcal{F})$ and that the expected value operator

$$E[F(x,\omega)] = \int_\Omega F(x,\omega)\, dP(\omega) \tag{1.15}$$

is well defined. We refer to the function $f(x)$, defined in (1.14), as the expectation or expected value function. Note that we will have to deal with extended real valued functions. That is, the function $F(x,\omega)$ (as well as its expectation) is allowed to take the values $+\infty$ or $-\infty$. The precise meaning of the involved concepts is discussed in the Appendix (Section 5).

1.3 Modeling the constraints

In (1.14) we have assumed that we have an explicit description of the feasible set $X$. For example, the feasible set $X$ can be written in a standard mathematical programming formulation as follows:

$$X := \{x \in X_0 : g_i(x) \le 0,\; i = 1,\ldots,m\}, \tag{1.16}$$

where $X_0$ is a convex subset of $\mathcal{X} := \mathbb{R}^n$ and the $g_i(x)$ are real valued functions.

When the uncertain quantities enter the 'raw' constraints of our background model,

$$G_i(x,\omega) \le 0, \quad i = 1,\ldots,m, \tag{1.17}$$

we need to specify what we mean by 'feasibility'. Some values of $x$ may satisfy (1.17) for some $\omega$ and violate these conditions for other $\omega$. Often it is unrealistic to require that constraints (1.17) should hold for all $\omega \in \Omega$. In our newsvendor example, for instance, the requirement to make at least the profit $b$ can hardly be satisfied for all realizations of the demand $D$.

Several approaches can be used to introduce a meaningful notion of feasibility in this context. One of them is to consider the expected values,

$$g_i(x) := E[G_i(x,\omega)], \quad i = 1,\ldots,m, \tag{1.18}$$

as constraint functions in (1.16).



Expected value constraints usually occur in situations when we have, in fact, several objectives, and we put some of them into the constraints, as in the example below.

Example 2 (Reservoir Capacity). Consider the system of two reservoirs (Fig. 1), whose objective is to retain the flood in the protected area. The flood is produced by two random inflows, $\xi_1$ and $\xi_2$. Flood danger occurs once a year, say, and $\xi_1, \xi_2$ appear simultaneously. The damage from a flood of size $y \ge 0$ is modeled as a convex nondecreasing function $L(y)$, where $L(0) = 0$. Our objective is to determine the reservoir capacities, $x_1$ and $x_2$, so that the expected damage from the flood is below some specified limit $b$, and the cost of the reservoirs, $f(x_1, x_2)$, is minimized.

The size of the flood is random and is given by the expression

$$y = \max\{0,\; \xi_1 + \xi_2 - x_1 - x_2,\; \xi_2 - x_2\}.$$

Our problem takes on the form

$$\begin{aligned} \min\;& f(x_1, x_2)\\ \text{s.t.}\;& E[L(\max\{0,\; \xi_1 + \xi_2 - x_1 - x_2,\; \xi_2 - x_2\})] \le b,\\ & x_1 \ge 0,\; x_2 \ge 0. \end{aligned} \tag{1.19}$$

It would be an error to replace the random inflows in this problem by their expected values, $\bar\xi_1$ and $\bar\xi_2$. By Jensen's inequality we have

$$L(\max\{0,\; \bar\xi_1 + \bar\xi_2 - x_1 - x_2,\; \bar\xi_2 - x_2\}) \le E[L(\max\{0,\; \xi_1 + \xi_2 - x_1 - x_2,\; \xi_2 - x_2\})],$$

and the difference may be large, even for a linear function $L(\cdot)$. As a result, the expected losses from a flood may be much higher than foreseen by a naive deterministic model.
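The gap in Jensen's inequality is easy to quantify by simulation. The sketch below, with an illustrative linear damage function and gamma-distributed inflows (both assumptions, not from the text), compares the expected damage appearing in (1.19) with the damage evaluated at the mean inflows.

```python
import numpy as np

rng = np.random.default_rng(4)
xi1 = rng.gamma(shape=2.0, scale=5.0, size=500_000)
xi2 = rng.gamma(shape=3.0, scale=4.0, size=500_000)
x1, x2 = 12.0, 14.0                            # candidate reservoir capacities
L = lambda y: 2.0 * y                          # damage function with L(0) = 0

flood = np.maximum.reduce([np.zeros_like(xi1),
                           xi1 + xi2 - x1 - x2,
                           xi2 - x2])
flood_at_means = max(0.0, xi1.mean() + xi2.mean() - x1 - x2, xi2.mean() - x2)
print("E[L(flood)]       ~", L(flood).mean())      # what (1.19) constrains
print("L(flood at means) =", L(flood_at_means))    # smaller, by Jensen
```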
Fig. 1. The water reservoir system.

Another way to define the feasible set is to use constraints on the probability of satisfying (1.17):

$$P\{G_i(x,\omega) \le 0\} \ge 1 - \alpha, \quad i = 1,\ldots,m, \tag{1.20}$$

with some fixed $\alpha \in (0,1)$ (as in our newsvendor example). Such constraints are called probabilistic or chance constraints.³
For a set $A$ we denote by $1_A(\cdot)$ its characteristic function,

$$1_A(t) := \begin{cases} 1, & \text{if } t \in A,\\ 0, & \text{if } t \notin A. \end{cases} \tag{1.21}$$

Then (1.20) can be written as the expected value constraints

$$E[1_{(-\infty,0]}(G_i(x,\omega))] \ge 1 - \alpha, \quad i = 1,\ldots,m. \tag{1.22}$$

Note, however, that the discontinuity of the characteristic function makes such constraints very specific and different from the 'standard' expected value constraints.

The following is an example where probabilistic constraints appear in a natural way.

Example 3 (Value at Risk). Suppose that there are $n$ investment opportunities, with random returns $R_1,\ldots,R_n$ in the next year. We have a practically unlimited initial capital and our aim is to invest some of it in such a way that the expected value of our investment after a year is maximized, under the condition that the chance of losing no more than some fixed amount $b > 0$ is at least $1-\alpha$, where $\alpha \in (0,1)$. Such a requirement is called the Value at Risk constraint.

Let $x_1,\ldots,x_n$ be the amounts invested in the $n$ opportunities. The net increase of the value of our investment after a year is $G(x,R) = \sum_{i=1}^n R_i x_i$. Our problem takes on the form of a probabilistically constrained stochastic program:

$$\begin{aligned} \max\;& \sum_{i=1}^n \mu_i x_i\\ \text{s.t.}\;& P\left\{\sum_{i=1}^n R_i x_i \ge -b\right\} \ge 1 - \alpha,\\ & x \ge 0, \end{aligned} \tag{1.23}$$

where $\mu_i = E[R_i]$. Note that for the sake of simplicity we do not impose here the constraint $x_1 + \cdots + x_n = W_0$, where $W_0$ is the total invested amount, as compared with the example of financial planning (Example 7) discussed later.

³ In the extreme case when $\alpha = 0$, conditions (1.20) mean that the constraints $G_i(x,\omega) \le 0$, $i = 1,\ldots,m$, should hold for a.e. $\omega \in \Omega$.

If the returns have a joint normal distribution with covariance matrix $\Sigma$, the distribution of the profit (or loss) is normal, too, with expected value $\mu^T x$ and variance $x^T \Sigma x$. Consequently, $(G(x,R) - \mu^T x)/\sqrt{x^T \Sigma x}$ has the standard normal distribution (i.e., normal distribution with mean zero and variance one). Our probabilistic constraint is therefore equivalent to the inequality

$$\frac{b + \mu^T x}{\sqrt{x^T \Sigma x}} \ge z_\alpha,$$

where $z_\alpha$ is the $(1-\alpha)$-quantile of the standard normal variable. If $\alpha \le 1/2$ then $z_\alpha \ge 0$. After elementary manipulations we obtain the following convex programming equivalent of problem (1.23):

$$\begin{aligned} \max\;& \mu^T x\\ \text{s.t.}\;& z_\alpha \sqrt{x^T \Sigma x} - \mu^T x \le b,\\ & x \ge 0. \end{aligned} \tag{1.24}$$

If we ignore the nonnegativity constraint on $x$ we can solve this problem analytically. Indeed, $x = 0$ is a feasible solution and both functions are positively homogeneous in $x$, so either the probabilistic constraint has to be satisfied as an equality or the problem is unbounded. Let $\lambda \ge 0$ be the Lagrange multiplier associated with this constraint. We obtain the equation

$$(1+\lambda)\mu - \frac{\lambda z_\alpha \Sigma x}{\sqrt{x^T \Sigma x}} = 0.$$

From here we deduce that there must exist a scalar $t$ such that $x = t\Sigma^{-1}\mu$. We assume that the matrix $\Sigma$ is nonsingular and $\mu \ne 0$. Substitution into the constraint yields (after simple calculations) $t = b/(\kappa(z_\alpha - \kappa))$ and $\lambda = (z_\alpha/\kappa - 1)^{-1}$, with $\kappa := \sqrt{\mu^T \Sigma^{-1}\mu}$ (note that $\Sigma^{-1}$ is positive definite and hence $\mu^T \Sigma^{-1}\mu$ is positive). If $\kappa \ge z_\alpha$, then the problem is unbounded, i.e., its optimal value is $+\infty$. If $\kappa < z_\alpha$, then the vector

$$\hat{x} := \frac{b}{\kappa(z_\alpha - \kappa)}\, \Sigma^{-1}\mu$$

is the solution to the problem without sign restrictions on $x$. If, in addition, $\Sigma^{-1}\mu \ge 0$, then the vector $\hat{x}$ solves our original problem. Otherwise, numerical methods of convex programming are needed to find the optimal solution.
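The closed-form candidate derived above is straightforward to evaluate; in the sketch below the data $\mu$, $\Sigma$, $b$, $\alpha$ are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

mu = np.array([0.06, 0.09])
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
b, alpha = 10.0, 0.05
z = norm.ppf(1.0 - alpha)              # z_alpha, the (1-alpha)-quantile

Sinv_mu = np.linalg.solve(Sigma, mu)   # Sigma^{-1} mu
kappa = np.sqrt(mu @ Sinv_mu)          # kappa = sqrt(mu^T Sigma^{-1} mu)
if kappa >= z:
    print("problem unbounded")
else:
    x_hat = b / (kappa * (z - kappa)) * Sinv_mu
    if np.all(x_hat >= 0):
        print("optimal portfolio:", x_hat)
    else:
        print("sign constraints active; a convex programming solver is needed")
```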

In many practical situations, though, the returns are not jointly normally distributed, and even a single Value at Risk constraint, like the one analyzed here, may create significant difficulties.

Let us now assume that our planning horizon is $T$ years, and let $R_1(t),\ldots,R_n(t)$ be the random investment returns in years $t = 1,\ldots,T$. We want to maximize the expected value of our investment after $T$ years, under the condition that with probability at least $1-\alpha$ the value of our investment will never drop by more than $b$ from the initial amount invested. We do not want to re-allocate our investment; we just want to invest once and then watch our wealth grow (hopefully).
Let $x_1,\ldots,x_n$ be the amounts invested in the $n$ opportunities. The net change in the value of our investment in year $t$ is

$$G(x,R,t) = \sum_{i=1}^n S_i(t)\, x_i,$$

where $S_i(t) := \prod_{\tau=1}^t (1 + R_i(\tau)) - 1$ is the compounded return of investment $i$ up to year $t$. Denoting $\mu_i := E[S_i(T)]$, our problem takes on the form:

$$\begin{aligned} \max_{x \ge 0}\;& \sum_{i=1}^n \mu_i x_i\\ \text{s.t.}\;& P\{G(x,R,t) \ge -b,\; t = 1,\ldots,T\} \ge 1 - \alpha. \end{aligned} \tag{1.25}$$

This is an example of a problem with a joint probabilistic constraint, which is different from imposing the individual constraints $P\{G(x,R,t) \ge -b\} \ge 1-\alpha$, $t = 1,\ldots,T$, requiring that for each year the probability of losing no more than $b$ is $1-\alpha$ or higher. A joint probabilistic constraint can be formally treated as a constraint on one function, defined as the worst case among the individual constraints. In our example we may define $G(x,R) := \min_{1\le t\le T} G(x,R,t)$ and require that

$$P\{G(x,R) \ge -b\} \ge 1 - \alpha. \tag{1.26}$$

Such constraints may be difficult to handle, both theoretically and computationally.

2 Two-stage models

2.1 The linear model

We can view the decision problem which the newsvendor faces in Example 1 as a two-stage problem. In the morning, before a realization of the demand $D$ is known, he has to decide about the quantity $x$ of newspapers which he purchases for that day. By the end of the day, when the value of $D$ becomes known, he optimizes his behavior by selling as many newspapers as possible. Although simple, his second stage decision can also be formulated as an optimization problem.
His second stage decision variables can be defined as the quantity $y$ which he sells at price $s$, and the quantity $z$ which he returns at price $r$. Then, given a value of the first stage decision variable $x$ and a realization of the demand $D$, the second stage problem consists of maximizing the profit and can be written as follows:

$$\begin{aligned} \max_{y,z}\;& sy + rz\\ \text{s.t.}\;& y \le D,\; y + z \le x,\; y \ge 0,\; z \ge 0. \end{aligned}$$

The optimal solution of the above problem is $y^* = \min\{x, D\}$, $z^* = \max\{x - D, 0\}$, and its optimal value is the profit $F(x,D)$ defined in (1.1).
This is the basic idea of a two-stage process. At the first stage, before a realization of the corresponding random variables becomes known, one chooses the first stage decision variables to optimize the expected value of an objective function which in turn is the optimal value of the second stage optimization problem. A two-stage stochastic linear program can be written as follows:

$$\begin{aligned} \min_x\;& c^T x + E[Q(x, \xi(\omega))]\\ \text{s.t.}\;& Ax = b,\; x \ge 0, \end{aligned} \tag{2.2}$$

where $Q(x,\xi)$ is the optimal value of the second stage problem

$$\begin{aligned} \min_y\;& q^T y\\ \text{s.t.}\;& Tx + Wy = h,\; y \ge 0. \end{aligned} \tag{2.3}$$

Here $x$ and $y$ are vectors of first and second stage decision variables, respectively. The second stage problem depends on the data $\xi := (q, h, T, W)$, some (or all) elements of which can be random. Therefore we view $\xi = \xi(\omega)$ as a random vector. The expectation in (2.2) is taken with respect to the probability distribution of $\xi(\omega)$, which is supposed to be known. The matrices $T$ and $W$ are called the technology and recourse matrices, respectively. If the matrix $W$ is fixed (not random), the above two-stage problem is called a problem with fixed recourse. In a sense the second stage problem (2.3) can be viewed as a penalty term for violation of the constraint $Tx = h$; hence the name "with recourse".

For any $x$ and $\xi$ the function $Q(x,\xi)$, although not given explicitly, is a well defined extended real valued function: it takes the value $+\infty$ if the feasible set of the second stage problem (2.3) is empty, and the value $-\infty$ if the second stage problem is unbounded from below. As discussed in Section 5.2, it should be verified that the expected value in (2.2) is well defined. It is worthwhile to note at this point that problem (2.2) is a particular case of the stochastic programming problem (1.14) with $F(x,\omega) := c^T x + Q(x, \xi(\omega))$ and $X := \{x : Ax = b,\; x \ge 0\}$.
By the definition of the function $Q(x,\xi)$ we have that it can be written in the form $Q(x,\xi) = \mathcal{Q}(h - Tx)$, where

$$\mathcal{Q}(\chi) := \inf\{q^T y : Wy = \chi,\; y \ge 0\}. \tag{2.4}$$

By the duality theory of linear programming, the optimal value $\mathcal{Q}(\chi)$ of the linear program on the right hand side of (2.4) is equal to $\sup\{\pi^T \chi : W^T \pi \le q\}$, unless both systems $Wy = \chi,\; y \ge 0$ and $W^T \pi \le q$ are infeasible. Consequently,

$$Q(x,\xi) = \sup\{\pi^T(h - Tx) : W^T \pi \le q\}. \tag{2.5}$$

The feasible set $\{\pi : W^T \pi \le q\}$ of the dual problem is convex polyhedral. Therefore, for any realization of the random data $\xi$, the function $Q(\cdot,\xi)$ is convex piecewise linear. Chapter "Optimality and Duality in Stochastic Programming" of this book provides a detailed analysis of the properties of $Q(\cdot,\xi)$ and of its expected value.

2.2 The case of discrete distributions

There are equivalent formulations of the two-stage linear recourse problem (2.2)–(2.3) which are useful in different situations. In order to simplify the presentation and to defer technical details, let us assume now that the random data have a discrete distribution with a finite number $K$ of possible realizations $\xi_k = (q_k, h_k, T_k, W_k)$, called scenarios, with corresponding probabilities $p_k$. In that case $E[Q(x,\xi)] = \sum_{k=1}^K p_k Q(x,\xi_k)$, where

$$Q(x,\xi_k) = \inf\{q_k^T y_k : T_k x + W_k y_k = h_k,\; y_k \ge 0\}. \tag{2.6}$$

Consequently, we can write (2.2)–(2.3) in the form

$$\begin{aligned} \min_{x,\, y_1,\ldots,y_K}\;& c^T x + \sum_{k=1}^K p_k q_k^T y_k\\ \text{s.t.}\;& Ax = b,\\ & T_k x + W_k y_k = h_k, \quad k = 1,\ldots,K,\\ & x \ge 0,\; y_k \ge 0, \quad k = 1,\ldots,K. \end{aligned} \tag{2.7}$$

That is, the two-stage problem can be formulated as one large linear programming problem.
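For small scenario sets, a deterministic equivalent in the spirit of (2.7) can be assembled and passed to an off-the-shelf LP solver. The toy instance below (capacity bought now at unit cost $c$, emergency capacity bought after demand is revealed at unit cost $q$) is an illustrative assumption, and for brevity the recourse constraints are written as inequalities rather than the equalities of (2.7).

```python
import numpy as np
from scipy.optimize import linprog

p = np.array([0.3, 0.5, 0.2])          # scenario probabilities p_k
d = np.array([30.0, 50.0, 80.0])       # scenario demands h_k
c, q = 1.0, 2.5                        # first and second stage unit costs
K = len(p)

# Variables: [x, y_1, ..., y_K]; minimize c*x + sum_k p_k * q * y_k.
cost = np.concatenate(([c], p * q))
# Constraints x + y_k >= d_k, written as -x - y_k <= -d_k.
A_ub = np.zeros((K, 1 + K))
A_ub[:, 0] = -1.0
A_ub[np.arange(K), 1 + np.arange(K)] = -1.0
res = linprog(cost, A_ub=A_ub, b_ub=-d, bounds=[(0, None)] * (1 + K))
print(res.x[0], res.fun)               # first stage capacity and optimal cost
```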

Example 4 (Capacity Expansion). Consider a directed graph with node set $\mathcal{N}$ and arc set $\mathcal{A}$. With each arc $a \in \mathcal{A}$ we associate a decision variable $x_a$ and call it the capacity of $a$. There is a cost $c_a$ for each unit of capacity of arc $a$. For each pair of nodes $(m,n) \in \mathcal{N}\times\mathcal{N}$ we have a random demand $D^{mn}$ for shipments from $m$ to $n$. These shipments have to be sent through the network and they can be arbitrarily split into pieces taking different paths. We denote by $y_a^{mn}$ the amount of the shipment from $m$ to $n$ sent through arc $a$. There is a unit cost $q_a$ for shipments on each arc $a$.

Our objective is to assign arc capacities and to organize shipments in such a way that the expected total cost, comprising the capacity cost and the shipping cost, is minimized. The condition is that the capacities have to be assigned before the actual demands $D^{mn}$ become known, while the shipments can be arranged after that.

We recognize in this model a linear two-stage stochastic programming model with first stage variables $x_a$, $a \in \mathcal{A}$, and second stage variables $y_a^{mn}$, $a \in \mathcal{A}$, $(m,n) \in \mathcal{N}\times\mathcal{N}$.
Let us define the second stage problem. For each node $i$, denote by $\mathcal{A}_+(i)$ and $\mathcal{A}_-(i)$ the sets of arcs entering and leaving node $i$. The second stage problem is the multicommodity network flow problem

$$\begin{aligned} \min\;& \sum_{m,n \in \mathcal{N}} \sum_{a \in \mathcal{A}} q_a y_a^{mn}\\ \text{s.t.}\;& \sum_{a \in \mathcal{A}_+(i)} y_a^{mn} - \sum_{a \in \mathcal{A}_-(i)} y_a^{mn} = \begin{cases} -D^{mn}, & \text{if } i = m,\\ D^{mn}, & \text{if } i = n,\\ 0, & \text{otherwise,} \end{cases}\\ & \sum_{m,n \in \mathcal{N}} y_a^{mn} \le x_a, \quad a \in \mathcal{A},\\ & y_a^{mn} \ge 0, \quad a \in \mathcal{A},\; m,n \in \mathcal{N}. \end{aligned} \tag{2.8}$$

This problem depends on the random demand vector $D$ and on the arc capacities $x$. Its optimal value will be denoted $Q(x,D)$. The first stage problem has the form

$$\min_{x \ge 0} \; \sum_{a \in \mathcal{A}} c_a x_a + E[Q(x,D)].$$

In this example only some right hand side entries in the second stage constraints are random. All the matrices and cost vectors are deterministic. Nevertheless, the size of this problem, even for discrete distributions of the demands, may be enormous. If the number of nodes is $\nu$, the demand vector has $\nu(\nu-1)$ components. If they are independent, and each of them has $r$ possible realizations, we have to deal with $K = r^{\nu(\nu-1)}$ scenarios. For each of them the second stage vector has $\nu(\nu-1)|\mathcal{A}|$ components and there are $\nu^2(\nu-1) + |\mathcal{A}|$ constraints (excluding nonnegativity constraints). As a result, the large scale linear programming formulation has $|\mathcal{A}| + \nu(\nu-1)|\mathcal{A}|\, r^{\nu(\nu-1)}$ variables and $(\nu^2(\nu-1) + |\mathcal{A}|)\, r^{\nu(\nu-1)}$ constraints. These are large numbers, even for moderately sized networks and distributions with only a few possibilities.
A more complex situation occurs when the arcs are subject to failures and they may lose random fractions $\theta_a$ of their capacities. Then the capacity constraint in the second stage problem has a slightly different form:

$$\sum_{m,n \in \mathcal{N}} y_a^{mn} \le (1 - \theta_a)x_a, \quad a \in \mathcal{A},$$

and we have a two-stage problem with a random 'technology' matrix. Its complexity, of course, is even higher than before.

2.3 Scenario formulation and nonanticipativity

Let us relax problem (2.7) by replacing the first stage decision vector $x$ by $K$ possibly different vectors $x_k$. We obtain the problem

$$\begin{aligned} \min_{\substack{x_1,\ldots,x_K\\ y_1,\ldots,y_K}}\;& \sum_{k=1}^K p_k (c^T x_k + q_k^T y_k)\\ \text{s.t.}\;& Ax_k = b,\\ & T_k x_k + W_k y_k = h_k,\\ & x_k \ge 0,\; y_k \ge 0, \quad k = 1,\ldots,K. \end{aligned} \tag{2.9}$$

Problem (2.9) is separable in the sense that it can be split into $K$ smaller problems, one for each scenario, and therefore it is much easier to solve numerically. However, (2.9) is not suitable for modeling a two-stage process, because the first stage decision variables $x_k$ in (2.9) are now allowed to depend on a realization of the random data at the second stage. This can be fixed by introducing the additional constraints

$$x_k = x_j \quad \text{for all } 1 \le k < j \le K. \tag{2.10}$$

Together with the additional constraints (2.10), problem (2.9) becomes equivalent to (2.7).

Constraints (2.10) are called nonanticipativity constraints. They ensure that the first stage decision variables do not depend on the second stage realization of the random data. Such nonanticipativity constraints will be especially important in the multistage modeling which we discuss later.

In fact, some of the constraints in (2.10) are redundant; for example, it is sufficient to require that $x_k = x_{k+1}$ for $k = 1,\ldots,K-1$. There are many other ways to express these conditions, but they all define the same linear subspace of the space of decision variables of (2.9). One way to express the nonanticipativity condition is to require that

$$x_k = \sum_{i=1}^K p_i x_i, \quad k = 1,\ldots,K, \tag{2.11}$$

which is convenient for extensions to the general case.
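Condition (2.11) also suggests a simple computational device: any collection of scenario-dependent first stage decisions can be projected onto the nonanticipativity subspace by replacing each $x_k$ with the probability-weighted average. A minimal sketch, with illustrative data:

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])          # scenario probabilities
x = np.array([[1.0, 0.0],              # x_1, ..., x_K, one row per scenario
              [2.0, 1.0],
              [4.0, 3.0]])

x_bar = p @ x                          # sum_i p_i x_i
x_na = np.tile(x_bar, (len(p), 1))     # enforce x_k = x_bar for all k, as in (2.11)
print(x_na)
```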

2.4 General formulation

As discussed above, the essence of two-stage modeling is that there are two distinct parts of the decision vector. The value of the first part, $x \in X$, with $X \subset \mathbb{R}^n$, has to be chosen before any realization of the unknown quantities, summarized in the data vector $\xi = \xi(\omega)$, is observed. The value of the second part, $y$, can be chosen after the realization of $\xi$ becomes known, and generally depends on the realization of $\xi$ and on the choice of $x$. Consequently, at the first stage one has to solve the expectation optimization problem

$$\min_{x \in X} E[F(x,\omega)]. \tag{2.12}$$

In the case of the two-stage linear problem (2.2),

$$F(x,\omega) := c^T x + Q(x, \xi(\omega)),$$

with $Q(x,\xi)$ being the optimal value of the second stage optimization problem (2.3) (viewed as an extended real valued function). In such a formulation the explicit dependence on the second stage decision variables $y$ is suppressed. It will be convenient to discuss that formulation first.
As in the example of problem (2.9), we may relax the expectation problem (2.12) by allowing the first stage decision variables to depend on the random data, and then correct that by enforcing nonanticipativity constraints. Denote by $\mathcal{M} = \mathcal{M}(\Omega, \mathcal{F}, \mathcal{X})$ the space of measurable mappings⁴ $x(\cdot): \Omega \to \mathcal{X}$ such that the expectation $E[F(x(\omega),\omega)]$ is well defined. Then the relaxed problem can be formulated in the form

$$\min_{x(\cdot) \in \mathcal{M}} E[F(x(\omega), \omega)]. \tag{2.13}$$

⁴ We write here $x(\cdot)$, instead of $x$, in order to emphasize that $x(\cdot)$ is not a vector, but rather a vector valued function of $\omega$.

Denote by

$$\vartheta(\omega) := \inf_{x \in X} F(x,\omega)$$

the optimal value function of problem (2.12). Note that optimization in (2.13) is performed over all mappings $x(\omega)$ in the functional space $\mathcal{M}$. In particular, if $\Omega := \{\omega_1,\ldots,\omega_K\}$ is finite, with respective probabilities $p_1,\ldots,p_K$, then $x(\omega)$ can be identified with $(x_1,\ldots,x_K)$, where $x_k := x(\omega_k)$. In that case problem (2.13) can be written in the form

$$\min_{x_1,\ldots,x_K} \sum_{k=1}^K p_k F(x_k, \omega_k). \tag{2.14}$$

Proposition 5. Suppose that: (i) the function $F(x,\omega)$ is random lower semicontinuous,⁵ (ii) either $E[\vartheta(\omega)_+] < +\infty$ or $E[(-\vartheta(\omega))_+] < +\infty$. Then

$$\inf_{x(\cdot) \in \mathcal{M}} E[F(x(\omega), \omega)] = E\left[\inf_{x \in X} F(x,\omega)\right]. \tag{2.15}$$

Proof. Since $F(x,\omega)$ is random lsc, we have by Theorem 19 that $\vartheta(\omega)$ is measurable. Together with assumption (ii) this implies that the expectation on the right hand side of (2.15) is well defined. For any $x(\cdot) \in \mathcal{M}(\Omega, \mathcal{F}, \mathcal{X})$ we have that $F(x(\omega), \omega) \ge \vartheta(\omega)$ for all $\omega \in \Omega$, and hence the left hand side of (2.15) is always greater than or equal to the right hand side of (2.15). Conversely, if $\vartheta(\omega) > -\infty$ for a.e. $\omega \in \Omega$, then for any given $\varepsilon > 0$ and a.e. $\omega \in \Omega$ there exists an $\varepsilon$-optimal solution $\tilde{x}(\omega)$. Moreover, since $F(x,\omega)$ is random lsc, $\tilde{x}(\omega)$ can be chosen to be measurable, i.e., $\tilde{x} \in \mathcal{M}(\Omega, \mathcal{F}, \mathcal{X})$. It follows that

$$E[F(\tilde{x}(\omega), \omega)] \le E[\vartheta(\omega)] + \varepsilon.$$

Since $\varepsilon$ is an arbitrary positive number, this implies that the left hand side of (2.15) is less than or equal to the right hand side of (2.15). Finally, if the event "$\vartheta(\omega) = -\infty$" happens with positive probability, then both sides of (2.15) are equal to $-\infty$. □

⁵ See Section 5.3 of the Appendix for the definition and discussion of random lower semicontinuous functions.

We also have that problem (2.12) is equivalent to

$$\min_{x(\cdot) \in \mathcal{M}} E[F(x(\omega), \omega)] \tag{2.16}$$
$$\text{s.t.}\;\; x(\omega) = E[x(\omega)], \quad \forall \omega \in \Omega. \tag{2.17}$$

Constraints (2.17) give an extension of constraints (2.11), and represent the nonanticipativity condition.⁶ Since problem (2.13) is a relaxation of (2.16)–(2.17), and because of (2.15), we obtain that

$$\inf_{x \in X} E[F(x,\omega)] \ge E\left[\inf_{x \in X} F(x,\omega)\right]. \tag{2.18}$$

The above inequality also follows directly from the obvious inequality $F(x,\omega) \ge \vartheta(\omega)$ for all $x \in X$ and $\omega \in \Omega$.
Let us now give a formulation where the second stage decision variables appear explicitly:

$$\min_{x \in X} E[V(x, \xi(\omega))], \tag{2.19}$$

where $V(x,\xi)$ is the optimal value of the second stage problem

$$\begin{aligned} \min_{y \in Y}\;& F(x,y,\xi)\\ \text{s.t.}\;& G_i(x,y,\xi) \le 0, \quad i = 1,\ldots,m. \end{aligned} \tag{2.20}$$

Here $X$ is a subset of $\mathbb{R}^{n_1}$, $Y$ is a subset of $\mathbb{R}^{n_2}$, and

$$F: \mathbb{R}^{n_1}\times\mathbb{R}^{n_2}\times\mathbb{R}^d \to \mathbb{R}, \qquad G_i: \mathbb{R}^{n_1}\times\mathbb{R}^{n_2}\times\mathbb{R}^d \to \mathbb{R}, \quad i = 1,\ldots,m,$$

are the objective and the constraint functionals, respectively.


Alternatively, in an abstract form, the above two-stage stochastic programming problem can be formulated as follows:

$$\min_{x,\, y(\cdot) \in \mathcal{Y}} E[F(x, y(\omega), \xi(\omega))] \tag{2.21}$$
$$\text{s.t.}\;\; G_i(x, y(\omega), \xi(\omega)) \le 0, \quad i = 1,\ldots,m, \tag{2.22}$$
$$x \in X, \tag{2.23}$$
$$y(\omega) \in Y, \tag{2.24}$$

⁶ Since the expected value of two random variables which may differ on a set of measure zero is the same, it actually suffices to verify the constraints (2.17) for $P$-almost every $\omega \in \Omega$.

where $\mathcal{X} := \mathbb{R}^{n_1}$ and $\mathcal{Y}$ is a space of measurable functions from $\Omega$ to $\mathbb{R}^{n_2}$. In this formulation $y(\omega)$ is viewed as a random vector in $\mathbb{R}^{n_2}$. Note, however, an important difference between the random vectors $\xi(\omega)$ and $y(\omega)$. The vector $\xi(\omega)$ represents the random data of the problem with a given (known) distribution, while $y(\omega)$ denotes the second stage decision variables. We have explicitly marked the dependence of $y$ on the elementary event $\omega$ to stress the recourse nature of these variables. The inequalities (2.22) and the inclusion (2.24) are understood in the almost sure sense, i.e., they have to hold for $P$-almost every⁷ $\omega \in \Omega$. Recall that the probability measure $P$ on $(\Omega, \mathcal{F})$ generates the corresponding probability distribution of $(\xi(\omega), y(\omega))$ viewed as a random vector. Therefore, "for $P$-almost every $\omega \in \Omega$" means that the event happens for almost every realization of the random vector $(\xi, y)$.
The difficulty in the formulation (2.21)–(2.24) is the fact that the second stage decisions $y$ are allowed to be functions of the elementary event $\omega$. We need to specify from which classes of functions these decisions have to be chosen, i.e., to define the functional space $\mathcal{Y}$. The mappings $y: \Omega \to \mathbb{R}^{n_2}$ have to be measurable with respect to the sigma algebra $\mathcal{F}$ and such that the expectation in (2.21) makes sense. Otherwise we shall not be able to talk in a meaningful way about the expectation of the objective functional and the 'almost sure' satisfaction of the constraints. Moreover, in fact $y$ is a function of $\xi$. Therefore, we can identify the probability space $(\Omega, \mathcal{F}, P)$ with the probability space $(\mathbb{R}^d, \mathcal{B}, P)$ of the random vector $\xi$, and view $y(\xi)$ as an element of a space of measurable mappings from $\mathbb{R}^d$ into $\mathbb{R}^{n_2}$. In particular, in the case of finitely many realizations $\xi_1,\ldots,\xi_K$, we can identify the sample space with the set $\Omega := \{1,\ldots,K\}$ equipped with the sigma algebra of all its subsets. In that case it suffices to consider mappings $y: \{1,\ldots,K\} \to \mathbb{R}^{n_2}$, which can be identified with vectors $y_1,\ldots,y_K \in \mathbb{R}^{n_2}$. As a result, the decision space in the case of finitely many realizations is just

$$\mathbb{R}^{n_1} \times \underbrace{\mathbb{R}^{n_2} \times \cdots \times \mathbb{R}^{n_2}}_{K \text{ times}}.$$
The constraints (2.22)–(2.24) can be absorbed into the objective function by defining

$$\bar{F}(x,y,\xi) := \begin{cases} F(x,y,\xi), & \text{if } x \in X,\; y \in Y,\; G_i(x,y,\xi) \le 0,\; i = 1,\ldots,m,\\ +\infty, & \text{otherwise.} \end{cases}$$

⁷ Written: "a.e. $\omega \in \Omega$".

Then problem (2.21)–(2.24) can be written in the form

$$\min_{x,\, y(\cdot) \in \mathcal{Y}} E\left[\bar{F}(x, y(\omega), \xi(\omega))\right]. \tag{2.25}$$

In a way similar to the proof of Proposition 5, it is possible to show that the two formulations (2.19)–(2.20) and (2.21)–(2.24) are equivalent if for every $x \in X$ the function $\bar{F}(x,\cdot,\cdot)$ is random lsc and the expectation of the optimal value function $\inf_{y \in \mathbb{R}^{n_2}} \bar{F}(x,y,\xi(\omega))$ is well defined.
Let us now consider both parts of the decision vector, $x$ and $y$, as random elements. We obtain the problem

$$\begin{aligned} \min_{x(\cdot),\, y(\cdot)}\;& E[F(x(\omega), y(\omega), \xi(\omega))]\\ \text{s.t.}\;& G_i(x(\omega), y(\omega), \xi(\omega)) \le 0, \quad i = 1,\ldots,m,\\ & x(\omega) \in X,\; y(\omega) \in Y. \end{aligned}$$

All constraints here are assumed to hold $P$-almost surely, i.e., for a.e. $\omega \in \Omega$. The above problem is an analogue of (2.13), with optimization performed over mappings $(x(\cdot), y(\cdot))$ in an appropriate functional space, and, as in the finite scenario case, it is a relaxation of the problem (2.21)–(2.24). To make it equivalent to the original formulation we must add the nonanticipativity constraint, which can be written, for example, in the form (2.17).
For example, consider the two-stage linear program (2.2)–(2.3). We can write it in the form

$$\begin{aligned} \min_{x,\, y(\cdot)}\;& E\left[c^T x + q(\omega)^T y(\omega)\right]\\ \text{s.t.}\;& T(\omega)x + W(\omega)y(\omega) = h(\omega), \quad \text{a.e. } \omega \in \Omega,\\ & Ax = b,\; x \ge 0,\\ & y(\omega) \ge 0, \quad \text{a.e. } \omega \in \Omega, \end{aligned}$$

with $y(\cdot)$ being a mapping from $\Omega$ into $\mathbb{R}^{n_2}$. In order for the above problem to make sense, the mapping $y(\omega)$ should be measurable and the corresponding expected value should be well defined. Suppose for a moment that the vector $q$ is not random, i.e., it does not depend on $\omega$. Then we can assume that $y(\omega)$ is an element of the space $L_1^{n_2}(\Omega, \mathcal{F}, P)$ of $\mathcal{F}$-measurable mappings⁸ $y: \Omega \to \mathbb{R}^{n_2}$ such that $\int_\Omega \|y(\omega)\|\, dP(\omega) < +\infty$. If $q(\omega)$ is random, we can consider a space of measurable mappings $y(\cdot)$ such that $\int_\Omega |q(\omega)^T y(\omega)|\, dP(\omega) < +\infty$.

⁸ In fact an element of $L_1^{n_2}(\Omega, \mathcal{F}, P)$ is a class of mappings which may differ from each other on sets of $P$-measure zero.

2.5 Value of perfect information

Consider a two-stage stochastic programming problem in the form (2.19), with $V(x,\xi)$ being the optimal value of the second stage problem (2.20). If we have perfect information about the data $\xi$, i.e., the value of $\xi$ is known at the time when the first stage decision should be made, then the optimization problem becomes the deterministic problem

$$\min_{x \in X} V(x,\xi), \tag{2.26}$$

which can be written in the following equivalent form:

$$\begin{aligned} \min_{x \in X,\, y \in Y}\;& F(x,y,\xi)\\ \text{s.t.}\;& G_i(x,y,\xi) \le 0, \quad i = 1,\ldots,m. \end{aligned} \tag{2.27}$$

Of course, the optimal solution $x(\xi)$ (if it exists) and the optimal value $\vartheta(\xi)$ of problem (2.26) depend on the realization $\xi$ of the data. The average of $\vartheta(\xi)$ over all possible realizations of the random data $\xi = \xi(\omega)$, i.e., the expected value

$$E[\vartheta(\xi)] = E\left[\inf_{x \in X} V(x, \xi(\omega))\right], \tag{2.28}$$

is called the wait-and-see solution.


We have that for any x 2 X and any  the inequality Vðx, Þ  ðÞ holds,
and hence
 
E½Vðx, ð!Þ  E inf Vðx, ð!ÞÞ : ð2:29Þ
x2X

Therefore, as it was mentioned earlier (see (2.18)), it follows that


 
inf E½Vðx, ð!Þ  E inf Vðx, ð!ÞÞ : ð2:30Þ
x2X x2X

That is, the optimal value of the stochastic programming problem (2.19) is always greater than or equal to $E[\vartheta(\xi)]$. Suppose further that problem (2.19) has an optimal solution $\hat{x}$. We have that $V(\hat{x},\xi) - \vartheta(\xi)$ is nonnegative for all $\xi$, and hence its expected value is zero iff $V(\hat{x},\xi) - \vartheta(\xi) = 0$ w.p.1. That is, equality in (2.30) holds iff

$$V(\hat{x}, \xi(\omega)) = \inf_{x \in X} V(x, \xi(\omega)) \quad \text{for a.e. } \omega \in \Omega. \tag{2.31}$$

In particular, equality in (2.30) holds if there exists an optimal solution of (2.26) which does not depend on $\xi$ w.p.1.

The difference $V(\hat{x},\xi) - \vartheta(\xi)$ is the value of perfect information of knowing the realization $\xi$. Consequently,

$$\mathrm{EVPI} := \inf_{x \in X} E[V(x, \xi(\omega))] - E\left[\inf_{x \in X} V(x, \xi(\omega))\right] \tag{2.32}$$

represents the expected value of perfect information. It follows from (2.30) that EVPI is always nonnegative, and EVPI $= 0$ iff condition (2.31) holds.
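For the newsvendor of Example 1 the EVPI can be estimated directly by Monte Carlo; the cost data and demand distribution below are illustrative assumptions. Since the newsvendor maximizes, the two terms of (2.32) appear here as maxima, and EVPI is the wait-and-see value minus the here-and-now value.

```python
import numpy as np

rng = np.random.default_rng(5)
s, c, r = 1.5, 1.0, 0.2
D = rng.exponential(100.0, size=200_000)

def profit(x, D):
    return np.where(x <= D, (s - c) * x, (r - c) * x + (s - r) * D)

# Here-and-now value: max_x E[F(x, D)], using the quantile solution (1.5).
x_star = np.quantile(D, (s - c) / (s - r))
here_and_now = profit(x_star, D).mean()
# Wait-and-see value: E[max_x F(x, D)] = (s - c) E[D], since x = D is optimal.
wait_and_see = (s - c) * D.mean()
print("EVPI ~", wait_and_see - here_and_now)   # nonnegative, cf. (2.30)
```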

3 Multistage models

3.1 The linear case

The two-stage model is a special case of a more general structure, called the multistage stochastic programming model, in which the decision variables and constraints are divided into groups corresponding to stages $t = 1,\ldots,T$. The fundamental issue in such a model is the information structure: what is known at stage $t$ when decisions associated with this period are made? We first give a general description of such multistage models and then discuss examples in Section 3.4.

Let $x_1,\ldots,x_T$ be decision vectors corresponding to time periods (stages) $1,\ldots,T$. Consider the following linear programming problem:

$$\begin{aligned} \min\;& c_1^T x_1 + c_2^T x_2 + c_3^T x_3 + \cdots + c_T^T x_T\\ \text{s.t.}\;& A_{11}x_1 = b_1,\\ & A_{21}x_1 + A_{22}x_2 = b_2,\\ & \qquad\quad\;\; A_{32}x_2 + A_{33}x_3 = b_3,\\ & \qquad\qquad\qquad \cdots\\ & \qquad\qquad\quad A_{T,T-1}x_{T-1} + A_{TT}x_T = b_T,\\ & x_1 \ge 0,\; x_2 \ge 0,\; x_3 \ge 0,\; \ldots,\; x_T \ge 0. \end{aligned} \tag{3.1}$$

We view it as a multiperiod stochastic program where $c_1$, $A_{11}$ and $b_1$ are known, but some (or all) of the entries of the cost vectors $c_2,\ldots,c_T$, of the matrices $A_{t,t-1}$ and $A_{tt}$, $t = 2,\ldots,T$, and of the right hand side vectors $b_2,\ldots,b_T$ are random. At each stage some of these quantities become known, and we have the following sequence of actions:

decision $(x_1)$,
observation $\xi_2 := (c_2, A_{21}, A_{22}, b_2)$,
decision $(x_2)$,
$\;\;\vdots$
observation $\xi_T := (c_T, A_{T,T-1}, A_{TT}, b_T)$,
decision $(x_T)$.

Our objective is to design the decision process in such a way that the expected value of the total cost is minimized, while optimal decisions are allowed to be made at every time period $t = 1,\ldots,T$.

Let us denote by $\xi_t$ the data which become known at time period $t$. In the setting of the multiperiod problem (3.1), $\xi_t$ is assembled from the components of $c_t$, $A_{t,t-1}$, $A_{tt}$, $b_t$, some (or all) of which can be random, and the data $\xi_1 = (c_1, A_{11}, b_1)$ at the first stage of problem (3.1), which is assumed to be known. For $1 \le t_1 \le t_2 \le T$, denote by

$$\xi_{[t_1,t_2]} := (\xi_{t_1},\ldots,\xi_{t_2})$$

the history of the process from time $t_1$ to time $t_2$. In particular, $\xi_{[1,t]}$ represents the information available up to time $t$. The important condition in the above multistage process is that every decision vector $x_t$ may depend on the information available at time $t$ (that is, $\xi_{[1,t]}$), but not on the results of observations to be made at later stages. This distinguishes multistage stochastic programs from deterministic multiperiod problems, in which all the information is assumed to be available at the beginning.
There are several possible ways in which multistage stochastic programs can be formulated in a precise mathematical form. In one such formulation, $x_t = x_t(\xi_{[1,t]})$, $t = 2,\ldots,T$, is viewed as a function of $\xi_{[1,t]} = (\xi_1,\ldots,\xi_t)$, and the minimization in (3.1) is performed over appropriate functional spaces (as was discussed in Section 2.4 in the case of two-stage programming). If the number of scenarios is finite, this leads to a formulation of the linear multistage stochastic program as one large (deterministic) linear programming problem. We discuss that further in Section 3.2. It is also useful to describe the dynamics of the multistage process starting from the end, as follows.

Let us look at our problem from the perspective of the last stage $T$. At that time the values of all problem data, $\xi_{[1,T]}$, are already known, and the values of the earlier decision vectors, $x_1,\ldots,x_{T-1}$, have been chosen. Our problem is, therefore, the simple linear programming problem

$$\begin{aligned} \min_{x_T}\;& c_T^T x_T\\ \text{s.t.}\;& A_{T,T-1}x_{T-1} + A_{TT}x_T = b_T,\\ & x_T \ge 0. \end{aligned} \tag{3.2}$$

The optimal value of this problem depends on the earlier decision vector $x_{T-1}$ and the data $\xi_T = (c_T, A_{T,T-1}, A_{TT}, b_T)$, and is denoted by $Q_T(x_{T-1}, \xi_T)$. At stage $T-1$ we know $x_{T-2}$ and $\xi_{[1,T-1]}$. We face, therefore, the following two-stage stochastic programming problem:

$$\begin{aligned} \min_{x_{T-1}}\;& c_{T-1}^T x_{T-1} + E\left[Q_T(x_{T-1}, \xi_T) \mid \xi_{[1,T-1]}\right]\\ \text{s.t.}\;& A_{T-1,T-2}x_{T-2} + A_{T-1,T-1}x_{T-1} = b_{T-1},\\ & x_{T-1} \ge 0. \end{aligned} \tag{3.3}$$

The optimal value of the above problem depends on $x_{T-2}$ and the data $\xi_{[1,T-1]}$, and is denoted $Q_{T-1}(x_{T-2}, \xi_{[1,T-1]})$.
Generally, at stage $t = 2,\ldots,T-1$, we have the problem

$$\begin{aligned} \min_{x_t}\;& c_t^T x_t + E\left[Q_{t+1}(x_t, \xi_{[1,t+1]}) \mid \xi_{[1,t]}\right]\\ \text{s.t.}\;& A_{t,t-1}x_{t-1} + A_{t,t}x_t = b_t,\\ & x_t \ge 0. \end{aligned} \tag{3.4}$$

Its optimal value is denoted $Q_t(x_{t-1}, \xi_{[1,t]})$ and is called the cost-to-go function.
Note that, since $\xi_1$ is not random, the conditional distribution of $\xi_{t+1}$ given $\xi_{[1,t]}$ is the same as the conditional distribution of $\xi_{t+1}$ given $\xi_{[2,t]}$, $t = 2,\ldots,T-1$. Therefore, it suffices to take the conditional expectation in (3.4) with respect to $\xi_{[2,t]}$ only (and in (3.3) with respect to $\xi_{[2,T-1]}$).
On top of all these problems is the problem of finding the first decisions, $x_1$:

$$\begin{aligned} \min_{x_1}\;& c_1^T x_1 + E[Q_2(x_1, \xi_2)]\\ \text{s.t.}\;& A_{11}x_1 = b_1,\\ & x_1 \ge 0. \end{aligned} \tag{3.5}$$

Note that all subsequent stages $t = 2,\ldots,T$ are absorbed in the above problem (3.5) into the function $Q_2(x_1, \xi_2)$ through the corresponding expected values. Note also that since $\xi_1$ is not random, the optimal value $Q_2(x_1, \xi_2)$ does not depend on $\xi_1$. In particular, if $T = 2$, then (3.5) coincides with the formulation (2.2) of a two-stage linear problem.
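The backward recursion (3.2)–(3.5) can be computed on a small scenario tree. The sketch below is a stand-in under strong simplifying assumptions: scalar decisions optimized by grid search, a two-stage tree, and a single constraint $x_t \ge b_t - x_{t-1}$ replacing the linear constraints of (3.4); the data are illustrative.

```python
import numpy as np

tree = {                     # node: (b_t, conditional probability, children)
    "root": (36.0, 1.0, ["a", "b"]),
    "a":    (15.0, 0.4, []),
    "b":    (50.0, 0.6, []),
}
c_t = 1.0                    # per-stage unit cost of the decision
grid = np.linspace(0.0, 100.0, 1001)

def Q(node, x_prev):
    """Cost-to-go at `node` given the preceding decision, cf. (3.4)."""
    b, _, children = tree[node]
    feasible = grid[grid >= b - x_prev]        # stand-in linear constraint
    def total(x):
        expected = sum(tree[ch][1] * Q(ch, x) for ch in children)
        return c_t * x + expected
    return min(total(x) for x in feasible)

print(Q("root", 0.0))        # optimal expected total cost from the root
```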
We arrive in this way at the following nested formulation:

$$\min_{\substack{A_{11}x_1 = b_1\\ x_1 \ge 0}} \left[ c_1^T x_1 + E\left[ \min_{\substack{A_{21}x_1 + A_{22}x_2 = b_2\\ x_2 \ge 0}} c_2^T x_2 + E\left[ \cdots + E\left[ \min_{\substack{A_{T,T-1}x_{T-1} + A_{TT}x_T = b_T\\ x_T \ge 0}} c_T^T x_T \right]\right]\right]\right].$$

Recall that the random process $\xi_1,\ldots,\xi_T$ is said to be Markovian if for each $t = 2,\ldots,T-1$ the conditional distribution of $\xi_{t+1}$ given $\xi_{[1,t]} = (\xi_1,\ldots,\xi_t)$ is the same as the conditional distribution of $\xi_{t+1}$ given $\xi_t$. If the process $\xi_1,\ldots,\xi_T$ is Markovian, the model is simplified considerably. In the Markovian case, for given $\xi_{T-1}$, the conditional expectation in problem (3.3) does not depend on $\xi_1,\ldots,\xi_{T-2}$, and hence the optimal value of (3.3) depends only on $x_{T-2}$ and $\xi_{T-1}$. Similarly, at stage $t = 2,\ldots,T-1$, the optimal value of problem (3.4) is then a function of $x_{t-1}$ and $\xi_t$, and can be denoted by $Q_t(x_{t-1}, \xi_t)$. We shall then call $\xi_t$ the information state of the model. In particular, the process $\xi_1,\ldots,\xi_T$ is Markovian if the random vectors $\xi_t$, $t = 2,\ldots,T$, are mutually independent. In that case the conditional expectation in problem (3.3) does not depend on $\xi_{[1,T-1]}$, and hence the optimal value $Q_{T-1}(x_{T-2}, \xi_{T-1})$ of (3.3) depends on $\xi_{T-1}$ only through the linear constraint of that problem; similarly, at stages $t = T-2,\ldots$, the optimal value $Q_t(x_{t-1}, \xi_t)$ depends on $\xi_t$ only through the linear constraint of (3.4).
The assumption that the blocks $A_{t1},\ldots,A_{t,t-2}$ in the constraint matrix are zeros allowed us to express the optimal value $Q_t$ of (3.4) as a function of the immediately preceding decision, $x_{t-1}$, rather than of all earlier decisions $x_1,\ldots,x_{t-1}$. Suppose now that we deal with an underlying model with a full lower block triangular constraint matrix:

$$\begin{aligned} \min\;& c_1^T x_1 + c_2^T x_2 + c_3^T x_3 + \cdots + c_T^T x_T\\ \text{s.t.}\;& A_{11}x_1 = b_1,\\ & A_{21}x_1 + A_{22}x_2 = b_2,\\ & A_{31}x_1 + A_{32}x_2 + A_{33}x_3 = b_3,\\ & \qquad\qquad\qquad \cdots\\ & A_{T1}x_1 + A_{T2}x_2 + \cdots + A_{T,T-1}x_{T-1} + A_{TT}x_T = b_T,\\ & x_1 \ge 0,\; x_2 \ge 0,\; x_3 \ge 0,\; \ldots,\; x_T \ge 0. \end{aligned} \tag{3.6}$$

Then, of course, each subproblem (3.4) depends on the entire history of our decisions, $x_{[1,t-1]} := (x_1,\ldots,x_{t-1})$. It takes on the form

$$\begin{aligned} \min_{x_t}\;& c_t^T x_t + E\left[Q_{t+1}(x_{[1,t]}, \xi_{[1,t+1]}) \mid \xi_{[1,t]}\right]\\ \text{s.t.}\;& A_{t1}x_1 + \cdots + A_{t,t-1}x_{t-1} + A_{t,t}x_t = b_t,\\ & x_t \ge 0. \end{aligned} \tag{3.7}$$

Its optimal value is denoted $Q_t(x_{[1,t-1]}, \xi_{[1,t]})$.


Sometimes it is convenient to convert such a lower triangular formulation
into the staircase formulation from which we started our presentation. This
can be accomplished by introducing additional variables rt which summarize
the relevant history of our decisions. We shall call these variables the model
state variables (to distinguish from information states discussed before). The
relations that describe the next values of the state variables as a function of the
current values of these variables, current decisions and current random
parameters are called model state equations.
For the general problem (3.6) the vectors x½1, t ¼ ðx1 , . . . , xt Þ are sufficient
model state variables. They are updated at each stage according to the state
equation x½1, t ¼ ðx½1, t1 , xt Þ (which is linear), and the constraint in (3.7) can
be formally written as

½At1 At2 . . . At, t1 x½1, t1 þ At, t xt ¼ bt :

Although it looks a little awkward in this general case, in many problems it is possible to define model state variables of reasonable size. As an example let us consider the structure

$$\begin{aligned} \min\;& c_1^T x_1 + c_2^T x_2 + c_3^T x_3 + \cdots + c_T^T x_T\\ \text{s.t.}\;& A_{11}x_1 = b_1,\\ & B_1 x_1 + A_{22}x_2 = b_2,\\ & B_1 x_1 + B_2 x_2 + A_{33}x_3 = b_3,\\ & \qquad\qquad\qquad \cdots\\ & B_1 x_1 + B_2 x_2 + \cdots + B_{T-1}x_{T-1} + A_{TT}x_T = b_T,\\ & x_1 \ge 0,\; x_2 \ge 0,\; x_3 \ge 0,\; \ldots,\; x_T \ge 0, \end{aligned}$$
in which, for each $t$, all subdiagonal blocks in column $t$ are identical (equal to $B_t$) and are observed at time $t$. Then we can define the state variables $r_t$, $t = 1,\ldots,T-1$, recursively by the state equation $r_t = r_{t-1} + B_t x_t$, $t = 1,\ldots,T-1$, where $r_0 = 0$. Subproblem (3.7) simplifies substantially:

$$\begin{aligned} \min_{x_t, r_t}\;& c_t^T x_t + E\left[Q_{t+1}(r_t, \xi_{[1,t+1]}) \mid \xi_{[1,t]}\right]\\ \text{s.t.}\;& r_{t-1} + A_{tt}x_t = b_t,\\ & r_t = r_{t-1} + B_t x_t,\\ & x_t \ge 0. \end{aligned}$$

Its optimal value depends on $r_{t-1}$ and is denoted $Q_t(r_{t-1}, \xi_{[1,t]})$.


Trucking Example 9 (discussed in Section 3.4) uses such model state
variables: the capacities rt available at all locations at the end of day t. We do
not need to remember all decisions made in the past, we only need to know the
numbers of trucks at each location today.
It should be clear, too, that the simple sign constraints xt  0 can be
replaced in our model by a general constraint xt 2 Xt , where Xt is a convex
polyhedron defined by some linear equations and inequalities (local for
stage t). The set Xt may be random, too, but has to become known at stage t.

3.2 The case of finitely many scenarios

Suppose that in our basic problem (3.1) there are only finitely many, say $K$, different values the problem data can take. We shall call them scenarios. With each scenario $k$ are associated its probability $p_k$ and the corresponding sequence of decisions⁹ $x^k = (x_1^k, x_2^k, \ldots, x_T^k)$. Of course, it would not be appropriate to try to find the optimal values of these decisions by solving the relaxed version of (3.1):

$$\begin{aligned} \min\;& \sum_{k=1}^K p_k \left[(c_1^k)^T x_1^k + (c_2^k)^T x_2^k + (c_3^k)^T x_3^k + \cdots + (c_T^k)^T x_T^k\right]\\ \text{s.t.}\;& A_{11}x_1^k = b_1,\\ & A_{21}^k x_1^k + A_{22}^k x_2^k = b_2^k,\\ & \qquad\quad\;\; A_{32}^k x_2^k + A_{33}^k x_3^k = b_3^k,\\ & \qquad\qquad\qquad \cdots\\ & \qquad\qquad\quad A_{T,T-1}^k x_{T-1}^k + A_{TT}^k x_T^k = b_T^k,\\ & x_1^k \ge 0,\; x_2^k \ge 0,\; \ldots,\; x_T^k \ge 0, \quad k = 1,\ldots,K. \end{aligned} \tag{3.8}$$

⁹ To avoid ugly collisions of subscripts we change our notation a little and put the index of the scenario, $k$, as a superscript.

The reason is the same as in the two-stage case: in the problem above all parts of the decision vector are allowed to depend on all parts of the random data, while in reality each part $x_t$ is allowed to depend only on the data known up to stage $t$. In particular, problem (3.8) may suggest different values of $x_1$ for each scenario $k$, but we need only one value.

It is clear that we need the nonanticipativity constraints

$$x_1^k = x_1^j \quad \text{for all } 1 \le k < j \le K, \tag{3.9}$$

similarly to (2.10). But this is not sufficient, in general. Consider the second part of the decision vector, $x_2$. It is allowed to depend only on $\xi_{[1,2]} = (\xi_1, \xi_2)$, so it has to have the same value for all scenarios $k$ for which $\xi_{[1,2]}^k$ is identical. We must, therefore, satisfy the equations

$$x_2^k = x_2^j \quad \text{for all } k, j \text{ for which } \xi_{[1,2]}^k = \xi_{[1,2]}^j. \tag{3.10}$$

Generally, at stage $t = 1,\ldots,T$, the scenarios that have the same history $\xi_{[1,t]}$ cannot be distinguished, so we need to enforce the nonanticipativity constraints

$$x_t^k = x_t^j \quad \text{for all } k, j \text{ for which } \xi_{[1,t]}^k = \xi_{[1,t]}^j, \quad t = 1,\ldots,T. \tag{3.11}$$

Problem (3.8) together with the nonanticipativity constraints (3.11) becomes equivalent to our original formulation (3.1).
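In practice, the constraint sets (3.11) can be generated by grouping scenarios with a common history prefix, as in the sketch below; the scenario data are illustrative assumptions. Scenarios whose histories agree up to stage $t$ must share the decision $x_t^k$.

```python
scenarios = [               # xi_[1,T] for each scenario, with T = 3
    (36, 15, 10), (36, 15, 20), (36, 50, 20), (36, 50, 70),
]
T = 3
for t in range(1, T + 1):
    groups = {}
    for k, xi in enumerate(scenarios):
        groups.setdefault(xi[:t], []).append(k)  # key: history xi_[1,t]
    print(f"stage {t}:", list(groups.values()))
# Stage 1 yields one group (all scenarios share x_1); later stages refine it.
```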
Let us observe that if in problem (3.8) only the constraints (3.9) are enforced, then from the mathematical point of view the obtained problem becomes a two-stage stochastic linear program with $K$ scenarios. In that two-stage program the first stage decision vector is $x_1$, the second stage decision vector is $(x_2,\ldots,x_T)$, the technology matrix is $A_{21}$, and the recourse matrix is the block matrix

$$\begin{bmatrix} A_{22} & 0 & \cdots & 0 & 0\\ A_{32} & A_{33} & \cdots & 0 & 0\\ & & \cdots & &\\ 0 & 0 & \cdots & A_{T,T-1} & A_{TT} \end{bmatrix}.$$

Since the obtained two-stage problem is a relaxation of the multistage problem (3.1), its optimal value gives a lower bound for the optimal value of problem (3.1) and in that sense can be useful. Note, however, that this model does not make much sense, since it assumes that at the end of the process, when all realizations of the random data become known, one can go back in time and make all the decisions $x_2,\ldots,x_T$.
It is useful to depict the possible sequences of data $\xi_{[1,t]}$ in the form of a scenario tree. It has nodes organized in levels which correspond to stages $1,2,\ldots,T$. At level 1 we have only one root node, and we associate with it the value of $\xi_1$ (which is known at stage 1). At level 2 we have at least as many nodes as there are different realizations of $\xi_2$ that may occur. Each of them is connected with the root node by an arc. For each node $i$ of level 2 (which corresponds to a particular realization $\xi_2^i$ of $\xi_2$) we create at least as many nodes at level 3 as there are different values of $\xi_3$ that may follow $\xi_2^i$, and we connect them with the node $i$, etc. Generally, nodes at level $t$ correspond to possible values of $\xi_t$ that may occur. Each of them is connected to a unique node at level $t-1$, called the ancestor node, which corresponds to the identical first $t-1$ parts of the process $\xi_{[1,t]}$, and is also connected to nodes at level $t+1$, which correspond to possible continuations of the history $\xi_{[1,t]}$.

Note that, in general, the realizations $\xi_t^i$ are vectors and it may happen that some of the values $\xi_t^i$ associated with nodes at a given level $t$ are equal to each other. Nevertheless, such equal values may be represented by different nodes, since they may correspond to different histories of the process. Note also that if for every $t = 1,\ldots,T$ all realizations $\xi_t^i$ are different from each other, then the random process $\xi_1,\ldots,\xi_T$ is Markovian because of the tree structure of the process. Indeed, in that case the conditional probability of $\xi_t$ being at state $\xi_t^i$ depends on the previous history of the process only through the ancestor node at level $t-1$.

In order to illustrate the above ideas let us discuss the following simple example.

Example 6 (Scenario Tree). An example of a scenario tree is depicted in Fig. 2. Numbers along the arcs represent conditional probabilities of moving from one node to the next. The associated process $\xi_t = (c_t, A_{t,t-1}, A_{tt}, b_t)$, $t = 1, \ldots, T$, with $T = 4$, is defined as follows. All involved variables are assumed to be one dimensional, with $c_t, A_{t,t-1}, A_{tt}$, $t = 2, 3, 4$, being fixed and only the right hand side variables $b_t$ being random. The numerical values (realizations) of the random process $b_1, \ldots, b_T$ are indicated by the bold numbers at the nodes of the tree. Numerical values of $c_t, A_{t,t-1}, A_{tt}$ will be specified later. That is, at level $t = 1$, $b_1$ has the unique value 36. At level $t = 2$, $b_2$ has two values, 15 and 50, with respective probabilities 0.4 and 0.6. At level $t = 3$ we have 5 nodes, with which are associated the following numerical values (from left to right): 10, 20, 12, 20, 70. That is, $b_3$ can take 4 different values with respective probabilities $P\{b_3 = 10\} = 0.4 \times 0.1$, $P\{b_3 = 20\} = 0.4 \times 0.4 + 0.6 \times 0.4$, $P\{b_3 = 12\} = 0.4 \times 0.5$ and $P\{b_3 = 70\} = 0.6 \times 0.6$. At level $t = 4$, the numerical values associated with the eight nodes are, from left to right, 10, 10, 30, 12, 10, 20, 40, 70. The respective probabilities can be calculated by using the corresponding conditional probabilities. For example,
\[
P\{b_4 = 10\} = 0.4 \times 0.1 \times 1.0 + 0.4 \times 0.4 \times 0.5 + 0.6 \times 0.4 \times 0.4.
\]

Fig. 2. Scenario tree. Nodes represent information states. Paths from the root to leaves represent scenarios. Numbers along the arcs represent conditional probabilities of moving to the next node. Bold numbers represent numerical values of the process.

Note that although some of the realizations of $b_3$, and hence of $\xi_3$, are equal to each other, they are represented by different nodes. This is necessary in order to identify different histories of the process corresponding to different scenarios. The same remark applies to $b_4$ and $\xi_4$. Altogether, there are eight scenarios in this tree. Figure 3 illustrates the way in which sequences of decisions are associated with scenarios from Fig. 2.
The process $b_t$ (and hence the process $\xi_t$) in the above example is not Markovian. For instance,
\[
P\{b_4 = 10 \mid b_3 = 20, \, b_2 = 15, \, b_1 = 36\} = 0.5,
\]
while
\[
P\{b_4 = 10 \mid b_3 = 20\} = \frac{P\{b_4 = 10, \, b_3 = 20\}}{P\{b_3 = 20\}}
= \frac{0.5 \times 0.4 \times 0.4 + 0.4 \times 0.4 \times 0.6}{0.4 \times 0.4 + 0.4 \times 0.6} = 0.44 \ne 0.5.
\]

Fig. 3. Sequences of decisions for scenarios from Fig. 2. Horizontal dotted lines represent
the equations of nonanticipativity.

On the other hand, the process $b_t$ in this example is a martingale.¹⁰ For instance,
\[
E[b_2 \mid b_1 = 36] = E[b_2] = 15 \times 0.4 + 50 \times 0.6 = 36,
\]
\[
E[b_3 \mid b_2 = 15, \, b_1 = 36] = 10 \times 0.1 + 20 \times 0.4 + 12 \times 0.5 = 15, \text{ etc.}
\]
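These computations are easy to mechanize. The sketch below (Python) encodes the tree of Fig. 2 as parent, value and arc-probability lists. Note one assumption: the last-level arc probabilities under the node with $b_2 = 50$, $b_3 = 20$ are not all stated explicitly in the text and are inferred here from the martingale property. The script reproduces $P\{b_4 = 10\}$, the conditional probability $P\{b_4 = 10 \mid b_3 = 20\} = 0.44$ and the martingale check $E[b_3 \mid b_2 = 15] = 15$.

```python
# Nodes: 0 root; 1-2 at level 2; 3-7 at level 3; 8-15 at level 4.
parent = [None, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 5, 6, 6, 6, 7]
value  = [36, 15, 50, 10, 20, 12, 20, 70, 10, 10, 30, 12, 10, 20, 40, 70]
arcp   = [1.0, 0.4, 0.6, 0.1, 0.4, 0.5, 0.4, 0.6,
          1.0, 0.5, 0.5, 1.0, 0.4, 0.4, 0.2, 1.0]   # last three of node 6 inferred
level  = [1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4]

# Unconditional probability p(i) of reaching node i: product of arc
# probabilities along the path from the root (parents precede children).
prob = [0.0] * len(parent)
for i, a in enumerate(parent):
    prob[i] = arcp[i] * (prob[a] if a is not None else 1.0)

p_b4_10 = sum(prob[i] for i in range(len(parent))
              if level[i] == 4 and value[i] == 10)
print(p_b4_10)  # 0.216 = 0.4*0.1*1.0 + 0.4*0.4*0.5 + 0.6*0.4*0.4

# P{b4 = 10 | b3 = 20}: restrict to leaves whose parent carries value 20.
num = sum(prob[i] for i in range(len(parent))
          if level[i] == 4 and value[i] == 10 and value[parent[i]] == 20)
den = sum(prob[i] for i in range(len(parent)) if level[i] == 3 and value[i] == 20)
print(num / den)  # 0.44, not 0.5: the process is not Markovian

# Martingale check at the node with b2 = 15 (node 1).
children = [i for i, a in enumerate(parent) if a == 1]
print(sum(arcp[i] * value[i] for i in children))  # 15.0
```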

Suppose now that $c_T = 1$ and $A_{T,T-1} = A_{TT} = 1$. Then the cost-to-go function $Q_T(x_{T-1}, \xi_T)$ is given by the optimal value of the problem
\[
\min_{x_T} \; x_T \quad \text{subject to} \quad x_{T-1} + x_T = b_T, \; x_T \ge 0,
\]
and hence
\[
Q_T(x_{T-1}, \xi_T) =
\begin{cases}
b_T - x_{T-1}, & \text{if } x_{T-1} \le b_T, \\
+\infty, & \text{otherwise}.
\end{cases}
\]
Note again that $b_T$ has six possible realizations.


Suppose further that $c_{T-1} = 1$ and $A_{T-1,T-2} = A_{T-1,T-1} = 1$. Then the cost-to-go function $Q_{T-1}(x_{T-2}, \xi_{[1,T-1]})$ is the optimal value of the problem
\[
\begin{array}{ll}
\min\limits_{x_{T-1}} & x_{T-1} + E\big[Q_T(x_{T-1}, \xi_T) \mid \xi_{[1,T-1]}\big] \\
\text{subject to} & x_{T-2} + x_{T-1} = b_{T-1}, \; x_{T-1} \ge 0.
\end{array}
\]

¹⁰ Recall that a random process $Z_t$, $t = 1, \ldots$, is called a martingale if the equalities $E[Z_{t+1} \mid Z_{[1,t]}] = Z_t$, $t = 1, \ldots$, hold with probability one.

The history $b_{[1,3]}$ of the process $b_t$, and hence the history $\xi_{[1,3]}$ of the process $\xi_t$, is in one-to-one correspondence with the nodes of the tree at level $t = 3$. It has 5 possible realizations $\xi_{[1,3]}^i$, $i = 1, \ldots, 5$, numbered from left to right, i.e., for $i = 1$ it corresponds to the realization $b_1 = 36$, $b_2 = 15$, $b_3 = 10$ of $b_{[1,3]}$. We have that
\[
E\big[Q_T(x_{T-1}, \xi_4) \,\big|\, \xi_{[1,3]} = \xi_{[1,3]}^1\big] = Q_T(x_{T-1}, \xi_4^1) =
\begin{cases}
10 - x_{T-1}, & \text{if } x_{T-1} \le 10, \\
+\infty, & \text{otherwise},
\end{cases}
\]
where $\xi_4^1 = (1, 1, 1, b_4^1)$ and $b_4^1 = 10$. Consequently,
\[
Q_{T-1}\big(x_{T-2}, \xi_{[1,3]}^1\big) =
\begin{cases}
10, & \text{if } 0 \le x_{T-2} \le 10, \\
+\infty, & \text{otherwise}.
\end{cases}
\]

Similarly,
\[
E\big[Q_T(x_{T-1}, \xi_4) \,\big|\, \xi_{[1,3]} = \xi_{[1,3]}^2\big]
= \tfrac{1}{2}\, Q_T(x_{T-1}, \xi_4^2) + \tfrac{1}{2}\, Q_T(x_{T-1}, \xi_4^3),
\]
and hence
\[
Q_{T-1}\big(x_{T-2}, \xi_{[1,3]}^2\big) =
\begin{cases}
20, & \text{if } 10 \le x_{T-2} \le 20, \\
+\infty, & \text{otherwise},
\end{cases}
\]

etc. By continuing the above backward calculations (which, of course, depend on the numerical values of $c_2, A_{21}, A_{22}$ and $c_1, A_{11}$) one can either show that the problem is infeasible or find the optimal value and an optimal solution of the first stage problem.

It is also possible to solve this multistage problem by formulating it as a linear programming problem of the form (3.8) subject to the corresponding nonanticipativity constraints. Such a linear program will have $4 \times 8 = 32$ decision variables, 16 nonanticipativity constraints and four linear equality constraints per scenario.
Consider now a scenario tree and the corresponding process $\xi_1, \ldots, \xi_T$. With each scenario of the tree is associated a probability $p_k$, $k = 1, \ldots, K$. These probabilities are related to the time structure of the multistage process and can be constructed as follows. In order to deal with the nested structure of problems (3.4) we need to specify the conditional distribution of $\xi_t$ given¹¹ $\xi_{[1,t-1]}$, $t = 2, \ldots, T$. Consider a node $i \in \mathcal{N}$ and its ancestor $a = a(i)$ in the scenario tree. Denote by $\rho_{ai}$ the probability of moving from the node $a$ to the node $i$. For instance, in the tree of Fig. 2 it is possible to move from the root node to two nodes at stage $t = 2$, say $i_1$ and $i_2$, with the corresponding probabilities $\rho_{1 i_1} = 0.4$ and $\rho_{1 i_2} = 0.6$. Clearly the numbers $\rho_{ai}$ should be nonnegative, and for a given $a \in \mathcal{N}$ the sum of $\rho_{ai}$ over all continuations $i \in \mathcal{N}$ of the node $a$ should be equal to one. Each probability $\rho_{ai}$ can be viewed as the conditional probability of the process being in the node $i$ given its history up to the ancestor node $a = a(i)$. Note also that the probabilities $\rho_{ai}$ are in one-to-one correspondence with the arcs of the scenario tree. Every scenario can be defined by its nodes $i_1, i_2, \ldots, i_T$, arranged in chronological order, i.e., node $i_2$ (at level $t = 2$) is connected to the root $i_1 = 1$, node $i_3$ is connected to the node $i_2$, etc. The probability of that scenario is then given by the product $\rho_{i_1 i_2} \rho_{i_2 i_3} \cdots \rho_{i_{T-1} i_T}$. The conditional probabilities $\rho_{ai}$ describe the probabilistic structure of the considered problem and could be specified together with the corresponding scenario tree.

¹¹ Since $\xi_1$ is not random, for $t = 2$ the distribution of $\xi_2$ is independent of $\xi_1$.
It is possible to derive these conditional probabilities from the scenario probabilities $p_k$ as follows. Let us denote by $B(i)$ the set of scenarios passing through node $i$ (at level $t$) of the scenario tree, and let $p^{(i)} := P[B(i)]$. If $i_1, i_2, \ldots, i_t$, with $i_1 = 1$ and $i_t = i$, is the history of the process up to node $i$, then the probability $p^{(i)}$ is given by the product
\[
p^{(i)} = \rho_{i_1 i_2}\, \rho_{i_2 i_3} \cdots \rho_{i_{t-1} i_t}
\]
of the corresponding conditional probabilities. In another way we can write this in the recursive form $p^{(i)} = \rho_{ai}\, p^{(a)}$, where $a = a(i)$ is the ancestor of the node $i$. This equation defines the conditional probability $\rho_{ai}$ from the probabilities $p^{(i)}$ and $p^{(a)}$. Note that if $a = a(i)$ is the ancestor of the node $i$, then $B(i) \subset B(a)$ and hence $p^{(i)} \le p^{(a)}$. Consequently, if $p^{(a)} > 0$, then $\rho_{ai} = p^{(i)}/p^{(a)}$. Otherwise $B(a)$ is empty, i.e., no scenario is passing through the node $a$, and hence no scenario is passing through the node $i$.
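A minimal sketch of this derivation (Python, with hypothetical node labels and scenario probabilities): accumulate $p^{(i)}$ as the total probability of the bundle $B(i)$, then set $\rho_{ai} = p^{(i)}/p^{(a)}$ along each arc.

```python
from collections import defaultdict

# Hypothetical scenarios: a path of (unique) node labels, plus probability p_k.
scenarios = [
    (("r", "u", "uu"), 0.2),
    (("r", "u", "ud"), 0.3),
    (("r", "d", "du"), 0.5),
]

# p(i): probability of the bundle B(i) of scenarios passing through node i.
p = defaultdict(float)
for path, pk in scenarios:
    for node in path:
        p[node] += pk

# Conditional (arc) probabilities rho_{a,i} = p(i) / p(a) for each arc (a, i).
rho = {}
for path, _ in scenarios:
    for a, i in zip(path, path[1:]):
        if p[a] > 0:
            rho[(a, i)] = p[i] / p[a]

print(rho)  # {('r','u'): 0.5, ('u','uu'): 0.4, ('u','ud'): 0.6, ...}
```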
Recall that a stochastic process $Z_t$, $t = 1, 2, \ldots$, that can take a finite number $\{z_1, \ldots, z_m\}$ of different values, is said to be a Markov chain if
\[
P\big\{Z_{t+1} = z_j \mid Z_t = z_i, \, Z_{t-1} = z_{i_{t-1}}, \ldots, Z_1 = z_{i_1}\big\} = p_{ij},
\tag{3.12}
\]
for all states $z_{i_{t-1}}, \ldots, z_{i_1}, z_i, z_j$ and all $t = 1, \ldots$. In some instances it is natural to model the data process as a Markov chain with the corresponding state space¹² $\{\eta_1, \ldots, \eta_m\}$ and probabilities $p_{ij}$ of moving from state $\eta_i$ to state $\eta_j$, $i, j = 1, \ldots, m$. We can model such a process by a scenario tree. At stage $t = 1$ there is one root node, to which is assigned one of the values from the state space, say $\eta_i$. At stage $t = 2$ there are $m$ nodes, to which are assigned values $\eta_1, \ldots, \eta_m$ with the corresponding probabilities $p_{i1}, \ldots, p_{im}$. At stage $t = 3$ there are $m^2$ nodes, such that each node at stage $t = 2$, associated with a state $\eta_a$, $a = 1, \ldots, m$, is the ancestor of $m$ nodes at stage $t = 3$, to which are assigned values $\eta_1, \ldots, \eta_m$ with the corresponding conditional probabilities $p_{a1}, \ldots, p_{am}$. At stage $t = 4$ there are $m^3$ nodes, etc. At each stage $t$ of such a $T$-stage Markov chain process there are $m^{t-1}$ nodes, the corresponding random vector (variable) $\xi_t$ can take values $\eta_1, \ldots, \eta_m$ with respective probabilities which can be calculated from the history of the process up to time $t$, and the total number of scenarios is $m^{T-1}$. We have here that the random vectors (variables) $\xi_1, \ldots, \xi_T$ are independently distributed iff $p_{ij} = p_{i'j}$ for any $i, i', j = 1, \ldots, m$, i.e., the conditional probability $p_{ij}$ of moving from state $\eta_i$ to state $\eta_j$ does not depend on $i$.

¹² In our modeling, the values $\eta_1, \ldots, \eta_m$ can be numbers or vectors.
In the above formulation of the Markov chain the corresponding scenario tree represents the total history of the process, with the number of scenarios growing exponentially with the number of stages. Now if we approach the problem by writing the cost-to-go functions $Q_t(x_{t-1}, \xi_t)$, going backwards, then we do not need to keep track of the history of the process. That is, at every stage $t$ the cost-to-go function $Q_t(\cdot, \xi_t)$ only depends on the current state (realization) $\xi_t = \eta_i$, $i = 1, \ldots, m$, of the process. On the other hand, if we want to write the corresponding optimization problem (in the case of a finite number of scenarios) as one large linear programming problem, we still need the scenario tree formulation. This is the basic difference between the stochastic and dynamic programming approaches to the problem. That is, the stochastic programming approach does not necessarily rely on the Markovian structure of the considered process. This makes it more general, at the price of considering a possibly very large number of scenarios.
There are many ways to express the nonanticipativity constraints (3.11) which may be convenient for different solution methods. One way is particularly elegant from the theoretical point of view:
\[
x_t^k = \frac{\sum_{j \in A_t(k)} p_j x_t^j}{\sum_{j \in A_t(k)} p_j}, \quad k = 1, \ldots, K, \; t = 1, \ldots, T,
\tag{3.13}
\]
where $A_t(k) := \{ j : \xi_{[1,t]}^j = \xi_{[1,t]}^k \}$ is the set of scenarios that share with scenario $k$ the history up to stage $t$. The expression on the right hand side of the above relation is the conditional expectation of $x_t$ under the condition that $\xi_{[1,t]} = \xi_{[1,t]}^k$, where $x_t$ is viewed as a random variable which can take values $x_t^j$ with probabilities $p_j$, $j = 1, \ldots, K$. We can therefore rewrite (3.13) as
\[
x_t = E\big[x_t \mid \xi_{[1,t]}\big], \quad t = 1, \ldots, T.
\tag{3.14}
\]
This formulation of the nonanticipativity constraints can be conveniently extended to the case of a general distribution of the data $\xi_{[1,T]}$.
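Formula (3.13) also acts as a projection onto the nonanticipative decisions: replacing arbitrary scenario-wise values $x_t^k$ by the probability-weighted averages over the bundles $A_t(k)$ restores (3.11). A minimal sketch (Python with NumPy; all data are hypothetical):

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])               # scenario probabilities p_j
histories = [(36, 15), (36, 15), (36, 50)]  # xi^k_[1,t] for a fixed stage t
x = np.array([1.0, 3.0, 7.0])               # scenario-wise decisions x_t^k

x_proj = np.empty_like(x)
for k in range(len(x)):
    # A_t(k): scenarios sharing the history of scenario k up to stage t.
    bundle = [j for j in range(len(x)) if histories[j] == histories[k]]
    # Right hand side of (3.13): conditional expectation of x_t.
    x_proj[k] = p[bundle] @ x[bundle] / p[bundle].sum()

print(x_proj)  # [2.2, 2.2, 7.0]: constant on each information bundle
```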

The nonanticipativity conditions (3.14) can be analytically eliminated from the multistage model. As before, denote by $\mathcal{N}$ the set of nodes of the scenario tree (with root 1), and let $i \in \mathcal{N}$ be a node at level $t$. Recall that $B(i)$ denotes the set of scenarios passing through node $i$ and $a(i)$ denotes the ancestor of node $i$. We have that $x_t$ has to be constant for scenarios $k \in B(i)$. Let us denote the value of $x_t$ associated with node $i$ by $x^{(i)}$. Similarly, let $c^{(i)}$, $D^{(i)}$, $W^{(i)}$ and $h^{(i)}$ be the values of $c_t^k$, $A_{t,t-1}^k$, $A_{tt}^k$ and $b_t^k$, respectively, corresponding to node $i$. We can then rewrite the corresponding linear programming problem as follows
\[
\begin{array}{ll}
\min & \sum_{i \in \mathcal{N}} p^{(i)} (c^{(i)})^T x^{(i)} \\
\text{s.t.} & D^{(i)} x^{(a(i))} + W^{(i)} x^{(i)} = h^{(i)}, \quad i \in \mathcal{N} \setminus \{1\}, \\
& W^{(1)} x^{(1)} = h^{(1)}, \\
& x^{(i)} \ge 0, \quad i \in \mathcal{N}.
\end{array}
\]
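For small trees this node formulation can be assembled and passed to a generic LP solver. The sketch below (Python with SciPy) is a minimal illustration under the assumption of one scalar decision per node, so that $D^{(i)}$, $W^{(i)}$ and $h^{(i)}$ are scalars; the three-node tree and all coefficients are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical three-node tree: root 0 with children 1 and 2.
parent = [None, 0, 0]
p      = [1.0, 0.4, 0.6]     # node probabilities p^(i)
c      = [1.0, 1.0, 1.0]     # cost coefficients c^(i)
D      = [None, 1.0, 1.0]    # coefficient of x^(a(i)) in the constraint
W      = [1.0, 1.0, 1.0]     # coefficient of x^(i)
h      = [2.0, 5.0, 3.0]     # right hand sides h^(i)

n = len(parent)
A_eq = np.zeros((n, n))
b_eq = np.array(h, dtype=float)
for i in range(n):
    A_eq[i, i] = W[i]                 # W^(i) x^(i)
    if parent[i] is not None:
        A_eq[i, parent[i]] = D[i]     # D^(i) x^(a(i))

obj = np.array(p) * np.array(c)       # objective weights p^(i) c^(i)
res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
print(res.x)  # x^(0) = 2, x^(1) = 3, x^(2) = 1
```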

3.3 The general model

In the general multistage model, similarly to the linear case, we have a sequence of data vectors $\xi_1 \in \mathbb{R}^{d_1}, \xi_2 \in \mathbb{R}^{d_2}, \ldots, \xi_T \in \mathbb{R}^{d_T}$, and a sequence of decisions: $x_1 \in \mathbb{R}^{n_1}, x_2 \in \mathbb{R}^{n_2}, \ldots, x_T \in \mathbb{R}^{n_T}$. We assume that $\xi_1$ is already known and the random vectors $\xi_2, \ldots, \xi_T$ are observed at the corresponding time periods. The decision process then has the form:
\[
\text{decision } (x_1) \leadsto \text{observation } (\xi_2) \leadsto \text{decision } (x_2) \leadsto \cdots \leadsto \text{observation } (\xi_T) \leadsto \text{decision } (x_T).
\]
The values of the decision vector $x_t$, chosen at stage $t$, may depend on the information $\xi_{[1,t]}$ available up to time $t$, but not on the results of future observations. We can formulate this requirement using nonanticipativity constraints. That is, we view each $x_t = x_t(\omega)$ as an element of the space of measurable mappings from $\Omega$ to $\mathbb{R}^{n_t}$, and hence consider $x_t(\omega)$ as a random (vector valued) process in time $t$. It has to satisfy the following additional condition, called the nonanticipativity constraint,
\[
x_t = E\big[x_t \mid \xi_{[1,t]}\big], \quad t = 1, \ldots, T.
\tag{3.15}
\]

If $\mathcal{F}_t$ is the sigma algebra generated by¹³ $\xi_{[1,t]}$, then $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots \subset \mathcal{F}_T \subset \mathcal{F}$, and condition (3.15) ensures that $x_t(\omega)$ is measurable with respect to $\mathcal{F}_t$.

¹³ $\mathcal{F}_t$ is the minimal subalgebra of the sigma algebra $\mathcal{F}$ such that $\xi_1(\omega), \ldots, \xi_t(\omega)$ are $\mathcal{F}_t$-measurable. Since $\xi_1$ is not random, $\mathcal{F}_1$ contains only the two sets $\emptyset$ and $\Omega$. We can assume that $\mathcal{F}_T = \mathcal{F}$.

One can use this measurability requirement as a definition of the nonanticipativity constraint.

To describe the objective and other constraints, let us denote the decisions associated with stages $1, \ldots, t$, as before, by $x_{[1,t]} := (x_1, \ldots, x_t)$. We have the objective functional
\[
F : \mathbb{R}^{n_1 + \cdots + n_T} \times \mathbb{R}^{d_1 + \cdots + d_T} \to \mathbb{R},
\]
and constraint functionals
\[
G_{ti} : \mathbb{R}^{n_1 + \cdots + n_t} \times \mathbb{R}^{d_1 + \cdots + d_t} \to \mathbb{R}, \quad t = 2, \ldots, T, \; i = 1, \ldots, m_t.
\]
The multistage stochastic programming problem is abstractly formulated as follows
\[
\begin{array}{ll}
\min & E\big[F(x_{[1,T]}(\omega), \xi_{[1,T]}(\omega))\big] \\
\text{s.t.} & G_{ti}(x_{[1,t]}(\omega), \xi_{[1,t]}(\omega)) \le 0, \quad i = 1, \ldots, m_t, \; t = 1, \ldots, T, \\
& x_t(\omega) \in X_t, \quad t = 1, \ldots, T, \\
& x_t = E\big[x_t \mid \xi_{[1,t]}\big], \quad t = 1, \ldots, T.
\end{array}
\tag{3.16}
\]

In the above formulation $X_t$ is a convex closed subset of $\mathbb{R}^{n_t}$, and all constraints are assumed to hold almost surely.

The nested formulation can be developed similarly to the linear case. At stage $T$ we know $\xi_{[1,T]}$ and $x_{[1,T-1]}$ and we have the problem
\[
\begin{array}{ll}
\min\limits_{x_T} & F\big(x_{[1,T-1]}, x_T, \xi_{[1,T]}\big) \\
\text{s.t.} & G_{Ti}\big(x_{[1,T-1]}, x_T, \xi_{[1,T]}\big) \le 0, \quad i = 1, \ldots, m_T, \\
& x_T \in X_T.
\end{array}
\tag{3.17}
\]
Its optimal value is denoted $Q_T\big(x_{[1,T-1]}, \xi_{[1,T]}\big)$. Generally, at stage $t = T-1, \ldots, 1$ we have the problem
\[
\begin{array}{ll}
\min\limits_{x_t} & E\big[Q_{t+1}(x_{[1,t-1]}, x_t, \xi_{[1,t+1]}) \,\big|\, \xi_{[1,t]}\big] \\
\text{s.t.} & G_{ti}\big(x_{[1,t-1]}, x_t, \xi_{[1,t]}\big) \le 0, \quad i = 1, \ldots, m_t, \\
& x_t \in X_t.
\end{array}
\tag{3.18}
\]
Its optimal value is denoted $Q_t\big(x_{[1,t-1]}, \xi_{[1,t]}\big)$.
If $F$ and $G_{ti}$ are random lsc functions and the sets $X_t$ are closed and bounded, then all $Q_t$ are random lsc functions, too. This can be proved by recursively applying Theorem 20 (from the Appendix) to problems (3.18) at stages $T, T-1, \ldots, 1$. By forward induction, for $t = 1, \ldots, T$, we can also prove that each problem (3.18) has its data measurable with respect to $\mathcal{F}_t$ and has, by the measurable selection theorem (Theorem 16 in the Appendix), a solution which is $\mathcal{F}_t$-measurable (if a solution exists at all). Therefore, under natural assumptions, the multistage stochastic program (3.16) is a well defined model.

3.4 Examples of multistage models

Example 7 (Financial Planning). Suppose that there are $n$ investment opportunities, with random returns $R_t = (R_{1t}, \ldots, R_{nt})$ in time periods $t = 1, \ldots, T$. One of the possible investments is just cash. Our objective is to invest the given amount $W_0$ at time $t = 0$ so as to maximize the expected utility of our wealth at the last period $T$. The utility of wealth $W$ is represented by a concave nondecreasing function $U(W)$. In our investment strategy we are allowed to rebalance our portfolio after each period, but without injecting additional cash into it.

Let $x_{10}, \ldots, x_{n0}$ denote the initial amounts invested in assets $1, \ldots, n$ at time $t = 0$. Clearly, they have to be nonnegative and to satisfy the condition
\[
\sum_{i=1}^n x_{i0} = W_0.
\tag{3.19}
\]

We can put an equation sign here, because one of our assets is cash. After the first period, our wealth may change, due to random returns from the investments, and at time $t = 1$ it will be equal to
\[
W_1 = \sum_{i=1}^n (1 + R_{i1})\, x_{i0}.
\tag{3.20}
\]

If we stop at that time, our problem becomes the stochastic programming problem
\[
\begin{array}{ll}
\max\limits_{x_0 \in \mathbb{R}^n} & E\left[U\left(\sum_{i=1}^n (1 + R_{i1})\, x_{i0}\right)\right] \\
\text{s.t.} & \sum_{i=1}^n x_{i0} = W_0, \\
& x_{i0} \ge 0, \quad i = 1, \ldots, n.
\end{array}
\tag{3.21}
\]

In particular, if $U(W) \equiv W$, i.e., we want to maximize the expected wealth, then the objective function in the above problem (3.21) becomes $\sum_{i=1}^n (1 + E[R_{i1}])\, x_{i0}$, and hence problem (3.21) becomes a deterministic optimization program. It has the trivial optimal solution of investing everything into the asset with the maximum expected return.

Suppose, on the other hand, that $U(W)$ is defined as
\[
U(W) :=
\begin{cases}
(1+q)(W-a), & \text{if } W \ge a, \\
(1+r)(W-a), & \text{if } W \le a,
\end{cases}
\tag{3.22}
\]

with $r > q > 0$ and $a > 0$. We can view the involved parameters as follows: $a$ is the amount that we have to pay at time $t = 1$, $q$ is the interest at which we can invest the additional wealth $W - a$, provided that $W > a$, and $r$ is the interest at which we will have to borrow if $W$ is less than $a$. For the above utility function, problem (3.21) can be formulated as the following two-stage stochastic linear program
\[
\begin{array}{ll}
\max\limits_{x_0 \in \mathbb{R}^n} & E[Q(x_0, R_1)] \\
\text{s.t.} & \sum_{i=1}^n x_{i0} = W_0, \\
& x_{i0} \ge 0, \quad i = 1, \ldots, n,
\end{array}
\tag{3.23}
\]

where $Q(x_0, R_1)$ is the optimal value of the second stage program
\[
\begin{array}{ll}
\max\limits_{y, z \in \mathbb{R}} & (1+q)\, y - (1+r)\, z \\
\text{s.t.} & \sum_{i=1}^n (1 + R_{i1})\, x_{i0} = a + y - z, \\
& y \ge 0, \; z \ge 0.
\end{array}
\tag{3.24}
\]
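When $R_1$ has finitely many realizations $R^1, \ldots, R^K$ with probabilities $p_1, \ldots, p_K$, problems (3.23)–(3.24) combine into a single linear program in the variables $(x_0, y^1, z^1, \ldots, y^K, z^K)$, cf. (2.7). A minimal sketch (Python with SciPy; the two-asset, two-scenario data are hypothetical):

```python
import numpy as np
from scipy.optimize import linprog

W0, a, q, r = 100.0, 102.0, 0.02, 0.10   # hypothetical data, r > q > 0
R = np.array([[0.05, 0.01],              # R^k: return of asset i in scenario k
              [-0.03, 0.01]])
p = np.array([0.5, 0.5])
K, n = R.shape

# Variables: [x_1, ..., x_n, y_1, ..., y_K, z_1, ..., z_K].
nv = n + 2 * K
obj = np.zeros(nv)                        # minimize the negative of (3.23)
obj[n:n + K] = -p * (1 + q)               # -(1+q) E[y]
obj[n + K:]  =  p * (1 + r)               # +(1+r) E[z]

A_eq = np.zeros((1 + K, nv))
b_eq = np.zeros(1 + K)
A_eq[0, :n] = 1.0                         # sum_i x_i = W0
b_eq[0] = W0
for k in range(K):                        # sum_i (1+R_ik) x_i - y_k + z_k = a
    A_eq[1 + k, :n] = 1 + R[k]
    A_eq[1 + k, n + k] = -1.0
    A_eq[1 + k, n + K + k] = 1.0
    b_eq[1 + k] = a

res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * nv)
print(res.x[:n], -res.fun)                # optimal x_0 and expected utility
```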

Suppose now that $T > 1$. In that case we can rebalance the portfolio at time $t = 1$, by specifying the amounts $x_{11}, \ldots, x_{n1}$ invested in the assets in the second period. Note that we already know the actual returns in the first period, so it is reasonable to use this information in the rebalancing decisions. Thus, our second stage decisions are actually functions of $R_1 = (R_{11}, \ldots, R_{n1})$, and they can be written as $x_{11}(R_1), \ldots, x_{n1}(R_1)$. We must also keep in mind the balance of wealth:
\[
\sum_{i=1}^n x_{i1}(R_1) = W_1
\tag{3.25}
\]

and the condition of nonnegativity. In general, the wealth after period $t$ is equal to
\[
W_t = \sum_{i=1}^n (1 + R_{it})\, x_{i,t-1}(R_{[1,t-1]}),
\tag{3.26}
\]
where $R_{[1,t]} := (R_1, \ldots, R_t)$.

Our next decisions, $x_{1t}, \ldots, x_{nt}$, may depend on $R_1, \ldots, R_t$. They have to be nonnegative and satisfy the balance constraint
\[
\sum_{i=1}^n x_{it}(R_{[1,t]}) = W_t.
\tag{3.27}
\]
At the end, the wealth after period $T$ is
\[
W_T = \sum_{i=1}^n (1 + R_{iT})\, x_{i,T-1}(R_{[1,T-1]}).
\tag{3.28}
\]
Our objective is to maximize the expected utility of this wealth,
\[
\max \; E[U(W_T)].
\tag{3.29}
\]

It is a multistage stochastic programming problem, where stages are numbered from $t = 0$ to $t = T-1$, and the decisions $x_t$ at each stage are allowed to depend on the history $R_1, \ldots, R_t$ of returns prior to this stage.

Of course, in order to complete the description of the above multistage stochastic programming problem, we need to define the probability structure of the random process $R_1, \ldots, R_T$. This can be done in many different ways. For example, one can construct a particular scenario tree defining the time evolution of the process. If at every stage the random return of each asset is allowed to have just two continuations, independently of the other assets, then the total number of scenarios is $2^{nT}$. It should also be ensured that $1 + R_{it} > 0$, $i = 1, \ldots, n$, $t = 1, \ldots, T$, for all possible realizations of the random data.
Let us consider the above multistage problem backwards, as discussed in Section 3.1. At the last stage $t = T-1$ all realizations of the random process $R_1, \ldots, R_{T-1}$ are known and $x_{T-2}$ has been chosen. Therefore, we have to solve the problem
\[
\begin{array}{ll}
\max & E\left\{ U\left[\sum_{i=1}^n (1 + R_{iT})\, x_{i,T-1}\right] \,\Big|\, R_{[1,T-1]} \right\} \\
\text{s.t.} & \sum_{i=1}^n x_{i,T-1} = \sum_{i=1}^n (1 + R_{i,T-1})\, x_{i,T-2}, \\
& x_{i,T-1} \ge 0, \quad i = 1, \ldots, n.
\end{array}
\tag{3.30}
\]

Its optimal value is denoted $Q_{T-1}(x_{T-2}, R_{[1,T-1]})$. At stage $t = T-2$ realizations of the random process $R_1, \ldots, R_{T-2}$ are known and $x_{T-3}$ has been chosen. We then have to solve the following two-stage stochastic program
\[
\begin{array}{ll}
\max & E\big[Q_{T-1}(x_{T-2}, R_{[1,T-1]}) \,\big|\, R_{[1,T-2]}\big] \\
\text{s.t.} & \sum_{i=1}^n x_{i,T-2} = \sum_{i=1}^n (1 + R_{i,T-2})\, x_{i,T-3}, \\
& x_{i,T-2} \ge 0, \quad i = 1, \ldots, n.
\end{array}
\tag{3.31}
\]
Its optimal value is denoted $Q_{T-2}(x_{T-3}, R_{[1,T-2]})$, etc. At stage $t = 0$ we have to solve the following program
\[
\begin{array}{ll}
\max & E[Q_1(x_0, R_1)] \\
\text{s.t.} & \sum_{i=1}^n x_{i0} = W_0, \\
& x_{i0} \ge 0, \quad i = 1, \ldots, n.
\end{array}
\tag{3.32}
\]

Note that in the present case the cost-to-go function $Q_{T-1}(x_{T-2}, R_{[1,T-1]})$ depends on $x_{T-2} = (x_{1,T-2}, \ldots, x_{n,T-2})$ only through $W_{T-1} = \sum_{i=1}^n (1 + R_{i,T-1})\, x_{i,T-2}$. That is, if $\tilde{Q}_{T-1}(W_{T-1}, R_{[1,T-1]})$ is defined as the optimal value of the problem
\[
\begin{array}{ll}
\max & E\left\{ U\left[\sum_{i=1}^n (1 + R_{iT})\, x_{i,T-1}\right] \,\Big|\, R_{[1,T-1]} \right\} \\
\text{s.t.} & \sum_{i=1}^n x_{i,T-1} = W_{T-1}, \quad x_{i,T-1} \ge 0, \; i = 1, \ldots, n,
\end{array}
\tag{3.33}
\]
then
\[
Q_{T-1}(x_{T-2}, R_{[1,T-1]}) = \tilde{Q}_{T-1}\left(\sum_{i=1}^n (1 + R_{i,T-1})\, x_{i,T-2}, \; R_{[1,T-1]}\right).
\]

Similarly, $Q_{T-2}(x_{T-3}, R_{[1,T-2]})$ depends on $x_{T-3}$ only through $W_{T-2}$, and so on.
We may also note that the need for multistage modeling occurs here mainly because of the nonlinearity of the utility function $U(\cdot)$. Indeed, if $U(W) \equiv W$, and the returns in different stages are independent random vectors, it is sufficient to maximize the expected wealth after each period, in a completely myopic fashion, by solving for $t = 0, \ldots, T-1$ the single stage models
\[
\begin{array}{ll}
\max\limits_{x_t} & E\left[\sum_{i=1}^n (1 + R_{i,t+1})\, x_{it} \,\Big|\, R_{[1,t]}\right] \\
\text{s.t.} & \sum_{i=1}^n x_{it} = W_t, \quad x_t \ge 0,
\end{array}
\tag{3.34}
\]
where $W_t$ and $R_1, \ldots, R_t$ are already known. This, in turn, becomes a deterministic model with the objective coefficients
\[
\mu_{it}(R_{[1,t]}) := 1 + E\big[R_{i,t+1} \mid R_{[1,t]}\big].
\]

Such a model has a trivial optimal solution of investing everything in the


asset with the maximum expected return in the next period.
A more realistic situation occurs in the presence of transaction costs. These
are losses associated with the changes in the numbers of units (stocks, bonds)
held. In such a situation multistage modeling is necessary, too, even if we use
the expected wealth objective.
Let us now observe that the above problem can also be modeled as a $T$-period two-stage problem. To that end suppose that one makes a decision at the beginning of the process without thinking of rebalancing the portfolio. That is, our decision variables are the initial amounts $x_1, \ldots, x_n$ invested in assets $1, \ldots, n$ at time $t = 0$. After $T$ periods of time each asset $i$ will be worth $\big[\prod_{t=1}^T (1 + R_{it})\big] x_i$, and hence the total wealth will be $\sum_{i=1}^n \big[\prod_{t=1}^T (1 + R_{it})\big] x_i$. The corresponding stochastic program can then be written as follows
\[
\begin{array}{ll}
\max\limits_{x \in \mathbb{R}^n} & E\left\{ U\left(\sum_{i=1}^n \left[\prod_{t=1}^T (1 + R_{it})\right] x_i\right) \right\} \\
\text{s.t.} & \sum_{i=1}^n x_i = W_0, \\
& x_i \ge 0, \quad i = 1, \ldots, n.
\end{array}
\tag{3.35}
\]

Problem (3.35) is a two-stage stochastic program. It gives an extension of the two-stage problem (3.21) to $T$ periods of time. If the utility function is given in the form (3.22), then problem (3.35) can be formulated as a linear two-stage stochastic program in a way similar to (3.23)–(3.24).

The difference between the two-stage (3.35) and multistage (3.29) programs is that in the two-stage model the value $x_{it}$ of asset $i$ at time $t$ is defined by the recursive equation¹⁴ $x_{it} = (1 + R_{it})\, x_{i,t-1}$, which implies that $x_{it} = \big[\prod_{s=1}^t (1 + R_{is})\big] x_{i0}$. Consequently, $x_{it}$ is completely determined by the initial value $x_{i0} = x_i$ and a realization of the random process $R_{i1}, \ldots, R_{it}$. On the other hand, in the multistage model the values $x_{it}$ are rebalanced at every period of time subject to the constraints (3.26)–(3.27). Therefore the multistage problem (3.29) can be viewed as a relaxation of the two-stage problem (3.35), and hence has a larger optimal value.

We discuss this example further in section ‘‘An Example of Financial Planning’’ of chapter ‘‘Monte Carlo Sampling Methods’’.

¹⁴ This defines an implementable and feasible policy for the multistage problem (3.29); see section ‘‘Multistage Models’’ of chapter ‘‘Optimality and Quality in Stochastic Programming’’ for the definition of implementable and feasible policies.
The following example also demonstrates that in some cases the same
practical problem can be modeled as a multistage or two-stage multiperiod
program.

Example 8 (Queueing Process). Consider a stochastic process $I_t$, $t = 1, 2, \ldots$, governed by the recursive equation
\[
I_t = \big[I_{t-1} + x_t - D_t\big]_+,
\tag{3.36}
\]
with initial value $I_0$. Here the $D_t$ are random numbers and the $x_t$ represent decision variables. The above process $I_t$ can describe the waiting time of the $t$-th customer in a $G/G/1$ queue, where $D_t$ is the interarrival time between the $(t-1)$-th and $t$-th customers and $x_t$ is the service time of the $(t-1)$-th customer. Alternatively, we may view $I_t$ as the inventory of a certain product at time $t$, with $D_t$ and $x_t$ representing the demand and production (or reordering), respectively, of the product at time $t$. Equation (3.36) assumes that the excess demand (over $I_{t-1} + x_t$) is not backordered, but simply lost.
Suppose that the process is considered over a finite horizon at periods $t = 1, \ldots, T$. Our goal then is to minimize (or maximize) the expected value of an objective function involving $I_1, \ldots, I_T$. For instance, one may be interested in maximizing a profit which at time $t$ is given by $c_t \min[I_{t-1} + x_t, D_t] - h_t I_t$, where $c_t$ and $h_t$ are positive parameters representing the marginal profit and the holding cost, respectively, of the product at period $t$. The negative of the total profit is then given by
\[
F(x, D) := \sum_{t=1}^T \big( h_t I_t - c_t \min[I_{t-1} + x_t, D_t] \big).
\]

Here $x = (x_1, \ldots, x_T)$ is a vector of decision variables and $D = (D_1, \ldots, D_T)$ is a random vector of the demands at periods $t = 1, \ldots, T$. By using the recursive equation (3.36) it is straightforward to show that $F(x, D)$ can also be written in the form
\[
F(x, D) = \sum_{t=1}^T q_t I_t - \sum_{t=1}^T c_t x_t - c_1 I_0,
\]
where $q_t := h_t - c_{t+1} + c_t$, $t = 1, \ldots, T-1$, and $q_T := c_T + h_T$. We assume that all numbers $q_t$ are positive; this certainly holds if $c_1 = \cdots = c_T$. By (3.36) we have that $I_t$ is a convex function of $x_1, \ldots, x_t$. Since the $q_t$ are positive, it follows that the function $F(\cdot, D)$ is convex for any realization of $D$.
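Before turning to optimization it is instructive to evaluate $E[F(x, D)]$ by simulation. The sketch below (Python with NumPy; the horizon, cost parameters and the Poisson demand model are hypothetical choices) runs the recursion (3.36) and averages the negative profit over sampled demand paths:

```python
import numpy as np

rng = np.random.default_rng(0)
T, I0 = 4, 5.0
c = np.array([4.0, 4.0, 4.0, 4.0])       # marginal profits c_t (hypothetical)
h = np.array([1.0, 1.0, 1.0, 1.0])       # holding costs h_t

def F(x, D):
    """Negative total profit for one demand path D, via recursion (3.36)."""
    I, val = I0, 0.0
    for t in range(T):
        sold = min(I + x[t], D[t])        # c_t * min[I_{t-1} + x_t, D_t]
        I = max(I + x[t] - D[t], 0.0)     # I_t = [I_{t-1} + x_t - D_t]_+
        val += h[t] * I - c[t] * sold
    return val

x = np.array([3.0, 3.0, 3.0, 3.0])        # a fixed production plan
D = rng.poisson(3.0, size=(20_000, T))    # sampled demand paths
print(np.mean([F(x, d) for d in D]))      # Monte Carlo estimate of E[F(x, D)]
```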
We can formulate a corresponding stochastic programming problem in several ways. First, suppose that the production cannot be changed during the process as realizations of the demands become known. That is, a decision about the production quantities $x_1, \ldots, x_T$ should be made before any realization of the demands $D_1, \ldots, D_T$ is available, and is not changed at times $t = 1, \ldots, T$. This leads to the problem of minimizing the expectation $E[F(x, D)]$, which is taken with respect to the probability distribution of the random vector $D$. Although we have here a multiperiod process, the above formulation can be viewed as a two-stage problem. In fact, it can be formulated as a linear two-stage stochastic program as follows:¹⁵
\[
\min_{x \ge 0} \; \big\{ -c^T x + E[Q(x, D)] \big\},
\tag{3.37}
\]

¹⁵ Since $I_0$ does not depend on $x$, the term $c_1 I_0$ is omitted.

where $c = (c_1, \ldots, c_T)$ and $Q(x, D)$ is the optimal value of the problem
\[
\begin{array}{ll}
\min\limits_{y \ge 0} & \sum_{t=1}^T q_t y_t \\
\text{s.t.} & y_{t-1} + x_t - D_t \le y_t, \quad t = 1, \ldots, T, \\
& y_0 = I_0.
\end{array}
\tag{3.38}
\]
Note that $I_t = I_t(x, D)$ is equal to $y_t^*$, $t = 1, \ldots, T$, where $y^*$ is the optimal solution of (3.38).
Suppose now that the random vector $D$ can take a finite number of realizations (scenarios) $D^1, \ldots, D^K$ with the corresponding probabilities $p_1, \ldots, p_K$. For example, if the components $D_t$ of the demand vector form a Markov chain with $m$ possible realizations at each period, then the total number of scenarios is $K = m^T$. We can then write the two-stage problem (3.37)–(3.38) as the linear problem (compare with (2.7)):
\[
\begin{array}{ll}
\min & -\sum_{t=1}^T c_t x_t + \sum_{k=1}^K p_k \left(\sum_{t=1}^T q_t y_t^k\right) \\
\text{s.t.} & y_{t-1}^k + x_t - D_t^k \le y_t^k, \quad t = 1, \ldots, T, \\
& x_t \ge 0, \; y_0^k = I_0, \; y_t^k \ge 0, \quad t = 1, \ldots, T, \; k = 1, \ldots, K.
\end{array}
\tag{3.39}
\]

Note that the optimal values of $y_t^k$ in (3.39) represent $I_t(x, D^k)$. Since $I_t(x, D^k)$ depends only on the realization $D^k$ up to time $t$, the nonanticipativity constraints with respect to $y_t^k$ hold in (3.39) automatically.

On the other hand, depending on the flexibility of the production process, one can update the production quantities at every time period $t = 1, \ldots, T$ using the known realizations of the demand up to time $t$. This can be formulated as a multistage stochastic program where an optimal decision is made at every period of time based on available realizations of the random data. Consider the following relaxation of (3.39):
\[
\begin{array}{ll}
\min & \sum_{k=1}^K p_k \left[\sum_{t=1}^T \big(q_t y_t^k - c_t x_t^k\big)\right] \\
\text{s.t.} & y_{t-1}^k + x_t^k - D_t^k \le y_t^k, \quad t = 1, \ldots, T, \\
& x_t^k \ge 0, \; y_0^k = I_0, \; y_t^k \ge 0, \quad t = 1, \ldots, T, \; k = 1, \ldots, K.
\end{array}
\tag{3.40}
\]
By adding to the above problem (3.40) the nonanticipativity constraints associated with the scenario tree of the considered $T$-period process, we obtain the linear programming formulation of the corresponding multistage stochastic program.

Example 9 (Trucking). A trucking company serves $n$ locations. For simplicity we assume that it takes exactly one day for a truck to go from one location to another, independently of whether it is loaded or not. At the beginning of each day $t$, the company observes for each pair of locations, $i$ and $j$, a random demand $D_{ijt}$ for cargo to be shipped from $i$ to $j$ on day $t$. If they have a sufficient number of trucks at location $i$ at this moment, they may take an order and ship the cargo. The revenue for shipping a unit of cargo from $i$ to $j$ is $q_{ij}$. The part of the demand that is not served is simply lost, and it does not result in any revenue or cost. It is important to stress that the numbers of trucks at different locations result from earlier decisions of moving the trucks and are therefore parts of the policy. The company may also move empty trucks between different locations (in anticipation of strong demand somewhere else). The cost of moving one unit of capacity from $i$ to $j$ is $c_{ij}$, independently of whether it is loaded or empty (this is not a simplification, because we can always adjust the $q_{ij}$'s). Currently, the company has the capacity $r_{i0}$ at each location $i$. Their objective is to maximize the expected profit over the next $T$ days.
We recognize this problem as a multistage stochastic programming problem. With each day (stage) $t = 1, \ldots, T$ we associate the following decision variables:

$y_{ijt}$ - the total capacity moved from $i$ to $j$, where $i, j = 1, \ldots, n$,
$z_{ijt}$ - the amount of cargo moved from $i$ to $j$, where $i, j = 1, \ldots, n$,
$r_{it}$ - the capacity available at $i$ at the end of day $t$, where $i = 1, \ldots, n$.

Note that $D_{iit} = 0$ and $z_{iit} = 0$ by definition, and $y_{iit}$ is the capacity waiting at $i$ for the next day.
The problem takes on the form
\[
\begin{array}{ll}
\max\limits_{y, z, r} & E\left[\sum_{t=1}^T \sum_{i,j=1}^n (q_{ij} z_{ijt} - c_{ij} y_{ijt})\right] \\
\text{s.t.} & z_{ijt} \le D_{ijt}, \quad i, j = 1, \ldots, n, \; t = 1, \ldots, T, \\
& z_{ijt} \le y_{ijt}, \quad i, j = 1, \ldots, n, \; t = 1, \ldots, T, \\
& r_{i,t-1} + \sum_{k=1}^n y_{kit} - \sum_{j=1}^n y_{ijt} = r_{it}, \quad i = 1, \ldots, n, \; t = 1, \ldots, T, \\
& r \ge 0, \; y \ge 0, \; z \ge 0.
\end{array}
\tag{3.41}
\]

In a more refined version we may want to put some additional constraints on the capacity $r_{iT}$ available at each location $i$ at the end of the planning period.

In the above problem the demand $D(t) = [D_{ij}(t)]_{i,j = 1,\ldots,n}$ is a random vector valued process. The decisions $y_{ijt}$ and $z_{ijt}$ and the resulting numbers of trucks $r_{it}$ at different locations may depend on all past and current demand values $D(\tau)$, $\tau \le t$, but not on the future values of the demand vector. Therefore, at stage $t$, we cannot exactly predict how many trucks we shall need at each location at stage $t+1$; we can only use past data and our knowledge of the joint distribution of all demands to re-position our truck fleet. For a specified scenario tree of the demand process $D(t)$, the corresponding multistage problem can be written as a large linear program.

3.5 Relations to dynamic programming

There exist close relations between multistage stochastic programming models and the classical models of dynamic programming and optimal control. To illustrate these relations, consider the linear dynamical system described by the state equation
\[
s_{t+1} = A_t s_t + B_t u_t + C_t e_t, \quad t = 1, \ldots, T,
\]
in which $s_t$ denotes the state of the system at time $t$, $u_t$ is the control vector, and $e_t$ is a random ‘disturbance’ at time $t$. The matrices $A_t$, $B_t$ and $C_t$ are known. The random vectors $e_t$, $t = 1, \ldots, T$, are assumed to be independent. At time $t$ we observe the current state value, $s_t$, but not the disturbance $e_t$. Our objective is to find a control law, $\hat{u}_t(\cdot)$, $t = 1, \ldots, T$, so that the actual values of the control variables can be determined through the feedback rule:
\[
u_t = \hat{u}_t(s_t), \quad t = 1, \ldots, T-1.
\]

We want to do it in such a way that the expected value of the performance index,
\[
E\left[\sum_{t=1}^{T-1} F_t(s_t, u_t) + F_T(s_T)\right],
\]
is minimized. In a more involved formulation, there may be additional constraints on the control variables, or mixed state–control constraints:
\[
g_{ti}(s_t, u_t) \le 0, \quad i = 1, \ldots, m_t, \; t = 1, \ldots, T-1.
\]
For the sake of simplicity we assume that they are all incorporated into the definition of the partial objectives, that is, $F_t(s_t, u_t) = +\infty$ if these constraints are not satisfied.
A crucial characteristic of the optimal control model is that we look for a solution in the form of a function of the state vector. We are allowed to focus on such a special form of the control rule due to the independence of the disturbances at different stages. If the disturbances are dependent in certain ways, augmentation of the state space may reduce the model to the case of independent $e_t$'s.

The key role in optimal control theory is played by the cost-to-go function
\[
V_t(s_t) := \inf E\left[\sum_{\tau=t}^{T-1} F_\tau(s_\tau, u_\tau) + F_T(s_T)\right],
\]

where the minimization is carried out among all possible feedback laws applied at stages $t, \ldots, T-1$. The functions $V_t(\cdot)$ satisfy the dynamic programming equation:
\[
V_t(s_t) = \inf_{u_t} \big( F_t(s_t, u_t) + E\big[V_{t+1}(A_t s_t + B_t u_t + C_t e_t)\big] \big), \quad t = T-1, \ldots, 1.
\]
The optimal feedback rule is the minimizer of the above expression.


Except for very special cases, such as linear–quadratic or time optimal control, the form of the optimal feedback rule may be very involved. Usually, some functional form of the rule is assumed and parametric optimization is employed to find the best rule within a chosen class. Discretization of the state space is a common approach here.
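A minimal sketch of such a discretized backward recursion (Python with NumPy; the scalar dynamics, stage costs, grids and disturbance values are hypothetical placeholders, and grid interpolation is just one possible discretization choice):

```python
import numpy as np

T = 5
S = np.linspace(-2.0, 2.0, 41)             # discretized state grid
U = np.linspace(-1.0, 1.0, 21)             # discretized control grid
E = np.array([-0.1, 0.0, 0.1])             # disturbance values, equally likely

def step(s, u, e):                          # s_{t+1} = A s + B u + C e (scalars)
    return 0.9 * s + u + e

def cost(s, u):                             # stage cost F_t(s, u)
    return s**2 + 0.1 * u**2

V = S**2                                    # terminal cost V_T(s) = F_T(s)
policy = []
for t in range(T - 1, 0, -1):               # backward in time
    Vnew = np.empty_like(V)
    best_u = np.empty_like(V)
    for i, s in enumerate(S):
        # E[V_{t+1}(A s + B u + C e)] via interpolation on the grid
        # (np.interp clamps states leaving the grid to its endpoints)
        q = [cost(s, u) + np.mean(np.interp(step(s, u, E), S, V)) for u in U]
        j = int(np.argmin(q))
        Vnew[i], best_u[i] = q[j], U[j]
    V = Vnew
    policy.append(best_u)                   # feedback rule u_t on the grid
print(V[len(S) // 2])                       # cost-to-go at s_1 = 0
```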
To transform the above model into a stochastic programming model we just need to make the substitutions:
\[
x_t = (u_t, s_t), \quad t = 1, \ldots, T-1, \qquad x_T = s_T, \qquad \xi_t = C_{t-1} e_{t-1}, \quad t = 2, \ldots, T.
\]
The function $V_t(\cdot)$ can be formally expressed as the optimal value of
\[
\begin{array}{ll}
\min\limits_{s_t, u_t} & F_t(s_t, u_t) + E\big[V_{t+1}(A_t s_t + B_t u_t + C_t e_t)\big] \\
\text{s.t.} & s_t = A_{t-1} s_{t-1} + B_{t-1} u_{t-1} + \xi_t.
\end{array}
\]
Thus, we can define
\[
Q_t(s_{t-1}, u_{t-1}, \xi_t) = V_t(A_{t-1} s_{t-1} + B_{t-1} u_{t-1} + \xi_t)
\]
to perfectly match both models.


The opposite is also true. A multistage stochastic programming model with state variables and independent random parameters $\xi_t$ can be transformed into a control problem, as in the following example.

Example 10 (Trucking (continued)). Let us consider Example 9 in which the demand vectors $D_{ijt}$, $i, j = 1, \ldots, n$, are independent for $t = 1, \ldots, T$. We can formally define:
\[
s_t := [r_{t-1}, D_t], \quad u_t := [y_t, z_t], \quad e_t := D_{t+1}.
\]
Then the next state $s_{t+1}$ is a function of $s_t$, $u_t$ and $e_t$:
\[
r_{it} = r_{i,t-1} + \sum_{k=1}^n y_{kit} - \sum_{j=1}^n y_{ijt}, \qquad D_{t+1} = e_t.
\]
At each time $t = 1, \ldots, T$ we have the mixed state–control constraints:
\[
z_{ijt} \le D_{ijt}, \quad z_{ijt} \le y_{ijt}, \quad i, j = 1, \ldots, n.
\]
The objective functional has the form:
\[
F_t(s_t, u_t) = -\sum_{i,j=1}^n (q_{ij} z_{ijt} - c_{ij} y_{ijt}),
\]
and depends on the controls alone. So, if the demands on different days are independent, the optimal solution has the form of a feedback rule:
\[
y_t = \hat{y}_t(r_{t-1}, D_t), \qquad z_t = \hat{z}_t(r_{t-1}, D_t).
\]
The form of these functions is rather involved, though.


As we shall see later, the stochastic programming formulation tries to exploit as much as possible some advantageous properties of the functions $V_t(\cdot)$ or $Q_t(\cdot)$, such as convexity or polyhedral structure, which are hard to exploit in the dynamic programming setting. Also, the stochastic programming model does not assume independence of the random disturbances. It does require, though, in the scenario tree formulation, a discretization of the distributions of the disturbances.

4 Robust and min–max approaches to stochastic optimization

4.1 Robust models

Consider the two-stage stochastic linear program (2.2)–(2.3). In that problem the optimal value $Q(x, \xi(\omega))$ of the second stage problem is optimized on average. Of course, for a particular realization $\xi$ of the random data $\xi(\omega)$ the corresponding value $Q(x, \xi)$ can be quite different from the expected value $E[Q(x, \xi(\omega))]$. An ‘‘unlucky’’ realization of $\xi(\omega)$ may have disastrous consequences for the user of stochastic programming. For instance, in Example 1 the newsvendor may lose all his savings on an unlucky day, so that he will have to borrow from the mob at murderous interest to continue his business the next day. In order to avoid such disastrous consequences one may try to be more conservative and to reach a compromise between the average (i.e., the mean) and the risk associated with the variability of $Q(x, \xi)$. It then appears natural to add the term $\lambda \operatorname{Var}[Q(x, \xi)]$ to the objective of the optimization problem, where the coefficient $\lambda \ge 0$ represents a compromise between the expectation and variability of the objective. Unfortunately, this destroys two important properties of the two-stage linear program (2.2)–(2.3), namely its convexity and second stage optimality.
In order to see this, let us suppose for the sake of simplicity that there is a finite number of scenarios and hence the problem can be formulated in the form (2.7). By adding the term $\lambda \operatorname{Var}[Q(x, \xi)]$ to the objective function in (2.2) we obtain the problem
\[
\begin{array}{ll}
\min\limits_x & c^T x + \varrho\big(Q(x, \xi_1), \ldots, Q(x, \xi_K)\big) \\
\text{s.t.} & Ax = b, \; x \ge 0,
\end{array}
\tag{4.1}
\]
where
\[
\varrho(z) := \sum_{k=1}^K p_k z_k + \lambda \left[\sum_{k=1}^K p_k z_k^2 - \left(\sum_{k=1}^K p_k z_k\right)^2\right].
\]

Now for $\lambda > 0$ the objective function of the above problem is not necessarily convex, even though the functions $Q(\cdot, \xi_i)$, $i = 1, \ldots, K$, are all convex, and second stage optimality does not hold in the sense that problem (4.1) is not equivalent to the problem
\[
\begin{array}{ll}
\min\limits_{x, y_1, \ldots, y_K} & c^T x + \varrho\big(q_1^T y_1, \ldots, q_K^T y_K\big) \\
\text{s.t.} & Ax = b, \\
& T_k x + W_k y_k = h_k, \\
& x \ge 0, \; y_k \ge 0, \quad k = 1, \ldots, K.
\end{array}
\tag{4.2}
\]
In order to preserve the property of second stage optimality we may change the function $\varrho(z)$ to a componentwise nondecreasing function. Recall that a function $\varrho : \mathbb{R}^K \to \mathbb{R}$ is said to be componentwise nondecreasing if $\varrho(z) \le \varrho(z')$ for any $z, z' \in \mathbb{R}^K$ such that $z \le z'$.

Proposition 11. Suppose that problem (4.2) is feasible and the function $\varrho(z)$ is componentwise nondecreasing. Then problems (4.1) and (4.2) have the same optimal value, and if, moreover, problem (4.2) has an optimal solution, then problems (4.1) and (4.2) have the same set of first stage optimal solutions.

Proof. Since (4.2) is feasible, it follows that there exists a feasible $x$ such that all $Q(x, \xi_k)$, $k = 1, \ldots, K$, are less than $+\infty$, and hence the optimal value of problem (4.1) is also less than $+\infty$. By (2.6) we have that $Q(x, \xi_k)$ is given by the optimal value of a linear programming problem. Therefore, if $Q(x, \xi_k)$ is finite, then the corresponding linear programming problem has an optimal solution. It follows that if all $Q(x, \xi_k)$ are finite, then $\varrho(Q(x, \xi_1), \ldots, Q(x, \xi_K))$ is equal to $\varrho(q_1^T y_1, \ldots, q_K^T y_K)$ for some $y_k$, $k = 1, \ldots, K$, satisfying the constraints of problem (4.2), and hence the optimal value of (4.1) is greater than or equal to the optimal value of (4.2). Conversely, for a given $x$, $Q(x, \xi_k)$ is less than or equal to $q_k^T y_k$, $k = 1, \ldots, K$, for any $y_1, \ldots, y_K$ feasible for (4.2). Since $\varrho(z)$ is componentwise nondecreasing, it follows that the optimal value of (4.2) is greater than or equal to the optimal value of (4.1), and hence these two optimal values are equal to each other. Moreover, it follows that if $x^*, y_1^*, \ldots, y_K^*$ is an optimal solution of problem (4.2), then $x^*$ is an optimal solution of problem (4.1), and vice versa. □

We also have that if $\varrho(z)$ is componentwise nondecreasing and convex, then, since the functions $Q(\cdot, \xi_k)$, $k = 1, \ldots, K$, are convex, the corresponding composite function, and hence the objective function of problem (4.1), are convex.

Of course, for $\varrho(z) := \sum_{k=1}^K p_k z_k$ problem (4.2) coincides with the two-stage linear problem (2.7). Another possibility is to use a separable function $\varrho(z) = \sum_{k=1}^K \varrho_k(z_k)$ with one of the following two choices of the functions $\varrho_k$:
\[
\varrho_k(z_k) := p_k z_k + \lambda p_k (z_k - \eta)_+,
\tag{4.3}
\]
\[
\varrho_k(z_k) := p_k z_k + \lambda p_k \big[(z_k - \eta)_+\big]^2,
\tag{4.4}
\]

for some $\lambda \ge 0$ and $\eta \in \mathbb{R}$. Note that for both of the above choices of $\varrho_k$, the corresponding function $\varrho(z)$ is componentwise nondecreasing and convex. If the parameter $\eta$ in (4.4) is equal to $E[Q(x, \xi)]$ and the distribution of $Q(x, \xi)$ is symmetric around its mean, then
\[
\varrho\big(Q(x, \xi_1), \ldots, Q(x, \xi_K)\big) = E[Q(x, \xi)] + (\lambda/2) \operatorname{Var}[Q(x, \xi)].
\]
Of course, the mean (expected value) of $Q(x, \xi)$ depends on $x$; in practical applications it would have to be iteratively adjusted during an optimization procedure. An advantage of using the $\varrho_k$ given in (4.3) is that then the function $\varrho(z)$ is piecewise linear, and hence (4.2) can be formulated as a linear programming problem. The above approach to stochastic programming is called robust by some authors.
The model (4.2) with (4.3) or (4.4) is an example of a mean–risk model. For a random outcome $F(x, \xi)$, these models use an objective which is composed of two parts: the expected outcome (the mean) $E[F(x, \xi)]$, and a scalar composite measure of the size and frequency of undesirable outcome values, the risk $\varrho(F(x, \xi))$. The risk measure $\varrho(Z)$ is understood here as a function of the entire distribution of the random variable $Z$. For example, our formulas (4.3) and (4.4) correspond to the risk measures
\[
\varrho_1(Z; \eta) := E[(Z - \eta)_+] \quad \text{and} \quad \varrho_2(Z; \eta) := E\big[\big((Z - \eta)_+\big)^2\big],
\]
respectively, which represent the expected excess (or squared excess) over the target level $\eta$. More sophisticated are the semideviation measures, which use, instead of the fixed target level $\eta$, the expected value of the random outcome. The simplest and most convenient in applications is the absolute semideviation:
\[
\sigma_1(Z) := E[(Z - EZ)_+].
\tag{4.5}
\]

The presence of the expected value of the outcome in the definition of the measure makes the resulting risk term
\[
\sigma_1(F(x, \xi)) = E\big[\big(F(x, \xi) - E[F(x, \xi)]\big)_+\big]
\]
a nonconvex function of $x$, even if $F(\cdot, \xi)$ is convex. Nevertheless, the corresponding mean–risk model
\[
\min \; E[F(x, \xi)] + \lambda E\big[\big(F(x, \xi) - E[F(x, \xi)]\big)_+\big]
\]
remains a convex problem, provided that the coefficient $\lambda$ in front of the risk term is confined to $[0, 1]$. This can be seen from the representation:
\[
E[F(x, \xi)] + \lambda E\big[\big(F(x, \xi) - E[F(x, \xi)]\big)_+\big]
= (1 - \lambda) E[F(x, \xi)] + \lambda E\big[\max\{E[F(x, \xi)], F(x, \xi)\}\big],
\]
in which the convexity of all terms is evident.



Example 12. Let us consider Example 3 again, but instead of bounding our risk of loss by the probabilistic constraint $P\big\{\sum_{i=1}^n R_i x_i \ge b\big\} \ge 1 - \alpha$, let us modify the objective by subtracting a risk measure
\[
\varrho\left(\sum_{i=1}^n R_i x_i\right).
\tag{4.6}
\]
For example, similarly to (4.3), we may use
\[
\varrho(Z) := \lambda E[(\eta - Z)_+],
\tag{4.7}
\]
in which case (4.6) represents the expected shortfall below some target profit level $\eta$. If $\eta = b < 0$, our measure represents the expected loss in excess of $b$. Supposing that our initial capital (wealth) is $W$, we may formulate the following mean–risk optimization problem
\[
\begin{array}{ll}
\max\limits_{x \ge 0} & \sum_{i=1}^n \mu_i x_i - \varrho\left(\sum_{i=1}^n R_i x_i\right) \\
\text{s.t.} & \sum_{i=1}^n x_i \le W.
\end{array}
\tag{4.8}
\]

Problems of this type are usually solved as a family parametrized by $\lambda \ge 0$. Their solutions can be graphically depicted in the form of the efficient frontier: the collection of mean–risk pairs corresponding to the optimal solutions of (4.8) for all $\lambda \ge 0$.

If the risk measure (4.7) is used, the term $\lambda E\big[\big(\eta - \sum_{i=1}^n R_i x_i\big)_+\big]$ can be interpreted as the expected cost of a loan to cover the shortfall below $\eta$, where $\lambda$ is the interest rate. In this case problem (4.8) has a convenient linear programming formulation, provided that the distribution of the returns is discrete. It is very similar to the model for the semideviation risk measure discussed below.

As we deal with a maximization problem, the semideviation risk measure (4.5) should be modified to represent the shortfall below the mean:
\[
\sigma_1(Z) := E[(EZ - Z)_+].
\]

Then the mean–risk model (4.8) takes on the form
\[
\begin{array}{ll}
\max\limits_{x \ge 0} & (1 - \lambda) \sum_{i=1}^n \mu_i x_i + \lambda E\left[\min\left(\sum_{i=1}^n \mu_i x_i, \; \sum_{i=1}^n R_i x_i\right)\right] \\
\text{s.t.} & \sum_{i=1}^n x_i \le W.
\end{array}
\]

For a discrete distribution of $R$ we can convert the above mean–risk model into a linear programming problem. Indeed, let $k = 1, \ldots, K$ denote the scenarios, and let $R_{ik}$ be the realization of the return of security $i$ in scenario $k$. The probabilities of the scenarios are $p_1, \ldots, p_K$, with $\sum_{k=1}^K p_k = 1$. Introducing new variables $\mu$ (representing the mean) and $r_k$, $k = 1, \ldots, K$ (representing the worst case of the return and its mean), we obtain the problem
\[
\begin{array}{ll}
\max\limits_{x \ge 0, \, \mu, \, r} & (1 - \lambda)\, \mu + \lambda \sum_{k=1}^K p_k r_k \\
\text{s.t.} & \sum_{i=1}^n \mu_i x_i = \mu, \\
& r_k \le \mu, \quad k = 1, \ldots, K, \\
& r_k \le \sum_{i=1}^n R_{ik} x_i, \quad k = 1, \ldots, K, \\
& \sum_{i=1}^n x_i \le W.
\end{array}
\]

It can be solved by standard linear programming techniques.
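A minimal sketch of this linear program (Python with SciPy; the return scenarios are hypothetical). The variables are $(x, \mu, r_1, \ldots, r_K)$ and the objective sign is flipped because the solver minimizes:

```python
import numpy as np
from scipy.optimize import linprog

W, lam = 1.0, 0.5                          # wealth and risk weight, lam in [0,1]
R = np.array([[1.10, 1.02],                # scenario k: returns of both assets
              [0.95, 1.02],
              [1.20, 1.02]]).T             # shape (n assets, K scenarios)
p = np.array([0.3, 0.4, 0.3])
n, K = R.shape
mu = R @ p                                 # expected returns mu_i

# Variables v = [x_1..x_n, mu, r_1..r_K]; maximize (1-lam) mu + lam sum p_k r_k.
obj = np.zeros(n + 1 + K)
obj[n] = -(1 - lam)
obj[n + 1:] = -lam * p

A_eq = np.zeros((1, n + 1 + K))            # sum_i mu_i x_i = mu
A_eq[0, :n] = mu
A_eq[0, n] = -1.0
b_eq = np.zeros(1)

A_ub = np.zeros((2 * K + 1, n + 1 + K))    # r_k <= mu, r_k <= R_k x, sum x <= W
b_ub = np.zeros(2 * K + 1)
for k in range(K):
    A_ub[k, n + 1 + k] = 1.0               # r_k - mu <= 0
    A_ub[k, n] = -1.0
    A_ub[K + k, n + 1 + k] = 1.0           # r_k - sum_i R_ik x_i <= 0
    A_ub[K + k, :n] = -R[:, k]
A_ub[2 * K, :n] = 1.0
b_ub[2 * K] = W

bounds = [(0, None)] * n + [(None, None)] * (1 + K)
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:n], -res.fun)                 # optimal portfolio and objective
```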

4.2 Min–max stochastic programming

In practical applications the probability distributions of the involved uncertain parameters are never known exactly and can at best be estimated. Even worse, quite often the probabilities are assigned on an ad hoc basis by subjective judgment. Suppose now that there is a set $S$ of probability distributions, defined on a sample space $(\Omega, \mathcal{F})$, which in some reasonable sense give a choice of the underlying probability distributions. For instance, in Example 3 one may foresee that the random investment returns will generally increase, stay flat or even decrease over the next $T$ years. By specifying means, representing a possible trend, and the variability of the investment returns, one may assign a finite number of possible probability distributions for the random data. Alternatively, certain properties, like first- and maybe second-order moments, unimodality or specified marginal distributions of the random data, can be postulated. Typically, this leads to an infinite set $S$ of considered probability distributions.

There are two basic ways of dealing with such cases of several distributions. One can assign an a priori probability distribution over $S$, and hence reduce the problem to a unique distribution. Suppose, for example, that $S$ is finite, say $S := \{P_1, \ldots, P_l\}$. Then by assigning probability $\lambda_i$ to $P_i$, $i = 1, \ldots, l$, one obtains the unique (posterior) distribution $P := \sum_{i=1}^l \lambda_i P_i$.

The distribution $P$ represents an averaging over the possible distributions $P_i$. Again, the choice of the a priori distribution $\{\lambda_1, \ldots, \lambda_l\}$ is often subjective.

An alternative approach is to hedge against the worst distribution by formulating the following min–max analogue of the stochastic programs (1.14) and (2.12):
\[
\min_{x \in X} \max_{P \in S} \; E_P[F(x, \omega)].
\tag{4.9}
\]
For the above problem to make sense it is assumed, of course, that for every $P \in S$ the expectation $E_P[F(x, \omega)]$ is well defined for all $x \in X$.
In order to see a relation between these two approaches, let us assume for the sake of simplicity that the set $S = \{P_1, \ldots, P_l\}$ is finite. Then problem (4.9) can be written in the following equivalent way
\[
\begin{array}{ll}
\min\limits_{(x, z) \in X \times \mathbb{R}} & z \\
\text{s.t.} & f_i(x) \le z, \quad i = 1, \ldots, l,
\end{array}
\tag{4.10}
\]
where $f_i(x) := E_{P_i}[F(x, \omega)]$. Suppose further that problem (4.10), and hence problem (4.9), is feasible and that for every $\omega \in \Omega$ the function $F(\cdot, \omega)$ is convex. It follows from the convexity of $F(\cdot, \omega)$ that the functions $f_i(\cdot)$ are also convex, and hence problem (4.10) is a convex programming problem. Then, by the duality theory of convex programming, there exist Lagrange multipliers $\lambda_i \ge 0$, $i = 1, \ldots, l$, such that $\sum_{i=1}^l \lambda_i = 1$ and problem (4.10) has the same optimal value as the problem
\[
\min_{x \in X} \left\{ f(x) := \sum_{i=1}^l \lambda_i f_i(x) \right\}
\]
and the set of optimal solutions of (4.10) is included in the set of optimal solutions of the above problem. Since $f(x) = E_{P^*}[F(x, \omega)]$, where $P^* := \sum_{i=1}^l \lambda_i P_i$, we obtain that problem (4.9) is equivalent to the stochastic programming problem
\[
\min_{x \in X} \; E_{P^*}[F(x, \omega)].
\]
This shows that, under the assumption of convexity, the min–max approach automatically generates an a priori distribution given by the corresponding Lagrange multipliers. Of course, in order to calculate these Lagrange multipliers one still has to solve the min–max problem. The existence of such Lagrange multipliers, and hence of the a priori distribution, can also be shown for an infinite set $S$ under the assumption of convexity and mild regularity conditions.
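For a finite set $S$ the min–max problem can be solved directly in its epigraph form (4.10) by a general nonlinear programming routine. A minimal sketch (Python with SciPy; the quadratic $F$ and the two candidate distributions are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical: F(x, omega) = (x - omega)^2, two candidate distributions
# P_1, P_2 on omega in {0, 1, 2}, so f_i(x) = E_{P_i}[(x - omega)^2].
omega = np.array([0.0, 1.0, 2.0])
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.2, 0.7]])

def f(x, i):                               # f_i(x) = E_{P_i} F(x, omega)
    return P[i] @ (x - omega) ** 2

# Epigraph form (4.10): minimize z subject to f_i(x) <= z.
v0 = np.array([0.0, 10.0])                 # starting point (x, z)
cons = [{"type": "ineq", "fun": lambda v, i=i: v[1] - f(v[0], i)}
        for i in range(len(P))]
res = minimize(lambda v: v[1], v0, constraints=cons)
print(res.x)  # approx (1.0, 0.8): worst-case optimal x and value z
```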

5 Appendix

In this section we briefly discuss some basic concepts and definitions from
probability and optimization theories, needed for the development of
stochastic programming models. Of course, a careful derivation of the
required results goes far beyond the scope of this book. The interested reader
may look into standard textbooks for a thorough development of these topics.

5.1 Random variables

Let $\Omega$ be an abstract set. It is said that a set $\mathcal{F}$ of subsets of $\Omega$ is a sigma algebra (also called a sigma field) if it is closed under standard set theoretic operations, the set $\Omega$ belongs to $\mathcal{F}$, and if $A_i \in \mathcal{F}$, $i \in \mathbb{N}$, then $\cup_{i \in \mathbb{N}} A_i \in \mathcal{F}$. The set $\Omega$ equipped with a sigma algebra $\mathcal{F}$ is called a sample or measurable space and denoted $(\Omega, \mathcal{F})$. A set $A \subset \Omega$ is said to be $\mathcal{F}$-measurable if $A \in \mathcal{F}$. It is said that the sigma algebra $\mathcal{F}$ is generated by its subset $\mathcal{A}$ if any $\mathcal{F}$-measurable set can be obtained from sets belonging to $\mathcal{A}$ by set theoretic operations and by taking the union of a countable family of sets from $\mathcal{A}$. That is, $\mathcal{F}$ is generated by $\mathcal{A}$ if $\mathcal{F}$ is the smallest sigma algebra containing $\mathcal{A}$.

If $\Omega$ coincides with a finite dimensional space $\mathbb{R}^m$, unless stated otherwise we always equip it with its Borel sigma algebra $\mathcal{B}$. Recall that $\mathcal{B}$ is generated by the set of open (or closed) subsets of $\mathbb{R}^m$. A function $P : \mathcal{F} \to [0, 1]$ is called a probability measure on $(\Omega, \mathcal{F})$ if $P(\Omega) = 1$ and, for every collection $A_i \in \mathcal{F}$, $i \in \mathbb{N}$, such that $A_i \cap A_j = \emptyset$ for all $i \ne j$, we have $P(\cup_{i \in \mathbb{N}} A_i) = \sum_{i \in \mathbb{N}} P(A_i)$. A sample space $(\Omega, \mathcal{F})$ equipped with a probability measure $P$ is called a probability space and denoted $(\Omega, \mathcal{F}, P)$. Recall that $\mathcal{F}$ is said to be $P$-complete if $A \subset B$, $B \in \mathcal{F}$ and $P(B) = 0$ imply that $A \in \mathcal{F}$, and hence $P(A) = 0$. Since it is always possible to enlarge the sigma algebra and extend the measure in such a way as to get a complete space, we can assume without loss of generality that the considered probability measures are complete. It is said that an event $A \in \mathcal{F}$ happens $P$-almost surely (a.s.) or almost everywhere (a.e.) if $P(A) = 1$, or equivalently $P(\Omega \setminus A) = 0$.
A mapping $V : \Omega \to \mathbb{R}^m$ is said to be measurable if for any Borel set $A \in \mathcal{B}$, its inverse image $V^{-1}(A) := \{\omega \in \Omega : V(\omega) \in A\}$ is $\mathcal{F}$-measurable.¹⁶ A measurable mapping $V(\omega)$ from a probability space $(\Omega, \mathcal{F}, P)$ into $\mathbb{R}^m$ is called a random vector. Note that the mapping $V$ generates the probability measure (also called the probability distribution) $P_V(A) := P(V^{-1}(A))$ on $(\mathbb{R}^m, \mathcal{B})$, which provides all relevant probabilistic information about the considered random vector. Clearly an event $A \in \mathcal{B}$ happens $P_V$-almost surely iff the corresponding event $V^{-1}(A) \in \mathcal{F}$ happens $P$-almost surely. In particular, a measurable mapping (function) $Z : \Omega \to \mathbb{R}$ is called a random variable. Its probability distribution is completely defined by the cumulative distribution function (cdf) $F_Z(z) := P\{Z \le z\}$. Note that since the Borel sigma algebra of $\mathbb{R}$ is generated by the family of half line intervals $(-\infty, a]$, in order to verify the measurability of $Z(\omega)$ it suffices to verify the measurability of the sets $\{\omega \in \Omega : Z(\omega) \le z\}$ for all $z \in \mathbb{R}$. We denote random vectors (variables) by capital letters, like $V$, $Z$ etc., or $\xi(\omega)$, and often suppress their explicit dependence on the elementary event $\omega$. Also quite often we denote by the same symbol $\xi$ a particular realization of the random vector $\xi = \xi(\omega)$. Usually, the meaning of such notation will be clear from the context and will not cause any confusion. The coordinate functions $V_1(\omega), \ldots, V_m(\omega)$ of the $m$-dimensional random vector $V(\omega)$ are called its components. While considering a random vector $V$ we often talk about its probability distribution as the joint distribution of its components (random variables) $V_1, \ldots, V_m$.

¹⁶ In fact it suffices to verify $\mathcal{F}$-measurability of $V^{-1}(A)$ for sets $A$ from any family generating the Borel sigma algebra of $\mathbb{R}^m$.
Since we often deal with random variables which are given as optimal values of optimization problems, we need to consider random variables $Z(\omega)$ which can also take the values $+\infty$ or $-\infty$, i.e., functions $Z : \Omega \to \overline{\mathbb{R}}$, where $\overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty\} \cup \{+\infty\}$ denotes the set of extended real numbers. Such functions $Z : \Omega \to \overline{\mathbb{R}}$ are referred to as extended real valued functions. Operations between real numbers and the symbols $\pm\infty$ are clear, except for such operations as adding $+\infty$ and $-\infty$, which should be avoided. Measurability of an extended real valued function $Z(\omega)$ is defined in the standard way, i.e., $Z(\omega)$ is measurable if the set $\{\omega \in \Omega : Z(\omega) \le z\}$ is $\mathcal{F}$-measurable for any $z \in \mathbb{R}$. A measurable extended real valued function is called an (extended) random variable. Note that here $\lim_{z \to +\infty} F_Z(z)$ is equal to the probability of the event $\{\omega \in \Omega : Z(\omega) < +\infty\}$ and can be less than one if the event $\{\omega \in \Omega : Z(\omega) = +\infty\}$ has positive probability.
The expected value or expectation of an (extended) random variable $Z : \Omega \to \overline{\mathbb{R}}$ is defined by the integral
\[
E_P[Z] := \int_\Omega Z(\omega) \, dP(\omega).
\tag{5.1}
\]
When there is no ambiguity as to which probability measure is considered, we omit the subscript $P$ and simply write $E[Z]$. For a nonnegative valued measurable function $Z(\omega)$ such that the event $\Upsilon := \{\omega \in \Omega : Z(\omega) = +\infty\}$ has zero probability, the above integral is defined in the usual way and can take the value $+\infty$. If the probability of the event $\Upsilon$ is positive, then, by definition, $E[Z] = +\infty$. For a general (not necessarily nonnegative valued) random variable we would like to define¹⁷ $E[Z] := E[Z_+] - E[(-Z)_+]$. In order to do that we have to ensure that we do not add $+\infty$ and $-\infty$. We say that the expected value $E[Z]$ of an (extended real valued) random variable $Z(\omega)$ is well defined if it does not happen that both $E[Z_+]$ and $E[(-Z)_+]$ are $+\infty$, in which case $E[Z] = E[Z_+] - E[(-Z)_+]$. That is, in order to verify that the expected

¹⁷ Recall that $Z_+ := \max\{0, Z\}$.

value of Zð!Þ is well defined one has to check that Zð!Þ is measurable and
either E½Zþ  < þ1 or E½ðZÞþ  < þ1. Note that if Zð!Þ and Z 0 ð!Þ are two
(extended) random variables such that their expectations are well defined and
Zð!Þ ¼ Z 0 ð!Þ for all ! 2  except possibly on a set of measure zero, then
E½Z ¼ E½Z0 . It is said that Zð!Þ is P-integrable if the expected value E½Z is
well defined and finite. The expected value of a random vector is defined
componentwise.
If the random variable $Z(\omega)$ can take only a countable (finite) number of different values, say $z_1, z_2, \ldots$, then it is said that $Z(\omega)$ has a discrete distribution (a discrete distribution with a finite support). In such cases all relevant probabilistic information is contained in the probabilities $p_i := P\{Z = z_i\}$, and $\mathbb{E}[Z] = \sum_i p_i z_i$.
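For a discrete distribution the expectation is just this weighted sum, as the following minimal Python sketch makes concrete (the realizations and probabilities are illustrative only):

```python
# Expectation of a discrete random variable: E[Z] = sum_i p_i * z_i.
# The realizations z_i and probabilities p_i below are hypothetical.
values = [10.0, 20.0, 35.0]   # realizations z_i
probs = [0.5, 0.3, 0.2]       # probabilities p_i = P{Z = z_i}

assert abs(sum(probs) - 1.0) < 1e-12  # the p_i must sum to one

expectation = sum(p * z for p, z in zip(probs, values))
print(expectation)  # 0.5*10 + 0.3*20 + 0.2*35 = 18.0
```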

5.2 Expectation functions

Consider now the expectation optimization problem (1.14), where $X \subset \mathbb{R}^n$. For a given $x$ we can view $F(x) = F(x, \omega)$ as a random variable. We assume that the expectation function

$$f(x) = \mathbb{E}[F(x, \omega)]$$

is well defined, i.e., for every $x \in \mathbb{R}^n$ the function $F(x, \cdot)$ is measurable, and either $\mathbb{E}[F(x)_+] < +\infty$ or $\mathbb{E}[(-F(x))_+] < +\infty$. (Since we are interested here in $x$ belonging to the feasible set $X$, it suffices that $f(x)$ be well defined for $x \in X$.) The (effective) feasible set of the problem (1.14) is given by $X \cap (\operatorname{dom} f)$, where

$$\operatorname{dom} f := \{x \in \mathbb{R}^n : f(x) < +\infty\}$$

denotes the domain of $f$. It is said that $f$ is proper if $f(x) > -\infty$ for all $x \in \mathbb{R}^n$ and $\operatorname{dom} f \neq \emptyset$.
From the theoretical point of view it is convenient to incorporate the constraint ``$x \in X$'' into the objective function. That is, for any $\omega \in \Omega$ define

$$\bar{F}(x, \omega) := \begin{cases} F(x, \omega), & \text{if } x \in X, \\ +\infty, & \text{if } x \notin X. \end{cases}$$

Then problem (1.14) can be written in the form

$$\min_{x \in X} \mathbb{E}[\bar{F}(x, \omega)]. \qquad (5.2)$$
Clearly, the domain of the expectation function $\mathbb{E}[\bar{F}(\cdot, \omega)]$ is $X \cap (\operatorname{dom} f)$, i.e., it coincides with the feasible set of problem (1.14). In the remainder of this section we assume that the objective function $F(x, \omega)$ is extended real valued and that the corresponding constraints are already absorbed into the objective function.

For $\varepsilon \ge 0$ we say that $x^* \in X$ is an $\varepsilon$-optimal solution of the problem of minimization of $f(x)$ over $X$ if

$$f(x^*) \le \inf_{x \in X} f(x) + \varepsilon.$$

If the problem is infeasible (that is, $f(x) = +\infty$ for every $x \in X$), then any $x^* \in X$ is $\varepsilon$-optimal. If the problem is feasible, and hence $\inf_{x \in X} f(x) < +\infty$, then $\varepsilon$-optimality of $x^*$ implies that $f(x^*) < +\infty$, i.e., that $x^* \in \operatorname{dom} f$. Note that, by the nature of the minimization process, if $\inf_{x \in X} f(x) > -\infty$, then for any $\varepsilon > 0$ there always exists an $\varepsilon$-optimal solution.
An extended real valued function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is called lower semicontinuous (lsc) at a point $x_0$ if

$$\liminf_{x \to x_0} f(x) \ge f(x_0).$$

It is said that $f$ is lower semicontinuous if it is lsc at every point $x \in \mathbb{R}^n$. It is not difficult to show that $f$ is lsc iff its epigraph

$$\operatorname{epi} f := \{(x, \alpha) : f(x) \le \alpha\}$$

is a closed subset of $\mathbb{R}^n \times \mathbb{R}$.

Theorem 13. Let $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ be a proper extended real valued function. Suppose that $f$ is lsc and its domain $\operatorname{dom} f$ is bounded. Then the set $\arg\min_{x \in \mathbb{R}^n} f(x)$ of its optimal solutions is nonempty.

Since $f$ is proper, its domain is nonempty, and hence $\inf_{x \in \mathbb{R}^n} f(x) < +\infty$. Let us take a number $c > \inf_{x \in \mathbb{R}^n} f(x)$ and consider the level set $S := \{x : f(x) \le c\}$. We have that the set $S$ is nonempty, is contained in $\operatorname{dom} f$ and hence is bounded, and is closed since $f$ is lsc. Consequently, the set $S$ is compact, and clearly $\arg\min_{x \in \mathbb{R}^n} f(x)$ coincides with $\arg\min_{x \in S} f(x)$. Therefore, the above theorem states the well known result that an lsc real valued function attains its minimum over a nonempty compact subset of $\mathbb{R}^n$.
The expected value function $f(x) := \mathbb{E}[F(x, \omega)]$ inherits various properties of the functions $F(\cdot, \omega)$. If for $P$-almost every $\omega \in \Omega$ the function $F(\cdot, \omega)$ is convex, then the expected value function $f(\cdot)$ is also convex. Indeed, if $\Omega$ is finite, then $f(\cdot)$ is a weighted sum of convex functions with nonnegative weights, and hence is convex. The case of a general distribution can then be proved by passing to a limit.
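To spell out the finite case: if $\Omega = \{\omega_1, \ldots, \omega_K\}$ with probabilities $p_k := P\{\omega_k\}$, then for any $x, y \in \mathbb{R}^n$ and $\lambda \in [0, 1]$,

$$f(\lambda x + (1-\lambda) y) = \sum_{k=1}^{K} p_k F(\lambda x + (1-\lambda) y, \omega_k) \le \sum_{k=1}^{K} p_k \bigl[\lambda F(x, \omega_k) + (1-\lambda) F(y, \omega_k)\bigr] = \lambda f(x) + (1-\lambda) f(y),$$

where the inequality uses convexity of each $F(\cdot, \omega_k)$ and nonnegativity of the weights $p_k$.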
As shown in the next proposition, lower semicontinuity of the expected value function follows from lower semicontinuity of $F(\cdot, \omega)$.

Proposition 14. Suppose that: (i) for $P$-almost every $\omega \in \Omega$ the function $F(\cdot, \omega)$ is lsc at $x_0$; (ii) $F(x, \cdot)$ is measurable for every $x$ in a neighborhood of $x_0$; (iii) there exists a $P$-integrable function $Z(\omega)$ such that $F(x, \omega) \ge Z(\omega)$ for $P$-almost all $\omega \in \Omega$ and all $x$ in a neighborhood of $x_0$. Then for all $x$ in a neighborhood of $x_0$ the expected value function $f(x) := \mathbb{E}[F(x, \omega)]$ is well defined, and $f$ is lsc at $x_0$.

Proof. It follows from assumptions (ii) and (iii) that $f(\cdot)$ is well defined in a neighborhood of $x_0$. Under assumption (iii), it follows by Fatou's lemma that

$$\liminf_{x \to x_0} \int_\Omega F(x, \omega)\, dP(\omega) \ge \int_\Omega \liminf_{x \to x_0} F(x, \omega)\, dP(\omega). \qquad (5.3)$$

Together with (i) this implies lower semicontinuity of $f$ at $x_0$. $\square$

In particular, let us consider the probabilistic constraints (1.20). We can write these constraints in the form

$$\mathbb{E}\bigl[\mathbf{1}_{(0, +\infty)}(G_i(x, \omega))\bigr] \le \alpha, \quad i = 1, \ldots, m. \qquad (5.4)$$

Suppose further that for $P$-almost every $\omega \in \Omega$ the functions $G_i(\cdot, \omega)$ are lsc, and that for all $x$ the functions $G_i(x, \cdot)$ are measurable. Then the functions $\mathbf{1}_{(0, +\infty)}(G_i(\cdot, \omega))$ are also lsc for $P$-almost every $\omega \in \Omega$, and clearly are bounded. Consequently we obtain by Proposition 14 that the corresponding expected value functions on the left hand side of (5.4) are lsc. It follows that the constraints (5.4), and hence the probabilistic constraints (1.20), define a closed subset of $\mathbb{R}^n$.
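The expectation form (5.4) also suggests the usual Monte Carlo estimator of a probabilistic constraint: average the indicator over a sample of scenarios. A minimal Python sketch follows, with a hypothetical constraint function $G$ and distribution (both illustrative only):

```python
# Monte Carlo estimate of E[1_{(0,+inf)}(G(x, omega))] = P{G(x, omega) > 0},
# i.e., the left hand side of (5.4) for a single constraint.
import random

def violation_probability(G, x, samples):
    """Fraction of sampled scenarios omega with G(x, omega) > 0."""
    return sum(1 for omega in samples if G(x, omega) > 0) / len(samples)

# Illustrative example: G(x, omega) = omega - x with omega ~ Uniform(0, 1),
# so the true value is P{omega > x} = 1 - x for x in [0, 1].
random.seed(0)
sample = [random.random() for _ in range(100_000)]
G = lambda x, omega: omega - x
print(violation_probability(G, 0.9, sample))  # close to 0.1
```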

5.3 Optimal values and optimal solutions

We often have to deal with optimal value functions of min or max type. That is, consider an extended real valued function $h : \mathbb{R}^n \times \mathbb{R}^m \to \overline{\mathbb{R}}$ and the associated functions

$$\phi(x) := \inf_{y \in \mathbb{R}^m} h(x, y) \quad \text{and} \quad \psi(x) := \sup_{y \in \mathbb{R}^m} h(x, y). \qquad (5.5)$$

Proposition 15. The following holds: (i) Suppose that for every $y \in \mathbb{R}^m$ the function $h(\cdot, y)$ is lsc. Then the max-function $\psi(x)$ is lsc. (ii) Suppose that the function $h(\cdot, \cdot)$ is lsc and there exists a bounded set $S \subset \mathbb{R}^m$ such that $\operatorname{dom}\, h(x, \cdot) \subset S$ for all $x \in \mathbb{R}^n$. Then the min-function $\phi(x)$ is lsc.
Proof. (i) The epigraph of the max-function $\psi(\cdot)$ is given by the intersection of the epigraphs of $h(\cdot, y)$, $y \in \mathbb{R}^m$. By lower semicontinuity of $h(\cdot, y)$, these epigraphs are closed, and hence their intersection is closed. It follows that $\psi(\cdot)$ is lsc.

(ii) Consider a point $x_0 \in \mathbb{R}^n$ and let $\{x_k\}$ be a sequence converging to $x_0$ along which $\liminf_{x \to x_0} \phi(x)$ is attained. If $\lim_{k \to \infty} \phi(x_k) = +\infty$, then clearly $\lim_{k \to \infty} \phi(x_k) \ge \phi(x_0)$, and hence $\phi$ is lsc at $x_0$. Therefore, we can assume that $\phi(x_k) < +\infty$ for all $k$. Let $\varepsilon$ be a given positive number and let $y_k \in \mathbb{R}^m$ be such that $h(x_k, y_k) \le \phi(x_k) + \varepsilon$. Since $y_k \in \operatorname{dom}\, h(x_k, \cdot) \subset S$ and $S$ is bounded, by passing to a subsequence if necessary we can assume that $y_k$ converges to a point $y_0$. By lower semicontinuity of $h(\cdot, \cdot)$ we then have $\lim_{k \to \infty} \phi(x_k) \ge h(x_0, y_0) - \varepsilon \ge \phi(x_0) - \varepsilon$. Since $\varepsilon$ was arbitrary, it follows that $\lim_{k \to \infty} \phi(x_k) \ge \phi(x_0)$, and hence $\phi(\cdot)$ is lsc at $x_0$. This completes the proof. $\square$
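The boundedness assumption in part (ii) cannot be dropped. A standard counterexample (stated here for illustration; it is not taken from the text): for the continuous, hence lsc, function $h(x, y) := (xy - 1)^2$ one has

$$\phi(x) = \inf_{y \in \mathbb{R}} (xy - 1)^2 = \begin{cases} 0, & \text{if } x \neq 0, \\ 1, & \text{if } x = 0, \end{cases}$$

so that $\liminf_{x \to 0} \phi(x) = 0 < 1 = \phi(0)$ and $\phi$ is not lsc at $0$; here $\operatorname{dom}\, h(x, \cdot) = \mathbb{R}$ is unbounded.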

Let $F : \mathbb{R}^n \times \Omega \to \overline{\mathbb{R}}$, and let us now consider the optimal value

$$\vartheta(\omega) := \inf_{x \in X} F(x, \omega) \qquad (5.6)$$

and the corresponding set

$$X^*(\omega) := \arg\min_{x \in X} F(x, \omega) \qquad (5.7)$$

of optimal solutions. In order to deal with measurability of these objects we need the following concepts.
Let $\mathcal{G}$ be a mapping from $\Omega$ into the set of subsets of $\mathbb{R}^n$, i.e., $\mathcal{G}$ assigns to each $\omega \in \Omega$ a subset (possibly empty) $\mathcal{G}(\omega)$ of $\mathbb{R}^n$. We refer to $\mathcal{G}$ as a multifunction and write $\mathcal{G} : \Omega \rightrightarrows \mathbb{R}^n$. It is said that $\mathcal{G}$ is closed valued if $\mathcal{G}(\omega)$ is a closed subset of $\mathbb{R}^n$ for every $\omega \in \Omega$. A closed valued multifunction $\mathcal{G}$ is said to be measurable if for every closed set $A \subset \mathbb{R}^n$ the inverse image

$$\mathcal{G}^{-1}(A) := \{\omega \in \Omega : \mathcal{G}(\omega) \cap A \neq \emptyset\}$$

is $\mathcal{F}$-measurable. Note that measurability of $\mathcal{G}$ implies that the domain

$$\operatorname{dom}\, \mathcal{G} := \{\omega \in \Omega : \mathcal{G}(\omega) \neq \emptyset\} = \mathcal{G}^{-1}(\mathbb{R}^n)$$

of $\mathcal{G}$ is an $\mathcal{F}$-measurable subset of $\Omega$.

It is said that a mapping $G : \operatorname{dom}\, \mathcal{G} \to \mathbb{R}^n$ is a selection of $\mathcal{G}$ if $G(\omega) \in \mathcal{G}(\omega)$ for all $\omega \in \operatorname{dom}\, \mathcal{G}$. If, in addition, the mapping $G$ is measurable, it is said that $G$ is a measurable selection of $\mathcal{G}$.

Theorem 16 (Castaing representation theorem). A closed valued multifunction $\mathcal{G} : \Omega \rightrightarrows \mathbb{R}^n$ is measurable iff its domain is an $\mathcal{F}$-measurable subset of $\Omega$ and there exists a countable family $\{G_i\}_{i \in \mathbb{N}}$ of measurable selections of $\mathcal{G}$ such that for every $\omega \in \Omega$ the set $\{G_i(\omega) : i \in \mathbb{N}\}$ is dense in $\mathcal{G}(\omega)$.
It follows from the above theorem that if $\mathcal{G} : \Omega \rightrightarrows \mathbb{R}^n$ is a closed valued measurable multifunction, then there exists at least one measurable selection of $\mathcal{G}$.

Definition 17. It is said that the function $(x, \omega) \mapsto F(x, \omega)$ is random lower semicontinuous if the associated epigraphical multifunction $\omega \mapsto \operatorname{epi}\, F(\cdot, \omega)$ is closed valued and measurable.

Note that closed valuedness of the epigraphical multifunction means that for every $\omega \in \Omega$ the epigraph $\operatorname{epi}\, F(\cdot, \omega)$ is a closed subset of $\mathbb{R}^n \times \mathbb{R}$, i.e., that $F(\cdot, \omega)$ is lsc.

Theorem 18. Suppose that the sigma algebra $\mathcal{F}$ is $P$-complete. Then an extended real valued function $F : \mathbb{R}^n \times \Omega \to \overline{\mathbb{R}}$ is random lsc iff the following two properties hold: (i) for every $\omega \in \Omega$, the function $F(\cdot, \omega)$ is lsc; (ii) the function $F(\cdot, \cdot)$ is measurable with respect to the sigma algebra of $\mathbb{R}^n \times \Omega$ given by the product of the sigma algebras $\mathcal{B}$ and $\mathcal{F}$.

A large class of random lower semicontinuous functions is given by the so-called Carathéodory functions, i.e., real valued functions $F : \mathbb{R}^n \times \Omega \to \mathbb{R}$ such that $F(x, \cdot)$ is $\mathcal{F}$-measurable for every $x \in \mathbb{R}^n$ and $F(\cdot, \omega)$ is continuous for a.e. $\omega \in \Omega$.

Theorem 19. Let $F : \mathbb{R}^n \times \Omega \to \overline{\mathbb{R}}$ be a random lsc function. Then the optimal value function $\vartheta(\omega)$ and the optimal solution multifunction $X^*(\omega)$ are both measurable.

Note that it follows from lower semicontinuity of $F(\cdot, \omega)$ that the optimal solution multifunction $X^*(\omega)$ is closed valued. Note also that if $F(x, \omega)$ is random lsc and $\mathcal{G} : \Omega \rightrightarrows \mathbb{R}^n$ is a closed valued measurable multifunction, then the function

$$\bar{F}(x, \omega) := \begin{cases} F(x, \omega), & \text{if } x \in \mathcal{G}(\omega), \\ +\infty, & \text{if } x \notin \mathcal{G}(\omega), \end{cases}$$

is also random lsc. Consequently, the corresponding optimal value function $\omega \mapsto \inf_{x \in \mathcal{G}(\omega)} F(x, \omega)$ and the optimal solution multifunction $\omega \mapsto \arg\min_{x \in \mathcal{G}(\omega)} F(x, \omega)$ are both measurable, and hence, by the measurable selection theorem, there exists a measurable selection $\bar{x}(\omega) \in \arg\min_{x \in \mathcal{G}(\omega)} F(x, \omega)$.
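When $\Omega$ is finite and the decisions are restricted to a finite grid, the objects (5.6) and (5.7) can simply be tabulated scenario by scenario, as in this minimal Python sketch (the objective $F$ and all data are hypothetical):

```python
# Tabulate theta(omega) = min_x F(x, omega) and one minimizer from X*(omega)
# over a finite grid of decisions and a finite set of scenarios.
X = [k / 10 for k in range(-20, 21)]   # candidate decisions in [-2, 2]
scenarios = [-1.0, 0.0, 0.5, 2.0]      # realizations of omega

def F(x, omega):                       # hypothetical objective
    return (x - omega) ** 2 + omega

for omega in scenarios:
    theta = min(F(x, omega) for x in X)          # optimal value theta(omega)
    x_star = min(X, key=lambda x: F(x, omega))   # a selection from X*(omega)
    print(f"omega={omega:5.1f}  theta(omega)={theta:7.3f}  x*={x_star:5.1f}")
```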

Theorem 20. Let $F : \mathbb{R}^{n+m} \times \Omega \to \overline{\mathbb{R}}$ be a random lsc function and let

$$\vartheta(x, \omega) := \inf_{y \in \mathbb{R}^m} F(x, y, \omega) \qquad (5.8)$$

be the associated optimal value function. Suppose that there exists a bounded set $S \subset \mathbb{R}^m$ such that $\operatorname{dom}\, F(x, \cdot, \omega) \subset S$ for all $(x, \omega) \in \mathbb{R}^n \times \Omega$. Then the optimal value function $\vartheta(x, \omega)$ is random lsc.
Let us finally observe that the above framework of random lsc functions is aimed at minimization problems. Of course, the problem of maximization of $\mathbb{E}[F(x, \omega)]$ is equivalent to minimization of $\mathbb{E}[-F(x, \omega)]$. Therefore, for maximization problems one would need the comparable concept of random upper semicontinuous functions.

6 Bibliographic notes

Stochastic programming with recourse originated in the works of Beale (1955) and Dantzig (1955). Basic properties of two-stage problems were investigated by Wets (1966), Walkup and Wets (1967, 1969) and Kall (1976).
A comprehensive treatment of the theory and numerical methods for
expectation models can be found in Birge and Louveaux (1997). Simulation-
based approaches to stochastic optimization were discussed by various
authors, see Chapter ‘‘Monte Carlo Sampling Methods’’.
Models involving constraints on probability were introduced by Charnes
et al. (1958), Miller and Wagner (1965), and Prékopa (1970). Prékopa (1995)
discusses in detail the theory and numerical methods for linear chance-
constrained models. Applications to finance are discussed by Dowd (1997).
Klein Haneveld (1986) introduced the concept of integrated chance constraints,
which are the predecessors of conditional value at risk constraints of Uryasev
and Rockafellar (2001).
A general discussion of interchangeability of minimization and integration
operations can be found in Rockafellar and Wets (1998). Proposition 5 is a
particular case of Theorem 14.60 in Rockafellar and Wets (1998).
Expected value of perfect information is a classical concept in decision
theory (see, e.g., Raiffa (1968)). In stochastic programming this and related
concepts were analyzed first by Madansky (1960). Other advances are due to
Dempster (1981) and Birge (1982).
Early contributions to multistage stochastic programming models appeared
in Marti (1975), Beale et al. (1980), Louveaux (1980), Birge (1985), Noël and
Smeers (1986) and Dempster (1981). Varaiya and Wets (1989) discuss
relations of multistage stochastic programming and stochastic control. For
other examples and approaches to multistage modeling see Birge and
Louveaux (1997).
Robust approaches to stochastic programming were initiated by Mulvey
et al. (1995). Proposition 11 is based on the work of Takriti and Ahmed
(2002). Mean–risk models in portfolio optimization were introduced by
Markowitz (1952). For a general perspective, see Markowitz (1987) and
Luenberger (1998). Mean–absolute deviation models for portfolio problems
were introduced by Konno and Yamazaki (1991). Semideviations and other
risk measures were analyzed by Ogryczak and Ruszczyński (1999, 2001, 2002).
The min–max approach to stochastic programming was initiated in Žáčková (1966) and Dupačová (1980, 1987).
There are many good textbooks on probability and measure theory, e.g.,
Billingsley (1995), to which we refer for a thorough discussion of such basic
concepts as random variables, probability space, etc. Also a proof of Fatou’s
lemma, used in the proof of Proposition 14, can be found there. For an
additional discussion of the expected value function see section ‘‘Expectation
Functions’’ of Chapter 2. Continuity and differentiability properties of the
optimal value functions, of the form defined in equation (5.5), were studied
extensively in the optimization literature (see, e.g., Bonnans and Shapiro
(2000) and the references therein).
The measurable selection theorem (Theorem 16) is due to Castaing. A thorough discussion of measurable mappings and selections can be found in Castaing and Valadier (1977), Ioffe and Tihomirov (1979) and Rockafellar and Wets (1998). Random lower semicontinuous functions are called normal integrands by some authors (see Definition 14.27 in Rockafellar and Wets (1998)). Proofs of Theorems 18, 19 and 20 can be found in the section on normal integrands of Rockafellar and Wets (1998).

References

Beale, E.M.L. (1955). On minimizing a convex function subject to linear inequalities. Journal of the
Royal Statistical Society Series B 17, 173–184.
Beale, E.M.L., J.J.H. Forrest, C.J. Taylor (1980). Multi-time-period stochastic programming, in:
M.A.H. Dempster (ed.), Stochastic Programming, Academic Press, New York, pp. 387–402.
Billingsley, P. (1995). Probability and Measure, John Wiley & Sons, New York.
Birge, J.R. (1982). The value of the stochastic solution in stochastic linear programs with fixed recourse. Mathematical Programming 24, 314–325.
Birge, J.R. (1985). Decomposition and partitioning methods for multistage stochastic linear programs. Operations Research 33, 989–1007.
Birge, J.R., F.V. Louveaux (1997). Introduction to Stochastic Programming, Springer-Verlag,
New York.
Bonnans, J.F., A. Shapiro (2000). Perturbation Analysis of Optimization Problems, Springer-Verlag,
New York, NY.
Castaing, C., M. Valadier (1977). Convex Analysis and Measurable Multifunctions, Lecture Notes
in Mathematics, Vol. 580, Springer-Verlag, Berlin.
Charnes, A., W.W. Cooper, G.H. Symonds (1958). Cost horizons and certainty equivalents: an
approach to stochastic programming of heating oil. Management Science 4, 235–263.
Dantzig, G.B. (1955). Linear programming under uncertainty. Management Science 1, 197–206.
Dempster, M.A.H. (1981). The expected value of perfect information in the optimal evolution of stochastic systems, in: M. Arato, D. Vermes, A.V. Balakrishnan (eds.), Stochastic Differential Systems, Lecture Notes in Control and Information Sciences, Vol. 36, Springer-Verlag, Berlin, pp. 25–41.
Dowd, K. (1997). Beyond Value at Risk. The Science of Risk Management, Wiley, New York.
Dupačová, J. (1980). On minimax decision rule in stochastic linear programming, in: A. Prékopa (ed.), Studies on Mathematical Programming, Akadémiai Kiadó, Budapest, pp. 47–60.
Dupačová, J. (1987). The minimax approach to stochastic programming and an illustrative
application. Stochastics 20, 73–88.
Ioffe, A.D., V.M. Tihomirov (1979). Theory of Extremal Problems, North-Holland Publishing
Company, Amsterdam.
Kall, P. (1976). Stochastic Linear Programming, Springer-Verlag, Berlin.


Klein Haneveld, W.K. (1986). Duality in Stochastic Linear and Dynamic Programming, Lecture Notes
in Economic and Mathematical Systems, Vol. 274, Springer-Verlag, New York.
Konno, H., H. Yamazaki (1991). Mean–absolute deviation portfolio optimization model and its
application to Tokyo stock market. Management Science 37, 519–531.
Louveaux, F.V. (1980). A solution method for multistage stochastic programs with recourse with
applications to an energy investment problem. Operations Research 28, 889–902.
Luenberger, D.G. (1998). Investment Science, Oxford University Press, New York.
Madansky, A. (1960). Inequalities for stochastic linear programming problems. Management Science 6,
197–204.
Markowitz, H.M. (1952). Portfolio selection. Journal of Finance 7, 77–91.
Markowitz, H.M. (1987). Mean–Variance Analysis in Portfolio Choice and Capital Markets, Blackwell,
Oxford.
Marti, K. (1975). Über zwei- und mehrstufige stochastische Kontrollprobleme, in: Vorträge der
Wissenschaftlichen Jahrestagung der Gesellschaft für Angewandte Mathematik und Mechanik
(Bochum, 1974). Zeitschrift für Angewandte Mathematik und Mechanik 55, T281–T282.
Miller, L.B., H. Wagner (1965). Chance-constrained programming with joint constraints. Operations
Research 13, 930–945.
Mulvey, J.M., R.J. Vanderbei, S.A. Zenios (1995). Robust optimization of large-scale systems. Operations Research 43, 264–281.
Noël, M.-C., Y. Smeers (1986). On the use of nested decomposition for solving nonlinear multistage
stochastic programs, in: Stochastic Programming (Gargnano, 1983), Lecture Notes in Control and
Inform. Sci., Vol. 76, Springer-Verlag, Berlin, pp. 235–246.
Ogryczak, W., A. Ruszczyński (1999). From stochastic dominance to mean–risk models: semideviations as risk measures. European Journal of Operational Research 116, 33–50.
Ogryczak, W., A. Ruszczyński (2001). On consistency of stochastic dominance and mean–semideviation models. Mathematical Programming 89, 217–232.
Ogryczak, W., A. Ruszczyński (2002). Dual stochastic dominance and related mean–risk models. SIAM Journal on Optimization 13, 60–78.
Pflug, G.Ch. (1996). Optimization of Stochastic Models. The Interface Between Simulation and
Optimization, Kluwer Academic Publishers, Boston, MA.
Prékopa, A. (1970). On probabilistic constrained programming, in: Proceedings of the Princeton Symposium on Mathematical Programming, Princeton University Press, Princeton, pp. 113–138.
Prékopa, A. (1995). Stochastic Programming, Kluwer, Dordrecht, Boston.
Raiffa, H. (1968). Decision Analysis, Addison–Wesley, Reading, MA.
Rockafellar, R.T., R.J.-B. Wets (1998). Variational Analysis, Springer-Verlag, Berlin.
Takriti, S., S. Ahmed (2002). On robust optimization of two-stage systems. Mathematical
Programming. To appear.
Uryasev, S., R.T. Rockafellar (2001). Conditional value-at-risk: optimization approach, in: Stochastic Optimization: Algorithms and Applications (Gainesville, FL, 2000), Applied Optimization, Vol. 54, Kluwer Academic Publishers, Dordrecht, pp. 411–435.
Varaiya, P., R.J.-B. Wets (1989). Stochastic dynamic optimization approaches and computation, in:
M. Iri, K. Tanabe (eds.), Mathematical Programming: Recent Developments and Applications,
Kluwer, Dordrecht, pp. 309–332.
Walkup, D., R.J.-B. Wets (1967). Stochastic programs with recourse. SIAM Journal on Applied
Mathematics 15, 1299–1314.
Walkup, D., R.J.-B. Wets (1969). Stochastic programs with recourse II: on the continuity of the objective. SIAM Journal on Applied Mathematics 17, 98–103.
Wets, R.J.-B. (1966). Programming under uncertainty: the equivalent convex program. SIAM Journal
on Applied Mathematics 14, 89–105.
Žáčková, J. (1966). On minimax solutions of stochastic linear programming problems. Čas. Pěst. Mat. 91, 423–430.
