The Sample Average Approximation Method For Stochastic Discrete Optimization
Key words. stochastic programming, discrete optimization, Monte Carlo sampling, law of large
numbers, large deviations theory, sample average approximation, stopping rules, stochastic knapsack
problem
PII. S1052623499363220
allow estimation of g(x) for each solution x. Examples of this literature are Hochberg
and Tamhane [12]; Bechhofer, Santner, and Goldsman [2]; Futschik and Pflug [7, 8];
and Nelson et al. [17]. Another approach that has been studied consists of modifying
the well-known simulated annealing method in order to account for the fact that the
objective function values are not known exactly. Work on this topic includes Gelfand
and Mitter [9], Alrefaei and Andradóttir [1], Fox and Heine [6], Gutjahr and Pflug [10],
and Homem-de-Mello [13]. A discussion of two-stage stochastic integer programming
problems with recourse can be found in Birge and Louveaux [3]. A branch and
bound approach to solving stochastic integer programming problems was suggested
by Norkin, Ermoliev, and Ruszczyński [18] and Norkin, Pflug, and Ruszczyński [19].
Schultz, Stougie, and Van der Vlerk [20] suggested an algebraic approach to solving
stochastic programs with integer recourse by using a framework of Gröbner basis
reductions.
In this paper we study a Monte Carlo simulation–based approach to stochastic
discrete optimization problems. The basic idea is simple indeed—a random sample of
W is generated and the expected value function is approximated by the corresponding
sample average function. The obtained sample average optimization problem is solved,
and the procedure is repeated several times until a stopping criterion is satisfied. The
idea of using sample average approximations for solving stochastic programs is a
natural one and was used by various authors over the years. Such an approach was
used in the context of a stochastic knapsack problem in a recent paper of Morton and
Wood [16].
The organization of this paper is as follows. In the next section we discuss a
statistical inference of the sample average approximation method. In particular, we
show that with probability approaching 1 exponentially fast with increase of the sam-
ple size, an optimal solution of the sample average approximation problem provides
an exact optimal solution of the “true” problem (1.1). In section 3 we outline an algo-
rithm design for the sample average approximation approach to solving (1.1), and in
particular we discuss various stopping rules. In section 4 we present a numerical ex-
ample of the sample average approximation method applied to a stochastic knapsack
problem, and section 5 gives conclusions.
2. Convergence results. As mentioned in the introduction, we are interested in
solving stochastic discrete optimization problems of the form (1.1). Let W 1 , . . . , W N
be an independently and identically distributed (i.i.d.) random sample of N realiza-
tions of the random vector W . Consider the sample average function
(2.1)   ĝ_N(x) := (1/N) Σ_{j=1}^{N} G(x, W^j).
We refer to (1.1) and (2.1) as the “true” (or expected value) and sample average
approximation (SAA) problems, respectively. Note that E[ĝN (x)] = g(x).
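To make the construction concrete, the following sketch builds ĝ_N by Monte Carlo sampling and solves the SAA problem by enumeration, for a small hypothetical instance (the cost vector c, the noise distribution, and all names are illustrative and not taken from the paper):

```python
import itertools
import random

random.seed(0)

def make_saa_objective(G, sample):
    """Return the sample average function x -> (1/N) * sum_j G(x, W^j)."""
    return lambda x: sum(G(x, w) for w in sample) / len(sample)

# Hypothetical instance: S = {0,1}^3 and G(x, W) = sum_i (c_i + W_i) x_i
# with W_i ~ N(0, 1), so that g(x) = E[G(x, W)] = sum_i c_i x_i.
c = [0.3, -0.2, 0.1]

def G(x, w):
    return sum((ci + wi) * xi for ci, wi, xi in zip(c, w, x))

N = 5000
sample = [[random.gauss(0, 1) for _ in c] for _ in range(N)]
g_hat = make_saa_objective(G, sample)

S = list(itertools.product([0, 1], repeat=3))
x_hat = min(S, key=g_hat)   # an optimal solution of the SAA problem
v_hat = g_hat(x_hat)        # the SAA optimal value
```

Here the true optimal solution is (0, 1, 0) with v∗ = −0.2, and for moderate N the SAA problem recovers it exactly, in line with the convergence results of this section.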
Since the feasible set S is finite, problems (1.1) and (2.1) have nonempty sets of
optimal solutions, denoted S ∗ and ŜN , respectively. Let v ∗ and v̂N denote the optimal
values,
v∗ := min_{x∈S} g(x)   and   v̂_N := min_{x∈S} ĝ_N(x),
of the respective problems. We also consider sets of ε-optimal solutions. That is, for
ε ≥ 0, we say that x̄ is an ε-optimal solution of (1.1) if x̄ ∈ S and g(x̄) ≤ v ∗ + ε. The
sets of all ε-optimal solutions of (1.1) and (2.1) are denoted by S ε and ŜNε , respectively.
Clearly for ε = 0 set S ε coincides with S ∗ , and ŜNε coincides with ŜN .
2.1. Convergence of objective values and solutions. The following propo-
sition establishes convergence with probability one (w.p.1) of the above statistical
estimators. By the statement “an event happens w.p.1 for N large enough” we mean
that for P-almost every realization ω = {W^1, W^2, . . .} of the random sequence there exists an integer N(ω) such that the considered event happens for all samples {W^1, . . . , W^n} from ω with n ≥ N(ω). Note that in such a statement the integer N(ω) depends on the sequence ω of realizations and therefore is random.
Proposition 2.1. The following two properties hold: (i) v̂N → v ∗ w.p.1 as
N → ∞, and (ii) for any ε ≥ 0 the event {ŜNε ⊂ S ε } happens w.p.1 for N large
enough.
Proof. It follows from the (strong) law of large numbers that for any x ∈ S, ĝN (x)
converges to g(x) w.p.1 as N → ∞. Since the set S is finite and the union of a finite
number of sets each of measure zero also has measure zero, it follows that, w.p.1,
ĝ_N(x) converges to g(x) uniformly in x ∈ S. That is, w.p.1,

δ_N := max_{x∈S} | ĝ_N(x) − g(x) | → 0   as N → ∞.

In particular, |v̂_N − v∗| ≤ δ_N, which proves assertion (i). For ε ≥ 0, define ρ(ε) := min_{x∈S\S^ε} g(x) − v∗ − ε.
Since for any x ∈ S \ S ε it holds that g(x) > v ∗ + ε and the set S is finite, it follows
that ρ(ε) > 0.
Let N be large enough such that δN < ρ(ε)/2. Then v̂N < v ∗ + ρ(ε)/2, and for
any x ∈ S \ S ε it holds that ĝN (x) > v ∗ + ε + ρ(ε)/2. It follows that if x ∈ S \ S ε , then
ĝN (x) > v̂N + ε and hence x does not belong to the set ŜNε . The inclusion ŜNε ⊂ S ε
follows, which completes the proof.
Note that if δ is a number such that 0 ≤ δ ≤ ε, then S δ ⊂ S ε and ŜNδ ⊂ ŜNε .
Consequently it follows by the above proposition that for any δ ∈ [0, ε] the event
{ŜNδ ⊂ S ε } happens w.p.1 for N large enough. It also follows that if S ε = {x∗ } is
a singleton, then ŜNε = {x∗ } w.p.1 for N large enough. In particular, if the true
problem (1.1) has a unique optimal solution x∗ , then w.p.1 for sufficiently large N
the approximating problem (2.1) has a unique optimal solution x̂N and x̂N = x∗ . Also
consider the set A := {g(x) − v ∗ : x ∈ S}. The set A is a subset of the set R+ of
nonnegative numbers and |A| ≤ |S|, and hence A is finite. It follows from the above
analysis that for any ε ∈ R+ \ A the event {ŜNε = S ε } happens w.p.1 for N large
enough.
2.2. Convergence rates. The above results do not say anything about the rates
of convergence of v̂N and ŜNδ to their true counterparts. In this section we investigate
such rates of convergence. By using the theory of large deviations (LD), we show that,
under mild regularity conditions and δ ∈ [0, ε], the probability of the event {ŜNδ ⊂ S ε }
approaches 1 exponentially fast as N → ∞. Next we briefly outline some background
of the LD theory.
482 A. J. KLEYWEGT, A. SHAPIRO, AND T. HOMEM-DE-MELLO
Consider a random (real valued) variable X having mean µ := E[X]. Its moment-
generating function M (t) := E[etX ] is viewed as an extended valued function, i.e., it
can take value +∞. It holds that M(t) > 0 for all t ∈ R and M(0) = 1. The conjugate function

(2.4)   I(z) := sup_{t∈R} { tz − Λ(t) }

of the logarithmic moment-generating function Λ(t) := log M(t) is called the (LD) rate function of X.
rate function of X. It is possible to show that both functions Λ(·) and I(·) are convex.
Consider an i.i.d. sequence X_1, . . . , X_N of replications of the random variable X, and let Z_N := N^{−1} Σ_{i=1}^N X_i be the corresponding sample average. Then for any real
numbers a and t ≥ 0 it holds that P (ZN ≥ a) = P (etZN ≥ eta ), and hence it follows
from Chebyshev’s inequality that
P(Z_N ≥ a) ≤ e^{−ta} E[e^{tZ_N}] = e^{−ta} [M(t/N)]^N.
By taking the logarithm of both sides of the above inequality, changing variables
t := t/N , and minimizing over t ≥ 0, it follows for a ≥ µ that
(2.5)   (1/N) log [P(Z_N ≥ a)] ≤ −I(a).
Note that for a ≥ µ it suffices to take the supremum in the definition (2.4) of I(a)
for t ≥ 0, and therefore this constraint is omitted. Inequality (2.5) corresponds to the
upper bound of Cramér’s LD theorem.
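As a concrete check, for X ~ Bernoulli(p) the rate function works out to the relative entropy I(a) = a log(a/p) + (1 − a) log((1 − a)/(1 − p)), and the nonasymptotic bound (2.5), i.e., P(Z_N ≥ a) ≤ e^{−N I(a)}, can be compared against the exact binomial tail (a small illustrative script, not from the paper):

```python
import math

def rate_function_bernoulli(a, p):
    """LD rate function of a Bernoulli(p) variable at a point a in (0, 1)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def exact_tail(N, p, a):
    """P(Z_N >= a) for Z_N the average of N i.i.d. Bernoulli(p) variables."""
    k_min = math.ceil(a * N)
    return sum(math.comb(N, k) * p**k * (1 - p)**(N - k) for k in range(k_min, N + 1))

p, a = 0.5, 0.7
for N in (10, 50, 200):
    lhs = exact_tail(N, p, a)
    rhs = math.exp(-N * rate_function_bernoulli(a, p))
    assert lhs <= rhs  # inequality (2.5) holds for every finite N
```

The bound is valid for every sample size, and the gap between the two sides shrinks (on the logarithmic scale) as N grows, as the lower bound (2.6) below predicts.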
The constant I(a) in (2.5) gives, in a sense, the best possible exponential rate at
which the probability P (ZN ≥ a) converges to zero for a > µ. This follows from the
lower bound
(2.6)   lim inf_{N→∞} (1/N) log [P(Z_N ≥ a)] ≥ −I(a)
of Cramér’s LD theorem. A simple sufficient condition for (2.6) to hold is that the
moment-generating function M (t) is finite valued for all t ∈ R. For a thorough
discussion of the LD theory, the interested reader is referred to Dembo and Zeitouni [5].
The rate function I(z) has the following properties: The function I(z) is convex
and attains its minimum at z = µ, and I(µ) = 0. Moreover, suppose that the moment-
generating function M (t) is finite valued for all t in a neighborhood of t = 0. Then X
has finite moments, and it follows by the dominated convergence theorem that M (t),
and hence the function Λ(t), are infinitely differentiable at t = 0, and Λ′(0) = µ. Consequently, for a > µ the derivative of ψ(t) := ta − Λ(t) at t = 0 is greater than zero, and hence ψ(t) > 0 for t > 0 small enough. In that case it follows that I(a) > 0. Also, I′(µ) = 0 and I″(µ) = σ^{−2}, where σ² := Var[X], and hence by Taylor's expansion

(2.7)   I(a) = (a − µ)² / (2σ²) + o(|a − µ|²).
Consequently, for a close to µ one can approximate I(a) by (a − µ)²/(2σ²). Moreover, for any ε̃ > 0 there is a neighborhood N of µ such that

(2.8)   I(a) ≥ (a − µ)² / ((2 + ε̃)σ²)   for all a ∈ N.
For δ ∈ [0, ε], we have

(2.9)   {Ŝ_N^δ ⊄ S^ε} = ∪_{x∈S\S^ε} ∩_{y∈S} {ĝ_N(x) ≤ ĝ_N(y) + δ},

and hence

(2.10)   P(Ŝ_N^δ ⊄ S^ε) ≤ Σ_{x∈S\S^ε} P( ∩_{y∈S} {ĝ_N(x) ≤ ĝ_N(y) + δ} ).

Consider a mapping u : S \ S^ε → S. It follows from (2.10) that

(2.11)   P(Ŝ_N^δ ⊄ S^ε) ≤ Σ_{x∈S\S^ε} P( ĝ_N(x) − ĝ_N(u(x)) ≤ δ ).
We assume that the mapping u(·) is chosen in such a way that for some ε∗ > ε,

(2.12)   g(u(x)) ≤ g(x) − ε∗   for all x ∈ S \ S^ε.

Note that if u(·) is a mapping from S \ S^ε into the set S∗, i.e., u(x) ∈ S∗ for all x ∈ S \ S^ε, then (2.12) holds with

(2.13)   ε∗ := min_{x∈S\S^ε} g(x) − v∗,

and that ε∗ > ε since the set S is finite. Therefore a mapping u(·) that satisfies condition (2.12) always exists.
For each x ∈ S \ S^ε, define H(x, W) := G(u(x), W) − G(x, W). Then

(2.14)   P( ĝ_N(x) − ĝ_N(u(x)) ≤ δ ) = P( N^{−1} Σ_{j=1}^N H(x, W^j) ≥ −δ ).

Let I_x(·) denote the LD rate function of H(x, W). Inequality (2.14), together with (2.11) and (2.5), implies that

(2.15)   P(Ŝ_N^δ ⊄ S^ε) ≤ Σ_{x∈S\S^ε} e^{−N I_x(−δ)}.
It is important to note that the above inequality (2.15) is not asymptotic and is valid
for any random sample of size N .
variable, or if H(x, ·) grows at most linearly and W has a distribution from the
exponential family.
Proposition 2.2. Let ε and δ be nonnegative numbers such that δ ≤ ε. Then

(2.16)   P(Ŝ_N^δ ⊄ S^ε) ≤ |S \ S^ε| e^{−N γ(δ,ε)},

where

(2.17)   γ(δ, ε) := min_{x∈S\S^ε} I_x(−δ),

and hence

(2.18)   P(Ŝ_N^δ ⊂ S^ε) ≥ 1 − |S \ S^ε| e^{−N γ(δ,ε)}.

Inequality (2.18) means that the probability of the event {Ŝ_N^δ ⊂ S^ε} approaches 1 exponentially fast as N → ∞. This suggests that Monte Carlo sampling, combined with an efficient method for solving the deterministic SAA problem, can efficiently solve problems of the type under study, provided that the constant γ(δ, ε) is not "too small."
It follows from (2.7) that

(2.19)   I_x(−δ) ≈ (−δ − E[H(x, W)])² / (2σ_x²) ≥ (ε∗ − δ)² / (2σ_x²),

where

(2.20)   σ_x² := Var[H(x, W)] = Var[G(u(x), W) − G(x, W)],

and

(2.21)   σ_max² := max_{x∈S\S^ε} Var[G(u(x), W) − G(x, W)].
A result similar to the one of Proposition 2.2 was derived in [14] by using slightly
different arguments. The LD rate functions of the random variables G(x, W ) were
used there, which resulted in estimates of the exponential constant similar to the
estimate (2.20) but with σx2 replaced by the variance of G(x, W ). Due to a positive
correlation between G(x, W ) and G(u(x), W ), the variance of G(x, W ) − G(u(x), W )
tends to be smaller than the variance of G(x, W), thereby providing a smaller upper bound.
To illustrate some implications of the bound (2.16) for issues of the complexity of
solving stochastic problems, let us fix a significance level α ∈ (0, 1), and estimate the
sample size N which is needed for the probability P (ŜNδ ⊂ S ε ) to be at least 1 − α.
By requiring that the right-hand side of (2.16) be less than or equal to α, we obtain
that

(2.22)   N ≥ (1/γ(δ, ε)) log( |S \ S^ε| / α ).
Moreover, it follows from (2.8) and (2.17) that γ(δ, ε) ≥ (ε − δ)²/(3σ_max²) for all ε ≥ 0 sufficiently small. Therefore it holds that for all ε > 0 small enough and δ ∈ [0, ε), a sufficient condition for (2.22) is that

(2.23)   N ≥ (3σ_max² / (ε − δ)²) log( |S| / α ).
It appears that the bound (2.23) may be too conservative for practical estimates
of the required sample sizes (see the discussion in section 4.2). However, the esti-
mate (2.23) has interesting consequences for complexity issues. A key characteristic
of (2.23) is that N depends only logarithmically both on the size of the feasible set S
and on the tolerance probability α. An important implication of such behavior is the
following. Suppose that (i) the size of the feasible set S grows at most exponentially in the length of the problem input, (ii) the variance σ_max² grows polynomially in the length of the problem input, and (iii) the complexity of finding a δ-optimal solution
for (2.1) grows polynomially in the length of the problem input and the sample size
N . Then a solution can be generated in time that grows polynomially in the length of
the problem input such that, with probability at least 1 − α, the solution is ε-optimal
for (1.1). A careful analysis of these issues is beyond the scope of this paper, and
requires further investigation.
Now suppose for a moment that the true problem has unique optimal solution
x∗ , i.e., S ∗ = {x∗ } is a singleton, and consider the event that the SAA problem (2.1)
has unique optimal solution x̂N and x̂N = x∗ . We denote that event by {x̂N = x∗ }.
Furthermore, consider the mapping u(x) ≡ x∗ for x ∈ S \ {x∗}, and the corresponding constant γ∗ := γ(0, 0). That is,

(2.24)   γ∗ := min_{x∈S\{x∗}} I_x(0),

with I_x(·) being the LD rate function of G(x∗, W) − G(x, W). Note that E[G(x∗, W) −
G(x, W )] = g(x∗ ) − g(x), and hence E[G(x∗ , W ) − G(x, W )] < 0 for every x ∈ S \
{x∗ }. Therefore, if Assumption (A) holds, i.e., the moment-generating function of
G(x∗ , W ) − G(x, W ) is finite valued in a neighborhood of 0, then γ ∗ > 0.
Proposition 2.3. Suppose that the true problem has unique optimal solution x∗
and the moment-generating function of each random variable G(x∗ , W ) − G(x, W ),
x ∈ S \ {x∗ }, is finite valued on R. Then
(2.25)   lim_{N→∞} (1/N) log [1 − P(x̂_N = x∗)] = −γ∗.
Proof. It follows from (2.16), applied with δ = ε = 0, that

(2.26)   lim sup_{N→∞} (1/N) log [1 − P(x̂_N = x∗)] ≤ −γ∗.

Consider the complement of the event {x̂_N = x∗}, denoted {x̂_N ≠ x∗}. The event {x̂_N ≠ x∗} is equal to the union of the events {ĝ_N(x) ≤ ĝ_N(x∗)}, x ∈ S \ {x∗}. Therefore, for any x ∈ S \ {x∗},

1 − P(x̂_N = x∗) ≥ P( ĝ_N(x) ≤ ĝ_N(x∗) ).
By using the lower bound (2.6) of Cramér’s LD theorem, it follows that the inequality
(2.27)   lim inf_{N→∞} (1/N) log [1 − P(x̂_N = x∗)] ≥ −I_x(0)
holds for every x ∈ S \ {x∗ }. Inequalities (2.26) and (2.27) imply (2.25).
Suppose that S ∗ = {x∗ } and consider the number
(2.28)   κ := max_{x∈S\{x∗}}  Var[G(x, W) − G(x∗, W)] / [g(x) − g(x∗)]².
It follows from (2.7) and (2.24) that κ ≈ 1/(2γ ∗ ). One can view κ as a condition
number of the true problem. That is, the sample size required for the event {x̂N = x∗ }
to happen with a given probability is roughly proportional to κ. The number defined
in (2.28) can be viewed as a discrete version of the condition number introduced in [22]
for piecewise linear continuous problems.
For a problem with a large feasible set S, the number min_{x∈S\{x∗}} [g(x) − g(x∗)],
although positive if S ∗ = {x∗ }, tends to be small. Therefore the sample size required
to calculate the exact optimal solution x∗ with a high probability could be very
large, even if the optimal solution x∗ is unique. For ill-conditioned problems it makes
sense to search for approximate (ε-optimal) solutions of the true problem. In that
respect the bound (2.16) is more informative since the corresponding constant γ(δ, ε)
is guaranteed to be at least of the order (ε − δ)²/(2σ_max²).
It is also insightful to note the behavior of the condition number κ for a discrete optimization problem with linear objective function G(x, W) := Σ_{i=1}^k W_i x_i and feasible set S given by the vertices of the unit hypercube in R^k, i.e., S := {0, 1}^k. In that case the corresponding true optimization problem is

min_{x∈{0,1}^k}  g(x) = Σ_{i=1}^k w̄_i x_i,
where w̄i := E[Wi ]. Suppose that w̄i > 0 for all i ∈ {1, . . . , k}, and hence the origin
is the unique optimal solution of the true problem, i.e., S ∗ = {0}. Let
ϑ_i² := Var[W_i] / (E[W_i])²

denote the squared coefficient of variation of W_i, and let

ρ_ij := Cov[W_i, W_j] / √( Var[W_i] Var[W_j] )
denote the correlation coefficient between Wi and Wj . It follows that for any x ∈
{0, 1}k \ {0},
Var[ Σ_{i=1}^k W_i x_i ] / ( Σ_{i=1}^k w̄_i x_i )²  =  Σ_{i=1}^k Σ_{j=1}^k ρ_ij ϑ_i w̄_i x_i ϑ_j w̄_j x_j / ( Σ_{i=1}^k w̄_i x_i )²  ≤  max_{i∈{1,...,k}} ϑ_i².
Thus

κ = max_{x∈{0,1}^k\{0}}  Var[ Σ_{i=1}^k W_i x_i ] / ( Σ_{i=1}^k w̄_i x_i )²  =  max_{i∈{1,...,k}} ϑ_i².
The last equality follows because the maximum is attained by setting xi = 1 for the
index i for which Wi has the maximum squared coefficient of variation ϑ2i , and setting
xj = 0 for the remaining variables. Thus, in this example the condition number κ is
equal to the maximum squared coefficient of variation of the Wi ’s.
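The identity κ = max_i ϑ_i² can be verified by brute force for small k. The sketch below assumes, for simplicity, independent W_i (so that ρ_ij = 0 for i ≠ j), with illustrative means and variances:

```python
import itertools

# Illustrative check of kappa = max_i Var[W_i]/E[W_i]^2 for independent W_i,
# by brute force over S = {0,1}^k \ {0}.
means = [2.0, 1.0, 4.0]       # w_bar_i = E[W_i] > 0
variances = [1.0, 0.9, 2.0]   # Var[W_i]

def kappa(means, variances):
    k = len(means)
    best = 0.0
    for x in itertools.product([0, 1], repeat=k):
        if not any(x):
            continue
        # Independence: Var[sum_i W_i x_i] = sum_i Var[W_i] x_i.
        var = sum(v * xi for v, xi in zip(variances, x))
        mean = sum(m * xi for m, xi in zip(means, x))
        best = max(best, var / mean ** 2)
    return best

theta_sq = [v / m ** 2 for v, m in zip(means, variances) for v, m in [(v, m)]]
theta_sq = [v / m ** 2 for v, m in zip(variances, means)]
assert abs(kappa(means, variances) - max(theta_sq)) < 1e-12
```

The maximum is attained by the singleton x picking out the coordinate with the largest squared coefficient of variation, as stated above.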
2.3. Asymptotics of sample objective values. Next we discuss the asymptotics of the SAA optimal objective value v̂_N. For any subset S′ of S the inequality v̂_N ≤ min_{x∈S′} ĝ_N(x) holds. In particular, by taking S′ = S∗, it follows that v̂_N ≤ min_{x∈S∗} ĝ_N(x), and hence

E[v̂_N] ≤ E[ min_{x∈S∗} ĝ_N(x) ] ≤ min_{x∈S∗} E[ĝ_N(x)] = v∗.
That is, the estimator v̂N has a negative bias (cf. Norkin, Pflug, and Ruszczyński [19]
and Mak, Morton, and Wood [15]).
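The negative bias is easy to observe empirically. In the hypothetical instance below every x ∈ S is optimal with g(x) = 0, so v∗ = 0, yet the average of v̂_N over replications is clearly negative (the instance and all parameters are illustrative):

```python
import random
import statistics

random.seed(1)

k, N, reps = 5, 100, 400

def v_hat_N():
    """Optimal value of one SAA problem for G(x, W) = W_x, x in {0, ..., k-1},
    with independent W_x ~ N(0, 1), so g(x) = 0 for every x and v* = 0."""
    g_bar = [statistics.fmean(random.gauss(0, 1) for _ in range(N)) for _ in range(k)]
    return min(g_bar)

bias_estimate = statistics.fmean(v_hat_N() for _ in range(reps))
# bias_estimate is markedly below v* = 0, reflecting E[v_hat_N] <= v*.
```

With k = 5 and N = 100 the observed bias is roughly the mean of the minimum of five N(0, 1/N) variables, i.e., on the order of −0.1.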
It follows from Proposition 2.1 that w.p.1, for N sufficiently large, the set ŜN of
optimal solutions of the SAA problem is included in S∗. In that case it holds that v̂_N ≥ min_{x∈S∗} ĝ_N(x). Since the opposite inequality always holds, it follows that, w.p.1, v̂_N − min_{x∈S∗} ĝ_N(x) = 0 for N large enough. Multiplying both sides of this equation by √N, it follows that, w.p.1, √N [v̂_N − min_{x∈S∗} ĝ_N(x)] = 0 for N large enough, and hence

(2.29)   lim_{N→∞} √N [ v̂_N − min_{x∈S∗} ĝ_N(x) ] = 0   w.p.1.
Since convergence w.p.1 implies convergence in probability, it follows from (2.29) that √N [v̂_N − min_{x∈S∗} ĝ_N(x)] converges in probability to zero, i.e., v̂_N = min_{x∈S∗} ĝ_N(x) + o_p(N^{−1/2}). Suppose that for every x ∈ S the variance

(2.30)   σ²(x) := Var[G(x, W)]

exists. Then it follows by the central limit theorem (CLT) that, for any x ∈ S, √N [ĝ_N(x) − g(x)] converges in distribution to a normally distributed variable Z(x)
with zero mean and variance σ 2 (x). Moreover, again by the CLT, random variables
Z(x) have the same covariance function as G(x, W ), i.e., the covariance between
Z(x) and Z(x ) is equal to the covariance between G(x, W ) and G(x , W ) for any
x, x ∈ S. Hence the following result is obtained (it is similar to an asymptotic result
for stochastic programs with continuous decision variables which was derived in [21]).
We use “⇒” to denote convergence in distribution.
Proposition 2.4. Suppose that variances σ 2 (x), defined in (2.30), exist for every
x ∈ S ∗ . Then
(2.31)   √N (v̂_N − v∗) ⇒ min_{x∈S∗} Z(x),
where Z(x) are normally distributed random variables with zero mean and the co-
variance function given by the corresponding covariance function of G(x, W ). In
particular, if S ∗ = {x∗ } is a singleton, then
(2.32)   √N (v̂_N − v∗) ⇒ N(0, σ²(x∗)).
Although for any given x the mean (expected value) of Z(x) is zero, the expected value of the minimum of Z(x) over a subset S′ of S can be negative and tends to be smaller for a larger set S′. Therefore, it follows from (2.31) that for ill-conditioned problems, where the set of optimal or nearly optimal solutions is large, the estimate v̂_N of v∗ tends to be heavily biased. Note that convergence in distribution does not necessarily imply convergence of the corresponding means. Under mild additional conditions it follows from (2.31) that √N [E(v̂_N) − v∗] → E[min_{x∈S∗} Z(x)].
3. Algorithm design. In the previous section we established a number of con-
vergence results for the SAA method. The results describe how the optimal value v̂N
and the set ŜNε of ε-optimal solutions of the SAA problem converge to their true coun-
terparts v ∗ and S ε , respectively, as the sample size N increases. These results provide
some theoretical justification for the proposed method. When designing an algorithm
for solving stochastic discrete optimization problems, many additional issues have to
be addressed. Some of these issues are discussed in this section.
3.1. Selection of the sample size. In an algorithm, a finite sample size N or
a sequence of finite sample sizes has to be chosen, and the algorithm has to stop after
a finite amount of time. An important question is how these choices should be made.
Estimate (2.23) gives a bound on the sample size required to find an ε-optimal solution
with probability at least 1 − α. This estimate has two shortcomings for computational
purposes. First, for many problems it is not easy to compute the estimate, because
σ_max² and in some problems also |S| may be hard to compute. Second, as demonstrated
in section 4.2, the bound may be far too conservative to obtain a practical estimate
of the required sample size. To choose N , several trade-offs should be taken into
account. With larger N , the objective function of the SAA problem tends to be a
more accurate estimate of the true objective function, an optimal solution of the SAA
problem tends to be a better solution, and the corresponding bounds on the optimality
gap, discussed later, tend to be tighter. However, depending on the SAA problem (2.1)
and the method used for solving the SAA problem, the computational complexity for
solving the SAA problem increases at least linearly, and often exponentially, in the
sample size N . Thus, in the choice of sample size N , the trade-off between the quality
of an optimal solution of the SAA problem and the bounds on the optimality gap,
on the one hand, and computational effort, on the other hand, should be taken into
account. Also, the choice of sample size N may be adjusted dynamically, depending
For simplicity of presentation, suppose that each SAA replication produces one candidate
solution, which can be an optimal (ε-optimal) solution of the SAA problem. Let x̂Nm
denote the candidate solution produced by the mth SAA replication (trial). The optimality gap g(x̂_N^m) − v∗ can be estimated, as described in the next section. If a stopping
criterion based on the optimality gap estimate is satisfied, then no more replications
are performed. Otherwise, additional SAA replications with the same sample size N
are performed, or the sample size N is increased. The following argument provides a
simple guideline as to whether an additional SAA replication with the same sample
size N is likely to provide a better solution than the best solution found so far.
Note that, by construction, the random variables g(x̂Nm ), m = 1, . . . , are i.i.d.,
and their common probability distribution has a finite support because the set S is
finite. Suppose that M replications with sample size N have been performed so far.
If the probability distribution of g(x̂N ) were continuous, then the probability that the
(M + 1)th SAA replication with the same sample size would produce a better solution
than the best of the solutions produced by the M replications so far would be equal
to 1/(M + 1). Because in fact the distribution of g(x̂N ) is discrete, this probability is
less than or equal to 1/(M + 1). Thus, when 1/(M + 1) becomes sufficiently small,
additional SAA replications with the same sample size are not likely to be worth the
effort, and either the sample size N should be increased or the procedure should be
stopped.
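The 1/(M + 1) argument is straightforward to check by simulation: with continuous i.i.d. values, the probability that a fresh draw beats the best (smallest) of M previous draws is exactly 1/(M + 1) (an illustrative script, not from the paper):

```python
import random

random.seed(2)

def new_draw_is_best(M):
    """Does draw M+1 beat the best (smallest) of the first M i.i.d. draws?"""
    draws = [random.random() for _ in range(M + 1)]
    return min(draws) == draws[-1]

M, trials = 9, 100_000
p_hat = sum(new_draw_is_best(M) for _ in range(trials)) / trials
# p_hat is close to 1/(M + 1) = 0.1
```

For a discrete distribution of g(x̂_N), as in the text, ties make this probability no larger than 1/(M + 1).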
3.3. Performance bounds. To assist in making stopping decisions, as well as
for other performance evaluation purposes, one would like to compute the optimality
gap g(x̂) − v ∗ for a given solution x̂ ∈ S. Unfortunately, the very reason for the
approach described in this paper implies that both terms of the optimality gap are
hard to compute. As before,
ĝ_N′(x̂) := (1/N′) Σ_{j=1}^{N′} G(x̂, W^j)

is an unbiased estimator of g(x̂), and the variance of ĝ_N′(x̂) is estimated by S_N′²(x̂)/N′, where S_N′²(x̂) is the sample variance of G(x̂, W^j), based on the evaluation sample of size N′.
An estimator of v ∗ is given by
v̄_N^M := (1/M) Σ_{m=1}^M v̂_N^m,
where v̂_N^m denotes the optimal objective value of the mth SAA replication. Note that E[v̄_N^M] = E[v̂_N], and hence the estimator v̄_N^M has the same negative bias as v̂_N.
Proposition 2.4 indicates that this bias tends to be bigger for ill-conditioned problems
with larger sets of optimal, or nearly optimal, solutions. Consider the corresponding estimator ĝ_N′(x̂) − v̄_N^M of the optimality gap g(x̂) − v∗ at the point x̂. Since

(3.1)   E[ ĝ_N′(x̂) − v̄_N^M ] = g(x̂) − E[v̂_N] ≥ g(x̂) − v∗,
it follows that on average the above estimator overestimates the optimality gap g(x̂)−
v ∗ . It is possible to show (Norkin, Pflug, and Ruszczyński [19], and Mak, Morton,
and Wood [15]) that the bias v ∗ − E[v̂N ] is monotonically decreasing in the sample
size N .
The variance of v̄_N^M is estimated by

S_M² / M := (1/(M(M − 1))) Σ_{m=1}^M ( v̂_N^m − v̄_N^M )².
If the M samples of size N and the evaluation sample of size N′ are independent, then the variance of the optimality gap estimator ĝ_N′(x̂) − v̄_N^M can be estimated by S_N′²(x̂)/N′ + S_M²/M.
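The two estimators above combine into a simple gap-estimation routine. The sketch below is generic: `solve_saa` and `sampler` are assumed callables (an illustrative API, not the paper's), and in the toy usage at the bottom the true gap at x̂ = 0 is zero:

```python
import random
import statistics

def optimality_gap_estimate(G, solve_saa, sampler, x_hat, M, N, N_eval):
    """Return (gap estimate, variance estimate) for g(x_hat) - v*.

    Gap estimator: g_hat_{N'}(x_hat) - vbar_N^M, with variance estimated by
    S_{N'}^2(x_hat)/N' + S_M^2/M, using independent samples throughout.
    """
    evals = [G(x_hat, w) for w in sampler(N_eval)]
    g_hat = statistics.fmean(evals)
    var_g_hat = statistics.variance(evals) / N_eval

    v_hats = [solve_saa(sampler(N)) for _ in range(M)]
    v_bar = statistics.fmean(v_hats)
    var_v_bar = statistics.variance(v_hats) / M

    return g_hat - v_bar, var_g_hat + var_v_bar

# Toy usage: S = {0, 1}, G(x, w) = w + 0.1*x, W ~ N(0, 1), so v* = 0 at x = 0.
random.seed(3)
S = [0, 1]
def G(x, w): return w + 0.1 * x
def solve_saa(sample):
    return min(statistics.fmean(G(x, w) for w in sample) for x in S)
def sampler(n): return [random.gauss(0, 1) for _ in range(n)]

gap, var = optimality_gap_estimate(G, solve_saa, sampler, x_hat=0, M=20, N=200, N_eval=2000)
```

As (3.1) indicates, the gap estimate overestimates the true gap on average, so small positive values here are expected even at an optimal x̂.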
An estimator of the optimality gap g(x̂) − v∗ with possibly smaller variance is ḡ_N^M(x̂) − v̄_N^M, where

ḡ_N^M(x̂) := (1/M) Σ_{m=1}^M ĝ_N^m(x̂)

and ĝ_N^m(x̂) is the sample average objective value at x̂ of the mth SAA sample of size N,

ĝ_N^m(x̂) := (1/N) Σ_{j=1}^N G(x̂, W^{mj}).
Note that

ḡ_N^M(x̂) − v̄_N^M = [ḡ_N^M(x̂) − g(x̂)] + [g(x̂) − v∗] + [v∗ − E(v̂_N)] + [E(v̂_N) − v̄_N^M].

In the four terms on the right-hand side of the above equation, the first term has
expected value zero; the second term is the true optimality gap; the third term is the
bias term, which has positive expected value decreasing in the sample size N ; and the
fourth term is the accuracy term, which is decreasing in the number M of replications
and the sample size N . Thus a disadvantage of these optimality gap estimators is
that the gap estimator may be large if M, N, or N′ is small, even if x̂ is an optimal
solution, i.e., g(x̂) − v ∗ = 0.
3.4. Postprocessing, screening, and selection. Suppose a decision has been
made to stop, for example when the optimality gap estimator has become small
enough. At this stage the candidate solution x̂ ∈ S with the best value of ĝN (x̂)
can be selected as the chosen solution. However, it may be worthwhile to perform a
more detailed evaluation of the candidate solutions produced during the replications.
There are several statistical screening and selection methods for selecting subsets of
solutions or a single solution, among a (reasonably small) finite set of solutions, using
samples of the objective values of the solutions. Many of these methods are described
in Hochberg and Tamhane [12] and Bechhofer, Santner, and Goldsman [2]. In the
numerical tests described in section 4, a combined procedure was used, as described
in Nelson et al. [17]. During the first stage of the combined procedure, a subset S′ of the candidate solutions {x̂_N^1, . . . , x̂_N^M} is chosen (called screening) for further evaluation, based on the sample average values ĝ_N′(x̂_N^m). During the second stage, sample sizes N″ ≥ N′ are determined for more detailed evaluation, based on the sample variances S_N′²(x̂_N^m). Then N″ − N′ additional observations are generated, and the candidate solution x̂ ∈ S′ with the best value of ĝ_N″(x̂) is selected as the chosen solution. The combined procedure guarantees that the chosen solution x̂ has objective value g(x̂) within a specified tolerance δ of the best value min_m g(x̂_N^m) over all candidate solutions, with probability at least equal to a specified confidence level 1 − α.
3.5. Algorithm. Next we state a proposed algorithm for the type of stochastic
discrete optimization problem studied in this paper.
SAA Algorithm for Stochastic Discrete Optimization.
1. Choose initial sample sizes N and N′, a decision rule for determining the number M of SAA replications (possibly involving a maximum number M̄ of SAA replications with the same sample size, such that 1/(M̄ + 1) is sufficiently small), a decision rule for increasing the sample sizes N and N′ if needed, and a tolerance ε.
2. For m = 1, . . . , M, do steps 2.1 through 2.3.
   2.1 Generate a sample of size N and solve the SAA problem (2.1), with objective value v̂_N^m and ε-optimal solution x̂_N^m.
   2.2 Estimate the optimality gap g(x̂_N^m) − v∗ and the variance of the gap estimator.
   2.3 If the optimality gap and the variance of the gap estimator are sufficiently small, go to step 4.
3. If the optimality gap or the variance of the gap estimator is too large, increase the sample sizes N and/or N′, and return to step 2.
4. Choose the best solution x̂ among all candidate solutions x̂_N^m produced, using a screening and selection procedure. Stop.
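Putting the pieces together, an end-to-end run of the procedure might look as follows for a small hypothetical instance (enumeration stands in for a real SAA solver, simple evaluation on a common larger sample stands in for the screening-and-selection procedure, and the instance and all parameter choices are illustrative):

```python
import itertools
import random
import statistics

random.seed(4)

# Toy instance: S = {0,1}^3, G(x, W) = sum_i (c_i + W_i) x_i, W_i ~ N(0, 1),
# so g(x) = sum_i c_i x_i with true optimal solution (0, 1, 0).
c = [0.4, -0.3, 0.2]
S = list(itertools.product([0, 1], repeat=3))

def G(x, w):
    return sum((ci + wi) * xi for ci, wi, xi in zip(c, w, x))

def solve_saa(N):
    """Step 2.1: draw a sample of size N and solve the SAA problem by enumeration."""
    sample = [[random.gauss(0, 1) for _ in c] for _ in range(N)]
    return min(S, key=lambda x: statistics.fmean(G(x, w) for w in sample))

# Step 2: M independent SAA replications produce candidate solutions.
M, N, N_eval = 10, 500, 5000
candidates = {solve_saa(N) for _ in range(M)}

# Step 4: choose among the candidates using a common evaluation sample of size N'.
eval_sample = [[random.gauss(0, 1) for _ in c] for _ in range(N_eval)]
x_best = min(candidates, key=lambda x: statistics.fmean(G(x, w) for w in eval_sample))
```

For this well-conditioned instance all replications typically agree on the true optimal solution, so the candidate set is a singleton and screening is trivial.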
4. Numerical tests. In this section we describe an application of the SAA
method to an optimization problem. The purposes of these tests are to investigate
the viability of the SAA approach, as well as to study the effects of problem param-
eters, such as the number of decision variables and the condition number κ, on the
performance of the method.
where [x]+ := max{x, 0}. This problem can also be described as a knapsack problem,
where a subset of k items has to be chosen, given a knapsack of size q into which to
fit the items. The size Wi of each item i is random, and a per unit penalty of c has
to be paid for exceeding the capacity of the knapsack. For this reason the problem is
called the static stochastic knapsack problem (SSKP).
This problem was chosen for several reasons. First, expected value terms similar
to that in the objective function of (4.1) occur in many interesting stochastic opti-
mization problems. One such example is airline crew scheduling. An airline crew
schedule is made up of crew pairings, where each crew pairing consists of a number of
consecutive days (duties) of flying by a crew. Let {p1 , . . . , pk } denote the set of pair-
ings that can be chosen from. Then a crew schedule can be denoted by the decision
vector x ∈ {0, 1}k , where xi = 1 means that pairing pi is flown. The cost Ci (x) of a
crew pairing pi is given by
C_i(x) = max{ Σ_{d∈p_i} b_d(x),  f t_i(x),  g n_i },
where bd (x) denotes the cost of duty d in pairing pi , ti (x) denotes the total time
duration of pairing pi , ni denotes the number of duties in pairing pi , and f and g
are constants determined by contracts. Even ignoring airline recovery actions such as
cancellations and rerouting, bd (x) and ti (x) are random variables. The optimization
problem is then
min_{x∈X⊂{0,1}^k}  Σ_{i=1}^k E[C_i(x)] x_i,
where X denotes the set of feasible crew schedules. Thus the objective function of
the crew pairing problem can be written in a form similar to that of the objective
function of (4.1).
Another example is a stochastic shortest path problem, where travel times are
random and a penalty is incurred for arriving late at the destination. In this case,
494 A. J. KLEYWEGT, A. SHAPIRO, AND T. HOMEM-DE-MELLO
C(x) = Σ_{(i,j)∈x} b_ij + c [ Σ_{(i,j)∈x} t_ij − q ]^+,
where b_ij is the cost of traversing arc (i, j), t_ij is the time of traversing arc (i, j), q
is the available time to travel to the destination, and c is the penalty per unit time
late. The optimization problem is then
min_{x∈X} E[C(x)],
where X denotes the set of feasible paths in the network from the specified origin to
the specified destination.
A second reason for choosing the SSKP is that objective functions with terms such
as E[Σ_{i=1}^k W_i x_i − q]^+ are interesting for the following reason. For many stochastic
optimization problems good solutions can be obtained by replacing the random variables
W by their means and then solving the resulting deterministic optimization problem
max_x G(x, E[W]), called the expected value problem (Birge and Louveaux [3]). It
is easy to see that this may not be the case if the objective contains an expected
value term as in (4.1). For a given solution x, this term may be very large but may
become small if W1 , . . . , Wk are replaced by their means. In such a case, the ob-
tained expected value problem may produce very bad solutions for the corresponding
stochastic optimization problem.
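A one-item numerical illustration of this effect, with numbers of our choosing in the spirit of the instances below: when the mean size equals the capacity, the expected value problem sees no penalty at all, while the true expected excess is strictly positive.

```python
import math

# One item of random size W ~ N(20, 10^2) and capacity q = 20 (illustrative).
mu, sigma, q = 20.0, 10.0, 20.0

# Expected value problem: replace W by E[W] = 20, so the penalty term vanishes.
penalty_ev = max(mu - q, 0.0)  # = 0

# True expected excess E[W - q]^+; since mu = q this equals sigma / sqrt(2*pi).
penalty_true = sigma / math.sqrt(2.0 * math.pi)  # ≈ 3.99, far from zero
```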
The SSKP was also chosen because it is of interest by itself. One application
is the decision faced by a contractor who can take on several contracts, such as an
electricity supplier who can supply power to several groups of customers or a building
contractor who can bid on several construction projects. The amount of work that will
be required by each contract is unknown at the time the contracting decision has to be
made. The contractor has the capacity to do work at a certain rate at relatively low
cost, for example to generate electricity at a low-cost nuclear power plant. However,
if the amount of work required exceeds the capacity, additional capacity has to be
obtained at high cost, for example additional electricity can be generated at high-cost
oil or natural gas–fired power plants. Norkin, Ermoliev, and Ruszczyński [18] also
give several interesting applications of stochastic discrete optimization problems.
Note that the SAA problem of the SSKP can be formulated as the following
integer linear program:
(4.2)    max_{x,z}  Σ_{i=1}^k r_i x_i − (c/N) Σ_{j=1}^N z_j
         subject to  z_j ≥ Σ_{i=1}^k W_i^j x_i − q,  j = 1, . . . , N,
                     x_i ∈ {0, 1},  i = 1, . . . , k,
                     z_j ≥ 0,  j = 1, . . . , N.
This problem can be solved with the branch and bound method, using the linear
programming relaxation to provide upper bounds.
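For small k the SAA problem can also be solved by direct enumeration instead of branch and bound; the following sketch (assuming independent normal sizes, with illustrative parameters of our choosing) evaluates the sample average objective for every x ∈ {0, 1}^k and returns the same optimum as (4.2):

```python
import itertools
import random

def solve_saa_by_enumeration(r, mu, sigma, q, c, N, seed=1):
    """Solve max_x (1/N) sum_j [ sum_i r_i x_i - c (sum_i W_i^j x_i - q)^+ ]
    by enumerating x in {0,1}^k over one fixed sample W^1, ..., W^N."""
    rng = random.Random(seed)
    k = len(r)
    W = [[rng.gauss(mu[i], sigma[i]) for i in range(k)] for _ in range(N)]
    best_x, best_val = (0,) * k, 0.0  # x = 0 is feasible with value 0
    for x in itertools.product((0, 1), repeat=k):
        reward = sum(r[i] * x[i] for i in range(k))
        avg_excess = sum(max(sum(W[j][i] * x[i] for i in range(k)) - q, 0.0)
                         for j in range(N)) / N
        val = reward - c * avg_excess
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

best_x, best_val = solve_saa_by_enumeration(
    [12.0, 14.0], [20.0, 25.0], [5.0, 5.0], q=30.0, c=4.0, N=200)
```

Enumeration is exponential in k, of course; it is practical only for instances of roughly the sizes considered in this section.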
4.2. Numerical results. We present results for two sets of instances of the
SSKP. The instances in the first set have 10 decision variables each, and those in the
second set have 20. For each set we present one instance (called instances
10D and 20D, respectively) that was designed to be hard (large condition number κ),
and one randomly generated instance (called instances 10R and 20R, respectively).
Table 4.1
Condition numbers κ, optimal values v∗, and values g(x̄) of optimal solutions x̄ of expected
value problems max_x G(x, E[W]), for the instances presented.
Table 4.1 shows the condition numbers, the optimal values v ∗ , and the values g(x̄) of
the optimal solutions x̄ of the associated expected value problems max_x G(x, E[W])
for the four instances.
For all instances of the SSKP, the size variables W_i are independent and normally
distributed, for ease of evaluation of the results produced by the SAA method, as
described in the next paragraph. For the randomly generated instances, the rewards
ri were generated from the uniform (10, 20) distribution, the mean sizes µi were
generated from the uniform (20, 30) distribution, and the size standard deviations σi
were generated from the uniform (5, 15) distribution. For all instances, the per unit
penalty c = 4.
If W_i ∼ N(µ_i, σ_i²), i = 1, . . . , k, are independent normally distributed random
variables, then the objective function of (4.1) can be written in closed form. That
is, the random variable Z(x) := Σ_{i=1}^k W_i x_i − q is normally distributed with mean
µ(x) = Σ_{i=1}^k µ_i x_i − q and variance σ(x)² = Σ_{i=1}^k σ_i² x_i². It is also easy to show, since
Z(x) ∼ N(µ(x), σ(x)²), that

(4.3)    E[Z(x)]^+ = µ(x) Φ(µ(x)/σ(x)) + (σ(x)/√(2π)) exp(−µ(x)²/(2σ(x)²)),

where Φ denotes the standard normal distribution function.
The benefit of such a closed form expression is that the objective value g(x) can be
computed quickly and accurately, which is useful for solving small instances of the
problem by enumeration or branch and bound (cf. Cohn and Barnhart [4]) and for
evaluation of solutions produced by the SAA Algorithm. Good numerical approxi-
mations are available for computing Φ(x), such as Applied Statistics Algorithm AS66
(Hill [11]). The SAA Algorithm was executed without the benefit of a closed form
expression for g(x), as would be the case for most probability distributions; (4.3) was
used only to evaluate the solutions produced by the SAA Algorithm.
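Under the same normality assumption, (4.3) can be coded directly; in the sketch below Φ is computed via the standard error function rather than Algorithm AS66, and the function names are ours, not the authors' implementation:

```python
import math

def normal_cdf(z):
    # Phi via the error function -- a stand-in for Algorithm AS66.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def g_closed_form(x, r, mu, sigma, q, c):
    """Exact SSKP objective for independent normal sizes, via (4.3):
    E[Z(x)]^+ = mu(x) Phi(mu(x)/sigma(x))
                + sigma(x)/sqrt(2 pi) * exp(-mu(x)^2 / (2 sigma(x)^2))."""
    k = len(x)
    m = sum(mu[i] * x[i] for i in range(k)) - q
    s = math.sqrt(sum(sigma[i] ** 2 * x[i] ** 2 for i in range(k)))
    if s == 0.0:
        expected_excess = max(m, 0.0)  # degenerate (deterministic) case
    else:
        expected_excess = (m * normal_cdf(m / s)
                           + s / math.sqrt(2.0 * math.pi)
                             * math.exp(-m * m / (2.0 * s * s)))
    return sum(r[i] * x[i] for i in range(k)) - c * expected_excess

# mu(x) = 0 here, so the expected excess reduces to sigma(x)/sqrt(2*pi)
val = g_closed_form([1], r=[0.0], mu=[10.0], sigma=[2.0], q=10.0, c=1.0)
```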
The first numerical experiment was conducted to observe how the exponential
convergence rate established in Proposition 2.2 applies in the case of the SSKP, and
to investigate how the convergence rate is affected by the number of decision variables
and the condition number κ. Figures 4.1 and 4.2 show the estimated probability that
an SAA optimal solution x̂_N has objective value g(x̂_N) within relative tolerance d
of the optimal value v∗, i.e., P̂[v∗ − g(x̂_N) ≤ d v∗], as a function of the sample size
N, for different values of d. The experiment was conducted by generating M =
1000 independent SAA replications for each sample size N, computing SAA optimal
Fig. 4.1. Probability of SAA optimal solution x̂_N having objective value g(x̂_N) within relative
tolerance d of the optimal value v∗, P̂[v∗ − g(x̂_N) ≤ d v∗], as a function of sample size N for different
values of d, for instance 20D.
Fig. 4.2. Probability of SAA optimal solution x̂_N having objective value g(x̂_N) within relative
tolerance d of the optimal value v∗, P̂[v∗ − g(x̂_N) ≤ d v∗], as a function of sample size N for different
values of d, for instance 20R.
solutions x̂_N^m, m = 1, . . . , M, and their objective values g(x̂_N^m) using (4.3), and then
counting the number M_d of times that v∗ − g(x̂_N^m) ≤ d v∗. Then the probability was
estimated by P̂[v∗ − g(x̂_N) ≤ d v∗] = M_d/M, and the variance of this estimator was
estimated by

V̂ar[P̂] = M_d (1 − M_d/M) / (M (M − 1)).
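This estimator and its variance can be written out in a few lines (function name ours; a sketch, not the authors' code):

```python
def estimate_probability(g_values, v_star, d):
    """Estimate P[v* - g(x_N) <= d v*] from M replication values g(x_N^m),
    together with the variance estimate Md (1 - Md/M) / (M (M - 1))."""
    M = len(g_values)
    M_d = sum(1 for g in g_values if v_star - g <= d * v_star)
    p_hat = M_d / M
    var_hat = M_d * (1.0 - M_d / M) / (M * (M - 1))
    return p_hat, var_hat

# Toy data: two of four replications are within 5% of v* = 10.
p_hat, var_hat = estimate_probability([10.0, 10.0, 9.0, 8.0], v_star=10.0, d=0.05)
```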
[Figures 4.3 and 4.4: true objective value g(x̂_N^m) of the SAA optimal solution plotted for each
replication m, for sample sizes ranging from N = 50 to N = 350 and from N = 50 to N = 1000,
respectively.]
for each of the instances, the expected value problem maxx G(x, E[W ]) was solved,
with its optimal solution denoted by x̄. The objective value g(x̄) of each x̄ is shown in
Table 4.1. It is interesting to note that even with small sample sizes N , every solution
respectively; the correlation Cor[v̄_N^M, ḡ_N^M(x̂)]; and the computation times of the gap
estimates. In each case, the bias v̄_N^M − v∗ formed the major part of the optimality
gap estimate; the standard deviations of the gap estimators were small compared
with the bias. There was positive correlation between v̄_N^M and ḡ_N^M(x̂), and the second
gap estimator had smaller variances, but this benefit is obtained at the expense of
relatively large additional computational effort.
In section 2.2, an estimate N ≈ 3σ²_max log(|S|/α)/(ε − δ)² of the required sample
size was derived. For the instances presented here, using ε = 0.5, δ = 0, and α = 0.01,
these estimates were of the order of 10⁶ and thus much larger than the sample sizes
that were actually required for the specified accuracy. The sample size estimates
using σ²_max were smaller than the sample size estimates using max_{x∈S} Var[G(x, W)]
by a factor of approximately 10.
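As a sketch, the bound evaluates as follows; the value of σ²_max used here is illustrative and not taken from the paper's instances (whose larger variances produced the estimates of order 10⁶ mentioned above):

```python
import math

def saa_sample_size(sigma_max_sq, num_solutions, alpha, eps, delta):
    """Sample size estimate N ~ 3 sigma_max^2 log(|S|/alpha) / (eps - delta)^2
    from section 2.2 -- a conservative, worst-case bound."""
    return 3.0 * sigma_max_sq * math.log(num_solutions / alpha) / (eps - delta) ** 2

# k = 20 binary variables gives at most |S| = 2^20 feasible points.
N = saa_sample_size(sigma_max_sq=150.0, num_solutions=2 ** 20,
                    alpha=0.01, eps=0.5, delta=0.0)
```

Note that N grows only logarithmically in the number of feasible solutions, so doubling k adds a modest amount to the bound, while halving ε − δ quadruples it.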
Fig. 4.5. Relative bias v̄_N^M/v∗ of the optimality gap estimator as a function of the sample size
N, for instances 10D and 10R, with 10 decision variables.
Fig. 4.6. Relative bias v̄_N^M/v∗ of the optimality gap estimator as a function of the sample size
N, for instances 20D and 20R, with 20 decision variables.
Table 4.2
Optimality gap estimates v̄_N^M − ĝ_N(x̂) and v̄_N^M − ḡ_N^M(x̂), with their variances and computation
times.
Several variance reduction techniques can be used. Compared with simple random
sampling, Latin hypercube sampling reduced the variances by factors varying between
1.02 and 2.9 and increased the computation time by a factor of approximately 1.2.
Also, to estimate g(x) for any solution x ∈ S, it is natural to use Σ_{i=1}^k W_i x_i as a
control variate, because Σ_{i=1}^k W_i x_i should be correlated with [Σ_{i=1}^k W_i x_i − q]^+, and
the mean of Σ_{i=1}^k W_i x_i is easy to compute. Using this control variate reduced the
variances of the estimators of g(x) by factors between 2.0 and 3.0 and increased the
computation time by a factor of approximately 2.0.
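The control variate just described can be sketched as follows. For simplicity all x_i = 1, the numbers are ours, and the coefficient b is fitted from the sample as Cov(Y, S)/Var(S) — one common choice, which the paper does not specify:

```python
import random
import statistics

def excess_estimates(mu, sigma, q, N, seed=7):
    """Plain and control-variate sample averages of E[S - q]^+, where
    S = sum_i W_i (all x_i = 1). The control variate is S itself: it is
    correlated with (S - q)^+ and its mean sum_i mu_i is known exactly."""
    rng = random.Random(seed)
    S = [sum(rng.gauss(m, s) for m, s in zip(mu, sigma)) for _ in range(N)]
    Y = [max(s - q, 0.0) for s in S]
    plain = statistics.fmean(Y)
    mean_S, mean_Y = statistics.fmean(S), statistics.fmean(Y)
    # Fitted coefficient b ~ Cov(Y, S) / Var(S) (estimated from the sample).
    cov = sum((y - mean_Y) * (s - mean_S) for y, s in zip(Y, S)) / (N - 1)
    var = sum((s - mean_S) ** 2 for s in S) / (N - 1)
    b = cov / var
    known_mean = sum(mu)  # E[S] is available in closed form
    cv = statistics.fmean(y - b * (s - known_mean) for y, s in zip(Y, S))
    return plain, cv

# E[S] = 40 = q and sd(S) = 5, so the exact value is 5 / sqrt(2*pi).
plain, cv = excess_estimates(mu=[20.0, 20.0], sigma=[3.0, 4.0], q=40.0, N=20000)
```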
5. Conclusion. We proposed a sample average approximation method for solv-
ing stochastic discrete optimization problems, and we studied some theoretical as well
as practical issues important for the performance of this method. It was shown that
the probability that a replication of the SAA method produces an optimal solution
increases at an exponential rate in the sample size N . It was found that this conver-
gence rate depends on the conditioning of the problem, which in turn tends to become
poorer with an increase in the number of decision variables. It was also shown that the
sample size required for a specified accuracy increases proportionally to the logarithm
of the number of feasible solutions. It was found that for many instances the SAA
method produces good and often optimal solutions with only a few replications and a
small sample size. However, the optimality gap estimator considered here was in each
case too weak to indicate that a good solution had been found. Consequently the
sample size had to be increased substantially before the optimality gap estimator in-
dicated that the solutions were good. Thus, a more efficient optimality gap estimator
can make a substantial contribution toward improving the performance guarantees of
the SAA method during execution of the algorithm. The SAA method has the advan-
tage of ease of use in combination with existing techniques for solving deterministic
optimization problems.
The proposed method involves solving several replications of the SAA prob-
lem (2.1), and possibly increasing the sample size several times. An important issue is
the behavior of the computational complexity of the SAA problem (2.1) as a function
of the sample size. Current research aims at investigating this behavior for particular
classes of problems.
REFERENCES
[1] M. H. Alrefaei and S. Andradóttir, A simulated annealing algorithm with constant temperature for discrete stochastic optimization, Management Science, 45 (1999), pp. 748–764.
[2] R. E. Bechhofer, T. J. Santner, and D. M. Goldsman, Design and Analysis of Experiments
for Statistical Selection, Screening and Multiple Comparisons, John Wiley, New York, NY,
1995.
[3] J. R. Birge and F. Louveaux, Introduction to Stochastic Programming, Springer Ser. Oper.
Res., Springer-Verlag, New York, NY, 1997.
[4] A. Cohn and C. Barnhart, The stochastic knapsack problem with random weights: A heuris-
tic approach to robust transportation planning, in Proceedings of the Triennial Symposium
on Transportation Analysis (TRISTAN III), San Juan, PR, 1998.
[5] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, Springer-Verlag,
New York, NY, 1998.
[6] B. L. Fox and G. W. Heine, Probabilistic search with overrides, Ann. Appl. Probab., 5 (1995),
pp. 1087–1094.
[7] A. Futschik and G. C. Pflug, Confidence sets for discrete stochastic optimization, Ann.
Oper. Res., 56 (1995), pp. 95–108.
[8] A. Futschik and G. C. Pflug, Optimal allocation of simulation experiments in discrete
stochastic optimization and approximative algorithms, European J. Oper. Res., 101 (1997),
pp. 245–260.
[9] S. B. Gelfand and S. K. Mitter, Simulated annealing with noisy or imprecise energy mea-
surements, J. Optim. Theory Appl., 62 (1989), pp. 49–62.
[10] W. Gutjahr and G. C. Pflug, Simulated annealing for noisy cost functions, J. Global Optim.,
8 (1996), pp. 1–13.
[11] I. D. Hill, Algorithm AS 66: The normal integral, Applied Statistics, 22 (1973), pp. 424–427.
[12] Y. Hochberg and A. Tamhane, Multiple Comparison Procedures, John Wiley, New York, NY,
1987.
[13] T. Homem-de-Mello, Variable-Sample Methods and Simulated Annealing for Discrete
Stochastic Optimization, manuscript, Department of Industrial, Welding and Systems En-
gineering, The Ohio State University, Columbus, OH, 1999.
[14] T. Homem-de-Mello, Monte Carlo methods for discrete stochastic optimization, in Stochastic
Optimization: Algorithms and Applications, S. Uryasev and P. M. Pardalos, eds., Kluwer
Academic Publishers, Norwell, MA, 2000, pp. 95–117.
[15] W. K. Mak, D. P. Morton, and R. K. Wood, Monte Carlo bounding techniques for deter-
mining solution quality in stochastic programs, Oper. Res. Lett., 24 (1999), pp. 47–56.
[16] D. P. Morton and R. K. Wood, On a stochastic knapsack problem and generalizations, in
Advances in Computational and Stochastic Optimization, Logic Programming, and Heuris-
tic Search: Interfaces in Computer Science and Operations Research, D. L. Woodruff, ed.,
Kluwer Academic Publishers, Dordrecht, the Netherlands, 1998, pp. 149–168.
[17] B. L. Nelson, J. Swann, D. M. Goldsman, and W. Song, Simple procedures for selecting
the best simulated system when the number of alternatives is large, Oper. Res., to appear.
[18] V. I. Norkin, Y. M. Ermoliev, and A. Ruszczyński, On optimal allocation of indivisibles
under uncertainty, Oper. Res., 46 (1998), pp. 381–395.
[19] V. I. Norkin, G. C. Pflug, and A. Ruszczyński, A branch and bound method for stochastic
global optimization, Math. Programming, 83 (1998), pp. 425–450.
[20] R. Schultz, L. Stougie, and M. H. Van der Vlerk, Solving stochastic programs with integer
recourse by enumeration: A framework using Gröbner basis reductions, Math. Program-
ming, 83 (1998), pp. 229–252.
[21] A. Shapiro, Asymptotic analysis of stochastic programs, Ann. Oper. Res., 30 (1991), pp. 169–
186.
[22] A. Shapiro, T. Homem-de-Mello, and J. C. Kim, Conditioning of Convex Piecewise Linear
Stochastic Programs, manuscript, School of Industrial and Systems Engineering, Georgia
Institute of Technology, Atlanta, GA, 2000.