The Sample Average Approximation Method For Stochastic Discrete Optimization
Key words. stochastic programming, discrete optimization, Monte Carlo sampling, law of large
numbers, large deviations theory, sample average approximation, stopping rules, stochastic knapsack
problem
PII. S1052623499363220
allow estimation of g(x) for each solution x. Examples of this literature are Hochberg
and Tamhane [12]; Bechhofer, Santner, and Goldsman [2]; Futschik and Pflug [7, 8];
and Nelson et al. [17]. Another approach that has been studied consists of modifying
the well-known simulated annealing method in order to account for the fact that the
objective function values are not known exactly. Work on this topic includes Gelfand
and Mitter [9], Alrefaei and Andradóttir [1], Fox and Heine [6], Gutjahr and Pflug [10],
and Homem-de-Mello [13]. A discussion of two-stage stochastic integer programming
problems with recourse can be found in Birge and Louveaux [3]. A branch and
bound approach to solving stochastic integer programming problems was suggested
by Norkin, Ermoliev, and Ruszczyński [18] and Norkin, Pflug, and Ruszczyński [19].
Schultz, Stougie, and Van der Vlerk [20] suggested an algebraic approach to solving
stochastic programs with integer recourse by using a framework of Gröbner basis
reductions.
In this paper we study a Monte Carlo simulation–based approach to stochastic
discrete optimization problems. The basic idea is simple indeed—a random sample of
W is generated and the expected value function is approximated by the corresponding
sample average function. The obtained sample average optimization problem is solved,
and the procedure is repeated several times until a stopping criterion is satisfied. The
idea of using sample average approximations for solving stochastic programs is a
natural one and was used by various authors over the years. Such an approach was
used in the context of a stochastic knapsack problem in a recent paper of Morton and
Wood [16].
The organization of this paper is as follows. In the next section we discuss a
statistical inference of the sample average approximation method. In particular, we
show that with probability approaching 1 exponentially fast with increase of the sam-
ple size, an optimal solution of the sample average approximation problem provides
an exact optimal solution of the “true” problem (1.1). In section 3 we outline an algo-
rithm design for the sample average approximation approach to solving (1.1), and in
particular we discuss various stopping rules. In section 4 we present a numerical ex-
ample of the sample average approximation method applied to a stochastic knapsack
problem, and section 5 gives conclusions.
2. Convergence results. As mentioned in the introduction, we are interested in
solving stochastic discrete optimization problems of the form (1.1). Let W 1 , . . . , W N
be an independently and identically distributed (i.i.d.) random sample of N realiza-
tions of the random vector W . Consider the sample average function
(2.1)   ĝ_N(x) := (1/N) Σ_{j=1}^{N} G(x, W^j).
We refer to (1.1) and (2.1) as the “true” (or expected value) and sample average
approximation (SAA) problems, respectively. Note that E[ĝN (x)] = g(x).
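To make the construction concrete, the following sketch builds ĝ_N by Monte Carlo sampling and solves the SAA problem by enumeration, for a small hypothetical instance (the cost vector c, the noise distribution, and all names are illustrative and not taken from the paper):

```python
import itertools
import random

random.seed(0)

def make_saa_objective(G, sample):
    """Return the sample average function x -> (1/N) * sum_j G(x, W^j)."""
    return lambda x: sum(G(x, w) for w in sample) / len(sample)

# Hypothetical instance: S = {0,1}^3 and G(x, W) = sum_i (c_i + W_i) x_i
# with W_i ~ N(0, 1), so that g(x) = E[G(x, W)] = sum_i c_i x_i.
c = [0.3, -0.2, 0.1]

def G(x, w):
    return sum((ci + wi) * xi for ci, wi, xi in zip(c, w, x))

N = 5000
sample = [[random.gauss(0, 1) for _ in c] for _ in range(N)]
g_hat = make_saa_objective(G, sample)

S = list(itertools.product([0, 1], repeat=3))
x_hat = min(S, key=g_hat)   # an optimal solution of the SAA problem
v_hat = g_hat(x_hat)        # the SAA optimal value
```

Here the true optimal solution is (0, 1, 0) with v∗ = −0.2, and for moderate N the SAA problem recovers it exactly, in line with the convergence results of this section.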
Since the feasible set S is finite, problems (1.1) and (2.1) have nonempty sets of
optimal solutions, denoted S ∗ and ŜN , respectively. Let v ∗ and v̂N denote the optimal
values,
v∗ := min_{x∈S} g(x)   and   v̂_N := min_{x∈S} ĝ_N(x),
of the respective problems. We also consider sets of ε-optimal solutions. That is, for
ε ≥ 0, we say that x̄ is an ε-optimal solution of (1.1) if x̄ ∈ S and g(x̄) ≤ v ∗ + ε. The
sets of all ε-optimal solutions of (1.1) and (2.1) are denoted by S ε and ŜNε , respectively.
Clearly for ε = 0 set S ε coincides with S ∗ , and ŜNε coincides with ŜN .
2.1. Convergence of objective values and solutions. The following propo-
sition establishes convergence with probability one (w.p.1) of the above statistical
estimators. By the statement “an event happens w.p.1 for N large enough” we mean
that for P-almost every realization ω = {W^1, W^2, . . .} of the random sequence there exists an integer N(ω) such that the considered event happens for all samples {W^1, . . . , W^n} from ω with n ≥ N(ω). Note that in such a statement the integer N(ω) depends on the sequence ω of realizations and therefore is random.
Proposition 2.1. The following two properties hold: (i) v̂N → v ∗ w.p.1 as
N → ∞, and (ii) for any ε ≥ 0 the event {ŜNε ⊂ S ε } happens w.p.1 for N large
enough.
Proof. It follows from the (strong) law of large numbers that for any x ∈ S, ĝN (x)
converges to g(x) w.p.1 as N → ∞. Since the set S is finite and the union of a finite
number of sets each of measure zero also has measure zero, it follows that, w.p.1,
ĝ_N(x) converges to g(x) uniformly in x ∈ S. That is, w.p.1,

δ_N := max_{x∈S} | ĝ_N(x) − g(x) | → 0   as N → ∞.

In particular, |v̂_N − v∗| ≤ δ_N, which proves assertion (i). For ε ≥ 0, define ρ(ε) := min_{x∈S\S^ε} g(x) − v∗ − ε.
Since for any x ∈ S \ S ε it holds that g(x) > v ∗ + ε and the set S is finite, it follows
that ρ(ε) > 0.
Let N be large enough such that δN < ρ(ε)/2. Then v̂N < v ∗ + ρ(ε)/2, and for
any x ∈ S \ S ε it holds that ĝN (x) > v ∗ + ε + ρ(ε)/2. It follows that if x ∈ S \ S ε , then
ĝN (x) > v̂N + ε and hence x does not belong to the set ŜNε . The inclusion ŜNε ⊂ S ε
follows, which completes the proof.
Note that if δ is a number such that 0 ≤ δ ≤ ε, then S δ ⊂ S ε and ŜNδ ⊂ ŜNε .
Consequently it follows by the above proposition that for any δ ∈ [0, ε] the event
{ŜNδ ⊂ S ε } happens w.p.1 for N large enough. It also follows that if S ε = {x∗ } is
a singleton, then ŜNε = {x∗ } w.p.1 for N large enough. In particular, if the true
problem (1.1) has a unique optimal solution x∗ , then w.p.1 for sufficiently large N
the approximating problem (2.1) has a unique optimal solution x̂N and x̂N = x∗ . Also
consider the set A := {g(x) − v ∗ : x ∈ S}. The set A is a subset of the set R+ of
nonnegative numbers and |A| ≤ |S|, and hence A is finite. It follows from the above
analysis that for any ε ∈ R+ \ A the event {ŜNε = S ε } happens w.p.1 for N large
enough.
2.2. Convergence rates. The above results do not say anything about the rates
of convergence of v̂N and ŜNδ to their true counterparts. In this section we investigate
such rates of convergence. By using the theory of large deviations (LD), we show that,
under mild regularity conditions and δ ∈ [0, ε], the probability of the event {ŜNδ ⊂ S ε }
approaches 1 exponentially fast as N → ∞. Next we briefly outline some background
of the LD theory.
482 A. J. KLEYWEGT, A. SHAPIRO, AND T. HOMEM-DE-MELLO
Consider a random (real valued) variable X having mean µ := E[X]. Its moment-
generating function M (t) := E[etX ] is viewed as an extended valued function, i.e., it
can take value +∞. It holds that M(t) > 0 for all t ∈ R and M(0) = 1. The conjugate function

(2.4)   I(z) := sup_{t∈R} { tz − Λ(t) }

of the logarithmic moment-generating function Λ(t) := log M(t) is called the (LD) rate function of X.
rate function of X. It is possible to show that both functions Λ(·) and I(·) are convex.
Consider an i.i.d. sequence X_1, . . . , X_N of replications of the random variable X, and let Z_N := N^{−1} Σ_{i=1}^N X_i be the corresponding sample average. Then for any real
numbers a and t ≥ 0 it holds that P (ZN ≥ a) = P (etZN ≥ eta ), and hence it follows
from Chebyshev’s inequality that
P(Z_N ≥ a) ≤ e^{−ta} E[e^{tZ_N}] = e^{−ta} [M(t/N)]^N.
By taking the logarithm of both sides of the above inequality, changing variables
t := t/N , and minimizing over t ≥ 0, it follows for a ≥ µ that
(2.5)   (1/N) log [P(Z_N ≥ a)] ≤ −I(a).
Note that for a ≥ µ it suffices to take the supremum in the definition (2.4) of I(a)
for t ≥ 0, and therefore this constraint is omitted. Inequality (2.5) corresponds to the
upper bound of Cramér’s LD theorem.
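As a concrete check, for X ~ Bernoulli(p) the rate function works out to the relative entropy I(a) = a log(a/p) + (1 − a) log((1 − a)/(1 − p)), and the nonasymptotic bound (2.5), i.e., P(Z_N ≥ a) ≤ e^{−N I(a)}, can be compared against the exact binomial tail (a small illustrative script, not from the paper):

```python
import math

def rate_function_bernoulli(a, p):
    """LD rate function of a Bernoulli(p) variable at a point a in (0, 1)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def exact_tail(N, p, a):
    """P(Z_N >= a) for Z_N the average of N i.i.d. Bernoulli(p) variables."""
    k_min = math.ceil(a * N)
    return sum(math.comb(N, k) * p**k * (1 - p)**(N - k) for k in range(k_min, N + 1))

p, a = 0.5, 0.7
for N in (10, 50, 200):
    lhs = exact_tail(N, p, a)
    rhs = math.exp(-N * rate_function_bernoulli(a, p))
    assert lhs <= rhs  # inequality (2.5) holds for every finite N
```

The bound is valid for every sample size, and the gap between the two sides shrinks (on the logarithmic scale) as N grows, as the lower bound (2.6) below predicts.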
The constant I(a) in (2.5) gives, in a sense, the best possible exponential rate at
which the probability P (ZN ≥ a) converges to zero for a > µ. This follows from the
lower bound
(2.6)   lim inf_{N→∞} (1/N) log [P(Z_N ≥ a)] ≥ −I(a)
of Cramér’s LD theorem. A simple sufficient condition for (2.6) to hold is that the
moment-generating function M (t) is finite valued for all t ∈ R. For a thorough
discussion of the LD theory, the interested reader is referred to Dembo and Zeitouni [5].
The rate function I(z) has the following properties: The function I(z) is convex
and attains its minimum at z = µ, and I(µ) = 0. Moreover, suppose that the moment-
generating function M (t) is finite valued for all t in a neighborhood of t = 0. Then X
has finite moments, and it follows by the dominated convergence theorem that M (t),
and hence the function Λ(t), are infinitely differentiable at t = 0, and Λ′(0) = µ. Consequently, for a > µ the derivative of ψ(t) := ta − Λ(t) at t = 0 is greater than zero, and hence ψ(t) > 0 for t > 0 small enough. In that case it follows that I(a) > 0. Also, I′(µ) = 0 and I″(µ) = σ^{−2}, where σ² := Var[X], and hence by Taylor's expansion

(2.7)   I(a) = (a − µ)² / (2σ²) + o(|a − µ|²).
Consequently, for a close to µ one can approximate I(a) by (a − µ)²/(2σ²). Moreover, for any ε̃ > 0 there is a neighborhood N of µ such that

(2.8)   I(a) ≥ (a − µ)² / ((2 + ε̃)σ²)   for all a ∈ N.
For δ ∈ [0, ε], we have

(2.9)   {Ŝ_N^δ ⊄ S^ε} = ∪_{x∈S\S^ε} ∩_{y∈S} {ĝ_N(x) ≤ ĝ_N(y) + δ},

and hence

(2.10)   P(Ŝ_N^δ ⊄ S^ε) ≤ Σ_{x∈S\S^ε} P( ∩_{y∈S} {ĝ_N(x) ≤ ĝ_N(y) + δ} ).

Consider a mapping u : S \ S^ε → S. It follows from (2.10) that

(2.11)   P(Ŝ_N^δ ⊄ S^ε) ≤ Σ_{x∈S\S^ε} P( ĝ_N(x) − ĝ_N(u(x)) ≤ δ ).
We assume that the mapping u(·) is chosen in such a way that for some ε∗ > ε,

(2.12)   g(u(x)) ≤ g(x) − ε∗   for all x ∈ S \ S^ε.

Note that if u(·) is a mapping from S \ S^ε into the set S∗, i.e., u(x) ∈ S∗ for all x ∈ S \ S^ε, then (2.12) holds with

(2.13)   ε∗ := min_{x∈S\S^ε} g(x) − v∗,

and that ε∗ > ε since the set S is finite. Therefore a mapping u(·) that satisfies condition (2.12) always exists.
For each x ∈ S \ S^ε, define H(x, W) := G(u(x), W) − G(x, W). Then

(2.14)   P( ĝ_N(x) − ĝ_N(u(x)) ≤ δ ) = P( N^{−1} Σ_{j=1}^N H(x, W^j) ≥ −δ ).

Let I_x(·) denote the LD rate function of H(x, W). Inequality (2.14), together with (2.11) and (2.5), implies that

(2.15)   P(Ŝ_N^δ ⊄ S^ε) ≤ Σ_{x∈S\S^ε} e^{−N I_x(−δ)}.
It is important to note that the above inequality (2.15) is not asymptotic and is valid
for any random sample of size N .
variable, or if H(x, ·) grows at most linearly and W has a distribution from the
exponential family.
Proposition 2.2. Let ε and δ be nonnegative numbers such that δ ≤ ε. Then

(2.16)   P(Ŝ_N^δ ⊄ S^ε) ≤ |S \ S^ε| e^{−N γ(δ,ε)},

where

(2.17)   γ(δ, ε) := min_{x∈S\S^ε} I_x(−δ),

and hence

(2.18)   P(Ŝ_N^δ ⊂ S^ε) ≥ 1 − |S \ S^ε| e^{−N γ(δ,ε)}.

Inequality (2.18) means that the probability of the event {Ŝ_N^δ ⊂ S^ε} approaches 1 exponentially fast as N → ∞. This suggests that Monte Carlo sampling, combined with an efficient method for solving the deterministic SAA problem, can efficiently solve problems of the type under study, provided that the constant γ(δ, ε) is not "too small."
It follows from (2.7) that

(2.19)   I_x(−δ) ≈ (−δ − E[H(x, W)])² / (2σ_x²) ≥ (ε∗ − δ)² / (2σ_x²),

where

(2.20)   σ_x² := Var[H(x, W)] = Var[G(u(x), W) − G(x, W)],

and

(2.21)   σ_max² := max_{x∈S\S^ε} Var[G(u(x), W) − G(x, W)].
A result similar to the one of Proposition 2.2 was derived in [14] by using slightly
different arguments. The LD rate functions of the random variables G(x, W ) were
used there, which resulted in estimates of the exponential constant similar to the
estimate (2.20) but with σx2 replaced by the variance of G(x, W ). Due to a positive
correlation between G(x, W ) and G(u(x), W ), the variance of G(x, W ) − G(u(x), W )
tends to be smaller than the variance of G(x, W), thereby providing a smaller upper bound.
To illustrate some implications of the bound (2.16) for issues of the complexity of
solving stochastic problems, let us fix a significance level α ∈ (0, 1), and estimate the
sample size N which is needed for the probability P (ŜNδ ⊂ S ε ) to be at least 1 − α.
By requiring that the right-hand side of (2.16) be less than or equal to α, we obtain
that

(2.22)   N ≥ (1/γ(δ, ε)) log( |S \ S^ε| / α ).
Moreover, it follows from (2.8) and (2.17) that γ(δ, ε) ≥ (ε − δ)²/(3σ_max²) for all ε ≥ 0 sufficiently small. Therefore it holds that for all ε > 0 small enough and δ ∈ [0, ε), a sufficient condition for (2.22) is that

(2.23)   N ≥ (3σ_max² / (ε − δ)²) log( |S| / α ).
It appears that the bound (2.23) may be too conservative for practical estimates
of the required sample sizes (see the discussion in section 4.2). However, the esti-
mate (2.23) has interesting consequences for complexity issues. A key characteristic
of (2.23) is that N depends only logarithmically both on the size of the feasible set S
and on the tolerance probability α. An important implication of such behavior is the
following. Suppose that (i) the size of the feasible set S grows at most exponentially in the length of the problem input, (ii) the variance σ_max² grows polynomially in the length of the problem input, and (iii) the complexity of finding a δ-optimal solution
for (2.1) grows polynomially in the length of the problem input and the sample size
N . Then a solution can be generated in time that grows polynomially in the length of
the problem input such that, with probability at least 1 − α, the solution is ε-optimal
for (1.1). A careful analysis of these issues is beyond the scope of this paper, and
requires further investigation.
Now suppose for a moment that the true problem has unique optimal solution
x∗ , i.e., S ∗ = {x∗ } is a singleton, and consider the event that the SAA problem (2.1)
has unique optimal solution x̂N and x̂N = x∗ . We denote that event by {x̂N = x∗ }.
Furthermore, consider the mapping u(x) ≡ x∗ for x ∈ S \ {x∗}, and the corresponding constant γ∗ := γ(0, 0). That is,

(2.24)   γ∗ := min_{x∈S\{x∗}} I_x(0),

with I_x(·) being the LD rate function of G(x∗, W) − G(x, W). Note that E[G(x∗, W) −
G(x, W )] = g(x∗ ) − g(x), and hence E[G(x∗ , W ) − G(x, W )] < 0 for every x ∈ S \
{x∗ }. Therefore, if Assumption (A) holds, i.e., the moment-generating function of
G(x∗ , W ) − G(x, W ) is finite valued in a neighborhood of 0, then γ ∗ > 0.
Proposition 2.3. Suppose that the true problem has unique optimal solution x∗
and the moment-generating function of each random variable G(x∗ , W ) − G(x, W ),
x ∈ S \ {x∗ }, is finite valued on R. Then
(2.25)   lim_{N→∞} (1/N) log [1 − P(x̂_N = x∗)] = −γ∗.
Proof. It follows from (2.16), applied with δ = ε = 0, that

(2.26)   lim sup_{N→∞} (1/N) log [1 − P(x̂_N = x∗)] ≤ −γ∗.

Consider the complement of the event {x̂_N = x∗}, denoted {x̂_N ≠ x∗}. The event {x̂_N ≠ x∗} is equal to the union of the events {ĝ_N(x) ≤ ĝ_N(x∗)}, x ∈ S \ {x∗}. Therefore, for any x ∈ S \ {x∗},

1 − P(x̂_N = x∗) ≥ P( ĝ_N(x) ≤ ĝ_N(x∗) ).
By using the lower bound (2.6) of Cramér’s LD theorem, it follows that the inequality
(2.27)   lim inf_{N→∞} (1/N) log [1 − P(x̂_N = x∗)] ≥ −I_x(0)
holds for every x ∈ S \ {x∗ }. Inequalities (2.26) and (2.27) imply (2.25).
Suppose that S ∗ = {x∗ } and consider the number
(2.28)   κ := max_{x∈S\{x∗}}  Var[G(x, W) − G(x∗, W)] / [g(x) − g(x∗)]².
It follows from (2.7) and (2.24) that κ ≈ 1/(2γ ∗ ). One can view κ as a condition
number of the true problem. That is, the sample size required for the event {x̂N = x∗ }
to happen with a given probability is roughly proportional to κ. The number defined
in (2.28) can be viewed as a discrete version of the condition number introduced in [22]
for piecewise linear continuous problems.
For a problem with a large feasible set S, the number min_{x∈S\{x∗}} [g(x) − g(x∗)],
although positive if S ∗ = {x∗ }, tends to be small. Therefore the sample size required
to calculate the exact optimal solution x∗ with a high probability could be very
large, even if the optimal solution x∗ is unique. For ill-conditioned problems it makes
sense to search for approximate (ε-optimal) solutions of the true problem. In that
respect the bound (2.16) is more informative since the corresponding constant γ(δ, ε)
is guaranteed to be at least of the order (ε − δ)²/(2σ_max²).
It is also insightful to note the behavior of the condition number κ for a discrete optimization problem with linear objective function G(x, W) := Σ_{i=1}^k W_i x_i and feasible set S given by the vertices of the unit hypercube in R^k, i.e., S := {0, 1}^k. In that case the corresponding true optimization problem is

min_{x∈{0,1}^k}  g(x) = Σ_{i=1}^k w̄_i x_i,
where w̄i := E[Wi ]. Suppose that w̄i > 0 for all i ∈ {1, . . . , k}, and hence the origin
is the unique optimal solution of the true problem, i.e., S ∗ = {0}. Let
ϑ_i² := Var[W_i] / (E[W_i])²

denote the squared coefficient of variation of W_i, and let

ρ_ij := Cov[W_i, W_j] / √( Var[W_i] Var[W_j] )
denote the correlation coefficient between Wi and Wj . It follows that for any x ∈
{0, 1}k \ {0},
Var[ Σ_{i=1}^k W_i x_i ] / ( Σ_{i=1}^k w̄_i x_i )²  =  Σ_{i=1}^k Σ_{j=1}^k ρ_ij ϑ_i w̄_i x_i ϑ_j w̄_j x_j / ( Σ_{i=1}^k w̄_i x_i )²  ≤  max_{i∈{1,...,k}} ϑ_i².
Thus

κ = max_{x∈{0,1}^k\{0}}  Var[ Σ_{i=1}^k W_i x_i ] / ( Σ_{i=1}^k w̄_i x_i )²  =  max_{i∈{1,...,k}} ϑ_i².
The last equality follows because the maximum is attained by setting xi = 1 for the
index i for which Wi has the maximum squared coefficient of variation ϑ2i , and setting
xj = 0 for the remaining variables. Thus, in this example the condition number κ is
equal to the maximum squared coefficient of variation of the Wi ’s.
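The identity κ = max_i ϑ_i² can be verified by brute force for small k. The sketch below assumes, for simplicity, independent W_i (so that ρ_ij = 0 for i ≠ j), with illustrative means and variances:

```python
import itertools

# Illustrative check of kappa = max_i Var[W_i]/E[W_i]^2 for independent W_i,
# by brute force over S = {0,1}^k \ {0}.
means = [2.0, 1.0, 4.0]       # w_bar_i = E[W_i] > 0
variances = [1.0, 0.9, 2.0]   # Var[W_i]

def kappa(means, variances):
    k = len(means)
    best = 0.0
    for x in itertools.product([0, 1], repeat=k):
        if not any(x):
            continue
        # Independence: Var[sum_i W_i x_i] = sum_i Var[W_i] x_i.
        var = sum(v * xi for v, xi in zip(variances, x))
        mean = sum(m * xi for m, xi in zip(means, x))
        best = max(best, var / mean ** 2)
    return best

theta_sq = [v / m ** 2 for v, m in zip(means, variances) for v, m in [(v, m)]]
theta_sq = [v / m ** 2 for v, m in zip(variances, means)]
assert abs(kappa(means, variances) - max(theta_sq)) < 1e-12
```

The maximum is attained by the singleton x picking out the coordinate with the largest squared coefficient of variation, as stated above.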
2.3. Asymptotics of sample objective values. Next we discuss the asymptotics of the SAA optimal objective value v̂_N. For any subset S′ of S the inequality v̂_N ≤ min_{x∈S′} ĝ_N(x) holds. In particular, by taking S′ = S∗, it follows that v̂_N ≤ min_{x∈S∗} ĝ_N(x), and hence

E[v̂_N] ≤ E[ min_{x∈S∗} ĝ_N(x) ] ≤ min_{x∈S∗} E[ĝ_N(x)] = v∗.
That is, the estimator v̂N has a negative bias (cf. Norkin, Pflug, and Ruszczyński [19]
and Mak, Morton, and Wood [15]).
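The negative bias is easy to observe empirically. In the hypothetical instance below every x ∈ S is optimal with g(x) = 0, so v∗ = 0, yet the average of v̂_N over replications is clearly negative (the instance and all parameters are illustrative):

```python
import random
import statistics

random.seed(1)

k, N, reps = 5, 100, 400

def v_hat_N():
    """Optimal value of one SAA problem for G(x, W) = W_x, x in {0, ..., k-1},
    with independent W_x ~ N(0, 1), so g(x) = 0 for every x and v* = 0."""
    g_bar = [statistics.fmean(random.gauss(0, 1) for _ in range(N)) for _ in range(k)]
    return min(g_bar)

bias_estimate = statistics.fmean(v_hat_N() for _ in range(reps))
# bias_estimate is markedly below v* = 0, reflecting E[v_hat_N] <= v*.
```

With k = 5 and N = 100 the observed bias is roughly the mean of the minimum of five N(0, 1/N) variables, i.e., on the order of −0.1.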
It follows from Proposition 2.1 that w.p.1, for N sufficiently large, the set ŜN of
optimal solutions of the SAA problem is included in S∗. In that case it holds that v̂_N ≥ min_{x∈S∗} ĝ_N(x). Since the opposite inequality always holds, it follows that, w.p.1, v̂_N − min_{x∈S∗} ĝ_N(x) = 0 for N large enough. Multiplying both sides of this equation by √N, it follows that, w.p.1, √N [v̂_N − min_{x∈S∗} ĝ_N(x)] = 0 for N large enough, and hence

(2.29)   lim_{N→∞} √N [ v̂_N − min_{x∈S∗} ĝ_N(x) ] = 0   w.p.1.
Since convergence w.p.1 implies convergence in probability, it follows from (2.29) that √N [v̂_N − min_{x∈S∗} ĝ_N(x)] converges in probability to zero, i.e., v̂_N = min_{x∈S∗} ĝ_N(x) + o_p(N^{−1/2}). Suppose that for every x ∈ S the variance

(2.30)   σ²(x) := Var[G(x, W)]

exists. Then it follows by the central limit theorem (CLT) that, for any x ∈ S, √N [ĝ_N(x) − g(x)] converges in distribution to a normally distributed variable Z(x)
with zero mean and variance σ 2 (x). Moreover, again by the CLT, random variables
Z(x) have the same covariance function as G(x, W ), i.e., the covariance between
Z(x) and Z(x ) is equal to the covariance between G(x, W ) and G(x , W ) for any
x, x ∈ S. Hence the following result is obtained (it is similar to an asymptotic result
for stochastic programs with continuous decision variables which was derived in [21]).
We use “⇒” to denote convergence in distribution.
Proposition 2.4. Suppose that variances σ 2 (x), defined in (2.30), exist for every
x ∈ S ∗ . Then
(2.31)   √N (v̂_N − v∗) ⇒ min_{x∈S∗} Z(x),
where Z(x) are normally distributed random variables with zero mean and the co-
variance function given by the corresponding covariance function of G(x, W ). In
particular, if S ∗ = {x∗ } is a singleton, then
(2.32)   √N (v̂_N − v∗) ⇒ N(0, σ²(x∗)).
Although for any given x the mean (expected value) of Z(x) is zero, the expected value of the minimum of Z(x) over a subset S′ of S can be negative and tends to be smaller for a larger set S′. Therefore, it follows from (2.31) that for ill-conditioned problems, where the set of optimal or nearly optimal solutions is large, the estimate v̂_N of v∗ tends to be heavily biased. Note that convergence in distribution does not necessarily imply convergence of the corresponding means. Under mild additional conditions it follows from (2.31) that √N [E(v̂_N) − v∗] → E[min_{x∈S∗} Z(x)].
3. Algorithm design. In the previous section we established a number of con-
vergence results for the SAA method. The results describe how the optimal value v̂N
and the set ŜNε of ε-optimal solutions of the SAA problem converge to their true coun-
terparts v ∗ and S ε , respectively, as the sample size N increases. These results provide
some theoretical justification for the proposed method. When designing an algorithm
for solving stochastic discrete optimization problems, many additional issues have to
be addressed. Some of these issues are discussed in this section.
3.1. Selection of the sample size. In an algorithm, a finite sample size N or
a sequence of finite sample sizes has to be chosen, and the algorithm has to stop after
a finite amount of time. An important question is how these choices should be made.
Estimate (2.23) gives a bound on the sample size required to find an ε-optimal solution
with probability at least 1 − α. This estimate has two shortcomings for computational
purposes. First, for many problems it is not easy to compute the estimate, because
σ_max² and in some problems also |S| may be hard to compute. Second, as demonstrated
in section 4.2, the bound may be far too conservative to obtain a practical estimate
of the required sample size. To choose N , several trade-offs should be taken into
account. With larger N , the objective function of the SAA problem tends to be a
more accurate estimate of the true objective function, an optimal solution of the SAA
problem tends to be a better solution, and the corresponding bounds on the optimality
gap, discussed later, tend to be tighter. However, depending on the SAA problem (2.1)
and the method used for solving the SAA problem, the computational complexity for
solving the SAA problem increases at least linearly, and often exponentially, in the
sample size N . Thus, in the choice of sample size N , the trade-off between the quality
of an optimal solution of the SAA problem and the bounds on the optimality gap,
on the one hand, and computational effort, on the other hand, should be taken into
account. Also, the choice of sample size N may be adjusted dynamically, depending
For simplicity of presentation, suppose that each SAA replication produces one candidate
solution, which can be an optimal (ε-optimal) solution of the SAA problem. Let x̂Nm
denote the candidate solution produced by the mth SAA replication (trial). The optimality gap g(x̂_N^m) − v∗ can be estimated, as described in the next section. If a stopping
criterion based on the optimality gap estimate is satisfied, then no more replications
are performed. Otherwise, additional SAA replications with the same sample size N
are performed, or the sample size N is increased. The following argument provides a
simple guideline as to whether an additional SAA replication with the same sample
size N is likely to provide a better solution than the best solution found so far.
Note that, by construction, the random variables g(x̂Nm ), m = 1, . . . , are i.i.d.,
and their common probability distribution has a finite support because the set S is
finite. Suppose that M replications with sample size N have been performed so far.
If the probability distribution of g(x̂N ) were continuous, then the probability that the
(M + 1)th SAA replication with the same sample size would produce a better solution
than the best of the solutions produced by the M replications so far would be equal
to 1/(M + 1). Because in fact the distribution of g(x̂N ) is discrete, this probability is
less than or equal to 1/(M + 1). Thus, when 1/(M + 1) becomes sufficiently small,
additional SAA replications with the same sample size are not likely to be worth the
effort, and either the sample size N should be increased or the procedure should be
stopped.
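The 1/(M + 1) argument is straightforward to check by simulation: with continuous i.i.d. values, the probability that a fresh draw beats the best (smallest) of M previous draws is exactly 1/(M + 1) (an illustrative script, not from the paper):

```python
import random

random.seed(2)

def new_draw_is_best(M):
    """Does draw M+1 beat the best (smallest) of the first M i.i.d. draws?"""
    draws = [random.random() for _ in range(M + 1)]
    return min(draws) == draws[-1]

M, trials = 9, 100_000
p_hat = sum(new_draw_is_best(M) for _ in range(trials)) / trials
# p_hat is close to 1/(M + 1) = 0.1
```

For a discrete distribution of g(x̂_N), as in the text, ties make this probability no larger than 1/(M + 1).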
3.3. Performance bounds. To assist in making stopping decisions, as well as
for other performance evaluation purposes, one would like to compute the optimality
gap g(x̂) − v ∗ for a given solution x̂ ∈ S. Unfortunately, the very reason for the
approach described in this paper implies that both terms of the optimality gap are
hard to compute. As before,
ĝ_N′(x̂) := (1/N′) Σ_{j=1}^{N′} G(x̂, W^j)

is an unbiased estimator of g(x̂), and the variance of ĝ_N′(x̂) is estimated by S_N′²(x̂)/N′, where S_N′²(x̂) is the sample variance of G(x̂, W^j), based on the evaluation sample of size N′.
An estimator of v ∗ is given by
v̄_N^M := (1/M) Σ_{m=1}^M v̂_N^m,
where v̂_N^m denotes the optimal objective value of the mth SAA replication. Note that E[v̄_N^M] = E[v̂_N], and hence the estimator v̄_N^M has the same negative bias as v̂_N.
Proposition 2.4 indicates that this bias tends to be bigger for ill-conditioned problems
with larger sets of optimal, or nearly optimal, solutions. Consider the corresponding estimator ĝ_N′(x̂) − v̄_N^M of the optimality gap g(x̂) − v∗ at the point x̂. Since

(3.1)   E[ ĝ_N′(x̂) − v̄_N^M ] = g(x̂) − E[v̂_N] ≥ g(x̂) − v∗,
it follows that on average the above estimator overestimates the optimality gap g(x̂)−
v ∗ . It is possible to show (Norkin, Pflug, and Ruszczyński [19], and Mak, Morton,
and Wood [15]) that the bias v ∗ − E[v̂N ] is monotonically decreasing in the sample
size N .
The variance of v̄_N^M is estimated by

S_M² / M := (1/(M(M − 1))) Σ_{m=1}^M ( v̂_N^m − v̄_N^M )².
If the M samples of size N and the evaluation sample of size N′ are independent, then the variance of the optimality gap estimator ĝ_N′(x̂) − v̄_N^M can be estimated by S_N′²(x̂)/N′ + S_M²/M.
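The two estimators above combine into a simple gap-estimation routine. The sketch below is generic: `solve_saa` and `sampler` are assumed callables (an illustrative API, not the paper's), and in the toy usage at the bottom the true gap at x̂ = 0 is zero:

```python
import random
import statistics

def optimality_gap_estimate(G, solve_saa, sampler, x_hat, M, N, N_eval):
    """Return (gap estimate, variance estimate) for g(x_hat) - v*.

    Gap estimator: g_hat_{N'}(x_hat) - vbar_N^M, with variance estimated by
    S_{N'}^2(x_hat)/N' + S_M^2/M, using independent samples throughout.
    """
    evals = [G(x_hat, w) for w in sampler(N_eval)]
    g_hat = statistics.fmean(evals)
    var_g_hat = statistics.variance(evals) / N_eval

    v_hats = [solve_saa(sampler(N)) for _ in range(M)]
    v_bar = statistics.fmean(v_hats)
    var_v_bar = statistics.variance(v_hats) / M

    return g_hat - v_bar, var_g_hat + var_v_bar

# Toy usage: S = {0, 1}, G(x, w) = w + 0.1*x, W ~ N(0, 1), so v* = 0 at x = 0.
random.seed(3)
S = [0, 1]
def G(x, w): return w + 0.1 * x
def solve_saa(sample):
    return min(statistics.fmean(G(x, w) for w in sample) for x in S)
def sampler(n): return [random.gauss(0, 1) for _ in range(n)]

gap, var = optimality_gap_estimate(G, solve_saa, sampler, x_hat=0, M=20, N=200, N_eval=2000)
```

As (3.1) indicates, the gap estimate overestimates the true gap on average, so small positive values here are expected even at an optimal x̂.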
An estimator of the optimality gap g(x̂) − v∗ with possibly smaller variance is ḡ_N^M(x̂) − v̄_N^M, where

ḡ_N^M(x̂) := (1/M) Σ_{m=1}^M ĝ_N^m(x̂)

and ĝ_N^m(x̂) is the sample average objective value at x̂ of the mth SAA sample of size N,

ĝ_N^m(x̂) := (1/N) Σ_{j=1}^N G(x̂, W^{mj}).
Note that

ḡ_N^M(x̂) − v̄_N^M = [ḡ_N^M(x̂) − g(x̂)] + [g(x̂) − v∗] + [v∗ − E(v̂_N)] + [E(v̂_N) − v̄_N^M].

In the four terms on the right-hand side of the above equation, the first term has
expected value zero; the second term is the true optimality gap; the third term is the
bias term, which has positive expected value decreasing in the sample size N ; and the
fourth term is the accuracy term, which is decreasing in the number M of replications
and the sample size N . Thus a disadvantage of these optimality gap estimators is
that the gap estimator may be large if M, N, or N′ is small, even if x̂ is an optimal
solution, i.e., g(x̂) − v ∗ = 0.
3.4. Postprocessing, screening, and selection. Suppose a decision has been
made to stop, for example when the optimality gap estimator has become small
enough. At this stage the candidate solution x̂ ∈ S with the best value of ĝN (x̂)
can be selected as the chosen solution. However, it may be worthwhile to perform a
more detailed evaluation of the candidate solutions produced during the replications.
There are several statistical screening and selection methods for selecting subsets of
solutions or a single solution, among a (reasonably small) finite set of solutions, using
samples of the objective values of the solutions. Many of these methods are described
in Hochberg and Tamhane [12] and Bechhofer, Santner, and Goldsman [2]. In the
numerical tests described in section 4, a combined procedure was used, as described
in Nelson et al. [17]. During the first stage of the combined procedure, a subset S′ of the candidate solutions {x̂_N^1, . . . , x̂_N^M} is chosen (called screening) for further evaluation, based on the sample average values ĝ_N′(x̂_N^m). During the second stage, sample sizes N″ ≥ N′ are determined for more detailed evaluation, based on the sample variances S_N′²(x̂_N^m). Then N″ − N′ additional observations are generated, and the candidate solution x̂ ∈ S′ with the best value of ĝ_N″(x̂) is selected as the chosen solution. The combined procedure guarantees that the chosen solution x̂ has objective value g(x̂) within a specified tolerance δ of the best value min_m g(x̂_N^m) over all candidate solutions, with probability at least equal to a specified confidence level 1 − α.
3.5. Algorithm. Next we state a proposed algorithm for the type of stochastic
discrete optimization problem studied in this paper.
SAA Algorithm for Stochastic Discrete Optimization.
1. Choose initial sample sizes N and N′, a decision rule for determining the number M of SAA replications (possibly involving a maximum number M̄ of SAA replications with the same sample size, such that 1/(M̄ + 1) is sufficiently small), a decision rule for increasing the sample sizes N and N′ if needed, and a tolerance ε.
2. For m = 1, . . . , M, do steps 2.1 through 2.3.
   2.1 Generate a sample of size N and solve the SAA problem (2.1), with objective value v̂_N^m and ε-optimal solution x̂_N^m.
   2.2 Estimate the optimality gap g(x̂_N^m) − v∗ and the variance of the gap estimator.
   2.3 If the optimality gap and the variance of the gap estimator are sufficiently small, go to step 4.
3. If the optimality gap or the variance of the gap estimator is too large, increase the sample sizes N and/or N′, and return to step 2.
4. Choose the best solution x̂ among all candidate solutions x̂_N^m produced, using a screening and selection procedure. Stop.
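Putting the pieces together, an end-to-end run of the procedure might look as follows for a small hypothetical instance (enumeration stands in for a real SAA solver, simple evaluation on a common larger sample stands in for the screening-and-selection procedure, and the instance and all parameter choices are illustrative):

```python
import itertools
import random
import statistics

random.seed(4)

# Toy instance: S = {0,1}^3, G(x, W) = sum_i (c_i + W_i) x_i, W_i ~ N(0, 1),
# so g(x) = sum_i c_i x_i with true optimal solution (0, 1, 0).
c = [0.4, -0.3, 0.2]
S = list(itertools.product([0, 1], repeat=3))

def G(x, w):
    return sum((ci + wi) * xi for ci, wi, xi in zip(c, w, x))

def solve_saa(N):
    """Step 2.1: draw a sample of size N and solve the SAA problem by enumeration."""
    sample = [[random.gauss(0, 1) for _ in c] for _ in range(N)]
    return min(S, key=lambda x: statistics.fmean(G(x, w) for w in sample))

# Step 2: M independent SAA replications produce candidate solutions.
M, N, N_eval = 10, 500, 5000
candidates = {solve_saa(N) for _ in range(M)}

# Step 4: choose among the candidates using a common evaluation sample of size N'.
eval_sample = [[random.gauss(0, 1) for _ in c] for _ in range(N_eval)]
x_best = min(candidates, key=lambda x: statistics.fmean(G(x, w) for w in eval_sample))
```

For this well-conditioned instance all replications typically agree on the true optimal solution, so the candidate set is a singleton and screening is trivial.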
4. Numerical tests. In this section we describe an application of the SAA
method to an optimization problem. The purposes of these tests are to investigate
the viability of the SAA approach, as well as to study the effects of problem param-
eters, such as the number of decision variables and the condition number κ, on the
performance of the method.
where [x]+ := max{x, 0}. This problem can also be described as a knapsack problem,
where a subset of k items has to be chosen, given a knapsack of size q into which to
fit the items. The size Wi of each item i is random, and a per unit penalty of c has
to be paid for exceeding the capacity of the knapsack. For this reason the problem is
called the static stochastic knapsack problem (SSKP).
This problem was chosen for several reasons. First, expected value terms similar
to that in the objective function of (4.1) occur in many interesting stochastic opti-
mization problems. One such example is airline crew scheduling. An airline crew
schedule is made up of crew pairings, where each crew pairing consists of a number of
consecutive days (duties) of flying by a crew. Let {p1 , . . . , pk } denote the set of pair-
ings that can be chosen from. Then a crew schedule can be denoted by the decision
vector x ∈ {0, 1}k , where xi = 1 means that pairing pi is flown. The cost Ci (x) of a
crew pairing pi is given by
C_i(x) = max{ Σ_{d∈p_i} b_d(x),  f t_i(x),  g n_i },
where bd (x) denotes the cost of duty d in pairing pi , ti (x) denotes the total time
duration of pairing pi , ni denotes the number of duties in pairing pi , and f and g
are constants determined by contracts. Even ignoring airline recovery actions such as
cancellations and rerouting, bd (x) and ti (x) are random variables. The optimization
problem is then
min_{x∈X⊂{0,1}^k}  Σ_{i=1}^k E[C_i(x)] x_i,
where X denotes the set of feasible crew schedules. Thus the objective function of
the crew pairing problem can be written in a form similar to that of the objective
function of (4.1).
Another example is a stochastic shortest path problem, where travel times are
random and a penalty is incurred for arriving late at the destination. In this case,
494 A. J. KLEYWEGT, A. SHAPIRO, AND T. HOMEM-DE-MELLO
C(x) = Σ_{(i,j)∈x} b_ij + c [ Σ_{(i,j)∈x} t_ij − q ]^+,
where b_ij is the cost of traversing arc (i, j), t_ij is the time of traversing arc (i, j), q
is the available time to travel to the destination, and c is the penalty per unit time
late. The optimization problem is then
min_{x∈X} E[C(x)],
where X denotes the set of feasible paths in the network from the specified origin to
the specified destination.
A second reason for choosing the SSKP is that objective functions with terms such
as E[Σ_{i=1}^k W_i x_i − q]^+ are interesting for the following reason. For many stochastic
optimization problems good solutions can be obtained by replacing the random variables
W by their means and then solving the resulting deterministic optimization problem
max_x G(x, E[W]), called the expected value problem (Birge and Louveaux [3]). It
is easy to see that this may not be the case if the objective contains an expected
value term as in (4.1). For a given solution x, this term may be very large but may
become small if W1 , . . . , Wk are replaced by their means. In such a case, the ob-
tained expected value problem may produce very bad solutions for the corresponding
stochastic optimization problem.
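A one-item numerical illustration of this effect, with numbers of our choosing in the spirit of the instances below: when the mean size equals the capacity, the expected value problem sees no penalty at all, while the true expected excess is strictly positive.

```python
import math

# One item of random size W ~ N(20, 10^2) and capacity q = 20 (illustrative).
mu, sigma, q = 20.0, 10.0, 20.0

# Expected value problem: replace W by E[W] = 20, so the penalty term vanishes.
penalty_ev = max(mu - q, 0.0)  # = 0

# True expected excess E[W - q]^+; since mu = q this equals sigma / sqrt(2*pi).
penalty_true = sigma / math.sqrt(2.0 * math.pi)  # ≈ 3.99, far from zero
```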
The SSKP was also chosen because it is of interest by itself. One application
is the decision faced by a contractor who can take on several contracts, such as an
electricity supplier who can supply power to several groups of customers or a building
contractor who can bid on several construction projects. The amount of work that will
be required by each contract is unknown at the time the contracting decision has to be
made. The contractor has the capacity to do work at a certain rate at relatively low
cost, for example to generate electricity at a low-cost nuclear power plant. However,
if the amount of work required exceeds the capacity, additional capacity has to be
obtained at high cost, for example additional electricity can be generated at high-cost
oil or natural gas–fired power plants. Norkin, Ermoliev, and Ruszczyński [18] also
give several interesting applications of stochastic discrete optimization problems.
Note that the SAA problem of the SSKP can be formulated as the following
integer linear program:
(4.2)    max_{x,z}  Σ_{i=1}^k r_i x_i − (c/N) Σ_{j=1}^N z_j
         subject to  z_j ≥ Σ_{i=1}^k W_i^j x_i − q,  j = 1, . . . , N,
                     x_i ∈ {0, 1},  i = 1, . . . , k,
                     z_j ≥ 0,  j = 1, . . . , N.
This problem can be solved with the branch and bound method, using the linear
programming relaxation to provide upper bounds.
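For small k the SAA problem can also be solved by direct enumeration instead of branch and bound; the following sketch (assuming independent normal sizes, with illustrative parameters of our choosing) evaluates the sample average objective for every x ∈ {0, 1}^k and returns the same optimum as (4.2):

```python
import itertools
import random

def solve_saa_by_enumeration(r, mu, sigma, q, c, N, seed=1):
    """Solve max_x (1/N) sum_j [ sum_i r_i x_i - c (sum_i W_i^j x_i - q)^+ ]
    by enumerating x in {0,1}^k over one fixed sample W^1, ..., W^N."""
    rng = random.Random(seed)
    k = len(r)
    W = [[rng.gauss(mu[i], sigma[i]) for i in range(k)] for _ in range(N)]
    best_x, best_val = (0,) * k, 0.0  # x = 0 is feasible with value 0
    for x in itertools.product((0, 1), repeat=k):
        reward = sum(r[i] * x[i] for i in range(k))
        avg_excess = sum(max(sum(W[j][i] * x[i] for i in range(k)) - q, 0.0)
                         for j in range(N)) / N
        val = reward - c * avg_excess
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

best_x, best_val = solve_saa_by_enumeration(
    [12.0, 14.0], [20.0, 25.0], [5.0, 5.0], q=30.0, c=4.0, N=200)
```

Enumeration is exponential in k, of course; it is practical only for instances of roughly the sizes considered in this section.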
4.2. Numerical results. We present results for two sets of instances of the
SSKP. The instances in the first set have 10 decision variables each, and those in the
second set have 20. For each set we present one instance (called instances
10D and 20D, respectively) that was designed to be hard (large condition number κ),
and one randomly generated instance (called instances 10R and 20R, respectively).
Table 4.1
Condition numbers κ, optimal values v∗, and values g(x̄) of optimal solutions x̄ of expected
value problems max_x G(x, E[W]), for the instances presented.
Table 4.1 shows the condition numbers, the optimal values v ∗ , and the values g(x̄) of
the optimal solutions x̄ of the associated expected value problems max_x G(x, E[W])
for the four instances.
For all instances of the SSKP, the size variables W_i are independent and normally
distributed, for ease of evaluation of the results produced by the SAA method, as
described in the next paragraph. For the randomly generated instances, the rewards
ri were generated from the uniform (10, 20) distribution, the mean sizes µi were
generated from the uniform (20, 30) distribution, and the size standard deviations σi
were generated from the uniform (5, 15) distribution. For all instances, the per unit
penalty c = 4.
If W_i ∼ N(µ_i, σ_i²), i = 1, . . . , k, are independent normally distributed random
variables, then the objective function of (4.1) can be written in closed form. That
is, the random variable Z(x) := Σ_{i=1}^k W_i x_i − q is normally distributed with mean
µ(x) = Σ_{i=1}^k µ_i x_i − q and variance σ(x)² = Σ_{i=1}^k σ_i² x_i². It is also easy to show, since
Z(x) ∼ N(µ(x), σ(x)²), that

(4.3)    E[Z(x)]^+ = µ(x) Φ(µ(x)/σ(x)) + (σ(x)/√(2π)) exp(−µ(x)²/(2σ(x)²)),

where Φ denotes the standard normal distribution function.
The benefit of such a closed form expression is that the objective value g(x) can be
computed quickly and accurately, which is useful for solving small instances of the
problem by enumeration or branch and bound (cf. Cohn and Barnhart [4]) and for
evaluation of solutions produced by the SAA Algorithm. Good numerical approxi-
mations are available for computing Φ(x), such as Applied Statistics Algorithm AS66
(Hill [11]). The SAA Algorithm was executed without the benefit of a closed form
expression for g(x), as would be the case for most probability distributions; (4.3) was
used only to evaluate the solutions produced by the SAA Algorithm.
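Under the same normality assumption, (4.3) can be coded directly; in the sketch below Φ is computed via the standard error function rather than Algorithm AS66, and the function names are ours, not the authors' implementation:

```python
import math

def normal_cdf(z):
    # Phi via the error function -- a stand-in for Algorithm AS66.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def g_closed_form(x, r, mu, sigma, q, c):
    """Exact SSKP objective for independent normal sizes, via (4.3):
    E[Z(x)]^+ = mu(x) Phi(mu(x)/sigma(x))
                + sigma(x)/sqrt(2 pi) * exp(-mu(x)^2 / (2 sigma(x)^2))."""
    k = len(x)
    m = sum(mu[i] * x[i] for i in range(k)) - q
    s = math.sqrt(sum(sigma[i] ** 2 * x[i] ** 2 for i in range(k)))
    if s == 0.0:
        expected_excess = max(m, 0.0)  # degenerate (deterministic) case
    else:
        expected_excess = (m * normal_cdf(m / s)
                           + s / math.sqrt(2.0 * math.pi)
                             * math.exp(-m * m / (2.0 * s * s)))
    return sum(r[i] * x[i] for i in range(k)) - c * expected_excess

# mu(x) = 0 here, so the expected excess reduces to sigma(x)/sqrt(2*pi)
val = g_closed_form([1], r=[0.0], mu=[10.0], sigma=[2.0], q=10.0, c=1.0)
```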
The first numerical experiment was conducted to observe how the exponential
convergence rate established in Proposition 2.2 applies in the case of the SSKP, and
to investigate how the convergence rate is affected by the number of decision variables
and the condition number κ. Figures 4.1 and 4.2 show the estimated probability that
an SAA optimal solution x̂_N has objective value g(x̂_N) within relative tolerance d
of the optimal value v∗, i.e., P̂[v∗ − g(x̂_N) ≤ d v∗], as a function of the sample size
N, for different values of d. The experiment was conducted by generating M =
1000 independent SAA replications for each sample size N, computing SAA optimal
Fig. 4.1. Probability of SAA optimal solution x̂_N having objective value g(x̂_N) within relative
tolerance d of the optimal value v∗, P̂[v∗ − g(x̂_N) ≤ d v∗], as a function of sample size N for different
values of d, for instance 20D.
Fig. 4.2. Probability of SAA optimal solution x̂_N having objective value g(x̂_N) within relative
tolerance d of the optimal value v∗, P̂[v∗ − g(x̂_N) ≤ d v∗], as a function of sample size N for different
values of d, for instance 20R.
solutions x̂_N^m, m = 1, . . . , M, and their objective values g(x̂_N^m) using (4.3), and then
counting the number M_d of times that v∗ − g(x̂_N^m) ≤ d v∗. Then the probability was
estimated by P̂[v∗ − g(x̂_N) ≤ d v∗] = M_d/M, and the variance of this estimator was
estimated by

V̂ar[P̂] = M_d (1 − M_d/M) / (M (M − 1)).
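This estimator and its variance can be written out in a few lines (function name ours; a sketch, not the authors' code):

```python
def estimate_probability(g_values, v_star, d):
    """Estimate P[v* - g(x_N) <= d v*] from M replication values g(x_N^m),
    together with the variance estimate Md (1 - Md/M) / (M (M - 1))."""
    M = len(g_values)
    M_d = sum(1 for g in g_values if v_star - g <= d * v_star)
    p_hat = M_d / M
    var_hat = M_d * (1.0 - M_d / M) / (M * (M - 1))
    return p_hat, var_hat

# Toy data: two of four replications are within 5% of v* = 10.
p_hat, var_hat = estimate_probability([10.0, 10.0, 9.0, 8.0], v_star=10.0, d=0.05)
```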
[Figures 4.3 and 4.4: true objective value g(x̂_N^m) of the SAA optimal solution plotted for each
replication m, for sample sizes ranging from N = 50 to N = 350 and from N = 50 to N = 1000,
respectively.]
for each of the instances, the expected value problem maxx G(x, E[W ]) was solved,
with its optimal solution denoted by x̄. The objective value g(x̄) of each x̄ is shown in
Table 4.1. It is interesting to note that even with small sample sizes N , every solution
respectively; the correlation Cor[v̄_N^M, ḡ_N^M(x̂)]; and the computation times of the gap
estimates. In each case, the bias v̄_N^M − v∗ formed the major part of the optimality
gap estimate; the standard deviations of the gap estimators were small compared
with the bias. There was positive correlation between v̄_N^M and ḡ_N^M(x̂), and the second
gap estimator had smaller variances, but this benefit is obtained at the expense of
relatively large additional computational effort.
In section 2.2, an estimate N ≈ 3σ²_max log(|S|/α)/(ε − δ)² of the required sample
size was derived. For the instances presented here, using ε = 0.5, δ = 0, and α = 0.01,
these estimates were of the order of 10⁶ and thus much larger than the sample sizes
that were actually required for the specified accuracy. The sample size estimates
using σ²_max were smaller than the sample size estimates using max_{x∈S} Var[G(x, W)]
by a factor of approximately 10.
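As a sketch, the bound evaluates as follows; the value of σ²_max used here is illustrative and not taken from the paper's instances (whose larger variances produced the estimates of order 10⁶ mentioned above):

```python
import math

def saa_sample_size(sigma_max_sq, num_solutions, alpha, eps, delta):
    """Sample size estimate N ~ 3 sigma_max^2 log(|S|/alpha) / (eps - delta)^2
    from section 2.2 -- a conservative, worst-case bound."""
    return 3.0 * sigma_max_sq * math.log(num_solutions / alpha) / (eps - delta) ** 2

# k = 20 binary variables gives at most |S| = 2^20 feasible points.
N = saa_sample_size(sigma_max_sq=150.0, num_solutions=2 ** 20,
                    alpha=0.01, eps=0.5, delta=0.0)
```

Note that N grows only logarithmically in the number of feasible solutions, so doubling k adds a modest amount to the bound, while halving ε − δ quadruples it.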
Fig. 4.5. Relative bias v̄_N^M/v∗ of the optimality gap estimator as a function of the sample size
N, for instances 10D and 10R, with 10 decision variables.
Fig. 4.6. Relative bias v̄_N^M/v∗ of the optimality gap estimator as a function of the sample size
N, for instances 20D and 20R, with 20 decision variables.
Table 4.2
Optimality gap estimates v̄_N^M − ĝ_N(x̂) and v̄_N^M − ḡ_N^M(x̂), with their variances and computation
times.
Several variance reduction techniques can be used. Compared with simple random
sampling, Latin hypercube sampling reduced the variances by factors varying between
1.02 and 2.9 and increased the computation time by a factor of approximately 1.2.
Also, to estimate g(x) for any solution x ∈ S, it is natural to use Σ_{i=1}^k W_i x_i as a
control variate, because Σ_{i=1}^k W_i x_i should be correlated with [Σ_{i=1}^k W_i x_i − q]^+, and
the mean of Σ_{i=1}^k W_i x_i is easy to compute. Using this control variate reduced the
variances of the estimators of g(x) by factors between 2.0 and 3.0 and increased the
computation time by a factor of approximately 2.0.
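The control variate just described can be sketched as follows. For simplicity all x_i = 1, the numbers are ours, and the coefficient b is fitted from the sample as Cov(Y, S)/Var(S) — one common choice, which the paper does not specify:

```python
import random
import statistics

def excess_estimates(mu, sigma, q, N, seed=7):
    """Plain and control-variate sample averages of E[S - q]^+, where
    S = sum_i W_i (all x_i = 1). The control variate is S itself: it is
    correlated with (S - q)^+ and its mean sum_i mu_i is known exactly."""
    rng = random.Random(seed)
    S = [sum(rng.gauss(m, s) for m, s in zip(mu, sigma)) for _ in range(N)]
    Y = [max(s - q, 0.0) for s in S]
    plain = statistics.fmean(Y)
    mean_S, mean_Y = statistics.fmean(S), statistics.fmean(Y)
    # Fitted coefficient b ~ Cov(Y, S) / Var(S) (estimated from the sample).
    cov = sum((y - mean_Y) * (s - mean_S) for y, s in zip(Y, S)) / (N - 1)
    var = sum((s - mean_S) ** 2 for s in S) / (N - 1)
    b = cov / var
    known_mean = sum(mu)  # E[S] is available in closed form
    cv = statistics.fmean(y - b * (s - known_mean) for y, s in zip(Y, S))
    return plain, cv

# E[S] = 40 = q and sd(S) = 5, so the exact value is 5 / sqrt(2*pi).
plain, cv = excess_estimates(mu=[20.0, 20.0], sigma=[3.0, 4.0], q=40.0, N=20000)
```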
5. Conclusion. We proposed a sample average approximation method for solv-
ing stochastic discrete optimization problems, and we studied some theoretical as well
as practical issues important for the performance of this method. It was shown that
the probability that a replication of the SAA method produces an optimal solution
increases at an exponential rate in the sample size N . It was found that this conver-
gence rate depends on the conditioning of the problem, which in turn tends to become
poorer with an increase in the number of decision variables. It was also shown that the
sample size required for a specified accuracy increases proportionally to the logarithm
of the number of feasible solutions. It was found that for many instances the SAA
method produces good and often optimal solutions with only a few replications and a
small sample size. However, the optimality gap estimator considered here was in each
case too weak to indicate that a good solution had been found. Consequently the
sample size had to be increased substantially before the optimality gap estimator in-
dicated that the solutions were good. Thus, a more efficient optimality gap estimator
can make a substantial contribution toward improving the performance guarantees of
the SAA method during execution of the algorithm. The SAA method has the advan-
tage of ease of use in combination with existing techniques for solving deterministic
optimization problems.
The proposed method involves solving several replications of the SAA prob-
lem (2.1), and possibly increasing the sample size several times. An important issue is
the behavior of the computational complexity of the SAA problem (2.1) as a function
of the sample size. Current research aims at investigating this behavior for particular
classes of problems.
REFERENCES
[1] M. H. Alrefaei and S. Andradóttir, A simulated annealing algorithm with constant temperature for discrete stochastic optimization, Management Science, 45 (1999), pp. 748–764.
[2] R. E. Bechhofer, T. J. Santner, and D. M. Goldsman, Design and Analysis of Experiments
for Statistical Selection, Screening and Multiple Comparisons, John Wiley, New York, NY,
1995.
[3] J. R. Birge and F. Louveaux, Introduction to Stochastic Programming, Springer Ser. Oper.
Res., Springer-Verlag, New York, NY, 1997.
[4] A. Cohn and C. Barnhart, The stochastic knapsack problem with random weights: A heuris-
tic approach to robust transportation planning, in Proceedings of the Triennial Symposium
on Transportation Analysis (TRISTAN III), San Juan, PR, 1998.
[5] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, Springer-Verlag,
New York, NY, 1998.
[6] B. L. Fox and G. W. Heine, Probabilistic search with overrides, Ann. Appl. Probab., 5 (1995),
pp. 1087–1094.
[7] A. Futschik and G. C. Pflug, Confidence sets for discrete stochastic optimization, Ann.
Oper. Res., 56 (1995), pp. 95–108.
[8] A. Futschik and G. C. Pflug, Optimal allocation of simulation experiments in discrete
stochastic optimization and approximative algorithms, European J. Oper. Res., 101 (1997),
pp. 245–260.
[9] S. B. Gelfand and S. K. Mitter, Simulated annealing with noisy or imprecise energy mea-
surements, J. Optim. Theory Appl., 62 (1989), pp. 49–62.
[10] W. Gutjahr and G. C. Pflug, Simulated annealing for noisy cost functions, J. Global Optim.,
8 (1996), pp. 1–13.
[11] I. D. Hill, Algorithm AS 66: The normal integral, Applied Statistics, 22 (1973), pp. 424–427.
[12] Y. Hochberg and A. Tamhane, Multiple Comparison Procedures, John Wiley, New York, NY,
1987.
[13] T. Homem-de-Mello, Variable-Sample Methods and Simulated Annealing for Discrete
Stochastic Optimization, manuscript, Department of Industrial, Welding and Systems En-
gineering, The Ohio State University, Columbus, OH, 1999.
[14] T. Homem-de-Mello, Monte Carlo methods for discrete stochastic optimization, in Stochastic
Optimization: Algorithms and Applications, S. Uryasev and P. M. Pardalos, eds., Kluwer
Academic Publishers, Norwell, MA, 2000, pp. 95–117.
[15] W. K. Mak, D. P. Morton, and R. K. Wood, Monte Carlo bounding techniques for deter-
mining solution quality in stochastic programs, Oper. Res. Lett., 24 (1999), pp. 47–56.
[16] D. P. Morton and R. K. Wood, On a stochastic knapsack problem and generalizations, in
Advances in Computational and Stochastic Optimization, Logic Programming, and Heuris-
tic Search: Interfaces in Computer Science and Operations Research, D. L. Woodruff, ed.,
Kluwer Academic Publishers, Dordrecht, the Netherlands, 1998, pp. 149–168.
[17] B. L. Nelson, J. Swann, D. M. Goldsman, and W. Song, Simple procedures for selecting
the best simulated system when the number of alternatives is large, Oper. Res., to appear.
[18] V. I. Norkin, Y. M. Ermoliev, and A. Ruszczyński, On optimal allocation of indivisibles
under uncertainty, Oper. Res., 46 (1998), pp. 381–395.
[19] V. I. Norkin, G. C. Pflug, and A. Ruszczyński, A branch and bound method for stochastic
global optimization, Math. Programming, 83 (1998), pp. 425–450.
[20] R. Schultz, L. Stougie, and M. H. Van der Vlerk, Solving stochastic programs with integer
recourse by enumeration: A framework using Gröbner basis reductions, Math. Program-
ming, 83 (1998), pp. 229–252.
[21] A. Shapiro, Asymptotic analysis of stochastic programs, Ann. Oper. Res., 30 (1991), pp. 169–
186.
[22] A. Shapiro, T. Homem-de-Mello, and J. C. Kim, Conditioning of Convex Piecewise Linear
Stochastic Programs, manuscript, School of Industrial and Systems Engineering, Georgia
Institute of Technology, Atlanta, GA, 2000.