Proceedings of the 2009 Winter Simulation Conference

M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, and R. G. Ingalls, eds.


NEWTON-RAPHSON VERSION OF STOCHASTIC APPROXIMATION OVER DISCRETE SETS
Eunji Lim
Department of Industrial Engineering
University of Miami
Coral Gables, FL 33124, U.S.A.
ABSTRACT
This paper considers the problem of optimizing a complex stochastic system over a discrete set of feasible values of a parameter when the objective function can only be estimated through simulation. We propose a new gradient-based method that mimics the Newton-Raphson method and makes use of both the gradient and the Hessian of the objective function. The proposed algorithm is designed to give guidance on how to choose the sequence of gains, which plays a critical role in the empirical performance of a gradient-based algorithm. In addition to the desired fast convergence in the first few steps of the procedure, the proposed algorithm converges to a local optimizer with probability one as n goes to infinity with rate 1/n, where n is the number of iterations.
1 INTRODUCTION
When we consider the problem of optimizing a complex system whose performance depends on a parameter, we often find
that the system requires simulation as an inevitable means of evaluating its performance. When optimizing such systems
over a set of feasible values of a parameter, we need different approaches than the traditional optimization methods in order
to account for the errors incurred from measuring the system performance through simulation.
In many practical situations, the set of feasible values of a parameter is finite or countably infinite rather than continuous.
For instance, when controlling an inventory of a certain product, the cost function associated with the inventory system is a
function of the ordering quantity of the product which can only take on nonnegative integers. When allocating buffers to
several stations in a queueing system, the average sojourn time in the system depends on the number of buffers which can
only take on nonnegative integers. To optimize such systems over discrete sets, several algorithms such as the simulated
annealing method by Gelfand and Mitter (1989), the random search method by Andradottir (1995), the nested partitions
method by Shi and Olafsson (2000), the sample average approximation method by Kleywegt et al. (2001), a method balancing
the search process and the simulation effort by Lin and Lee (2006), and COMPASS by Hong and Nelson (2006) have been
proposed.
Recently, gradient-based methods have been proposed by Lim and Glynn (2006) and Dupač and Herkenrath (1982) for the case where the parameter space is discrete. The algorithm resembles stochastic approximation, which was designed for continuous parameter spaces. Stochastic approximation (Robbins and Monro 1951) is a recursive method which updates θ_n, the estimated optimal point at the nth iteration, by the following formula:

θ_{n+1} = θ_n − a_n D_n.   (1)

Algorithm (1) above moves along the direction of the negative gradient so that the objective function is reduced in each step. In the discrete counterpart of (1) as proposed in Lim and Glynn (2006), D_n mimics the derivative of the objective function by computing the difference of the objective values evaluated at the nearest point to the right of θ_n and the nearest point to the left of θ_n.
When applying a gradient-based algorithm such as (1) in practice, we encounter the problem of choosing the right sequence of gains a_n. To ensure the convergence of (1), a_n needs to satisfy Σ_n a_n = ∞ and Σ_n a_n² < ∞. A typical choice of a_n is c/n for some positive constant c. In most practical situations, where the computational budget and hence the number of simulation runs is predetermined, the choice of the sequence of gains a_n influences the empirical performance of the algorithm. When the gains are too large, the estimates for the optimal solution will bounce around the parameter space, whereas when the gains are too small, the estimates for the optimal solution will seem to get stuck at some point. Furthermore, one often has no a priori information on the choice of the sequence of gains.

978-1-4244-5771-7/09/$26.00 ©2009 IEEE
To illustrate the significance of the choice of a_n, consider the problem of minimizing a cost function f which depends on a parameter θ as f(θ) = θ² for all integers θ. For the purpose of simplicity, assume that the measurement of the objective function f(θ) is exact, i.e., the problem is deterministic. We assume that the initial point θ_1 is 100.10. Since one has no a priori information on a_n, one needs to guess the values of a_n. Suppose that one decides to try the sequences of gains a_n = 5/n, a_n = 1/n, and a_n = 1/(5n); then the trajectories of [θ_n], the nearest integer to θ_n, generated from

θ_{n+1} = θ_n − a_n ( f(⌈θ_n⌉) − f(⌊θ_n⌋) ),   (2)

for each choice of the sequence are as follows:

a_n = 5/n:    100, −905, 3618, −8441, 12661, −12660
a_n = 1/n:    100, −101, 1, 0, 0, 0
a_n = 1/(5n): 100, 60, 48, 42, 38, 35

In (2), ⌈x⌉ is the smallest integer greater than or equal to x and ⌊x⌋ is the largest integer less than or equal to x. When a_n = 5/n, the estimated optimal points [θ_n] tend to bounce around too much, whereas when a_n = 1/(5n), [θ_n] tends to converge slowly.
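The deterministic recursion (2) is easy to replay in code. The following is a minimal sketch (in Python, an implementation choice on our part) that iterates (2) for f(θ) = θ² from θ_1 = 100.10, using round() as the nearest-integer map [·]; the comments characterize the qualitative behavior of each gain sequence discussed above.

```python
import math

def f(theta):
    # Deterministic objective from the example: f(theta) = theta^2.
    return theta ** 2

def trajectory(gain, theta1=100.10, steps=5):
    """Iterate recursion (2): theta_{n+1} = theta_n - a_n (f(ceil) - f(floor)).

    Returns the nearest-integer estimates [theta_1], ..., [theta_{steps+1}].
    """
    theta = theta1
    path = [round(theta)]
    for n in range(1, steps + 1):
        diff = f(math.ceil(theta)) - f(math.floor(theta))
        theta -= gain(n) * diff
        path.append(round(theta))
    return path

print(trajectory(lambda n: 5 / n))        # a_n = 5/n: wild oscillation around 0
print(trajectory(lambda n: 1 / (5 * n)))  # a_n = 1/(5n): slow, monotone descent
```

Running the sketch reproduces the magnitudes tabulated above; the a_n = 5/n iterates alternate in sign as they overshoot the minimizer.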
To suggest the choice of a_n, we consider a Newton-Raphson type approach. Recall that the Newton-Raphson method to find the optimal point of f is based on the following recursion:

θ_{n+1} = θ_n − f′(θ_n)/f″(θ_n),   (3)

where f′ and f″ are the first and second derivatives of f, respectively. Considering that stochastic approximation (1) is asymptotically optimal when a_n = 1/(f″(θ*) n), where θ* is the optimal point of f, as proven in Chung (1954), we propose the following procedure:

θ_{n+1} = θ_n − (c/n) · D_n/H_n,   (4)

where D_n is an estimate of f′(θ_n) and H_n is an estimate of f″(θ*).
The algorithm (4) resembles the Newton-Raphson method and makes use of both the gradient and the Hessian information of the objective function. It suggests a multiple of the reciprocal of the Hessian as the sequence of gains. In addition to fast convergence in the first few iterations, the proposed algorithm enjoys a nice asymptotic property: it converges to a local optimizer with probability one at rate 1/n, where n is the number of iterations. The convergence rate 1/n is induced from the fact that the objective function is defined on a discrete set, and hence E(D_n/H_n) in (4) stays at a constant value when θ_n gets close to θ*, whereas when a smooth objective function is defined on a continuous set, E(D_n) in (1) converges to zero as θ_n gets close to θ*. Therefore, the convergence rate of (4) differs from the conventional convergence rates that are obtained in continuous cases.

This paper is organized as follows. In Section 2, we present the proposed algorithm and our main results. In Section 3, we provide proofs for the main results. In Section 4, we illustrate the empirical behavior of the proposed algorithm.
2 PROBLEM FORMULATION AND MAIN RESULTS
Consider the following optimization problem:

min_{θ ∈ Z} f(θ) = E[X(θ)],   (5)

where Z is the set of integers. Suppose that f(θ) cannot be evaluated exactly; it must be estimated through simulation. Our goal is to generate a sequence of random variables θ_1, θ_2, . . . that converges to the optimal solution θ* of f.
The proposed algorithm proceeds as follows. Choose θ_1 randomly from Z. Given θ_1, . . . , θ_n, one generates X_n^+, X_n^−, X_n′, X_n″, and X_n‴, where

X_n^+ = f(⌈θ_n⌉) + ε_n^+
X_n^− = f(⌊θ_n⌋) + ε_n^−
X_n′ = f([θ_n] − 1) + ε_n′
X_n″ = f([θ_n]) + ε_n″
X_n‴ = f([θ_n] + 1) + ε_n‴

and (ε_n^+, ε_n^−, ε_n′, ε_n″, ε_n‴ ; n = 1, 2, . . .) are independent and identically distributed random variables with mean zero. Let F_1 ⊆ F_2 ⊆ · · · be an increasing sequence of σ-fields such that ε_n^+, ε_n^−, ε_n′, ε_n″, and ε_n‴ are F_n-measurable and independent of F_{n−1} for all n ≥ 2. Then θ_{n+1} is computed from the recursion

θ_{n+1} = θ_n − (c/n) · (X_n^+ − X_n^−)/H_n,   (6)

where H_n is a truncated average of (X_i′ − 2X_i″ + X_i‴ ; i = 1, . . . , n). That is,

H_n = a if G_n < a;  H_n = G_n if a ≤ G_n ≤ b;  H_n = b if G_n > b,

where

G_n = (1/n) Σ_{i=1}^n (X_i′ − 2X_i″ + X_i‴).

At iteration n, the estimate of the optimal solution is the nearest integer to θ_n.
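In code, the truncation that defines H_n amounts to clamping the running average of the second differences to the known interval [a, b]. A minimal sketch (Python, with illustrative variable names of our choosing):

```python
def truncated_hessian(second_diffs, a, b):
    """H_n: average of X'_i - 2*X''_i + X'''_i over i = 1..n, clamped to [a, b]."""
    g_n = sum(second_diffs) / len(second_diffs)  # G_n, the plain running average
    return min(max(g_n, a), b)                   # truncate to the interval [a, b]

print(truncated_hessian([1.0, 3.0], a=0.5, b=10.0))   # average 2.0 lies inside [a, b]
print(truncated_hessian([-4.0, 0.2], a=0.5, b=10.0))  # average -1.9 is clamped up to a
```

The clamping keeps the denominator in (6) bounded away from zero (and from infinity), which is exactly what assumption A2 below supplies the constants for.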
Below is the detailed description of the proposed algorithm.

Algorithm
1. Set n = 1 and choose θ_n randomly from Z.
2. Set θ_{n+1} = θ_n − (c/n) · (X_n^+ − X_n^−)/H_n.
3. Set n = n + 1 and go to Step 2.
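Steps 1-3 above can be sketched end to end in a few lines. The Python rendering below runs the recursion on a hypothetical test problem (a noisy quadratic with minimizer 12; the objective, constants, noise level, and starting point are illustrative assumptions of ours, not from the paper).

```python
import math
import random

def newton_raphson_sa(sample, theta1, c, a, b, n_iters):
    """Recursion (6): theta_{n+1} = theta_n - (c/n) * (X+_n - X-_n) / H_n.

    `sample(k)` returns one noisy observation of f(k) at an integer k.
    """
    theta = theta1
    second_diffs = []  # running record of X'_i - 2*X''_i + X'''_i
    for n in range(1, n_iters + 1):
        x_plus = sample(math.ceil(theta))
        x_minus = sample(math.floor(theta))
        k = round(theta)
        second_diffs.append(sample(k - 1) - 2 * sample(k) + sample(k + 1))
        g = sum(second_diffs) / len(second_diffs)
        h = min(max(g, a), b)  # H_n, truncated to [a, b] as in A2
        theta -= (c / n) * (x_plus - x_minus) / h
    return round(theta)

# Hypothetical test problem: f(k) = (k - 12)^2 observed with N(0, 1) noise.
rng = random.Random(1)
est = newton_raphson_sa(lambda k: (k - 12) ** 2 + rng.gauss(0, 1),
                        theta1=40.3, c=1.0, a=0.5, b=10.0, n_iters=200)
print(est)  # close to the true minimizer 12
```

Note that five simulation outputs are drawn per iteration (X^+, X^−, and the three Hessian observations), which is the accounting used in the budget comparison of Section 4.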
The following assumptions will be needed:

A1. f has only one local minimum at θ*. We say θ ∈ Z is a local minimum if f(θ) ≤ f(θ+1) and f(θ) ≤ f(θ−1).
A2. There exist known constants a and b such that 0 < a < f(θ*+1) − 2f(θ*) + f(θ*−1) < b < ∞.
A3. max{ Var(ε_1^+), Var(ε_1^−), Var(ε_1′), Var(ε_1″), Var(ε_1‴) } ≤ σ² < ∞.
A4. There exists a constant δ > 0 such that |f(θ+1) − f(θ)| ≥ δ for all θ ∈ Z.
A5. There exists a constant C such that |f(θ+1) − f(θ)| ≤ C(1 + |θ − θ*|) for all θ ∈ Z.
A6. c is large enough so that cγ > f(θ*+1) − 2f(θ*) + f(θ*−1), where γ = inf_{θ ∈ R} |f(⌈θ⌉) − f(⌊θ⌋)| / |⌈θ − θ*⌉| > 0.
A7. There exists K < ∞ such that max(|ε_1^+|, |ε_1^−|, |ε_1′|, |ε_1″|, |ε_1‴|) < K and |θ_1| ≤ K with probability one.
We now state our main results as follows:

Theorem 1. Under A1–A5, θ_n → θ* as n → ∞ with probability one.

Theorem 2. Under A1–A7, |θ_n − θ*| = O_p(1/n) as n → ∞.
3 PROOFS
Proof of Theorem 1. For simplicity of writing, we assume θ* = 0. Let A_n = (X_n^+ − X_n^−)/H_n. First, we prove

inf_{|θ_n − θ*| ≥ 1/ε} (θ_n − θ*) E(A_n | F_n) ≥ b^{−1}δ > 0 for every ε > 0,   (7)

E(A_n² | F_n) ≤ C_1 (1 + (θ_n − θ*)²) for some constant C_1.   (8)

For any θ_n with |θ_n − θ*| ≥ 1/ε,

(θ_n − θ*) E(A_n | F_n) = θ_n E[ (X_n^+ − X_n^−)/H_n | F_n ] = θ_n ( f(⌈θ_n⌉) − f(⌊θ_n⌋) ) E[1/H_n].

Note that A1 implies f(⌈θ⌉) − f(⌊θ⌋) ≥ 0 if θ > 0 and f(⌈θ⌉) − f(⌊θ⌋) ≤ 0 if θ ≤ 0. So θ ( f(⌈θ⌉) − f(⌊θ⌋) ) = |θ| · | f(⌈θ⌉) − f(⌊θ⌋) | for any θ ∈ R. So

θ_n ( f(⌈θ_n⌉) − f(⌊θ_n⌋) ) E[1/H_n] = |θ_n| · | f(⌈θ_n⌉) − f(⌊θ_n⌋) | E[1/H_n] ≥ b^{−1}δ by A4,

since a ≤ H_n ≤ b. Hence (7) is proven. Note that

E(A_n² | F_n) = E[ (X_n^+ − X_n^−)² / H_n² | F_n ]
≤ ( ( f(⌈θ_n⌉) − f(⌊θ_n⌋) )² + 2σ² ) E[ 1/H_n² | F_n ]
≤ a^{−2} ( ( f(⌈θ_n⌉) − f(⌊θ_n⌋) )² + 2σ² ) by A2
≤ a^{−2} ( C²(1 + |θ_n|)² + 2σ² ) by A5
≤ C_1 (1 + θ_n²)

for some constant C_1. Hence (8) is proven.

To prove θ_n → 0, note

E(θ_{n+1}² | F_n) = E[ (θ_n − (c/n) A_n)² | F_n ]
= θ_n² + (c²/n²) E(A_n² | F_n) − (2c/n) θ_n E(A_n | F_n)
≤ θ_n² + (c²/n²) C_1 (1 + θ_n²) − (2c/n) θ_n E(A_n | F_n) by (8)
≤ (1 + c²C_1/n²) θ_n² + c²C_1/n² − (2c/n) θ_n E(A_n | F_n).

By the theorem for almost supermartingales in Robbins and Siegmund (1971), θ_n² → θ_∞ for some random variable θ_∞ as n → ∞, and

Σ_{n=1}^∞ (1/n) θ_n E(A_n | F_n) < ∞   (9)

with probability one. We now need to show θ_∞ = 0 with probability one. Suppose, on the contrary, that ω is such that θ_∞(ω) ≠ 0. Then there exist ε > 0 and N such that for all n > N, |θ_n(ω)| ≥ 1/ε. Since θ_n E(A_n | F_n) ≥ b^{−1}δ > 0 whenever |θ_n| ≥ 1/ε from (7), we have

Σ_{n=1}^∞ (1/n) θ_n E(A_n | F_n) = ∞,

which contradicts (9). Hence θ_∞ = 0 with probability one. ∎


Proof of Theorem 2. It suffices to prove

sup_{n=1,2,...} E( n |θ_n − θ*| ) < ∞,

since for any constant C, P( n |θ_n − θ*| > C ) ≤ E( n |θ_n − θ*| ) / C.

For simplicity of writing, assume θ* = 0. First assume θ_1 = θ*. (We will consider the general case later.) Let Y_n = n(θ_n − θ*), n = 1, 2, . . .. We will prove that there exists N such that for all n ≥ N,

E(Y_{n+1} − Y_n | F_n) ≤ −α′ on the event {Y_n > 0}   (10)

and

E(Y_{n+1} − Y_n | F_n) ≥ α″ on the event {Y_n < 0}   (11)

for some constants α′, α″ > 0. From

θ_{n+1} = θ_n − (c/n)(X_n^+ − X_n^−)/H_n,

we get

(n+1) θ_{n+1} = (n+1) θ_n − c (1 + 1/n)(X_n^+ − X_n^−)/H_n,

or equivalently,

Y_{n+1} = Y_n + θ_n − c (1 + 1/n)(X_n^+ − X_n^−)/H_n.

When Y_n > 0,

E(Y_{n+1} − Y_n | F_n) = θ_n − c (1 + 1/n) E[ (X_n^+ − X_n^−)/H_n | F_n ]
= θ_n − c (1 + 1/n)( f(⌈θ_n⌉) − f(⌊θ_n⌋) ) E[1/H_n].

Under A1–A4, θ_n → θ* as n → ∞ with probability one by Theorem 1, and hence 1/H_n → 1/( f(1) − 2f(0) + f(−1) ) with probability one as n → ∞. Since 1/b ≤ 1/|H_n| ≤ 1/a for all n, by the bounded convergence theorem, E[1/H_n] → ( f(1) − 2f(0) + f(−1) )^{−1} as n → ∞. Take ε small enough so that

cγ ( f(1) − 2f(0) + f(−1) )^{−1} − 1 > cγε.   (12)

Then there exists N such that for all n ≥ N,

| E[1/H_n] − ( f(1) − 2f(0) + f(−1) )^{−1} | ≤ ε.

For n ≥ N,

θ_n − c (1 + 1/n)( f(⌈θ_n⌉) − f(⌊θ_n⌋) ) E[1/H_n]
≤ θ_n − c (1 + 1/n)( ( f(1) − 2f(0) + f(−1) )^{−1} − ε )( f(⌈θ_n⌉) − f(⌊θ_n⌋) )
≤ ⌈θ_n⌉ − c (1 + 1/n)( ( f(1) − 2f(0) + f(−1) )^{−1} − ε )( f(⌈θ_n⌉) − f(⌊θ_n⌋) ) because θ_n ≤ ⌈θ_n⌉
≤ ⌈θ_n⌉ − c ( ( f(1) − 2f(0) + f(−1) )^{−1} − ε )( f(⌈θ_n⌉) − f(⌊θ_n⌋) )
≤ ⌈θ_n⌉ − c ( ( f(1) − 2f(0) + f(−1) )^{−1} − ε ) γ ⌈θ_n⌉ by A6
= ⌈θ_n⌉ ( 1 − cγ ( f(1) − 2f(0) + f(−1) )^{−1} + cγε )
≤ −α′ ⌈θ_n⌉ for some α′ > 0 by (12)
≤ −α′ because ⌈θ_n⌉ ≥ 1 for all θ_n > 0.

Hence (10) holds. (11) follows in a similar way.

Let β_i = E( (i+1) θ_{i+1} − i θ_i | F_i ) for i = 1, 2, . . .. Note that A7 implies

|β_i| ≤ b(K, N)   (13)

for all i = 1, . . . , N with probability one, where b(K, N) is a constant depending on K and N, because

|β_i| = | E( (i+1) θ_{i+1} − i θ_i | F_i ) |
≤ E( | (i+1) θ_{i+1} − i θ_i | | F_i )
= E( | θ_i − c (1 + 1/i)(X_i^+ − X_i^−)/H_i | | F_i )
≤ |θ_i| + c (1 + 1/i) E[ |X_i^+ − X_i^−| / |H_i| | F_i ]
≤ |θ_i| + c (1 + 1/i) a^{−1} ( f(⌈θ_i⌉) − f(⌊θ_i⌋) + 2K )
≤ |θ_i| + c (1 + 1/i) a^{−1} ( C(1 + |θ_i|) + 2K )
≤ C_2 (1 + |θ_i|)

for some constant C_2, and

|θ_i| = | θ_{i−1} − (c/(i−1))(X_{i−1}^+ − X_{i−1}^−)/H_{i−1} |
≤ |θ_{i−1}| + (C_3/(i−1))(1 + |θ_{i−1}|)
≤ (1 + C_3/(i−1)) |θ_{i−1}| + C_3/(i−1)
≤ · · ·
≤ (1 + C_3/(i−1))(1 + C_3/(i−2)) · · · (1 + C_3/1) |θ_1| + Σ_{k=1}^{i−1} Π_{j=1}^{k−1} (1 + C_3/(i−j)) · C_3/(i−k)

for some constant C_3.

We will compute an upper bound for E|Y_n| that does not depend on n. Let U = max{ k ≤ n : Y_k ≤ 0 } denote the last time up to n such that (Y_k)_{k=1,...} takes a nonpositive value. For t > N b(K, N), we have

P(Y_n ≥ t) = Σ_{k=1}^{n−1} P(Y_n ≥ t, U = k)
≤ Σ_{k=1}^{n−1} P( Y_n − Y_k ≥ t, Y_k ≤ 0, Y_i > 0 for k < i < n )
≤ Σ_{k=1}^{N−1} P( Y_n − Y_{n−1} − β_{n−1} + · · · + Y_{k+1} − Y_k − β_k ≥ t + α′(n−N) − b(K, N)(N−k) )
  + Σ_{k=N}^{n−1} P( Y_n − Y_{n−1} − β_{n−1} + · · · + Y_{k+1} − Y_k − β_k ≥ t + α′(n−k) )
≤ Σ_{k=1}^{N−1} E| Y_n − Y_{n−1} − β_{n−1} + · · · + Y_{k+1} − Y_k − β_k |^p / | t + α′(n−N) − b(K, N)(N−k) |^p
  + Σ_{k=N}^{n−1} E| Y_n − Y_{n−1} − β_{n−1} + · · · + Y_{k+1} − Y_k − β_k |^p / | t + α′(n−k) |^p

for any p > 6. Note that (Y_{n+1} − Y_n − β_n : n = 1, 2, . . .) is a sequence of martingale differences with sup_n E|Y_{n+1} − Y_n − β_n|^p < ∞ because

E| Y_{n+1} − Y_n − β_n |^p
= E| c (1 + 1/n)(X_n^+ − X_n^−)/H_n − c (1 + 1/n) E[ (X_n^+ − X_n^−)/H_n | F_n ] |^p
= c^p (1 + 1/n)^p E| (X_n^+ − X_n^−)/H_n − E[ (X_n^+ − X_n^−)/H_n | F_n ] |^p
≤ c^p 2^p E| ( X_n^+ − X_n^− − ( f(⌈θ_n⌉) − f(⌊θ_n⌋) ) E[1/H_n] H_n ) / H_n |^p
≤ c^p 2^p a^{−p} E[ C_4 (1 + |θ_n|)^p ]
≤ c^p 2^p a^{−p} C_4 ( 1 + E|θ_n|^p )

for some constant C_4, and

E|θ_{n+1}|^p = E| θ_n − (c/n)(X_n^+ − X_n^−)/H_n |^p
≤ E|θ_n|^p + (c^p/n^p) a^{−p} C_5 ( 1 + E|θ_n|^p )
≤ (1 + C_6/n^p) E|θ_n|^p + C_6/n^p

for some constants C_5 and C_6. Hence, by Lemma 1 of Venter (1966), E|θ_n|^p is bounded.

Note that

Σ_{k=N}^{n−1} E| Y_n − Y_{n−1} − β_{n−1} + · · · + Y_{k+1} − Y_k − β_k |^p / | t + α′(n−k) |^p
≤ Σ_{k=N}^{n−1} C_p (n−k)^{p/2} / ( t + α′(n−k) )^p
≤ Σ_{k=1}^∞ C_p k^{p/2} / ( t + α′k )^p

and that

Σ_{k=1}^{N−1} E| Y_n − Y_{n−1} − β_{n−1} + · · · + Y_{k+1} − Y_k − β_k |^p / | t + α′(n−N) − b(K, N)N |^p
≤ Σ_{k=1}^{N−1} C_p (n−k)^{p/2} / | t + α′(n−N) − b(K, N)N |^p

for some constant C_p by Lemma 2.1 of Li (2003). So

E max(Y_n, 0) = ∫_0^∞ P(Y_n > t) dt
≤ N b(K, N) + ∫_{N b(K,N)}^∞ Σ_{k=1}^∞ C_p k^{p/2} / ( t + α′k )^p dt + ∫_{N b(K,N)}^∞ Σ_{k=1}^{N−1} C_p (n−k)^{p/2} / | t + α′(n−N) − b(K, N)N |^p dt
≤ N b(K, N) + Σ_{k=1}^∞ (C_p/(p−1)) k^{p/2} / | N b(K, N) + k |^{p−1} + Σ_{k=1}^{N−1} (C_p/(p−1)) (n−k)^{p/2} / | N b(K, N) + α′(n−N) − b(K, N)N |^{p−1}
≤ K′

for some constant K′ < ∞ which does not depend on n. E max(−Y_n, 0) < ∞ follows in a similar way. Hence the desired result is proven when θ_1 = θ*. For the general case, let Y_n = n(θ_n − θ*) − (θ_1 − θ*). Then it follows that E( n(θ_n − θ*) − (θ_1 − θ*) )^+ < ∞ and E( n(θ_n − θ*) − (θ_1 − θ*) )^− < ∞. So

E| n(θ_n − θ*) | ≤ E| n(θ_n − θ*) − (θ_1 − θ*) | + E|θ_1| + |θ*| < ∞

by A7. Hence, the desired result is proven. ∎
4 A NUMERICAL EXAMPLE
Consider the single-period newsvendor problem where θ units are ordered and stocked at the beginning of the period. The goal is to find an ordering level θ that minimizes the cost function f(θ) = E[ cθ + h·max(0, θ − D) + p·max(0, D − θ) ], where c is the unit cost for producing each unit, h is the holding cost per unit remaining at the end of the period, p is the shortage cost per unit of unsatisfied demand, and the expectation is taken with respect to the random demand D. The optimal solution for this problem is given by

θ* = F^{−1}( (p − c)/(p + h) ),

where F is the cumulative distribution function of D.
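The critical-ratio formula is easy to check numerically. The sketch below (Python, standard library only; an illustrative verification of ours, not code from the paper) evaluates the discrete quantile as the smallest k with F(k) ≥ (p − c)/(p + h), for the Poisson demand and costs used in the comparison below.

```python
import math

def poisson_cdf(k, lam):
    # F(k) = P(D <= k) for D ~ Poisson(lam), summed term by term.
    term = math.exp(-lam)
    total = 0.0
    for i in range(k + 1):
        total += term
        term *= lam / (i + 1)
    return total

def newsvendor_optimum(c, h, p, lam):
    """Smallest k with F(k) >= (p - c)/(p + h), i.e. the discrete quantile."""
    ratio = (p - c) / (p + h)
    k = 0
    while poisson_cdf(k, lam) < ratio:
        k += 1
    return k

print(newsvendor_optimum(c=3, h=5, p=9, lam=100))  # → 98, as reported below
```

For discrete demand, F^{−1} is read as this generalized inverse; the critical ratio here is 6/14 ≈ 0.429, which the Poisson(100) distribution first reaches at k = 98.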
We compare the proposed algorithm

θ_{n+1} = θ_n − (1/n)(X_n^+ − X_n^−)/H_n   (14)

to the following algorithm:

θ_{n+1} = θ_n − (1/n)(X_n^+ − X_n^−),   (15)

which was proposed in Lim and Glynn (2006) and which does not make use of the Hessian information.
Table 1 compares the performance of (14) and (15) with c = 3, h = 5, and p = 9. Demand follows a Poisson distribution with parameter 100, resulting in θ* = 98. In (14) and (15), the gain 1/n is adjusted according to the total number of iterations: n is replaced by ⌈10n/(total number of iterations)⌉, so that a_n takes the values 1/1, 1/2, 1/3, . . . , 1/10 and a_n does not get too small at the end of the iterations.
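A sketch of one way to implement this adjustment, under the assumption that the gain at iteration n is 1/⌈10n/N⌉ with N the total number of iterations (the exact form is our reading of the description above):

```python
import math

def adjusted_gain(n, total_iters, levels=10):
    # Assumed form: a_n = 1 / ceil(levels * n / total_iters),
    # stepping down through 1/1, 1/2, ..., 1/levels over the run.
    return 1.0 / math.ceil(levels * n / total_iters)

print([adjusted_gain(n, total_iters=30) for n in (1, 3, 15, 30)])  # → [1.0, 1.0, 0.2, 0.1]
```

With N = 30 the gain spends three iterations at each of the ten levels, so it never shrinks below 1/10.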
At each θ_n, X_n^+ is the average of 500 replications of c⌈θ_n⌉ + h(⌈θ_n⌉ − D)^+ + p(D − ⌈θ_n⌉)^+. X_n^−, X_n′, X_n″, and X_n‴ are computed in a similar way. Table 1 shows the sample mean and the sample standard deviation of θ_n based on 200 independent replications with θ_1 = 5.231. n_1 and n_2 are the total numbers of iterations for (14) and (15), respectively. The ratio of n_1 to n_2 is set to be 2 to 5, reflecting the fact that we need to generate X(θ) at 5 and 2 different values of θ at each iteration of (14) and (15), respectively. Hence the same amount of computational budget is allocated to (14) and (15).
Table 1: Performance of Algorithms (14) and (15)

                  n_1 = 15, n_2 = 6              n_1 = 25, n_2 = 10             n_1 = 35, n_2 = 14
θ* = 98           |θ̄_n − θ*|  Variance  MSE      |θ̄_n − θ*|  Variance  MSE      |θ̄_n − θ*|  Variance  MSE
Algorithm (14)    51.14      1003.00   3618.29   24.00       967.21   1543.21   18.14       907.21   1236.27
Algorithm (15)    69.31        73.49   4877.32   50.94       106.92   2701.80   33.69       153.76   1288.78
Table 1 shows that the proposed algorithm, Algorithm (14), approaches the optimal solution faster than Algorithm (15), but shows more variability. However, the overall efficiency summarized by the mean square error indicates that the proposed algorithm outperforms Algorithm (15), since the variance increase is more than offset by the bias reduction.
ACKNOWLEDGMENTS
The author would like to thank all anonymous referees for their valuable comments and suggestions.
REFERENCES
Andradottir, S. 1995. A method for discrete stochastic optimization. Management Science 41:1946–1961.
Chung, K. L. 1954. On a stochastic approximation method. Annals of Mathematical Statistics 25:463–483.
Dupač, V., and U. Herkenrath. 1982. Stochastic approximation on a discrete set and the multi-armed bandit problem. Communications in Statistics: Sequential Analysis 1:1–25.
Gelfand, S., and S. Mitter. 1989. Simulated annealing with noisy or imprecise energy measurements. Journal of Optimization Theory and Applications 62:49–62.
Hong, L. J., and B. L. Nelson. 2006. Discrete optimization via simulation using COMPASS. Operations Research 54:115–129.
Kleywegt, A., A. Shapiro, and T. Homem-de-Mello. 2001. The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization 12:479–502.
Li, Y. 2003. A martingale inequality and large deviations. Statistics and Probability Letters 62:317–321.
Lim, E., and P. W. Glynn. 2006. Discrete optimization via simulation in the presence of regularity. INFORMS National Meeting, Pittsburgh.
Lin, X., and L. H. Lee. 2006. A new approach to discrete stochastic optimization problems. European Journal of Operational Research 172:761–782.
Robbins, H., and S. Monro. 1951. A stochastic approximation method. Annals of Mathematical Statistics 22:400–407.
Robbins, H., and D. Siegmund. 1971. A convergence theorem for nonnegative almost supermartingales and some applications. In Optimizing Methods in Statistics, 233–257. New York: Academic Press.
Shi, L., and S. Olafsson. 2000. Nested partitions method for stochastic optimization. Methodology and Computing in Applied Probability 2:271–291.
Venter, J. H. 1966. On Dvoretzky stochastic approximation theorems. Annals of Mathematical Statistics 37:1534–1544.
AUTHOR BIOGRAPHY
EUNJI LIM is an Assistant Professor in the Industrial Engineering Department at the University of Miami. She received her Ph.D. in Management Science and Engineering from Stanford University. Her research interests include stochastic optimization, statistical inference under shape restrictions, and simulation.