
S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13
© 1996 Elsevier Science B.V. All rights reserved.

Computer Experiments

J. R. Koehler and A. B. Owen

1. Introduction

Deterministic computer simulations of physical phenomena are becoming widely used


in science and engineering. Computers are used to describe the flow of air over an
airplane wing, combustion of gases in a flame, behavior of a metal structure under
stress, safety of a nuclear reactor, and so on.
Some of the most widely used computer models, and the ones that lead us to work in
this area, arise in the design of the semiconductors used in the computers themselves.
A process simulator starts with a data structure representing an unprocessed piece of
silicon and simulates the steps such as oxidation, etching and ion implantation that produce
a semiconductor device such as a transistor. A device simulator takes a description of
such a device and simulates the flow of current through it under varying conditions to
determine properties of the device such as its switching speed and the critical voltage
at which it switches. A circuit simulator takes a list of devices and the way that they
are arranged together and computes properties of the circuit as a whole.
In each of these computer simulations, the user must specify the values of some
governing variables. For example in process simulation the user might have to specify
the duration and temperature of the oxidation step, and doses and energies for each
of the ion implantation steps. These are continuously valued variables. There may
also be discrete variables, such as whether to use wet or dry oxidation. Most of this
chapter treats the case of continuous variables, but some of it is easily adaptable to
discrete variables, especially those taking only two values.
Let X ∈ R^p denote the vector of input values chosen for the computer program. We
will write X as the row vector (X^1, ..., X^p), using superscripts to denote components
of X. We assume that each component X^j is continuously adjustable between a lower
and an upper limit, which after a linear transformation can be taken to be 0 and 1
respectively. (For some results where every input is dichotomous see Mitchell et al.
(1990).) The computer program is denoted by f and it computes q output quantities,
denoted by Y ∈ R^q.

Y = f(X),    X ∈ [0, 1]^p.    (1)

Some important quantities describing a computer model are the number of inputs
p, the number of outputs q and the speed with which f can be computed. These vary


enormously in applications. In the semiconductor problems we have considered p is


usually between 4 and 10. Other computer experiments use scores or even hundreds of
input variables. In our motivating applications q is usually larger than 1. For example
interest might center on the switching speed of a device and also on its stability as
measured by a breakdown voltage. For some problems f takes hours to evaluate on
a supercomputer and for others f runs in milliseconds on a personal computer.
Equation (1) differs from the usual X - Y relationship studied by statisticians in
that there is no random error term. If the program is run twice with the same X,
the same Y is obtained both times. Therefore it is worth discussing why a statistical
approach is called for.
These computer programs are written to calculate Y from a known value of X. The
way they are often used however, is to search for good values of X according to some
goals for Y. Suppose that X_1 = (X_1^1, ..., X_1^p) is the initial choice for X. Often X_1
does not give a desirable Y_1 = f(X_1). The engineer or scientist can often deduce why
this is, from the program output, and select a new value, X_2, for which Y_2 = f(X_2)
is likely to be an improvement. This improvement process can be repeated until a
satisfactory design is found. The disadvantage of this procedure is that it may easily
miss some good designs X, because it does not fully explore the design space. It can
also be slow, especially when p is large, or when improvements of Y^1, say, tend to
appear with worsenings of Y^2 and vice versa.
A commonly used way of exploring the design space around X_1 is to vary each
of the X^j one at a time. As is well known to statisticians, this approach can be
misleading if there are strong interactions among the components of X. Increasing
X^1 may be an improvement and increasing X^2 may be an improvement, but increasing
them both together might make things worse. This would usually be determined from a
confirmation run in which both X^1 and X^2 have been increased. The greater difficulty
with interactions stems from missed opportunities: the best combination might be to
increase X^1 while decreasing X^2, but one at a time experimentation might never lead
the user to try this. Thus techniques from experimental design may be expected to
help in exploring the input space.
This chapter presents and compares two statistical approaches to computer experi-
ments. Randomness is required in order to generate probability or confidence intervals.
The first approach introduces randomness by modeling the function f as a realization
of a Gaussian process. The second approach does so by taking random input points
(with some balance properties).

2. Goals in computer experiments

There are many different but related goals that arise in computer experiments. The
problem described in the previous section is that of finding a good value for X accord-
ing to some criterion on Y. Here are some other goals in computer experimentation:
finding a simple approximation f̂ that is accurate enough over a region A of X values,
estimating the size of the error f̂(X_0) − f(X_0) for some X_0 ∈ A, estimating ∫_A f dX,
sensitivity analysis of Y with respect to changes in X, finding which X^j are most
important for each response Y^k, finding which competing goals for Y conflict the
most, visualizing the function f and uncovering bugs in the implementation of f.

2.1. Optimization
Many engineering design problems take the form of optimizing Y^1 over allowable
values of X. The problem may be to find the fastest chip, or the least expensive soda
can. There is often, perhaps usually, some additional constraint on another response Y^2.
The chip should be stable enough, and the can should be able to withstand a specified
internal pressure.
Standard optimization methods, such as quasi-Newton or conjugate gradients (see
for example Gill et al., 1981) can be unsatisfactory for computer experiments. These
methods usually require first and possibly second derivatives of f, and these may be
difficult to obtain or expensive to compute. The standard methods also depend strongly on
having good starting values. Computer experimentation as described below is useful
in the early stages of optimization where one is searching for a suitable starting value.
It is also useful when searching for several widely separated regions of the predictor
space that might all have good Y values. Given a good starting value, the standard
methods will be superior if one needs to locate the optimum precisely.

2.2. Visualization
As Diaconis (1988) points out, being able to compute a function f at any given value
X does not necessarily imply that one "understands" the function. One might not
know whether the function is continuous or bounded or unimodal, where its optimum
is or whether it has asymptotes.
Computer experimentation can serve as a primitive way to visualize functions. One
evaluates f at a well chosen set of points x_1, ..., x_n obtaining responses y_1, ..., y_n.
Then data visualization methods may be applied to the p + q dimensional points
(x_i, y_i), i = 1, ..., n. Plotting the responses versus the input variables (there are
pq such plots) identifies strong dependencies, and plotting residuals from a fit can
show weaker dependencies. Selecting the points with desirable values of Y and then
producing histograms and plots of the corresponding X values can be used to identify
the most promising subregion of X values. Sharifzadeh et al. (1989) took this approach
to find that increasing a certain implant dose helped to make two different threshold
voltages near their common targets and nearly equal (as they should have been).
Similar exploration can identify which input combinations are likely to crash the
simulator.
Roosen (1995) has used computer experiment designs for the purpose of visualizing
functions fit to data.

2.3. Approximation
The original program f may be exceedingly expensive to evaluate. It may however be
possible to approximate f by some very simple function f̂, the approximation holding
adequately in a region of interest, though not necessarily over the whole domain of f.
If the function f̂ is fast to evaluate, as for instance a polynomial, neural network or
a MARS model (see Friedman, 1991), then it may be feasible to make millions of f̂

evaluations. This makes possible brute force approximations for the other problems.
For example, optimization could be approached by finding the best value of f̂(x) over
a million random runs x.
Approximation by computer experiments involves choosing where to gather
(xi, f(xi)) pairs, how to construct an approximation based on them and how to assess
the accuracy of this approximation.

2.4. Integration
Suppose that X* is the target value of the input vector, but in the system being
modeled the actual value of X will be random with a distribution dF that hopefully
is concentrated near X*. Then one is naturally interested in ∫ f(X) dF, the average
value of Y over this distribution. Similarly the variance of Y and the probability
that Y exceeds some threshold can be expressed in terms of integrals. This sort of
calculation is of interest to researchers studying nuclear safety. McKay (1995) surveys
this literature.
Integration and optimization goals can appear together in the same problem. In
robust design problems (Phadke, 1988), one might seek the value X0 that minimizes
the variance of Y as X varies randomly in a neighborhood of X0.
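The integrals above are often estimated by plain Monte Carlo when f is cheap enough to run many times. The following is a minimal sketch of that calculation, assuming a Gaussian input distribution F centered at the nominal value X* and a stand-in simulator f; the distribution, the clipping to [0, 1]^p and the toy function are illustrative choices, not from the chapter.

# Minimal Monte Carlo sketch of the integration goals above: mean, variance and
# an exceedance probability of Y = f(X) under a distribution F concentrated near X*.
import numpy as np

def mc_summaries(f, x_star, sd, threshold, n=10000, seed=0):
    rng = np.random.default_rng(seed)
    p = len(x_star)
    # Draw inputs from F, here N(X*, sd^2 I) clipped to the unit cube (an assumption).
    X = np.clip(rng.normal(x_star, sd, size=(n, p)), 0.0, 1.0)
    y = np.array([f(x) for x in X])
    return {
        "mean": y.mean(),              # estimate of the integral of f dF
        "variance": y.var(ddof=1),     # variance of Y under F
        "exceedance": np.mean(y > threshold),
    }

# Example with a cheap stand-in simulator.
if __name__ == "__main__":
    f = lambda x: np.sin(2 * np.pi * x[0]) + x[1] ** 2
    print(mc_summaries(f, x_star=np.array([0.4, 0.6]), sd=0.05, threshold=1.0))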

3. Approaches to computer experiments

There are two main statistical approaches to computer experiments, one based on
Bayesian statistics and a frequentist one based on sampling techniques. It seems to
be essential to introduce randomness in one or other of these ways, especially for the
problem of gauging how much an estimate f̂(X_0) might differ from the true value
f(X_0).
In the Bayesian framework, surveyed below in Sections 4 and 5, f is a realization
of a random process. One sets a prior distribution on the space of all functions from
[0, 1]^p to R^q. Given the values y_i = f(x_i), i = 1, ..., n, one forms a posterior
distribution on f or at least on certain aspects of it such as f(x_0). This approach is
extremely elegant. The prior distribution is usually taken to be Gaussian so that any
finite list of function values has a multivariate normal distribution. Then the posterior
distribution, given observed function values is also multivariate normal. The posterior
mean interpolates the observed values and the posterior variance may be used to
give 95% posterior probability intervals. The method extends naturally to incorporate
measurement and prediction of derivatives, partial derivatives and definite integrals
of f .
The Bayesian framework is well developed as evidenced by all the work cited
below in Sections 4 and 5. But, as is common with Bayesian methods there may
be difficulty in finding an appropriate prior distribution. The simulator output might
not have as many derivatives as the underlying physical reality, and assuming too
much smoothness for the function can lead to Gibbs-effect overshoots. A numerical
difficulty also arises: the Bayesian approach requires solving n linear equations in

n unknowns when there are n data points. The effort involved grows as n³ while
the effort in computing f(x_1), ..., f(x_n) grows proportionally to n. Inevitably this
limits the size of problems that can be addressed. For example, suppose that one
spends an hour computing f(x_1), ..., f(x_n) and then one minute solving the linear
equations. If one then finds it necessary to run 24 times as many function evaluations,
the time to compute the f(x_i) grows from an hour to a day, while the time to solve
the linear equations grows from one minute to over nine and a half days.
These difficulties with the Bayesian approach motivate a search for an alternative.
The frequentist approach, surveyed in Sections 6 and 7, introduces randomness by
taking input points x_1, ..., x_n that are partially determined by pseudo-random number
generators. Then this randomness in the x_i is propagated through to randomness in
f̂(x_0). This approach allows one to consider f to be deterministic, and in particular
to avoid having to specify a distribution for f. The material given there expands on
a proposal of Owen (1992a). There is still much more to be done.

4. Bayesian prediction and inference

A Bayesian approach to modeling simulator output (Sacks et al., 1989a, b; Welch


et al., 1990) can be based on a spatial model adapted from the geostatistical Kriging
model (Matheron, 1963; Journel and Huijbregts, 1978; Cressie, 1986, 1993; Ripley,
1981). This approach treats the bias, or systematic departure of the response surface
from a linear model, as the realization of a stationary random function. This model has
exact predictions at the observed responses and predicts with increasing error variance
as the prediction point moves away from all the design points.
This section introduces the Kriging (or Bayesian) approach to modeling the response
surfaces of computer experiments. Several correlation families are discussed as well
as their effect on prediction and error analysis. Additionally, extensions to this model
are presented that allow the use and the modeling of gradient information.

4.1. The Kriging model


The Kriging approach uses a two component model. The first component consists of
a general linear model while the second (or lack of fit) component is treated as the
realization of a stationary Gaussian random function. Define S = [0, 1]^p to be the
design space and let x ∈ S be a scaled p-dimensional vector of input values. The
Kriging approach models the associated response as

Y(x) = \sum_{j=1}^{k} \beta_j h_j(x) + Z(x)    (2)

where the h_j's are known fixed functions, the β_j's are unknown coefficients to be
estimated and Z(x) is a stationary Gaussian random function with E[Z(x)] = 0 and
covariance

\mathrm{Cov}[Z(x_i), Z(x_j)] = \sigma^2 R(x_i - x_j).    (3)

For any point x ∈ S, the simulator output Y(x) at that point has the Gaussian
distribution with mean \sum_j \beta_j h_j(x) and variance σ². The linear component models the
drift in the response, while the systematic lack-of-fit (or bias) is modeled by the second
component. The smoothness and other properties of Z(·) are controlled by R(·).

Let the design D = {x_i, i = 1, ..., n} ⊂ S yield responses y_D' = (y(x_1), ..., y(x_n))
and consider a linear predictor

\hat{Y}(x_0) = \lambda'(x_0)\, y_D

of an unobserved point x_0. The Kriging approach of Matheron (1963) treats \hat{Y}(x_0) as
a random variable by substituting Y_D for y_D, where

Y_D' = (Y(x_1), ..., Y(x_n)).

The best linear unbiased predictor (BLUP) finds the λ(x_0) that minimizes

\mathrm{MSE}[\hat{Y}(x_0)] = E[\lambda' Y_D - Y(x_0)]^2

subject to the unbiasedness condition

E[\lambda' Y_D] = E[Y(x_0)].

The BLUP of Y(x_0) is given by

\hat{Y}(x_0) = h'(x_0)\hat{\beta} + v_{x_0}' V_D^{-1} (Y_D - H_D \hat{\beta})    (4)

where

h'(x_0) = (h_1(x_0), ..., h_k(x_0)),
(H_D)_{ij} = h_j(x_i),
(V_D)_{ij} = \mathrm{Cov}[Z(x_i), Z(x_j)],
v_{x_0}' = (\mathrm{Cov}[Z(x_0), Z(x_1)], ..., \mathrm{Cov}[Z(x_0), Z(x_n)])

and

\hat{\beta} = [H_D' V_D^{-1} H_D]^{-1} H_D' V_D^{-1} Y_D

is the generalized least squares estimate of β. The mean square error of \hat{Y}(x_0) is

\mathrm{MSE}[\hat{Y}(x_0)] = \sigma^2 - (h'(x_0), v_{x_0}') \begin{pmatrix} 0 & H_D' \\ H_D & V_D \end{pmatrix}^{-1} \begin{pmatrix} h(x_0) \\ v_{x_0} \end{pmatrix}.

The first component of equation (4) is the generalized least squares prediction
at point x_0 given the design covariance matrix V_D, while the second component
"pulls" the generalized least squares response surface through the observed data points.
The elasticity of the response surface "pull" is solely determined by the correlation
function R(·). The predictions at the design points are exactly the corresponding
observations, and the mean square error equals zero. As a prediction point x_0 moves
away from all of the design points, the second component of equation (4) goes to
zero, yielding the generalized least squares prediction, while the mean square error at
that point goes to σ² + h'(x_0)[H_D' V_D^{-1} H_D]^{-1} h(x_0). In fact, these results are true in
the wide sense if the Gaussian assumption is removed.

Fig. 1. A prediction example with n = 3.

As an example, consider an experiment where n = 3, p = 1, σ² = .05, R(d) =
exp(−20d²) and D = {.3, .5, .8}. The response of the unknown function at the
design is y_D' = (.7, .3, .5). The dashed line of Figure 1 is the generalized least
squares prediction surface for h(·) ≡ 1, where β̂ = .524. The effect of the second
component of equation (4) is to pull the dashed line through the observed design
points as shown by the solid line. The shape of the surface, or the amount of elasticity
of the "pull", is determined by the vector v_x' V_D^{-1} as a function of x and therefore
is completely determined by R(·). The dotted lines are ±2√MSE[Ŷ(x)], or 95%
pointwise confidence envelopes around the prediction surface. The interpretation of
these pointwise confidence envelopes is that for any point x_0, if the unknown function
is truly generated by a random function with constant mean and correlation function
R(d) = exp(−20d²), then approximately 95% of the sample paths that go through
the observed design points would be between these dotted lines at x_0. The predictions
and confidence intervals can be very different for different σ² and R(·). The effect of
different correlation functions is discussed in Section 4.3. Clearly, the true function is
not "generated" stochastically. The above model is used for prediction and to quantify
the uncertainty of the prediction. This naturally leads to a Bayesian interpretation of
this methodology.
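As a concrete illustration of equation (4), the following is a minimal sketch of the BLUP computation, assuming a constant mean (h ≡ 1) and the one-dimensional Gaussian correlation R(d) = exp(−θd²) used in Figure 1; the values of σ² and θ are taken as known here, whereas in practice they would be estimated (Section 4.4).

# Minimal sketch of the Kriging BLUP of equation (4), assuming a constant mean
# h(x) = 1 and a one-dimensional Gaussian correlation R(d) = exp(-theta d^2).
import numpy as np

def krige(x_design, y_design, x_new, sigma2=0.05, theta=20.0):
    x = np.asarray(x_design, float)
    y = np.asarray(y_design, float)
    x0 = np.asarray(x_new, float)
    # Covariance among design points and between new and design points.
    V = sigma2 * np.exp(-theta * (x[:, None] - x[None, :]) ** 2)
    v0 = sigma2 * np.exp(-theta * (x0[:, None] - x[None, :]) ** 2)
    H = np.ones((len(x), 1))
    Vinv = np.linalg.inv(V)
    # Generalized least squares estimate of the constant mean beta.
    beta = np.linalg.solve(H.T @ Vinv @ H, H.T @ Vinv @ y)
    resid = y - H @ beta
    # Equation (4): GLS prediction plus the "pull" through the data.
    pred = beta[0] + v0 @ Vinv @ resid
    # Mean square error of the predictor (constant-mean case).
    c = 1.0 - v0 @ Vinv @ H.ravel()
    mse = sigma2 - np.einsum("ij,jk,ik->i", v0, Vinv, v0) \
          + c ** 2 / (H.T @ Vinv @ H)[0, 0]
    return pred, np.maximum(mse, 0.0)

pred, mse = krige([0.3, 0.5, 0.8], [0.7, 0.3, 0.5], np.linspace(0, 1, 5))

At the design points the prediction reproduces the data and the mean square error is zero, as described above.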

4.2. A fully Bayesian interpretation


An alternative to the above interpretation of equation (2) is the fully Bayesian interpretation,
which uses the model as a way of quantifying the uncertainty of the unknown
function. The Bayesian approach (Currin et al., 1991; O'Hagan, 1989) uses the same
model but has a different interpretation of the β_j's. Here the β_j's are random variables
with prior distribution π_j. The effect of these prior distributions is to quantify
the prior belief about the unknown function, or to put a prior distribution on a large class
of functions 𝒢. Hence hopefully the true function y(·) ∈ 𝒢. The mixed convolution of
the π_j's and π(Z) yields the prior distribution Π(G) for subsets of functions G ⊂ 𝒢.
Once the data Y_D = y_D has been observed, the posterior distribution Π(G | y_D)
is calculated. The mean

\hat{Y}(x_0) = \int g(x_0)\, \Pi(g \mid Y_D = y_D)\, dg

and variance

\mathrm{Var}(\hat{Y}(x_0) \mid Y_D = y_D) = \int \left( g(x_0) - \hat{Y}(x_0) \right)^2 \Pi(g \mid Y_D = y_D)\, dg

of the posterior distribution at each input point are then used as the predictor and
measure of error, respectively, at that point. In general, the Kriging and Bayesian
approaches will lead to different estimators. However, if the prior distribution of Z(·)
is Gaussian and if the prior distribution of the β_j's is diffuse, then the two approaches
yield identical estimators.
As an example, consider the case where the prior distribution of the vector of β's
is

\beta \sim N_k(b, \tau^2 \Sigma)

and the prior distribution of Z(·) is a stationary Gaussian distribution with expected
value zero and covariance function given by equation (3). After the simulator function
has been evaluated at the experimental design, the posterior distribution of β is

\beta \mid Y_D \sim N_k(\tilde{\beta}, \tilde{\Sigma})

where

\tilde{\beta} = \tilde{\Sigma} \left[ H' V_D^{-1} Y_D + \tau^{-2} \Sigma^{-1} b \right]

and

\tilde{\Sigma} = \left[ H' V_D^{-1} H + \tau^{-2} \Sigma^{-1} \right]^{-1},

and the posterior distribution of Y(x_0) is

Y(x_0) \mid Y_D \sim N\left( v_{x_0}' V_D^{-1} Y_D + c_{x_0} \tilde{\beta},\;\; \sigma^2 - v_{x_0}' V_D^{-1} v_{x_0} + c_{x_0} \tilde{\Sigma}\, c_{x_0}' \right)

where

c_{x_0} = h'(x_0) - v_{x_0}' V_D^{-1} H.

Hence the posterior distribution is still Gaussian but it is no longer stationary. Now if
τ² → ∞ then

\tilde{\beta} \to \hat{\beta},
\tilde{\Sigma} \to [H' V_D^{-1} H]^{-1},

and hence the posterior variance of Y(x_0) is

\mathrm{Var}(Y(x_0) \mid Y_D) = \sigma^2 - v_{x_0}' V_D^{-1} v_{x_0} + c_{x_0} [H' V_D^{-1} H]^{-1} c_{x_0}'
 = \sigma^2 - v_{x_0}' V_D^{-1} v_{x_0} + h' [H' V_D^{-1} H]^{-1} h
   - 2 h' [H' V_D^{-1} H]^{-1} H' V_D^{-1} v_{x_0}
   + v_{x_0}' V_D^{-1} H [H' V_D^{-1} H]^{-1} H' V_D^{-1} v_{x_0}
 = \sigma^2 - (h'(x_0), v_{x_0}') \begin{pmatrix} 0 & H_D' \\ H_D & V_D \end{pmatrix}^{-1} \begin{pmatrix} h(x_0) \\ v_{x_0} \end{pmatrix}

which is the same variance as the BLUP in the Kriging approach. Therefore, if Z(·)
has a Gaussian prior distribution and if the β's have a diffuse prior, the Bayesian and
the Kriging approaches yield identical estimators.

Currin et al. (1991) provide a more in-depth discussion of the Bayesian approach
for the model with a fixed mean (h ≡ 1). O'Hagan (1989) discusses Bayes Linear
Estimators (BLE) and their connection to equations (2) and (4). The Bayesian approach,
which uses random functions as a method of quantifying the uncertainty of
the unknown simulator function Y(·), is more subjective than the Kriging or frequentist
approach. While both approaches require prior knowledge or an objective method
of estimating the covariance function, the Bayesian approach additionally requires
knowledge of parameters of the prior distribution of β (b and Σ). For this reason,
the Kriging results and Bayesian approach with diffuse prior distributions and the
Gaussian assumption are widely used in computer experiments.
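A minimal sketch of the conjugate posterior computation above, assuming the Gaussian prior β ~ N_k(b, τ²Σ) and a known covariance V_D; all argument names are illustrative, and with τ² large the result approaches the Kriging variance of Section 4.1.

# Sketch of the posterior for beta and Y(x0) above, assuming a Gaussian prior
# beta ~ N(b, tau^2 * Sigma) and a known design covariance matrix V_D.
import numpy as np

def bayes_posterior(H, V, y, h0, v0, sigma2, b, tau2, Sigma):
    Vinv = np.linalg.inv(V)
    Sigma_inv = np.linalg.inv(Sigma)
    # Posterior covariance and mean of beta.
    Sig_tilde = np.linalg.inv(H.T @ Vinv @ H + Sigma_inv / tau2)
    beta_tilde = Sig_tilde @ (H.T @ Vinv @ y + Sigma_inv @ b / tau2)
    # c_{x0} = h'(x0) - v'_{x0} V^{-1} H
    c = h0 - v0 @ Vinv @ H
    # Posterior mean and variance of Y(x0).
    mean = v0 @ Vinv @ y + c @ beta_tilde
    var = sigma2 - v0 @ Vinv @ v0 + c @ Sig_tilde @ c
    return mean, var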
Fig. 2. The effects of θ on prediction: (a) θ = 2, (b) θ = 100.

4.3. Correlation functions


As discussed above, the selection of R(.) plays a crucial role in constructing designs
and in the predictive process. Consider the example of Section 4.1 where n = 3,
p = 1, D = {.3, .5, .8}, y_D' = (.7, .3, .5), R(d) = exp{−θd²} and θ = 20. Figure
2(a) shows the effect on prediction for θ = 2. Now β̂ = 1.3 and the surface elasticity
is very low. The predictions outside of the design are actually higher than the observed
surface since the convex nature of the observed response indicates that the design range
contains a local minimum for the total process. Eventually, the extrapolations would
return to the value of β̂. Additionally, the 95% pointwise confidence intervals are
much narrower within the range of the design than in Figure 1. Figure 2(b) displays
the prediction when θ = 100. Here β̂ = .5 and the surface elasticity is very high.
The prediction line is typically .5 with smooth curves pulling the surface through the
design points. The 95% pointwise confidence intervals are wider than before.
This section presents some simplifying restrictions on R(·) and four families of
univariate correlation functions used in generating the simplified correlation functions.
Examples of realizations of these families will be shown to explain the effect on
prediction of varying the parameters of these families. Furthermore, the maximum
likelihood method for estimating the parameters of a correlation family along with a
technique for implementation will be discussed in Section 4.4.

4.3.1. Restrictions on R(.)


Any positive definite function R with R(x, x) = 1 could be used as a correlation
function, but for simplicity, it is common to restrict R(·) such that for any x_1, x_2 ∈ S

R(x_1, x_2) = R(x_1 - x_2)

so that the process Z(·) is stationary. Some types of nonstationary behavior in the mean
function of Y(·) can be modeled by the linear term in equation (2). A further restriction
makes the correlation function depend only on the magnitude of the distance,

R(x_1, x_2) = R(|x_1 - x_2|).

In higher dimensions (p ≥ 2) a product correlation function,

R(d) = \prod_{j=1}^{p} R_j(d_j),

is often used for mathematical convenience. That is, R(·) is a product of univariate
correlation functions and, hence, only univariate correlation functions are of interest.
The product correlation function has been used for prediction in spatial settings
(Ylvisaker, 1975; Currin et al., 1991; Sacks et al., 1989a, b; Welch et al., 1990, 1992).
Several choices for the factors in the product correlation function are outlined below.

Fig. 3. Realizations for the cubic correlation function with (ρ, γ) = (a) (.15, .03), (b) (.45, .20), (c) (.70, .50), and (d) (.95, .90).

4.3.2. Cubic
The (univariate) cubic correlation family is parameterized by ρ ∈ [0, 1] and γ ∈ [0, 1]
and is given for d ∈ [0, 1] by

R(d) = 1 - \frac{3(1-\rho)}{2+\gamma} d^2 + \frac{(1-\rho)(1-\gamma)}{2+\gamma} |d|^3

where ρ and γ are restricted by

\rho > \frac{5\gamma^2 + 8\gamma - 1}{\gamma^2 + 4\gamma + 7}

to ensure that the function is positive definite (see Mitchell et al., 1990). Here
ρ = corr(Y(0), Y(1)) is the correlation between endpoint observations and γ =
corr(Y'(0), Y'(1)) is the correlation between endpoints of the derivative process. The
cubic correlation function implies that the derivative process has a linear correlation
function with parameter γ.

A prediction model in one dimension for this family is a cubic spline interpolator.
In two dimensions, when the correlation is a product of univariate cubic correlation
functions the predictions are piece-wise cubic in each variable.

Processes generated with the cubic correlation function are once mean square differentiable.
Figure 3 shows several realizations of processes with the cubic correlation
function and parameter pairs (.15, .03), (.45, .20), (.70, .50), (.95, .90). Notice that
the realizations are quite smooth and almost linear for parameter pair (.95, .90).

4.3.3. Exponential
The (univariate) exponential correlation family is parameterized by θ ∈ (0, ∞) and is
given by

R(d) = \exp(-\theta |d|)

for d ∈ [0, 1]. Processes with the exponential correlation function are Ornstein-Uhlenbeck
processes (Parzen, 1962). The exponential correlation function is not mean
square differentiable.

Figure 4 presents several realizations of one dimensional processes with the exponential
correlation function and θ = 0.5, 2.0, 5.0, 20. Figure 4(a) is for θ = 0.5
and these realizations have very small global trends but much local variation. Figure
4(d) is for θ = 20, and is very jumpy. Mitchell et al. (1990) also found necessary and
sufficient conditions on the correlation function so that the derivative process has an
exponential correlation function. These are called smoothed exponential correlation
functions.

Fig. 4. Realizations for the exponential correlation function with θ = (a) 0.5, (b) 2.0, (c) 5.0, and (d) 20.0.

4.3.4. Gaussian
Sacks et al. (1989b) generalized the exponential correlation function by using

R(d) = \exp(-\theta |d|^q)

where 0 < q ≤ 2 and θ ∈ (0, ∞). Taking q = 1 recovers the exponential correlation
function. As q increases, this correlation function produces smoother realizations.
However, as long as q < 2, these processes are not mean square differentiable.
The Gaussian correlation function is the case q = 2 and the associated processes are
infinitely mean square differentiable. In the Bayesian interpretation, this correlation
function puts all of the prior mass on analytic functions (Currin et al., 1991). This
correlation function is appropriate when the simulator output is known to be analytic.
Figure 5 displays several realizations for various θ for the Gaussian correlation
function. These realizations are very smooth, even when θ = 50.

Fig. 5. Realizations for the Gaussian correlation function with θ = (a) 0.5, (b) 2.0, (c) 10.0, and (d) 50.0.

4.3.5. Matérn
All of the univariate correlation functions described above are either zero, once or
infinitely many times mean square differentiable. Stein (1989) recommends a more
flexible family of correlation functions (Matérn, 1947; Yaglom, 1987). The Matérn
correlation function is parameterized by θ ∈ (0, ∞) and ν ∈ (−1, ∞) and is given by

R(d) \propto (\theta |d|)^{\nu} K_{\nu}(\theta |d|)

where K_ν(·) is a modified Bessel function of order ν. The associated process will be
m times differentiable if and only if ν > m. Hence, the amount of differentiability
can be controlled by ν while θ controls the range of the correlations. This correlation
family is more flexible than the other correlation families described above due to the
control of the differentiability of the predictive surface.

Figure 6 displays several realizations of processes with the Matérn correlation function
with ν = 2.5 and various values of θ. For small values of θ, the realizations are
very smooth and flat while the realizations are erratic for large values of θ.

4.3.6. Summary
The correlation functions described above have been applied in computer experiments.
Software for predicting with them is described in Koehler (1990). The cubic corre-
lation function yields predictions that are cubic splines. The exponential predictions
are non-differentiable while the Gaussian predictions are infinitely differentiable. The
Matérn correlation function is the most flexible since the degree of differentiability
and the smoothness of the predictions can be controlled. In general, enough prior
information to fix the parameters of a particular correlation family and σ² will not be
available. A pure Bayesian approach would place a prior distribution on the parameters
of a family and use the posterior distribution of the parameters in the estimation
process.

Fig. 6. Realizations for the Matérn correlation function with ν = 2.5 and θ = (a) 2.0, (b) 4.0, (c) 10.0, and (d) 25.0.

Alternatively, an empirical Bayes approach which uses the data to estimate
the parameters of a correlation family and σ² is often used. The maximum likelihood
estimation procedure will be presented and discussed in the next section.
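As a small illustration of the families in Section 4.3, the following sketch collects the univariate correlation functions in Python. The Matérn version is normalized so that R(0) = 1, which assumes ν > 0; the function names, the normalization and the product-correlation helper are illustrative choices, not a validated implementation.

# Sketches of the univariate correlation families of Section 4.3 (cubic,
# exponential/Gaussian, and Matern), plus a product correlation over coordinates.
import numpy as np
from scipy.special import gamma, kv

def cubic_corr(d, rho, gam):
    d = np.abs(d)
    return 1 - 3 * (1 - rho) / (2 + gam) * d**2 \
             + (1 - rho) * (1 - gam) / (2 + gam) * d**3

def power_exp_corr(d, theta, q=2.0):
    # q = 1 is the exponential family, q = 2 the Gaussian family.
    return np.exp(-theta * np.abs(d) ** q)

def matern_corr(d, theta, nu):
    # Normalized so that R(0) = 1 (assumes nu > 0).
    d = np.atleast_1d(np.abs(np.asarray(d, dtype=float)))
    out = np.ones_like(d)
    nz = d > 0
    t = theta * d[nz]
    out[nz] = t**nu * kv(nu, t) / (2 ** (nu - 1) * gamma(nu))
    return out

def product_corr(d_vec, corr_1d, **kw):
    # Product correlation over the p coordinates of a lag vector d.
    return np.prod([corr_1d(dj, **kw) for dj in d_vec])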

4.4. Correlation function estimation - maximum likelihood

The previous subsections presented the Kriging model and families of
correlation functions. The families are all parameterized by one or two
parameters which control the range of correlation and the smoothness of the corresponding
processes. This model assumes that σ², the family and the parameters of R(·)
are known. In general, these values are not completely known a priori. The appropriate
correlation family might be known from the simulator designers' experience
regarding the smoothness of the function. Also, ranges for σ² and the parameters
of R(·) might be known if a similar computer experiment has been performed. A
pure Bayesian approach is to quantify this knowledge into a prior distribution on σ²
and R(·). How to distribute a non-informative prior across the different correlation

families and within each family is unclear. Furthermore, the calculation of the posterior
distribution is generally intractable.
An alternative and more objective method of estimating these parameters is an
empirical Bayes approach which finds the parameters which are most consistent with
the observed data. This section presents the maximum likelihood method for estimating
β, σ² and the parameters of a fixed correlation family when the underlying distribution
of Z(·) is Gaussian. The best parameter set from each correlation family can be
evaluated to find the overall "best" σ² and R(·).

Consider the case where the distribution of Z(·) is Gaussian. Then the distribution
for the response at the n design points Y_D is multinormal and the likelihood is given
by

\mathrm{lik}(\beta, \sigma^2, R \mid Y_D) = (2\pi)^{-n/2} \sigma^{-n} |R_D|^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2} (Y_D - H\beta)' R_D^{-1} (Y_D - H\beta) \right\}

where R_D is the design correlation matrix. The log likelihood is

lml(\beta, \sigma^2, R_D \mid Y_D) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\sigma^2) - \frac{1}{2} \ln(|R_D|) - \frac{1}{2\sigma^2} (Y_D - H\beta)' R_D^{-1} (Y_D - H\beta).    (5)

Hence

\frac{\partial\, lml(\beta, \sigma^2, R \mid Y_D)}{\partial \beta} = \frac{1}{\sigma^2} \left( H' R_D^{-1} Y_D - H' R_D^{-1} H \beta \right),

which when set to zero yields the maximum likelihood estimate of β, the same
as the generalized least squares estimate,

\hat{\beta}_{ml} = [H' R_D^{-1} H]^{-1} H' R_D^{-1} Y_D.    (6)

Similarly,

\frac{\partial\, lml(\beta, \sigma^2, R_D \mid Y_D)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} (Y_D - H\beta)' R_D^{-1} (Y_D - H\beta),

which when set to zero yields the maximum likelihood estimate of σ²,

\hat{\sigma}^2_{ml} = \frac{1}{n} (Y_D - H\hat{\beta})' R_D^{-1} (Y_D - H\hat{\beta}).    (7)

Therefore, if R_D is known, the maximum likelihood estimates of β and σ² are easily
calculated. However, if R(·) is parameterized by θ = (θ_1, ..., θ_s),

\frac{\partial\, lml(\beta, \sigma^2, R_D \mid Y_D)}{\partial \theta_i} = -\frac{1}{2} \operatorname{tr}\left( R_D^{-1} \frac{\partial R_D}{\partial \theta_i} \right) + \frac{1}{2\sigma^2} (Y_D - H\beta)' R_D^{-1} \frac{\partial R_D}{\partial \theta_i} R_D^{-1} (Y_D - H\beta)    (8)

does not generally yield an analytic solution for θ when set to zero for i = 1, ..., s.
(Commonly s = p or 2p, but this need not be assumed.)

An alternative method to estimate θ is to use a nonlinear optimization routine with
equation (5) as the function to be optimized. For a given value of θ, estimates of β
and σ² are calculated using equations (6) and (7), respectively. Next, equation (8) is
used in calculating the partial derivatives of the objective function. See Mardia and
Marshall (1984) for an overview of the maximum likelihood procedure.
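A minimal sketch of this empirical Bayes fit, assuming a constant mean and a product Gaussian correlation R(d) = exp(−Σ_j θ_j d_j²); it profiles β and σ² out via equations (6) and (7) and optimizes the remaining likelihood in θ numerically. The use of scipy's L-BFGS-B, the jitter term and the log parameterization are illustrative choices, and the analytic derivatives of equation (8) are omitted.

# Sketch of maximum likelihood estimation of the correlation parameters theta,
# profiling out beta and sigma^2 with equations (6) and (7).
import numpy as np
from scipy.optimize import minimize

def neg_profile_loglik(log_theta, X, y):
    theta = np.exp(log_theta)                         # keep theta positive
    n = len(y)
    d2 = (X[:, None, :] - X[None, :, :]) ** 2         # squared coordinate lags
    R = np.exp(-np.einsum("ijk,k->ij", d2, theta))
    R += 1e-10 * np.eye(n)                            # numerical jitter
    Rinv = np.linalg.inv(R)
    H = np.ones((n, 1))
    beta = np.linalg.solve(H.T @ Rinv @ H, H.T @ Rinv @ y)   # equation (6)
    r = y - H @ beta
    sigma2 = (r @ Rinv @ r) / n                       # equation (7)
    _, logdet = np.linalg.slogdet(R)
    return 0.5 * (n * np.log(sigma2) + logdet)        # minus profile log likelihood

def fit_gaussian_corr(X, y, p):
    res = minimize(neg_profile_loglik, x0=np.zeros(p), args=(X, y),
                   method="L-BFGS-B")
    return np.exp(res.x)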

4.5. Estimating and using derivatives


In the manufacturing sciences, deterministic simulators help describe the relationship
of product design and the manufacturing process to the product's final characteristics.
This allows the product to be designed and manufactured efficiently. Equally
important are the effects of uncontrollable variation in the manufacturing parameters
on the end product. If the product's characteristics are sensitive to slight variations in
the manufacturing process, the yield, or percentage of marketable units produced, may
decrease. Furthermore, understanding the sensitivities of the product's characteristics
can help design more reliable products and increase the overall quality of the product.
Many simulators need to solve differential equations and can provide the gradient of
the response at a design point with little or no additional computational cost. However,
some simulators require that the gradient be approximated by a difference equation.
Then the cost of finding a directional derivative at a point is equal to evaluating an
additional point while approximating the total gradient requires p additional runs.
Consider Figure 7 for an example in p = 1 showing the effects of including gradient
information on prediction. The solid lines, Y in Figure 7(a) and Y' in Figure 7(b),
are the true function and its derivative, respectively, while the long dashed lines are
the Kriging predictors Ŷ_3 and Ŷ_3' based on n = 3 observations. As expected Ŷ_3 goes
through the design points, D = {.2, .5, .8}, but Ŷ_3' is a poor predictor of Y'. The
short dashed lines are the n = 3 predictors with derivative information, Ŷ_{3,∂} and Ŷ_{3,∂}'.
Notice that this predictor now matches Y' and Y at D and the interpolations are
overall much better. The addition of gradient information substantially improves the fits
of both Y and Y'. The dotted lines are the n = 6 predictors Ŷ_6 and Ŷ_6', and give a fairer
comparison if the derivative costs are equal to the response cost. The predictor Ŷ_6 is
a little better on the interior of S but Ŷ_6' is worse at x = 0 than Ŷ_{3,∂}'.

Fig. 7. (a) An example of a response (Y) and three predictors (Ŷ_3, Ŷ_{3,∂}, Ŷ_6). (b) An example of a derivative (Y') and three predictors (Ŷ_3', Ŷ_{3,∂}', Ŷ_6').

The Kriging methodology easily extends to model gradients. To see this for p = 1,
let E[Y(·)] = μ and d = t_2 − t_1. Then

\mathrm{Cov}[Y(t_1), Y'(t_2)] = E[Y(t_1) Y'(t_2)] - E[Y(t_1)]\, E[Y'(t_2)].

Now due to the stationarity of Y(·), E[Y'(·)] = 0 and

\mathrm{Cov}[Y(t_1), Y'(t_2)] = E[Y(t_1) Y'(t_2)]
 = E\left[ Y(t_1) \lim_{\delta \to 0} \frac{Y(t_2 + \delta) - Y(t_2)}{\delta} \right]
 = \lim_{\delta \to 0} \frac{E[Y(t_1) Y(t_2 + \delta)] - E[Y(t_1) Y(t_2)]}{\delta}
 = \sigma^2 \lim_{\delta \to 0} \frac{R(d + \delta) - R(d)}{\delta}
 = \sigma^2 R'(d)

for differentiable R(·). Similarly,

\mathrm{Cov}[Y'(t_1), Y(t_2)] = -\sigma^2 R'(d)

and

\mathrm{Cov}[Y'(t_1), Y'(t_2)] = -\sigma^2 R''(d).

For more general p and for higher derivatives, following Morris et al. (1993) let

Y^{(a_1, \ldots, a_p)}(t) = \frac{\partial^a}{\partial t_1^{(a_1)} \cdots \partial t_p^{(a_p)}} Y(t)

where a = \sum_{j=1}^{p} a_j and t_j is the jth component of t. Then E[Y^{(a_1, \ldots, a_p)}] = 0 and

\mathrm{Cov}\left[ Y^{(a_1, \ldots, a_p)}(t_1), Y^{(b_1, \ldots, b_p)}(t_2) \right] = (-1)^a \sigma^2 \prod_{j=1}^{p} R_j^{(a_j + b_j)}(t_{2j} - t_{1j})

for R(d) = \prod_{j=1}^{p} R_j(d_j).

Furthermore, for directional derivatives, let Y'_u(t) be the directional derivative of
Y(t) in the direction u = (u_1, \ldots, u_p)', \sum_{j=1}^{p} u_j^2 = 1,

Y'_u(t) = \sum_{j=1}^{p} \frac{\partial Y(t)}{\partial t_j} u_j = (\nabla Y(t), u).

Then E[Y'_u(t)] = 0 and for d = t − s,

\mathrm{Cov}[Y(s), Y'_u(t)] = E[Y(s) Y'_u(t)]
 = \sum_{j=1}^{p} E\left[ Y(s) \frac{\partial Y(t)}{\partial t_j} u_j \right]
 = \sum_{j=1}^{p} \mathrm{Cov}\left[ Y(s), \frac{\partial Y(t)}{\partial t_j} \right] u_j
 = \sigma^2 \sum_{j=1}^{p} \frac{\partial R(d)}{\partial d_j} u_j
 = \sigma^2 (\dot{R}(d), u)    (9)

where \dot{R}(d) = [\partial R(d)/\partial d_1, \ldots, \partial R(d)/\partial d_p]'. Similarly,

\mathrm{Cov}[Y'_u(s), Y(t)] = -\sigma^2 (\dot{R}(d), u)    (10)

and, for derivative directions u at s and w at t,

\mathrm{Cov}[Y'_u(s), Y'_w(t)] = -\sigma^2\, u' \ddot{R}(d)\, w    (11)

where

(\ddot{R}(d))_{kl} = \frac{\partial^2 R(d)}{\partial d_k\, \partial d_l}

is the matrix of second partial derivatives evaluated at d.


The Kriging methodology is modified to model gradient information by letting

Y^*_D = [Y(x_1), \ldots, Y(x_n), Y'_{u_{11}}(x_1), \ldots, Y'_{u_{nm}}(x_n)]'

where u_{il} is the direction of the lth directional derivative at x_i. Also let

\mu^* = (\mu, \ldots, \mu, 0, 0, \ldots, 0)'

with n μ's and mn 0's, and let V^* be the combined covariance matrix for the design
responses and derivatives with the entries as prescribed above (equations (9), (10),
and (11)). Then

\hat{Y}(x_0) = \hat{\mu} + v^{*'}_{x_0} V^{*-1} (Y^*_D - \mu^*)

and

\hat{Y}'_u(x_0) = v^{*'}_{x_0, u} V^{*-1} (Y^*_D - \mu^*)

where v^*_{x_0} = \mathrm{Cov}[Y(x_0), Y^*_D] and v^*_{x_0, u} = \mathrm{Cov}[Y'_u(x_0), Y^*_D].
Notice that once differentiable random functions need twice differentiable correlation
functions. One problem with using the total gradient information is the rapid
growth of the covariance matrix. For each additional design point, V^* increases by
p + 1 rows and columns. Fortunately, these new rows and columns generally have
lower correlations than the corresponding rows and columns for an equal number of
responses. The inversion of V^* is more computationally stable than for an equally
sized V_D. More research is needed to provide general guidelines for using gradient
information efficiently.
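For concreteness, the following sketch assembles the combined covariance matrix V^* above for p = 1 under the Gaussian correlation R(d) = exp(−θd²), using the derivative covariances of equations (9)-(11) with the coordinate direction as the derivative direction; the design and parameter values are illustrative placeholders.

# Sketch of the gradient-augmented covariance matrix V* for p = 1 with the
# Gaussian correlation R(d) = exp(-theta d^2).
import numpy as np

def augmented_cov(x, sigma2, theta):
    x = np.asarray(x, float)
    d = x[None, :] - x[:, None]                        # d_{ij} = x_j - x_i
    R = np.exp(-theta * d**2)
    R1 = -2.0 * theta * d * R                          # R'(d)
    R2 = (4.0 * theta**2 * d**2 - 2.0 * theta) * R     # R''(d)
    # Block structure: [[Cov(Y,Y), Cov(Y,Y')], [Cov(Y',Y), Cov(Y',Y')]].
    V = np.block([[R,   R1],
                  [-R1, -R2]])
    return sigma2 * V

V_star = augmented_cov([0.2, 0.5, 0.8], sigma2=0.05, theta=20.0)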

4.6. Complexity of computer experiments


Recent progress in complexity theory, a branch of theoretical computer science, has
shed some light on computer experiments. The dissertation of Ritter (1995) contains an
excellent summary of this area. Consider the case where Y(x) = Z(x), that is where
there is no regression function. If for r ≥ 1 all of the rth order partial derivatives of
Z(x) exist in the mean square sense and obey a Hölder condition of order β, then it
is possible (see Ritter et al., 1993) to approximate Z(x) with an L₂ error that decays
as O(n^{-(r+β)/p}). This error is a root mean square average over randomly generated
functions Z.

When the covariance has a tensor product form, like those considered here, one can
do even better. Ritter et al. (1995) show that the error rate for approximation in this case
is n^{-r-1/2}(log n)^{(p-1)(r+1)} for products of covariances satisfying Sacks-Ylvisaker
conditions of order r ≥ 0. When Z is a p dimensional Wiener sheet process, for which
r = 0, the result is n^{-1/2}(log n)^{p-1}, which was first established by Wozniakowski
(1991).

In the general case, the rate for integration is n^{-1/2} times the rate for approximation.
A theorem of Wasilkowski (1994) shows that a rate n^{-d} for approximation
can usually be turned into a rate n^{-d-1/2} for integration by the simple device of
fitting an approximation with n/2 function evaluations, integrating the approximation,
and then adjusting the result by the average approximation error on n/2 more
Monte Carlo function evaluations. For tensor product kernels the rate for integration is
n^{-r-1}(log n)^{(p-1)/2} (see Paskov, 1993), which has a more favorable power of log n
than would arise via Wasilkowski's theorem.
The fact that much better rates are possible under tensor product models than for
general covariances suggests that the tensor product assumption may be a very strong
one. The tensor product assumption is at least strong enough that under it, there is no
average case curse of dimensionality for approximation.

5. Bayesian designs

Selecting an experimental design, D, is a key issue in building an efficient and infor-


mative Kriging model. Since there is no random error in this model, we wish to find
designs that minimize squared-bias. While some experimental design theories (Box
and Draper, 1959; Steinberg, 1985) do investigate the case where bias rather than
solely variance plays a crucial role in the error of the fitted model, how good these
designs are for the pure bias problem of computer experiments is unclear. Box and
Draper (1959) studied the effect of scaling factorial designs by using a first order
polynomial model when the true function is a quadratic polynomial. Box and Draper
(1983) extended the results to using a quadratic polynomial model when the true re-
sponse surface is a cubic polynomial. They found that mean squared-error optimal
designs are close to bias optimal designs. Steinberg (1985) extended these ideas fur-
ther by using a prior model proposed by Young (1977) that puts prior distributions
on the coefficients of a sufficiently large polynomial. However, model (2) is more
flexible than high ordered polynomials and therefore better designs are needed.
This section introduces four design optimality criteria for use with computer exper-
iments: entropy, mean squared-error, maximin and minimax designs. Entropy designs
maximize the amount of information expected for the design while mean squared-error
designs minimize the expected mean squared-error. Both these designs require a priori
knowledge of the correlation function R(.). The design criteria described below are
for the case of fixed design size n. Simple sequential designs, where the location of
the nth design point is determined after the first n − 1 points have been evaluated, will
not be presented due to their tendencies to replicate (Sacks et al., 1989b). However,
sequential block strategies could be used where the above designs could be used as
starting blocks. Depending upon the ultimate goal of the computer experiment, the
first design block might be utilized to refine the design and reduce the design space.

Fig. 8(a). Maximum entropy designs for p = 2, n = 1-16, and the Gaussian correlation function with θ = (0.5, 0.5).

5.1. Entropy designs

Lindley (1956) introduced a measure, based upon Shannon's entropy (Shannon, 1948),
of the amount of information provided by an experiment. This Bayesian measure
uses the expected reduction in entropy as a design criterion. This criterion has been
used in Box and Hill (1967) and Borth (1975) for model discrimination. Shewry and
Wynn (1987) showed that, if the design space is discrete (i.e., a lattice in [0, 1]^p),
then minimizing the expected posterior entropy is equivalent to maximizing the prior
entropy.
Fig. 8(b). Maximum entropy designs for p = 2, n = 1-16, and the Gaussian correlation function with θ = (2, 2).

DEFINITION 1. A design D_E is a Maximum Entropy Design if

E_Y[-\ln P(Y_{D_E})] = \max_D E_Y[-\ln P(Y_D)]

where P(Y_D) is the density of Y_D.

In the Gaussian case, this is equivalent to finding a design that maximizes the
determinant of the variance of Y_D. In the Gaussian prior case, where β ~ N_k(b, τ²Σ),
the determinant of the unconditioned covariance matrix is

|V_D + \tau^2 H \Sigma H'| = \begin{vmatrix} V_D + \tau^2 H \Sigma H' & H \\ 0 & I \end{vmatrix}
 = \begin{vmatrix} V_D + \tau^2 H \Sigma H' & H \\ 0 & I \end{vmatrix} \begin{vmatrix} I & 0 \\ -\tau^2 \Sigma H' & I \end{vmatrix}
 = \begin{vmatrix} V_D & H \\ -\tau^2 \Sigma H' & I \end{vmatrix}
 = \begin{vmatrix} V_D & H \\ 0 & \tau^2 \Sigma H' V_D^{-1} H + I \end{vmatrix}
 = |V_D|\, |\tau^2 \Sigma H' V_D^{-1} H + I|
 = |V_D|\, |H' V_D^{-1} H + \tau^{-2} \Sigma^{-1}|\, |\tau^2 \Sigma|.

Since τ²Σ is fixed, the maximum entropy criterion is equivalent to finding the design
D_E that maximizes

|V_D|\, |H' V_D^{-1} H + \tau^{-2} \Sigma^{-1}|.

Fig. 8(c). Maximum entropy designs for p = 2, n = 1-16, and the Gaussian correlation function with θ = (10, 10).

If the prior distribution is diffuse, τ² → ∞, the maximum entropy criterion is equivalent
to

|V_D|\, |H' V_D^{-1} H|

and if β is treated as fixed, then the maximum entropy criterion is equivalent to |V_D|.
Shewry and Wynn (1987, 1988) applied this measure in designs for spatial models.
Currin et al. (1991) and Mitchell and Scott (1987) have applied the entropy measure to
finding designs for computer experiments. By this measure, the amount of information
in an experimental design is dependent on the prior knowledge of Z(·) through R(·).
In general, R(·) will not be known a priori. Additionally, these optimal designs are
difficult to construct due to the required n × p dimensional optimization of the n design
point locations. Currin et al. (1991) describe an algorithm adapted from DETMAX
(Mitchell, 1974) which successively removes and adds points to improve the design.

Figure 8 shows the optimal entropy designs for p = 2, n = 1, ..., 16, and R(d) =
exp{−θ Σ_j d_j²} with θ = 0.5, 2, 10. The entropy designs tend to spread the points
out in the plane and favor the edge of the design space over the interior. For example,
the n = 16 designs displayed in Figure 8(a) have 12 points on the edge and only 4
points in the interior. Furthermore, most of the designs are similar across the different
correlation functions although there are some differences. Generally, the ratio of
edge to interior points is constant. The entropy criterion appears to be insensitive
to changes in the location of the interior points. Johnson et al. (1990) indicate that
entropy designs for extremely "weak" correlation functions are in a limiting sense
maximin designs (see Section 5.3).
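A minimal sketch of an entropy-design search under the fixed-β criterion |V_D|, assuming the Gaussian product correlation above; it uses a crude random point-exchange search rather than the DETMAX-style algorithm of Currin et al. (1991), and all defaults are illustrative.

# Sketch of a maximum entropy design search with the fixed-beta criterion |V_D|,
# using the Gaussian product correlation R(d) = exp(-theta * sum d_j^2).
import numpy as np

def log_det_corr(D, theta):
    d2 = ((D[:, None, :] - D[None, :, :]) ** 2).sum(axis=2)
    R = np.exp(-theta * d2) + 1e-10 * np.eye(len(D))
    return np.linalg.slogdet(R)[1]

def entropy_design(n, p, theta=2.0, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.random((n, p))                      # random starting design
    best = log_det_corr(D, theta)
    for _ in range(iters):
        cand = D.copy()
        i = rng.integers(n)
        cand[i] = rng.random(p)                 # propose moving one point
        val = log_det_corr(cand, theta)
        if val > best:                          # keep exchanges that raise |V_D|
            D, best = cand, val
    return D

design = entropy_design(n=16, p=2, theta=2.0)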

5.2. Mean squared-error designs


Box and Draper (1959) proposed minimizing the normalized integrated mean squared-error
(IMSE) of \hat{Y}(x) over [0, 1]^p. Welch (1983) extended this measure to the case
when the bias is more complicated. Sacks and Schiller (1988) and Sacks et al. (1989a)
discuss IMSE designs for computer experiments in more detail.

DEFINITION 2. A design D_I is an Integrated Mean Squared-Error (IMSE) design if

J(D_I) = \min_D J(D)

where

J(D) = \frac{1}{\sigma^2} \int_S E[\hat{Y}(x) - Y(x)]^2 \, dx.

J(D) is dependent on R(·) through \hat{Y}(x). For any design, J(D) can be expressed as

\sigma^2 J(D) = \sigma^2 - \mathrm{trace}\left\{ \begin{pmatrix} 0 & H' \\ H & V_D \end{pmatrix}^{-1} \int_S \begin{pmatrix} h(x) h'(x) & h(x) v_x' \\ v_x h'(x) & v_x v_x' \end{pmatrix} dx \right\}

and, as pointed out by Sacks et al. (1989a), if the elements of h(x) and v_x are products
of functions of a single input variable, the multidimensional integral simplifies to
products of one-dimensional integrals. As in the entropy design criterion, the minimization
of J(D) is an optimization in n × p dimensions and is also dependent on R(·).
Sacks and Schiller (1988) describe the use of a simulated annealing method for con-
structing IMSE designs for bounded and discrete design spaces. Sacks et al. (1989b)
use a quasi-Newton optimizer on a Cray X-MP48. They found that optimizing an
n = 16, p = 6 design with θ_1 = ⋯ = θ_6 = 2 took 11 minutes. The PACE program
(Koehler, 1990) uses the optimization program NPSOL (Gill et al., 1986) to solve the
IMSE optimization for a continuous design space. For n = 16, p = 6, this optimiza-
tion requires 13 minutes on a DEC3100, a much less powerful machine than the Cray.
Generally, these algorithms can find only local minima and therefore many random

starts are required.

Fig. 9(a). Minimum integrated mean square error designs for p = 2, n = 1-9, and the Gaussian correlation function with θ = (.5, .5).


Since J(D) is dependent on R(·), robust designs need to be found for general
R(·). Sacks et al. (1989a) found that for n = 9, p = 2 and R(d) = exp{−θ Σ_{j=1}^p d_j²}
(see Section 4.3.4 for details on the Gaussian correlation function) the IMSE design
for θ = 1 is robust in terms of relative efficiency. However, this analysis used a
quadratic polynomial model and the results may not extend to higher dimensions nor
different linear model components. Sacks et al. (1989b) used the optimal design for
the Gaussian correlation function with θ = 2 for design efficiency-robustness.

Figure 9 displays IMSE designs for p = 2 and n = 1, ..., 9 for θ = .5, 2, 10.
The designs in general lie in the interior of S. For fixed design size n, the designs
usually are similar geometrically for different θ values, with the scale decreasing as
θ increases. They have much symmetry for some values of n, particularly n = 12.
Notice that for the case when n = 5 the design only takes on three unique values
for each of the input variables. These designs tend to have clumped projections onto

lower dimension marginals of the input space. Better projection properties are needed
when the true function is only dependent on a subset of the input variables.

Fig. 9(b). Minimum integrated mean square error designs for p = 2, n = 1-16, and the Gaussian correlation function with θ = (2, 2).

Fig. 9(c). Minimum integrated mean square error designs for p = 2, n = 1-16, and the Gaussian correlation function with θ = (10, 10).
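A minimal sketch of evaluating the IMSE criterion J(D) for a candidate design, assuming a constant mean, the Gaussian correlation above, and a crude Monte Carlo approximation of the integral over [0, 1]^p; a real implementation would instead exploit the product form of the one-dimensional integrals noted above.

# Sketch of the IMSE design criterion: average the Kriging mean squared error
# over [0, 1]^p by Monte Carlo, for a constant mean and Gaussian correlation.
import numpy as np

def imse(D, theta, sigma2=1.0, n_mc=5000, seed=0):
    rng = np.random.default_rng(seed)
    D = np.asarray(D, float)
    n, p = D.shape
    V = sigma2 * np.exp(-theta * ((D[:, None, :] - D[None, :, :]) ** 2).sum(2))
    Vinv = np.linalg.inv(V + 1e-10 * np.eye(n))
    H = np.ones((n, 1))
    A = np.linalg.inv(H.T @ Vinv @ H)
    X = rng.random((n_mc, p))                      # integration points
    v = sigma2 * np.exp(-theta * ((X[:, None, :] - D[None, :, :]) ** 2).sum(2))
    c = 1.0 - v @ Vinv @ H.ravel()                 # h(x) - H'V^{-1}v_x, constant mean
    mse = sigma2 - np.einsum("ij,jk,ik->i", v, Vinv, v) + c**2 * A[0, 0]
    return mse.mean() / sigma2                     # normalized IMSE

J = imse(np.random.default_rng(1).random((9, 2)), theta=2.0)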

5.3. Maximin and minimax designs


Johnson et al. (1990) developed the idea of minimax and maximin designs. These
designs are dependent on a distance measure or metric. Let d(., .) be a metric on
[0, 1]^p. Hence for all x_1, x_2, x_3 ∈ [0, 1]^p,

d(x_1, x_2) = d(x_2, x_1),
d(x_1, x_2) ≥ 0,
d(x_1, x_2) = 0 ⟺ x_1 = x_2,
d(x_1, x_2) ≤ d(x_1, x_3) + d(x_3, x_2).

Fig. 10. (a) Minimax and (b) Maximin designs for n = 6 and p = 2 with Euclidean distance.

DEFINITION 3. A design D_MI is a Minimax Distance Design if

\min_D \max_x d(x, D) = \max_x d(x, D_{MI})

where

d(x, D) = \min_{x_0 \in D} d(x, x_0).

Minimax distance designs ensure that all points in [0, 1]p are not too far from a
design point. Let d(., .) be Euclidean distance and consider placing a p-dimensional
sphere with radius r around each design point. The idea of a minimax design is to
place the n points so that the design space is covered by the spheres with minimal r.
As an illustration, consider the owner of a petroleum corporation who wants to open
some franchise gas stations. The gas company would like to locate the stations in the
most convenient sites for the customers. A minimax strategy of placing gas stations
would ensure that no customer is too far from one of the company's stations.
Figure 10(a) shows a minimax design for p = 2 and n = 6 with d(·, ·) being
Euclidean distance. The maximum distance to a design point is .318. For small n,
minimax designs will generally lie in the interior of the design space.

DEFINITION 4. A design D_MA is a Maximin Distance Design if

\max_D \min_{x_1, x_2 \in D} d(x_1, x_2) = \min_{x_1, x_2 \in D_{MA}} d(x_1, x_2).

Again, let d(·, ·) be Euclidean distance. Maximin designs pack the n design points,
with their associated spheres, into the design space S with maximum radius. Parts
of the spheres may be out of S but the design points must be in S. Analogous to the
minimax illustration above is the position of the owners of the gas station franchises.
They wish to minimize the competition from each other by locating the stations as
far apart as possible. A maximin strategy for placing the franchises would ensure that
Figure 10(b) shows a maximin design for p = 2, n = 6 and d(., .) Euclidean
distance. For small n, maximin designs will generally lie on the exterior of S and fill
in the interior as n becomes large.
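As a small illustration of the two criteria themselves (not of the optimization needed to construct the designs), the following sketch evaluates the maximin and minimax figures of merit of a candidate design in [0, 1]^p with Euclidean distance. The continuous maximum in the minimax criterion is approximated over a large random reference set, and the function names and the hand-made comparison designs are illustrative assumptions, not constructions from Johnson et al. (1990).

```python
import numpy as np

def maximin_criterion(design):
    """Smallest pairwise Euclidean distance; a maximin design makes this large."""
    diff = design[:, None, :] - design[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    n = design.shape[0]
    return dist[np.triu_indices(n, k=1)].min()

def minimax_criterion(design, n_ref=20000, seed=0):
    """Approximate max over [0,1]^p of the distance to the nearest design point,
    using a large random reference set in place of the continuous maximum."""
    rng = np.random.default_rng(seed)
    ref = rng.random((n_ref, design.shape[1]))
    dist = np.sqrt(((ref[:, None, :] - design[None, :, :]) ** 2).sum(axis=-1))
    return dist.min(axis=1).max()

# Compare an arbitrary random 6-point design with a hand-made 2 by 3 arrangement.
rng = np.random.default_rng(1)
random_design = rng.random((6, 2))
grid_design = np.array([[x, y] for x in (1/4, 3/4) for y in (1/6, 3/6, 5/6)])
for name, d in [("random", random_design), ("2x3 grid", grid_design)]:
    print(name, maximin_criterion(d), minimax_criterion(d))
```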

5.4. Hyperbolic cross points


Under the tensor product covariance models, it is possible to approximate and integrate
functions with greater accuracy than in the general case. One gets the same rates of
convergence as in univariate problems, apart from a multiplicative penalty that is some
power of log n. Hyperbolic cross point designs, also known as sparse grids, have been shown to achieve optimal rates in these cases. See Ritter (1995). These point sets were
first developed by Smolyak (1963). They were used in interpolation by Wahba (1978)
and Gordon (1971) and by Paskov (1993) for integration. Chapter 4 of Ritter (1995)
gives a good description of the construction of these points and lists other references.

6. Frequentist prediction and inference

The frequentist approach to prediction and inference in computer experiments is based


on numerical integration. For a scalar function Y = f(X), consider a regression model of the form

Y = f(X) ≈ Z(X)β    (12)

where Z(X) is a row vector of predictor functions and β is a vector of parameters. Suitable functions Z might include low order polynomials, trigonometric polynomials, wavelets, or some functions specifically geared to the application. Ordinarily Z(X) includes a component that is always equal to 1 in order to introduce an intercept term into equation (12).
It is unrealistic to expect that the function f will be exactly representable as the finite linear combination given by (12), and it is also unrealistic to expect that the residual will be a random variable with mean zero at every fixed x_0. This is why we only write f ≈ Zβ. There are many ways to define the best value of β, but an especially natural approach is to choose β to minimize the mean squared error of the approximation, with respect to some distribution F on [0, 1]^p. Then the optimal value for β is

β_LS = ( ∫ Z(X)'Z(X) dF )^{-1} ∫ Z(X)'f(X) dF.

So if one can integrate over the domain of X then one can fit regression approximations
there.
The quality of the approximation may be assessed globally by the integrated mean squared error

∫ (Y − Z(X)β)² dF.

For simplicity we take the distribution F to be uniform on [0, 1]^p. Also for simplicity the integration schemes to be considered usually estimate ∫ g(X) dF by

(1/n) Σ_{i=1}^{n} g(x_i)

for well chosen points x_1, . . . , x_n. Then β_LS may be estimated by linear regression

β̂ = ( (1/n) Σ_{i=1}^{n} Z(x_i)'Z(x_i) )^{-1} (1/n) Σ_{i=1}^{n} Z(x_i)'f(x_i),

or, when the integrals of squares and cross products of the Z's are known, by

β̂ = ( ∫ Z(X)'Z(X) dF )^{-1} (1/n) Σ_{i=1}^{n} Z(x_i)'f(x_i).    (13)

Choosing the components of Z to be an orthogonal basis, such as tensor products of orthogonal polynomials, multivariate Fourier series or wavelets, equation (13) simplifies to

β̂ = (1/n) Σ_{i=1}^{n} Z(x_i)'f(x_i)    (14)

and one can avoid the cost of matrix inversion. The computation required by equation (14) grows proportionally to nr not n^3, where r = r(n) is the number of regression variables in Z. If r = O(n) then the computations grow as n^2. Then, in the example from Section 3, an hour of function evaluation followed by a minute of algebra would scale into a day of function evaluation followed by 9.6 hours of algebra, instead of the 9.6 days that an n^3 algorithm would require. If the Z(x_i) exhibit some sparsity then it may be possible to reduce the algebra to order n or order n log n.
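A minimal sketch of this averaging scheme follows, under the assumption of an orthonormal basis made from an intercept plus the first two shifted Legendre polynomials in each input; the basis, the test function f, the sample size and the function names are illustrative choices rather than prescriptions from the text. The coefficient estimate is the simple average of equation (14), and a rough variance estimate follows (16), here with an n − 1 divisor instead of the n − r − 1 divisor used below.

```python
import numpy as np

def z_basis(x):
    """Orthonormal predictors on [0,1]^p under the uniform distribution:
    an intercept plus the first two shifted Legendre polynomials of each input."""
    one = np.ones((x.shape[0], 1))
    p1 = np.sqrt(3.0) * (2.0 * x - 1.0)
    p2 = np.sqrt(5.0) * (6.0 * x ** 2 - 6.0 * x + 1.0)
    return np.hstack([one, p1, p2])            # 1 + 2p columns

def fit_by_averaging(f, n=500, p=5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.random((n, p))                      # X_i iid U[0,1]^p
    z, y = z_basis(x), f(x)
    v = z * y[:, None]                          # row i holds Z(x_i)' f(x_i)
    beta = v.mean(axis=0)                       # equation (14): no matrix inversion
    var_beta = v.var(axis=0, ddof=1) / n        # diagonal of (16), estimated from the v_i
    return beta, var_beta

f = lambda x: np.sin(2 * np.pi * x[:, 0]) + x[:, 1] * x[:, 2]   # illustrative response
beta, var_beta = fit_by_averaging(f)
```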
Thus the idea of turning the function into data and making exploratory plots can be extended to turning the function into data and applying regression techniques. The theoretically simplest technique is to take X_i iid U[0, 1]^p. Then (X_i, Y_i) are iid pairs

with the complication that Y has zero variance given X. The variance matrix of β̂ is then

(1/n) ( ∫ Z'Z dF )^{-1} Var(Z(X)'Y(X)) ( ∫ Z'Z dF )^{-1}    (15)

and for orthogonal predictors this simplifies further to

(1/n) Var(Z(X)'Y(X)).    (16)
Thus any integration scheme that allows one to estimate variances and covariances of averages of Y times components of Z allows one to estimate the sampling variance matrix of the regression coefficients β̂. For iid sampling one can estimate this variance matrix by

(1/(n − r − 1)) Σ_{i=1}^{n} ( Z(x_i)'Y(x_i) − β̂ )( Z(x_i)'Y(x_i) − β̂ )'

when the row vector Z comprises an intercept and r additional regression coefficients.
This approach to computer experimentation should improve if more accurate integration techniques are substituted for the iid sampling. Owen (1992a) investigates the case of Latin hypercube sampling, for which a central limit theorem also holds.
Clearly more work is needed to make this method practical. For instance a scheme for deciding how many predictors should be in Z, or otherwise for regularizing β, is required.

7. Frequentist experimental designs

The frequentist approach proposed in the previous section requires a set of points x_1, . . . , x_n that are good for numerical integration and also allow one to estimate the sampling variance of the corresponding integrals. These two goals are somewhat at odds. Using an iid sample makes variance estimation easier while more complicated schemes described below improve accuracy but make variance estimation harder.
The more basic goal of getting points x_i into "interesting corners" of the input space, so that important features are likely to be found is usually well served by point sets that are good for numerical integration.
We assume that the region of interest is the unit cube [0, 1]^p, and that the integrals of interest are with respect to the uniform distribution over this cube. Other regions of interest can usually be reduced to the unit cube and other distributions can be changed to the uniform by a change of variable that can be subsumed into f.
Throughout this section we consider an example with p = 5, and plot the design points x_i.

Fig. 11. 25 distinct points among 625 points in a 5^5 grid.

7.1. Grids

Since varying one coordinate at a time can cause one to miss important aspects of f, it is natural to consider instead sampling f on a regular grid. One chooses k different values for each of X^1 through X^p and then runs all k^p combinations. This works well for small values of p, perhaps 2 or 3, but for larger p it becomes completely impractical because the number of runs required grows explosively.
Figure 11 shows a projection of 625 points from a uniform grid in [0, 1]^5 onto two of the input variables. Notice that with 625 runs, only 25 distinct values appear in the plane, each representing 25 input settings in the other three variables. Only 5 distinct values appear for each input variable taken singly. In situations where one of the responses Y^k depends very strongly on only one or two of the inputs X^j the grid design leads to much wasteful duplication.
The grid design does not lend itself to variance estimation since averages over the grid are not random. The accuracy of a grid based integral is typically that of a univariate integral based on k = n^{1/p} evaluations. (See Davis and Rabinowitz, 1984.) For large p this is a severe disadvantage.
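The explosive growth of k^p is easy to see in a few lines; the helper name full_grid and the choice k = 3 below are illustrative assumptions.

```python
import itertools
import numpy as np

def full_grid(k, p):
    """All k^p runs formed by crossing k equispaced levels of each of the p inputs."""
    levels = (np.arange(k) + 0.5) / k
    return np.array(list(itertools.product(levels, repeat=p)))

for p in (2, 3, 5, 10):
    print(p, len(full_grid(3, p)))   # 9, 27, 243, 59049: the run count grows as k^p
```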

Fig. 12. A 34 point Fibonacci lattice in [0, 1]^2.

7.2. Good lattice points


A significant improvement on grids may be obtained in integration by the method of
good lattice points. (See Sloan and Joe (1994) and Niederreiter (1992) for background
and Fang and Wang (1994) for applications to statistics.)
For good lattice points

X_i^j = { (h_j (i − 1) + 0.5) / n }

where {z} is z modulo 1, that is, z minus the greatest integer less than or equal to z, and the h_j are integers with h_1 = 1. The points v_i with v_i^j = i h_j / n for integer i form a lattice in R^p. The points x_i are versions of these lattice points confined to the unit cube, and the term "good" refers to a careful choice of n and the h_j, usually based on number theory.
Figure 12 shows the Fibonacci lattice for p = 2 and n = 34. For more details see Sloan and Joe (1994). Here h_1 = 1 and h_2 = 21. The Fibonacci lattice is only available in 2 dimensions. Appendix A of Fang and Wang (1994) lists several other choices for good lattice points, but the smallest value of n there for p = 5 is 1069.

Hickernell (1996) discusses greedy algorithms for finding good lattice points with
smaller n.
The recent text (Sloan and Joe, 1994) discusses lattice rules for integration, which generalize the method of good lattice points. Cranley and Patterson (1976) consider randomly perturbing the good lattice points by adding, modulo 1, a random vector uniform over [0, 1]^p to all the x_i. Taking r such random offsets for each of the n data points gives nr observations with r − 1 degrees of freedom for estimating variance.
Lattice integration rules can be extraordinarily accurate on smooth periodic inte-
grands and thus an approach to computer experiments based on Cranley and Patter-
son's method might be expected to work well when both f ( x ) and Z ( x ) are smooth
and periodic. Bates et al. (1996) have explored the use of lattice rules as designs for
computer experiments.
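The following sketch generates the n = 34 Fibonacci lattice with h = (1, 21) and applies Cranley and Patterson's random shifts, using the r shifted copies to form a standard error. The integrand, a smooth periodic function whose integral is 1, and the function names are illustrative assumptions.

```python
import numpy as np

def shifted_lattice(n=34, h=(1, 21), rng=None):
    """One Cranley-Patterson randomization of the good-lattice-point set
    x_i^j = {(h_j (i-1) + 0.5)/n}: add a uniform offset modulo 1 to every point."""
    rng = np.random.default_rng() if rng is None else rng
    i = np.arange(n)[:, None]
    base = ((i * np.asarray(h, dtype=float) + 0.5) / n) % 1.0
    return (base + rng.random(len(h))) % 1.0

def lattice_estimate(f, r=10, n=34, h=(1, 21), seed=0):
    """Average f over r independently shifted copies of the lattice;
    the r copy means give r - 1 degrees of freedom for a standard error."""
    rng = np.random.default_rng(seed)
    means = np.array([f(shifted_lattice(n, h, rng)).mean() for _ in range(r)])
    return means.mean(), means.std(ddof=1) / np.sqrt(r)

f = lambda x: np.prod(1.0 + 0.5 * np.cos(2 * np.pi * x), axis=1)   # smooth, periodic; true integral is 1
print(lattice_estimate(f))
```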

7.3. Latin hypercubes


While good lattice points start by improving the low dimensional projections of grids, Latin hypercube sampling starts with iid samples. A Latin hypercube sample has

X_i^j = ( π^j(i) − U_i^j ) / n    (17)

where the π^j are independent uniform random permutations of the integers 1 through n, and the U_i^j are independent U[0, 1] random variables independent of the π^j.
Latin hypercube sampling was introduced by McKay et al. (1979) in what is widely
considered to be the first paper on computer experiments. The sample points are
stratified on each of p input axes. A common variant of Latin hypercube sampling
has centered points

X_i^j = ( π^j(i) − 0.5 ) / n.    (18)

Point sets of this type were studied by Patterson (1954) who called them lattice
samples.
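A minimal sketch of equations (17) and (18) follows; the function name and argument conventions are assumptions of this sketch, not part of McKay et al. (1979).

```python
import numpy as np

def latin_hypercube(n, p, centered=False, rng=None):
    """Latin hypercube sample, equation (17); centered=True gives the variant (18)."""
    rng = np.random.default_rng() if rng is None else rng
    # Column j holds an independent uniform random permutation pi^j of 1, ..., n.
    pi = np.column_stack([rng.permutation(n) + 1 for _ in range(p)])
    u = 0.5 if centered else rng.random((n, p))     # U_i^j iid U[0,1], or the constant 0.5
    return (pi - u) / n

x = latin_hypercube(25, 5, centered=True)            # 25 runs in 5 inputs, as in Figure 13
```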
Figure 13 shows a projection of 25 points from a (centered) Latin hypercube sample
over 5 variables onto two of the coordinate axes. Each input variable gets explored
in each of 25 equally spaced bins.
The stratification in Latin hypercube sampling usually reduces the variance of es-
timated integrals. Stein (1987) finds an expression for the variance of a sample mean
under Latin hypercube sampling. Assuming that ∫ f(X)² dF < ∞, write

f(X) = μ + Σ_{j=1}^{p} α_j(X^j) + e(X)    (19)

Fig. 13. 25 points of a Latin hypercube sample. The range of each input variable may be partitioned into 25 bins of equal width, drawn here with horizontal and vertical dotted lines, and each such bin contains one of the points.

where μ = ∫ f(X) dF and α_j(x) = ∫_{X: X^j = x} (f(X) − μ) dF_{−j}, in which dF_{−j} = Π_{k ≠ j} dX^k is the uniform distribution over all input variables except the j'th. Equation (19) expresses f as the sum of a grand mean μ, univariate main effects α_j and a residual from additivity e(X).
Stein shows that under Latin hypercube sampling

Var( (1/n) Σ_{i=1}^{n} f(X_i) ) = (1/n) ∫ e(X)² dF + o(1/n),    (20)

whereas under iid sampling

Var( (1/n) Σ_{i=1}^{n} f(X_i) ) = (1/n) ( ∫ e(X)² dF + Σ_{j=1}^{p} ∫ α_j(X^j)² dF ).    (21)

By balancing the univariate margins, Latin hypercube sampling has removed the main effects of the function f from the error variance.
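A small simulation can illustrate the contrast between (20) and (21): on a nearly additive function the main effect terms dominate (21), so the Latin hypercube sample mean has far smaller variance than the iid mean. The test function, sample size and replication count below are illustrative assumptions.

```python
import numpy as np

def lhs(n, p, rng):
    """Latin hypercube sample as in equation (17)."""
    pi = np.column_stack([rng.permutation(n) for _ in range(p)])
    return (pi + rng.random((n, p))) / n

def mean_variances(f, n=25, p=5, reps=2000, seed=0):
    """Monte Carlo estimates of Var of the sample mean under iid and LHS designs."""
    rng = np.random.default_rng(seed)
    iid = [f(rng.random((n, p))).mean() for _ in range(reps)]
    lh = [f(lhs(n, p, rng)).mean() for _ in range(reps)]
    return np.var(iid, ddof=1), np.var(lh, ddof=1)

# Nearly additive test function: the main effects dominate, so (21) far exceeds (20).
f = lambda x: np.sin(np.pi * x).sum(axis=1) + 0.1 * x[:, 0] * x[:, 1]
print(mean_variances(f))
```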

Owen (1992a) proves a central limit theorem for Latin hypercube sampling of
bounded functions and Loh (1993) proves a central limit theorem under weaker con-
ditions. For variance estimation in Latin hypercube sampling see (Stein, 1987; Owen,
1992a).

7.4. Better Latin hypercubes


Latin hypercube samples look like random scatter in any bivariate plot, though they
are quite regular in each univariate plot. Some effort has been made to find especially
good Latin hypercube samples.
One approach has been to find Latin hypercube samples in which the input variables
have small correlations. Iman and Conover (1982) perturbed Latin hypercube samples
in a way that reduces off diagonal correlation. Owen (1994b) showed that the technique in Iman and Conover (1982) typically reduces off diagonal correlations by a factor of 3, and presented a method that empirically seemed to reduce the off diagonal correlations by a factor of order n from O(n^{-1/2}) to O(n^{-3/2}). This removes certain bilinear terms from the lead term in the error. Dandekar (1993) found that iterating the method in Iman and Conover (1982) can lead to large improvements.
Small correlations are desirable but not sufficient, because one can construct centered Latin hypercube samples with zero correlation (unless n is equal to 2 modulo 4) which are nonetheless highly structured. For example the points could be arranged in a diamond shape in the plane, thus missing the center and corners of the input space.
Some researchers have looked for Latin hypercube samples having good properties
when considered as designs for Bayesian prediction. Park (1994) studies the IMSE
criterion and Morris and Mitchell (1995) consider entropy.

7.5. Randomized orthogonal arrays


An orthogonal array A is an n by p matrix of integers 0 ≤ A_i^j ≤ b − 1. The array has strength t ≤ p if in every n by t submatrix of A all of the b^t possible rows appear the same number λ of times. Of course n = λ b^t.
Independently Owen (1992b, 1994a) and Tang (1992, 1993) considered using or-
thogonal arrays to improve upon Latin hypercube samples.
A randomized orthogonal array (Owen, 1992b) has two versions,

X_i^j = ( π_j(A_i^j) + U_i^j ) / b    (22)

and

X_i^j = ( π_j(A_i^j) + 0.5 ) / b    (23)

just as Latin hypercube sampling has two versions. Indeed Latin hypercube sampling corresponds to strength t = 1, with λ = 1. Here the π_j are independent uniform permutations of 0, . . . , b − 1.

Fig. 14. 25 points of a randomly centered randomized orthogonal array. For whichever two (of five) variables that are plotted, there is one point in each reference square.

Patterson (1954) considered some schemes like the centered version.
If one were to plot the points of a randomized orthogonal array in t or fewer of the
coordinates, the result would be a regular grid. The points of a randomized orthogonal
array of strength 2 appear to be randomly scattered in 3 dimensions.
Figure 14 shows a projection of 25 points from a randomly centered randomized orthogonal array over 5 variables onto two of the coordinate axes. Each pair of variables gets explored in each of 25 square bins. The plot for the centered version of a randomized orthogonal array is identical to that for a grid as shown in Figure 11.
The analysis of variance decomposition used above for Latin hypercube sampling can be extended to include interactions among 2 or more factors. See Efron and Stein (1981), Owen (1992b) and Wahba (1990) for details. Gu and Wahba (1993) describe how to estimate and form confidence intervals for these main effects in noisy data. Owen (1992b) shows that main effects and interactions of t or fewer variables do not contribute to the asymptotic variance of a mean over a randomized orthogonal array, and Owen (1994a) shows that the variance is approximately n^{-1} times the sum of integrals of squares of interactions among more than t inputs.

Fig. 15. 25 points of an orthogonal array based Latin hypercube sample. For whichever two (of five) variables that are plotted, there is one point in each reference square bounded by solid lines. Each variable is sampled once within each of 25 horizontal or vertical bins.

Tang (1993) introduced orthogonal array based Latin hypercube samples. The points of these designs are Latin hypercube samples X_i^j such that ⌊bX_i^j⌋ is an orthogonal array. Here b is an integer and ⌊z⌋ is the greatest integer less than or equal to z. Tang (1993) shows that for a strength 2 array the main effects and two variable interactions do not contribute to the integration variance.
Figure 15 shows a projection of 25 points from an orthogonal array based Latin hypercube sample over 5 variables onto two of the coordinate axes. Each variable individually gets explored in each of 25 equal bins and each pair of variables gets explored in each of 25 squares.
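The sketch below builds a strength 2 orthogonal array for prime b by the classical construction whose columns are i and j + k·i (mod b) over all pairs (i, j), an assumption standing in for whatever array one has available. It then randomizes the array as in equation (22) and, in the spirit of Tang (1993), spreads each column over distinct fine strata so that ⌊bX⌋ recovers the array. All function names here are illustrative.

```python
import numpy as np

def bose_oa(b):
    """Strength-2 orthogonal array with b^2 rows and b+1 columns of symbols 0..b-1,
    valid for prime b: the columns are i and j + k*i (mod b) for k = 0, ..., b-1."""
    i, j = np.divmod(np.arange(b * b), b)
    cols = [i] + [(j + k * i) % b for k in range(b)]
    return np.column_stack(cols)

def randomized_oa(A, b, rng):
    """Randomized orthogonal array, equation (22): relabel levels and jitter within cells."""
    n, p = A.shape
    pi = np.array([rng.permutation(b) for _ in range(p)])      # one permutation per column
    return (pi[np.arange(p), A] + rng.random((n, p))) / b

def oa_based_lhs(A, b, rng):
    """Orthogonal array based Latin hypercube: within each level of each column,
    spread the points over distinct fine strata so that floor(b X) recovers the array."""
    n, p = A.shape
    m = n // b                                                  # points per level in each column
    fine = np.empty((n, p))
    for jcol in range(p):
        for level in range(b):
            rows = np.flatnonzero(A[:, jcol] == level)
            fine[rows, jcol] = level * m + rng.permutation(m)
    return (fine + rng.random((n, p))) / n

rng = np.random.default_rng(0)
A = bose_oa(5)[:, :5]            # 25 runs, first 5 of the 6 available columns
x_oa = randomized_oa(A, 5, rng)  # in the spirit of Figure 14
x_lh = oa_based_lhs(A, 5, rng)   # in the spirit of Figure 15
```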

7.6. Scrambled nets


Orthogonal arrays were developed to balance discrete experimental factors. As seen
above they can be embedded into the unit cube and randomized with the result that
sampling variance is reduced. But numerical analysts and algebraists have developed
some integration techniques directly adapted to balancing in a continuous space. Here
we describe (t, m, s)-nets and their randomizations. A full account of (t, m, s)-nets

Fig. 16. 25 points of a scrambled (0, 2, 5)-net in base 5. For whichever two (of five) variables that are plotted, there is one point in each reference square. Each variable is sampled once within each of 25 equal bins.

is given by Niederreiter (1992). Their randomization is described by Owen (1995, 1996a).
Let p = s ≥ 1 and b ≥ 2 be integers. An elementary subcube in base b is of the form

E = Π_{j=1}^{s} [ c_j / b^{k_j}, (c_j + 1) / b^{k_j} )

for integers k_j, c_j with k_j ≥ 0 and 0 ≤ c_j < b^{k_j}.

Let m ≥ 0 be an integer. A set of points X_i, i = 1, . . . , b^m, from [0, 1)^s is a (0, m, s)-net in base b if every elementary subcube E in base b of volume b^{−m} has exactly 1 of the points. That is, every cell that "should" have one point of the sequence does have one point of the sequence.

This is a very strong form of equidistribution and by weakening it somewhat, constructions for more values of s and b become available. Let t ≤ m be a nonnegative integer. A finite set of b^m points from [0, 1)^s is a (t, m, s)-net in base b if every elementary subcube in base b of volume b^{t−m} contains exactly b^t points of the sequence.

Fig. 17. The 125 points of a scrambled (0, 3, 5)-net in base 5. For whichever two (of five) variables that are plotted, the result is a 5 by 5 grid of 5 point Latin hypercube samples. Each variable is sampled once within each of 125 equal bins. Each triple of variables can be partitioned into 125 congruent cubes, each of which has one point.

Cells that "should" have b^t points do have b^t points, though cells that "should" have 1 point might not.
By common usage the name (t, m, s)-net assumes that the letter s is used to denote the dimension of the input space, though one could speak of (t, m, p)-nets. Another convention to note is that the subcubes are half-open. This makes it convenient to partition the input space into congruent subcubes.
The balance properties of a (t, m, s)-net are greater than those of an orthogonal array. If X_i^j is a (t, m, s)-net in base b then ⌊bX⌋_i^j is an orthogonal array of strength min{s, m − t}. But the net also has balance properties when rounded to different powers of b on all axes, so long as the powers sum to no more than m − t. Thus the net combines aspects of orthogonal arrays and multi-level orthogonal arrays all in one point set.
In the case of a (0, 4, 5)-net in base 5, one has 625 points in [0, 1)^5 and one can count that there are 43750 elementary subcubes of volume 1/625 of varying aspect ratios, each of which has one of the 625 points.
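The net condition is simple to check by brute force in small cases. The sketch below enumerates every elementary subcube shape, counts the points falling in each cell, and also confirms the count of 43750 quoted above for a (0, 4, 5)-net in base 5. The 4-point base 2 example used in the demonstration is a hand-made (0, 2, 2)-net, not one of the scrambled nets shown in the figures.

```python
import itertools
import math

def compositions(total, parts):
    """All tuples of nonnegative integers (k_1, ..., k_parts) summing to total."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def is_net(points, t, m, s, b):
    """Brute-force check of the (t, m, s)-net condition: every elementary subcube of
    volume b^(t-m), for every shape k_1 + ... + k_s = m - t, holds exactly b^t points."""
    if len(points) != b ** m:
        return False
    for shape in compositions(m - t, s):
        counts = {}
        for x in points:
            cell = tuple(int(x[j] * b ** shape[j]) for j in range(s))
            counts[cell] = counts.get(cell, 0) + 1
        if len(counts) != b ** (m - t) or any(c != b ** t for c in counts.values()):
            return False
    return True

# A hand-made (0, 2, 2)-net in base 2: every 1/4-volume elementary interval holds one point.
pts = [(0.0, 0.0), (0.5, 0.5), (0.25, 0.75), (0.75, 0.25)]
print(is_net(pts, t=0, m=2, s=2, b=2))                  # True
# For a (0, 4, 5)-net in base 5 there are C(8, 4) shapes times 5^4 cells of volume 1/625:
print(math.comb(8, 4) * 5 ** 4)                          # 43750
```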

Fig. 18. The 625 points of a scrambled (0, 4, 5)-net in base 5. For whichever two (of five) variables that are plotted, the square can be divided into 625 squares of side 1/25, or into 625 rectangles of side 1/5 by 1/125, or into 625 rectangles of side 1/125 by 1/5, and each such rectangle has one of the points. Each variable is sampled once within each of 625 equal bins. Each triple of variables can be partitioned into 625 hyperrectangles in three different ways and each such hyperrectangle has one of the points. Each quadruple of variables can be partitioned into 625 congruent hypercubes of side 1/5, each of which has one point.

For t ≥ 0, an infinite sequence (X_i)_{i ≥ 1} of points from [0, 1)^s is a (t, s)-sequence in base b if for all k ≥ 0 and m ≥ t the finite sequence (X_i)_{i = kb^m + 1}^{(k+1)b^m} is a (t, m, s)-net in base b.
The advantage of a (t, s)-sequence is that if one finds that the first b^m points are not sufficient for an integration problem, one can find another b^m points that also form a (t, m, s)-net and tend to fill in places not occupied by the first set. If one continues to the point of having b such (t, m, s)-nets, then the complete set of points comprises a (t, m + 1, s)-net.
The theory of (t, m, s)-nets and (t, s)-sequences is given in Niederreiter (1992). A famous result of the theory is that integration over a (t, m, s)-net can attain an accuracy of order O(log(n)^{s−1}/n) while restricting to (t, s)-sequences raises this slightly to O(log(n)^s/n). These results require that the integrand be of bounded variation in the sense of Hardy and Krause. For large s, it takes unrealistically large n for these rates

to be clearly better than n^{-1/2}, but in examples they seem to outperform simple Monte Carlo.
The construction of (t, m, s)-nets and (t, s)-sequences is also described in Niederreiter (1992). Here we remark that for prime numbers s a construction by Faure (1982) gives (0, s)-sequences in base s, and Niederreiter extended the method to prime powers s. (See Niederreiter, 1992.) Thus one can choose b to be the smallest prime power greater than or equal to s and use the first s variables of the corresponding (0, b)-sequence in base b.
Owen (1995) describes a scheme to randomize (t, m, s)-nets and (t, s)-sequences. The points are written in a base b expansion and certain random permutations are applied to the coefficients in the expansion. The result is to make each permuted X_i uniformly distributed over [0, 1)^s while preserving the (t, m, s)-net or (t, s)-sequence structure of the ensemble of X_i. Thus the sample estimate n^{-1} Σ_{i=1}^{n} f(X_i) is unbiased for ∫ f(X) dF and the variance of it may be estimated by replication. On some test integrands in (Owen, 1995) the randomized nets outperformed their unrandomized counterparts. It appears that the unscrambled nets have considerable structure, stemming from the algebra underlying them, and that this structure is a liability in integration.
Figure 16 shows the 25 points of a scrambled (0, 2, 5)-net in base 5 projected onto two of the five input coordinates. These points are the initial 25 points of a (0, 5)-sequence in base 5. This design has the equidistribution properties of an orthogonal array based Latin hypercube sample. Moreover every consecutive 25 points in the sequence X_{25a+1}, X_{25a+2}, . . . , X_{25(a+1)} has these equidistribution properties. The first 125 points, shown in Figure 17, have still more equidistribution properties: any triple of the input variables can be split into 125 subcubes each with one of the X_i, in any pair of variables the points appear as a 5 by 5 grid of 5 point Latin hypercube samples, and each individual input variable can be split into 125 cells each having one point. The first 625 points are shown in Figure 18.
Owen (1996a) finds a variance formula for means over randomized (t, m, s)-nets and (t, s)-sequences. The formula involves a wavelet-like ANOVA combining nested terms on each coordinate, all crossed against each other. It turns out that for any square integrable integrand, the resulting variance is o(n^{-1}) and it therefore beats any of the usual variance reduction techniques, which typically only reduce the asymptotic coefficient of n^{-1}.
For smooth integrands with s = 1, the variance is in fact O(n^{-3}) and in the general case Owen (1996b) shows that the variance is O(n^{-3} (log n)^{s−1}).

8. Selected applications

One of the largest fields using and developing deterministic simulators is the design and manufacture of VLSI circuits. Alvarez et al. (1988) describe the use of SUPREM-III (Ho et al., 1984) and SEDAN-II (Yu et al., 1982) in designing BIMOS devices for manufacturability. Aoki et al. (1987) use CADDETH, a two dimensional device simulator, for optimizing devices and for accurate prediction of device sensitivities. Sharifzadeh et al. (1989) use SUPREM-III and PISCES-II (Pinto et al., 1984)

to compute CMOS device characteristics as a function of the designable technology parameters. Nassif et al. (1984) describe the use of FABRICS-II to estimate circuit delay times in integrated circuits.
The input variables for the above work are generally device sizes, metal concentra-
tions, implant doses and gate oxide temperatures. The multiple responses are threshold
voltages, subthreshold slopes, saturation currents and linear transconductance although
the output variables of concern depend on the technology under investigation. The en-
gineers use the physical/numerical simulators to assist them in optimizing process,
device, and circuit design before the costly step of building prototype devices. They
are also concerned with minimizing transmitted variability as this can significantly re-
duce the performance of the devices and hence reduce yield. For example, Welch et al.
(1990), Currin et al. (1991) and Sacks et al. (1989b) discuss the use of simulators to
investigate the effect of transistor dimensions on the asynchronization of two clocks.
They want to find the combination of transistor widths that produce zero clock skews
with very small transmitted variability due to uncontrollable manufacturing variability
in the transistors.
TIMS, a simulator developed by T. Osswald and C. L. Tucker III, helps in optimiz-
ing a compression mold filling process for manufacturing automobiles (Church et al.,
1988). In this process a sheet of molding compound is cut and placed in a heated mold.
The mold is slowly closed and a constant force is applied during the curing reaction.
The controlling variables of the process are the geometry and thickness of the part,
the compound viscosity, shape and location within the charge, and the mold closing
speed. The simulator then predicts the position of the flow front as a function of time.
Miller and Frenklach (1983) discuss the use of computers to solve systems of
differential equations describing chemical kinetic models. In their work, the inputs
to the simulator are vectors of possibly unknown combustion rate constants and the
outputs are induction-delay times and concentrations of chemical species at specified
reaction times. The objectives of their investigations are to find values of the rate
constants that agree with experimental data and to find the most important rate constant
to the process. Sacks et al. (1989a) explore some of the design issues and applications
to this field.
TWOLAYER, a thermal energy storage model developed by Alan Solomon and
his colleagues at the Oak Ridge National Laboratory, simulates heat transfer through
a wall containing two layers of different phase change material. Currin et al. (1991)
utilize TWOLAYER in a computer experiment. The inputs into TWOLAYER are the layers' dimensions, the thermal properties of the materials and the characteristics of the
heat source. The object of interest was finding the configuration of the input variables
that produce the highest value of a heat storage utility index.
FOAM (Bartell et al., 1981) models the transport of polycyclic aromatic hydro-
carbon spills in streams using structure activity relationships. Bartell et al. (1983)
modified this model to predict the fate of anthracene when introduced into ponds.
This model tracks the "evaporation and dissolution of anthracene from a surface slick
of synthetic oil, volatilization and photolytic degradation of dissolved anthracene,
sorption to suspended particulate matter and sediments and accumulation by pond
biota" (Bartell, 1983). They used Monte Carlo error analyses to assess the effect of
the uncertainty in model parameters on their results.

References

Alvarez, A. R., B. L. Abdi, D. L. Young, H. D. Weed, J. Teplik and E. Herald (1988). Application of
statistical design and response surface methods to computer-aided VLSI device design. IEEE Trans.
Comput. Aided Design 7(2), 271-288.
Aoki, Y., H. Masuda, S. Shimada and S. Sato (1987). A new design-centering methodology for VLSI
device development. IEEE Trans. Comput. Aided Design 6(3), 452-461.
Bartell, S. M., R. H. Gardner, R. V. O'Neill and J. M. Giddings (1983). Error analysis of predicted fate of
anthracene in a simulated pond. Environ. Toxicol. Chem. 2, 19-28.
Bartell, S. M., J. P. Landrum, J. P. Giesy and G. J. Leversee (1981). Simulated transport of polycyclic
aromatic hydrocarbons in artificial streams. In: W. J. Mitch, R. W. Bosserman and J. M. Klopatek, eds.,
Energy and Ecological Modelling. Elsevier, New York, 133-143.
Bates, R. A., R. J. Buck, E. Riccomagno and H. P. Wynn (1996). Experimental design and observation for
large systems (with discussion). J. Roy. Statist. Soc. Sen. B 58(1), 77-94.
Borth, D. M. (1975). A total entropy criterion for the dual problem of model discrimination and parameter
estimation. J. Roy. Statist. Soc. Ser. B 37, 77-87.
Box, G. E. P. and N. R. Draper (1959). A basis for the selection of a response surface design. J. Amer.
Statist. Assoc. 54, 622-654.
Box, G. E. P. and N. R. Draper (1963). The choice of a second order rotatable design. Biometrika 50,
335-352.
Box, G. E. P. and W. J. Hill (1967). Discrimination among mechanistic models. Technometrics 9, 57-70.
Church, A., T. Mitchell and D. Fleming (1988). Computer experiments to optimize a compression mold
filling process. Talk given at the Workshop on Design for Computer Experiments in Oak Ridge, TN,
November.
Cranley, R. and T. N. L. Patterson (1976). Randomization of number theoretic methods for multiple
integration. SlAM J. Numer. Anal 23, 904-914.
Cressie, N. A. C. (1986). Kriging nonstationary data. J. Amen. Statist. Assoc. 81, 625-634.
Cressie, N. A. C. (1993). Statistics for Spatial Data (Revised edition). Wiley, New York.
Currin, C., M. Mitchell, M. Morris and D. Ylvisaker (1991). Bayesian prediction of deterministic functions,
with applications to the design and analysis of computer experiments. J. Amen. Statist. Assoc. 86, 953-963.
Dandekar, R. (1993). Performance improvement of restricted pairing algorithm for Latin hypercube sam-
pling Draft Report, Energy Information Administration, U.S.D.O.E.
Davis, P. J. and P. Rabinowitz (1984). Methods of Numerical Integration, 2nd. edn. Academic Press, San
Diego.
Diaconis, P. (1988). Bayesian numerical analysis In: S. S. Gupta and J. O. Berger, eds., Statistical Decision
Theory and Related Topics IV, Vol. 1. Springer, New York, 163-176.
Efron, B. and C. Stein (1981). The jackknife estimate of variance. Ann. Statist. 9, 586-596.
Fang, K. T. and Y. Wang (1994). Number-theoretic Methods in Statistics. Chapman and Hall, London.
Faure, H. (1982). Discrépances des suites associées à un système de numération (en dimension s). Acta Arithmetica 41, 337-351.
Friedman, J. H. (1991). Multivariate adaptive regression splines (with Discussion). Ann. Statist. 19, 1-67.
Gill, P. E., W. Murray, M. A. Saunders and M. H. Wright (1986). User's guide for npsol (version 4.0):
A Fortran package for nonlinear programming. SOL 86-2, Stanford Optimization Laboratory, Dept. of
Operations Research, Stanford University, California, 94305, January.
Gill, P. E., W. Murray and M. H. Wright (1981). Practical Optimization. Academic Press, London.
Gordon, W. J. (1971). Blending function methods of bivariate and multivariate interpolation and approxi-
mation. SlAM J. Numer. Anal. 8, 158-177.
Gu, C. and G. Wahba (1993). Smoothing spline ANOVA with component-wise Bayesian "confidence
intervals". J. Comp. Graph. Statist. 2, 97-117.
Hickernell, F. J. (1996). Quadrature error bounds with applications to lattice rules. SIAM J. Numer. Anal.
33 (in press).
Ho, S. P., S. E. Hansen and P. M. Fahey (1984). Suprem III - a program for integrated circuit process
modeling and simulation. TR-SEL84 1, Stanford Electronics Laboratories.

Iman, R. L. and W. J. Conover (1982). A distribution-free approach to inducing rank correlation among
input variables. Comm. Statist. Bll(3), 311-334.
Johnson, M. E., L. M. Moore and D. Ylvisaker (1990). Minimax and maximin distance designs. J. Statist.
Plann. Inference 26, 131-148.
Journel, A. G. and C. J. Huijbregts (1978). Mining Geostatistics. Academic Press, London.
Koehler, J. R. (1990). Design and estimation issues in computer experiments. Dissertation, Dept. of
Statistics, Stanford University.
Lindley, D. V. (1956). On a measure of the information provided by an experiment. Ann. Math. Statist.
27, 986-1005.
Loh, W.-L. (1993). On Latin hypercube sampling. Tech. Report No. 93-52, Dept. of Statistics, Purdue
University.
Loh, W.-L. (1994). A combinatorial central limit theorem for randomized orthogonal array sampling designs.
Tech. Report No. 94-4, Dept. of Statistics, Purdue University.
Mardia, K. V. and R. J. Marshall (1984). Maximum likelihood estimation of models for residual covariance
in spatial regression. Biometrika 71(1), 135-146.
Matérn, B. (1947). Method of estimating the accuracy of line and sample plot surveys. Medd. Skogsforskn. Inst. 36(1).
Matheron, G. (1963). Principles of geostatistics. Econom. Geol. 58, 1246-1266.
McKay, M. (1995). Evaluating prediction uncertainty. Report NUREG/CR-6311, Los Alamos National
Laboratory.
McKay, M., R. Beckman and W. Conover (1979). A comparison of three methods for selecting values of
input variables in the analysis of output from a computer code. Technometrics 21(2), 239-245.
Miller, D. and M. Frenklach (1983). Sensitivity analysis and parameter estimation in dynamic modeling of
chemical kinetics. Internat. J. Chem. Kinetics 15, 677-696.
Mitchell, T. J. (1974). An algorithm for the construction of 'D-optimal' experimental designs. Technometrics
16, 203-210.
Mitchell, T., M. Morris and D. Ylvisaker (1990). Existence of smoothed stationary processes on an interval.
Stochastic Process. Appl. 35, 109-119.
Mitchell, T., M. Morris and D. Ylvisaker (1995). Two-level fractional factorials and Bayesian prediction.
Statist. Sinica 5, 559-573.
Mitchell, T. J. and D. S. Scott (1987). A computer program for the design of group testing experiments.
Comm. Statist. Theory Methods 16, 2943-2955.
Morris, M. D. and T. J. Mitchell (1995). Exploratory designs for computational experiments. J. Statist.
Plann. Inference 43, 381-402.
Morris, M. D., T. J. Mitchell and D. Ylvisaker (1993). Bayesian design and analysis of computer experi-
ments: Use of derivative in surface prediction. Technometrics 35(3), 243-255.
Nassif, S. R., A. J. Strojwas and S. W. Director (1984). FABRICS II: A statistically based IC fabrication
process simulator. IEEE Trans. Comput. Aided Design 3, 40-46.
Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia,
PA.
O'Hagan, A. (1989). Comment: Design and analysis of computer experiments. Statist. Sci. 4(4), 430-432.
Owen, A. B. (1992a). A central limit theorem for Latin hypercube sampling. J. Roy. Statist. Soc. Ser. B
54, 541-551.
Owen, A. B. (1992b). Orthogonal arrays for computer experiments, integration and visualization. Statist.
Sinica 2, 439-452.
Owen, A. B. (1994a). Lattice sampling revisited: Monte Carlo variance of means over randomized orthog-
onal arrays. Ann. Statist. 22, 930-945.
Owen, A. B. (1994b). Controlling correlations in latin hypercube samples. J. Amer. Statist. Assoc. 89,
1517-1522.
Owen, A. B. (1995). Randomly permuted (t, m, s)-nets and (t, s)-sequences. In: H. Niederreiter and
P. J.-S. Shiue, eds., Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing. Springer, New York, 299-317.
Owen, A. B. (1996a). Monte Carlo variance of scrambled net quadrature. SIAM J. Numer. Anal., to appear.

Owen, A. B. (1996b). Scrambled net variance for integrals of smooth functions. Tech. Report Number
493, Department of Statistics, Stanford University.
Paskov, S. H. (1993). Average case complexity of multivariate integration for smooth functions. J. Com-
plexity 9, 291-312.
Park, J.-S. (1994) Optimal Latin-hypercube designs for computer experiments. J. Statist. Plann. Inference
39, 95-111.
Parzen, E. (1962). Stochastic Processes. Holden-Day, San Francisco, CA.
Patterson, H. D. (1954). The errors of lattice sampling. J. Roy. Statist. Soc. Ser. B 16, 140-149.
Phadke, M. (1988). Quality Engineering Using Robust Design. Prentice-Hall, Englewood Cliffs, NJ.
Pinto, M. R., C. S. Rafferty and R. W. Dutton (1984). PISCES-II: Poisson and continuity equation solver. DAGG-29-83-k 0125, Stanford Electron. Lab.
Ripley, B. (1981). Spatial Statistics. Wiley, New York.
Ritter, K. (1995). Average case analysis of numerical problems. Dissertation, University of Erlangen.
Ritter, K., G. Wasilkowski and H. Wozniakowski (1993). On multivariate integration for stochastic pro-
cesses. In: H. Brass and G. Hammerlin, eds., Numerical Integration, Birkhauser, Basel, 331-347.
Ritter, K., G. Wasilkowski and H. Wozniakowski (1995). Multivariate integration and approximation for
random fields satisfying Sacks-Ylvisaker conditions. Ann. AppL Prob. 5, 518-540.
Roosen, C. B. (1995). Visualization and exploration of high-dimensional functions using the functional
ANOVA decomposition. Dissertation, Dept. of Statistics, Stanford University.
Sacks, J. and S. Schiller (1988). Spatial designs. In: S. S. Gupta and J. O. Berger, eds., Statistical Decision
Theory and Related Topics IV, Vol. 2. Springer, New York, 385-399.
Sacks, J., S. B. Schiller and W. J. Welch (1989). Designs for computer experiments. Technometrics 31(1),
41-47.
Sacks, J., W. J. Welch, T. J. Mitchell and H. P. Wynn (1989). Design and analysis of computer experiments.
Statist. Sci. 4(4), 409-423.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 27, 379-423, 623-656.
Sharifzadeh, S., J. R. Koehler, A. B. Owen and J. D. Shott (1989). Using simulators to model transmitted
variability in IC manufacturing. IEEE Trans. Semicond. Manufact. 2(3), 82-93.
Shewry, M. C. and H. P. Wynn (1987). Maximum entropy sampling. J. AppL Statist. 14, 165-170.
Shewry, M. C. and H. P. Wynn (1988). Maximum entropy sampling and simulation codes. In: Proc. 12th
World Congress on Scientific Computation, Vol. 2, IMAC88, 517-519.
Sloan, I. H. and S. Joe (1994). Lattice Methods for Multiple Integration. Oxford Science Publications,
Oxford.
Smolyak, S. A. (1963). Quadrature and interpolation formulas for tensor products of certain classes of
functions. Soviet Math. Dokl. 4, 240-243.
Stein, M. L. (1987). Large sample properties of simulations using Latin hypercube sampling. Technometrics
29(2), 143-151.
Stein, M. L. (1989). Comment: Design and analysis of computer experiments. Statist. Sci. 4(4), 432-433.
Steinberg, D. M. (1985). Model robust response surface designs: Scaling two-level factorials. Biometrika
72, 513-26.
Tang, B. (1992). Latin hypercubes and supersaturated designs. Dissertation, Dept. of Statistics and Actuarial
Science, University of Waterloo.
Tang, B. (1993). Orthogonal array-based Latin hypercubes. J. Amer. Statist. Assoc. 88, 1392-1397.
Wahba, G. (1978). Interpolating surfaces: High order convergence rates and their associated designs, with applications to X-ray image reconstruction. Tech. report 523, Statistics Department, University of Wisconsin, Madison.
Wahba, G. (1990). Spline Models for Observational Data. CBMS-NSF Regional Conference Series in
Applied Mathematics, Vol. 59. SIAM, Philadelphia, PA.
Wasilkowski, G. (1993). Integration and approximation of multivariate functions: Average case complexity
with Wiener measure. Bull Amer. Math. Soc. (N. S.) 28, 308-314. Full version J. Approx. Theory 77,
212-227.
Wozniakowski H. (1991). Average case complexity of multivariate integration. Bull. Amer. Math. Soc.
(N. S.) 24, 185-194.

Welch, W. J. (1983). A mean squared error criterion for the design of experiments. Biometrika 70(1),
201-213.
Welch, W. J., T. K. Yu, S. M. Kang and J. Sacks (1990). Computer experiments for quality control by parameter design. J. Quality Technol. 22, 15-22.
Welch, W. J., R. J. Buck, J. Sacks, H. P. Wynn, T. J. Mitchell and M. D. Morris (1992). Screening, prediction, and computer experiments. Technometrics 34(1), 15-25.
Yaglom, A. M. (1987). Correlation Theory of Stationary and Related Random Functions, Vol. 1. Springer,
New York.
Ylvisaker, D. (1975). Designs on random fields. In: J. N. Srivastava, ed., A Survey of Statistical Design and Linear Models. North-Holland, Amsterdam, 593-607.
Young, A. S. (1977). A Bayesian approach to prediction using polynomials. Biometrika 64, 309-317.
Yu, Z., G. G. Y. Chang and R. W. Dutton (1982). Supplementary report on SEDAN II. TR-G201 12, Stanford Electronics Laboratories.
