EzGP: Easy-To-Interpret Gaussian Process Models for Computer Experiments with Both Quantitative and Qualitative Factors
arXiv:2203.10130v1 [stat.ME] 18 Mar 2022
Abstract. Computer experiments with both quantitative and qualitative (QQ) inputs are commonly used in
science and engineering applications. Constructing desirable emulators for such computer exper-
iments remains a challenging problem. In this article, we propose an easy-to-interpret Gaussian
process (EzGP) model for computer experiments to reflect the change of the computer model under
the different level combinations of qualitative factors. The proposed modeling strategy, based on
an additive Gaussian process, is flexible to address the heterogeneity of computer models involving
multiple qualitative factors. We also develop two useful variants of the EzGP model to achieve computational efficiency for data with high dimensionality and large sizes. The merits of these models
are illustrated by several numerical examples and a real data application.
Key words. Additive model; Big Data; Categorical Data; Emulator; Kriging.
on constructing correlations between the levels for each qualitative factor, and then use the
multiplicative structure to link them with the correlation functions for quantitative factors
[21, 35]. Such a multiplicative correlation function requires the “shape” of local variation as
a function of quantitative factors to be the same for all level combinations of the qualitative
factors; that is, the correlation parameters and process variances are the same for different
qualitative level combinations [33]. This is a strong assumption since computer models can be
quite different for distinct qualitative level combinations, especially when there are multiple
qualitative factors. Such a way to construct correlation functions for qualitative factors is
also applied to the additive GP models in [4]. Yet, it may not be interpretable in practice.
As an illustration, consider a computer experiment with one quantitative factor x and two
qualitative factors z1 and z2 each having two nominal values. For its four different qualitative
level combinations, the corresponding computer models are 3x, 4 sin(1.5x), x^3 and ln(x), which
are shown in Figure 1. Here, it is not easy to interpret if one simply uses a scalar value, i.e., the
correlation between two levels for each qualitative factor, to quantify the complex relationship
between different functions of computer models [4, 21]. It would be more natural to use
indicator functions to reflect the GP being adjusted from a base GP under the different level
combinations of the qualitative factors.
Figure 1. The response curve with respect to the quantitative factor x under each level combination of the two qualitative factors: y(x, z1 = 1, z2 = 1) = 3x; y(x, z1 = 1, z2 = 2) = 4 sin(1.5x); y(x, z1 = 2, z2 = 1) = x^3; y(x, z1 = 2, z2 = 2) = ln(x).
In this article, we first lay out a general additive GP structure, and then make several
reasonable assumptions to appropriately adopt indicator functions in developing the proposed
EzGP model. The proposed method has a clear interpretation of its additive covariance struc-
ture to reflect the relationship between the response and the quantitative factors and identify
how the qualitative factors affect such relations. It is suitable for dealing with discontinuities
in response surfaces due to qualitative factors in computer experiments. The key idea of the
proposed EzGP model is to start with a base GP accounting for only quantitative factors,
and have GP components in an additive fashion to adjust the different level combinations
of the qualitative factors. It follows a similar spirit of using indicator functions in variance
decomposition, but at the scope of Gaussian process under each level of the qualitative fac-
tors. Compared to existing models using scalars to quantify the correlations between levels in
the qualitative factors [4, 21], the EzGP model does not explicitly construct the correlation
functions for the qualitative factors. Instead, it quantifies the relationships among different
response surfaces under the different level combinations of the qualitative factors through an
additive combination of several GPs, which leads to an easy-to-interpret covariance structure.
The EzGP model is proposed for computer experiments with QQ inputs where multiple
EZGP MODELS FOR QQ FACTORS 3
qualitative factors are involved. Specifically, we focus on complex computer experiments where
differences between the computer models for the distinct level combinations are large. In such
cases, the types of functions in computer models can be different, e.g. the toy example in
Figure 1, the simulation example in [33] and some real data in [5, 24]. Based on the EzGP
model, we further develop a variant, the efficient EzGP (EEzGP) model, suitable for computer
experiments with a large number of qualitative factors. We also develop another variant, the
localized EzGP (LEzGP) method, to efficiently deal with large sample sizes.
The remainder of this article is organized as follows. Section 2 provides a brief introduction
to GP models and reviews existing methods. Section 3 details the proposed EzGP, EEzGP
and LEzGP methods. Section 4 presents several numerical examples and section 5 reports a
real application of the proposed models. Section 6 concludes this work and discusses some
future work. All proofs and technical details are relegated to the Appendix.
2. Notation and Literature Review. In this section, we introduce notation and review
some current literature. Throughout this paper, we consider an n-run computer experiment
with p quantitative factors and q qualitative factors. We denote the ith quantitative factor as x^{(i)} (i = 1, . . . , p) and the jth qualitative factor as z^{(j)} (j = 1, . . . , q). There are m_j levels ({1, . . . , m_j}) of the qualitative factor z^{(j)}. Denote the kth (k = 1, . . . , n) data input as w_k = (x_k^T, z_k^T)^T, where x_k = (x_{k1}, . . . , x_{kp})^T ∈ R^p is the quantitative part and z_k = (z_{k1}, . . . , z_{kq})^T ∈ N^q is the qualitative part (coded in levels) of the input. Denote Y(w_k) as the output from the input w_k, and the response (or output) vector y = (Y(w_1), . . . , Y(w_n))^T.
In the standard GP model [14, 23, 25], the inputs are all quantitative and the outputs
can be viewed as realizations of a GP. The correlation between outputs is determined by
a stationary correlation function, e.g., Gaussian, power-exponential and Matérn correlation
functions. To model the relationship between outputs Y (x) and inputs x, one popular GP
model, known as an ordinary GP (Kriging) model, assumes
(2.1) Y(x) = µ + G(x),
where µ is the constant mean and G(x) is a GP with zero mean and covariance function φ(·) = σ^2 R(·|θ). A popular choice for R(·|θ) is the Gaussian correlation function
(2.2) R(x_i, x_j | θ) = exp{−∑_{k=1}^{p} θ_k (x_{ik} − x_{jk})^2},
where the two inputs are x_i = (x_{i1}, . . . , x_{ip})^T and x_j = (x_{j1}, . . . , x_{jp})^T, and the correlation parameters θ = (θ_1, . . . , θ_p)^T satisfy θ_k > 0 (k = 1, . . . , p).
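To make the notation concrete, the Gaussian correlation function in (2.2) can be sketched in a few lines of NumPy; the function name and test values below are our own illustration, not part of the paper.

```python
import numpy as np

def gauss_corr(xi, xj, theta):
    """Gaussian correlation R(xi, xj | theta) from (2.2)."""
    xi, xj, theta = map(np.asarray, (xi, xj, theta))
    return np.exp(-np.sum(theta * (xi - xj) ** 2))

# identical inputs are perfectly correlated
print(gauss_corr([0.2, 0.5], [0.2, 0.5], [1.0, 2.0]))  # -> 1.0
```

Larger θ_k values shrink the correlation faster along the kth quantitative dimension.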
To deal with QQ inputs, a popular GP based model [21, 35] was introduced among many
others [8, 27, 33, 34]. Specifically, an ordinary GP model with a multiplicative covariance
function is considered (for any two inputs w1 and w2 ):
(2.3) Cov(Y(w_1), Y(w_2)) = σ^2 ∏_{j=1}^{q} τ^{(j)}_{z_{1j} z_{2j}} R(x_1, x_2 | θ),

where the parameter τ^{(j)}_{z_{1j} z_{2j}} represents the correlation between two levels (z_{1j} and z_{2j}) in the
4 Q. XIAO, A. MANDAL, CD. LIN AND X. DENG
jth qualitative factor z^{(j)}, and R(x_1, x_2 | θ) is defined in (2.2). Denote T_j = (τ^{(j)}_{z_{1j} z_{2j}})_{m_j × m_j} as the correlation matrix for z^{(j)} (j = 1, . . . , q). Three different functions of τ^{(j)}_{z_{1j} z_{2j}} can be used:
1. the exchangeable correlation function (EC) [12]: τ^{(j)}_{z_{1j} z_{2j}} = c (0 < c < 1) when z_{1j} ≠ z_{2j}; otherwise, τ^{(j)}_{z_{1j} z_{2j}} = 1;
2. the multiplicative correlation function (MC) [19]: τ^{(j)}_{z_{1j} z_{2j}} = exp{−(θ_{z_{1j}} + θ_{z_{2j}})} when z_{1j} ≠ z_{2j}; otherwise, τ^{(j)}_{z_{1j} z_{2j}} = 1, where θ_{z_{1j}}, θ_{z_{2j}} > 0;
3. the unrestrictive correlation function (UC) [21, 35]: define T_j = L_j L_j^T, where L_j is a lower triangular matrix; for the rth row (l^{(j)}_{r1}, . . . , l^{(j)}_{rr}) in L_j, l^{(j)}_{11} = 1, and for r = 2, . . . , m_j,
l^{(j)}_{r1} = cos(ϕ_{j,r,1}),
l^{(j)}_{rs} = sin(ϕ_{j,r,1}) · · · sin(ϕ_{j,r,s−1}) cos(ϕ_{j,r,s}), for s = 2, . . . , r − 1,
l^{(j)}_{rr} = sin(ϕ_{j,r,1}) · · · sin(ϕ_{j,r,r−1}).
A related line of work [4] considers an additive GP model, whose covariance function takes the form
(2.4) Cov(Y(w_1), Y(w_2)) = ∑_{j=1}^{q} σ_j^2 τ^{(j)}_{z_{1j} z_{2j}} R(x_1, x_2 | θ^{(j)}),
where σ_j^2 and θ^{(j)} (j = 1, . . . , q) are the process variance parameters and the correlation parameters corresponding to z^{(j)}, respectively. The same as above, the three different choices of τ^{(j)}_{z_{1j} z_{2j}}, namely the exchangeable, multiplicative and unrestrictive correlation functions, can be adopted in (2.4). We denote the resulting models as the AD EC, AD MC and AD UC models, respectively. Note that if any τ^{(j)}_{z_{1j} z_{2j}} has a zero (or near-zero) value, the overall covariance in (2.3) will be zero (or near zero). Such problems are avoided in the additive model structure in (2.4).
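A minimal sketch of the contrast between the multiplicative covariance (2.3) and the additive covariance (2.4), here with the exchangeable (EC) choice of τ; all parameter values and function names are our own illustrative assumptions. With a near-zero τ for one factor, the product in (2.3) collapses the whole covariance, while the additive form keeps the other factor's contribution:

```python
import numpy as np

def gauss_corr(x1, x2, theta):
    return np.exp(-np.sum(np.asarray(theta) * (np.asarray(x1) - np.asarray(x2)) ** 2))

def cov_multiplicative(w1, w2, sigma2, theta, tau):
    """Multiplicative covariance (2.3): sigma^2 * prod_j tau_j * R(x1, x2 | theta).
    tau[j] is the EC off-diagonal correlation for qualitative factor j."""
    (x1, z1), (x2, z2) = w1, w2
    t = np.prod([tau[j] if z1[j] != z2[j] else 1.0 for j in range(len(z1))])
    return sigma2 * t * gauss_corr(x1, x2, theta)

def cov_additive(w1, w2, sigma2s, thetas, tau):
    """Additive covariance (2.4): sum_j sigma_j^2 * tau_j * R(x1, x2 | theta_j)."""
    (x1, z1), (x2, z2) = w1, w2
    return sum(
        s2 * (tau[j] if z1[j] != z2[j] else 1.0) * gauss_corr(x1, x2, th)
        for j, (s2, th) in enumerate(zip(sigma2s, thetas))
    )

w1 = ([0.1, 0.9], [1, 1])
w2 = ([0.3, 0.4], [1, 2])
# with a near-zero tau for factor 2, the product in (2.3) collapses ...
print(cov_multiplicative(w1, w2, 1.0, [1.0, 1.0], tau=[0.5, 1e-8]))
# ... while the additive form (2.4) keeps the contribution of factor 1
print(cov_additive(w1, w2, [1.0, 1.0], [[1.0, 1.0], [1.0, 1.0]], tau=[0.5, 1e-8]))
```

The two printed values differ by many orders of magnitude, illustrating the near-zero-τ problem described above.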
3. Easy-to-Interpret Gaussian Process (EzGP) Models. In this section, we first lay out
a general additive GP structure and describe in details the proposed EzGP model. Then
we illustrate the Efficient EzGP (EEzGP) model for data with many qualitative factors, and
discuss the Localized EzGP (LEzGP) method for data of large run sizes.
3.1. The EzGP Model. For an n-run computer experiment with p quantitative factors
and q qualitative factors, we model the output at w = (x^T, z^T)^T as
(3.1) Y(w) = µ + G(w).
It means that for any level combination of z, Y(w) is a Gaussian process. Specifically, we consider
(3.2) G(w) = G_0(x) + ∑_{h=1}^{q} G_{z^{(h)}}(x),
where G_0 and G_{z^{(h)}} (h = 1, . . . , q) are independent Gaussian processes with mean zero and covariance functions φ_0 and φ_h (h = 1, . . . , q), respectively. Here, G_0 is a standard GP taking
only quantitative inputs x, which can be viewed as the base GP reflecting the intrinsic relation
between y and x. The standard Gaussian covariance function is adopted for G0 , which is
(3.3) φ_0(x_i, x_j | θ_0) = σ_0^2 exp{−∑_{k=1}^{p} θ_k^{(0)} (x_{ik} − x_{jk})^2},
where the correlation parameters in θ_0 = (θ_1^{(0)}, . . . , θ_p^{(0)})^T are all positive.
The G_{z^{(h)}} can be viewed as an adjustment to the base GP by the impact of the qualitative factor z^{(h)} (h = 1, . . . , q). It is a GP component concerning the hth qualitative factor coupled with all quantitative factors. A general covariance function could be
(3.4) φ_h((x_i^T, z_{ih})^T, (x_j^T, z_{jh})^T) = σ_h^2 exp{−∑_{k=1}^{p} θ_{k z_{ih} z_{jh}}^{(h)} (x_{ik} − x_{jk})^2} τ_{z_{ih} z_{jh}}^{(h)},
where the correlation parameters θ_{k z_{ih} z_{jh}}^{(h)} (k = 1, . . . , p) are specific to the pair of levels (z_{ih}, z_{jh}) in z^{(h)}, and τ_{z_{ih} z_{jh}}^{(h)} and σ_h^2 are defined in (2.3) and (2.4), respectively. Clearly, such
a general form involves too many parameters and thus is hard to interpret. Note that the additive model in [4] can be viewed as a special case of the general model in (3.1) when simplifying all θ_{k z_{ih} z_{jh}}^{(h)} = θ_k^{(h)} for any (z_{ih}, z_{jh}) in (3.4) and not considering the base G_0 in (3.2). As discussed in section 1, such a simplification may not be reasonable in some practical cases. Below, we will introduce the EzGP model, which simplifies (3.4) in a more meaningful way. When there are at least two qualitative factors, its structural formulation will be different from the additive model in [4] regardless of the choice for τ_{z_{ih} z_{jh}}^{(h)}.
For the formulation in (3.2), the base GP G_0 is adjusted by the GP component G_{z^{(h)}} to account for the effect of different levels in z^{(h)}. This is analogous to using indicator functions in variance decomposition. To enable an easy-to-interpret model, we consider the covariance
function of G_{z^{(h)}}, h = 1, . . . , q, as
(3.5) φ_h((x_i^T, z_{ih})^T, (x_j^T, z_{jh})^T | Θ^{(h)}) = σ_h^2 exp{−∑_{k=1}^{p} θ_{k l_h}^{(h)} (x_{ik} − x_{jk})^2} I(z_{ih} = z_{jh} ≡ l_h),
where z_{ih} and z_{jh} are the levels of z^{(h)} in the ith and jth inputs, respectively; σ_h^2 is the variance parameter for z^{(h)}; l_h takes values in {1, . . . , m_h} and m_h is the number of levels in z^{(h)}; Θ^{(h)} = (θ_{k l_h}^{(h)})_{p×m_h} is the matrix of correlation parameters; and the indicator function I(z_{ih} = z_{jh} ≡ l_h) = 1 for z_{ih} = z_{jh} ≡ l_h, otherwise 0. Without any prior information,
we can assume that different levels in z^{(h)} will result in different and independent Gaussian processes, and thus φ_h((x_i^T, z_{ih})^T, (x_j^T, z_{jh})^T | Θ^{(h)}) = 0 when z_{ih} ≠ z_{jh}. For distinct levels l_h in z^{(h)}, the parameters θ_{k l_h}^{(h)} are different; thus we have different Gaussian processes to depict the different computer models associated with the different levels of the qualitative factors. Such a strategy makes the GP model structure parsimonious and easy to interpret, and it avoids directly modeling the correlation functions of qualitative factors.
Based on (3.2), (3.3), and (3.5), for any two inputs w_i and w_j, the covariance function for the model in (3.1) can be specified by
(3.6) φ(w_i, w_j) = Cov(Y(w_i), Y(w_j)) = φ_0(x_i, x_j | θ_0) + ∑_{h=1}^{q} φ_h((x_i^T, z_{ih})^T, (x_j^T, z_{jh})^T | Θ^{(h)}).
This aggregated covariance function has (1 + p + q + p ∑_{h=1}^{q} m_h) parameters, which are estimated simultaneously via maximum likelihood estimation. The following example illustrates the formulation of the EzGP model and its implication.
Example 3.1. Consider a computer experiment with two quantitative factors x^{(1)} and x^{(2)}, and two qualitative factors z^{(1)} and z^{(2)} each having two levels. Suppose that three inputs are w_1 = (x_{11} = a, x_{12} = b, z_{11} = 1, z_{12} = 2)^T, w_2 = (x_{21} = c, x_{22} = d, z_{21} = 1, z_{22} = 2)^T and w_3 = (x_{31} = c, x_{32} = d, z_{31} = 2, z_{32} = 1)^T, where a, b, c and d are arbitrary real numbers. Here x_{ij} and z_{ij} represent the jth entries in x_i and z_i, respectively. According to the covariance
function in (3.6), we have
φ(w_1, w_2) = σ_0^2 exp{−∑_{k=1}^{2} θ_k^{(0)} (x_{1k} − x_{2k})^2} + σ_1^2 ∑_{l_1=1}^{2} exp{−∑_{k=1}^{2} θ_{k l_1}^{(1)} (x_{1k} − x_{2k})^2} I(z_{11} = z_{21} ≡ l_1)
+ σ_2^2 ∑_{l_2=1}^{2} exp{−∑_{k=1}^{2} θ_{k l_2}^{(2)} (x_{1k} − x_{2k})^2} I(z_{12} = z_{22} ≡ l_2)
= σ_0^2 exp{−θ_1^{(0)} (a − c)^2 − θ_2^{(0)} (b − d)^2} + σ_1^2 exp{−θ_{11}^{(1)} (a − c)^2 − θ_{21}^{(1)} (b − d)^2} + σ_2^2 exp{−θ_{12}^{(2)} (a − c)^2 − θ_{22}^{(2)} (b − d)^2}.
Similarly, we have
φ(w_1, w_3) = σ_0^2 exp{−θ_1^{(0)} (a − c)^2 − θ_2^{(0)} (b − d)^2}.
Clearly, Cov(Y(w_1), Y(w_2)) > Cov(Y(w_1), Y(w_3)). In the EzGP model, it is straightforward to derive that all variances are equal; that is, Var(Y(w_i)) = ∑_{i=0}^{q} σ_i^2. Thus, we have Cor(Y(w_1), Y(w_2)) > Cor(Y(w_1), Y(w_3)), which is meaningful for interpretation. Given that the inputs w_2 and w_3 have the same quantitative part, it is intuitive that (w_1, w_2) should be more similar than (w_1, w_3), since w_1 and w_2 have the same qualitative part but w_1 and w_3 do not. Thus, the correlation between (w_1, w_2) should be larger than that between (w_1, w_3) when at least one of the qualitative factors is significant (i.e., one of σ_1^2 and σ_2^2 is not 0). Refer to Example 3.7 for an opposite case.
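The comparison in Example 3.1 can be checked numerically with a small sketch of the EzGP covariance in (3.6); the parameter values below (theta0, Theta, the variances) are arbitrary illustrative choices, not from the paper.

```python
import numpy as np

def ezgp_cov(wi, wj, sigma2, theta0, Theta):
    """EzGP covariance (3.6): base phi_0 plus one indicator-gated phi_h per factor.
    sigma2 = (sigma_0^2, ..., sigma_q^2); Theta[h] has one theta-column per level."""
    (xi, zi), (xj, zj) = wi, wj
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    val = sigma2[0] * np.exp(-np.sum(theta0 * (xi - xj) ** 2))  # base GP, (3.3)
    for h, (zih, zjh) in enumerate(zip(zi, zj)):
        if zih == zjh:                                          # indicator in (3.5)
            th = Theta[h][:, zih - 1]                           # level-specific theta
            val += sigma2[h + 1] * np.exp(-np.sum(th * (xi - xj) ** 2))
    return val

rng = np.random.default_rng(0)
theta0 = np.array([1.0, 2.0])
Theta = [rng.uniform(0.5, 2.0, (2, 2)) for _ in range(2)]       # p=2, two 2-level factors
a, b, c, d = 0.1, 0.2, 0.7, 0.9
w1 = ([a, b], [1, 2]); w2 = ([c, d], [1, 2]); w3 = ([c, d], [2, 1])
s2 = [1.0, 1.0, 1.0]
print(ezgp_cov(w1, w2, s2, theta0, Theta) > ezgp_cov(w1, w3, s2, theta0, Theta))  # -> True
```

Since w_1 and w_3 share no qualitative levels, only the base term survives, so the inequality holds for any positive variance parameters.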
Lemma 3.2. Let A_0 = (σ_0^2 R(x_i, x_j | θ_0))_{n×n} and A_{h l_h} = (σ_h^2 R(x_i, x_j | θ_{l_h}^{(h)}))_{n×n}, where R(· | θ) is defined in (2.2) and θ_{l_h}^{(h)} = (θ_{k l_h}^{(h)})_{p×1}. The covariance matrix of the output vector y induced by the covariance function in (3.6) can be written as
(3.7) Cov(y) = (φ(w_i, w_j))_{n×n} = A_0 + ∑_{h=1}^{q} ∑_{l_h=1}^{m_h} (B_{h l_h} B_{h l_h}^T) ◦ A_{h l_h},
where ◦ is the Schur product (or Hadamard product). Here B_{h l_h} = E_h (I_{m_h})_{l_h}, where (I_{m_h})_{l_h} is the l_h-th column of the identity matrix I_{m_h}, m_h is the number of levels in z^{(h)}, and E_h is an n × m_h expansion matrix of which each row is the dummy coding for the corresponding level in z^{(h)}.
Example 3.3. To illustrate the matrices B_{h l_h} and E_h in Lemma 3.2, consider the column for z^{(1)} in a computer experiment with 4 runs. In E_1, we use the dummy coding (1, 0, 0), (0, 1, 0) and (0, 0, 1) to code levels 1, 2 and 3, respectively. For h = 1 and l_1 = 2, we have z^{(1)} = (1, 2, 3, 2)^T, so that
E_1 = [1 0 0; 0 1 0; 0 0 1; 0 1 0],  B_{12} = E_1 (I_3)_2 = (0, 1, 0, 1)^T,  B_{12} B_{12}^T = [0 0 0 0; 0 1 0 1; 0 0 0 0; 0 1 0 1].
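Example 3.3 can be reproduced directly in NumPy; this is a sketch with our own variable names:

```python
import numpy as np

# For a 4-run design with z^(1) = (1, 2, 3, 2)^T, build E_1, B_12 and the
# selector B_12 B_12^T from Lemma 3.2 / Example 3.3.
z1 = np.array([1, 2, 3, 2])
m1 = 3
E1 = np.eye(m1)[z1 - 1]          # dummy coding, one row per run
B12 = E1 @ np.eye(m1)[:, 1]      # E_1 (I_3)_2, i.e. E_1 times the 2nd column of I_3
print(B12)                       # [0. 1. 0. 1.]
print(np.outer(B12, B12))        # nonzero exactly for pairs of runs with z^(1) = 2
```

The outer product picks out the (2, 2), (2, 4), (4, 2) and (4, 4) entries, i.e. the pairs of runs sharing level 2 of z^(1).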
Lemma 3.2 provides insights into the covariance structure of the EzGP model. In (3.7), the matrix A_0 serves as the base corresponding to all quantitative inputs, the matrix B_{h l_h} B_{h l_h}^T selects all pairs of data that satisfy z_{ih} = z_{jh} ≡ l_h, and the matrix A_{h l_h} measures the adjustment due to level l_h of the qualitative factor z^{(h)} (h = 1, . . . , q). Based on Lemma 3.2, we can prove the following Lemma 3.4.
Lemma 3.4. Given n inputs wi = (xTi , zTi )T (i = 1, . . . , n), the covariance matrix of the
output vector y = (Y (w1 ), . . . , Y (wn ))T induced by the covariance function in (3.6) is positive
semi-definite, i.e., Cov(y) in (3.7) is positive semi-definite.
Lemma 3.4 holds for any w_1, . . . , w_n, including duplicated inputs. For appropriate model inference, Cov(y) needs to be positive definite, and the following Lemma 3.5 and Corollary 3.6 shed some light on this aspect.
Lemma 3.5. Given n inputs w_i = (x_i^T, z_i^T)^T (i = 1, . . . , n), if there exists an h (h = 1, . . . , q) such that any two inputs w_i and w_j (i ≠ j) have distinct quantitative parts (x_i ≠ x_j) whenever they have the same level in z^{(h)}, then the covariance matrix Cov(y) induced by the covariance function in (3.6) is positive definite.
Corollary 3.6. If there are no duplicated runs in the quantitative part of the design matrix, that is, x_i ≠ x_j for i ≠ j, the covariance matrix Cov(y) induced by the covariance function in (3.6) is positive definite.
Corollary 3.6 is a special case of Lemma 3.5, and its assumption is standard in computer
experiments. If Latin hypercube designs or space-filling designs [2] are used for quantitative
factors, it is clear that Cov(y) is positive definite by Corollary 3.6. When the conditions in
Lemma 3.5 are not satisfied, one can simply add a nugget term to make the covariance matrix
positive definite, which is a standard technique in Kriging [14, 23, 22].
Besides the additive covariance function in (3.6), one could think of using indicator functions under a multiplicative covariance structure:
(3.8) φ*(w_i, w_j) = Cov(Y(w_i), Y(w_j)) = σ^2 R(x_i, x_j | θ_0) ∏_{h=1}^{q} R(x_i, x_j | Θ^{(h)})
= σ^2 exp{−∑_{k=1}^{p} θ_k^{(0)} (x_{ik} − x_{jk})^2} × ∏_{h=1}^{q} ∏_{l_h=1}^{m_h} [exp{−∑_{k=1}^{p} θ_{k l_h}^{(h)} (x_{ik} − x_{jk})^2}]^{I(z_{ih} = z_{jh} ≡ l_h)}.
However, such a covariance function may not properly quantify the correlation for two inputs
as illustrated in the following example.
Example 3.7. For the three inputs w1 , w2 and w3 in Example 3.1, under the multiplicative
covariance function in (3.8), we have:
φ*(w_1, w_2) = σ^2 exp{−∑_{k=1}^{2} θ_k^{(0)} (x_{1k} − x_{2k})^2} exp{−∑_{k=1}^{2} θ_{k1}^{(1)} (x_{1k} − x_{2k})^2} × exp{−∑_{k=1}^{2} θ_{k2}^{(2)} (x_{1k} − x_{2k})^2}
= σ^2 exp{−(θ_1^{(0)} + θ_{11}^{(1)} + θ_{12}^{(2)})(a − c)^2 − (θ_2^{(0)} + θ_{21}^{(1)} + θ_{22}^{(2)})(b − d)^2},
φ*(w_1, w_3) = σ^2 exp{−∑_{k=1}^{2} θ_k^{(0)} (x_{1k} − x_{3k})^2} = σ^2 exp{−θ_1^{(0)} (a − c)^2 − θ_2^{(0)} (b − d)^2}.
It is easy to derive that Cor(Y (w1 ), Y (w2 )) < Cor(Y (w1 ), Y (w3 )), which is counter-intuitive
and not interpretable. As shown in Example 3.1, Cor(Y (w1 ), Y (w2 )) should be no less than
Cor(Y (w1 ), Y (w3 )), since (w1 , w2 ) are more similar compared to (w1 , w3 ).
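Example 3.7 can likewise be checked numerically; the sketch below implements the multiplicative covariance (3.8) with our own illustrative parameter values and confirms that the pair sharing both qualitative levels ends up less correlated:

```python
import numpy as np

def mult_cov(wi, wj, sigma2, theta0, Theta):
    """Multiplicative indicator covariance (3.8): each shared qualitative level
    multiplies in an extra exp factor <= 1, so sharing levels can only shrink
    the covariance."""
    (xi, zi), (xj, zj) = wi, wj
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    d2 = (xi - xj) ** 2
    val = sigma2 * np.exp(-np.sum(theta0 * d2))
    for h, (zih, zjh) in enumerate(zip(zi, zj)):
        if zih == zjh:
            val *= np.exp(-np.sum(Theta[h][:, zih - 1] * d2))
    return val

theta0 = np.array([1.0, 1.0])
Theta = [np.ones((2, 2)), np.ones((2, 2))]
w1 = ([0.1, 0.2], [1, 2]); w2 = ([0.7, 0.9], [1, 2]); w3 = ([0.7, 0.9], [2, 1])
# counter-intuitive: the pair sharing BOTH qualitative levels is LESS correlated
print(mult_cov(w1, w2, 1.0, theta0, Theta) < mult_cov(w1, w3, 1.0, theta0, Theta))  # -> True
```

Under (3.8) all inputs have variance σ², so the covariance comparison is also a correlation comparison, matching the conclusion of Example 3.7.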
3.2. The Efficient EzGP (EEzGP) Model. The EzGP model with the covariance function in (3.6) has (2 + p + q + p ∑_{h=1}^{q} m_h) parameters. For data with many qualitative factors, this
number can be quite large, which may result in high prediction variance. In this part, we
propose a so-called Efficient EzGP (EEzGP) model for data with many qualitative factors.
The EEzGP model follows the same (3.1)–(3.3) as the EzGP, but simplifies the correlation parameter θ_{k l_h}^{(h)} in (3.5) to θ_{l_h}^{(h)}. That is, it considers G_{z^{(h)}}(x) (h = 1, . . . , q) to be a GP with the covariance function
(3.9) φ_h((x_i^T, z_{ih})^T, (x_j^T, z_{jh})^T) = σ_h^2 exp{−∑_{k=1}^{p} θ_{l_h}^{(h)} (x_{ik} − x_{jk})^2} I(z_{ih} = z_{jh} ≡ l_h).
Compared to the covariance function in (3.5), which adopts distinct correlation parameters θ_{k l_h}^{(h)} to scale each quantitative factor separately, the covariance function in (3.9) adopts a single correlation parameter θ_{l_h}^{(h)} to scale all quantitative factors together. As the EEzGP model includes a base GP component G_0, where distinct correlation parameters have been used for different quantitative factors, it may not be necessary to scale each quantitative dimension again when considering the coupled quantitative effects in the adjustment part G_{z^{(h)}}(x). Thus, such a simplification may not sacrifice much in model prediction accuracy. Examples in sections 4 and 5 will illustrate this point. When using the EEzGP model, we should always normalize the quantitative factors to the [0, 1] range. To avoid over-parameterization in (3.9), we fix θ_1^{(h)} = 1 for the first level in z^{(h)}, which can be viewed as a benchmark for the adjustment.
For two inputs w_i = (x_i^T, z_i^T)^T and w_j = (x_j^T, z_j^T)^T, the covariance function φ(w_i, w_j) (for any i, j = 1, . . . , n) in the EEzGP model is
(3.10) φ(w_i, w_j) = σ_0^2 exp{−∑_{k=1}^{p} θ_k^{(0)} (x_{ik} − x_{jk})^2} + ∑_{h=1}^{q} ∑_{l_h=1}^{m_h} σ_h^2 exp{−∑_{k=1}^{p} θ_{l_h}^{(h)} (x_{ik} − x_{jk})^2} I(z_{ih} = z_{jh} ≡ l_h).
Lemmas 3.4 and 3.5 and Corollary 3.6 in subsection 3.1 also apply to the EEzGP model, since
it is a special case of the EzGP. When Latin hypercube designs or space-filling designs are
used for quantitative factors, the covariance matrix of observed responses Cov(y) induced by
(3.10) is positive definite.
In the EEzGP model, the number of parameters is (2 + p + ∑_{h=1}^{q} m_h), which is much smaller than that in the EzGP model. A rule of thumb for the run size in computer experiments is at least 10(p + q), ten times the number of dimensions [17]. Taking m_1 = · · · = m_q = m for illustration, it is easy to show that when the number of levels in the qualitative factors m ≤ 10, the number of parameters in the EEzGP model will be less than 10(p + q).
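As a quick arithmetic check of these parameter counts (using the counts stated in subsection 3.2, and the p = q = 9, three-level setting of Examples 4.2 and 4.3 for illustration):

```python
# Parameter counts from subsection 3.2:
#   EzGP:  2 + p + q + p * sum(m_h)
#   EEzGP: 2 + p + sum(m_h)
def n_params_ezgp(p, q, m):   # m = list of level counts (m_1, ..., m_q)
    return 2 + p + q + p * sum(m)

def n_params_eezgp(p, q, m):
    return 2 + p + sum(m)

p, q, m = 9, 9, [3] * 9       # the setting of Examples 4.2 and 4.3
print(n_params_ezgp(p, q, m), n_params_eezgp(p, q, m), 10 * (p + q))  # 263 38 180
```

Here the EEzGP count (38) sits comfortably below the 10(p + q) = 180 rule-of-thumb run size, while the full EzGP count (263) exceeds it.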
3.3. The Localized EzGP (LEzGP) Method. Note that for the EzGP and EEzGP models, the computational complexity and memory space complexity are O(n^3) and O(n^2), respectively, where n is the size of the training data. To facilitate the analysis of data with large size n,
we propose the so-called LEzGP method. Its key idea is to select a proper subset of training
data to fit the EEzGP (or EzGP) model given a target input. For an input w = (xT , zT )T and
a target input w∗ = ((x∗ )T , (z∗ )T )T , we denote Nz (w, w∗ ) to be the number of same levels in
their qualitative parts (between z and z∗ ). For example, when z = (1, 2, 3)T and z∗ = (3, 2, 1)T ,
there is only one same level at the corresponding positions, and thus Nz (w, w∗ ) = 1. The
LEzGP method includes the following three steps:
Step 1. Select an appropriate tuning parameter ns ;
Step 2. For a chosen target input w∗ , select the training data wi (i ∈ {1, . . . , n}) satisfying
Nz (wi , w∗ ) ≥ ns to form the key subset, denoted as Ks ;
Step 3. Use Ks as the new training set and fit it with the EEzGP (or EzGP) model to make
prediction at the target input w∗ in Step 2.
Clearly, the value of ns determines the size of the key subset Ks . It means that the data
points in Ks have at least ns number of the same levels as the target input in their qualitative
parts. The following example illustrates the first two steps in the LEzGP method.
Example 3.8. Consider a computer experiment with five runs, one quantitative factor and four qualitative factors. Its design matrix D is shown below. Suppose that the chosen target input is w* = (0.3, 1, 2, 3, 1)^T and the tuning parameter is n_s = 3. Then the key subset K_s will only include those runs that have at least 3 levels in common with w* in their qualitative parts z.
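Steps 1 and 2 of the LEzGP method amount to a simple filter on the qualitative part of the design; here is a sketch with a made-up toy design (the design matrix of Example 3.8 is not reproduced in this extraction, so the rows below are our own):

```python
import numpy as np

def key_subset(Z, z_star, ns):
    """Steps 1-2 of LEzGP: keep runs whose qualitative part shares at least
    ns levels (position-wise) with the target input's qualitative part."""
    Z = np.asarray(Z)
    matches = (Z == np.asarray(z_star)).sum(axis=1)  # N_z(w_i, w*) per run
    return np.flatnonzero(matches >= ns)

# toy design: qualitative parts of five runs (q = 4 factors)
Z = [[1, 2, 3, 1],
     [2, 2, 3, 1],
     [1, 1, 1, 1],
     [3, 2, 3, 2],
     [1, 2, 1, 1]]
print(key_subset(Z, z_star=[1, 2, 3, 1], ns=3))  # -> [0 1 4]
```

Raising n_s from 3 to 4 keeps only the exact qualitative match (run 0), illustrating how sharply n_s controls the size of K_s.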
One primary rationale of the LEzGP method is that predictions from the GP model fitted
by a relevant subset of data can be more accurate than those from the GP model fitted by
the entire training set of large data. As shown in [10], the predicted response at target input
(a.k.a. target response) will be less accurate, if its training set contains certain responses
following significantly different GPs compared to that followed by the target response. In
computer experiments with qualitative factors, when an observed input has no or few common
qualitative levels as the target input, their responses may follow different GPs. Thus, for the
data with large size, it would be appropriate to exclude such irrelevant data points in predicting
the target input.
Generally speaking, in the LEzGP method, a larger ns chosen in Step 1 would lead to
a smaller key subset Ks in Step 2 and less computation required in Step 3. A larger ns
and smaller K_s will not necessarily reduce the prediction accuracy. Choosing a proper value of n_s reduces the computational cost and could improve the prediction accuracy in certain situations; refer to Example 4.3 in section 4. The proper selection of n_s often depends on
the budget (e.g. available computing resources), the design matrix of the training data and
the target input to be predicted. In practice, budget is often the key constraint. In most
cases, one can easily choose an appropriate ns , because a small difference in ns will lead to a
big difference in the run size of Ks ; see Example 4.3.
For a large-size computer experiment with QQ inputs, one general suggestion on n_s is to choose q/2 < n_s ≤ n_up, where n_up is the largest integer such that the size of its key subset K_s is larger than the number of parameters in the model. A rule of thumb for the choice of n_s would be q/2 < n_s ≤ n*_up, where n*_up is the integer such that the size of its corresponding key subset is closest to 10(p + q) [17]. A too-small n_s is not desirable since it will require more computing resources. We suggest using n_s > q/2 here, which guarantees that every pair of data points in K_s will have at least one common level in the same qualitative factor.
We would like to note that the LEzGP method can use other models in its Step 3, but adopting the EEzGP (or EzGP) model appears to provide better justification.
The underlying assumption of the LEzGP method is that when two inputs have an increased number of common levels in the qualitative factors, these two inputs are more relevant and thus their correlation should increase. In the covariance function (3.10) (or (3.6)), more positive covariance components due to shared qualitative levels are added when two inputs have more common levels l_h in z^{(h)} (h = 1, . . . , q), which exactly matches the assumption of the LEzGP method.
3.4. Parameter Estimation. The EzGP model with the covariance function in (3.6) contains the parameters µ, σ_0^2, θ_k^{(0)}, σ_h^2 and θ_{k l_h}^{(h)}, where h = 1, . . . , q, k = 1, . . . , p and l_h = 1, . . . , m_h. Denote the vector σ^2 = (σ_0^2, σ_1^2, . . . , σ_q^2)^T and the matrix Θ = (θ^{(0)}, Θ^{(1)}, . . . , Θ^{(q)}), where θ^{(0)} = (θ_k^{(0)})_{p×1} and Θ^{(h)} = (θ_{k l_h}^{(h)})_{p×m_h}. Denote the covariance matrix by Φ = Φ(σ^2, Θ) = (φ(w_i, w_j))_{n×n}, which follows the covariance function in (3.6). Under the GP model in (3.1) and after dropping some constants, maximizing the log-likelihood function l(µ, σ^2, Θ) is equivalent to minimizing log|Φ| + (y − µ1)^T Φ^{−1} (y − µ1). For given σ^2 and Θ, the maximum likelihood estimator (MLE) of µ is µ̂ = (1^T Φ^{−1} 1)^{−1} 1^T Φ^{−1} y. Thus we can obtain the profile likelihood for the MLE of σ^2 and Θ:
(3.11) [σ̂^2, Θ̂] = argmin { log|Φ| + y^T Φ^{−1} y − (1^T Φ^{−1} 1)^{−1} (1^T Φ^{−1} y)^2 }.
This minimization problem can be solved via some standard global optimization algorithms
in R or Matlab, such as genetic algorithms [22, 18]. In this work, we adopt the R package
“rgenoud” [20] which combines evolutionary search algorithms [15] with the derivative-based
quasi-Newton methods to solve difficult optimization problems. In particular, we have derived all parameters' analytical gradients to facilitate the computation, which are reported in Appendix B.
Given the parameters µ, σ^2 and Θ, the prediction at a new location w* is the conditional mean:
(3.12) Ŷ(w*) = µ̂ + γ^T Φ^{−1} (y − µ̂1),
where γ = (φ(w*, w_1), . . . , φ(w*, w_n))^T. When w* coincides with an observed input w_i, γ^T is the ith row of Φ; thus, γ^T Φ^{−1} is the ith row of ΦΦ^{−1}, which is a row vector with its ith entry being 1 and otherwise 0. Therefore, it is straightforward to show Ŷ(w*) = Ŷ(w_i) = y_i by (3.12). Similar
parameter estimations apply for the EEzGP model which is a special case of the EzGP model.
More details on the derivations for the general GP model can be found in [14, 23].
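A sketch of the profile-likelihood objective in (3.11) and the MLE of µ. In the paper the minimization is carried out with rgenoud in R; the NumPy code below only evaluates the objective for a given covariance matrix Φ, so the function names are our own:

```python
import numpy as np

def neg_profile_loglik(Phi, y):
    """Objective in (3.11) with constants dropped:
    log|Phi| + y^T Phi^{-1} y - (1^T Phi^{-1} 1)^{-1} (1^T Phi^{-1} y)^2."""
    one = np.ones(len(y))
    _, logdet = np.linalg.slogdet(Phi)
    Pinv_y = np.linalg.solve(Phi, y)   # solve instead of forming Phi^{-1}
    Pinv_1 = np.linalg.solve(Phi, one)
    return logdet + y @ Pinv_y - (one @ Pinv_y) ** 2 / (one @ Pinv_1)

def mu_hat(Phi, y):
    """MLE of mu given the covariance parameters."""
    one = np.ones(len(y))
    Pinv_y = np.linalg.solve(Phi, y)
    Pinv_1 = np.linalg.solve(Phi, one)
    return (one @ Pinv_y) / (one @ Pinv_1)

# tiny sanity check with an identity covariance: mu_hat is the sample mean
y = np.array([1.0, 2.0, 3.0])
print(mu_hat(np.eye(3), y))  # -> 2.0
```

Using linear solves rather than an explicit matrix inverse is the standard numerically stable way to evaluate such kriging likelihoods.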
4. Simulation Study. In this section, we use three numerical examples to examine per-
formances of our proposed models. We measure the performance via the root mean square
error (RMSE) for predictions:
RMSE = √{ (1/n_t) ∑_{i=1}^{n_t} (Ŷ(w_i) − Y(w_i))^2 },
where nt is the number of data points in the test set, Ŷ (wi ) and Y (wi ) are the predicted
and actual responses of the input w_i in the test set. In addition, we use the Nash-Sutcliffe efficiency (NSE), where Ȳ denotes the average of the predicted responses. The NSE represents an estimate of the proportion of the response variability explained by the model, which is analogous to the R^2 in linear regression. Generally speaking, a method with a lower RMSE will yield a higher NSE.
Example 4.1. Consider a computer experiment with p = 3 quantitative factors and q = 3
qualitative factors each having 3 levels, and its computer model has the following form (x =
(x1 , x2 , x3 )):
y = f_i(x) × (g_j(x) + h_k(x)),
where i, j, k are the levels of the first, second and third qualitative factors, 0 ≤ x_i ≤ 1 for i = 1, 2, 3, and the functions f_i, g_j and h_k are listed below:
f_1(x) = x_1 + x_2^2 + x_3^3,  f_2(x) = x_1^2 + x_2 + x_3^3,  f_3(x) = x_1^3 + x_2^2 + x_3,
g_1(x) = cos(x_1) + cos(2x_2) + cos(3x_3),  g_2(x) = cos(3x_1) + cos(2x_2) + cos(x_3),  g_3(x) = cos(2x_1) + cos(x_2) + cos(3x_3),
h_1(x) = sin(x_1) + sin(2x_2) + sin(3x_3),  h_2(x) = sin(3x_1) + sin(2x_2) + sin(x_3),  h_3(x) = sin(2x_1) + sin(x_2) + sin(3x_3).
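For reference, the computer model of Example 4.1 can be coded directly; the dictionary encoding of the exponents and cosine/sine frequencies is our own arrangement:

```python
import numpy as np

def f(x, i):
    # f_1, f_2, f_3: permutations of the exponents (1, 2, 3) over x_1, x_2, x_3
    powers = {1: (1, 2, 3), 2: (2, 1, 3), 3: (3, 2, 1)}[i]
    return sum(x[k] ** p for k, p in enumerate(powers))

def g(x, j):
    # g_1, g_2, g_3: cosine frequencies per coordinate
    freqs = {1: (1, 2, 3), 2: (3, 2, 1), 3: (2, 1, 3)}[j]
    return sum(np.cos(c * x[k]) for k, c in enumerate(freqs))

def h(x, k_level):
    # h_1, h_2, h_3: sine frequencies per coordinate
    freqs = {1: (1, 2, 3), 2: (3, 2, 1), 3: (2, 1, 3)}[k_level]
    return sum(np.sin(c * x[k]) for k, c in enumerate(freqs))

def y(x, i, j, k):
    """Computer model of Example 4.1: y = f_i(x) * (g_j(x) + h_k(x))."""
    return f(x, i) * (g(x, j) + h(x, k))

print(y([0.5, 0.5, 0.5], 1, 1, 1))
```

Mixing the multiplicative factor f_i with the additive pair g_j + h_k is what makes this test function even-handed between multiplicative and additive GP models.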
In Example 4.1, the computer model includes both multiplicative and additive structures,
which leads to a fair comparison of different multiplicative and additive GP models. In
Figure 2, we show the boxplots of RMSEs for the EzGP, EEzGP, EC, MC, UC, AD EC, AD MC and AD UC models over 50 simulations. In each simulation, an 81-run design is used, where three replicates of a 3^3 full factorial design are adopted for the qualitative factors and a random Latin hypercube design is adopted for the quantitative factors. The RMSEs are computed based on a 1215-run test set consisting of 45 replicates of a 3^3 full factorial design for the qualitative factors and a random Latin hypercube design for the quantitative factors.
Figure 2 clearly shows that the EzGP and EEzGP models perform better than other mod-
els with smaller RMSEs. Here, the EzGP model performs the best, and it has more parameters
than the EEzGP model. For experiments with a relatively small number of quantitative and
qualitative factors, the EzGP model is usually preferred due to its flexibility. The median
NSE value for the EzGP model here is as high as 0.92, which is analogous to achieving an
R2 = 0.92 in linear regression. Thus, the EzGP model fits the data well. It should be noted
that the multiplicative models EC, MC and UC perform better than the additive models:
AD EC, AD MC and AD UC here. Thus, the success of EzGP and EEzGP methods is not
because this simulation setting is in favor of additive models. In this example, the computer
models are of different expressions for the distinct level combinations of the qualitative factors.
The key idea of the proposed models is to use the indicator functions appropriately in the GP covariance function to make the response surface differ under the different level combinations of the qualitative factors. Thus, the superior performances of the EzGP and EEzGP
methods here could be explained by using the meaningful additive covariance structures via
the indicator functions.
Figure 2. Boxplots of RMSEs for the eight compared models in Example 4.1 (y-axis: RMSE; x-axis: models).
where 0 ≤ x_i ≤ 1 for i = 1, . . . , 9. Here, the nine qualitative factors z^{(1)}, . . . , z^{(9)} correspond to the functions f^{(1)}, f^{(2)}, f^{(3)}, g^{(1)}, g^{(2)}, g^{(3)}, h^{(1)}, h^{(2)} and h^{(3)}, respectively, and i_1, i_2, i_3, j_1, j_2, j_3, k_1, k_2, k_3 ∈ {1, 2, 3} are the levels for these nine qualitative factors. We list the functions f, g and h below:
As illustrated in subsection 3.2, it is not recommended to use the EzGP model for computer
experiments with many factors, and thus we do not compare it here.
[Figure: boxplots of RMSE for the compared models]
Example 4.3. This example examines the performance of the proposed LEzGP method. Consider a computer experiment with n = 19,683 runs, p = 9 quantitative factors and q = 9 qualitative factors, each having 3 levels. The computer models are the same as those in Example 4.2. A 19,683-run design is used, with a random Latin hypercube design for the quantitative factors and a 3^9 full factorial design for the qualitative factors. The RMSEs are computed based on a test set consisting of m = 100 data points, where a random Latin hypercube design is used for the quantitative factors and a single random level combination is used for the qualitative factors.
We replicate this simulation 50 times and display the boxplots of RMSEs in Figure 4.
For such a computer experiment with a large run size n, it is difficult to directly apply the existing GP models (the EzGP, EEzGP, EC, MC, UC, AD EC, AD MC and AD UC models); thus, the proposed LEzGP method is well suited here. For the LEzGP, it is straightforward to show that the number of data points in the key subset $K_s$ is $m^q\big[1 - \sum_{i=0}^{n_s-1} \binom{q}{i} (1/m)^i (1 - 1/m)^{q-i}\big]$, where m = 3 is the number of levels per qualitative factor and $n_s$ is the tuning parameter. We set the tuning parameter $n_s = 7$ according to the rule of thumb in subsection 3.3, and consequently the LEzGP method selects a $K_s$ of 163 training points from the overall 19,683. The LEzGP method significantly reduces the computation and memory required for model estimation.
In Figure 4, we compare the performance of the LEzGP method with that of the EEzGP model in Example 4.2, since both examples use the same computer model. From Figure 4, the LEzGP method provides more accurate predictions using only 163 training points, compared with the EEzGP model using 243 training points. The median NSE for the LEzGP method here is 0.87, larger than the 0.76 for the EEzGP model. Moreover, the success of the LEzGP method also provides some justification for an assumption of our proposed models: a data point will not contribute much to the prediction at a target input if it shares no level with the target in its qualitative part. Note that when the tuning parameter $n_s$ is 5, 6, 7, 8 or 9 ($n_s$ must be an integer), the corresponding training set $K_s$ will include 2851, 835, 163, 19 and 1 runs, respectively. Clearly, a $K_s$ with 1 or 19 runs is too small, and a $K_s$ with 2851 runs can be too big for the LEzGP model with 38 parameters in this example. Additional results have shown that using $n_s = 6$ (with 835 runs) and $n_s = 7$ (with 163 runs) leads to very similar prediction performance. Thus, the rule of thumb $n_s = 7$ is preferred, as it requires much less computation.
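Assuming the 3^9 full factorial training design described above, the key-subset sizes reported for each $n_s$ can be reproduced by direct enumeration; the following is an illustrative sketch, not the authors' implementation.

```python
import itertools
from math import comb

def key_subset_size(runs, target, n_s):
    """Number of training runs sharing at least n_s qualitative levels
    with the target input (the key-subset criterion, illustrative)."""
    return sum(1 for z in runs
               if sum(a == b for a, b in zip(z, target)) >= n_s)

q, m = 9, 3                                         # q factors, m levels each
runs = list(itertools.product(range(m), repeat=q))  # 3^9 = 19,683-run full factorial
target = (0,) * q                                   # any fixed level combination
for n_s in (5, 6, 7, 8, 9):
    n_ks = key_subset_size(runs, target, n_s)
    # closed form: m^q * P(Binomial(q, 1/m) >= n_s)
    assert n_ks == sum(comb(q, i) * (m - 1) ** (q - i) for i in range(n_s, q + 1))
    print(n_s, n_ks)    # prints 5 2851, 6 835, 7 163, 8 19, 9 1
```

The enumeration matches the binomial closed form, and in particular gives the 2851, 835, 163, 19 and 1 subset sizes quoted in the text.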
Figure 4. The boxplots of RMSEs for the LEzGP in Example 4.3 and the EEzGP in Example 4.2
5. Real Data Analysis. In this section, we apply the proposed models to a real computer
experiment with p = 1 quantitative factor and q = 3 qualitative factors. A fully 3D coupled
finite element model has been calibrated and verified by successfully modeling the performance
of a full-scale embankment constructed on soft soil [24]. Figure 5 (reproduced from [4]) illustrates the structure of this full-scale embankment, where sub-figure (a) is the finite element mesh and sub-figure (b) is a schematic view of the embankment constructed on foundation soil. The finite element discretization here had 36,802 elements and 69,667 nodes. The average run-time for one case of this size is approximately 9 hours via a 12-node supercomputer at the High Performance Computing Virtual Laboratory (HPCVL). In this computer experiment, the three qualitative factors are the "embankment construction rate" (z^{(1)} = 1, 5, 10 m/month), "Young's modulus of columns" (z^{(2)} = 50, 100, 200 MPa), and "reinforcement stiffness" (z^{(3)} = 1578, 4800, 8000 kN/m). The single quantitative factor x^{(1)} is the distance from the embankment shoulder to the embankment center line. The response here is the final embankment crest settlement, an important indicator of embankment performance. The training set of this computer experiment has 261 runs. The quantitative factor x^{(1)} takes 29 values uniformly from the interval [0, 14]. For each distinct value of x^{(1)}, a 9-run, 3-factor, 3-level fractional factorial design is used for the qualitative factors. The test set has 29 runs where x^{(1)} takes the 29 values uniformly from the interval [0, 14] and (z^{(1)}, z^{(2)}, z^{(3)}) = (5, 100, 4800). Note that this level combination of the qualitative factors does not appear in the training set.
To evaluate the proposed methods, we compare the EzGP and EEzGP models with the
EC, MC, UC and AD UC models as in [4]. We repeat each model estimation 100 times as in
[4]. Figure 6(a) shows the boxplots of log(RMSE) for different models, and it clearly shows
that the EzGP, EEzGP and AD UC models perform much better than the EC, MC and UC models.
16 Q. XIAO, A. MANDAL, CD. LIN AND X. DENG
Then, we further compare the EzGP, EEzGP and AD UC models in Figure 6(b) and
Table 1. For the AD UC method, we exclude outliers in Figure 6(b). From the figure and
table, it is clear that the EzGP model performs the best in terms of both mean and median
log(RMSE), and it is also the most robust one with the smallest standard deviation. Note
that there is only one quantitative factor and three qualitative factors here. For cases with only a few factors, the EzGP model is usually preferred due to its flexibility. The average NSE for the EzGP model is 0.77, which is viewed as high in practice. In the EzGP model, the estimate of $\sigma_0^2$ appears to be the largest among those of $\sigma_1^2$, $\sigma_2^2$ and $\sigma_3^2$ in each replication. This indicates a significant base GP between the output and the quantitative inputs. It makes practical sense that the distance from the embankment shoulder to the embankment center line has a significant impact on the final embankment crest settlement [24]. In addition, the estimate of the variance parameter $\sigma_1^2$ is larger than those of $\sigma_2^2$ and $\sigma_3^2$, which suggests that the embankment construction rate (z^{(1)}) may have a stronger impact on the output than the other two qualitative factors.
[Figure 6: boxplots of Log(RMSE): (a) all compared models; (b) the EzGP, EEzGP and AD UC models]
6. Discussion. In this work, we propose the EzGP model for computer experiments with
both quantitative and qualitative factors, and develop its two useful variants, EEzGP for
Table 1
Comparison between methods in terms of Log(RMSE)
Mean Median SD
EzGP −3.026 −3.026 0.0005
EEzGP −3.025 −3.025 0.0040
AD UC −3.021 −3.021 0.0056
data with many factors and LEzGP for data with many runs. The proposed models have easy-to-interpret covariance structures and can provide desirable prediction performance. Specifically, the proposed models are suitable for handling complex computer experiments with quantitative factors and multiple qualitative factors, where the computer models are
very different for the distinct level combinations of the qualitative factors. Note that the proposed methods quantify the underlying response surfaces of the quantitative factors differently under the different level combinations of the qualitative factors via the additive GP structure. Hence, they are more flexible in quantifying the variance and correlation structure of the quantitative factors compared to [21], while they could be a bit more restrictive in quantifying the correlation of the qualitative factors due to the use of indicator functions.
The current paper focuses on the "first-order" GP components $G_{z^{(h)}}$ (h = 1, . . . , q) in the EzGP framework, which are analogous to main effects in the context of GPs. Further research can include the "second-order" GP components $G_{z^{(h)}z^{(s)}}$ (h, s = 1, . . . , q and h ≠ s), which can be viewed as adjustments for the interaction of $z^{(h)}$ and $z^{(s)}$. One can consider the covariance function $\phi_{hs}((\mathbf{x}_i^T, z_{ih}, z_{is})^T, (\mathbf{x}_j^T, z_{jh}, z_{js})^T) = I(z_{ih} = z_{jh} \equiv l_h)\, I(z_{is} = z_{js} \equiv l_s)\, \sigma_{hs}^2 \exp\{-\sum_{k=1}^{p} \theta_{k l_h l_s}^{(hs)} (x_{ik} - x_{jk})^2\}$. However, such an EzGP (or EEzGP) model may contain too many parameters, and thus may over-fit the data in practice.
Here, we would like to remark that the proposed EzGP framework can provide good interpretation of the importance of qualitative factors via the variance parameters. To obtain robust variance parameter estimates and avoid overly complex model structures, one could add a penalty term on the variance parameters to the likelihood function in the proposed models. Penalized likelihoods have been used for GP modeling in the literature [11], and variable screening for computer experiments with QQ inputs can be another topic of future research.
Further enhancing the LEzGP method will also be of interest. Better strategies for selecting the tuning parameter $n_s$ need to be investigated, and other subset-selection methods, e.g. the localization method in [7], may also be useful for the LEzGP method. In addition, one issue of the current LEzGP method is that when the target inputs contain many different level combinations of the qualitative factors, model estimation can still be computationally cumbersome if the goal is to predict the whole response surface. One possible solution is to arrange the target inputs into a few groups according to their level combinations, and then apply a more flexible LEzGP method to each of these groups.
Good experimental designs usually have significant impacts on both computer and physical
experiments [6, 26, 29]. For the standard GP models, space-filling designs are usually preferred
[30, 16, 28]. The marginally coupled designs were proposed for computer experiments with
QQ inputs [3], but their run and factor sizes are not flexible. Construction of good space-filling
designs of flexible sizes for GP models with QQ inputs remains a challenging problem.
Since $A_0 = (\sigma_0^2 R(\mathbf{x}_i, \mathbf{x}_j|\boldsymbol{\theta}_0))_{n\times n}$ and $A_{hl_h} = (\sigma_h^2 R(\mathbf{x}_i, \mathbf{x}_j|\boldsymbol{\theta}_{l_h}^{(h)}))_{n\times n}$ with the Gaussian correlation function R(·|θ), it is straightforward that the matrices $A_0$ and $A_{hl_h}$ (h = 1, . . . , q and $l_h$ = 1, . . . , $m_h$) are all positive semi-definite [23]. By definition, it is clear that the matrices $B_{hl_h}B_{hl_h}^T$ (h = 1, . . . , q and $l_h$ = 1, . . . , $m_h$) are also positive semi-definite. According to Theorem 7.5.3 in [9], we have
Lemma A.1. (Schur Product Theorem) Let A and B be n × n positive semi-definite matrices; then their Schur product A ◦ B is positive semi-definite.
By Lemma A.1, all $(B_{hl_h}B_{hl_h}^T) \circ A_{hl_h}$ are positive semi-definite. As a sum of positive semi-definite matrices is still positive semi-definite, the covariance matrix Cov(y) is positive semi-definite.
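The Schur product theorem invoked above can be illustrated with a quick numerical check (illustrative only, not part of the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = A @ A.T   # positive semi-definite by construction
B = rng.standard_normal((5, 5)); B = B @ B.T
C = A * B                                      # Schur (elementwise) product
min_eig = np.linalg.eigvalsh(C).min()
print(min_eig >= -1e-8)                        # True: C stays positive semi-definite
```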
Proof of Lemma 3.5. In the EzGP model with the covariance function in (3.6), the covariance matrix of y can be written as $\mathrm{Cov}(\mathbf{y}) = (\phi(\mathbf{w}_i, \mathbf{w}_j))_{n\times n} = \Phi_0 + \Phi_1 + \cdots + \Phi_q$, where $\Phi_0 = (\sigma_0^2 R(\mathbf{x}_i, \mathbf{x}_j|\boldsymbol{\theta}_0))_{n\times n}$ with the Gaussian correlation function R(·|θ) is positive semi-definite. Let $\mathbf{z}^{(h)}$ be the n × 1 column vector of the hth qualitative factor. There exists an n × n permutation matrix P such that $\mathbf{P}\mathbf{z}^{(h)}$ is the sorted vector $(1, \ldots, 1, 2, \ldots, 2, \ldots, m_h, \ldots, m_h)^T$. Let $\tilde{\Phi}_h = \mathbf{P}\Phi_h\mathbf{P}^T$ be the covariance matrix corresponding to the data permuted by P. By (3.5), we have $\phi_h((\mathbf{x}_i, z_{ih}), (\mathbf{x}_j, z_{jh})|\Theta^{(h)}) = 0$ when $z_{ih} \neq z_{jh}$, and thus $\tilde{\Phi}_h$ is block diagonal:
$$\tilde{\Phi}_h = \begin{pmatrix} B_1^{(h)} & & & \\ & B_2^{(h)} & & \\ & & \ddots & \\ & & & B_{m_h}^{(h)} \end{pmatrix}.$$
For $l = 1, \ldots, m_h$, let $n_l$ be the number of observations at level l in $\mathbf{z}^{(h)}$; then $B_l^{(h)} = (\sigma_h^2 R(\mathbf{x}_i, \mathbf{x}_j|\boldsymbol{\theta}_l^{(h)}))_{n_l\times n_l}$ with $R(\mathbf{x}_i, \mathbf{x}_j|\boldsymbol{\theta}_l^{(h)}) = \exp\{-\sum_{k=1}^{p} \theta_{kl}^{(h)}(x_{ik} - x_{jk})^2\}$, which is a Gaussian correlation function. Thus, $B_l^{(h)}$ is positive semi-definite for $l = 1, \ldots, m_h$, and then $\tilde{\Phi}_h$ is positive semi-definite. Since $\tilde{\Phi}_h = \mathbf{P}\Phi_h\mathbf{P}^T$, it is straightforward to prove that $\Phi_h$ is also positive semi-definite. If there exists an h (h = 1, . . . , q) such that $\mathbf{x}_i \neq \mathbf{x}_j$ whenever $z_{ih} = z_{jh}$, all diagonal blocks $B_l^{(h)}$ ($l = 1, \ldots, m_h$) in $\tilde{\Phi}_h$ are positive definite, and thus $\tilde{\Phi}_h$ and then $\Phi_h$ are positive definite. Finally, $\Phi = \Phi_0 + \Phi_1 + \cdots + \Phi_q$ is positive definite.
where the covariance matrix Φ depends on the parameters $\sigma^2$ and Θ. Writing the first-order conditions results in an analytical expression for $\hat{\boldsymbol{\mu}}$ as a function of $\sigma^2$ and Θ, and plugging it in gives
$$-2l(\sigma^2, \boldsymbol{\Theta}) = n\log(2\pi) + \log|\Phi| + (\mathbf{y} - \hat{\boldsymbol{\mu}})^T \Phi^{-1} (\mathbf{y} - \hat{\boldsymbol{\mu}}).$$
For any parameter inside Φ (denoted by •), the expression of the analytical gradient given $\hat{\boldsymbol{\mu}}$ is
$$-2\frac{\partial l}{\partial \bullet} = \mathrm{tr}\left(\Phi^{-1}\frac{\partial \Phi}{\partial \bullet}\right) - (\mathbf{y} - \hat{\boldsymbol{\mu}})^T \Phi^{-1} \frac{\partial \Phi}{\partial \bullet} \Phi^{-1} (\mathbf{y} - \hat{\boldsymbol{\mu}}).$$
Specifically, for the EzGP model with the covariance function in (3.6), for any i, j = 1, . . . , n, we have:
$$\frac{\partial \Phi}{\partial \sigma_0^2} = \left(\exp\Big\{-\sum_{k=1}^{p} \theta_k^{(0)}(x_{ik} - x_{jk})^2\Big\}\right)_{n\times n},$$
$$\frac{\partial \Phi}{\partial \sigma_h^2} = \left(\sum_{l_h=1}^{m_h} I(z_{ih} = z_{jh} \equiv l_h)\exp\Big\{-\sum_{k=1}^{p} \theta_{kl_h}^{(h)}(x_{ik} - x_{jk})^2\Big\}\right)_{n\times n}, \quad \text{for } h = 1, \ldots, q,$$
$$\frac{\partial \Phi}{\partial \theta_{k^*}^{(0)}} = \left(-\sigma_0^2 (x_{ik^*} - x_{jk^*})^2 \exp\Big\{-\sum_{k=1}^{p} \theta_k^{(0)}(x_{ik} - x_{jk})^2\Big\}\right)_{n\times n},$$
$$\frac{\partial \Phi}{\partial \theta_{k^* l_{h^*}}^{(h^*)}} = \left(-\sigma_{h^*}^2 (x_{ik^*} - x_{jk^*})^2 \exp\Big\{-\sum_{k=1}^{p} \theta_{kl_{h^*}}^{(h^*)}(x_{ik} - x_{jk})^2\Big\}\, I(z_{ih^*} = z_{jh^*} \equiv l_{h^*})\right)_{n\times n}.$$
The above expressions of the likelihood and analytical gradients also apply to the EEzGP model, since it is a special case of the EzGP model.
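The likelihood and gradient identities above can be sanity-checked numerically; the following sketch uses our own helper names and a finite-difference comparison, and is not tied to any EzGP implementation.

```python
import numpy as np

def neg2_loglik(Phi, y, mu_hat):
    """-2 log-likelihood: n log(2*pi) + log|Phi| + (y - mu)^T Phi^{-1} (y - mu)."""
    n = len(y)
    r = y - mu_hat
    _, logdet = np.linalg.slogdet(Phi)          # numerically stable log-determinant
    return n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(Phi, r)

def neg2_grad(Phi, dPhi, y, mu_hat):
    """-2 dl/d(.) = tr(Phi^{-1} dPhi) - (y - mu)^T Phi^{-1} dPhi Phi^{-1} (y - mu)."""
    r = np.linalg.solve(Phi, y - mu_hat)        # Phi^{-1} (y - mu)
    return np.trace(np.linalg.solve(Phi, dPhi)) - r @ dPhi @ r

# toy check: Phi(t) = t * I, differentiate with respect to t at t = 2
y, mu = np.array([1.0, 0.0, 0.0]), np.zeros(3)
g = neg2_grad(2.0 * np.eye(3), np.eye(3), y, mu)
eps = 1e-6
fd = (neg2_loglik((2 + eps) * np.eye(3), y, mu)
      - neg2_loglik((2 - eps) * np.eye(3), y, mu)) / (2 * eps)
print(abs(g - fd) < 1e-5)    # analytic and finite-difference gradients agree
```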
REFERENCES
[1] H. Bhuiyan, J. Chen, M. Khan, and M. V. Marathe, Fast parallel algorithms for edge-switching to achieve a target visit rate in heterogeneous graphs, in 2014 43rd International Conference on Parallel Processing (ICPP), IEEE, 2014, pp. 60–69.
[2] A. Dean, M. Morris, J. Stufken, and D. Bingham, Handbook of Design and Analysis of Experiments,
Chapman & Hall/CRC, Boca Raton, 2015.
[3] X. Deng, Y. Hung, and C. D. Lin, Design for computer experiments with qualitative and quantitative
factors, Statist. Sinica, 25 (2015), pp. 1567–1581.
[4] X. Deng, C. D. Lin, K.-W. Liu, and R. Rowe, Additive Gaussian process for computer models with
qualitative and quantitative factors, Technometrics, 59 (2017), pp. 283–292.
[5] X. Du, R. Grandin, and L. Leifsson, Surrogate modeling of ultrasonic simulations using data-driven
methods, in AIP Conference Proceedings, vol. 36, AIP Publishing, 2017, pp. 150002–1–150002–9.
[6] K.-T. Fang, R. Li, and A. Sudjianto, Design and Modeling for Computer Experiments, Chapman &
Hall/CRC, Boca Raton, 2005.
[7] R. B. Gramacy and D. W. Apley, Local Gaussian process approximation for large computer experi-
ments, J. Comput. Graph. Statist., 24 (2015), pp. 561–578.
[8] G. Han, T. J. Santner, W. I. Notz, and D. L. Bartel, Prediction for computer experiments having
quantitative and qualitative input variables, Technometrics, 51 (2009), pp. 278–288.
[9] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge university press, New York, 2013.
[10] H. Huang, D. K. Lin, M. Liu, and J. Yang, Computer experiments with both qualitative and quanti-
tative variables, Technometrics, 58 (2016), pp. 495–507.
[11] Y. Hung, Penalized blind kriging in computer experiments, Statist. Sinica, 21 (2011), pp. 1171–1190.
[12] V. R. Joseph and J. D. Delaney, Functionally induced priors for the analysis of experiments, Techno-
metrics, 49 (2007), pp. 1–11.
[13] C. G. Kaufman, D. Bingham, S. Habib, K. Heitmann, and J. A. Frieman, Efficient emulators of
computer experiments using compactly supported correlation functions, with an application to cosmol-
ogy, Ann. Appl. Stat., (2011), pp. 2470–2492.
[14] J. P. Kleijnen, Kriging metamodeling in simulation: a review, European J. Oper. Res., 192 (2009),
pp. 707–716.
[15] C. D. Lin, C. M. Anderson-Cook, M. S. Hamada, L. M. Moore, and R. R. Sitter, Using genetic
algorithms to design experiments: a review, Qual. Reliab. Eng. Int., 31 (2015), pp. 155–167.
[16] C. D. Lin and B. Tang, Latin hypercubes and space-filling designs, Handbook of design and analysis of
experiments, (2015), pp. 593–625.
[17] J. L. Loeppky, J. Sacks, and W. J. Welch, Choosing the sample size of a computer experiment: a
practical guide, Technometrics, 51 (2009), pp. 366–376.
[18] B. MacDonald, P. Ranjan, and H. Chipman, GPfit: an R package for Gaussian process model fitting
using a new optimization algorithm, arXiv preprint arXiv:1305.0759, (2013).
[19] N. J. McMillan, J. Sacks, W. J. Welch, and F. Gao, Analysis of protein activity data by Gaussian
stochastic process models, J. Biopharm. Stat., 9 (1999), pp. 145–160.
[20] W. R. J. Mebane and J. S. Sekhon, Genetic optimization using derivatives: the rgenoud package for
R, Journal of Statistical Software, 42 (2011), pp. 1–26, http://www.jstatsoft.org/v42/i11/.
[21] P. Z. G. Qian, H. Wu, and C. J. Wu, Gaussian process models for computer experiments with qualitative
and quantitative factors, Technometrics, 50 (2008), pp. 383–396.
[22] P. Ranjan, R. Haynes, and R. Karsten, A computationally stable approach to Gaussian process
interpolation of deterministic computer simulation data, Technometrics, 53 (2011), pp. 366–378.
[23] C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine Learning, MIT press, Cam-
bridge, 2006.
[24] R. K. Rowe and K.-W. Liu, Three-dimensional finite element modelling of a full-scale geosynthetic-
reinforced, pile-supported embankment, Canadian Geotechnical Journal, 52 (2015), pp. 2041–2054.
[25] J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn, Design and analysis of computer experi-
ments, Statist. Sci., 4 (1989), pp. 409–423.
[26] T. J. Santner, B. J. Williams, and W. I. Notz, The Design and Analysis of Computer Experiments,
Springer, New York, 2003.
[27] L. P. Swiler, P. D. Hough, P. Qian, X. Xu, C. Storlie, and H. Lee, Surrogate models for
mixed discrete-continuous variables, in Constraint Programming and Decision Making, Springer, 2014,
pp. 181–202.
[28] L. Wang, Q. Xiao, H. Xu, et al., Optimal maximin L1-distance Latin hypercube designs based on good lattice point designs, Ann. Statist., 46 (2018), pp. 3741–3766.
[29] Q. Xiao, L. Wang, and H. Xu, Application of kriging models for a drug combination experiment on
lung cancer, Stat. Med., 38 (2019), pp. 236–246.
[30] Q. Xiao and H. Xu, Construction of maximin distance Latin squares and related Latin hypercube designs,
Biometrika, 104 (2017), pp. 455–464.
[31] Q. Xiao and H. Xu, Construction of maximin distance designs via level permutation and expansion,
Statist. Sinica, 28 (2018), pp. 1395–1414.
[32] Q. Zhang, P. Chien, Q. Liu, L. Xu, and Y. Hong, Mixed-input Gaussian process emulators for
computer experiments with a large number of categorical levels, J. Qual. Technol., (2020), pp. 1–11.
[33] Y. Zhang and W. I. Notz, Computer experiments with qualitative and quantitative variables: a review
and reexamination, Quality Engineering, 27 (2015), pp. 2–13.
[34] Y. Zhang, S. Tao, W. Chen, and D. W. Apley, A latent variable approach to Gaussian process
modeling with qualitative and quantitative factors, Technometrics, (2019), pp. 1–12, https://doi.org/
10.1080/00401706.2019.1638834.
[35] Q. Zhou, P. Z. Qian, and S. Zhou, A simple approach to emulation for computer models with qualitative
and quantitative factors, Technometrics, 53 (2011), pp. 266–273.