EzGP: Easy-To-Interpret Gaussian Process Models for Computer Experiments with Both Quantitative and Qualitative Factors
arXiv:2203.10130v1 [stat.ME] 18 Mar 2022
Abstract. Computer experiments with both quantitative and qualitative (QQ) inputs are commonly used in
science and engineering applications. Constructing desirable emulators for such computer exper-
iments remains a challenging problem. In this article, we propose an easy-to-interpret Gaussian
process (EzGP) model for computer experiments to reflect the change of the computer model under
the different level combinations of qualitative factors. The proposed modeling strategy, based on
an additive Gaussian process, is flexible to address the heterogeneity of computer models involving
multiple qualitative factors. We also develop two useful variants of the EzGP model to achieve computational efficiency for data with high dimensionality and large sizes. The merits of these models
are illustrated by several numerical examples and a real data application.
Key words. Additive model; Big Data; Categorical Data; Emulator; Kriging.
on constructing correlations between the levels for each qualitative factor, and then use the
multiplicative structure to link them with the correlation functions for quantitative factors
[21, 35]. Such a multiplicative correlation function requires the “shape” of local variation as
a function of quantitative factors to be the same for all level combinations of the qualitative
factors; that is, the correlation parameters and process variances are the same for different
qualitative level combinations [33]. This is a strong assumption since computer models can be
quite different for distinct qualitative level combinations, especially when there are multiple
qualitative factors. Such a way to construct correlation functions for qualitative factors is
also applied to the additive GP models in [4]. Yet, it may not be interpretable in practice.
As an illustration, consider a computer experiment with one quantitative factor x and two
qualitative factors z1 and z2 each having two nominal values. For its four different qualitative
level combinations, the corresponding computer models are 3x, 4 sin(1.5x), x^3 and ln(x), which
are shown in Figure 1. Here, it is not easy to interpret if one simply uses a scalar value, i.e., the
correlation between two levels for each qualitative factor, to quantify the complex relationship
between different functions of computer models [4, 21]. It would be more natural to use
indicator functions to reflect the GP being adjusted from a base GP under the different level
combinations of the qualitative factors.
Figure 1. The response curve with respect to the quantitative factor x under each level combination of the two qualitative factors: y(x, z1 = 1, z2 = 1) = 3x; y(x, z1 = 1, z2 = 2) = 4 sin(1.5x); y(x, z1 = 2, z2 = 1) = x^3; y(x, z1 = 2, z2 = 2) = ln(x).
In this article, we first lay out a general additive GP structure, and then make several
reasonable assumptions to appropriately adopt indicator functions in developing the proposed
EzGP model. The proposed method has a clear interpretation of its additive covariance struc-
ture to reflect the relationship between the response and the quantitative factors and identify
how the qualitative factors affect such relations. It is suitable for dealing with discontinuities
in response surfaces due to qualitative factors in computer experiments. The key idea of the
proposed EzGP model is to start with a base GP accounting for only quantitative factors,
and have GP components in an additive fashion to adjust the different level combinations
of the qualitative factors. It follows a similar spirit of using indicator functions in variance
decomposition, but at the scope of Gaussian process under each level of the qualitative fac-
tors. Compared to existing models using scalars to quantify the correlations between levels in
the qualitative factors [4, 21], the EzGP model does not explicitly construct the correlation
functions for the qualitative factors. Instead, it quantifies the relationships among different
response surfaces under the different level combinations of the qualitative factors through an
additive combination of several GPs, which leads to an easy-to-interpret covariance structure.
The EzGP model is proposed for computer experiments with QQ inputs where multiple
EZGP MODELS FOR QQ FACTORS 3
qualitative factors are involved. Specifically, we focus on complex computer experiments where
differences between the computer models for the distinct level combinations are large. In such
cases, the types of functions in computer models can be different, e.g. the toy example in
Figure 1, the simulation example in [33] and some real data in [5, 24]. Based on the EzGP
model, we further develop a variant, the efficient EzGP (EEzGP) model, suitable for computer
experiments with a large number of qualitative factors. We also develop another variant, the
localized EzGP (LEzGP) method, to efficiently deal with large sample sizes.
The remainder of this article is organized as follows. Section 2 provides a brief introduction
to GP models and reviews existing methods. Section 3 details the proposed EzGP, EEzGP
and LEzGP methods. Section 4 presents several numerical examples and section 5 reports a
real application of the proposed models. Section 6 concludes this work and discusses some
future work. All proofs and technical details are relegated to the Appendix.
2. Notation and Literature Review. In this section, we introduce notation and review
some current literature. Throughout this paper, we consider an n-run computer experiment
with p quantitative factors and q qualitative factors. We denote the ith quantitative factor as x^{(i)} (i = 1, . . . , p) and the jth qualitative factor as z^{(j)} (j = 1, . . . , q). There are m_j levels ({1, . . . , m_j}) of the qualitative factor z^{(j)}. Denote the kth (k = 1, . . . , n) data input as w_k = (x_k^T, z_k^T)^T, where x_k = (x_{k1}, . . . , x_{kp})^T ∈ R^p is the quantitative part and z_k = (z_{k1}, . . . , z_{kq})^T ∈ N^q is the qualitative part (coded in levels) of the input. Denote Y(w_k) as the output from the input w_k, and the response (or output) vector y = (Y(w_1), . . . , Y(w_n))^T.
In the standard GP model [14, 23, 25], the inputs are all quantitative and the outputs
can be viewed as realizations of a GP. The correlation between outputs is determined by
a stationary correlation function, e.g., Gaussian, power-exponential and Matérn correlation
functions. To model the relationship between outputs Y (x) and inputs x, one popular GP
model, known as an ordinary GP (Kriging) model, assumes
(2.1) Y(x) = µ + G(x),
where µ is the constant mean and G(x) is a GP with zero mean and covariance function φ(·) = σ^2 R(·|θ). A popular choice for R(·|θ) is the Gaussian correlation function
(2.2) R(x_i, x_j | θ) = exp{−∑_{k=1}^{p} θ_k (x_{ik} − x_{jk})^2},
where the two inputs are x_i = (x_{i1}, . . . , x_{ip})^T and x_j = (x_{j1}, . . . , x_{jp})^T, and the correlation parameters θ = (θ_1, . . . , θ_p)^T satisfy θ_k > 0 (k = 1, . . . , p).
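To make the notation concrete, the Gaussian correlation function in (2.2) can be sketched in a few lines of NumPy; the function name and test values below are our own illustration, not part of the paper.

```python
import numpy as np

def gauss_corr(xi, xj, theta):
    """Gaussian correlation R(xi, xj | theta) from (2.2)."""
    xi, xj, theta = map(np.asarray, (xi, xj, theta))
    return np.exp(-np.sum(theta * (xi - xj) ** 2))

# identical inputs are perfectly correlated
print(gauss_corr([0.2, 0.5], [0.2, 0.5], [1.0, 2.0]))  # -> 1.0
```

Larger θ_k values shrink the correlation faster along the kth quantitative dimension.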
To deal with QQ inputs, a popular GP based model [21, 35] was introduced among many
others [8, 27, 33, 34]. Specifically, an ordinary GP model with a multiplicative covariance
function is considered (for any two inputs w1 and w2 ):
(2.3) Cov(Y(w_1), Y(w_2)) = σ^2 ∏_{j=1}^{q} τ^{(j)}_{z_{1j} z_{2j}} R(x_1, x_2 | θ),

where the parameter τ^{(j)}_{z_{1j} z_{2j}} represents the correlation between two levels (z_{1j} and z_{2j}) in the
4 Q. XIAO, A. MANDAL, CD. LIN AND X. DENG
jth qualitative factor z^{(j)}, and R(x_1, x_2 | θ) is defined in (2.2). Denote T_j = (τ^{(j)}_{z_{1j} z_{2j}})_{m_j × m_j} as the correlation matrix for z^{(j)} (j = 1, . . . , q). Three different functions of τ^{(j)}_{z_{1j} z_{2j}} can be used:
1. the exchangeable correlation function (EC) [12]: τ^{(j)}_{z_{1j} z_{2j}} = c (0 < c < 1) when z_{1j} ≠ z_{2j}; otherwise, τ^{(j)}_{z_{1j} z_{2j}} = 1;
2. the multiplicative correlation function (MC) [19]: τ^{(j)}_{z_{1j} z_{2j}} = exp{−(θ_{z_{1j}} + θ_{z_{2j}})} when z_{1j} ≠ z_{2j}; otherwise, τ^{(j)}_{z_{1j} z_{2j}} = 1, where θ_{z_{1j}}, θ_{z_{2j}} > 0;
3. the unrestrictive correlation function (UC) [21, 35]: define T_j = L_j L_j^T, where L_j is a lower triangular matrix; for the rth row (l^{(j)}_{r1}, . . . , l^{(j)}_{rr}) in L_j, l^{(j)}_{11} = 1, and for r = 2, . . . , m_j,
l^{(j)}_{r1} = cos(ϕ_{j,r,1}),
l^{(j)}_{rs} = sin(ϕ_{j,r,1}) · · · sin(ϕ_{j,r,s−1}) cos(ϕ_{j,r,s}), for s = 2, . . . , r − 1,
l^{(j)}_{rr} = sin(ϕ_{j,r,1}) · · · sin(ϕ_{j,r,r−1}).
A related line of work [4] considers an additive GP model, whose covariance function takes the form
(2.4) Cov(Y(w_1), Y(w_2)) = ∑_{j=1}^{q} σ_j^2 τ^{(j)}_{z_{1j} z_{2j}} R(x_1, x_2 | θ^{(j)}),
where σ_j^2 and θ^{(j)} (j = 1, . . . , q) are the process variance parameters and the correlation parameters corresponding to z^{(j)}, respectively. The same as above, the three different choices of τ^{(j)}_{z_{1j} z_{2j}}, namely the exchangeable, multiplicative and unrestrictive correlation functions, can be adopted in (2.4). We denote the resulting models as the AD EC, AD MC and AD UC models, respectively. Note that if any τ^{(j)}_{z_{1j} z_{2j}} has a zero (or near-zero) value, the overall covariance in (2.3) will be zero (or near zero). Such problems are avoided in the additive model structure in (2.4).
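A minimal sketch of the contrast between the multiplicative covariance (2.3) and the additive covariance (2.4), here with the exchangeable (EC) choice of τ; all parameter values and function names are our own illustrative assumptions. With a near-zero τ for one factor, the product in (2.3) collapses the whole covariance, while the additive form keeps the other factor's contribution:

```python
import numpy as np

def gauss_corr(x1, x2, theta):
    return np.exp(-np.sum(np.asarray(theta) * (np.asarray(x1) - np.asarray(x2)) ** 2))

def cov_multiplicative(w1, w2, sigma2, theta, tau):
    """Multiplicative covariance (2.3): sigma^2 * prod_j tau_j * R(x1, x2 | theta).
    tau[j] is the EC off-diagonal correlation for qualitative factor j."""
    (x1, z1), (x2, z2) = w1, w2
    t = np.prod([tau[j] if z1[j] != z2[j] else 1.0 for j in range(len(z1))])
    return sigma2 * t * gauss_corr(x1, x2, theta)

def cov_additive(w1, w2, sigma2s, thetas, tau):
    """Additive covariance (2.4): sum_j sigma_j^2 * tau_j * R(x1, x2 | theta_j)."""
    (x1, z1), (x2, z2) = w1, w2
    return sum(
        s2 * (tau[j] if z1[j] != z2[j] else 1.0) * gauss_corr(x1, x2, th)
        for j, (s2, th) in enumerate(zip(sigma2s, thetas))
    )

w1 = ([0.1, 0.9], [1, 1])
w2 = ([0.3, 0.4], [1, 2])
# with a near-zero tau for factor 2, the product in (2.3) collapses ...
print(cov_multiplicative(w1, w2, 1.0, [1.0, 1.0], tau=[0.5, 1e-8]))
# ... while the additive form (2.4) keeps the contribution of factor 1
print(cov_additive(w1, w2, [1.0, 1.0], [[1.0, 1.0], [1.0, 1.0]], tau=[0.5, 1e-8]))
```

The two printed values differ by many orders of magnitude, illustrating the near-zero-τ problem described above.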
3. Easy-to-Interpret Gaussian Process (EzGP) Models. In this section, we first lay out
a general additive GP structure and describe in details the proposed EzGP model. Then
we illustrate the Efficient EzGP (EEzGP) model for data with many qualitative factors, and
discuss the Localized EzGP (LEzGP) method for data of large run sizes.
3.1. The EzGP Model. For an n-run computer experiment with p quantitative factors
and q qualitative factors, we model the output at w = (x^T, z^T)^T as
(3.1) Y(w) = µ + G(w).
It means that for any level combination of z, Y(w) is a Gaussian process. Specifically, we consider
(3.2) G(w) = G_0(x) + ∑_{h=1}^{q} G_{z^{(h)}}(x),
where G_0 and G_{z^{(h)}} (h = 1, . . . , q) are independent Gaussian processes with mean zero and covariance functions φ_0 and φ_h (h = 1, . . . , q), respectively. Here, G_0 is a standard GP taking
only quantitative inputs x, which can be viewed as the base GP reflecting the intrinsic relation
between y and x. The standard Gaussian covariance function is adopted for G0 , which is
(3.3) φ_0(x_i, x_j | θ_0) = σ_0^2 exp{−∑_{k=1}^{p} θ_k^{(0)} (x_{ik} − x_{jk})^2},
where the correlation parameters in θ_0 = (θ_1^{(0)}, . . . , θ_p^{(0)})^T are all positive.
The G_{z^{(h)}} can be viewed as an adjustment to the base GP by the impact of the qualitative factor z^{(h)} (h = 1, . . . , q). It is a GP component concerning the hth qualitative factor coupled with all quantitative factors. A general covariance function could be
(3.4) φ_h((x_i^T, z_{ih})^T, (x_j^T, z_{jh})^T) = σ_h^2 exp{−∑_{k=1}^{p} θ_{k z_{ih} z_{jh}}^{(h)} (x_{ik} − x_{jk})^2} τ_{z_{ih} z_{jh}}^{(h)},
where the correlation parameters θ_{k z_{ih} z_{jh}}^{(h)} (k = 1, . . . , p) are specific to the pair of levels (z_{ih}, z_{jh}) in z^{(h)}, and τ_{z_{ih} z_{jh}}^{(h)} and σ_h^2 are defined in (2.3) and (2.4), respectively. Clearly, such
a general form involves too many parameters and thus is hard to interpret. Note that the additive model in [4] can be viewed as a special case of the general model in (3.1) when simplifying all θ_{k z_{ih} z_{jh}}^{(h)} = θ_k^{(h)} for any (z_{ih}, z_{jh}) in (3.4) and not considering the base G_0 in (3.2). As discussed in section 1, such a simplification may not be reasonable in some practical cases. Below, we will introduce the EzGP model, which simplifies (3.4) in a more meaningful way. When there are at least two qualitative factors, its structural formulation will be different from the additive model in [4] regardless of the choice for τ_{z_{ih} z_{jh}}^{(h)}.
For the formulation in (3.2), the base GP G_0 is adjusted by the GP component G_{z^{(h)}} to account for the effect of different levels in z^{(h)}. This is analogous to using indicator functions in variance decomposition. To enable an easy-to-interpret model, we consider the covariance
function of G_{z^{(h)}}, h = 1, . . . , q, as
(3.5) φ_h((x_i^T, z_{ih})^T, (x_j^T, z_{jh})^T | Θ^{(h)}) = σ_h^2 exp{−∑_{k=1}^{p} θ_{k l_h}^{(h)} (x_{ik} − x_{jk})^2} I(z_{ih} = z_{jh} ≡ l_h),
where z_{ih} and z_{jh} are the levels of z^{(h)} in the ith and jth inputs, respectively; σ_h^2 is the variance parameter for z^{(h)}; l_h takes values in {1, . . . , m_h} and m_h is the number of levels in z^{(h)}; Θ^{(h)} = (θ_{k l_h}^{(h)})_{p×m_h} is the matrix of correlation parameters; and the indicator function I(z_{ih} = z_{jh} ≡ l_h) = 1 for z_{ih} = z_{jh} ≡ l_h, otherwise 0. Without any prior information,
we can assume that different levels in z^{(h)} will result in different and independent Gaussian processes, and thus φ_h((x_i^T, z_{ih})^T, (x_j^T, z_{jh})^T | Θ^{(h)}) = 0 when z_{ih} ≠ z_{jh}. For distinct levels l_h in z^{(h)}, the parameters θ_{k l_h}^{(h)} are different; thus we have different Gaussian processes to depict the different computer models associated with the different levels of the qualitative factors. Such a strategy makes the GP model structure parsimonious and easy to interpret, and it avoids directly modeling the correlation functions of qualitative factors.
Based on (3.2), (3.3), and (3.5), for any two inputs w_i and w_j, the covariance function for the model in (3.1) can be specified by
(3.6) φ(w_i, w_j) = Cov(Y(w_i), Y(w_j)) = φ_0(x_i, x_j | θ_0) + ∑_{h=1}^{q} φ_h((x_i^T, z_{ih})^T, (x_j^T, z_{jh})^T | Θ^{(h)}).
This aggregated covariance function has (1 + p + q + p ∑_{h=1}^{q} m_h) parameters, which are estimated simultaneously via maximum likelihood estimation. The following example illustrates the formulation of the EzGP model and its implication.
Example 3.1. Consider a computer experiment with two quantitative factors x^{(1)} and x^{(2)}, and two qualitative factors z^{(1)} and z^{(2)} each having two levels. Suppose that three inputs are w_1 = (x_{11} = a, x_{12} = b, z_{11} = 1, z_{12} = 2)^T, w_2 = (x_{21} = c, x_{22} = d, z_{21} = 1, z_{22} = 2)^T and w_3 = (x_{31} = c, x_{32} = d, z_{31} = 2, z_{32} = 1)^T, where a, b, c and d are arbitrary real numbers. Here x_{ij} and z_{ij} represent the jth entries in x_i and z_i, respectively. According to the covariance
function in (3.6), we have
φ(w_1, w_2) = σ_0^2 exp{−∑_{k=1}^{2} θ_k^{(0)} (x_{1k} − x_{2k})^2} + σ_1^2 ∑_{l_1=1}^{2} exp{−∑_{k=1}^{2} θ_{k l_1}^{(1)} (x_{1k} − x_{2k})^2} I(z_{11} = z_{21} ≡ l_1)
+ σ_2^2 ∑_{l_2=1}^{2} exp{−∑_{k=1}^{2} θ_{k l_2}^{(2)} (x_{1k} − x_{2k})^2} I(z_{12} = z_{22} ≡ l_2)
= σ_0^2 exp{−θ_1^{(0)} (a − c)^2 − θ_2^{(0)} (b − d)^2} + σ_1^2 exp{−θ_{11}^{(1)} (a − c)^2 − θ_{21}^{(1)} (b − d)^2} + σ_2^2 exp{−θ_{12}^{(2)} (a − c)^2 − θ_{22}^{(2)} (b − d)^2}.
Similarly, we have
φ(w_1, w_3) = σ_0^2 exp{−θ_1^{(0)} (a − c)^2 − θ_2^{(0)} (b − d)^2}.
Clearly, Cov(Y(w_1), Y(w_2)) > Cov(Y(w_1), Y(w_3)). In the EzGP model, it is straightforward to derive that all variances are equal; that is, Var(Y(w_i)) = ∑_{i=0}^{q} σ_i^2. Thus, we have Cor(Y(w_1), Y(w_2)) > Cor(Y(w_1), Y(w_3)), which is meaningful for interpretation. Given that the inputs w_2 and w_3 have the same quantitative part, it is intuitive that (w_1, w_2) should be more similar than (w_1, w_3), since w_1 and w_2 have the same qualitative part but w_1 and w_3 do not. Thus, the correlation between (w_1, w_2) should be larger than that between (w_1, w_3) when at least one of the qualitative factors is significant (i.e., one of σ_1^2 and σ_2^2 is not 0). Refer to Example 3.7 for an opposite case.
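The comparison in Example 3.1 can be checked numerically with a small sketch of the EzGP covariance in (3.6); the parameter values below (theta0, Theta, the variances) are arbitrary illustrative choices, not from the paper.

```python
import numpy as np

def ezgp_cov(wi, wj, sigma2, theta0, Theta):
    """EzGP covariance (3.6): base phi_0 plus one indicator-gated phi_h per factor.
    sigma2 = (sigma_0^2, ..., sigma_q^2); Theta[h] has one theta-column per level."""
    (xi, zi), (xj, zj) = wi, wj
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    val = sigma2[0] * np.exp(-np.sum(theta0 * (xi - xj) ** 2))  # base GP, (3.3)
    for h, (zih, zjh) in enumerate(zip(zi, zj)):
        if zih == zjh:                                          # indicator in (3.5)
            th = Theta[h][:, zih - 1]                           # level-specific theta
            val += sigma2[h + 1] * np.exp(-np.sum(th * (xi - xj) ** 2))
    return val

rng = np.random.default_rng(0)
theta0 = np.array([1.0, 2.0])
Theta = [rng.uniform(0.5, 2.0, (2, 2)) for _ in range(2)]       # p=2, two 2-level factors
a, b, c, d = 0.1, 0.2, 0.7, 0.9
w1 = ([a, b], [1, 2]); w2 = ([c, d], [1, 2]); w3 = ([c, d], [2, 1])
s2 = [1.0, 1.0, 1.0]
print(ezgp_cov(w1, w2, s2, theta0, Theta) > ezgp_cov(w1, w3, s2, theta0, Theta))  # -> True
```

Since w_1 and w_3 share no qualitative levels, only the base term survives, so the inequality holds for any positive variance parameters.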
Lemma 3.2. Let A_0 = (σ_0^2 R(x_i, x_j | θ_0))_{n×n} and A_{h l_h} = (σ_h^2 R(x_i, x_j | θ_{l_h}^{(h)}))_{n×n}, where R(· | θ) is defined in (2.2) and θ_{l_h}^{(h)} = (θ_{k l_h}^{(h)})_{p×1}. The covariance matrix of the output vector y induced by the covariance function in (3.6) can be written as
(3.7) Cov(y) = (φ(w_i, w_j))_{n×n} = A_0 + ∑_{h=1}^{q} ∑_{l_h=1}^{m_h} (B_{h l_h} B_{h l_h}^T) ◦ A_{h l_h},
where ◦ is the Schur product (or Hadamard product). Here B_{h l_h} = E_h (I_{m_h})_{l_h}, where (I_{m_h})_{l_h} is the l_h-th column of the identity matrix I_{m_h}, m_h is the number of levels in z^{(h)}, and E_h is an n × m_h expansion matrix of which each row is the dummy coding for the corresponding level in z^{(h)}.
Example 3.3. To illustrate the matrices B_{h l_h} and E_h in Lemma 3.2, consider the column for z^{(1)} in a computer experiment with 4 runs. In E_1, we use the dummy coding (1, 0, 0), (0, 1, 0) and (0, 0, 1) to code levels 1, 2 and 3, respectively. For h = 1 and l_1 = 2, we have z^{(1)} = (1, 2, 3, 2)^T, so that
E_1 = [1 0 0; 0 1 0; 0 0 1; 0 1 0],  B_{12} = E_1 (I_3)_2 = (0, 1, 0, 1)^T,  B_{12} B_{12}^T = [0 0 0 0; 0 1 0 1; 0 0 0 0; 0 1 0 1].
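Example 3.3 can be reproduced directly in NumPy; this is a sketch with our own variable names:

```python
import numpy as np

# For a 4-run design with z^(1) = (1, 2, 3, 2)^T, build E_1, B_12 and the
# selector B_12 B_12^T from Lemma 3.2 / Example 3.3.
z1 = np.array([1, 2, 3, 2])
m1 = 3
E1 = np.eye(m1)[z1 - 1]          # dummy coding, one row per run
B12 = E1 @ np.eye(m1)[:, 1]      # E_1 (I_3)_2, i.e. E_1 times the 2nd column of I_3
print(B12)                       # [0. 1. 0. 1.]
print(np.outer(B12, B12))        # nonzero exactly for pairs of runs with z^(1) = 2
```

The outer product picks out the (2, 2), (2, 4), (4, 2) and (4, 4) entries, i.e. the pairs of runs sharing level 2 of z^(1).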
Lemma 3.2 provides insights into the covariance structure of the EzGP model. In (3.7), the matrix A_0 serves as the base corresponding to all quantitative inputs, the matrix B_{h l_h} B_{h l_h}^T selects all pairs of data that satisfy z_{ih} = z_{jh} ≡ l_h, and the matrix A_{h l_h} measures the adjustment due to level l_h of the qualitative factor z^{(h)} (h = 1, . . . , q). Based on Lemma 3.2, we can prove the following Lemma 3.4.
Lemma 3.4. Given n inputs wi = (xTi , zTi )T (i = 1, . . . , n), the covariance matrix of the
output vector y = (Y (w1 ), . . . , Y (wn ))T induced by the covariance function in (3.6) is positive
semi-definite, i.e., Cov(y) in (3.7) is positive semi-definite.
Lemma 3.4 holds for any w_1, . . . , w_n, including duplicated inputs. For appropriate model inference, Cov(y) needs to be positive definite, and the following Lemma 3.5 and Corollary 3.6 shed some light on this aspect.
Lemma 3.5. Given n inputs w_i = (x_i^T, z_i^T)^T (i = 1, . . . , n), if there exists an h (h = 1, . . . , q) such that any two inputs w_i and w_j (i ≠ j) have distinct quantitative parts (x_i ≠ x_j) whenever they have the same level in z^{(h)}, then the covariance matrix Cov(y) induced by the covariance function in (3.6) is positive definite.
Corollary 3.6. If there are no duplicated runs in the quantitative part of the design matrix, that is, x_i ≠ x_j for i ≠ j, the covariance matrix Cov(y) induced by the covariance function in (3.6) is positive definite.
Corollary 3.6 is a special case of Lemma 3.5, and its assumption is standard in computer
experiments. If Latin hypercube designs or space-filling designs [2] are used for quantitative
factors, it is clear that Cov(y) is positive definite by Corollary 3.6. When the conditions in
Lemma 3.5 are not satisfied, one can simply add a nugget term to make the covariance matrix
positive definite, which is a standard technique in Kriging [14, 23, 22].
Besides the additive covariance function in (3.6), one could think of using indicator functions under a multiplicative covariance structure:
(3.8) φ*(w_i, w_j) = Cov(Y(w_i), Y(w_j)) = σ^2 R(x_i, x_j | θ_0) ∏_{h=1}^{q} R(x_i, x_j | Θ^{(h)})
= σ^2 exp{−∑_{k=1}^{p} θ_k^{(0)} (x_{ik} − x_{jk})^2} × ∏_{h=1}^{q} ∏_{l_h=1}^{m_h} [exp{−∑_{k=1}^{p} θ_{k l_h}^{(h)} (x_{ik} − x_{jk})^2}]^{I(z_{ih} = z_{jh} ≡ l_h)}.
However, such a covariance function may not properly quantify the correlation for two inputs
as illustrated in the following example.
Example 3.7. For the three inputs w1 , w2 and w3 in Example 3.1, under the multiplicative
covariance function in (3.8), we have:
φ*(w_1, w_2) = σ^2 exp{−∑_{k=1}^{2} θ_k^{(0)} (x_{1k} − x_{2k})^2} exp{−∑_{k=1}^{2} θ_{k1}^{(1)} (x_{1k} − x_{2k})^2} × exp{−∑_{k=1}^{2} θ_{k2}^{(2)} (x_{1k} − x_{2k})^2}
= σ^2 exp{−(θ_1^{(0)} + θ_{11}^{(1)} + θ_{12}^{(2)})(a − c)^2 − (θ_2^{(0)} + θ_{21}^{(1)} + θ_{22}^{(2)})(b − d)^2},
φ*(w_1, w_3) = σ^2 exp{−∑_{k=1}^{2} θ_k^{(0)} (x_{1k} − x_{3k})^2} = σ^2 exp{−θ_1^{(0)} (a − c)^2 − θ_2^{(0)} (b − d)^2}.
It is easy to derive that Cor(Y (w1 ), Y (w2 )) < Cor(Y (w1 ), Y (w3 )), which is counter-intuitive
and not interpretable. As shown in Example 3.1, Cor(Y (w1 ), Y (w2 )) should be no less than
Cor(Y (w1 ), Y (w3 )), since (w1 , w2 ) are more similar compared to (w1 , w3 ).
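Example 3.7 can likewise be checked numerically; the sketch below implements the multiplicative covariance (3.8) with our own illustrative parameter values and confirms that the pair sharing both qualitative levels ends up less correlated:

```python
import numpy as np

def mult_cov(wi, wj, sigma2, theta0, Theta):
    """Multiplicative indicator covariance (3.8): each shared qualitative level
    multiplies in an extra exp factor <= 1, so sharing levels can only shrink
    the covariance."""
    (xi, zi), (xj, zj) = wi, wj
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    d2 = (xi - xj) ** 2
    val = sigma2 * np.exp(-np.sum(theta0 * d2))
    for h, (zih, zjh) in enumerate(zip(zi, zj)):
        if zih == zjh:
            val *= np.exp(-np.sum(Theta[h][:, zih - 1] * d2))
    return val

theta0 = np.array([1.0, 1.0])
Theta = [np.ones((2, 2)), np.ones((2, 2))]
w1 = ([0.1, 0.2], [1, 2]); w2 = ([0.7, 0.9], [1, 2]); w3 = ([0.7, 0.9], [2, 1])
# counter-intuitive: the pair sharing BOTH qualitative levels is LESS correlated
print(mult_cov(w1, w2, 1.0, theta0, Theta) < mult_cov(w1, w3, 1.0, theta0, Theta))  # -> True
```

Under (3.8) all inputs have variance σ², so the covariance comparison is also a correlation comparison, matching the conclusion of Example 3.7.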
3.2. The Efficient EzGP (EEzGP) Model. The EzGP model with the covariance function in (3.6) has (2 + p + q + p ∑_{h=1}^{q} m_h) parameters. For data with many qualitative factors, this
number can be quite large, which may result in high prediction variance. In this part, we
propose a so-called Efficient EzGP (EEzGP) model for data with many qualitative factors.
The EEzGP model follows the same (3.1)–(3.3) as the EzGP, but simplifies the correlation parameter θ_{k l_h}^{(h)} in (3.5) to θ_{l_h}^{(h)}. That is, it considers G_{z^{(h)}}(x) (h = 1, . . . , q) to be a GP with the covariance function
(3.9) φ_h((x_i^T, z_{ih})^T, (x_j^T, z_{jh})^T) = σ_h^2 exp{−∑_{k=1}^{p} θ_{l_h}^{(h)} (x_{ik} − x_{jk})^2} I(z_{ih} = z_{jh} ≡ l_h).
Compared to the covariance function in (3.5), which adopts distinct correlation parameters θ_{k l_h}^{(h)} to scale each quantitative factor separately, the covariance function in (3.9) adopts a single correlation parameter θ_{l_h}^{(h)} to scale all quantitative factors together. As the EEzGP model includes a base GP component G_0, where distinct correlation parameters have been used for different quantitative factors, it may not be necessary to scale each quantitative dimension again when considering the coupled quantitative effects in the adjustment part G_{z^{(h)}}(x). Thus, such a simplification may not sacrifice much in model prediction accuracy. Examples in sections 4 and 5 will illustrate this point. When using the EEzGP model, we should always normalize the quantitative factors to the [0, 1] range. To avoid over-parameterization in (3.9), we fix θ_1^{(h)} = 1 for the first level in z^{(h)}, which can be viewed as a benchmark for the adjustment.
For two inputs w_i = (x_i^T, z_i^T)^T and w_j = (x_j^T, z_j^T)^T, the covariance function φ(w_i, w_j) (for any i, j = 1, . . . , n) in the EEzGP model is
(3.10) φ(w_i, w_j) = σ_0^2 exp{−∑_{k=1}^{p} θ_k^{(0)} (x_{ik} − x_{jk})^2} + ∑_{h=1}^{q} ∑_{l_h=1}^{m_h} σ_h^2 exp{−∑_{k=1}^{p} θ_{l_h}^{(h)} (x_{ik} − x_{jk})^2} I(z_{ih} = z_{jh} ≡ l_h).
Lemmas 3.4 and 3.5 and Corollary 3.6 in subsection 3.1 also apply to the EEzGP model, since
it is a special case of the EzGP. When Latin hypercube designs or space-filling designs are
used for quantitative factors, the covariance matrix of observed responses Cov(y) induced by
(3.10) is positive definite.
In the EEzGP model, the number of parameters is (2 + p + ∑_{h=1}^{q} m_h), which is much smaller than that in the EzGP model. A rule of thumb for the run size in computer experiments is at least 10(p + q), ten times the number of dimensions [17]. Taking m_1 = · · · = m_q = m for illustration, it is easy to show that when the number of levels in the qualitative factors m ≤ 10, the number of parameters in the EEzGP model will be less than 10(p + q).
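As a quick arithmetic check of these parameter counts (using the counts stated in subsection 3.2, and the p = q = 9, three-level setting of Examples 4.2 and 4.3 for illustration):

```python
# Parameter counts from subsection 3.2:
#   EzGP:  2 + p + q + p * sum(m_h)
#   EEzGP: 2 + p + sum(m_h)
def n_params_ezgp(p, q, m):   # m = list of level counts (m_1, ..., m_q)
    return 2 + p + q + p * sum(m)

def n_params_eezgp(p, q, m):
    return 2 + p + sum(m)

p, q, m = 9, 9, [3] * 9       # the setting of Examples 4.2 and 4.3
print(n_params_ezgp(p, q, m), n_params_eezgp(p, q, m), 10 * (p + q))  # 263 38 180
```

Here the EEzGP count (38) sits comfortably below the 10(p + q) = 180 rule-of-thumb run size, while the full EzGP count (263) exceeds it.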
3.3. The Localized EzGP (LEzGP) Method. Note that for the EzGP and EEzGP models, the computational complexity and memory space complexity are O(n^3) and O(n^2), respectively, where n is the size of the training data. To facilitate the analysis of data with large size n,
we propose the so-called LEzGP method. Its key idea is to select a proper subset of training
data to fit the EEzGP (or EzGP) model given a target input. For an input w = (xT , zT )T and
a target input w∗ = ((x∗ )T , (z∗ )T )T , we denote Nz (w, w∗ ) to be the number of same levels in
their qualitative parts (between z and z∗ ). For example, when z = (1, 2, 3)T and z∗ = (3, 2, 1)T ,
there is only one same level at the corresponding positions, and thus Nz (w, w∗ ) = 1. The
LEzGP method includes the following three steps:
Step 1. Select an appropriate tuning parameter ns ;
Step 2. For a chosen target input w∗ , select the training data wi (i ∈ {1, . . . , n}) satisfying
Nz (wi , w∗ ) ≥ ns to form the key subset, denoted as Ks ;
Step 3. Use Ks as the new training set and fit it with the EEzGP (or EzGP) model to make
prediction at the target input w∗ in Step 2.
Clearly, the value of ns determines the size of the key subset Ks . It means that the data
points in Ks have at least ns number of the same levels as the target input in their qualitative
parts. The following example illustrates the first two steps in the LEzGP method.
Example 3.8. Consider a computer experiment with five runs, one quantitative factor and four qualitative factors. Its design matrix D is shown below. Suppose that the chosen target input is w* = (0.3, 1, 2, 3, 1)^T and the tuning parameter is n_s = 3. Then the key subset K_s will only include those runs that have at least 3 levels in common with w* in their qualitative parts z.
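Steps 1 and 2 of the LEzGP method amount to a simple filter on the qualitative part of the design; here is a sketch with a made-up toy design (the design matrix of Example 3.8 is not reproduced in this extraction, so the rows below are our own):

```python
import numpy as np

def key_subset(Z, z_star, ns):
    """Steps 1-2 of LEzGP: keep runs whose qualitative part shares at least
    ns levels (position-wise) with the target input's qualitative part."""
    Z = np.asarray(Z)
    matches = (Z == np.asarray(z_star)).sum(axis=1)  # N_z(w_i, w*) per run
    return np.flatnonzero(matches >= ns)

# toy design: qualitative parts of five runs (q = 4 factors)
Z = [[1, 2, 3, 1],
     [2, 2, 3, 1],
     [1, 1, 1, 1],
     [3, 2, 3, 2],
     [1, 2, 1, 1]]
print(key_subset(Z, z_star=[1, 2, 3, 1], ns=3))  # -> [0 1 4]
```

Raising n_s from 3 to 4 keeps only the exact qualitative match (run 0), illustrating how sharply n_s controls the size of K_s.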
One primary rationale of the LEzGP method is that predictions from the GP model fitted
by a relevant subset of data can be more accurate than those from the GP model fitted by
the entire training set of large data. As shown in [10], the predicted response at target input
(a.k.a. target response) will be less accurate, if its training set contains certain responses
following significantly different GPs compared to that followed by the target response. In
computer experiments with qualitative factors, when an observed input has no or few common
qualitative levels as the target input, their responses may follow different GPs. Thus, for the
data with large size, it would be appropriate to exclude such irrelevant data points in predicting
the target input.
Generally speaking, in the LEzGP method, a larger ns chosen in Step 1 would lead to
a smaller key subset Ks in Step 2 and less computation required in Step 3. A larger ns
and smaller K_s will not necessarily reduce the prediction accuracy. Choosing a proper value of n_s reduces the computational cost and could improve the prediction accuracy in certain situations; refer to Example 4.3 in section 4. The proper selection of n_s often depends on
the budget (e.g. available computing resources), the design matrix of the training data and
the target input to be predicted. In practice, budget is often the key constraint. In most
cases, one can easily choose an appropriate ns , because a small difference in ns will lead to a
big difference in the run size of Ks ; see Example 4.3.
For a large-size computer experiment with QQ inputs, one general suggestion on n_s is to choose q/2 < n_s ≤ n_up, where n_up is the largest integer such that the size of its key subset K_s is larger than the number of parameters in the model. A rule of thumb for the choice of n_s would be q/2 < n_s ≤ n*_up, where n*_up is the integer such that the size of its corresponding key subset is closest to 10(p + q) [17]. A too-small n_s is not desirable since it will require more computing resources. We suggest using n_s > q/2 here, which guarantees that every pair of data points in K_s will have at least one common level in the same qualitative factor.
We would like to note that the LEzGP method can use other models in its Step 3, but adopting the EEzGP (or EzGP) model appears to provide better justification.
The underlying assumption of the LEzGP method is that when two inputs have an increased number of common levels in the qualitative factors, these two inputs are more relevant and thus their correlation should increase. In the covariance function (3.10) (or (3.6)), more positive covariance components due to shared qualitative levels are added when two inputs have more common levels l_h in z^{(h)} (h = 1, . . . , q), which exactly matches the assumption of the LEzGP method.
3.4. Parameter Estimation. The EzGP model with the covariance function in (3.6) contains the parameters µ, σ_0^2, θ_k^{(0)}, σ_h^2 and θ_{k l_h}^{(h)}, where h = 1, . . . , q, k = 1, . . . , p and l_h = 1, . . . , m_h. Denote the vector σ^2 = (σ_0^2, σ_1^2, . . . , σ_q^2)^T and the matrix Θ = (θ^{(0)}, Θ^{(1)}, . . . , Θ^{(q)}), where θ^{(0)} = (θ_k^{(0)})_{p×1} and Θ^{(h)} = (θ_{k l_h}^{(h)})_{p×m_h}. Denote the covariance matrix by Φ = Φ(σ^2, Θ) = (φ(w_i, w_j))_{n×n}, which follows the covariance function in (3.6). Under the GP model in (3.1) and after dropping some constants, maximizing the log-likelihood function l(µ, σ^2, Θ) is equivalent to minimizing log|Φ| + (y − µ1)^T Φ^{−1} (y − µ1). For given σ^2 and Θ, the maximum likelihood estimator (MLE) of µ is µ̂ = (1^T Φ^{−1} 1)^{−1} 1^T Φ^{−1} y. Thus we can obtain the profile likelihood for the MLE of σ^2 and Θ:
(3.11) [σ̂^2, Θ̂] = argmin { log|Φ| + y^T Φ^{−1} y − (1^T Φ^{−1} 1)^{−1} (1^T Φ^{−1} y)^2 }.
This minimization problem can be solved via some standard global optimization algorithms
in R or Matlab, such as genetic algorithms [22, 18]. In this work, we adopt the R package
“rgenoud” [20] which combines evolutionary search algorithms [15] with the derivative-based
quasi-Newton methods to solve difficult optimization problems. In particular, we have derived all parameters' analytical gradients to facilitate the computation, which are reported in Appendix B.
Given the parameters µ, σ^2 and Θ, the prediction at a new location w* is the conditional mean:
(3.12) Ŷ(w*) = µ̂ + γ^T Φ^{−1} (y − µ̂1),
where γ = (φ(w*, w_1), . . . , φ(w*, w_n))^T. When w* coincides with an observed input w_i, γ^T is the ith row of Φ; thus, γ^T Φ^{−1} is the ith row of ΦΦ^{−1}, which is a row vector with its ith entry being 1 and otherwise 0. Therefore, it is straightforward to show Ŷ(w*) = Ŷ(w_i) = y_i by (3.12). Similar
parameter estimations apply for the EEzGP model which is a special case of the EzGP model.
More details on the derivations for the general GP model can be found in [14, 23].
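A sketch of the profile-likelihood objective in (3.11) and the MLE of µ. In the paper the minimization is carried out with rgenoud in R; the NumPy code below only evaluates the objective for a given covariance matrix Φ, so the function names are our own:

```python
import numpy as np

def neg_profile_loglik(Phi, y):
    """Objective in (3.11) with constants dropped:
    log|Phi| + y^T Phi^{-1} y - (1^T Phi^{-1} 1)^{-1} (1^T Phi^{-1} y)^2."""
    one = np.ones(len(y))
    _, logdet = np.linalg.slogdet(Phi)
    Pinv_y = np.linalg.solve(Phi, y)   # solve instead of forming Phi^{-1}
    Pinv_1 = np.linalg.solve(Phi, one)
    return logdet + y @ Pinv_y - (one @ Pinv_y) ** 2 / (one @ Pinv_1)

def mu_hat(Phi, y):
    """MLE of mu given the covariance parameters."""
    one = np.ones(len(y))
    Pinv_y = np.linalg.solve(Phi, y)
    Pinv_1 = np.linalg.solve(Phi, one)
    return (one @ Pinv_y) / (one @ Pinv_1)

# tiny sanity check with an identity covariance: mu_hat is the sample mean
y = np.array([1.0, 2.0, 3.0])
print(mu_hat(np.eye(3), y))  # -> 2.0
```

Using linear solves rather than an explicit matrix inverse is the standard numerically stable way to evaluate such kriging likelihoods.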
4. Simulation Study. In this section, we use three numerical examples to examine per-
formances of our proposed models. We measure the performance via the root mean square
error (RMSE) for predictions:
RMSE = √{ (1/n_t) ∑_{i=1}^{n_t} (Ŷ(w_i) − Y(w_i))^2 },
where nt is the number of data points in the test set, Ŷ (wi ) and Y (wi ) are the predicted
and actual responses of the input w_i in the test set. In addition, we use the Nash-Sutcliffe efficiency (NSE), where Ȳ denotes the average of the predicted responses. The NSE represents an estimate of the proportion of the response variability explained by the model, which is analogous to the R^2 in linear regression. Generally speaking, a method with a lower RMSE will yield a higher NSE.
Example 4.1. Consider a computer experiment with p = 3 quantitative factors and q = 3
qualitative factors each having 3 levels, and its computer model has the following form (x =
(x1 , x2 , x3 )):
y = f_i(x) × (g_j(x) + h_k(x)),
where i, j, k are the levels of the first, second and third qualitative factors, 0 ≤ x_i ≤ 1 for i = 1, 2, 3, and the functions f_i, g_j and h_k are listed below:
f_1(x) = x_1 + x_2^2 + x_3^3,  f_2(x) = x_1^2 + x_2 + x_3^3,  f_3(x) = x_1^3 + x_2^2 + x_3,
g_1(x) = cos(x_1) + cos(2x_2) + cos(3x_3),  g_2(x) = cos(3x_1) + cos(2x_2) + cos(x_3),  g_3(x) = cos(2x_1) + cos(x_2) + cos(3x_3),
h_1(x) = sin(x_1) + sin(2x_2) + sin(3x_3),  h_2(x) = sin(3x_1) + sin(2x_2) + sin(x_3),  h_3(x) = sin(2x_1) + sin(x_2) + sin(3x_3).
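For reference, the computer model of Example 4.1 can be coded directly; the dictionary encoding of the exponents and cosine/sine frequencies is our own arrangement:

```python
import numpy as np

def f(x, i):
    # f_1, f_2, f_3: permutations of the exponents (1, 2, 3) over x_1, x_2, x_3
    powers = {1: (1, 2, 3), 2: (2, 1, 3), 3: (3, 2, 1)}[i]
    return sum(x[k] ** p for k, p in enumerate(powers))

def g(x, j):
    # g_1, g_2, g_3: cosine frequencies per coordinate
    freqs = {1: (1, 2, 3), 2: (3, 2, 1), 3: (2, 1, 3)}[j]
    return sum(np.cos(c * x[k]) for k, c in enumerate(freqs))

def h(x, k_level):
    # h_1, h_2, h_3: sine frequencies per coordinate
    freqs = {1: (1, 2, 3), 2: (3, 2, 1), 3: (2, 1, 3)}[k_level]
    return sum(np.sin(c * x[k]) for k, c in enumerate(freqs))

def y(x, i, j, k):
    """Computer model of Example 4.1: y = f_i(x) * (g_j(x) + h_k(x))."""
    return f(x, i) * (g(x, j) + h(x, k))

print(y([0.5, 0.5, 0.5], 1, 1, 1))
```

Mixing the multiplicative factor f_i with the additive pair g_j + h_k is what makes this test function even-handed between multiplicative and additive GP models.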
In Example 4.1, the computer model includes both multiplicative and additive structures,
which leads to a fair comparison of different multiplicative and additive GP models. In
Figure 2, we show the boxplots of RMSEs for the EzGP, EEzGP, EC, MC, UC, AD EC, AD MC and AD UC models over 50 simulations. In each simulation, an 81-run design is used, where three replicates of a 3^3 full factorial design are adopted for the qualitative factors and a random Latin hypercube design is adopted for the quantitative factors. The RMSEs are computed based on a 1215-run test set consisting of 45 replicates of a 3^3 full factorial design for the qualitative factors and a random Latin hypercube design for the quantitative factors.
Figure 2 clearly shows that the EzGP and EEzGP models perform better than other mod-
els with smaller RMSEs. Here, the EzGP model performs the best, and it has more parameters
than the EEzGP model. For experiments with a relatively small number of quantitative and
qualitative factors, the EzGP model is usually preferred due to its flexibility. The median
NSE value for the EzGP model here is as high as 0.92, which is analogous to achieving an
R2 = 0.92 in linear regression. Thus, the EzGP model fits the data well. It should be noted
that the multiplicative models EC, MC and UC perform better than the additive models:
AD EC, AD MC and AD UC here. Thus, the success of EzGP and EEzGP methods is not
because this simulation setting is in favor of additive models. In this example, the computer
models are of different expressions for the distinct level combinations of the qualitative factors.
The key idea of the proposed models is to use the indicator functions appropriately in the GP covariance function to make the response surface differ under the different level combinations of the qualitative factors. Thus, the superior performances of the EzGP and EEzGP
methods here could be explained by using the meaningful additive covariance structures via
the indicator functions.
Figure 2. Boxplots of RMSEs for the eight compared models in Example 4.1 (y-axis: RMSE; x-axis: models).
where 0 ≤ x_i ≤ 1 for i = 1, . . . , 9. Here, the nine qualitative factors z^{(1)}, . . . , z^{(9)} correspond to the functions f^{(1)}, f^{(2)}, f^{(3)}, g^{(1)}, g^{(2)}, g^{(3)}, h^{(1)}, h^{(2)} and h^{(3)}, respectively, and i_1, i_2, i_3, j_1, j_2, j_3, k_1, k_2, k_3 ∈ {1, 2, 3} are the levels for these nine qualitative factors. We list the functions f, g and h below:
As illustrated in subsection 3.2, it is not recommended to use the EzGP model for computer
experiments with many factors, and thus we do not compare it here.
[Figure: boxplots of RMSE for the compared models]
Example 4.3. This example examines the performance of the proposed LEzGP method. Consider a computer experiment with n = 19,683 runs, p = 9 quantitative factors and q = 9 qualitative factors, each having 3 levels. The computer models are the same as those in Example 4.2. A 19,683-run design is used, with a random Latin hypercube design for the quantitative factors and a 3^9 full factorial design for the qualitative factors. The RMSEs are computed based on a test set consisting of m = 100 data points, where a random Latin hypercube design is used for the quantitative factors and a single random level combination is used for the qualitative factors.
We replicate this simulation 50 times and display the boxplots of RMSEs in Figure 4.
For such a computer experiment with a large run size n, it is difficult to directly apply the existing GP models (the EzGP, EEzGP, EC, MC, UC, AD EC, AD MC and AD UC models); thus, the proposed LEzGP method is well suited here. For the LEzGP, it is straightforward to show that the number of data points in the key subset $K_s$ is $m^q\big[1 - \sum_{i=0}^{n_s-1} \binom{q}{i} (1/m)^i (1 - 1/m)^{q-i}\big]$, where m = 3 is the number of levels per qualitative factor and $n_s$ is the tuning parameter. We set the tuning parameter $n_s = 7$ according to the rule of thumb in subsection 3.3, and consequently the LEzGP method selects a $K_s$ of 163 training points from the overall 19,683. The LEzGP method significantly reduces the computation and memory required for model estimation.
In Figure 4, we compare the performance of the LEzGP method with that of the EEzGP model in Example 4.2, since both examples use the same computer model. From Figure 4, the LEzGP method provides more accurate predictions using only 163 training points, compared with the EEzGP model using 243 training points. The median NSE for the LEzGP method here is 0.87, larger than the 0.76 for the EEzGP model. Moreover, the success of the LEzGP method also provides some justification for an assumption of our proposed models: a data point will not contribute much to the prediction at a target input if it shares no level with the target in its qualitative part. Note that when the tuning parameter $n_s$ is 5, 6, 7, 8 or 9 ($n_s$ must be an integer), the corresponding training set $K_s$ will include 2851, 835, 163, 19 and 1 runs, respectively. Clearly, a $K_s$ with 1 or 19 runs is too small, and a $K_s$ with 2851 runs can be too big for the LEzGP model with 38 parameters in this example. Additional results have shown that using $n_s = 6$ (with 835 runs) and $n_s = 7$ (with 163 runs) leads to very similar prediction performance. Thus, the rule of thumb $n_s = 7$ is preferred, as it requires much less computation.
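Assuming the 3^9 full factorial training design described above, the key-subset sizes reported for each $n_s$ can be reproduced by direct enumeration; the following is an illustrative sketch, not the authors' implementation.

```python
import itertools
from math import comb

def key_subset_size(runs, target, n_s):
    """Number of training runs sharing at least n_s qualitative levels
    with the target input (the key-subset criterion, illustrative)."""
    return sum(1 for z in runs
               if sum(a == b for a, b in zip(z, target)) >= n_s)

q, m = 9, 3                                         # q factors, m levels each
runs = list(itertools.product(range(m), repeat=q))  # 3^9 = 19,683-run full factorial
target = (0,) * q                                   # any fixed level combination
for n_s in (5, 6, 7, 8, 9):
    n_ks = key_subset_size(runs, target, n_s)
    # closed form: m^q * P(Binomial(q, 1/m) >= n_s)
    assert n_ks == sum(comb(q, i) * (m - 1) ** (q - i) for i in range(n_s, q + 1))
    print(n_s, n_ks)    # prints 5 2851, 6 835, 7 163, 8 19, 9 1
```

The enumeration matches the binomial closed form, and in particular gives the 2851, 835, 163, 19 and 1 subset sizes quoted in the text.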
Figure 4. The boxplots of RMSEs for the LEzGP in Example 4.3 and the EEzGP in Example 4.2
5. Real Data Analysis. In this section, we apply the proposed models to a real computer
experiment with p = 1 quantitative factor and q = 3 qualitative factors. A fully 3D coupled
finite element model has been calibrated and verified by successfully modeling the performance
of a full-scale embankment constructed on soft soil [24]. Figure 5 (reproduced from [4]) illustrates the structure of this full-scale embankment, where sub-figure (a) is the finite element mesh and sub-figure (b) is a schematic view of the embankment constructed on foundation soil. The finite element discretization here had 36,802 elements and 69,667 nodes. The average run-time for one case of this size is approximately 9 hours via a 12-node supercomputer at the High Performance Computing Virtual Laboratory (HPCVL). In this computer experiment, the three qualitative factors are the "embankment construction rate" (z^{(1)} = 1, 5, 10 m/month), "Young's modulus of columns" (z^{(2)} = 50, 100, 200 MPa), and "reinforcement stiffness" (z^{(3)} = 1578, 4800, 8000 kN/m). The single quantitative factor x^{(1)} is the distance from the embankment shoulder to the embankment center line. The response here is the final embankment crest settlement, an important indicator of embankment performance. The training set of this computer experiment has 261 runs. The quantitative factor x^{(1)} takes 29 values uniformly from the interval [0, 14]. For each distinct value of x^{(1)}, a 9-run, 3-factor, 3-level fractional factorial design is used for the qualitative factors. The test set has 29 runs where x^{(1)} takes the 29 values uniformly from the interval [0, 14] and (z^{(1)}, z^{(2)}, z^{(3)}) = (5, 100, 4800). Note that this level combination of the qualitative factors does not appear in the training set.
To evaluate the proposed methods, we compare the EzGP and EEzGP models with the
EC, MC, UC and AD UC models as in [4]. We repeat each model estimation 100 times as in
[4]. Figure 6(a) shows the boxplots of log(RMSE) for different models, and it clearly shows
that the EzGP, EEzGP and AD UC models perform much better than the EC, MC and UC models.
16 Q. XIAO, A. MANDAL, CD. LIN AND X. DENG
Then, we further compare the EzGP, EEzGP and AD UC models in Figure 6(b) and
Table 1. For the AD UC method, we exclude outliers in Figure 6(b). From the figure and
table, it is clear that the EzGP model performs the best in terms of both mean and median
log(RMSE), and it is also the most robust one with the smallest standard deviation. Note
that there is only one quantitative factor and three qualitative factors here. For cases with only a few factors, the EzGP model is usually preferred due to its flexibility. The average NSE for the EzGP model is 0.77, which is viewed as high in practice. In the EzGP model, the estimate of $\sigma_0^2$ appears to be the largest among those of $\sigma_1^2$, $\sigma_2^2$ and $\sigma_3^2$ in each replication. This indicates a significant base GP between the output and the quantitative inputs. It makes practical sense that the distance from the embankment shoulder to the embankment center line has a significant impact on the final embankment crest settlement [24]. In addition, the estimate of the variance parameter $\sigma_1^2$ is larger than those of $\sigma_2^2$ and $\sigma_3^2$, which suggests that the embankment construction rate (z^{(1)}) may have a stronger impact on the output than the other two qualitative factors.
[Figure 6: boxplots of Log(RMSE): (a) all compared models; (b) the EzGP, EEzGP and AD UC models]
6. Discussion. In this work, we propose the EzGP model for computer experiments with
both quantitative and qualitative factors, and develop its two useful variants, EEzGP for
Table 1
Comparison between methods in terms of Log(RMSE)
Mean Median SD
EzGP −3.026 −3.026 0.0005
EEzGP −3.025 −3.025 0.0040
AD UC −3.021 −3.021 0.0056
data with many factors and LEzGP for data with many runs. The proposed models have easy-to-interpret covariance structures and can provide desirable prediction performance. Specifically, the proposed models are suitable for handling complex computer experiments with quantitative factors and multiple qualitative factors, where the computer models are
very different for the distinct level combinations of the qualitative factors. Note that the proposed methods quantify the underlying response surfaces of the quantitative factors differently under the different level combinations of the qualitative factors via the additive GP structure. Hence, they are more flexible in quantifying the variance and correlation structure of the quantitative factors compared to [21], while they could be a bit more restrictive in quantifying the correlation of the qualitative factors due to the use of indicator functions.
The current paper focuses on the "first-order" GP components $G_{z^{(h)}}$ (h = 1, . . . , q) in the EzGP framework, which are analogous to main effects in the context of GPs. Further research can include the "second-order" GP components $G_{z^{(h)}z^{(s)}}$ (h, s = 1, . . . , q and h ≠ s), which can be viewed as adjustments for the interaction of $z^{(h)}$ and $z^{(s)}$. One can consider the covariance function $\phi_{hs}((\mathbf{x}_i^T, z_{ih}, z_{is})^T, (\mathbf{x}_j^T, z_{jh}, z_{js})^T) = I(z_{ih} = z_{jh} \equiv l_h)\, I(z_{is} = z_{js} \equiv l_s)\, \sigma_{hs}^2 \exp\{-\sum_{k=1}^{p} \theta_{k l_h l_s}^{(hs)} (x_{ik} - x_{jk})^2\}$. However, such an EzGP (or EEzGP) model may contain too many parameters, and thus may over-fit the data in practice.
Here, we would like to remark that the proposed EzGP framework can provide good interpretation of the importance of qualitative factors via the variance parameters. To obtain robust variance parameter estimates and avoid overly complex model structures, one could add a penalty term on the variance parameters to the likelihood function in the proposed models. Penalized likelihoods have been used for GP modeling in the literature [11], and variable screening for computer experiments with QQ inputs can be another topic of future research.
Further enhancing the LEzGP method will also be of interest. Better strategies for selecting the tuning parameter $n_s$ need to be investigated, and other subset-selection methods, e.g. the localization method in [7], may also be useful for the LEzGP method. In addition, one issue of the current LEzGP method is that when the target inputs contain many different level combinations of the qualitative factors, model estimation can still be computationally cumbersome if the goal is to predict the whole response surface. One possible solution is to arrange the target inputs into a few groups according to their level combinations, and then apply a more flexible LEzGP method to each of these groups.
Good experimental designs usually have significant impacts on both computer and physical
experiments [6, 26, 29]. For the standard GP models, space-filling designs are usually preferred
[30, 16, 28]. The marginally coupled designs were proposed for computer experiments with
QQ inputs [3], but their run and factor sizes are not flexible. Construction of good space-filling
designs of flexible sizes for GP models with QQ inputs remains a challenging problem.
Since $A_0 = (\sigma_0^2 R(\mathbf{x}_i, \mathbf{x}_j|\boldsymbol{\theta}_0))_{n\times n}$ and $A_{hl_h} = (\sigma_h^2 R(\mathbf{x}_i, \mathbf{x}_j|\boldsymbol{\theta}_{l_h}^{(h)}))_{n\times n}$ with the Gaussian correlation function R(·|θ), it is straightforward that the matrices $A_0$ and $A_{hl_h}$ (h = 1, . . . , q and $l_h$ = 1, . . . , $m_h$) are all positive semi-definite [23]. By definition, it is clear that the matrices $B_{hl_h}B_{hl_h}^T$ (h = 1, . . . , q and $l_h$ = 1, . . . , $m_h$) are also positive semi-definite. According to Theorem 7.5.3 in [9], we have
Lemma A.1. (Schur Product Theorem) Let A and B be n × n positive semi-definite matrices; then their Schur product A ◦ B is positive semi-definite.
By Lemma A.1, all $(B_{hl_h}B_{hl_h}^T) \circ A_{hl_h}$ are positive semi-definite. As a sum of positive semi-definite matrices is still positive semi-definite, the covariance matrix Cov(y) is positive semi-definite.
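The Schur product theorem invoked above can be illustrated with a quick numerical check (illustrative only, not part of the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = A @ A.T   # positive semi-definite by construction
B = rng.standard_normal((5, 5)); B = B @ B.T
C = A * B                                      # Schur (elementwise) product
min_eig = np.linalg.eigvalsh(C).min()
print(min_eig >= -1e-8)                        # True: C stays positive semi-definite
```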
Proof of Lemma 3.5. In the EzGP model with the covariance function in (3.6), the covariance matrix of y can be written as $\mathrm{Cov}(\mathbf{y}) = (\phi(\mathbf{w}_i, \mathbf{w}_j))_{n\times n} = \Phi_0 + \Phi_1 + \cdots + \Phi_q$, where $\Phi_0 = (\sigma_0^2 R(\mathbf{x}_i, \mathbf{x}_j|\boldsymbol{\theta}_0))_{n\times n}$ with the Gaussian correlation function R(·|θ) is positive semi-definite. Let $\mathbf{z}^{(h)}$ be the n × 1 column vector of the hth qualitative factor. There exists an n × n permutation matrix P such that $\mathbf{P}\mathbf{z}^{(h)}$ is the sorted vector $(1, \ldots, 1, 2, \ldots, 2, \ldots, m_h, \ldots, m_h)^T$. Let $\tilde{\Phi}_h = \mathbf{P}\Phi_h\mathbf{P}^T$ be the covariance matrix corresponding to the data permuted by P. By (3.5), we have $\phi_h((\mathbf{x}_i, z_{ih}), (\mathbf{x}_j, z_{jh})|\Theta^{(h)}) = 0$ when $z_{ih} \neq z_{jh}$, and thus $\tilde{\Phi}_h$ is block diagonal:
$$\tilde{\Phi}_h = \begin{pmatrix} B_1^{(h)} & & & \\ & B_2^{(h)} & & \\ & & \ddots & \\ & & & B_{m_h}^{(h)} \end{pmatrix}.$$
For $l = 1, \ldots, m_h$, let $n_l$ be the number of observations at level l in $\mathbf{z}^{(h)}$; then $B_l^{(h)} = (\sigma_h^2 R(\mathbf{x}_i, \mathbf{x}_j|\boldsymbol{\theta}_l^{(h)}))_{n_l\times n_l}$ with $R(\mathbf{x}_i, \mathbf{x}_j|\boldsymbol{\theta}_l^{(h)}) = \exp\{-\sum_{k=1}^{p} \theta_{kl}^{(h)}(x_{ik} - x_{jk})^2\}$, which is a Gaussian correlation function. Thus, $B_l^{(h)}$ is positive semi-definite for $l = 1, \ldots, m_h$, and then $\tilde{\Phi}_h$ is positive semi-definite. Since $\tilde{\Phi}_h = \mathbf{P}\Phi_h\mathbf{P}^T$, it is straightforward to prove that $\Phi_h$ is also positive semi-definite. If there exists an h (h = 1, . . . , q) such that $\mathbf{x}_i \neq \mathbf{x}_j$ whenever $z_{ih} = z_{jh}$, all diagonal blocks $B_l^{(h)}$ ($l = 1, \ldots, m_h$) in $\tilde{\Phi}_h$ are positive definite, and thus $\tilde{\Phi}_h$ and then $\Phi_h$ are positive definite. Finally, $\Phi = \Phi_0 + \Phi_1 + \cdots + \Phi_q$ is positive definite.
where the covariance matrix Φ depends on the parameters $\sigma^2$ and Θ. Writing the first-order conditions results in an analytical expression for $\hat{\boldsymbol{\mu}}$ as a function of $\sigma^2$ and Θ, and plugging it in gives
$$-2l(\sigma^2, \boldsymbol{\Theta}) = n\log(2\pi) + \log|\Phi| + (\mathbf{y} - \hat{\boldsymbol{\mu}})^T \Phi^{-1} (\mathbf{y} - \hat{\boldsymbol{\mu}}).$$
For any parameter inside Φ (denoted by •), the expression of the analytical gradient given $\hat{\boldsymbol{\mu}}$ is
$$-2\frac{\partial l}{\partial \bullet} = \mathrm{tr}\left(\Phi^{-1}\frac{\partial \Phi}{\partial \bullet}\right) - (\mathbf{y} - \hat{\boldsymbol{\mu}})^T \Phi^{-1} \frac{\partial \Phi}{\partial \bullet} \Phi^{-1} (\mathbf{y} - \hat{\boldsymbol{\mu}}).$$
Specifically, for the EzGP model with the covariance function in (3.6), for any i, j = 1, . . . , n, we have:
$$\frac{\partial \Phi}{\partial \sigma_0^2} = \left(\exp\Big\{-\sum_{k=1}^{p} \theta_k^{(0)}(x_{ik} - x_{jk})^2\Big\}\right)_{n\times n},$$
$$\frac{\partial \Phi}{\partial \sigma_h^2} = \left(\sum_{l_h=1}^{m_h} I(z_{ih} = z_{jh} \equiv l_h)\exp\Big\{-\sum_{k=1}^{p} \theta_{kl_h}^{(h)}(x_{ik} - x_{jk})^2\Big\}\right)_{n\times n}, \quad \text{for } h = 1, \ldots, q,$$
$$\frac{\partial \Phi}{\partial \theta_{k^*}^{(0)}} = \left(-\sigma_0^2 (x_{ik^*} - x_{jk^*})^2 \exp\Big\{-\sum_{k=1}^{p} \theta_k^{(0)}(x_{ik} - x_{jk})^2\Big\}\right)_{n\times n},$$
$$\frac{\partial \Phi}{\partial \theta_{k^* l_{h^*}}^{(h^*)}} = \left(-\sigma_{h^*}^2 (x_{ik^*} - x_{jk^*})^2 \exp\Big\{-\sum_{k=1}^{p} \theta_{kl_{h^*}}^{(h^*)}(x_{ik} - x_{jk})^2\Big\}\, I(z_{ih^*} = z_{jh^*} \equiv l_{h^*})\right)_{n\times n}.$$
The above expressions of the likelihood and analytical gradients also apply to the EEzGP model, since it is a special case of the EzGP model.
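The likelihood and gradient identities above can be sanity-checked numerically; the following sketch uses our own helper names and a finite-difference comparison, and is not tied to any EzGP implementation.

```python
import numpy as np

def neg2_loglik(Phi, y, mu_hat):
    """-2 log-likelihood: n log(2*pi) + log|Phi| + (y - mu)^T Phi^{-1} (y - mu)."""
    n = len(y)
    r = y - mu_hat
    _, logdet = np.linalg.slogdet(Phi)          # numerically stable log-determinant
    return n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(Phi, r)

def neg2_grad(Phi, dPhi, y, mu_hat):
    """-2 dl/d(.) = tr(Phi^{-1} dPhi) - (y - mu)^T Phi^{-1} dPhi Phi^{-1} (y - mu)."""
    r = np.linalg.solve(Phi, y - mu_hat)        # Phi^{-1} (y - mu)
    return np.trace(np.linalg.solve(Phi, dPhi)) - r @ dPhi @ r

# toy check: Phi(t) = t * I, differentiate with respect to t at t = 2
y, mu = np.array([1.0, 0.0, 0.0]), np.zeros(3)
g = neg2_grad(2.0 * np.eye(3), np.eye(3), y, mu)
eps = 1e-6
fd = (neg2_loglik((2 + eps) * np.eye(3), y, mu)
      - neg2_loglik((2 - eps) * np.eye(3), y, mu)) / (2 * eps)
print(abs(g - fd) < 1e-5)    # analytic and finite-difference gradients agree
```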
REFERENCES
[1] H. Bhuiyan, J. Chen, M. Khan, and M. V. Marathe, Fast parallel algorithms for edge-switching to achieve a target visit rate in heterogeneous graphs, in 2014 43rd International Conference on Parallel Processing (ICPP), IEEE, 2014, pp. 60–69.
[2] A. Dean, M. Morris, J. Stufken, and D. Bingham, Handbook of Design and Analysis of Experiments,
Chapman & Hall/CRC, Boca Raton, 2015.
[3] X. Deng, Y. Hung, and C. D. Lin, Design for computer experiments with qualitative and quantitative
factors, Statist. Sinica, 25 (2015), pp. 1567–1581.
[4] X. Deng, C. D. Lin, K.-W. Liu, and R. Rowe, Additive Gaussian process for computer models with
qualitative and quantitative factors, Technometrics, 59 (2017), pp. 283–292.
[5] X. Du, R. Grandin, and L. Leifsson, Surrogate modeling of ultrasonic simulations using data-driven
methods, in AIP Conference Proceedings, vol. 36, AIP Publishing, 2017, pp. 150002–1–150002–9.
[6] K.-T. Fang, R. Li, and A. Sudjianto, Design and Modeling for Computer Experiments, Chapman &
Hall/CRC, Boca Raton, 2005.
[7] R. B. Gramacy and D. W. Apley, Local Gaussian process approximation for large computer experi-
ments, J. Comput. Graph. Statist., 24 (2015), pp. 561–578.
[8] G. Han, T. J. Santner, W. I. Notz, and D. L. Bartel, Prediction for computer experiments having
quantitative and qualitative input variables, Technometrics, 51 (2009), pp. 278–288.
[9] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge university press, New York, 2013.
[10] H. Huang, D. K. Lin, M. Liu, and J. Yang, Computer experiments with both qualitative and quanti-
tative variables, Technometrics, 58 (2016), pp. 495–507.
[11] Y. Hung, Penalized blind kriging in computer experiments, Statist. Sinica, 21 (2011), pp. 1171–1190.
[12] V. R. Joseph and J. D. Delaney, Functionally induced priors for the analysis of experiments, Techno-
metrics, 49 (2007), pp. 1–11.
[13] C. G. Kaufman, D. Bingham, S. Habib, K. Heitmann, and J. A. Frieman, Efficient emulators of
computer experiments using compactly supported correlation functions, with an application to cosmol-
ogy, Ann. Appl. Stat., (2011), pp. 2470–2492.
[14] J. P. Kleijnen, Kriging metamodeling in simulation: a review, European J. Oper. Res., 192 (2009),
pp. 707–716.
[15] C. D. Lin, C. M. Anderson-Cook, M. S. Hamada, L. M. Moore, and R. R. Sitter, Using genetic
algorithms to design experiments: a review, Qual. Reliab. Eng. Int., 31 (2015), pp. 155–167.
[16] C. D. Lin and B. Tang, Latin hypercubes and space-filling designs, Handbook of design and analysis of
experiments, (2015), pp. 593–625.
[17] J. L. Loeppky, J. Sacks, and W. J. Welch, Choosing the sample size of a computer experiment: a
practical guide, Technometrics, 51 (2009), pp. 366–376.
[18] B. MacDonald, P. Ranjan, and H. Chipman, GPfit: an R package for Gaussian process model fitting
using a new optimization algorithm, arXiv preprint arXiv:1305.0759, (2013).
[19] N. J. McMillan, J. Sacks, W. J. Welch, and F. Gao, Analysis of protein activity data by Gaussian
stochastic process models, J. Biopharm. Stat., 9 (1999), pp. 145–160.
[20] W. R. J. Mebane and J. S. Sekhon, Genetic optimization using derivatives: the rgenoud package for
R, Journal of Statistical Software, 42 (2011), pp. 1–26, http://www.jstatsoft.org/v42/i11/.
[21] P. Z. G. Qian, H. Wu, and C. J. Wu, Gaussian process models for computer experiments with qualitative
and quantitative factors, Technometrics, 50 (2008), pp. 383–396.
[22] P. Ranjan, R. Haynes, and R. Karsten, A computationally stable approach to Gaussian process
interpolation of deterministic computer simulation data, Technometrics, 53 (2011), pp. 366–378.
[23] C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine Learning, MIT press, Cam-
bridge, 2006.
[24] R. K. Rowe and K.-W. Liu, Three-dimensional finite element modelling of a full-scale geosynthetic-
reinforced, pile-supported embankment, Canadian Geotechnical Journal, 52 (2015), pp. 2041–2054.
[25] J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn, Design and analysis of computer experi-
ments, Statist. Sci., 4 (1989), pp. 409–423.
[26] T. J. Santner, B. J. Williams, and W. I. Notz, The Design and Analysis of Computer Experiments,
Springer, New York, 2003.
[27] L. P. Swiler, P. D. Hough, P. Qian, X. Xu, C. Storlie, and H. Lee, Surrogate models for
mixed discrete-continuous variables, in Constraint Programming and Decision Making, Springer, 2014,
pp. 181–202.
[28] L. Wang, Q. Xiao, H. Xu, et al., Optimal maximin L1-distance Latin hypercube designs based on good lattice point designs, Ann. Statist., 46 (2018), pp. 3741–3766.
[29] Q. Xiao, L. Wang, and H. Xu, Application of kriging models for a drug combination experiment on
lung cancer, Stat. Med., 38 (2019), pp. 236–246.
[30] Q. Xiao and H. Xu, Construction of maximin distance Latin squares and related Latin hypercube designs,
Biometrika, 104 (2017), pp. 455–464.
[31] Q. Xiao and H. Xu, Construction of maximin distance designs via level permutation and expansion,
Statist. Sinica, 28 (2018), pp. 1395–1414.
[32] Q. Zhang, P. Chien, Q. Liu, L. Xu, and Y. Hong, Mixed-input Gaussian process emulators for
computer experiments with a large number of categorical levels, J. Qual. Technol., (2020), pp. 1–11.
[33] Y. Zhang and W. I. Notz, Computer experiments with qualitative and quantitative variables: a review
and reexamination, Quality Engineering, 27 (2015), pp. 2–13.
[34] Y. Zhang, S. Tao, W. Chen, and D. W. Apley, A latent variable approach to Gaussian process
modeling with qualitative and quantitative factors, Technometrics, (2019), pp. 1–12, https://doi.org/
10.1080/00401706.2019.1638834.
[35] Q. Zhou, P. Z. Qian, and S. Zhou, A simple approach to emulation for computer models with qualitative
and quantitative factors, Technometrics, 53 (2011), pp. 266–273.