
Comput. Methods Appl. Mech. Engrg. 364 (2020) 112906
www.elsevier.com/locate/cma

Surrogate modeling of high-dimensional problems via data-driven polynomial chaos expansions and sparse partial least square

Yicheng Zhou, Zhenzhou Lu ∗, Jinghan Hu, Yingshi Hu
School of Aeronautics, Northwestern Polytechnical University, Xi'an, 710072, PR China
Received 14 July 2019; received in revised form 2 February 2020; accepted 3 February 2020

Abstract
Surrogate modeling techniques such as polynomial chaos expansion (PCE) are widely used to simulate the behavior of
manufactured and physical systems for uncertainty quantification. An inherent limitation of many surrogate modeling methods
is their susceptibility to the curse of dimensionality: the computational cost becomes intractable for problems involving
a large number of uncertain input parameters. In this paper, we address this issue by proposing a novel surrogate
modeling method that enables the solution of high-dimensional problems. The proposed surrogate model relies on a dimension
reduction technique, called sparse partial least squares (SPLS), to identify the projection directions with the largest predictive
significance in the PCE surrogate. Moreover, the method does not require (or even assume the existence of) a functional form
of the distribution of the input variables, since a data-driven construction, which ensures that the polynomial basis remains
orthogonal for arbitrary mutually dependent randomness, is applied to the surrogate modeling. To assess the performance of
the method, a detailed comparison is made with several well-established surrogate modeling methods. The results show that
the proposed method provides an accurate representation of the response of high-dimensional problems.
© 2020 Elsevier B.V. All rights reserved.

Keywords: Polynomial chaos expansion; Dimensionality reduction; Stochastic partial differential equations; Data driven

1. Introduction
Computer simulation is a powerful tool for studying the behavior of physical and engineering systems. State-
of-the-art simulators are parameterized by a very large number of quantities that describe the characteristics of the
system being simulated, such as initial conditions, boundary conditions, constitutive laws, etc. In a real-world setting,
the values of the input parameters can be uncertain or even unknown. In this situation, an uncertainty quantification
procedure can be utilized to identify and quantify the sources of uncertainty and to assess their effect on the quality of the
model response. However, modern computer codes often demand considerable computational power, which
makes direct approaches such as Monte Carlo simulation (MCS) intractable. As a result, surrogate modeling methods
are gaining increasing popularity. The key idea thereof is to train a surrogate model using limited simulation-based
∗ Correspondence to: P.O. Box 120, School of Aeronautics, Northwestern Polytechnical University, Xi’an City, 710072, Shaanxi
Province, PR China.
E-mail address: [email protected] (Z. Lu).

https://doi.org/10.1016/j.cma.2020.112906
0045-7825/© 2020 Elsevier B.V. All rights reserved.
data, and then perform prediction and uncertainty propagation tasks using the surrogate instead of evaluating the
actual model directly [1].
In the literature, various surrogate modeling techniques have been well-established, e.g., Gaussian Process [2–7]
(also known as Kriging), low-rank tensor approximation (LRA) [8,9], polynomial chaos expansions (PCE) [10–14]
and support vector machines [15,16]. However, most existing surrogate models have difficulty extending to high-
dimensional problems, because the computational effort required to build such surrogate models grows dramatically
with the number of input variables. For instance, when a Kriging model is built using the correlation functions,
the size of the covariance matrix increases dramatically with respect to the dimension of the problem. As a
result, inverting the covariance matrix to optimize hyperparameters is computationally expensive. Although this
computational cost may be reduced with techniques such as the orthogonal-triangular (QR) factorization [17] and
sparse grid interpolation [18,19], these methods may still lose their efficiency advantage over MCS, whose convergence
rate does not depend on the input dimensionality. To deal with high-dimensional problems in surrogate
modeling, a dimension reduction procedure is therefore required.
Several strategies have been explored in the literature to deal with high dimensional problems. A common
approach is importance analysis (also known as sensitivity analysis) [20–23], which consists of identifying the “most
important” inputs according to some importance measure or sensitivity indices and ignoring the others by setting
them to their nominal values. However, importance analysis techniques cannot handle problems where random input
variables have roughly equal impacts on the model response and problems involving functional uncertainties, such
as stochastic partial differential equations (PDEs).
Another emerging approach for handling high-dimensional problems is to learn the latent input representation
automatically by supervising the model response. This is the central idea of deep neural networks (DNNs) [24,25],
which represent multivariate functions using a hierarchy of features of increasing complexity [26]. Well-designed
DNNs show good generalization properties even for small datasets, which makes them feasible for surrogate modeling. In
this context, Ref. [27] used a convolutional DNN surrogate to learn a map between the input variables and a functional of
the stochastic PDE solution. Ref. [28] treated the convolutional encoder–decoder network as Bayesian and scaled
a gradient descent approach to train deep convolutional surrogate networks. Ref. [29] incorporated the governing
equations of the physical model in the loss/likelihood functions; the resulting physics-constrained deep learning
surrogates are trained with input data only, using a convolutional neural network as well as a conditional
flow-based generative model. These studies offer results for challenging high-dimensional problems
in uncertainty quantification. However, specifically in the context of stochastic PDEs, they are only applicable to
image-to-image regression tasks where input parameters and model responses are treated as images. On the other
hand, selecting an optimal neural network structure and hyperparameter values is a persistent problem
in the application of DNNs to uncertainty quantification.
In this paper, we focus on the concept of dimension reduction, which essentially consists in projecting the high-dimensional
input parameters onto a low-dimensional subspace (reduced space) before the surrogate modeling stage.
In this context, a two-step approach is often implemented to deal with high-dimensional problems: first, the input
dimension is reduced, and then the surrogate model is constructed directly using the compressed experimental
design in the reduced space. A commonly used dimension reduction method is principal component analysis (PCA),
which reduces the input dimension by exploiting the correlations among different input variables [30,31].
However, PCA is an unsupervised technique focused on the discovery of the input manifold without taking into
account the associated model response. As a result, it may lead to a highly complex input–output map that is
difficult to surrogate. To address this problem, various supervised dimension reduction techniques have emerged.
One such method that has received much attention is active subspaces (AS) [32,33]. The technique captures most of
the model response variation by recovering an orthogonal projection matrix obtained through an eigendecomposition
of an empirical covariance matrix of the model response gradients. A low-dimensional subspace can then be obtained by
retaining only the important directions with significant variability. AS theory has been successfully applied to various
engineering applications, such as sensitivity analysis [34], design optimization [35], and reliability analysis [36].
However, AS relies on gradient information of the model response to determine the orthogonal projection matrix;
in high-dimensional problems with black-box numerical models [37], the additional computational cost
of evaluating the gradients numerically can be high [38]. In the absence of model output
gradients, the orthogonal projection matrix can alternatively be estimated by posing it as a hyperparameter of a Gaussian
process regression model and learning it from the available data through maximum likelihood [39].
Other supervised techniques, which rely on a specific conjunction of dimension reduction and
surrogate modeling, have also been proposed in the literature. Ref. [40] employed multi-layer neural networks for
both the dimension reduction and surrogate modeling stages: an unsupervised objective, named
the reconstruction error, is followed by a generalization performance objective that optimizes the model
hyperparameters with respect to a measure of the surrogate modeling error. In Ref. [41], this approach is extended
by combining (kernel) principal component analysis with Kriging or generalized PCE within a nested optimization
framework. These studies offer results for challenging problems in uncertainty quantification, but they require
many cross-validation steps to search for an optimal structure. Another approach to supervised dimension
reduction is to project an M-dimensional input onto an m-dimensional subspace (m < M), called a
sufficient dimension reduction (SDR) space, capable of capturing the essential information of the model response [42].
A series of regression techniques have been developed to search for the SDR space, such as partial least squares
(PLS) [43] and sliced inverse regression (SIR) [44]. In particular, SIR can be viewed as a supervised companion
of PCA and has gained great attention in the surrogate modeling literature, since it is easy to implement and has
a solid theoretical foundation. Ref. [45] employed an SIR-based uncertainty quantification approach to convert the
original high-dimensional problem into a low-dimensional one, which is then solved directly by building a generalized
PCE. In Ref. [46], this idea is extended by combining SIR with sparse PCE (SIR-PCE) constructed by the least
angle regression (LAR) algorithm [47,48] within the dimension reduction framework. Moreover, some variants
of these dimension reduction methods have been proposed for addressing problems where the underlying process
is nonlinear [41]. Ref. [37] proposed a PCE-driven PLS algorithm to identify simultaneously a set of projection
directions and the associated model coefficients along each direction.
All the methods discussed above show that supervised dimension reduction yields a significant accuracy advantage
over unsupervised reduction. However, these methods rely on explicit knowledge of the underlying
probability density functions (PDFs) and/or on the assumption of mutual independence between the variables
in the low-dimensional subspace. In general, for dimension reduction, the new variables in the reduced space do
not retain mutual independence even if the input variables in the original high-dimensional space are independent
and identically distributed. A pre-processing step is then required to map the dependent components onto independent
ones, as introduced in Ref. [49]. Such transformation methods (e.g. the Rosenblatt transform [50] and vine
copulas [51]) require the computation of conditional PDFs, which are hardly known, especially when only a small
training data set is available. Furthermore, these methods extract combinations of all the input variables in the
dimension reduction stage, which can make it difficult to interpret the derived components in the reduced space.
In this paper, we propose a novel method that embeds dimension reduction into PCE surrogate modeling within a
data-driven setting. The goal is to capitalize on the performance gains of dimension reduction based on a given
experimental design, irrespective of the possible distributions and mutual dependences of the derived components. The
method differs from the aforementioned supervised approaches based on the generalized PCE, and can therefore be
particularly useful when the input distribution information is insufficient. Moreover, we employ a variant
of the PLS algorithm called sparse partial least squares (SPLS) [52,53], which has been used in the fields of neuroimaging
and genetics for regression on high-dimensional data. The proposed method modifies SPLS to combine it with PCE
surrogates and takes advantage of cross-validation to determine the projection directions with the
largest predictive significance in the PCE surrogate. The proposed method is fully non-intrusive, and is therefore
ideally suited for application to high-dimensional problems with black-box models. Numerical experiments are
carried out to support these statements.
The rest of this paper is organized as follows. Section 2 introduces the main ingredients required by
the proposed method, namely dimension reduction and the data-driven PCE surrogate. Section 3 discusses
dimension reduction techniques with an emphasis on SPLS, and closes with a detailed description of the proposed
surrogate modeling method. Numerical examples of high-dimensional uncertainty quantification problems are
reported in Section 4 to validate the performance of the proposed method. Finally, concluding remarks are
given in Section 5.

2. Ingredients for surrogate modeling in high dimensions


This section aims at highlighting the main features of dimension reduction and PCE, and how they can be
employed for surrogate modeling.
2.1. Dimension reduction

Consider a high-dimensional random vector X = {X_1, ..., X_M} ∈ R^M. Dimension reduction refers to the
parametric mapping M: X ∈ R^M ↦ T ∈ R^m with T = {T_1, ..., T_m}, of the form

T = \mathcal{M}(X; R)    (1)

where R is the set of parameters associated with the transformation in Eq. (1). Dimension reduction is achieved
with m ≪ M. The size of R depends on the specific dimension reduction method under consideration and the
nature of the physical system of interest.
Many high-dimensional problems in uncertainty quantification are intrinsically low-dimensional because, as noted
in the statistics literature, the variation of the model response is often caused by only a few components varying
within a low-dimensional subspace. As a result, an important assumption behind the transformation in Eq. (1) is that
realizations of X rely on a few principal components of dimensionality m embedded within the M-dimensional space.
The specific value of m is referred to as the intrinsic dimensionality [41]. Unsupervised dimension reduction
methods focus only on the input variables and do not show how the model response depends on them.
Performing a compression of input without taking into account the associated model response may lead to a highly
complex input–output map that is difficult to surrogate. In a supervised dimension reduction method, which is of
interest herein, the intrinsic dimensionality refers to the minimum number of scalars that is required to represent
X for capturing the essential information of the model response quantity of interest.
For a detailed description of the specific dimension reduction method used in this paper, the reader is referred
to Section 3.
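To make the notation concrete, the following minimal sketch (ours, not part of the original formulation) illustrates the linear special case of Eq. (1), in which the parameter set R reduces to an M × m projection matrix; the random matrix below is a placeholder for the parameters that a concrete method such as PCA, PLS or SPLS would learn.

```python
import numpy as np

# Linear special case of Eq. (1): T = M(X; R) = X R, with R an M x m matrix.
# The random R below is a placeholder for parameters learned by a concrete
# dimension reduction method (PCA, PLS, SPLS, ...).
rng = np.random.default_rng(0)
M, m, N = 100, 3, 50                 # input dim., reduced dim., sample size
X = rng.standard_normal((N, M))      # experimental design, one sample per row
R = rng.standard_normal((M, m))      # projection parameters (placeholder)
T = X @ R                            # compressed experimental design
print(T.shape)                       # (50, 3)
```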

2.2. The limits of the generalized PCE

Let f_X(x) denote the prescribed joint PDF of the M-dimensional random input vector X. Due to the
uncertainties embodied in X, the model response becomes random. For the sake of simplicity, the case of a scalar
response quantity of interest Y is considered hereafter. The computational model g(X) is then the mapping

X ∈ R^M ↦ Y = g(X) ∈ R    (2)

Given realizations of X, g(X) may be evaluated to predict the corresponding response. In general, the map in Eq. (2),
which represents a computationally intensive process, is not known in a closed analytical form. Direct MCS
or related approaches might be used; however, MCS usually requires running the computational model several
thousands of times for different realizations of X, which is impracticable because most computational models used
in engineering have a high computational cost. Consequently, a surrogate model that simulates the behavior of
Y = g(X) is built: the main goal is a model that possesses similar statistical properties to g(X) while maintaining an
easy-to-evaluate form. This is frequently done via the generalized PCE, which seeks a polynomial approximation of
the form [11],

g(X) = \sum_{\alpha \in \mathbb{N}^M} \beta_\alpha \psi_\alpha(X)    (3)

where α = {α_1, ..., α_M} (with α_i ≥ 0, i = 1, ..., M) is an M-dimensional index, {β_α : α ∈ N^M} is a set of
unknown polynomial chaos coefficients and {ψ_α : α ∈ N^M} is a set of multivariate orthonormal polynomial basis
functions. The basis functions are defined as tensor products of univariate polynomials that are orthogonal with respect
to the predefined PDF of each input variable X_i (i = 1, ..., M), that is [10],

\psi_\alpha(X) = \prod_{i=1}^{M} \kappa_{\alpha_i}(X_i) \quad \text{with} \quad \int_{X_i} \kappa_{\alpha_i}^{(k)}(x_i)\, \kappa_{\alpha_i}^{(l)}(x_i)\, f_{X_i}(x_i)\, \mathrm{d}x_i = c_{\alpha_i}^{(k,l)}\, \delta^{(k,l)}    (4)

where δ^{(k,l)} is the Kronecker symbol, with δ^{(k,l)} = 1 if k = l and δ^{(k,l)} = 0 otherwise, and c_{α_i}^{(k,l)} is a constant.
Eq. (3) must be truncated for practical applications, so that only a finite number of polynomial terms needs
to be computed. A common truncation scheme is to set a total expansion degree p and retain only the multivariate
orthonormal polynomials of degree at most p. This leads to the truncated polynomial chaos approximation

g(X) \approx \hat{g}(X) = \sum_{\alpha \in \Lambda_p} \beta_\alpha \psi_\alpha(X)    (5)

where Λ_p = {α ∈ N^M : ‖α‖_1 ≤ p} with the L_1-norm ‖α‖_1 = \sum_{i=1}^{M} α_i. For simplicity, one can place an order on the
multi-dimensional indices (i.e., {α | α ∈ Λ_p} ↔ {1, ..., P}), where P is the total number of basis functions, given by

P = \mathrm{Card}(\Lambda_p) = \frac{(M + p)!}{M!\, p!}    (6)
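The factorial growth of P in Eq. (6) is the curse of dimensionality in concrete numbers. The short sketch below (a direct evaluation of Eq. (6), not taken from the paper) shows how quickly the basis size explodes with the input dimension M, using the dimensions that appear in this paper.

```python
from math import comb

def pce_basis_size(M: int, p: int) -> int:
    """Number of PCE basis terms P = (M + p)! / (M! p!) from Eq. (6)."""
    return comb(M + p, p)

# Basis size for total degree p = 3 and increasing input dimension M:
for M in (6, 63, 110, 300):
    print(M, pce_basis_size(M, p=3))
# -> 6: 84, 63: 45760, 110: 234136, 300: 4590551
```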
Therefore, Eq. (5) is rewritten in the following single-index version,

\hat{g}(X) = \sum_{j=1}^{P} \beta_j \psi_j(X) = \boldsymbol{\beta}^T \boldsymbol{\psi}(X)    (7)

Consider an experimental design X = (x^(1), ..., x^(N))^T and the corresponding responses
Y = (y^(1) = g(x^(1)), ..., y^(N) = g(x^(N)))^T of the original model. The final goal is to construct a surrogate
model that approximates the actual model solely from the training set (X, Y). To distinguish the various construction
schemes, a surrogate whose parameters are estimated directly from (X, Y) is denoted by ĝ(X | X, Y). Due to the high
dimensionality of the input variables, ĝ(X | X, Y) may perform poorly, or may not even be computationally
tractable. To reduce the dimensionality, a compressed experimental design T = {t^(1), ..., t^(N)}^T of dimensions
N × m with m ≪ M can be provided by the dimension reduction transformation T = M(X; R) of Eq. (1).
In this case, a PCE surrogate model ĝ(T | T, Y) becomes tractable if m is sufficiently small.
However, the PDF of T, needed to calculate the inner products in Eq. (4), may be unknown. In case of insufficient
distribution information, fitting standard types of distribution (e.g., Gaussian or uniform distributions) to
measurements may introduce a large error in the construction of the polynomial basis. Furthermore, correlations can
exist among the components of T due to the parametric map T = M(X; R). In this case, simply assuming that
the components are mutually independent and directly applying the product of univariate polynomials to construct
the polynomial basis can lead to incorrect inferences. Even though correlated components of T can generally
be transformed into independent ones, on which a polynomial chaos approximation of Y can then be built, these
transformations are in general highly nonlinear, which makes the map from the transformed input components to Y
more difficult to approximate with a polynomial basis. To cope with insufficient distribution information on T, we
introduce a data-driven method to construct a multivariate orthonormal basis in Section 2.3.

2.3. Polynomial chaos expansion in a data-driven setting

A data-driven construction, first introduced in Ref. [54] under the form of arbitrary PCE, consists of constructing
a set of basis functions orthonormal with respect to the statistical moments of the input variables. Arbitrary PCE has
already been exploited for applications in various disciplines [55,56]. In addition, it has been shown that arbitrary PCE
can explicitly account for correlation among input variables [57].
Once a compressed experimental design T has been obtained, the set of multivariate orthonormal polynomial
basis functions φ(T) = {φ_1(T), ..., φ_P(T)}^T is constructed via the Gram–Schmidt orthogonalization method. First,
a set of linearly independent monomials is constructed as [57],

\bar{\varphi}_j(T) = \prod_{i=1}^{m} T_i^{\alpha_i^{(j)}}, \quad j = 1, ..., P, \quad \|\alpha^{(j)}\|_1 = \sum_{i=1}^{m} \alpha_i^{(j)} \le p    (8)

where α^(j) = {α_1^(j), ..., α_m^(j)} is an m-dimensional index with α_i^(j) ≥ 0.
According to the Gram–Schmidt orthogonalization, the arbitrary multivariate orthonormal polynomials φ_j(T)
are constructed as

\varphi_0(T) = 1, \qquad \varphi_j(T) = \bar{\varphi}_j(T) - \sum_{k=0}^{j-1} c_{jk}\, \varphi_k(T), \quad j = 1, ..., P    (9)
where the coefficient c_{jk} is determined by imposing the orthogonality condition,

c_{jk} = \frac{\int_{T} \bar{\varphi}_j(t)\, \varphi_k(t)\, f_T(t)\, \mathrm{d}t}{\int_{T} \varphi_k(t)\, \varphi_k(t)\, f_T(t)\, \mathrm{d}t}    (10)

In the data-driven setting, the integrals in Eq. (10) are evaluated from the statistical moments of the available data
rather than from an explicit PDF f_T.
Eq. (9) generates a set of orthogonal polynomials as basis irrespective of the mutual dependence between the
components in T . Once the set of arbitrary orthogonal polynomials {φ1 (T ) , . . . , φ P (T )}T is constructed, it can be
used to replace the generalized orthogonal polynomials for uncertainty propagation.
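A minimal sketch of this data-driven construction is given below (our illustration; the paper's implementation follows Refs. [54,57]). The integrals of Eq. (10) are replaced by empirical averages over the compressed experimental design, and a (modified) Gram–Schmidt sweep orthonormalizes the monomials of Eq. (8) with respect to that empirical measure, regardless of any dependence between the components of T.

```python
import numpy as np
from itertools import product

def monomial_basis(T: np.ndarray, p: int) -> np.ndarray:
    """Evaluate all monomials of total degree <= p (Eq. (8)) at the rows of T."""
    N, m = T.shape
    alphas = [a for a in product(range(p + 1), repeat=m) if sum(a) <= p]
    alphas.sort(key=sum)                      # constant term first (phi_0 = 1)
    return np.stack([np.prod(T ** np.array(a, float), axis=1) for a in alphas], axis=1)

def gram_schmidt_data_driven(Phi: np.ndarray) -> np.ndarray:
    """Orthonormalize the columns of Phi w.r.t. the empirical inner product
    <f, g> = mean(f * g), the sample analogue of Eqs. (9)-(10)."""
    Q = np.empty_like(Phi)
    for j in range(Phi.shape[1]):
        q = Phi[:, j].copy()
        for k in range(j):                    # subtract c_jk * phi_k, Eq. (9)
            q -= np.mean(q * Q[:, k]) * Q[:, k]
        Q[:, j] = q / np.sqrt(np.mean(q * q)) # normalize to unit empirical norm
    return Q

rng = np.random.default_rng(1)
Z = rng.standard_normal((500, 2))
T = np.column_stack([Z[:, 0], 0.8 * Z[:, 0] + 0.6 * Z[:, 1]])  # dependent components
Q = gram_schmidt_data_driven(monomial_basis(T, p=3))
print(np.allclose(Q.T @ Q / len(T), np.eye(Q.shape[1]), atol=1e-6))  # True
```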
The estimate β̂ of β can be determined by minimizing the least-squares residual:

\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta} \in D_\beta} \sum_{i=1}^{N} \left( y^{(i)} - \sum_{j=1}^{P} \beta_j \varphi_j(t^{(i)}) \right)^2    (11)

where D_β is the feasible domain of β. The solution of Eq. (11) is given by β̂ = (Φ^T Φ)^{-1} Φ^T Y, where
Φ = {φ_j(t^(i))}_{1≤i≤N, 1≤j≤P} is the N × P matrix of polynomial basis functions evaluated at the compressed
experimental design T. However, the least-squares problem in Eq. (11) cannot be solved when N < P,
because the inverse (Φ^T Φ)^{-1} does not exist in this case.
To overcome this problem, an adaptive sparse construction technique based on the LAR algorithm is adopted in
this paper. It is obtained with UQLab [58], a well-validated MATLAB-based uncertainty quantification software
available at www.uqlab.com. It is worth mentioning that the LAR-based adaptive technique is run with
degree adaptivity, as described in [48], to exploit the full potential of PCE. The use of LAR for PCE was initially
proposed in Ref. [48], to which the reader is referred for further details.
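The paper relies on UQLab's MATLAB implementation of the degree-adaptive LAR scheme of Ref. [48]. As a rough, hedged stand-in outside that environment, the sketch below uses the LARS implementation of scikit-learn to select a small number of basis columns when N < P; it reproduces the flavor of the approach, not UQLab's exact algorithm (no degree adaptivity or cross-validated model selection here).

```python
import numpy as np
from sklearn.linear_model import Lars

# Sparse recovery of PCE coefficients when N < P, i.e. when the ordinary
# least-squares solution of Eq. (11) does not exist.
rng = np.random.default_rng(3)
N, P = 60, 200
Phi = rng.standard_normal((N, P))            # stand-in for the basis matrix
beta_true = np.zeros(P)
beta_true[:5] = [1.0, -0.5, 0.3, 0.2, -0.1]  # only a few active basis terms
y = Phi @ beta_true + 0.01 * rng.standard_normal(N)

lars = Lars(n_nonzero_coefs=10, fit_intercept=False).fit(Phi, y)
print(np.flatnonzero(np.abs(lars.coef_) > 1e-6))  # indices of retained terms
```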

3. The proposed surrogate modeling method


The potential of ĝ (T |T ,Y) to achieve satisfactory accuracy depends on the learning capacity of the type of
surrogate model itself and the assumption that the essential information of the input–output map X ↦→ T can be
sufficiently well approximated by a smaller set of components with intrinsic dimensionality via the transformation
in Eq. (1). We assume that the learning capacity of the data-driven PCE introduced in Section 2.3 is adequate, and
main discussion of Section 3 focuses on a dimension reduction method namely SPLS to seek a low-dimensional
subspace and how to nest SPLS with data-driven PCE within a surrogate modeling technique (which we name
SPLS-PCE) to enable the solution of high-dimensional problems.

3.1. Sparse partial least square

PLS is a statistical method that finds a relationship between the input variables and the output variable (or vector)
by projecting the original input variables onto a new reduced space formed by newly chosen variables, called principal
components (or latent variables), which are linear combinations of the input variables. It is designed to find the
multidimensional direction in the input space that best explains the characteristics of the model response. In its classical
form, PLS is based on the nonlinear iterative partial least squares (NIPALS) algorithm [59,60]. After centering and scaling
the entire training set (X, Y), standard PLS projects the experimental design matrix X onto the first component t_1
of dimensions N × 1 by finding the weight vector w_1 = {w_1^(1), ..., w_1^(M)}^T. The algorithm proceeds by
maximizing the covariance between t_1 = Xw_1 and Y under the L_2-norm constraint ‖w_1‖_2 = 1. The corresponding
optimization problem is defined as,

w_1 = \arg\max_{w \in D_w} \mathrm{Cov}(Xw, Y) = \arg\max_{w \in D_w} w^T X^T Y \quad \text{subject to } \|w\|_2 = 1    (12)

where D_w is the feasible domain of w. The exact solution of Eq. (12) is given by

w_1 = \frac{X^T Y}{\|X^T Y\|_2}    (13)
Although dimension reduction via PLS is a principled way of dealing with high-dimensional problems, PLS
extracts linear combinations of all the original input variables, which can make it difficult to interpret the derived
components. As an illustration, consider the computational model Y = exp(−0.75 Xη + 1), where X = {X_1, ..., X_6}
is a six-dimensional vector following a standard normal distribution and η = {1, −1, 0, 0, 0, 0}^T. In this case, the true
weight vector is w_1 = {−0.75, 0.75, 0, 0, 0, 0}^T. For an experimental design X of size N = 100, the
PLS estimate of w_1 is {−0.731, 0.731, 0.006, −0.031, −0.038, 0.112}^T. Although some weights are small, all input
variables enter the extracted linear combination, which obscures the fact that only the input variables X_1 and
X_2 are significant for the model response. To achieve better interpretability in the dimension reduction, it is desirable
to automatically select the relevant input variables that contribute to the first principal component t_1 = Xw_1. Imposing
sparsity in the dimension reduction stage can thus lead to simultaneous dimension
reduction and variable selection.
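This illustration is straightforward to reproduce. The sketch below computes the PLS estimate of w_1 from Eq. (13) for the toy model (with our own random seed, so the numbers differ slightly from those quoted above): all six entries come out nonzero even though only X_1 and X_2 matter.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
X = rng.standard_normal((N, 6))                  # X ~ standard normal, M = 6
y = np.exp(-0.75 * (X[:, 0] - X[:, 1]) + 1.0)    # Y = exp(-0.75 X eta + 1)
Xc, yc = X - X.mean(axis=0), y - y.mean()        # centering, as in NIPALS

w1 = Xc.T @ yc
w1 /= np.linalg.norm(w1)                         # Eq. (13)
print(np.round(w1, 3))   # all entries nonzero, though only X1, X2 are relevant
```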
To motivate sparsity in the dimension reduction stage, it is useful to introduce another approach
called Canonical Correlation Analysis (CCA) [61], which can be viewed as a variant of PLS. For a
scalar response, CCA finds w_1 such that the correlation between Xw_1 and Y is maximized under the constraint
w^T X^T X w ≤ 1. Ref. [62] proposed a sparse version of CCA which imposes an L_1-norm constraint on the
optimization problem of CCA,

w_1 = \arg\max_{w \in D_w} w^T X^T Y \quad \text{subject to } w^T X^T X w \le 1, \ \|w\|_1 \le c_w    (14)

where c_w is a hyperparameter controlling the L_1-norm constraint; it determines the amount of
sparsity, i.e., the lower the value of c_w, the stronger the L_1-norm constraint on w_1. Solving Eq. (14)
requires the inversion of X^T X, which does not exist when N < M. The matrix X^T X is therefore substituted by the
M × M identity matrix, which leads to the following optimization problem,

w_1 = \arg\max_{w \in D_w} w^T X^T Y \quad \text{subject to } \|w\|_2^2 \le 1, \ \|w\|_1 \le c_w    (15)
The L_1-norm constraint alone can only select up to N features when N < M; this issue is addressed by
the additional L_2-norm constraint [63]. For both the L_1-norm and L_2-norm constraints to be active, c_w must
lie in the range c_w ∈ [1, √M] [53]. Note that, with this substitution, what is being maximized is no longer the
correlation between Xw and Y but the covariance; the optimization problem is therefore equivalent to SPLS.
The scalars w_1^(1), ..., w_1^(M) can be interpreted as measuring the importance of X_1, ..., X_M, respectively, in
constructing the first principal component whose correlation with the output is maximized. Since a sparsity
constraint is applied to the dimension reduction, the entries of w_1 corresponding to insignificant input variables
tend to zero. Therefore, only a relatively small set of variables is retained, and the level of sparseness is determined
automatically. To this end, the optimization problem in Eq. (15) is solved by Algorithm 1, which iteratively
searches for a threshold Δ(w_1) such that ‖w_1‖_1 ≈ c_w [53].

Note: (z)_+ equals z if z ≥ 0 and 0 otherwise, and sgn(z) = −1 if z < 0, 0 if z = 0, and 1 if z > 0.
To obtain the next principal component, two residual matrices E^(2) and F^(2) are respectively calculated by
subtracting rank-one approximations from E^(1) = X and F^(1) = Y based on t_1 = Xw_1,

E^{(2)} = E^{(1)} - t_1 p_1^T, \qquad F^{(2)} = F^{(1)} - c_1 t_1    (16)

where p_1 = X^T t_1 / (t_1^T t_1) contains the regression coefficients of the local regression of X onto the first principal
component t_1, and c_1 = t_1^T Y / (t_1^T t_1) is the regression coefficient of the local regression of Y onto t_1.
The second principal component t_2, which is orthogonal to the first one, can then be sequentially
extracted from the deflated matrices E^(2) and F^(2). The procedure is continued until a certain target measure is
satisfied. As a result, the l-th principal component t_l can be further defined as

t_l = E^{(l)} w_l = X r_l, \quad l = 1, ..., m    (17)

where the matrix R = {r_1, ..., r_m} of dimensions M × m is obtained by [64]

R = W (P^T W)^{-1}    (18)

with W = {w_1, ..., w_m} and P = {p_1, ..., p_m}. Note that the matrix R rotates X to the compressed design T by

T = X R    (19)
To this end, a compressed experimental design T is obtained and a surrogate model ĝ(T | T, Y) can be constructed
from the transformed training set (T, Y). As the rotation matrix R defining the dominant projection directions
is in general not orthogonal, the projection of X onto the columns of R does not retain mutual independence.
Employing the data-driven method to define the orthogonal polynomial basis is therefore particularly beneficial
in this case.
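Combining the sparse weight update with the deflation steps of Eqs. (16)–(19) gives a compact sketch of the component-extraction loop (again our illustration; Algorithm 2 in the text additionally tunes c_w and m by cross-validation). It reuses sparse_weight() from the previous sketch.

```python
import numpy as np

def spls_components(X: np.ndarray, y: np.ndarray, m: int, c_w: float) -> np.ndarray:
    """Extract m sparse PLS directions and return the rotation matrix R
    of Eq. (18), so that the compressed design is T = X R (Eq. (19))."""
    E, f = X.copy(), y.copy()            # E^(1) = X, F^(1) = Y
    W, P_ = [], []
    for _ in range(m):
        w = sparse_weight(E, f, c_w)     # sparse direction, Eq. (15)
        t = E @ w                        # principal component t_l = E^(l) w_l
        p = E.T @ t / (t @ t)            # input loading p_l
        c = (t @ f) / (t @ t)            # output regression coefficient c_l
        E = E - np.outer(t, p)           # deflation, Eq. (16)
        f = f - c * t
        W.append(w)
        P_.append(p)
    W, P_ = np.column_stack(W), np.column_stack(P_)
    return W @ np.linalg.inv(P_.T @ W)   # R = W (P^T W)^{-1}, Eq. (18)

# Usage on the toy model (assumes sparse_weight from the previous sketch):
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = np.exp(-0.75 * (X[:, 0] - X[:, 1]) + 1.0)
Xc, yc = X - X.mean(axis=0), y - y.mean()
R = spls_components(Xc, yc, m=2, c_w=1.4)
T = Xc @ R                               # compressed experimental design, Eq. (19)
print(T.shape)                           # (100, 2)
```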

3.2. Combination of sparse partial least square regression and data-driven polynomial chaos expansion

In data-driven applications, or when the computational model is expensive to evaluate, as in many engineering
applications, only limited data (X, Y) are available. Accurately capturing the essential information of the model
response in the dimension reduction requires choosing a suitable value of the constraint parameter. The
generalization error, the most common accuracy measure for surrogates, can be used to guide this choice,

\varepsilon_g = \frac{E\left[ \left( Y - \hat{g}(T \mid T, Y) \right)^2 \right]}{\mathrm{Var}(Y)}    (20)
However, in the absence of a validation set, it is not possible to calculate the generalization error analytically. A
classic estimator of the generalization error is the empirical mean-square error (MSE) at the simulation points. For
the compressed experimental design T = {t^(1), ..., t^(N)}^T of size N, this empirical MSE is computed as

\varepsilon_{MSE} = \frac{\sum_{i=1}^{N} \left( y^{(i)} - \hat{g}(t^{(i)} \mid T, Y) \right)^2}{\sum_{i=1}^{N} \left( y^{(i)} - \frac{1}{N} \sum_{i=1}^{N} y^{(i)} \right)^2}    (21)
However, for small sample sizes, the empirical MSE is prone to drastically underestimating the true
generalization error due to the overfitting phenomenon. To overcome this underestimation issue, the
leave-one-out cross-validation (LOOCV) technique is used in this paper.
LOOCV is commonly used in machine learning for quality assessment and model selection [65]. It is performed
by singling out a validation point t^(i), refitting a model ĝ^(−i) on the remaining N − 1 training points
T^(−i) = T \ {t^(i)}, and evaluating the prediction error e_CV^(i) at t^(i),

e_{CV}^{(i)} = y^{(i)} - \hat{g}^{(-i)}\!\left( t^{(i)} \mid T^{(-i)}, Y^{(-i)} \right)    (22)
In order to build ĝ^(−i), the coefficient vector β must be re-estimated by the LAR algorithm after removing t^(i),
which is a time-consuming process, especially for large sample sizes. To alleviate the computational burden, the
estimate of β can be fixed in the cross-validation process.

Fig. 1. The use of LOOCV to select the hyperparameter c_w in Eq. (25) with N = 60 and N = 120 (accuracy is measured by the
generalization error computed at 20000 random samples, see Section 4).

In this context, the prediction error e_CV^(i) can be calculated efficiently as [48],

e_{CV}^{(i)} = \frac{y^{(i)} - \hat{g}(t^{(i)} \mid T, Y)}{1 - \tau_i}    (23)

where τ_i is the i-th diagonal element of the matrix Φ (Φ^T Φ)^{-1} Φ^T. After computing e_CV^(i) for all t^(i) in the compressed
experimental design, the LOOCV error ε_LOO(c_w) with respect to the hyperparameter c_w can be computed as

\varepsilon_{LOO}(c_w) = \frac{\sum_{i=1}^{N} \left( e_{CV}^{(i)} \right)^2}{\sum_{i=1}^{N} \left( y^{(i)} - \hat{\mu}_Y \right)^2}    (24)

The fast computation in Eq. (24) decreases the computational complexity of ε_LOO(c_w), since the PCE is constructed
only once. The following optimization problem is then considered,

c_w^{*} = \arg\min_{c_w \in D_{c_w}} \varepsilon_{LOO}(c_w)    (25)

In the optimization process, the values of the constraint parameter c_w are updated in a grid-search manner. The
grid-search approach for solving Eq. (25) is similar to that of Ref. [41], in which the optimization is set by adjusting
the model hyperparameters directly instead of the L_1-norm constraint. The proposed surrogate modeling method comes
with the benefit that the search space is significantly smaller, because only one design variable c_w is required in the
dimension reduction stage.
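The fast LOOCV formula of Eqs. (23)–(24) needs only the diagonal of the hat matrix, so the grid search over c_w in Eq. (25) never refits N models. A minimal sketch (ours) for a surrogate that is linear in its basis:

```python
import numpy as np

def loo_error(Phi: np.ndarray, y: np.ndarray, beta: np.ndarray) -> float:
    """Leave-one-out error of Eqs. (23)-(24) for a surrogate y ~ Phi @ beta.
    tau_i is the i-th diagonal entry of the hat matrix Phi (Phi^T Phi)^{-1} Phi^T."""
    tau = np.einsum('ij,ji->i', Phi, np.linalg.pinv(Phi))   # diag of hat matrix
    e_cv = (y - Phi @ beta) / (1.0 - tau)                   # Eq. (23)
    return np.sum(e_cv ** 2) / np.sum((y - y.mean()) ** 2)  # Eq. (24)

# Schematic grid search for Eq. (25), assuming the SPLS and basis helpers
# from the earlier sketches:
#   for c_w in np.linspace(1.0, np.sqrt(M), 50):
#       R = spls_components(Xc, yc, m, c_w); Phi = basis(Xc @ R); beta = fit(Phi, yc)
#       record loo_error(Phi, yc, beta), keep the c_w with the smallest value.
rng = np.random.default_rng(4)
Phi = rng.standard_normal((50, 10))
beta_true = rng.standard_normal(10)
y = Phi @ beta_true + 0.1 * rng.standard_normal(50)
beta_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]
print(loo_error(Phi, y, beta_hat))
```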
Fig. 1 presents a typical example illustrating the change in the generalization error for a single
component m = 1 as the hyperparameter c_w increases, using two experimental designs with N = 60
and N = 120. The figure also plots the LOOCV error, a good indicator of the behavior of the generalization error. The
vertical line marks the value of c_w chosen by the LOOCV error, and the horizontal line is the LOOCV error of the
resulting PCE. There is a small bias (underestimation of the generalization error) in the LOOCV error; despite this
bias, cross-validation consistently chooses a c_w that produces a near-minimal error.
To this end, the SPLS-PCE algorithm for high-dimensional surrogate modeling is detailed in Algorithm 2.
Optimization is performed using 50 equidistant points in the range 1 ≤ c_w ≤ √M; the LOOCV errors for the
candidate values of c_w are computed and the value yielding the smallest LOOCV error is retained. Step 10
utilizes a target accuracy measure to monitor the update of the direction vector w_m. SPLS-PCE is fairly insensitive
to the choice of the tolerance ε_w^*; in the proposed implementation, ε_w^* is set to 10^{-4}. Note also that the
surrogate construction involves the determination of a suitable number m of principal components. An efficient
strategy is to construct the PCE by adding components one by one while computing the associated LOOCV error
ε_LOO^(m) and empirical MSE ε̂_G^(m). When the ratio of the LOOCV error ε_LOO^(m) of the current m components
to the empirical MSE ε̂_G^(m−1) of the previous m − 1 components (i.e., ε_LOO^(m)/ε̂_G^(m−1)) exceeds 1, the newly
added component is insignificant for improving the prediction ability and should not be added to the surrogate model.
The algorithm terminates when the latest cycle fails to improve the prediction ability of the current surrogate
model. Finally, the algorithm returns the PCE surrogate ĝ*(T | T, Y) with the rotation matrix R*, which defines the
optimal projection directions.

4. Applications

This section is dedicated to the validation and assessment of the proposed method. Three applications are
considered. The first one is an analytical function with 63 input variables. The second one is a heat diffusion
model with 110 input variables. The third one is a more challenging structural mechanics model with 300 input
variables. In all three cases, the intrinsic dimensionality of the actual model is unknown. The performance of the proposed
method is compared with PLS combined with data-driven PCE (named PLS-PCE) and with SIR-PCE [46]. To improve
readability, the details of the PLS-PCE algorithm are omitted from the main text and given in the Appendix.
For all methods, we choose a maximum polynomial degree of p_max = 10 for the arbitrary PCE. Moreover, the LRA
surrogate modeling method of UQLab [58] is also chosen for comparison, as it has been shown to perform better
than sparse PCE for moderate-dimensional problems and small experimental designs. The computational cost of
building an LRA surrogate in high-dimensional problems also remains feasible, as its construction
is based on products of univariate polynomial expansions. The surrogate performance is compared in terms of
Fig. 2. A cantilever tube structure.

generalization error, which is calculated as

\varepsilon_g = \frac{\sum_{l=1}^{N_{test}} \left( y^{(l)} - \hat{y}^{(l)} \right)^2}{\sum_{l=1}^{N_{test}} \left( y^{(l)} - \bar{y} \right)^2}, \qquad N_{test} = 20000

where \bar{y} and {ŷ^(1), ..., ŷ^(N_test)}^T denote, respectively, the mean value of {y^(1), ..., y^(N_test)}^T and the model
responses evaluated by the surrogate model. In addition, the surrogate modeling stage of these methods is deployed with UQLab.
For the sake of consistency, the same settings are used in these methods unless explicitly stated otherwise.
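For reference, this validation metric is a one-liner; the helper below (ours) is what all the error curves in this section amount to.

```python
import numpy as np

def generalization_error(y_true: np.ndarray, y_surr: np.ndarray) -> float:
    """Relative validation error used throughout Section 4 (N_test samples)."""
    y_true, y_surr = np.asarray(y_true), np.asarray(y_surr)
    return np.sum((y_true - y_surr) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
```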

4.1. Cantilever tube structure

A cantilever tube structure, shown in Fig. 2, is considered in this example [66]. The structure is subjected to three
external forces F_1(t), F_2 and P, and a torque T_r(t). F_1(t) and T_r(t) are time-variant and are thus modeled as
random processes. The performance function is given as

g(t) = R(t) - \sqrt{\sigma_x^2(t) + 3\,\tau_{zx}^2(t)}    (26)

where

R(t) = R_0 (1 - 0.01 t), \qquad \sigma_x(t) = \frac{F_1(t) \sin(\theta_1) + F_2 \sin(\theta_2) + P}{A} + \frac{M(t)\, d}{2 I}

M(t) = F_1(t) \cos(\theta_1)\, L_1 + F_2 \cos(\theta_2)\, L_2, \qquad A = \frac{\pi}{4} \left[ d^2 - (d - 2h)^2 \right]

I = \frac{\pi}{64} \left[ d^4 - (d - 2h)^4 \right], \qquad \tau_{zx}(t) = \frac{T_r(t)\, d}{4 I}
Details of the distributions of the input parameters of the problem are given in Table 1. The structure is studied
over a time interval of 5 years. F_1(t) and T_r(t) are modeled as stationary Gaussian random processes indexed
on the bounded domain D = [0, 5] and respectively defined as

F_1(t) = 1800 + 180\, \rho_{F_1}(t), \qquad T_r(t) = 1900 + 190\, \rho_{T_r}(t)    (27)

where ρ_{F_1}(t) and ρ_{T_r}(t) denote standard Gaussian random processes with autocorrelation functions c(t, t')
defined as exp(−|t − t'| / 4) and exp(−(t − t')^2 / 0.5^2), respectively.
Since the values of F_1(t) and T_r(t) at different time nodes are correlated, we approximately represent them using the
expansion optimal linear estimation (EOLE) method [67]. Let {t_1, ..., t_n} denote the nodes of a grid defined on
the bounded domain D. Retaining the first L terms of the EOLE series, a standard Gaussian random process (or
random field) ρ(t) can be approximated by

\rho(t) \approx \sum_{l=1}^{L} \frac{\xi_l}{\sqrt{e_l}}\, \gamma_l^T\, c(t)    (28)
Table 1
Parameters of the cantilever tube structure.

Parameter | Distribution     | Mean      | Standard deviation
L1        | Deterministic    | 60 (mm)   | –
L2        | Deterministic    | 120 (mm)  | –
θ1        | Deterministic    | 10°       | –
θ2        | Deterministic    | 5°        | –
d         | Gaussian         | 42 (mm)   | 0.798 (mm)
h         | Gaussian         | 5 (mm)    | 0.1 (mm)
R0        | Gaussian         | 560 (MPa) | 560 (MPa)
F1(t)     | Gaussian process | 1800 (N)  | 180 (N)
F2        | Gaussian         | 1800 (N)  | 180 (N)
P         | Gumbel           | 1000 (N)  | 100 (N)
Tr(t)     | Gaussian process | 1900 (N)  | 190 (N)

Fig. 3. Comparison of the SPLS-based and linear PLS adaptations for various numbers of dimension reduction components.

where {ξ_1, ..., ξ_L} are independent standard normal variables and c(t) = {c(t, t_1), ..., c(t, t_n)}^T; e_l and γ_l denote,
respectively, the eigenvalues and eigenvectors of the correlation matrix C with elements c_{k,j} = c(t_k, t_j)
(k, j = 1, ..., n). The element size of the EOLE grid is set equal to 0.5/4 (a quarter of the smallest autocorrelation
length, 0.5, associated with the random process T_r(t)), which yields 41 time nodes. In the
following analysis, the Gaussian random processes F_1(t) and T_r(t) are approximated by means of 40 and 18 standard
Gaussian random variables, which represent 99.35% and 98.90% of the variance of the original processes,
respectively. The total number of input random variables is thus equal to 63, and the response quantity of interest to be
predicted is the performance function g(t) at t = 5 years.
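A compact sketch of the EOLE discretization used here (our illustration of Eqs. (27)–(28), with the exponential kernel of ρ_F1; grid and truncation values follow the text):

```python
import numpy as np

def eole(t_nodes, corr, L):
    """Eigen-ingredients of the EOLE expansion, Eq. (28): the L dominant
    eigenpairs (e_l, gamma_l) of the grid correlation matrix C."""
    C = corr(t_nodes[:, None], t_nodes[None, :])
    e, G = np.linalg.eigh(C)                 # ascending eigenvalues
    idx = np.argsort(e)[::-1][:L]            # keep the L largest
    return e[idx], G[:, idx]

def sample_path(t, t_nodes, corr, e, G, xi):
    """rho(t) ~= sum_l (xi_l / sqrt(e_l)) gamma_l^T c(t), Eq. (28)."""
    c_t = corr(t[:, None], t_nodes[None, :])  # c(t) at the query times
    return c_t @ (G @ (xi / np.sqrt(e)))

corr = lambda a, b: np.exp(-np.abs(a - b) / 4.0)   # kernel of rho_F1, Eq. (27)
t_nodes = np.linspace(0.0, 5.0, 41)                # EOLE grid: 41 time nodes
e, G = eole(t_nodes, corr, L=40)
rng = np.random.default_rng(5)
xi = rng.standard_normal(40)                       # independent standard normals
rho = sample_path(np.linspace(0.0, 5.0, 201), t_nodes, corr, e, G, xi)
F1 = 1800.0 + 180.0 * rho                          # one trajectory of F1(t)
print(e.sum() / len(t_nodes))                      # avg. captured variance (~0.99)
```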
The analyses are first performed using two nested Latin hypercube sampling (LHS) [68] experimental designs
of size N = 60 and N = 120. The samples are directly imported from UQLab into our framework, so that exactly
the same information is provided to all methods. Comparing the SPLS and PLS dimension reduction methods in Fig. 3,
the PCE combined with SPLS gives consistently lower generalization errors.
The performance of SPLS-PCE is then compared with the PLS-PCE, SIR-PCE and LRA surrogate modeling
methods. The number m of components identified as optimal by SPLS-PCE, PLS-PCE and SIR-PCE is reported
in Table 2. In Fig. 4, we compare the analytical response PDF f_G, on a logarithmic scale, with the respective kernel
density estimates (KDEs) obtained by SPLS-PCE, PLS-PCE, SIR-PCE and LRA. The KDEs are based on the
surrogate responses at a set of 2 × 10^4 random samples. The logarithmic scale emphasizes the
behavior at the tails of the PDF. With the SPLS-PCE method, an experimental design
of size as small as N = 60 is sufficient to approximate the entire range, including the tails. PLS-PCE, SIR-PCE and
LRA converge more slowly to the reference PDF. For N = 60, the discrepancies of the PLS-PCE-based,
SIR-PCE-based and LRA-based KDEs from the reference one are obvious. For N = 120, the PLS-PCE-based and
Table 2
Optimal configuration parameter m for the dimension reduction-based methods.

N   | Method   | m
60  | SPLS-PCE | 6
60  | PLS-PCE  | 6
60  | SIR-PCE  | 7
120 | SPLS-PCE | 6
120 | PLS-PCE  | 5
120 | SIR-PCE  | 6

Fig. 4. PDF of the response in the logarithm scale.

SIR-PCE-based KDEs become accurate, but the LRA-based KDE is still inaccurate in the tails. Fig. 5
depicts the responses of these surrogate models versus the actual responses. The responses
obtained from SPLS-PCE show an overall smaller dispersion around the actual responses, which
leads to a smaller generalization error. This analysis demonstrates that the responses obtained
from SPLS-PCE tend to be unbiased, even when the size of the experimental design is smaller than the input
dimensionality of the actual model.
The convergence with respect to the generalization error is reported in Fig. 6. SPLS-PCE
presents a faster convergence compared with PLS-PCE, SIR-PCE and LRA. In particular, it requires an experimental
design of size N = 180 to achieve an accuracy of 10^{-3}, whereas the other methods only reach an accuracy of 10^{-1}
with an experimental design of the same size.
The robustness of the methods is now assessed by replicating the analyses using 50 random experimental
designs of the same size. The results are provided as box plots in Fig. 7. Each box is characterized
by the first quartile (bottom line), the median (red line) and the third quartile (upper line). The whiskers indicate
the variability of the data outside the first and third quartiles; their ends lie at a distance of 1.5
interquartile ranges from the first/third quartile. The figure shows that both the median and the interquartile range are
significantly smaller for the proposed method, which demonstrates its superior robustness over PLS-PCE, SIR-PCE
and LRA.
Fig. 5. Surrogate model versus actual model responses at a validation set.

Fig. 6. Convergence curves of the surrogate models constructed by SPLS-PCE, PLS-PCE, SIR-PCE and LRA using the same experimental
design generated by LHS.

4.2. Heat conduction with spatially varying diffusion coefficient

This second example is a stationary heat diffusion problem [9,69], defined on the square
domain D = [−0.5, 0.5] × [−0.5, 0.5]. The temperature field E(z), z = {z_1, z_2} ∈ D, is described by the following
PDE,

-\nabla \cdot \left( \kappa(z)\, \nabla E(z) \right) = Q\, I_A(z)    (29)

with boundary conditions E(z) = 0 on the top boundary and ∇E(z) · n = 0 on the left, right and bottom boundaries,
where n denotes the vector normal to the boundary. Q = 2 × 10^3 W/m^3 is the heat source, A = [0.2, 0.3] × [0.2, 0.3]
corresponds to a square domain within D, and the indicator function I_A = 1 if z ∈ A and I_A = 0 otherwise. The diffusion
Fig. 7. Box plots of the generalization error based on 50 random experimental designs. (For interpretation of the references to color in this
figure legend, the reader is referred to the web version of this article.)

Fig. 8. Finite-element mesh of heat-conduction problem.

coefficient κ(z) is a lognormal random field defined as

\kappa(z) = \exp\left( a_\kappa + b_\kappa\, \rho_\kappa(z) \right)    (30)

where a_κ = 1 W/°C and b_κ = 0.3 W/°C. In Eq. (30), ρ_κ(z) denotes a standard Gaussian random field
with autocorrelation function

c(z, z') = \exp\left( -\| z - z' \|^2 / \lambda^2 \right)    (31)

where the autocorrelation length is λ = 0.2 m. To solve Eq. (29), the Gaussian random field ρ_κ(z) is first discretized
using EOLE; see Eq. (28).
In [70], it is recommended that, for a square-exponential autocorrelation function, the element size of the
EOLE grid be λ/2 or λ/3. Therefore, for the discretization of the square domain D, the element size of
the EOLE grid is 0.1 m, which yields n = 121 nodes. A relative variance error of less than 1% for the random
field requires 110 independent standard normal variables, leading to M = 110 random variables in this example.
The problem is solved with a finite element analysis code developed in MATLAB. The employed finite element
discretization into 8542 triangular T3 elements is depicted in Fig. 8. A realization of the diffusion coefficient random
field κ(z) corresponding to the 110 input variables of the model and the associated realization of the temperature
random field E(z) are shown in Fig. 9. The corresponding response quantity of interest is the mean temperature
in the square region B = [−0.3, −0.2] × [−0.3, −0.2], that is, E_B = ∫_{z∈B} E(z) dz (see Fig. 9(b)).
Fig. 9. Illustration of the model input and output of heat-conduction problem.

Fig. 10. Comparison of the SPLS-based and linear PLS adaptations for various numbers of dimension reduction components.

Building upon the work of Ref. [9], two nested LHS experimental designs of size N = 100 and N = 200
are created. The first analysis consists in comparing the generalization performance of the surrogate as a function
of the number of components m for SPLS and PLS. The resulting generalization errors are provided in Fig. 10. In
conjunction with the SPLS dimension reduction method, the PCE surrogate yields consistently lower
errors, regardless of the number of components used. This is to be expected, as the sparse solution obtained by SPLS
generalizes better than the full solution obtained by linear PLS, which includes all available variables when determining
the dominant projection directions. Note that the generalization error can increase with the number m
of components, which can be attributed to the overfitting phenomenon: the number of unknown coefficients grows
polynomially with both the total degree p and the number m of components, so the generalization error
may increase with the complexity of the polynomial chaos approximation. It is also worth mentioning that PLS-based
PCE surrogate modeling is much more sensitive to overfitting than SPLS-based PCE surrogate modeling, as shown in
Fig. 10(b).
In the subsequent analysis, we compare the performance of SPLS-PCE against the other methods based on the same
experimental design scheme. The number m of components identified as optimal by SPLS-PCE, PLS-PCE and
SIR-PCE is given in Table 3. We compare, on a logarithmic scale, the KDEs of the response PDF f_{E_B} obtained with
the surrogate modeling methods with that obtained with the actual model in Fig. 11. All aforementioned KDEs
of f_{E_B} are based on the evaluations of the actual model and of the surrogate models at a validation
set of 2 × 10^4 random samples in the input space. When SPLS-PCE is employed, the experimental design of
Table 3
Optimal configuration parameter m for the dimension reduction-based methods.

N   | Method   | m
100 | SPLS-PCE | 5
100 | PLS-PCE  | 9
100 | SIR-PCE  | 6
200 | SPLS-PCE | 6
200 | PLS-PCE  | 7
200 | SIR-PCE  | 6

Fig. 11. PDF of the response in the logarithm scale.

size N = 100 is sufficient to approximate the PDF with high accuracy over the entire range, including the tails. The
PLS-PCE, SIR-PCE and LRA methods exhibit a bias for small experimental designs and capture the response PDF
increasingly well as the size of the experimental design rises. To further check the quality of the surrogates, we plot the
responses of SPLS-PCE, PLS-PCE, SIR-PCE and LRA against the actual responses in Fig. 12. The figure
shows that SPLS-PCE performs best when the number of input variables exceeds the experimental design
size (N < M), while PLS-PCE, SIR-PCE and LRA produce many outliers.
For the sake of completeness, the convergence with respect to the generalization error is reported in Fig. 13.
It illustrates again the major benefit of SPLS-PCE, as faster convergence is achieved with a relatively small
experimental design size.
As in the previous test case, the robustness of the methods is now assessed by replicating the analyses using
50 random experimental designs of the same size. The results are provided as box plots in Fig. 14 for experimental
design sizes N = 100 and N = 200. SPLS-PCE performs best in terms of median values and is the least scattered
method, as shown by the smallest interquartile range.

4.3. Clamped isotropic plate with spatially varying pressure load and modulus of elasticity

For the third example, we consider an isotropic plate defined on the domain D = [−0.5, 0.5] × [−0.8, 0.8] with a
hole A = [−0.05, 0.05] × [−0.4, 0.4]. The Poisson ratio is ν = 0.29 and the plate thickness is h = 0.1 m. The pressure
load q(z) and the modulus of elasticity E(z) are used to characterize the spatial variability in terms of lognormal random
fields. For the random field q(z), the mean μ_q and standard deviation σ_q are respectively set equal to 96 MPa and
9.6 MPa. For the random field E(z), the mean μ_E and standard deviation σ_E are respectively set equal to 2 × 10^5
MPa and 2 × 10^4 MPa. It is assumed that the autocorrelation functions of the underlying standard Gaussian random
fields ρ_q(z) and ρ_E(z) are modeled by the same square-exponential model

c(z, z') = \exp\left( -\left[ (z_1 - z_1')^2 / \lambda_1^2 + (z_2 - z_2')^2 / \lambda_2^2 \right] \right)    (32)

where the autocorrelation lengths are λ_1 = 1.65 m and λ_2 = 1.75 m. Accordingly, we use a square grid with element
size 0.8 m, comprising 169 nodes. In the following analysis, the realizations of each standard Gaussian random
field are computed using 150 terms of the EOLE series, which represent 99.9% of the variance of the
original field; working with two random fields thus leads to 300 random variables. Given the configuration of the
plate, the model can be simplified under the plane stress hypothesis, where the transverse deflection field u(z) is
Fig. 14. Box plots of the generalization error based on 50 random experimental designs.

Fig. 15. Finite-element mesh of clamped isotropic plate.

the solution of the PDE

\nabla^2 \left( \frac{E(z)\, h^3}{12 \left( 1 - \nu^2 \right)}\, \nabla^2 u(z) \right) = -q(z), \qquad z \in D \setminus A    (33)

The boundary conditions are u(z) = 0 and ∇u(z) · n = 0 on each boundary, where n denotes the vector normal
to the boundary. The spatial domain of the plate is discretized into 3222 triangular T3 elements, as shown in Fig. 15.
Each realization of the two random fields q(z) and E(z) is discretized over the mesh of Fig. 15. Fig. 16(a) and (b)
show one realization of q(z) and E(z) corresponding to the input variables of the model. The response quantity
of interest is the minimum transverse deflection u_min, denoted by a white plus in Fig. 16(c).
The surrogate models are constructed using two nested LHS experimental designs of size N = 300 and N = 400,
respectively. Fig. 17 shows the generalization error of the surrogate for an increasing number of components m. It
illustrates again that incorporating SPLS into the PCE surrogate gives consistently lower generalization errors. Note also
that the SPLS-based PCE surrogate modeling method is much less sensitive to overfitting, as already observed in Section 4.2.
In Fig. 18, we assess the accuracy of SPLS-PCE, PLS-PCE, SIR-PCE and LRA in estimating the logarithmic
response PDF log f_{u_min}; the corresponding number m of components is reported in Table 4. All PDFs are herein
computed with the KDE method, where the reference one is obtained by evaluating the actual model at a validation
set of 2 × 10^4 random samples; the other PDFs are computed with the same 2 × 10^4 random samples but using the
respective surrogate models in lieu of the original model. It is also remarkable that with PLS-PCE, an experimental
design of size as small as N = 300 (equal to the input dimensionality) is sufficient to obtain a fairly good approximation of
Fig. 16. Illustration of the model input and output of clamped isotropic plate problem.

Fig. 17. Comparison of the SPLS-based and linear PLS adaptations for various numbers of dimension reduction components.

the middle and upper tail of the reference log f u min . Fig. 19 indicates that the responses predicted by SPLS-PCE
exhibit a smaller dispersion around the actual output responses.
In Fig. 20, we plot the generalization error of these methods versus the experimental design size for u_min. It is
found that, for sufficiently large sample sizes, SPLS-PCE, PLS-PCE and SIR-PCE converge to the same solution,
but a faster convergence rate is achieved by SPLS-PCE.
Finally, we generate random experimental designs of size N = 300 and N = 400 and construct the surrogate
models for each experimental design. The analysis at each experimental design size is replicated
50 times to assess the robustness. The results are provided as box plots in Fig. 21(a) and (b).

Fig. 18. PDF of the response in the logarithm scale.

Table 4
Optimal configuration parameter m for the dimension reduction-based methods.

N     Method      m
300   SPLS-PCE    9
      PLS-PCE     7
      SIR-PCE     5
400   SPLS-PCE    8
      PLS-PCE     7
      SIR-PCE     7

Similar to the previous test cases, both the median and the interquartile range of the error are significantly smaller for the proposed method, which demonstrates the superior robustness of SPLS-PCE over PLS-PCE and SIR-PCE for this problem.
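A sketch of this replication protocol is given below; fit_and_validate is a hypothetical placeholder that trains a surrogate on a design and returns its validation error.

```python
from scipy.stats import qmc

def robustness_study(fit_and_validate, d=300, sizes=(300, 400), n_rep=50):
    # For each design size, refit on n_rep independent LHS designs and
    # collect the errors that feed the box plots
    errors = {n: [] for n in sizes}
    for n in sizes:
        for rep in range(n_rep):
            X = qmc.LatinHypercube(d=d, seed=rep).random(n)  # design in [0, 1]^d
            errors[n].append(fit_and_validate(X))
    return errors
```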

5. Conclusion
In order to mitigate the curse of dimensionality, this paper proposes a novel surrogate modeling method, which involves seeking a sufficient dimension reduction space and approximating the original high-dimensional model with a low-dimensional PCE. The proposed surrogate modeling method benefits from both the SPLS dimension reduction technique and the orthonormal basis expansion. Specifically, SPLS maps the high-dimensional input space to a suitable lower-dimensional space by forming projection directions from only small subsets of the input variables. To determine the dominant projection directions, the hyperparameter c_w controlling the number of features and the dimensionality m of the reduced space are selected based on the LOOCV error of the surrogate model. In addition, since a Gram–Schmidt orthogonalization process is adopted to construct the polynomial basis, the proposed method does not assume mutual independence between the components in the reduced space and can therefore be applied to complex systems where information about the underlying distribution is only implicitly available. The formulation and construction of such a surrogate model in a non-intrusive manner are detailed.
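As an illustration of such a data-driven construction in one dimension, an orthonormal polynomial basis with respect to the empirical measure of a sample can be built by Gram–Schmidt as sketched below; this is a generic sketch under the stated assumptions (monomial starting basis, empirical inner product), not the authors' implementation, and the function name is hypothetical.

```python
import numpy as np

def empirical_orthonormal_basis(samples, degree):
    # Monomials 1, t, t^2, ..., t^degree evaluated at the sample points
    V = np.vander(np.asarray(samples, dtype=float), degree + 1, increasing=True)
    Q = np.empty_like(V)
    for j in range(degree + 1):
        q = V[:, j].copy()
        for k in range(j):
            # Remove the component along psi_k under the empirical
            # inner product <f, g> = mean(f * g)
            q -= np.mean(q * Q[:, k]) * Q[:, k]
        Q[:, j] = q / np.sqrt(np.mean(q ** 2))  # normalize to unit empirical norm
    return Q  # column j holds psi_j evaluated at the samples
```

Because the inner product is taken with respect to the sample itself, no parametric form of the input distribution is required, which is the property exploited for mutually dependent reduced components.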

Fig. 19. Surrogate model versus actual model responses on a validation set.

Fig. 20. Convergence curves of the surrogate models constructed by SPLS-PCE, PLS-PCE, SIR-PCE and LAR using the same experimental design generated by LHS.

For validation purposes, three benchmark problems of different complexity are considered. An in-depth comparison with the PLS-PCE, SIR-PCE and LAR surrogate modeling methods is carried out. For the sake of consistency, the same settings are used in all methods. It is shown that the proposed method yields more accurate models for experimental design sizes of the order of the input dimension.
In future developments of this work, attention will be given to improving the method by using adaptive sampling techniques such as D-optimal and S-optimal designs [71]. Additionally, since a Bayesian approach would allow one to better quantify the epistemic uncertainty induced by limited data, this work can be extended to a Bayesian treatment of PCE, i.e., imposing a prior on the basis coefficients and using approximate inference techniques such as sparse Bayesian learning [72] to estimate the posterior distribution over the unknown coefficients.

Fig. 21. Box plots of the generalization error based on 50 random experimental designs.

Acknowledgments
This research was partially supported by the National Natural Science Foundation of China (Grant No. 51775439), the National Science and Technology Major Project, China (2017-IV-0009-0046), and the Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University, China (Grant No. CX201935). These supports are gratefully acknowledged.

References
[1] A. Forrester, A. Sobester, A. Keane, Engineering Design Via Surrogate Modelling: A Practical Guide, John Wiley & Sons, 2008.
[2] C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
[3] R. Tripathy, I. Bilionis, M. Gonzalez, Gaussian processes with built-in dimensionality reduction: applications to high-dimensional
uncertainty propagation, J. Comput. Phys. 321 (2016) 191–223.
[4] I. Bilionis, N. Zabaras, Multi-output local Gaussian process regression: applications to uncertainty quantification, J. Comput. Phys. 231
(2012) 5718–5746.

[5] I. Bilionis, N. Zabaras, B.A. Konomi, G. Lin, Multi-output separable Gaussian process: towards an efficient, fully Bayesian paradigm
for uncertainty quantification, J. Comput. Phys. 241 (2013) 212–239.
[6] J. Lu, Z. Zhan, D.W. Apley, W. Chen, Uncertainty propagation of frequency response functions using a multi-output Gaussian process
model, Comput. Struct. 217 (2019) 1–17.
[7] W. Yun, Z. Lu, X. Jiang, AK-SYSi: An improved adaptive Kriging model for system reliability analysis with multiple failure modes
by a refined U learning function, Struct. Multidiscip. Optim. 59 (1) (2019) 263–278.
[8] K. Konakli, B. Sudret, Global sensitivity analysis using low-rank tensor approximations, Reliab. Eng. Syst. Saf. 156 (2016) 64–83.
[9] K. Konakli, B. Sudret, Reliability analysis of high-dimensional models using low-rank tensor approximations, Probab. Eng. Mech. 4
(2016) 18–36.
[10] D. Xiu, G.E. Karniadakis, The Wiener–Askey polynomial chaos for stochastic differential equations, SIAM J. Sci. Comput. 24 (2002)
619–644.
[11] D. Xiu, G.E. Karniadakis, Modeling uncertainty in flow simulations via generalized polynomial chaos, J. Comput. Phys. 187 (2003)
137–167.
[12] P. Diaz, A. Doostan, J. Hampton, Sparse polynomial chaos expansions via compressed sensing and D-optimal design, Comput. Methods
Appl. Mech. Engrg. 336 (2018) 640–666.
[13] Q. Shao, A. Younes, M. Fahs, T.A. Mara, Bayesian sparse polynomial chaos expansion for global sensitivity analysis, Comput. Methods
Appl. Mech. Engrg. 318 (2017) 474–496.
[14] Y. Zhou, Z. Lu, K. Cheng, C. Ling, An efficient and robust adaptive sampling method for polynomial chaos expansion in sparse
Bayesian learning framework, Comput. Methods Appl. Mech. Engrg. 352 (2019) 654–674.
[15] K. Cheng, Z. Lu, Y. Wei, Y. Shi, Y. Zhou, Mixed kernel function support vector regression for global sensitivity analysis, Mech. Syst.
Signal Process. 96 (2017) 201–214.
[16] Q. Pan, D. Dias, An efficient reliability method combining adaptive support vector machine and Monte Carlo simulation, Struct. Saf. 67 (2017) 85–95.
[17] S.N. Lophaven, H.B. Nielsen, J. Søndergaard, DACE: A Matlab Kriging Toolbox (Vol.2), in: IMM, Informatics and Mathematical
Modelling, The Technical University of Denmark, 2002.
[18] F. Nobile, R. Tempone, C. Webster, A sparse grid stochastic collocation method for partial differential equations with random input
data, SIAM J. Numer. Anal. 46 (5) (2008) 2309–2345.
[19] B. Ganapathysubramanian, N. Zabaras, Sparse grid collocation schemes for stochastic natural convection problems, J. Comput. Phys.
225 (1) (2007) 652–685.
[20] S. Kucherenko, B. Feil, N. Shah, W. Mauntz, The identification of model effective dimensions using global sensitivity analysis, Reliab.
Eng. Syst. Saf. 96 (4) (2011) 440–449.
[21] A. Kundu, H.G. Matthies, M.I. Friswell, Probabilistic optimization of engineering system with prescribed target design in a reduced
parameter space, Comput. Methods Appl. Mech. Engrg. 337 (2018) 281–304.
[22] P. Wei, Z. Lu, J. Song, Variable importance analysis: a comprehensive review, Reliab. Eng. Syst. Saf. 142 (2015) 399–432.
[23] A. Saltelli, M. Ratto, T. Andres, F. Campolongo, J. Cariboni, D. Gatelli, M. Saisana, S. Tarantola, Global Sensitivity Analysis – The
Primer, Wiley, 2008.
[24] J. Schmidhuber, Deep learning in neural networks: an overview, Neural Netw. 61 (2015) 85–117.
[25] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[26] M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: European Conference on Computer Vision, Springer,
2014, pp. 818–833.
[27] Y. Khoo, J. Lu, L. Ying, Solving parametric PDE problems with artificial neural networks, 2017, preprint, arXiv:1707.03351.
[28] Y. Zhu, N. Zabaras, Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification, J.
Comput. Phys. 366 (2018) 415–447.
[29] Y. Zhu, N. Zabaras, P.S. Koutsourelakis, P. Perdikaris, Physics-constrained deep learning for high-dimensional surrogate modeling and
uncertainty quantification without labeled data, J. Comput. Phys. 394 (2019) 56–81.
[30] I. Jolliffe, Principal Component Analysis, Springer Berlin Heidelberg, 2011.
[31] S. Wold, K. Esbensen, P. Geladi, Principal component analysis, Chemometr. Intell. Lab. Syst. 2 (1–3) (1987) 37–52.
[32] P.G. Constantine, E. Dow, Q. Wang, Active subspace methods in theory and practice: applications to Kriging surfaces, SIAM J. Sci.
Comput. 36 (4) (2014) 1500–1524.
[33] P.G. Constantine, M. Emory, J. Larsson, G. Iaccarino, Exploiting active subspaces to quantify uncertainty in the numerical simulation
of the HyShot II scramjet, J. Comput. Phys. 302 (2015) 1–20.
[34] P.G. Constantine, P. Diaz, Global sensitivity metrics from active subspaces, Reliab. Eng. Syst. Saf. 162 (2017) 1–13.
[35] J. Li, J. Cai, K. Qu, Surrogate-based aerodynamic shape optimization with the active subspace method, Struct. Multidiscip. Optim. 59
(2) (2019) 403–419.
[36] Z. Jiang, J. Li, High dimensional structural reliability with dimension reduction, Struct. Saf. 69 (2017) 35–46.
[37] I. Papaioannou, M. Ehre, D. Straub, PLS-based adaptation for efficient PCE representation in high dimensions, J. Comput. Phys. 387
(2019) 186–204.
[38] P.S. Palar, K. Shimoyama, On the accuracy of Kriging model in active subspaces, in: 2018 AIAA/ASCE/AHS/ASC Structures, Structural
Dynamics, and Materials Conference, 2018.
[39] R. Tripathy, I. Bilionis, M. Gonzalez, Gaussian processes with built-in dimensionality reduction: Applications to high-dimensional
uncertainty propagation, J. Comput. Phys. 321 (2016) 191–223.
[40] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.

[41] C. Lataniotis, S. Marelli, B. Sudret, Extending classical surrogate modelling to ultrahigh dimensional problems through supervised
dimensionality reduction: a data-driven approach.
[42] K.Y. Lee, B. Li, F. Chiaromonte, A general theory for nonlinear sufficient dimension reduction: Formulation and estimation, Ann. Statist. 41 (1) (2013) 221–249.
[43] S. Wold, A. Ruhe, H. Wold, W. Dunn, The collinearity problem in linear regression. the partial least squares (PLS) approach to
generalized inverses, SIAM J. Sci. Stat. Comput. 5 (1984) 735–743.
[44] K.C. Li, Sliced inverse regression for dimension reduction, J. Amer. Statist. Assoc. 86 (414) (1991) 316–327.
[45] W. Li, G. Lin, B. Li, Inverse regression-based uncertainty quantification algorithms for high-dimensional models: Theory and practice,
J. Comput. Phys. 321 (2016) 259–278.
[46] Q. Pan, D. Dias, Sliced inverse regression-based sparse polynomial chaos expansions for reliability analysis in high dimensions, Reliab.
Eng. Syst. Saf. 167 (2017) 484–493.
[47] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression, Ann. Statist. 32 (2004) 407–499.
[48] G. Blatman, B. Sudret, Adaptive sparse polynomial chaos expansion based on least angle regression, J. Comput. Phys. 230 (6) (2011)
2345–2367.
[49] E. Torre, S. Marelli, P. Embrechts, B. Sudret, A general framework for data-driven uncertainty quantification under complex input
dependencies using vine copulas, Probab. Eng. Mech. 55 (2019) 1–16.
[50] M. Rosenblatt, Remarks on a multivariate transformation, Ann. Math. Stat. 23 (1952) 470–472.
[51] T. Bedford, R.M. Cooke, Vines–A new graphical model for dependent random variables, Ann. Statist. 30 (4) (2002) 1031–1068.
[52] K.A. Lê Cao, D. Rossouw, C. Robert-Granié, P. Besse, A sparse PLS for variable selection when integrating omics data, Stat. Appl. Genet. Mol. Biol. 7 (1) (2008) 35.
[53] D.M. Witten, R.J. Tibshirani, Extensions of sparse canonical correlation analysis with applications to genomic data, Stat. Appl. Genet.
Mol. Biol. 8 (1) (2009) 1–27.
[54] S. Oladyshkin, W. Nowak, Data-driven uncertainty quantification using the arbitrary polynomial chaos expansion, Reliab. Eng. Syst.
Saf. 106 (2012) 179–190.
[55] R. Ahlfeld, B. Belkouchi, F. Montomoli, SAMBA: Sparse approximation of moment-based arbitrary polynomial chaos, J. Comput. Phys. 320 (2016) 1–16.
[56] F. Wang, F. Xiong, H. Jiang, J. Song, An enhanced data-driven polynomial chaos method for uncertainty propagation, Eng. Optim.
(2017) 1–20.
[57] J.A. Paulson, E.A. Buehler, A. Mesbah, Arbitrary polynomial chaos for uncertainty propagation of correlated random variables in
dynamic systems, IFAC-PapersOnLine 50 (1) (2017) 3548–3553.
[58] S. Marelli, B. Sudret, UQLab: A framework for uncertainty quantification in Matlab, in: Vulnerability, Uncertainty, and Risk:
Quantification, Mitigation, and Management, 2014, pp. 2554–2563.
[59] A. Höskuldsson, PLS regression methods, J. Chemom. 2 (1988) 211–228.
[60] P. Alberto, F. González, Partial least squares regression on symmetric positive-definite matrices, Rev. Colombiana Estadíst. 36 (1) (2012) 177–192.
[61] H. Hotelling, Relations between two sets of variates, Biometrika 28 (3/4) (1936) 321–377.
[62] E. Parkhomenko, D. Tritchler, J. Beyene, Sparse canonical correlation analysis with application to genomic data integration, Stat. Appl. Genet. Mol. Biol. 8 (1) (2009) 1–34.
[63] H. Zou, T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 301–320.
[64] R. Manne, Analysis of two partial-least-squares algorithms for multivariate calibration, Chemometr. Intell. Lab. Syst. 2 (1–3) (1987)
187–197.
[65] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer, New York, 2001.
[66] C.C. Jiang, X.P. Huang, X.X. Han, D.Q. Zhang, A time-variant reliability analysis method based on stochastic process discretization,
Trans. ASME, J. Mech. Des. 136 (2014) 91009.
[67] C. Li, A. Der Kiureghian, Optimal discretization of random fields, J. Eng. Mech. 119 (6) (1993) 1136–1154.
[68] M.D. McKay, R.J. Beckman, W.J. Conover, Comparison of three methods for selecting values of input variables in the analysis of
output from a computer code, Technometrics 21 (1979) 239–245.
[69] A. Nouy, Proper generalized decompositions and separated representations for the numerical solution of high dimensional stochastic
problems, Arch. Comput. Methods Eng. 17 (2010) 403–434.
[70] B. Sudret, A. Der Kiureghian, Stochastic Finite Element Methods and Reliability: A State-of-the-Art Report, Department of Civil and Environmental Engineering, University of California, Berkeley, 2000.
[71] N. Fajraoui, S. Marelli, B. Sudret, On optimal experimental designs for sparse polynomial chaos expansions, J. Uncertain. Quantif. 5
(1) (2017) 1061–1085.
[72] M.E. Tipping, Sparse Bayesian learning and the relevance vector machine, J. Mach. Learn. Res. 1 (2001) 211–244.
