Engineering Geology 271 (2020) 105617

Contents lists available at ScienceDirect

Engineering Geology
journal homepage: www.elsevier.com/locate/enggeo

Reliability analysis of slopes using the improved stochastic response surface methods with multicollinearity

T. Zhang a,b, X.P. Zhou a,b,*, X.F. Liu a,b
a State Key Laboratory of Coal Mine Disaster Dynamics and Control, Chongqing University, Chongqing 400045, PR China
b School of Civil Engineering, Chongqing University, Chongqing 400045, PR China

ARTICLE INFO

Keywords: Slope stability; Multicollinearity; Stepwise regression; Least absolute shrinkage and selection operator (Lasso); Elastic net; Improved stochastic response surface method

ABSTRACT

It is known that uncertainties in geotechnical engineering are unavoidable. Slope reliability analysis is vital because the failure of a slope may cause great loss. In the reliability analysis of slopes, the stochastic response surface method (SRSM) provides an effective way to address the non-convergence of calculation results and has the advantage of high accuracy for highly nonlinear problems. However, multicollinearity, defined as the existence of exact correlations or highly correlated relationships among the explanatory variables, is not considered in the SRSM, leading to local solutions. To address the multicollinearity existing in slope reliability analysis, the least absolute shrinkage and selection operator (Lasso)-based SRSM, the ridge regression-based SRSM, the elastic net regression-based SRSM, and the stepwise regression-based SRSM are proposed in this paper. Monte Carlo simulations and the traditional SRSM are used to validate the accuracy of the proposed methods. It is found from the numerical results that the stepwise regression-based SRSM is the most competitive among the four proposed methods in addressing the uncertainties of slope stability due to its accuracy and efficiency.

1. Introduction

Uncertainties are an inherent property of geological materials. In traditional slope stability analysis, safety factor approaches have been used to account for uncertainties due to their simplicity and convenience. However, such approaches do not rigorously address uncertainties. Reliability analysis can compensate for the shortcomings of deterministic methods in geotechnical engineering. Probabilistic methods, such as Monte Carlo simulation (MCS) (Davis and Peterkeller, 1997), the stochastic finite element method (SFEM) (Jiang et al., 2014), and the response surface method (RSM) (Li et al., 2016), provide a powerful tool for handling uncertainties in geological materials. It is well accepted that MCS is the most accurate method because large numbers of sampling points are covered. However, MCS is time consuming and is restricted by its large computational complexity. This makes MCS unsuitable for very complicated problems. Advanced MCS methods, such as subset simulation, have been developed to significantly reduce the computational complexity (Au and Wang, 2014). Subset simulation has been used in the reliability analysis of geotechnical and geological problems, such as slope stability (Wang et al., 2011) and pile foundations (Wang and Cao, 2013). The main SFEMs are the perturbed stochastic finite element method (Fu et al., 2001) and the spectral stochastic finite element method (Ghanem and Spanos, 1991). Compared with MCS, the SFEM has the advantage of being less time consuming, while it has the disadvantage of requiring the transformation of the governing equation of stochastic problems into a determinate equation, which is difficult. RSMs form a useful collection of mathematical and statistical techniques to model and analyze problems in which the response of interest is influenced by several variables, and the objective is to optimize this response (Tan et al., 2013). For example, Li et al. (2015) adopted the RSM to investigate slope reliability. Although the RSM costs less time in calculation, it is difficult to guarantee convergence and has the disadvantage of low accuracy for highly nonlinear problems (Guan and Melchers, 2001).

In recent years, the SRSM, an extension of the RSM in which Hermite polynomial chaos expansion is used to fit the limit performance function with ordinary least squares estimates (OLS), has provided a new and effective way to analyze reliability problems (Huang et al., 2007; Isukapalli et al., 2010; Phoon and Huang, 2007). Li et al. (2009) first proposed a probability collocation method based on the linear independence principle to reduce the number of collocation points to the number of polynomial chaos expansion (PCE) coefficients. Jiang et al. (2012) gave more details about the linear independence principle, such as the steps for choosing collocation points. Jiang et al.


* Corresponding author at: State Key Laboratory of Coal Mine Disaster Dynamics and Control, Chongqing University, Chongqing 400045, PR China.
E-mail address: [email protected] (X.P. Zhou).

https://doi.org/10.1016/j.enggeo.2020.105617
Received 2 July 2018; Received in revised form 28 March 2020; Accepted 1 April 2020
Available online 02 April 2020
0013-7952/ © 2020 Elsevier B.V. All rights reserved.

(2012) noted that the linear independence principle means that the design matrix composed of collocation points during the regression process is full rank. Xiong et al. (2016) considered the probabilistic weights of samples to improve the accuracy in regression. In order to improve the accuracy of the SRSM and reduce the amount of calculation, Isukapalli (1999) suggested that the number of collocation points should be twice the number of PCE coefficients. However, OLS estimates in the frame of the conventional SRSM often have low bias but large variance (Tibshirani, 1996).

Although researchers have made efforts to improve the accuracy of the SRSM, the effects of multicollinearity in the regression process have been ignored, leading to estimations of PCE coefficients that are sometimes inaccurate. To date, few researchers have investigated the multicollinearity among regression variables in the reliability analysis of geotechnical engineering. Zhou and Huang (2018) investigated the probability of failure for the Xiangjiashan landslides using an RSM in which the data of collocation points were used to fit an explicit analytical performance function. Moreover, it was determined that multicollinearity in those regression data leads to a relative error in the failure probability of approximately 20% (Zhou and Huang, 2018). Multicollinearity refers to the distortion of the regression coefficient or the difficulty in estimating this coefficient due to the existence of exact correlations or highly correlated relationships among the explanatory variables in a linear regression model. Several methods can be used to address multicollinearity, such as Lasso, ridge regression, elastic net regression and stepwise regression. Zhou and Huang (2018) suggested a uniform design-based response surface method combined with Lasso to consider the effects of multicollinearity among regression variables. Hoerl et al. (1985) used the ridge regression method to improve estimation accuracy in statistics. Meinshausen and Bühlmann (2006) applied Lasso to select variables in high-dimensional graphs. Zhou et al. (2012) used a stepwise regression method to estimate the dominant electromechanical modes in the field of a power system. In this paper, the above four methods are adopted to consider multicollinearity in the frame of the SRSM, and the improved SRSM allows more collocation points to be used to overcome the drawbacks of the linear independence principle.

The paper is organized as follows: multicollinearity in the linear regression model is introduced in Section 2. The improved stochastic response surface methods are proposed in Section 3. Two numerical examples of slopes are illustrated in Section 4. Discussions are given in Section 5. Conclusions are drawn in Section 6.

2. Multicollinearity in linear regression model

2.1. Linear regression model

The regression performed by Hermite polynomial chaos expansion is multiple linear regression. If there is a linear relation among variables $X_1, X_2, \ldots, X_{N_a-1}$ and random variable $y$, the probability model among variables can be given by:

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{N_a-1} X_{N_a-1} + \varepsilon \tag{1}$$

where $y$ is the dependent variable, $\beta_0, \beta_1, \beta_2, \ldots, \beta_{N_a-1}$ are unknown coefficients, $N_a$ is the number of unknown coefficients, $X_1, X_2, X_3, \ldots, X_{N_a-1}$ denote independent variables, and $\varepsilon$ is random error. It is often assumed that $\varepsilon$ follows a normal distribution and that $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$.

When $N_a$ is equal to 2, this probability model is called a simple linear regression model. When $N_a$ is equal to or greater than 3, it is called a multiple linear regression model.

For the stochastic response surface method, if $N$ collocation points are used, the linear regression model (1) can be rewritten as follows:

$$
\begin{cases}
y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_{N_a-1} x_{1,N_a-1} + \varepsilon_1 \\
y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_{N_a-1} x_{2,N_a-1} + \varepsilon_2 \\
\quad \vdots \\
y_N = \beta_0 + \beta_1 x_{N1} + \beta_2 x_{N2} + \cdots + \beta_{N_a-1} x_{N,N_a-1} + \varepsilon_N
\end{cases}
\tag{2}
$$

Eq. (2) can be rewritten in matrix form:

$$y = X\beta + \varepsilon \tag{3}$$

where $y$ is the response value of the model, $X$ is the matrix composed of the vectors of model inputs, $\beta$ is composed of the coefficients of the model inputs, and $\varepsilon$ is composed of the random errors:

$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1,N_a-1} \\ 1 & x_{21} & \cdots & x_{2,N_a-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{N1} & \cdots & x_{N,N_a-1} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{N_a-1} \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix}
$$

The OLS estimate of $\beta$ is determined as:

$$\hat{\beta}(\mathrm{OLS}) = (X'X)^{-1} X'y \tag{4}$$

where $X'$ denotes the transpose of $X$.

The variance of $\hat{\beta}$ is given by:

$$D(\hat{\beta}) = \sigma^2 (X'X)^{-1} \tag{5}$$

2.2. Multicollinearity

The major problem of multicollinearity is that the OLS estimators of the coefficients of the variables involved in the linear dependencies have large variances (Mansfield and Helms, 1982). When exact multicollinearity exists in the regression model, we have:

$$\mathrm{Rank}(X) < N_a \tag{6}$$

In other words, the design matrix is not full rank and $|X'X| = 0$. In this case, $(X'X)^{-1}$ does not exist, and Eq. (4) does not have a unique solution. When approximate multicollinearity exists in the regression model, although the design matrix is full rank, $|X'X| \approx 0$. In this case, the diagonal elements of Eq. (5) are large. As a consequence, it is difficult to obtain reliable estimates of unknown coefficients. The confidence intervals of the regression coefficients will be very wide. The confidence intervals may even include zero, which means that one cannot even know whether an increase in the X value is associated with an increase or a decrease in Y (Paul, 2006).

In this paper, the steps of solving multicollinearity are described as follows:

(1) Detection of multicollinearity

High correlation coefficients of regression variables indicate multicollinearity. However, low correlation coefficients do not rule out multicollinearity among regression variables. In other words, high correlation coefficients are sufficient but not necessary conditions for multicollinearity. There are several ways to detect multicollinearity. For example, Bary (2017) gave three methods to detect multicollinearity: the analysis of scatter plots of pairs of independent variables, the analysis of variance inflation factors (VIFs) and the analysis of eigenvalues. Wang et al. (2015) utilized the collinearity statistic of tolerance (< 0.1), the VIF (> 5), and the condition index (> 10) to detect multicollinearity. Usually, VIF > 10 indicates multicollinearity. Therefore, this paper focuses on VIF > 10.

Generally, the VIF can be obtained by:

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \quad j = 1, 2, \ldots, N_a - 1 \tag{7}$$

where $R_j^2$ denotes the coefficient of determination when $X_j$ is regressed on all other predictor variables and $N_a - 1$ is the number of regression variables.

(2) Solution of multicollinearity

Representative methods include Lasso, ridge regression, elastic net regression and stepwise regression. Lasso, ridge regression and elastic net regression can address multicollinearity by shrinking the coefficients of regression variables. Stepwise regression is used to remove unimportant regression variables from the model. Details of these methods will be described in Section 3.

3. The proposed method

3.1. The conventional stochastic response surface method

Flow charts of both the conventional and the improved SRSM are shown in Fig. 1. The procedure of the conventional SRSM in Fig. 1 is described as follows:

Step 1

If the stochastic inputs are not independent standard random variables, they should be transformed into independent standard random variables, with the result that the limit state function is expressed in terms of independent standard random variables.

Step 2

Hermite polynomial chaos expansion is employed as stochastic outputs. The order of Hermite polynomial chaos expansion should be chosen. Usually, the higher the order is, the more accurate the calculation result.

Step 3

Proper collocation points are chosen based on the probability collocation method after all available collocation points are calculated.

Step 4

Response values are calculated by substituting the collocation points chosen in step 3 into the revised limit state function in step 1.

Step 5

Unknown coefficients in Hermite polynomial chaos expansion are regressed using the collocation points chosen in step 3 and the response values in step 4 based on OLS. Then, the approximated model is established.

Step 6

Based on the approximate function in step 5, the reliability index and failure probability can be easily evaluated using a general reliability method such as the first-order reliability method (FORM).

Fig. 1. Flow chart of both the conventional and the improved SRSM.
[Flow chart: transform the correlated non-normal random variables; determine the order of Hermite polynomial chaos expansion; select collocation points; calculate real response values; regress the PCE coefficients in Eq. (8) with OLS (conventional SRSM) or with Lasso, ridge, elastic net or stepwise regression (improved SRSM considering multicollinearity); evaluate the reliability index and failure probability.]

In this paper, Hermite polynomial chaos expansion is applied to approximate the response of the structure in the stochastic response surface method. Hermite polynomial chaos expansion is as follows:

$$
G = \alpha_0 + \sum_{i_1=1}^{n} \alpha_{i_1} H_1(Y_{i_1}) + \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \alpha_{i_1 i_2} H_2(Y_{i_1}, Y_{i_2}) + \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \sum_{i_3=1}^{n} \alpha_{i_1 i_2 i_3} H_3(Y_{i_1}, Y_{i_2}, Y_{i_3}) + \cdots
\tag{8}
$$

where $G$ is the response value, $\alpha_0, \alpha_{i_1}, \alpha_{i_1 i_2}, \ldots$ are PCE coefficients, $Y = (Y_1, Y_2, \ldots, Y_n)$ is an independent standard normal random vector, $n$ is the number of variables, and $H_p(Y_{i_1}, Y_{i_2}, \ldots, Y_{i_p})$ is polynomial chaos of order $p$, which is written as follows:

$$
H_p(Y_{i_1}, Y_{i_2}, \ldots, Y_{i_p}) = (-1)^p e^{\frac{1}{2} Y^T Y} \frac{\partial^p}{\partial Y_{i_1} \partial Y_{i_2} \cdots \partial Y_{i_p}} e^{-\frac{1}{2} Y^T Y}
\tag{9}
$$

For order-$p$ Hermite polynomial chaos expansion with $n$ variables, the number $N_a$ of PCE coefficients is calculated by:

$$N_a = \frac{(n + p)!}{n!\, p!} \tag{10}$$

The most important steps are the choice of rational collocation points and the regression of the PCE coefficients. The probability collocation method (Huang et al., 2009; Li et al., 2010) is commonly used to choose collocation points. For the probability collocation method, if the order of Hermite polynomial chaos expansion is $p$, the collocation points are usually taken as the roots of the polynomial of order $p + 1$ in Eq. (9). As mentioned above, $(p + 1)^n$ collocation points are available to fit Hermite polynomial chaos expansion. In practice, if 0 is not a root, it is added as a collocation point. However, it is not necessary to use all possible collocation points. The general principle for selecting collocation points is that origin-symmetric points in high-probability regions are preferred (Tatang et al., 1997). Jiang et al. (2012) suggested ranking the available collocation points in terms of the distance between the collocation points and the origin, and the frontmost collocation points should be chosen first.

Fig. 2. The process of stepwise regression.
[Flow chart: set the significance level α, Fin and Fout; calculate the partial regression sum of squares of the unintroduced variables; calculate the F-test value Fmax of the variable with the maximum partial regression sum of squares; if Fmax > Fin, introduce this variable into the model, otherwise stop; calculate the partial regression sum of squares of the introduced variables; calculate the F-test value Fmin of the variable with the minimum partial regression sum of squares; if Fmin < Fout, delete this variable, otherwise retain it; repeat.]

Table 1a
Stochastic variables in an anchored rock slope.

| Variable | Distribution | Mean (μ) | Standard deviation (σ) | Lower bound (a) | Upper bound (b) |
|---|---|---|---|---|---|
| c (MPa) | Normal | 0.1 | 0.02 | – | – |
| φ (°) | Normal | 35 | 5 | – | – |
| zw/z | Truncated exponential | 0.5 | – | 0 | 1 |
| kh | Truncated exponential | 0.08 | – | 0 | 0.16 |
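The VIF check of Eq. (7) can be illustrated with a minimal sketch in plain Python. The data are hypothetical and are not taken from the paper: `x2` is built as roughly twice `x1`, and `x3` is constructed to be orthogonal to both. The helper names `solve`, `ols`, `r_squared` and `vif` are illustrative choices. Each auxiliary regression is fitted through the normal equations of Eq. (4).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(columns, y):
    """OLS with an intercept, Eq. (4): solve (X'X) beta = X'y."""
    cols = [[1.0] * len(y)] + [list(c) for c in columns]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


def r_squared(columns, y):
    """Coefficient of determination of y regressed on the given columns."""
    beta = ols(columns, y)
    fitted = [beta[0] + sum(b * c[i] for b, c in zip(beta[1:], columns))
              for i in range(len(y))]
    mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Eq. (7): VIF_j = 1 / (1 - R_j^2), with X_j regressed on all other columns."""
    result = []
    for j, target in enumerate(columns):
        others = [c for k, c in enumerate(columns) if k != j]
        result.append(1.0 / (1.0 - r_squared(others, target)))
    return result


# Hypothetical regression variables (assumption, for illustration only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.1, 3.9, 5.9, 8.1, 10.1, 11.9, 13.9, 16.1]   # nearly 2 * x1
x3 = [1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0]   # orthogonal to x1 and x2

for name, value in zip(("x1", "x2", "x3"), vif([x1, x2, x3])):
    flag = "multicollinearity" if value > 10 else "ok"
    print(f"VIF({name}) = {value:.1f} -> {flag}")
```

With these inputs the two nearly proportional columns exceed the VIF > 10 threshold used in this paper, while the orthogonal column stays close to 1.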
Fig. 3. The anchored rock slope.
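For a single variable, the polynomial of Eq. (9) reduces to the probabilists' Hermite polynomial, which satisfies the standard recurrence He_{k+1}(y) = y He_k(y) - k He_{k-1}(y). The sketch below builds the coefficients with that recurrence and checks the closed-form roots of the fourth-order polynomial, which serve as collocation points for a third-order expansion. The function names `hermite_coeffs` and `evaluate` are illustrative, not from the paper.

```python
import math


def hermite_coeffs(order):
    """Coefficients (ascending powers) of the probabilists' Hermite polynomial He_order."""
    prev, cur = [1.0], [0.0, 1.0]          # He_0 = 1, He_1 = y
    if order == 0:
        return prev
    for k in range(1, order):
        shifted = [0.0] + cur               # y * He_k
        scaled = [k * c for c in prev] + [0.0] * (len(shifted) - len(prev))
        prev, cur = cur, [s - t for s, t in zip(shifted, scaled)]
    return cur


def evaluate(coeffs, y):
    """Evaluate a polynomial given in ascending powers (Horner scheme)."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * y + c
    return value


# He_4(y) = y^4 - 6 y^2 + 3, so y^2 = 3 +/- sqrt(6).
he4 = hermite_coeffs(4)
roots = [sign * math.sqrt(3.0 + s * math.sqrt(6.0))
         for s in (-1.0, 1.0) for sign in (-1.0, 1.0)]

print("He_4 coefficients (ascending):", he4)
for r in roots:
    print(f"root {r:+.4f}: He_4 = {evaluate(he4, r):.2e}")
```

The four roots are symmetric about the origin; adding 0 gives five candidate values per variable for a third-order expansion.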


Table 1b
Parameters in the anchored rock slope in Fig. 3.

| Symbol | Value | Meaning |
|---|---|---|
| B | 15.35 | Distance between the crest of slope and the tensile crack (m) |
| H | 60 | Height of the rock slope (m) |
| kh | – | Horizontal seismic coefficient (dimensionless) |
| q | 5 | Surcharge pressure (MN/m²) |
| T | 28.6 | Anchoring force (MN/m) |
| U1 | $\frac{1}{2}\gamma_w z_w^2$ | The horizontal water pressure in the tension crack (MN/m) |
| U2 | $\frac{\gamma_w z_w (H - z)}{2 \sin\psi_p}$ | The water pressure on failure plane (MN/m) |
| W | 24.8 | Weight of the sliding block (MN) |
| z | 14 | Depth of the tensile crack (m) |
| zw | – | Depth of water in the tensile crack (m) |
| α | 35 | The angle between anchoring force and the normal of the failure plane (°) |
| ψf | 50 | The slope angle (°) |
| ψp | 35 | The inclination angle of the sliding surface (°) |
| c | – | Cohesion of the sliding surface (MN/m²) |
| φ | – | Frictional angle of the sliding surface (°) |
| γr | 0.027 | Unit weight of rock (MN/m³) |
| γw | 0.001 | Unit weight of water (MN/m³) |
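The counting in Eq. (10) and the distance-based ranking of candidate collocation points described in Section 3.1 can be sketched as follows for four variables and a third-order expansion. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and ties in distance are broken by enumeration order, which the source does not specify.

```python
import itertools
import math


def num_pce_coefficients(n, p):
    """Eq. (10): Na = (n + p)! / (n! p!)."""
    return math.factorial(n + p) // (math.factorial(n) * math.factorial(p))


def candidate_points(n, values):
    """All combinations of the one-dimensional collocation values over n variables."""
    return list(itertools.product(values, repeat=n))


def frontmost(points, k):
    """Rank points by Euclidean distance to the origin and keep the k closest."""
    return sorted(points, key=lambda pt: math.sqrt(sum(v * v for v in pt)))[:k]


n, p = 4, 3
inner = math.sqrt(3.0 - math.sqrt(6.0))   # smaller root of the 4th-order Hermite polynomial
outer = math.sqrt(3.0 + math.sqrt(6.0))   # larger root
values = [0.0, -inner, inner, -outer, outer]

na = num_pce_coefficients(n, p)
points = candidate_points(n, values)
chosen = frontmost(points, na)

print("PCE coefficients Na:", na)
print("candidate collocation points:", len(points))
print("closest point:", chosen[0])
```

The sketch only selects points. Whether the resulting design matrix is full rank or collinear has to be checked separately, for example with the rank condition of Eq. (6) or the VIF of Eq. (7).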

3.2. The improved SRSM

In the past, the multicollinearity among regression variables in the SRSM has not been considered. Consequently, of particular concern in this paper is the multicollinearity in the linear regression model in the SRSM. The procedures for the Lasso-based SRSM, ridge regression-based SRSM, elastic net regression-based SRSM and stepwise regression-based SRSM are shown in Fig. 1, in which OLS is replaced by each regression method mentioned above.

3.2.1. Lasso, ridge regression and elastic net regression

Lasso, ridge regression and elastic net regression have similar forms. Each minimizes a penalized or constrained sum of squared residuals. The regression coefficients obtained by Lasso (Tibshirani, 1996), ridge regression (Hoerl and Kennard, 2000) and elastic net regression (Hui and Hastie, 2005) can be given by:

$$
\hat{\beta}(\mathrm{Lasso}) = \underset{\beta}{\arg\min} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2, \quad \text{subject to } \sum_{j=1}^{p} |\beta_j| \le t
\tag{11}
$$

$$
\hat{\beta}(\mathrm{Ridge}) = \underset{\beta}{\arg\min} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2, \quad \text{subject to } \sum_{j=1}^{p} \beta_j^2 \le t
\tag{12}
$$

$$
\hat{\beta}(\mathrm{Elastic\ net}) = \underset{\beta}{\arg\min} \left\{ \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2 + \lambda \sum_{j=1}^{p} \left[ (1 - \alpha)\beta_j^2 + \alpha|\beta_j| \right] \right\}
\tag{13}
$$

where $n$ is the number of samples, $p$ is the number of regression variables, $t \ge 0$, and $\lambda$ and $\alpha$ are tuning parameters.

Lasso regression methods are widely used in domains with massive datasets. Lasso expects many coefficients to be zero and expects only a small subset to be nonzero. Ridge regression performs well with many predictors, each of which has a small effect (Ogutu et al., 2012). Ridge regression shrinks the coefficients and does not force coefficients to vanish. Elastic net regression is a mixture of Lasso and ridge regression and is robust to extreme correlations among the regression variables (Friedman et al., 2010).

Fig. 4. The design matrix of the third-order polynomial chaos expansion is composed of the frontmost 35 collocation points.
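The shrinkage effect of ridge regression can be illustrated with its penalized (Lagrangian) form, which is equivalent to the constrained form in Eq. (12) for a suitable pairing of the penalty and t: the slopes solve (X'X + λI)β = X'y on centered data, so the intercept is left unpenalized. The sketch below uses hypothetical, nearly collinear data and only centers the variables, whereas the procedure in this paper standardizes inputs and outputs. The names `solve`, `center` and `ridge` are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def center(values):
    """Subtract the mean so that the intercept drops out of the fit."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ridge(columns, y, lam):
    """Slopes of the penalized ridge fit: (X'X + lam I) beta = X'y on centered data."""
    cols = [center(c) for c in columns]
    yc = center(y)
    gram = [[sum(u * v for u, v in zip(ci, cj)) + (lam if i == j else 0.0)
             for j, cj in enumerate(cols)] for i, ci in enumerate(cols)]
    rhs = [sum(u * v for u, v in zip(ci, yc)) for ci in cols]
    return solve(gram, rhs)


# Hypothetical data (assumption): x2 is nearly 2 * x1, so X'X is almost singular.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.1, 3.9, 5.9, 8.1, 10.1, 11.9, 13.9, 16.1]
y = [3.2, 5.8, 9.1, 11.9, 15.2, 17.8, 21.1, 23.9]

for lam in (0.0, 0.1, 1.0, 10.0):
    beta = ridge([x1, x2], y, lam)
    norm = sum(b * b for b in beta) ** 0.5
    print(f"lambda = {lam:5.1f}  beta = {[round(b, 4) for b in beta]}  norm = {norm:.4f}")
```

Setting the penalty to zero recovers the OLS slopes. Increasing it shrinks the coefficient vector without setting any coefficient exactly to zero, in line with the description of ridge regression above.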
(1) The Lasso and ridge regression procedures are described as follows:


Step 1

Standardization of stochastic inputs and outputs.

Step 2

Calculation of the upper bound t0 of t, where t0 is the sum of coefficients calculated by OLS, followed by minimization of the formula based on different t values (0 ≤ t ≤ t0).

Step 3

Cross validation (CV) is employed to search for the optimal t.

(2) The elastic net procedure is described as follows:

Step 1

Standardization of stochastic inputs and outputs.

Step 2

Minimization of Eq. (13) based on different λ and α.

Step 3

Cross validation (CV) is employed to search for the optimal λ and α.

For example, in fivefold cross validation, the original sample is randomly partitioned into five subsamples. Of the five subsamples, a single subsample is retained as the validation data for testing the model, and the remaining four subsamples are used for training data. The cross validation process is then repeated five times (the folds), with each of the five subsamples being used exactly once as the validation data. Then, the five results from the folds can be averaged (or otherwise combined) to produce a single estimation.

3.2.2. Stepwise regression

Stepwise regression is different from the above three methods. The basic idea of stepwise regression is to introduce variables into the model one by one. After each introduction of an explanatory variable, the F test should be carried out. The t-test is used to test the selected explanatory variables one by one. At the same time, when the original explanatory variable is no longer significant due to the introduction of a later explanatory variable, it should be deleted. This method ensures that only the significant variables are included in the regression equation. A flow diagram of stepwise regression is plotted in Fig. 2. In this paper, α is taken to be 0.05. Then, Fin and Fout can be easily obtained by α.

Fig. 5. The VIFs of design matrix composed of the frontmost 35 points.

Fig. 6. The VIFs of the design matrix employed by different methods with the third-order Hermite polynomial chaos expansion in example 1.

4. Numerical examples

In this section, the four methods mentioned above are compared with the other two methods, which were suggested by Yang et al. (2013) and Jiang et al. (2012). The linear independence principle and stepwise regression are adopted in the frame of the SRSM by Yang et al. (2013). It is worth noting that Yang et al. (2013) adopted stepwise regression to reduce computation time. In this paper, a stepwise regression-based SRSM is applied to solve multicollinearity, in which the linear independence principle is not employed.

Elastic net regression is a combination of ridge regression and Lasso regression in which many groups of parameters (λ, α) are adopted to obtain unknown coefficients, and then the optimal group of parameters is determined by cross validation. Thus, elastic net regression is much more time consuming than ridge regression and Lasso regression. The tuning parameter α is set at 0.5 when elastic net regression is performed below to make it less time consuming.

Stepwise regression is simpler than ridge regression, Lasso regression and elastic net regression because there is no determination of tuning parameters during the process of stepwise regression.

4.1. Example 1

A slope stability problem that takes into account surcharge and stochastic horizontal seismic loads is considered in this subsection. The reliability of anchored rock slopes subjected to surcharge and random horizontal seismic loads is analyzed by different stability methods. The geometrical model of this slope is plotted in Fig. 3. In this example, shear strength parameters, depth of water in the tensile crack and horizontal seismic loads are considered to be stochastic variables; their distributions are listed in Table 1a. The correlation coefficient of c and φ is −0.5. Other variables are listed in Table 1b.

Non-normal variables are first transformed into the standard normal variables. Such a process is called equivalent normal transformation and can be written as follows:

$$c = \mu_c + \sigma_c Y_1 \tag{14}$$

$$\tan\varphi = \mu_{\tan\varphi} + \sigma_{\tan\varphi} Y_2 \tag{15}$$

$$
\frac{z_w}{z} = -\mu_{z_w/z} \ln\left\{ \exp\left(-\frac{a}{\mu_{z_w/z}}\right) - \Phi(Y_3)\left[ \exp\left(-\frac{a}{\mu_{z_w/z}}\right) - \exp\left(-\frac{b}{\mu_{z_w/z}}\right) \right] \right\}
\tag{16}
$$
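The transformation in Eq. (16) is the inverse cumulative distribution function of an exponential distribution truncated to [a, b], evaluated at Φ(Y). A minimal sketch using the zw/z parameters of Table 1a (μ = 0.5, a = 0, b = 1) is given below. The function names are illustrative, and Φ is computed from the error function of the standard library.

```python
import math


def std_normal_cdf(y):
    """Standard normal cumulative distribution function Phi(y)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def truncated_exponential_from_normal(y, mu, a, b):
    """Map a standard normal value y to a truncated exponential variable, as in Eq. (16)."""
    ea = math.exp(-a / mu)
    eb = math.exp(-b / mu)
    return -mu * math.log(ea - std_normal_cdf(y) * (ea - eb))


# Parameters of zw/z from Table 1a.
mu, a, b = 0.5, 0.0, 1.0
for y in (-3.0, -1.0, 0.0, 1.0, 3.0):
    x = truncated_exponential_from_normal(y, mu, a, b)
    print(f"Y3 = {y:+.1f} -> zw/z = {x:.4f}")
```

The mapping is monotonic, so very negative values of Y3 approach the lower bound a and very positive values approach the upper bound b.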


Fig. 7. The cumulative distribution function of the stochastic surface response.

Table 2
The failure probability of an anchored rock slope subject to surcharge and seismic loads.
| Order | SRSM in (Yang et al., 2013) | SRSM in (Jiang et al., 2012) | Lasso-based SRSM | Ridge regression-based SRSM | Elastic net regression-based SRSM | Stepwise regression-based SRSM | MCS |
|---|---|---|---|---|---|---|---|
| 3 | 0.01145 | 0.01167 | 0.00312 | 0.00335 | 0.00388 | 0.00386 | 0.00350 |
| 4 | 0.00290 | 0.00311 | 0.00316 | 0 | 0.00372 | 0.00375 | 0.00350 |
| 5 | 0.00337 | 0.00385 | 0.00301 | 0.00160 | 0.00338 | 0.00348 | 0.00350 |
| 6 | 0.00361 | 0.00361 | 0.00306 | 0 | 0.00372 | 0.00355 | 0.00350 |
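The values in Table 2 can be turned into relative errors with a few lines of Python. The definition used here, |Pf − Pf,MCS| / Pf,MCS, is an assumption about how the relative error is computed, and the per-method summary (worst error over orders 3 to 6) is an illustrative choice rather than a measure taken from the paper.

```python
# Failure probabilities from Table 2, orders 3 to 6 (MCS reference: 0.00350).
MCS = 0.00350
TABLE_2 = {
    "SRSM in (Yang et al., 2013)": [0.01145, 0.00290, 0.00337, 0.00361],
    "SRSM in (Jiang et al., 2012)": [0.01167, 0.00311, 0.00385, 0.00361],
    "Lasso-based SRSM": [0.00312, 0.00316, 0.00301, 0.00306],
    "Ridge regression-based SRSM": [0.00335, 0.0, 0.00160, 0.0],
    "Elastic net regression-based SRSM": [0.00388, 0.00372, 0.00338, 0.00372],
    "Stepwise regression-based SRSM": [0.00386, 0.00375, 0.00348, 0.00355],
}


def relative_error(pf, reference):
    """Assumed definition: absolute deviation from the reference, divided by the reference."""
    return abs(pf - reference) / reference


# Worst relative error of each method across the four expansion orders.
worst = {method: max(relative_error(pf, MCS) for pf in values)
         for method, values in TABLE_2.items()}
best = min(worst, key=worst.get)

for method, err in sorted(worst.items(), key=lambda item: item[1]):
    print(f"{method:36s} worst relative error = {100 * err:6.1f}%")
print("smallest worst-case error:", best)
```

Under this assumed definition, a failure probability of 0 reported by a method corresponds to a relative error of 100%.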


$$
k_h = -\mu_{k_h} \ln\left\{ \exp\left(-\frac{a}{\mu_{k_h}}\right) - \Phi(Y_4)\left[ \exp\left(-\frac{a}{\mu_{k_h}}\right) - \exp\left(-\frac{b}{\mu_{k_h}}\right) \right] \right\}
\tag{17}
$$

where $Y_1$, $Y_2$, $Y_3$, and $Y_4$ are standard normal variables.

The polynomial $H_4(Y_i, Y_i, Y_i, Y_i)$ of order $p + 1$, where $p = 3$, is written as follows:

$$
H_4(Y_i, Y_i, Y_i, Y_i) = e^{\frac{1}{2}(Y_1^2 + Y_2^2)} \frac{\partial^4}{\partial Y_i^4} e^{-\frac{1}{2}(Y_1^2 + Y_2^2)} = Y_i^4 - 6Y_i^2 + 3 \quad (i = 1, 2)
\tag{18}
$$

The roots of Eq. (18) are usually taken as the collocation points, i.e., $Y_i = \pm\sqrt{3 \pm 2.45}$. Considering that 0 is also a collocation point, $(C_{4+1}^{1})^4 = 625$, and thus 625 collocation points are available in the third-order polynomial with four random variables.

From Eq. (10), it follows that the number of PCE coefficients is 35 in the third order, i.e., α0, α1, α2, α3, α4, α11, α21, α31, α41, etc. The design matrix of the third-order polynomial chaos expansion composed of the frontmost 35 collocation points can be expressed as in Fig. 4.

The rank of the design matrix in Fig. 4 is 28, which is less than the full rank of 35. Thus, the frontmost 35 collocation points are not accepted by the methods suggested by Yang et al. (2013) and Jiang et al. (2012) because of the linear independence principle. Those works have to consider 35 other collocation points, such that the design matrix composed of these other points is full rank. In addition, the multicollinearity of this design matrix is studied by computing the VIFs. The VIFs of this design matrix are illustrated in Fig. 5, in which all VIFs are greater than 0.1. Moreover, 16 VIFs are greater than 10. This finding implies that this series of data shows strong multicollinearity.

Unlike the SRSM in the works of Yang et al. (2013) and Jiang et al. (2012), the methods that consider multicollinearity, namely, the Lasso-based SRSM, ridge regression-based SRSM, elastic net regression-based SRSM and stepwise regression-based SRSM, still accept the frontmost 35 collocation points. The Lasso-based SRSM, ridge regression-based SRSM and elastic net regression-based SRSM address multicollinearity by shrinking unknown coefficients. More concretely, these three methods impose some constraints on the basis of OLS. Stepwise regression considers multicollinearity by choosing significant regression variables, and then OLS is used to regress significant regression variables.

The VIFs of the design matrix employed by different methods with the third-order Hermite polynomial chaos expansion are illustrated in Fig. 6. In Fig. 6, a red point indicates VIF > 10, which suggests that multicollinearity does exist in the traditional SRSM. Because the Lasso-based SRSM, ridge regression-based method, and elastic net regression-based method adopt the same design matrix to conduct the regression process, these three proposed methods share the same VIFs, as shown in Fig. 6. Fig. 6 also shows that multicollinearity exists in the design matrices used by the Lasso, ridge, and elastic net regression-based SRSMs, by the SRSM in (Jiang et al., 2012) and by the SRSM in (Yang et al., 2013). The differences are that the SRSM in (Jiang et al., 2012) and the SRSM in (Yang et al., 2013) adopted direct OLS to obtain the response surface, while the Lasso, ridge, and elastic net regression-based SRSMs consider extra techniques on the basis of direct OLS. On the other hand, the VIFs of the design matrix employed by stepwise regression-based SRSM are all less than 5, which implies that stepwise regression addresses multicollinearity by choosing a design matrix composed of the significant variables while insignificant variables are removed.

The stochastic response values are assumed to be the safety factor minus one. The safety factor of the anchored rock slope subject to surcharge and horizontal seismic loads was studied by Shukla and Hossain (2011) as follows:

$$
F_s = \frac{2c^{*}P + \left[ (Q + 2q^{*}R)\dfrac{\cos(\theta + \psi_p)}{\cos\theta} - \dfrac{z_w^{*2}}{\gamma_r^{*}}\sin\psi_p - \dfrac{z_w^{*2}}{\gamma_r^{*}}P + 2T^{*}\cos\alpha \right]\tan\varphi}{(Q + 2q^{*}R)\dfrac{\sin(\theta + \psi_p)}{\cos\theta} + \dfrac{z_w^{*2}}{\gamma_r^{*}}\cos\psi_p - 2T^{*}\sin\alpha}
\tag{19}
$$

where

$$P = (1 - z^{*})\,\mathrm{cosec}\,\psi_p \tag{20}$$

$$Q = (1 - z^{*2})\cot\psi_p - \cot\psi_f \tag{21}$$

$$R = (1 - z^{*})\cot\psi_p - \cot\psi_f \tag{22}$$

$$\gamma_r^{*} = \gamma_r / \gamma_w \tag{23}$$

$$c^{*} = c / (\gamma_r H) \tag{24}$$

$$q^{*} = q / (\gamma_r H) \tag{25}$$

$$z^{*} = z / H \tag{26}$$

$$z_w^{*} = z_w / H \tag{27}$$

$$T^{*} = T / (\gamma_r H^2) \tag{28}$$

$$\theta = \tan^{-1}(k_h) \tag{29}$$

The cumulative distribution functions (CDFs) of the stochastic surfaces obtained by different reliability approaches are plotted in Fig. 7. Fig. 7 shows that the CDF of the stochastic surface obtained by the third-order SRSM in the works of Jiang et al. (2012) and Yang et al. (2013) does not agree well with the real CDF, while the CDFs of the stochastic surfaces obtained by the proposed methods considering multicollinearity do agree well with the real CDF with the exception of the CDF of the surface obtained by the ridge regression-based SRSM because ridge regression does not force coefficients to vanish. Thus, the collocation points are not sufficient for ridge regression.

In Fig. 7, the probability of failure corresponds to the probability that the performance function is below zero; this probability is also listed in Table 2. The relative errors of the failure probabilities obtained by the different methods are shown in Fig. 8. It can be seen from Fig. 8 that the elastic net regression-based SRSM and stepwise regression-based SRSM perform better than the other methods.

Fig. 8. The relative error of the failure probability of an anchored slope.

4.2. Example 2

In this subsection, the effect of excavation on slope safety is analyzed. The Dalu ditch landslide is located in Shanxi, China. The reactivation of this old landslide was triggered by the excavation of the front slope foot in 2007. As shown in Fig. 9, the Dalu ditch landslide is very large, with a volume of approximately 856.8 × 10⁴ m³. According to Luo (2012), the planar width of the landslide is approximately 450 m, the planar distance along the sliding direction is approximately 400 m, and the height is approximately 125 m. The engineering geological plan and typical geological cross section of the Dalu ditch landslide are depicted in Fig. 10. The geotechnical parameters of the sliding body are listed in Table 3 (Bai, 2014).

Fig. 9. The topographic features of the Dalu ditch landslide (Bai, 2014).

Fig. 10. (a) The engineering geological plan of the Dalu ditch landslide (Bai, 2014). (b) A typical geological cross section of the Dalu ditch landslide (Luo, 2012).

Table 3
The geotechnical parameters of the landslide.

| Soil number | Statistic | c (kPa) | φ (°) | γ (kN/m³) |
|---|---|---|---|---|
| ① Loess | Mean | 12 | 18.1 | 18.51 |
| ① Loess | Variance | 2 | 2 | – |
| ② Clay | Mean | 14 | 14.1 | 19.51 |
| ② Clay | Variance | 2 | 2 | – |

For such a slope without explicit analytical performance functions, the computational expense required for reliability analysis is substantial. The response values of the safety factor Fs in this example are calculated using a 3D rigorous limit equilibrium method (Zhou and Cheng, 2014).

According to the geological cross section and the geometry of the Dalu ditch landslide in Fig. 10, the sliding surface and slope surface are approximated as follows, respectively:

$$\frac{(x - 100)^2}{300^2} + \frac{y^2}{450^2} + \frac{(z - 120)^2}{120^2} = 1 \tag{30}$$

$$
\begin{cases}
z = x & (100 < x \le 200) \\
z = \dfrac{x}{10} + 80 & (200 < x \le 400)
\end{cases}
\tag{31}
$$

To illustrate multicollinearity, Fig. 11 provides the VIFs of the design matrix employed by different methods with the third-order Hermite polynomial chaos expansion. As shown in Fig. 11, with the exception of the design matrix employed by stepwise regression-based SRSM, all the design matrices show multicollinearity.

Fig. 11. The VIFs of the design matrix employed by different methods with the third-order Hermite polynomial chaos expansion in example 2.

The results of the failure probability obtained by the proposed methods and other reference methods (Yang et al., 2013; Jiang et al., 2012) are listed in Table 4. The corresponding relative error of the failure probability is plotted in Fig. 12. It can be found that the ridge regression-based SRSM has poor performance. Two other proposed methods, namely, the elastic net regression-based SRSM and the

Table 4
The failure probability of the Dalu ditch landslide.

Order   SRSM (Yang et al., 2013)   SRSM (Jiang et al., 2012)   Lasso-based SRSM   Ridge regression-based SRSM   Elastic net regression-based SRSM   Stepwise regression-based SRSM   MCS
3       0.4099                     0.4588                      0.0410             0.1130                        0.156                               0.4616                           0.5099
4       0.5399                     0.4432                      0.4518             0.3507                        0.5213                              0.5080
5       0.4895                     0.4526                      0.5468             0.9770                        0.5105                              0.5108
6       0.4944                     0.4729                      0.4602             0.4210                        0.5066                              0.5122


Fig. 12. The relative error of the failure probability of the Dalu ditch landslide.
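The relative errors plotted in Fig. 12 can be recomputed directly from Table 4. The sketch below assumes the usual definition, |p_f − p_MCS| / p_MCS, with the MCS value of 0.5099 as the reference; the dictionary keys are shorthand labels introduced here for illustration, not names used in the paper.

```python
P_F_MCS = 0.5099  # Monte Carlo reference value from Table 4

# Failure probabilities from Table 4, keyed by expansion order.
# Method labels are shorthand chosen for this sketch.
TABLE_4 = {
    3: {"Yang2013": 0.4099, "Jiang2012": 0.4588, "Lasso": 0.0410,
        "Ridge": 0.1130, "ElasticNet": 0.156, "Stepwise": 0.4616},
    4: {"Yang2013": 0.5399, "Jiang2012": 0.4432, "Lasso": 0.4518,
        "Ridge": 0.3507, "ElasticNet": 0.5213, "Stepwise": 0.5080},
    5: {"Yang2013": 0.4895, "Jiang2012": 0.4526, "Lasso": 0.5468,
        "Ridge": 0.9770, "ElasticNet": 0.5105, "Stepwise": 0.5108},
    6: {"Yang2013": 0.4944, "Jiang2012": 0.4729, "Lasso": 0.4602,
        "Ridge": 0.4210, "ElasticNet": 0.5066, "Stepwise": 0.5122},
}


def relative_error(p_f, p_f_ref):
    """Relative error of a failure probability against a reference value."""
    return abs(p_f - p_f_ref) / p_f_ref


# For each order, find the method closest to the MCS reference.
best_by_order = {}
for order, row in TABLE_4.items():
    errors = {method: relative_error(p, P_F_MCS) for method, p in row.items()}
    best = min(errors, key=errors.get)
    best_by_order[order] = (best, errors[best])
    print(f"order {order}: best = {best}, relative error = {100 * errors[best]:.2f}%")
```

At the fourth order this gives the stepwise regression-based SRSM a relative error of about 0.4%, matching the figure quoted in the text.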

Table 5
The coefficient of determination $R^2$ for different models with the 4th Hermite polynomial chaos expansion.

        Lasso-based SRSM   Ridge regression-based SRSM   Elastic net regression-based SRSM   OLS
RSS     23.2352            319.8518                      14.4120                             93.8140
TSS     4950.4930 (common to all models)
R²      0.9953             0.9353                        0.9971                              0.9811
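As a quick consistency check, the $R^2$ row of Table 5 can be reproduced from the RSS and TSS rows using Eq. (32). The sketch below also includes a generic helper that computes $R^2$ from observed and fitted values following Eqs. (32)–(34); the function name and dictionary labels are illustrative choices, not taken from the paper.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination R^2 = 1 - RSS / TSS."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - rss / tss


# RSS and TSS values as listed in Table 5.
TSS = 4950.4930
RSS = {"Lasso": 23.2352, "Ridge": 319.8518, "ElasticNet": 14.4120, "OLS": 93.8140}

r2_from_table = {name: 1.0 - rss / TSS for name, rss in RSS.items()}
for name, value in r2_from_table.items():
    print(f"{name}: R^2 = {value:.4f}")
```

The recomputed values agree with the tabulated ones up to one unit in the fourth decimal place, which is consistent with rounding in the published table.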

Fig. 13. The residual squares $(y_i - \hat{y}_i)^2$ for the 4th Hermite polynomial chaos expansion.

stepwise regression-based SRSM, have improved the accuracy of the SRSM compared with the reference methods (Yang et al., 2013; Jiang et al., 2012). Among these six methods, the stepwise regression-based SRSM has the advantage of high accuracy, with a relative error of 0.4% in the fourth order.

The stepwise regression-based SRSM uses a data reduction technique to address multicollinearity. In other words, stepwise regression directly chooses the regression data without multicollinearity. Thus, as shown in Fig. 11, the VIFs of the chosen predictors in the stepwise regression-based SRSM are all less than 10, which indicates that the multicollinearity of regression has been addressed.

The three other proposed methods sacrifice the unbiasedness of the coefficient estimator to decrease the variance. As mentioned in Section 2.2, because of the multicollinearity, one cannot even be sure whether an increase in the value of the independent variable X is associated with an increase or a decrease in the value of the dependent variable Y. The coefficient of determination, $R^2$, is the proportion of the variance of the dependent variable Y that is predictable from the independent variable X and can reflect the variance of estimation. The coefficient of determination $R^2$ is obtained by:

$$R^2 = 1 - \frac{RSS}{TSS} \tag{32}$$

$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{33}$$


$$TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2 \tag{34}$$

where RSS denotes the residual sum of squares, TSS indicates the total sum of squares, $y_i$ is the real value, $\bar{y}$ is the mean of the real values, $\hat{y}_i$ is the estimated value, and $n$ is the number of data points.

The coefficient of determination $R^2$ normally ranges from 0 to 1. The closer $R^2$ is to 1, the better the approximate model. The $R^2$ values obtained by the three proposed models with the 4th Hermite polynomial chaos expansion and by the traditional SRSM with OLS are listed in Table 5. As shown in Table 5, unlike the ridge regression-based SRSM, the Lasso-based SRSM and elastic net regression-based SRSM obtain $R^2$ values that are larger than those obtained by the traditional SRSM with OLS. Table 5 also shows that the performance of the elastic net regression-based SRSM is the best among these three proposed methods. This agrees with the performance of the elastic net regression-based SRSM in the relative error of failure probability in Fig. 12.

Because only the RSS of the different models is different in the computation of $R^2$, the residual squares $(y_i - \hat{y}_i)^2$ for each real response value $y_i$ are drawn in Fig. 13. It can be observed from Fig. 13 that the ridge regression-based SRSM shows poor performance because the residual squares are larger than those of the two other methods.

5. Discussions

Two slope reliability analyses, one of which is a real engineering geology problem, are conducted using the proposed methods and other reference methods (Jiang et al., 2012; Yang et al., 2013) in this paper. On the one hand, multicollinearity in the SRSM is confirmed to exist in both examples by analyzing variance inflation factors (VIFs). On the other hand, the results of numerical examples imply that the stepwise regression-based SRSM and elastic net regression-based SRSM can significantly improve the accuracy of the SRSM with regard to the failure probability of the slope.

The reason is that the stepwise regression-based SRSM always chooses the significant regression variables to regress based on its computational procedure. The performance of the ridge regression-based SRSM with regard to the reliability of the slope is less than satisfactory. Because all predictors, not only the most predictive subset of predictors, participate in the prediction of the response for ridge regression, the collocation points are not sufficient for ridge regression. However, for stepwise regression, the number of coefficients is reduced, and hence, the collocation points are sufficient for fitting.

The stepwise regression-based SRSM effectively addresses multicollinearity among the regression variables of Hermite polynomial chaos expansion and is more competitive among the proposed four methods for the following reasons: First, the stepwise regression-based SRSM has high accuracy. Second, the stepwise regression-based SRSM has inherently low computational complexity. Although the elastic net regression-based SRSM is sometimes more accurate than the stepwise regression-based SRSM, the elastic net regression-based SRSM spends much more time searching for tuning parameters. In addition, numerical examples imply that the convergence of the stepwise regression-based SRSM is fast.

6. Conclusions

For irregular slopes without explicit analytical performance functions, three-dimensional reliability analysis may take a long time. The stochastic response surface method (SRSM) reduces the computation time by using a small number of collocation points to fit the performance function. In addition, the accuracy of the three-dimensional stability analysis is important when the failure of a slope may cause great loss of life and property.

In this paper, the improved SRSMs, in which Lasso, ridge regression, elastic net regression and stepwise regression are used during the regression process, are proposed to investigate slope reliability with multicollinearity to improve accuracy. These four proposed methods allow the use of more available collocation points to improve the accuracy of the failure probability of the slope instead of using a limited number of collocation points. The main conclusions are drawn as follows:

(1) By analyzing variance inflation factors (VIFs), multicollinearity is proven to exist in slope stability analysis.
(2) Four methods, including the Lasso-based SRSM, ridge regression-based SRSM, elastic net regression-based SRSM, and stepwise regression-based SRSM, are proposed to address multicollinearity.
(3) Two slope reliability analyses, one of which is a real engineering geology problem, are tested. It is demonstrated that the stepwise regression-based SRSM is competitive among the four proposed methods in the study of slope stability and has the advantages of high accuracy and low time consumption.

The results imply that multicollinearity in slope stability analysis does exist and that the stepwise regression-based SRSM is able to address multicollinearity to improve accuracy. As a consequence, the proposed stepwise regression-based SRSM is useful for the three-dimensional stability analysis of complex slopes and can be applied in engineering geology projects or practice.

Declaration of Competing Interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant Nos. 51325903 and 51679017), Project 973 of China (Grant No. 2014CB046903) and Natural Science Foundation Project of CQ-CSTC (Grant Nos. cstc2017jcyj-yszxX0013, cstc2015jcyjys30006 and cstc2016jcyjys0005).

References

Au, S.K., Wang, Y., 2014. Engineering risk assessment with subset simulation. Wiley. https://doi.org/10.1002/9781118398050.app1.
Bai, Y., 2014. Study on the Optimization of Dalu Ditch Superlarge Loess Landslide Treatment Design in Wuqi County. MS dissertation. Xi’an University of Science and Technology (in Chinese).
Bary, M.N.A., 2017. Robust regression diagnostic for detecting and solving multicollinearity and outlier problems: applied study by using financial data. Appl. Math. Sci. 11, 601–622.
Davis, T.J., Peterkeller, C., 1997. Modelling uncertainty in natural resource analysis using fuzzy sets and Monte Carlo simulation: slope stability prediction. Int. J. Geogr. Inf. Syst. 11 (5), 409–434.
Friedman, J.H., Hastie, T., Tibshirani, R., 2010. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 (1), 1–22.
Fu, X.D., Qian, P., Liu, Z.D., 2001. The reliability analysis for slope stability by perturbation stochastic finite element method. Rock Soil Mech. 22 (4), 413–418 (in Chinese).
Ghanem, R.G., Spanos, P.D., 1991. Stochastic Finite Elements: A Spectral Approach. Springer, Berlin. https://doi.org/10.1007/978-1-4612-3094-6.
Guan, X.L., Melchers, R.E., 2001. Effect of response surface parameter variation on structural reliability estimates. Struct. Saf. 23 (4), 429–444.
Hoerl, A., Kennard, R., 2000. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12 (1), 55–67.
Hoerl, A.E., Kennard, R.W., Hoerl, R.W., 1985. Practical use of Ridge Regression: a Challenge Met. J. R. Stat. Soc. 34 (2), 114–120.
Huang, S.P., Mahadevan, S., Rebba, R., 2007. Collocation-based stochastic finite element analysis for random field problems. Prob. Eng. Mech. 22 (2), 194–205.
Huang, S.P., Liang, B., Phoon, K.K., 2009. Geotechnical probabilistic analysis by collocation-based stochastic response surface method: an excel add-in implementation. Georisk 3 (2), 75–86.
Hui, Z., Hastie, T., 2005. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67 (2), 301–320.
Isukapalli, S.S., 1999. Uncertainty Analysis of Transport-Transformation Models. 57(1).


Dissertations & Theses – Gradworks, pp. 31–32.
Isukapalli, S.S., Roy, A., Georgopoulos, P.G., 2010. Stochastic response surface methods (SRSMs) for uncertainty propagation: application to environmental and biological systems. Risk Anal. 18 (3), 351–363.
Jiang, S.H., Li, D.Q., Zhou, C.B., 2012. Optimal probabilistic collocation points for stochastic response surface method. Chin. J. Comput. Mech. 29 (3), 345–351 (in Chinese).
Jiang, S.H., Li, D.Q., Zhang, L.M., Zhou, C.B., 2014. Slope reliability analysis considering spatially variable shear strength parameters using a non-intrusive stochastic finite element method. Eng. Geol. 168 (1), 120–128.
Li, W.X., Lu, Z.M., Zhang, D.X., 2009. Stochastic analysis of unsaturated flow with probabilistic collocation method. Water Resour. Res. 45 (8), 2263–2289.
Li, D.Q., Zhou, C.B., Chen, Y.F., Jiang, Q.H., Rong, G., 2010. Reliability analysis of slope using stochastic response surface method and code implementation. Chin. J. Rock Mech. Eng. 29 (8), 1513–1523.
Li, D.Q., Jiang, S.H., Cao, Z.J., Zhou, C.B., Zhang, L.M., 2015. A multiple response-surface method for slope reliability analysis considering spatial variability of soil properties. Eng. Geol. 187, 60–72.
Li, D.Q., Zheng, D., Cao, Z.J., Tang, X.S., Phoon, K.K., 2016. Response surface methods for slope reliability analysis: review and comparison. Eng. Geol. 203, 3–14.
Luo, L.J., 2012. Study on the Optimum Design of Anti-Sliding Pile in Landslide Treatment Based on Performance. PhD dissertation. Chang’an University (in Chinese).
Mansfield, E.R., Helms, B.P., 1982. Detecting multicollinearity. Am. Stat. 36 (3), 158–160.
Meinshausen, N., Bühlmann, P., 2006. High-dimensional graphs and variable selection with the Lasso. Ann. Stat. 34 (3), 1436–1462.
Ogutu, J.O., Schulz-Streeck, T., Piepho, H.P., 2012. Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions. BMC Proc. 6, S10. https://doi.org/10.1186/1753-6561-6-S2-S10.
Paul, R.K., 2006. Multicollinearity: Causes, Effects and Remedies. Agricultural Statistics, Roll No. 4405. IASRI, New Delhi.
Phoon, K.K., Huang, S.P., 2007. Geotechnical probabilistic analysis using collocation-based stochastic response surface method. In: Applications of Statistics and Probability in Civil Engineering - Proceedings of the 10th International Conference on Applications of Statistics and Probability. ICASP10, pp. 83–85.
Shukla, S.K., Hossain, M.M., 2011. Analytical expression for factor of safety of an anchored rock slope against plane failure. Int. J. Geotech. Eng. 5 (2), 181–187.
Tan, X.H., Shen, M.F., Hou, X.L., Li, D.Q., Hu, N., 2013. Response surface method of reliability analysis and its application in slope stability analysis. Geotech. Geol. Eng. 31 (4), 1011–1025.
Tatang, M.A., Pan, W., Prinn, R.G., Mcrae, G.J., 1997. An efficient method for parametric uncertainty analysis of numerical geophysical models. J. Geophys. Res.-Atmos. 102 (D18), 21925–21932.
Tibshirani, R.J., 1996. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. 58, 267–288.
Wang, Y., Cao, Z.J., 2013. Expanded reliability-based design of piles in spatially variable soil using efficient Monte Carlo simulations. Soils Found. 53 (6), 820–834.
Wang, Y., Cao, Z.J., Au, S.K., 2011. Practical reliability analysis of slope stability by advanced Monte Carlo Simulations in spreadsheet. Can. Geotech. J. 48 (1), 162–172.
Wang, H.J., Chen, Y.N., Pan, Y.P., Li, W.H., 2015. Spatial and temporal variability of drought in the arid region of China and its relationships to teleconnection indices. J. Hydrol. 523, 283–296.
Xiong, F.F., Chen, W., Xiong, Y., Yang, S.X., 2016. Weighted stochastic response surface method considering sample weights. Struct. Multidiscip. Optim. 43 (6), 837–849.
Yang, L.F., Yang, X.F., Yu, B., Li, Z.Y., 2013. Improved stochastic response surface method based on stepwise regression analysis. Chin. J. Comput. Mech. 30 (1), 88–93 (in Chinese).
Zhou, X.P., Cheng, H., 2014. Stability analysis of three-dimensional seismic landslide using the rigorous limit equilibrium method. Eng. Geol. 174, 87–102.
Zhou, X.P., Huang, X.C., 2018. Reliability analysis of slopes using UD-based response surface methods combined with LASSO. Eng. Geol. 233, 111–123.
Zhou, N., Pierre, J.W., Trudnowski, D., 2012. A stepwise regression method for estimating dominant electromechanical modes. IEEE Trans. Power Syst. 27 (2), 1051–1059.
