Advanced Engineering Informatics: Sciencedirect
a Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China
b Department of Building and Real Estate, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China
c Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia
d College of Civil Engineering, Hunan University, Changsha 410082, China
e MOE Key Laboratory of Building Safety and Energy Efficiency, Changsha 410082, China
Keywords: Tunnel; Ground response; Reinforcement learning; Extreme learning machine; Optimization

Abstract: Prediction of ground responses is important for improving the performance of tunneling. This study proposes a novel reinforcement learning (RL) based optimizer that integrates a deep-Q network (DQN) with particle swarm optimization (PSO). The optimizer is used to improve an extreme learning machine (ELM) based model for predicting tunneling-induced settlement; specifically, the DQN-PSO optimizer optimizes the weights and biases of the ELM. Based on the prescribed states, actions, rewards, rules and objective functions, the DQN-PSO optimizer evaluates the rewards of actions at each step and thereby guides particles on which action to take and when to take it. The hybrid model is applied to a practical tunnel project. In searching for the global best weights and biases of the ELM, the results indicate that the DQN-PSO optimizer outperforms conventional metaheuristic optimization algorithms, with higher accuracy and lower computational cost. Meanwhile, the model can identify relationships among influential factors and ground responses through self-practicing. The final model can be expressed with an explicit formulation and used to predict tunneling-induced ground response in real time, facilitating its application in engineering practice.
⁎ Corresponding author at: College of Civil Engineering, Hunan University, Changsha 410082, China.
E-mail address: [email protected] (R.-P. Chen).
https://doi.org/10.1016/j.aei.2020.101097
Received 18 September 2019; Received in revised form 27 March 2020; Accepted 31 March 2020
1474-0346/ © 2020 Elsevier Ltd. All rights reserved.
P. Zhang, et al. Advanced Engineering Informatics 45 (2020) 101097
Nevertheless, the parameters of soil constitutive models need to be calibrated by numerous experimental tests, and back analysis of parameters also requires considerable skill [14]. A problem that all engineers have to confront is how to predict ground responses to tunneling in a timely manner and mitigate potential risks. Because influential factors such as operational parameters and geological conditions vary frequently with the advance of the shield machine, empirical, analytical and numerical methods all have deficiencies in capturing the ground responses to tunneling in real time.

To predict ground responses along the whole tunnel alignment, machine learning (ML)-based surrogate models have recently been proposed to complement the deficiency of conventional methods. Such models have a strong capability of identifying the nonlinear relationship between ground responses and various influential factors [15–17]. Prediction models are established offline by learning directly from the in-situ data and are then used for online prediction of ground response in real time with high accuracy. The current ML-based tunneling-induced ground response prediction models were developed upon quite limited datasets (within 1000 datasets), so the model architecture is not sophisticated (within 20 input variables). Researchers thus utilized metaheuristic optimization algorithms such as particle swarm optimization (PSO) to search the hyper-parameters and general parameters of these ML-based models [18]. PSO has been successfully used in many domains [19], but the original PSO has two primary issues: premature convergence and high computational cost. Premature convergence means that PSO tends to be trapped in local optima at the beginning of the search process. Meanwhile, the computational cost can increase dramatically with increasing population size, although the diversity of the swarm is beneficial for obtaining global optima. To mitigate these issues, numerous researchers have been preoccupied with enhancing the PSO algorithm, for example modified PSO with adaptive parameters [20,21] and hybrid PSO [22,23]. Nevertheless, which action particles should choose to move effectively towards the best position, and when they should take this action, remain a key challenge.

In this study, a more general-purpose PSO optimizer enhanced by the reinforcement learning (RL) algorithm deep Q-network (DQN) is proposed. In the past several years, RL has driven impressive advances in artificial intelligence and rapidly extended its application scope [24–27]. In particular, models trained by deep RL outperform human experts in Atari, Go and no-limit poker [28–30]. The most fundamental improvement is that, compared with previous ML algorithms, a deep RL algorithm does not rely on hand-crafted policy evaluation functions. The agent of deep RL interacts with the environment and learns from past experience like a human via self-play, thereby continuously improving its performance. This success motivates us to propose a DQN-based PSO optimizer (DQN-PSO), in which the agent guides particles to choose the optimum action at each generation and move towards the best position with the lowest computational cost. To the best knowledge of the authors, this is the first work to use an optimizer based on the RL algorithm DQN to develop a global best ML-based model for investigating ground responses to tunneling.

Hence, this study aims to develop an ELM-based ground response prediction model due to its fast calculation speed. The proposed DQN-PSO optimizer is used to optimize the ELM by identifying the global best weights and biases. A case study is implemented to validate the prediction performance of the proposed hybrid model. The ELM and PSO components of the proposed framework can be replaced by various ML and metaheuristic algorithms to explore various issues.

2. Literature review and methodology

2.1. Literature review

Machine learning (ML) is a subfield of artificial intelligence that enables a system to learn automatically from data without being explicitly programmed. ML algorithms have made a significant breakthrough with appreciable performance in many domains. They have been considered to be the best choice for discovering the intricate relationships among high-dimensional data [31]. Ground responses to tunneling are complicated by the coupled effects of intrinsic and extrinsic factors such as geological, geotechnical, geometric, shield operational and anomalous parameters, which makes it very difficult to predict tunneling-induced settlement accurately. Moreover, tunneling is a dynamic process and its influential factors always change with the advance of the shield machine, so real-time prediction of settlement is vitally important in engineering practice. Conventional empirical, analytical, numerical and physical modelling methods have their limitations and cannot predict soil-shield machine interaction in real time. ML algorithms provide a novel method to overcome this issue.

Since the first application of the artificial neural network (ANN) to predict tunneling-induced settlement by Shi et al. [32], various ML algorithms have been extensively used to predict soil-shield machine interaction in the last two decades. The most widely used ML algorithm is the ANN with error backpropagation [33–39]. Meanwhile, its variants such as the general regression neural network [40], wavelet neural network [14] and radial basis function neural network [40] have gained popularity in predicting soil-shield machine interaction. In the last decade, ML has developed rapidly. Consequently, researchers have implemented various ML algorithms to predict tunneling-induced ground settlement, such as the extreme learning machine [41], adaptive neuro fuzzy inference system [42,43], relevance vector machine [44], least-squares support-vector machine [45], random forest [46–48] and genetic expression programming [42].

The key to developing an ML-based settlement prediction model is to determine the values of the hyper-parameters. In addition, the weights and biases also need to be determined for ANN and its variants. The commonly used methods for determining hyper-parameters include trial and error, grid search and meta-heuristic algorithms [34,39,40]. The weights and biases of ANN-based models are generally determined using deterministic and stochastic optimization algorithms [18,35]. Trial and error and grid search methods can only search the parameters in a limited space. Deterministic optimization algorithms such as gradient descent may be trapped in local optima. Stochastic algorithms suffer from premature convergence and high computational cost. The global best parameters are thus hard to obtain using such methods. To this end, this study proposes an optimizer based on the RL algorithm DQN to search for the global optimum parameters of ML algorithms with higher accuracy and lower computational cost.

2.2. Reinforcement learning

2.2.1. Framework of reinforcement learning

Reinforcement learning (RL) originates from the discrete-time, finite Markov decision process (MDP). RL consists of a learning agent, an environment, states, actions, and rewards. The agent interacts with the environment at a discrete time scale, t = 0, 1, …. At each time step t, the agent perceives or observes the state of the environment, $S_t$ ($S_t \in \mathcal{S}$), and then chooses a primitive action based on this perception or observation, $A_t$ ($A_t \in \mathcal{A}_{S_t}$). In response to each action, a, the environment produces a numerical reward, $R_{t+1}$, and changes to a next state, $S_{t+1}$ ($S_{t+1} \in \mathcal{S}$). The whole dynamic transition process can be mathematically expressed by:

$$P_{ss'}^{a} = \Pr\{S_{t+1} = s' \mid S_t = s, A_t = a\} \tag{1}$$

$$R_{s}^{a} = E\{R_{t+1} \mid S_t = s, A_t = a\} \tag{2}$$

where $P_{ss'}^{a}$ = state transition probability matrix; $R_{s}^{a}$ = immediate reward.

Note that the action at a state is selected based on a policy, π.
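$$\pi(a \mid s) = P\{A_t = a \mid S_t = s\} \tag{3}$$

$$V^{\pi}(s) = E\{R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \dots \mid S_t = s\} = E\{R_{t+1} + \gamma V^{\pi}(S_{t+1}) \mid S_t = s\} \tag{4}$$

$$Q^{\pi}(s, a) = E\{R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \dots \mid S_t = s, A_t = a\} = R_{s}^{a} + \gamma \sum_{s' \in \mathcal{S}} P_{ss'}^{a} V^{\pi}(s') = R_{s}^{a} + \gamma \sum_{s' \in \mathcal{S}} P_{ss'}^{a} \sum_{a' \in \mathcal{A}} \pi(a' \mid s')\, Q^{\pi}(s', a') \tag{5}$$

where γ = discount factor.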
The objective of value-based RL algorithms such as Q-learning and Sarsa is to determine the optimum state-value function V*(s) or action-value function Q*(s, a), as shown in Eqs. (6) and (7). This study also utilizes a value-based RL algorithm to establish the model.

$$V^{*}(s) = \max_{a \in \mathcal{A}} \Big( R_{s}^{a} + \gamma \sum_{s' \in \mathcal{S}} P_{ss'}^{a} V^{*}(s') \Big) \tag{6}$$
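As an illustration of how the optimum state-value function of Eq. (6) can be computed when the transition probabilities and rewards are known, the following is a minimal value-iteration sketch. It is not the method used in this study (DQN learns the action values from experience rather than from a known model); the two-state MDP, the array layout `P[a, s, s']` and `R[s, a]`, and the function name `value_iteration` are all assumptions made for the example.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Repeat the backup of Eq. (6) until the state values stop changing.

    P[a, s, t] is the probability of moving from state s to state t under
    action a; R[s, a] is the expected immediate reward.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)


# Hypothetical two-state, two-action MDP (not taken from the paper).
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
])
R = np.array([
    [0.0, 1.0],  # state 0: stay earns 0, switch earns 1
    [2.0, 0.0],  # state 1: stay earns 2, switch earns 0
])
V_star, greedy_policy = value_iteration(P, R, gamma=0.9)
```

In this toy problem the best behaviour is to switch out of state 0 and then stay in state 1, so the greedy policy read off from the converged values is "switch" in state 0 and "stay" in state 1.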
2.4. Extreme learning machine

Extreme learning machine (ELM) is a modification of the single-hidden layer feedforward neural network, that is, the algorithm has only one hidden layer. The weights of the input layer and the biases of the hidden layer are assigned randomly. The optimum ELM is obtained by calculating the weights of the hidden layer and the biases of the output layer, so the calculation speed is much faster. In general, ELM can be represented as:

$$H = f(Xw + b) \tag{12}$$

$$H\beta - y = 0 \tag{13}$$

where w = weight matrix of the input layer; b = bias vector of the hidden layer; H = hidden layer output matrix; β = weight vector connecting the hidden nodes and the output nodes; y = outputs; f = activation function. The optimum ELM can be achieved by minimizing the value of $\lVert H\beta - y \rVert$. A detailed description of the ELM algorithm can be found in Huang et al. [52].

3. Introduction of proposed model

3.1. Proposed ELM-based ground response prediction model

… input variables as mentioned above, and the maximum ground settlement S is the only output variable. ELM-based models with different numbers of hidden neurons were pre-trained to select an appropriate framework. Because the focus of this study is to highlight the superiority of the proposed optimizer in the next section, the detailed process for determining the optimum number of hidden neurons is not presented for brevity. The results indicate that the performance of the model is not sensitive to the hyper-parameter (the number of hidden neurons) when the number of hidden neurons exceeds 15. Considering the computational cost and the model performance, 20 hidden neurons are ultimately adopted in this study. The training of the ELM-based model can be expressed by:

$$H = f_E(Xw + b); \quad \beta = H^{\dagger} y \tag{14}$$

where X = input matrix (n × 12, n is the number of datasets); H = output of hidden layer (n × 20); y = output vector (n × 1); w = weights matrix (12 × 20); b = bias vector (1 × 20); H† (20 × n) is the Moore–Penrose generalized inverse of matrix H [53], used because H is a nonsquare matrix; β = ultimate training result (20 × 1); $f_E$ is the activation function used in the hidden layer of ELM. The sigmoid activation function is adopted in this study, which can be expressed by:

$$f_E(x) = \frac{1}{1 + e^{-x}} \tag{15}$$
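The training step of Eqs. (14) and (15) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions rather than the authors' implementation: the uniform range used for the random weights and biases, the synthetic data, and the function names `elm_fit` and `elm_predict` are hypothetical; only the matrix shapes (12 inputs, 20 hidden neurons) follow the text.

```python
import numpy as np


def sigmoid(x):
    """Activation function f_E of Eq. (15)."""
    return 1.0 / (1.0 + np.exp(-x))


def elm_fit(X, y, n_hidden=20, seed=None):
    """Train an ELM following Eq. (14): random w and b, then beta = pinv(H) @ y."""
    rng = np.random.default_rng(seed)
    # Assumed initialisation range; the text only states "assigned randomly".
    w = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=(1, n_hidden))
    H = sigmoid(X @ w + b)            # hidden layer output, n x n_hidden
    beta = np.linalg.pinv(H) @ y      # Moore-Penrose least-squares solution
    return w, b, beta


def elm_predict(X, w, b, beta):
    """Evaluate the trained ELM on new inputs."""
    return sigmoid(X @ w + b) @ beta


# Synthetic stand-in data with 12 input variables (not the project database).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = np.sin(X[:, :1]) + 0.1 * X[:, 1:2]

w, b, beta = elm_fit(X, y, n_hidden=20, seed=1)
pred = elm_predict(X, w, b, beta)
sse = float(np.sum((pred - y) ** 2))
```

Because β is obtained in closed form, the only free quantities left are w and b, which is why the optimizer described later searches over the weights and biases rather than over β.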
$$V_i^{k+1} = \begin{cases} V_{\max}, & \text{if } V_i^{k+1} > V_{\max} \\ V_{\min}, & \text{if } V_i^{k+1} < V_{\min} \end{cases} \tag{18}$$

3.2.3. Basic framework

Fig. 5 presents the basic framework of the proposed optimizer DQN-PSO. This optimizer starts from creating several populations, which is …

To validate the superiority of the proposed RL-based optimizer DQN-PSO, an enhanced optimizer is also developed for comparison. This enhanced PSO has two characteristics:

(i) Adaptive accelerator parameters: as mentioned above, particles start by exploring the search space and thereafter transfer to the exploitation operation as the number of generations increases. Therefore, a search strategy in which c1 decreases linearly and c2 increases linearly with the increase of generations has been developed [54]. This strategy can improve the global search capability of PSO at the early stage and the local optimization capability at the later stage, as follows:

$$c_1^k = c_{1\_initial} + \frac{k}{t}\,(c_{1\_final} - c_{1\_initial}) \tag{19}$$

$$c_2^k = c_{2\_initial} + \frac{k}{t}\,(c_{2\_final} - c_{2\_initial}) \tag{20}$$

where k = current generation; t = total number of generations; $c_1^k$, $c_2^k$ = values of c1 and c2 at the kth generation, respectively; c1_initial, c2_initial = initial values of c1 and c2, respectively; c1_final, c2_final = final values of c1 and c2, respectively. The update of each particle position and velocity complies with Eqs. (10) and (11).

(ii) Jump: unlike the DQN-PSO optimizer, the enhanced PSO optimizer cannot intelligently select the action of particles based on the reward of each action. Therefore, when the number of generations exceeds a critical value (Eq. (21)) and the difference of the objective function outputs generated at adjacent time steps is less than a threshold value (Eq. (22)), a jump operation is activated. Thereafter the update of each particle position in the jump operation complies with Eq. (16).

$$k > N_J \tag{21}$$

$$\left| f_{obj}^{k+1} - f_{obj}^{k} \right| \le f_J \tag{22}$$

where k = current generation; $f_{obj}^{k}$, $f_{obj}^{k+1}$ = output of the objective function at the kth and (k + 1)th generations, respectively; $N_J$, $f_J$ = thresholds for the number of generations and the difference of adjacent outputs of the objective function, respectively.

3.4. Proposed hybrid deep RL model

3.4.1. Reward rule and actions selection

Note that the reward rule was not given in the previous section, because it needs to be determined by the hybrid model. As mentioned in the description of ELM, the training process of ELM is completed merely by computing the solution of a linear system. After the hyper-parameter (the number of hidden neurons) of ELM is determined, the model performance depends heavily on the weights and biases. To improve the performance of the ELM-based ground response prediction model, the weights and biases of ELM are optimized by the DQN-based optimizer. In this hybrid algorithm, the state of the DQN-based optimizer represents the weights and biases of ELM, as follows:

$$X = [X_1, X_2, \dots, X_{i-1}, X_i, \dots, X_n]_{n \times m} \tag{23}$$

$$X_i = [x_1, x_2, \dots, x_{i-1}, x_i, \dots, x_m]_{1 \times m} \tag{24}$$

where X = an aggregate of all populations; Xi = a single population, i.e., an aggregate of weights and biases of ELM; n = the size of population; m = the number of particles in each population, i.e., the number of weights and biases in ELM (20 × 12 + 20 = 260, as mentioned in Section 3.1.2); xi = a single particle of a population. Therefore, the number of particles in each state is 260n.

The objective function of the DQN-PSO optimizer is determined by the sum of squared errors (SSE), which is used to evaluate the reward value.

$$SSE = \sum_{i=1}^{n} \left[ \beta_i f_E(\omega_i x_i + b_i) - y_i \right]^2 \tag{25}$$

where yi = actual settlement; $\beta_i f_E(\omega_i x_i + b_i)$ = predicted settlement using the ELM-based model, in which parameters βi, ωi and bi derive from β, w and b (see Section 3.1.2), respectively; xi = one set of input variables. The updates of pBest and gBest are related to the SSE value, in which all particles move towards the positions with low values of SSE. The reward rule is that the final reward r is 1 if the SSE yielded by gBest is less than the prescribed goal value; otherwise, the current model acquires a reward of 0. Note that the exact rewards are only known at the end of each episode.

3.4.2. Model framework

The pseudocode of the proposed hybrid ELM-based prediction model and DQN-PSO optimizer is presented in Algorithm 1. It can be observed that the hybrid algorithm involves a prescribed number of episodes. In each episode, states are updated continuously until the SSE value yielded by the gBest satisfies the termination condition.

Algorithm 1: (Hybrid DQN and ELM algorithm).

    step = 0
    for episode in range(number of prescribed episodes):
        Randomly initialize state s
        while True:
            Estimate rewards of each action and choose an action a
            Modify the current state s to s_ by taking a and obtain the corresponding reward
            Store [s, a, reward, s_] in the replay memory pool
            if step satisfies the update condition:
                Update target DNN
            if gBest generated by state s_ satisfies the termination conditions:
                Evaluate the reward of this episode
                break
            step = step + 1
    Exit

4. Application of proposed model

4.1. Overview of case study

In order to investigate the feasibility and reliability of the proposed hybrid DQN-PSO optimizer and ELM-based tunneling-induced ground response prediction model, an in-situ experiment conducted by Zhang et al. [48] on a practical tunneling project in Changsha city, China is used in this study. The experimental zone consisted of five tunnel sections with six metro stations. A total of 5.44 km was constructed using an earth pressure balanced (EPB) shield machine (construction started in 2016 and was completed in 2019). The tunnel was primarily excavated in weathered rocks, which means that the consolidation settlement completed rapidly after the tunnel was constructed. Therefore, this case study focused on the ultimate steady tunneling-induced ground settlement. Monitoring cross-sections of settlement were positioned at a fixed interval of around 10 m.

With regard to the collection of datasets, the geological conditions and geometric factor at each ring were obtained by the site investigation before the tunneling process. The five operational parameters were recorded per minute by the shield machine data acquisition system, and the average operational parameters at each ring were computed during preprocessing. The ground settlement monitoring points were installed at an interval of 10 m and were measured twice a day. The settlement of monitoring points and the 12 input variables at the corresponding positions were stored in the database for training the ELM-based ground response prediction model, so that synchronization between the settlement data and the input variables can be guaranteed. The database used in this study can be downloaded via the link in Appendix A.

4.2. Results

Table 1 presents the values of parameters used in all algorithms in this study. The experimental results indicate that the model performance is not particularly sensitive to the architecture of the target DNN and Q-DNN. The number of hidden layers in the target DNN and Q-DNN is 1, and the …
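Looking back at Section 3, the linear coefficient schedule of Eqs. (19) and (20), the jump condition of Eqs. (21) and (22), the SSE objective of Eq. (25) and the reward rule of Section 3.4.1 can be sketched together as follows. This is a hedged illustration, not the authors' code: the default coefficient values, the function names, the use of the absolute difference in the jump test, and the choice to recompute β with the pseudo-inverse for every particle are assumptions, and the data are synthetic.

```python
import numpy as np

N_INPUTS, N_HIDDEN = 12, 20                      # sizes quoted in the text
PARTICLE_SIZE = N_INPUTS * N_HIDDEN + N_HIDDEN   # 20 x 12 + 20 = 260


def linear_coefficients(k, t, c1_initial=2.5, c1_final=0.5,
                        c2_initial=0.5, c2_final=2.5):
    """Eqs. (19)-(20): c1 and c2 interpolated linearly over the generations.

    The default start and end values are hypothetical.
    """
    frac = k / t
    c1 = c1_initial + frac * (c1_final - c1_initial)
    c2 = c2_initial + frac * (c2_final - c2_initial)
    return c1, c2


def should_jump(k, f_prev, f_curr, n_j, f_j):
    """Eqs. (21)-(22): jump once k > N_J and the objective has stalled."""
    return k > n_j and abs(f_curr - f_prev) <= f_j


def sse_of_particle(particle, X, y):
    """Eq. (25) for one particle holding the ELM input weights and biases.

    Assumption: beta is recomputed with the pseudo-inverse, as in Eq. (14).
    """
    w = particle[: N_INPUTS * N_HIDDEN].reshape(N_INPUTS, N_HIDDEN)
    b = particle[N_INPUTS * N_HIDDEN:].reshape(1, N_HIDDEN)
    H = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    beta = np.linalg.pinv(H) @ y
    return float(np.sum((H @ beta - y) ** 2))


def episode_reward(gbest_sse, goal_sse):
    """Reward rule: 1 if the SSE of gBest is below the goal value, else 0."""
    return 1 if gbest_sse < goal_sse else 0


# Synthetic stand-in data (not the Changsha database).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, N_INPUTS))
y = rng.normal(size=(50, 1))
particle = rng.uniform(-1.0, 1.0, size=PARTICLE_SIZE)
sse = sse_of_particle(particle, X, y)
```

Keeping the reward binary and tied only to gBest means the agent receives no graded feedback during an episode, which matches the remark that the exact rewards are only known at the end of each episode.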
[Fig. 6. Evolution of SSE value in a typical episode. Axes: step versus SSE, with the goal_SSE level marked. Labelled points (step, SSE): a (1, 8.395), b (1045, 5.288), c (4418, 5.22), d (7337, 5.1), e (8802, 4.997). Phases: I (a-b), II (b-c), III (c-d), IV (d-e).]

[Figure panels: action selected at each step (0 = exploration, 1 = exploitation, 2 = jump) in phases I, II, III and IV.]

5. Discussion

5.1. Compared with basic and enhanced PSO

… ELM-based ground response prediction model optimized by three optimizers. The evolution of SSE value within 3000 generations is presented because the three optimizers had roughly converged to fixed values. It can be observed that DQN-PSO outperforms the basic and enhanced PSO, with the lowest value of SSE and the fastest convergence. The generation at which the SSE value of each optimizer virtually converges to a constant value is presented in Table 2. In detail, the SSE value optimized by the DQN-PSO becomes lower than that of the basic and enhanced PSO when the number of generations exceeds 10, because the DQN-PSO optimizer always guides particles to select a correct action. Meanwhile, the whole optimization process is virtually complete at around the 1000th generation with an SSE value of 5.288; thereafter the search merely aims to achieve the prescribed goal value of SSE, and the computational cost is expensive. It can be seen from Fig. 9 that the computational cost of decreasing the SSE value from 5.288 to the prescribed goal value is approximately seven times that of decreasing the SSE value from the initial value to 5.288. It is noteworthy that the enhanced PSO also outperforms the basic PSO, with a lower value of SSE from the 326th generation. Enhanced PSO indeed further optimizes the search trajectory of particles to a certain extent, but the key challenges, namely which action should be chosen and when this action should be taken, are still not addressed. This means that the enhanced PSO cannot avoid being trapped in local optima, so the decrease in the converged SSE value is not discernible compared with the basic PSO. The basic PSO conducts the exploration action throughout the whole optimization process, so it is easily trapped in local optima. The premature convergence problem is obvious, because the SSE value remains roughly constant once the number of generations reaches 500.

Fig. 10 presents the evolution of ground responses for the test set predicted by the ELM-based prediction model optimized by the three optimizers, as well as the MAE values computed using Eq. (26). It can be seen that the hybrid deep RL model outperforms the ELM-based prediction models optimized by PSO and enhanced PSO. Enhanced PSO slightly refines the predicted settlement evolution, with a slight decrease in the MAE value (from 2.64 to 2.51). The improvement in the prediction performance of the hybrid deep RL model is remarkable: the MAE value decreases to 1.97. Good agreement between the predicted and measured evolution of settlement, and an improvement in recognizing the maximum settlement, are observed. Meanwhile, all datasets are closer to the line with a slope of 1. Hence the tunneling-induced ground response prediction model can be established using the hybrid deep RL algorithm.

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| r_i - p_i \right| \tag{26}$$

where r = measured settlement; p = the predicted settlement; n = the total number of datasets.
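For reference, Eq. (26) amounts to the following short computation. The settlement values below are hypothetical numbers chosen for the example, not data from the case study, and the function name is an assumption.

```python
import numpy as np


def mean_absolute_error(measured, predicted):
    """Eq. (26): mean of |r_i - p_i| over all datasets."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if measured.shape != predicted.shape:
        raise ValueError("measured and predicted must have the same shape")
    return float(np.mean(np.abs(measured - predicted)))


# Hypothetical settlements in mm (negative values denote settlement).
measured = [-10.0, -20.0, -35.0, -5.0]
predicted = [-12.0, -18.0, -30.0, -6.0]
mae = mean_absolute_error(measured, predicted)  # (2 + 2 + 5 + 1) / 4
```

Unlike the SSE used as the optimization objective, MAE is expressed in the same unit as the settlement, which makes the values quoted for the three optimizers directly comparable with the measured settlements.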
[Fig. 8. Predicted settlement for the test set using the hybrid deep RL model generated at three steps. Panels a, b and e show measured and predicted settlement (mm) against dataset index.]
[Fig. 9. Comparison of DQN-PSO optimizer with basic and enhanced PSO optimizers. Axes: generation versus SSE, with insets for generations 10 to 15 and 300 to 350.]

Table 2
Comparison among three optimizers.

Optimizer       Generation   SSE
Basic PSO       1497         6.860
Enhanced PSO    1280         6.540
DQN-PSO         1045         5.288

5.2. Sensitivity analysis

To evaluate the performance of the proposed ELM-based settlement prediction model optimized by DQN-PSO, global sensitivity analysis (GSA) is conducted to reveal how model output uncertainty can be apportioned to the uncertainty in each input variable [55]. The variance-based GSA method has been extensively used in many domains [56–58], so it is used in this study. In this method, the total order index $S_{Ti}$ measures the effect of an input parameter and its coupled effect with other input parameters on the model output. The calculation of $S_{Ti}$ proposed by Jansen [59] is adopted in this study; the detailed formulations are not presented for brevity and can be found in Zhang [60]. The results of GSA are shown in Fig. 11, compared with the correlation coefficients, which are calculated as absolute Pearson coefficients (see Eq. (27)). It can be observed that the parameters that have strong correlations with settlement (Sp, St, C) still have a higher impact on the ELM-based model. Th, with the highest Pearson value among the five operational parameters, is also the most important operational parameter in the ELM-based model. Pr, with the lowest Pearson value, is also the least significant parameter in the ELM-based model. The rank of the other parameters varies only slightly. These results indicate that the ELM-based model optimized by DQN-PSO captures the potential correlations between the input and output parameters. The generalization ability and the practicability of such a model can thus be guaranteed.

$$R = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}\, \sqrt{n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2}} \tag{27}$$

6. Conclusions

The contribution of this study is that a hybrid deep reinforcement learning (RL) model which integrates the extreme learning machine (ELM) and the deep RL algorithm deep-Q network (DQN) is proposed for predicting tunneling-induced ground responses in real time, in which the relationships among influential factors and ground response are explored through self-practicing. Another contribution is that the proposed optimizer DQN-PSO learns which action should be conducted and when this action should be taken, thereby ensuring that the global optima are obtained. Unlike previous metaheuristic optimization algorithms that guide the movement of particles in a rough manner, the reward rule of the DQN-based optimizer focuses on evaluating the reward of the agent's action; hence particles, like an intelligent human, always select the optimum action at each step. To the authors' best knowledge, this is the first work to use the hybrid of the RL algorithm DQN and the ML algorithm ELM to investigate tunneling-induced ground responses. The following conclusions can be drawn, based on the results of this work:
(1) Because the DQN-PSO optimizer is able to guide particles to implement the optimum action at each step, the global optima can be acquired when the value of the objective function converges to a fixed value. In other words, the DQN-PSO optimizer can search the global best weights and biases of ELM with higher accuracy and lower computational cost, compared with basic or enhanced metaheuristic optimization algorithms.
(2) The hybrid deep RL model with the integration of ELM and DQN-PSO optimizer can accurately predict tunneling-induced ground response in real time, overcoming the deficiency of empirical, analytical and numerical models established by domain experts. The final ELM-based model can be expressed with an explicit formulation, which is user-friendly in engineering practice. Meanwhile, the performance of the prediction model can be improved as more datasets are collected from field construction.
(3) The hybrid deep RL model is generic, which means that it can be applied to various situations with different states, actions, rules, rewards and objective functions defined by domain experts without any debugging. Meanwhile, the basic meta-heuristic and machine learning algorithms used in the hybrid deep RL model can be arbitrarily replaced based on different situations. Such a model offers a pragmatic and reliable framework to develop a data-driven or physical model.

[Fig. 10. Predicted settlement for the test set using ELM-based prediction models optimized by three optimizers. Each row shows measured and predicted settlement (mm) against dataset index, with a scatter plot of predicted against measured settlement: basic PSO (MAE = 2.64), enhanced PSO (MAE = 2.51), DQN-PSO (MAE = 1.97).]

[Fig. 11. Comparison between sensitivity indices and correlation coefficients of input parameters. Ranking by sensitivity index: Sp, St, C, MUCS, W, Th, MDPT, MSPT, To, Cp, Gf, Pr. Ranking by absolute Pearson coefficient: Sp, St, C, W, MUCS, Th, Cp, MDPT, MSPT, Gf, To, Pr.]

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This study is sponsored by the National Natural Science Foundation of China (No. 51938005) and the program of High-level Talent of Innovative Research Team of Hunan Province 2019 (No. 2019RS1030). The authors greatly appreciate this financial support during this research.
Appendix A

The database used in this study can be downloaded at the following link:
https://www.researchgate.net/publication/336208927_Database_for_maximum_settlement_collected_from_Changsha_Metro_Line_4_Liugoulong_to_Fubuhe_station.

References

[1] P. Zhang, H.-N. Wu, R.-P. Chen, T.H.T. Chan, Hybrid meta-heuristic and machine learning algorithms for tunneling-induced settlement prediction: A comparative study, Tunnell. Undergr. Space Technol. 99 (2020) 103383.
[2] C. Sagaseta, Analysis of undrained soil deformation due to ground loss, Géotechnique 37 (1987) 301–320.
[3] A. Verruijt, J.R. Booker, Surface settlements due to deformation of a tunnel in an elastic half plane, Géotechnique 48 (1996) 709–713.
[4] H.S. Yu, R.K. Rowe, Plasticity solutions for soil behaviour around contracting cavities and tunnels, Int. J. Numer. Anal. Met. 23 (1999) 1245–1279.
[5] F. Pinto, A.J. Whittle, Ground movements due to shallow tunnels in soft ground. I: analytical solutions, J. Geotech. Geoenviron. 140 (2014) 04013040.
[6] W. Broere, D. Festa, Correlation between the kinematics of a Tunnel Boring Machine and the observed soil displacements, Tunnell. Undergr. Space Technol. 70 (2017) 125–147.
[7] S. Suwansawat, H.H. Einstein, Describing settlement troughs over twin tunnels using a superposition technique, J. Geotech. Geoenviron. Eng. 133 (2007) 445–468.
[8] P. Zhang, Z.Y. Yin, R.P. Chen, Analytical and semi-analytical solutions for describing tunneling-induced transverse and longitudinal settlement troughs, Int. J. Geomech. (2020), https://doi.org/10.1061/(ASCE)GM.1943-5622.0001748.
[9] R.B. Peck, Deep excavations and tunneling in soft ground, in: Proceedings of 7th International Conference on Soil Mechanic and Foundation Engineering Mexico
[28] … neural networks and tree search, Nature 529 (2016) 484–489.
[29] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, Human-level control through deep reinforcement learning, Nature 518 (2015) 529–533.
[30] N. Brown, T. Sandholm, Superhuman AI for multiplayer poker, Science (2019) 1–12.
[31] S. Dargan, M. Kumar, M.R. Ayyagari, G. Kumar, A survey of deep learning and its applications: a new paradigm to machine learning, Arch. Comput. Methods Eng. (2019).
[32] J. Shi, J.A.R. Ortigao, J. Bai, Modular neural networks for predicting settlements during tunneling, J. Geotech. Geoenviron. Eng. 124 (1998) 389–395.
[33] C.Y. Kim, G.J. Bae, S.W. Hong, C.H. Park, Neural network based prediction of ground surface settlements due to tunnelling, Comput. Geotech. 28 (2001) 517–547.
[34] S. Suwansawat, H.H. Einstein, Artificial neural networks for predicting the maximum surface settlement caused by EPB shield tunneling, Tunnell. Undergr. Space Technol. 21 (2006) 133–150.
[35] O.J. Santos, T.B. Celestino, Artificial neural networks analysis of São Paulo subway tunnel settlement data, Tunnell. Undergr. Space Technol. 23 (2008) 481–491.
[36] A. Marto, M. Hajihassani, R. Kalatehjari, E. Namazi, H. Sohaei, Simulation of longitudinal surface settlement due to tunnelling using artificial neural network, Int. Rev. Modelling Simul. 5 (2012) 1024–1031.
[37] A. Darabi, K. Ahangari, A. Noorzad, A. Arab, Subsidence estimation utilizing various approaches – A case study: Tehran No. 3 subway line, Tunnell. Undergr. Space Technol. 31 (2012) 117–127.
[38] R. Boubou, F. Emeriault, R. Kastner, Artificial neural network application for the prediction of ground surface movements induced by shield tunnelling, Can. Geotech. J. 47 (2010) 1214–1233.
[39] M. Hasanipanah, M. Noorian-Bidgoli, D. Jahed Armaghani, H. Khamesi, Feasibility of PSO-ANN model for predicting surface settlement caused by tunneling, Eng. Comput-Germany 32 (2016) 705–715.
City, 1969, pp. 225–290. [40] R.P. Chen, P. Zhang, X. Kang, Z.Q. Zhong, Y. Liu, H.N. Wu, Prediction of maximum
[10] P. Zhang, R.-P. Chen, H.-N. Wu, Y. Liu, Ground settlement induced by tunneling surface settlement caused by EPB shield tunneling with ANN methods, Soils Found.
crossing interface of water-bearing mixed ground: A lesson from Changsha, China, 59 (2019) 284–295.
Tunnell. Undergr. Space Technol. 96 (2020) 103224. [41] R.P. Chen, P. Zhang, H.N. Wu, Z.T. Wang, Z.Q. Zhong, Prediction of shield tun-
[11] X.-T. Lin, R.-P. Chen, H.-N. Wu, H.-Z. Cheng, Deformation behaviors of existing neling-induced ground settlement using machine learning techniques, Front. Struct.
tunnels caused by shield tunneling undercrossing with oblique angle, Tunnell. Civ. Eng. 13 (2019) 1363–1378.
Undergr. Space Technol. 89 (2019) 78–90. [42] K. Ahangari, S.R. Moeinossadat, D. Behnia, Estimation of tunnelling-induced set-
[12] J. Yang, Z.Y. Yin, X.F. Liu, F.P. Gao, Numerical analysis for the role of soil prop- tlement by modern intelligent methods, Soils Found. 55 (2015) 737–748.
erties to the load transfer in clayfoundation due to the traffic load of the metro [43] D. Bouayad, F. Emeriault, Modeling the relationship between ground surface set-
tunnel, Transp. Geotech. 23 (2020) 100336. tlements induced by shield tunneling and the operational and geological parameters
[13] J. Ninić, S. Freitag, G. Meschke, A hybrid finite element and surrogate modelling based on the hybrid PCA/ANFIS method, Tunnell. Undergr. Space Technol. 68
approach for simulation and monitoring supported TBM steering, Tunnell. Undergr. (2017) 142–152.
Space Technol. 63 (2017) 12–28. [44] F. Wang, B. Gou, Y. Qin, Modeling tunneling-induced ground surface settlement
[14] A. Pourtaghi, M.A. Lotfollahi-Yaghin, Wavenet ability assessment in comparison to development using a wavelet smooth relevance vector machine, Comput. Geotech.
ANN for predicting the maximum surface settlement caused by tunneling, Tunnell. 54 (2013) 125–132.
Undergr. Space Technol. 28 (2012) 257–271. [45] L.M. Zhang, X.G. Wu, W.Y. Ji, S.M. AbouRizk, Intelligent approach to estimation of
[15] P. Zhang, Z.-Y. Yin, Y.-F. Jin, T.H.T. Chan, A novel hybrid surrogate intelligent tunnel-induced ground settlement using Wavelet Packet and Support Vector
model for creep index prediction based on particle swarm optimization and random Machines, J. Comput. Civil. Eng. 31 (2017) 04016053.
forest, Eng. Geol. 265 (2020) 105328. [46] J. Zhou, X. Shi, K. Du, X.Y. Qiu, X.B. Li, H.S. Mitri, Feasibility of Random-Forest
[16] P. Zhang, Z.Y. Yin, Y.F. Jin, G.L. Ye, An AI-based model for describing cyclic approach for prediction of ground settlements induced by the construction of a
characteristics of granular materials, Int. J. Numer. Anal. Met. (2020) 1–21. shield-driven tunnel, Int. J. Geomech. 17 (2016) 04016129.
[17] P. Zhang, Z.Y. Yin, Y.F. Jin, T. Chan, Intelligent Modelling of Clay Compressibility [47] V.R. Kohestani, M.R. Bazargan-Lari, J. Asgari-marnani, Prediction of maximum
using Hybrid Meta-Heuristic and Machine Learning Algorithms, Geosci. Front. surface settlement caused by earth pressure balance shield tunneling using random
(2020) in press. forest, J. AI Data Mining 5 (2017) 127–135.
[18] D.J. Armaghani, E.T. Mohamad, M.S. Narayanasamy, N. Narita, S. Yagiz, [48] P. Zhang, R.P. Chen, H.N. Wu, Real-time analysis and regulation of EPB shield
Development of hybrid intelligent models for predicting TBM penetration rate in steering using Random Forest, Automat. Constr. 106 (2019) 102860.
hard rock condition, Tunnell. Undergr. Space Technol. 63 (2017) 29–43. [49] C.J.C.H. Watkins, P. Dayan, Q-Learning, Mach. Learn. 8 (1992) 279–292.
[19] Z.Y. Yin, Y.F. Jin, S.L.J. Shen, P.Y. Hicher, Shen, Int. J. Numer. Anal. Met. 42 (2017) [50] Y. Yuan, Z.L. Yu, Z. Gu, Y. Yeboah, W. Wei, X. Deng, J. Li, Y. Li, A novel multi-step
1–25. Q-learning method to improve data efficiency for deep reinforcement learning,
[20] K.E. Parsopoulos, Parallel cooperative micro-particle swarm optimization: A mas- Knowl-Based Syst. 175 (2019) 107–117.
ter–slave model, Appl. Soft Comput. 12 (2012) 3552–3579. [51] J. Kennedy, R. Eberhart, Particle swarm optimization, in: IEEE International
[21] W.H. Lim, N.A. Mat Isa, Two-layer particle swarm optimization with intelligent Conference on Neural NetworksPerth, Australia, 1995, pp. 1942–1948.
division of labor, Eng. Appl. Artif. Intel. 26 (2013) 2327–2348. [52] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: Theory and applica-
[22] A. Gálvez, A. Iglesias, A new iterative mutually coupled hybrid GA–PSO approach tions, Neurocomputing 70 (2006) 489–501.
for curve fitting in manufacturing, Appl. Soft Comput. 13 (2013) 1491–1504. [53] C.R. Rao, S.K. Mitra, Generalized inverse of matrices and its applications, Wiley,
[23] S.Z. Zhao, P.N. Suganthan, Q.-K. Pan, M. Fatih Tasgetiren, Dynamic multi-swarm 1971.
particle swarm optimizer with harmony search, Expert Syst. Appl. 38 (2011) [54] L. Yang, H. Su, Z. Wen, Improved PLS and PSO methods-based back analysis for
3735–3742. elastic modulus of dam, Adv. Eng. Softw. 131 (2019) 205–216.
[24] M.A. Lopes Silva, S.R. de Souza, M.J. Freitas Souza, A.L.C. Bazzan, A reinforcement [55] A. Saltelli, I.M. Sobol, About the use of rank transformation in sensitivity analysis of
learning-based multi-agent framework applied for solving routing and scheduling model output, Reliab. Eng. Syst. Safe. 50 (1995) 225–239.
problems, Expert Syst. Appl. 131 (2019) 148–171. [56] C.Y. Zhao, A.A. Lavasan, R. Hölter, T. Schanz, Mechanized tunneling induced
[25] K. Zhang, H. Zhang, Y. Mu, S. Sun, Tracking control optimization scheme for a class building settlements and design of optimal monitoring strategies based on sensi-
of partially unknown fuzzy systems by using integral reinforcement learning ar- tivity field, Comput. Geotech. 97 (2018) 246–260.
chitecture, Appl. Math. Comput. 359 (2019) 344–356. [57] L.M. Zhang, X.G. Wu, H.P. Zhu, S.M. AbouRizk, Performing global uncertainty and
[26] Y. Ding, L. Ma, J. Ma, M. Suo, L. Tao, Y. Cheng, C. Lu, Intelligent fault diagnosis for sensitivity analysis from given data in tunnel construction, J. Comput. Civil. Eng. 31
rotating machinery using deep Q-network based health state classification: A deep (2017) 04017065.
reinforcement learning approach, Adv. Eng. Inform. 42 (2019). [58] K.M. Hamdia, H. Ghasemi, X.Y. Zhuang, N. Alajlan, T. Rabczuk, Sensitivity and
[27] F. Hourfar, H.J. Bidgoly, B. Moshiri, K. Salahshoor, A. Elkamel, A reinforcement uncertainty analysis for flexoelectric nanostructures, Comput. Method Appl. M. 337
learning approach for waterflooding optimization in petroleum reservoirs, Eng. (2018) 95–109.
Appl. Artif. Intel. 77 (2019) 98–116. [59] M.J.W. Jansen, Analysis of variance designs for model output, Comput. Phys.
[28] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. van den Driessche, Commun. 117 (1999) 35–43.
J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, [60] P. Zhang, A novel feature selection method based on global sensitivity analysis with
D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, application in machine learning-based prediction model, Appl. Soft Comput. 85
K. Kavukcuoglu, T. Graepel, D. Hassabis, Mastering the game of Go with deep (2019) 105859.
11